Singlish Dependency Parsing

Singlish text and dialogue can also be parsed using dependency syntax based on the Stanford Typed Dependencies guidelines. Where possible, Singlish dependencies should follow the standard guidelines as detailed by de Marneffe and Manning in their Stanford Typed Dependencies Manual. Where Singlish differs from standard English, or spoken language from written language, please refer to the guidelines below.

On the whole, dependencies were much easier to adapt to Singlish grammar, and I ran into fewer issues and questions while parsing the Singlish dependencies. Furthermore, because spoken utterances are on average simpler and shorter than written sentences, spoken Singlish leads to rather simple and straightforward dependency arcs.


Singlish Particles

As in the Singlish constituent parsing section, particles are treated as adverbs, because the ones found in my data set, lar and lah, were observed to be rather adverbial in nature. As a result, I labeled all particles as advmod and attached them to the root of the phrase they modify.

An option I chose not to follow was to create a new tag, par, in case not all particles turn out to be adverbial in nature. par would then be used to label all particles, with the arrow coming from the token or phrase that the particle is most closely related to or modifies. However, based on the data I have now, the advmod tag is effective and straightforward, so I chose to follow this strategy for my dependency parsing.
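The advmod strategy can be sketched in a few lines of Python. This is only an illustration: the function name attach_particles, the particle set and the example utterance are made up for this sketch and are not taken from the annotated data.

```python
# Particles observed in the data set; extend this set if other particles appear.
PARTICLES = {"lar", "lah"}


def attach_particles(tokens, root_index):
    """Return Stanford-style advmod arcs from the phrase root to each particle.

    tokens     -- list of word strings
    root_index -- 0-based index of the root of the phrase the particle modifies
    """
    root = f"{tokens[root_index]}-{root_index + 1}"
    return [
        f"advmod({root}, {tok}-{i + 1})"
        for i, tok in enumerate(tokens)
        if tok.lower() in PARTICLES and i != root_index
    ]


# Invented utterance, used only to show the output format.
particle_arcs = attach_particles(["very", "good", "lah"], root_index=1)
print(particle_arcs)  # ['advmod(good-2, lah-3)']
```

Switching to the par alternative would only require changing the relation name in the formatted string.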

A simple example of labeling particles in dependencies:


Since the Stanford Typed Dependencies were developed for standard written text, they provide no method for dealing with repeats or repairs that occur in spoken dialogue. To deal with repeats (the most common type of disfluency in my text), I used the label repeat. This label attaches the repeated word (or the root of the repeated phrase) to the word/phrase it is repeating. The label then acts in a way similar to conj: following the arrow backwards shows what type of word or phrase the repeat was.

Repairs were almost nonexistent in my data, although one disfluency that I chose to label repeat is borderline and may actually be more of a repair than a repeat. True repairs follow the same methodology: the word/phrase that was repaired is labeled repair and attached to the word/phrase that repaired it.

[Figures: “Repeat” and “Possible repair labeled as repeat”]

Notice that in the “possible repair labeled as repeat” example, the repeat arrow comes out of the root of the phrase and attaches to the root of the disfluent phrase. The other tokens in the phrase are then attached with their actual labels (det, etc.), not with another repeat or repair label.
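To make the "follow the arrow backwards" idea concrete, here is a small Python sketch. It assumes arcs are stored as (relation, head index, dependent index) triples, with index 0 as an artificial ROOT. The utterance "the the dog barks" and the helper names are invented for illustration.

```python
# Invented utterance with a repeated determiner: "the the dog barks".
tokens = ["ROOT", "the", "the", "dog", "barks"]

# (relation, head index, dependent index); the abandoned first "the"
# hangs off the fluent second "the" with the repeat label.
arcs = [
    ("root", 0, 4),
    ("nsubj", 4, 3),
    ("det", 3, 2),
    ("repeat", 2, 1),
]


def incoming(arcs, index):
    """Return (relation, head) of the single arc pointing at token `index`."""
    for rel, head, dep in arcs:
        if dep == index:
            return rel, head
    raise KeyError(index)


def underlying_relation(arcs, index):
    """Follow repeat arcs backwards until a non-repeat relation is found."""
    rel, head = incoming(arcs, index)
    while rel == "repeat":
        index = head
        rel, head = incoming(arcs, index)
    return rel, head


rel, head = underlying_relation(arcs, 1)
print(rel, tokens[head])  # det dog
```

The same lookup would work for a repair label by treating it like repeat in the loop condition.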

Fragments and RRCs

Due to the nature of dependencies, fragments and reduced relative clauses (RRCs) do not need to be handled explicitly. Fragments and RRCs, which pose a problem in PTB constituent parsing, can be annotated in dependencies exactly like a standard utterance. The root of the phrase is chosen as the root of the entire clause/utterance, and from that point the fragment or RRC becomes a non-issue and can be treated as if it were a complete clause.

[Figures: “Fragment” and “RRC”]

In the fragment example above, I chose to interpret the sentence to mean “Their chatterbox is the same chatterbox as what we have lar”, with a deleted copula of some sort, leaving “the same chatterbox” as the predicate of the sentence. Because dependencies root sentences of this form in the predicate, not the copula, it is a minor change to annotate the entire sentence as if it were a standard utterance that merely has no copula label.
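The point about the missing copula can be checked with a small comparison. The sketch below uses a deliberately simplified, invented pair ("this one is good" / "this one good") rather than the chatterbox utterance, and represents arcs as (relation, head, dependent) triples.

```python
# Standard utterance: "this one is good" (rooted in the predicate "good").
full_sentence = {
    ("root", "ROOT", "good"),
    ("nsubj", "good", "one"),
    ("det", "one", "this"),
    ("cop", "good", "is"),
}

# Copula-less fragment: "this one good".
fragment = {
    ("root", "ROOT", "good"),
    ("nsubj", "good", "one"),
    ("det", "one", "this"),
}

# The only arc the fragment lacks is the copula arc.
missing = full_sentence - fragment
print(missing)  # {('cop', 'good', 'is')}
```

Everything else in the analysis, including the choice of root, is shared between the two versions.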

Difficult-to-parse Sentence Structures

The Stanford Typed Dependencies provide the label dep for cases “when the system is unable to determine a more precise dependency relation between two words”. This dep label can be used when the utterance is difficult to understand in terms of meaning or syntax. For example, here is the (possibly mistranscribed) utterance that was labeled as a FRAG in PTB constituent parsing due to its opaque meaning:

In this case, the choice of root is somewhat arbitrary. It is based first on which section of the utterance has the clearer meaning and thus should be focused on more, and then on position in the utterance, which should default to the rightmost candidate.
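The root-selection rule can be written down as a tie-breaking sort key. This is a rough sketch only: the clarity score stands in for the annotator's own judgment, the placeholder words are invented, and attaching the remaining section heads to the root with dep is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    position: int  # 1-based token position in the utterance
    clarity: int   # annotator's judgment; higher means clearer meaning


def choose_root(candidates):
    """Prefer the clearest section; break ties by taking the rightmost one."""
    return max(candidates, key=lambda c: (c.clarity, c.position))


# Placeholder section heads for an opaque utterance.
candidates = [
    Candidate("w1", position=1, clarity=1),
    Candidate("w3", position=3, clarity=2),
    Candidate("w5", position=5, clarity=2),
]

root = choose_root(candidates)

# Assumption: every other section head hangs off the root with the generic dep label.
dep_arcs = [
    f"dep({root.word}-{root.position}, {c.word}-{c.position})"
    for c in candidates
    if c is not root
]
print(root.word, dep_arcs)  # w5 ['dep(w5-5, w1-1)', 'dep(w5-5, w3-3)']
```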

singlish_dependency_parsing.txt · Last modified: 2018/09/11 10:02 (external edit)