
Singlish Constituent Parsing

Most of Singlish closely follows standard English and should be parsed using the PTB bracketing guidelines.

Guidelines

Particles

In my data, I found only two particles, lar and lah, both of which are strongly adverbial in nature. From this I generalized that particles can be treated as adverbs or adverb phrases (ADVP). These particles attach to the phrase that precedes them and modify the meaning of the entire verb phrase. This may not hold for all particles, but the ones I encountered fit this description, so I applied this decision throughout my data set.

I decided that the particle should be attached to the tree as an adverb phrase at the VP level. Schematically, it should look like this:

(VP (VVD )
  (NP (NN ))
  (ADVP (PAR )))

If no verb is present (fragments, reduced relative clauses, etc.), the particle can stand on its own as a phrase like so:

(FRAG 
  (ADJP (JJ ))
  (ADVP (PAR )))
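The attachment rule above can be sketched in Python. This is a minimal sketch, assuming trees are held as nested `(label, children)` tuples with `(tag, word)` preterminals; `to_bracket` and `attach_particle` are hypothetical helpers for illustration, not part of any PTB tool. Empty words reproduce the schematic leaves used in the two templates.

```python
from typing import List, Tuple, Union

# Hypothetical representation: (label, children) for phrases,
# (tag, word) for preterminals, where word is a plain string.
Tree = Tuple[str, Union[str, List["Tree"]]]


def to_bracket(node: Tree) -> str:
    """Render a tree on one line in PTB-style bracket notation."""
    label, body = node
    if isinstance(body, str):  # preterminal, e.g. (PAR lah)
        return f"({label} {body})"
    return f"({label} " + " ".join(to_bracket(child) for child in body) + ")"


def attach_particle(phrase: Tree, particle: str) -> Tree:
    """Attach a particle such as 'lah' or 'lar' as ADVP (PAR ...).

    A VP gets the ADVP as its last child (VP-level attachment).
    Any other phrase (no verb present) becomes a sister of the
    ADVP under a new FRAG node.
    """
    advp: Tree = ("ADVP", [("PAR", particle)])
    label, body = phrase
    if label.split("-")[0] == "VP" and isinstance(body, list):
        return (label, body + [advp])
    return ("FRAG", [phrase, advp])


if __name__ == "__main__":
    vp = ("VP", [("VVD", ""), ("NP", [("NN", "")])])
    print(to_bracket(attach_particle(vp, "lah")))
    # (VP (VVD ) (NP (NN )) (ADVP (PAR lah)))
    adjp = ("ADJP", [("JJ", "")])
    print(to_bracket(attach_particle(adjp, "lar")))
    # (FRAG (ADJP (JJ )) (ADVP (PAR lar)))
```

Keeping the ADVP as the last child of the VP, rather than inside the object NP, is what makes the particle scope over the whole verb phrase.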

Two example trees from my data, one for a full sentence and one for a fragment, illustrate this:

[Example trees: Full Sentence; Fragment]

Interjections

Interjections in Singlish should be treated the same way as interjections in the PTB guidelines; refer to the PTB section on INTJ for general guidelines and rules. Because spoken Singlish often has several interjections in a row at the start of an utterance, one explicit rule applies: when the interjections seem to go together semantically, annotate them flat.

For example, this utterance in my data set consists of 3 interjections:

 ya ya ya

The 3 ya's seem to work together to get across one meaning and can be imagined to be spoken without any pauses between them. It follows from these observations that they should be annotated flat under one INTJ.

A good rule of thumb for deciding whether to annotate flat or to separate is to ask two questions. First, are the multiple tokens meant to convey one meaning or several? Second, could they plausibly be spoken together without a pause, or does each token need to be separated from the next by a short pause?
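Read as a decision procedure, the rule of thumb says to annotate flat only when both answers point to a single unit. A minimal Python sketch, assuming the annotator supplies the two judgments as booleans; `bracket_interjections` is a hypothetical helper, and the UH preterminal tag is borrowed from the standard PTB part-of-speech tagset rather than stated in these guidelines.

```python
from typing import List


def bracket_interjections(tokens: List[str], one_meaning: bool, no_pause: bool) -> str:
    """Bracket a run of utterance-initial interjections.

    one_meaning and no_pause are the annotator's answers to the two
    rule-of-thumb questions; the code cannot judge them itself.
    Both "yes" -> one flat INTJ; otherwise one INTJ per token.
    Tokens are tagged UH (the PTB part-of-speech tag for interjections).
    """
    leaves = [f"(UH {token})" for token in tokens]
    if one_meaning and no_pause:
        return "(INTJ " + " ".join(leaves) + ")"
    return " ".join(f"(INTJ {leaf})" for leaf in leaves)


if __name__ == "__main__":
    print(bracket_interjections(["ya", "ya", "ya"], one_meaning=True, no_pause=True))
    # (INTJ (UH ya) (UH ya) (UH ya))
```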

Disfluencies

Disfluencies are not unique to Singlish, but because my data set is spoken dialogue, it is relevant to discuss how to handle disfluent utterances. The PTB guidelines were written for standard written texts and therefore do not cover words that add nothing to the meaning of an utterance and are considered mistakes in speech.

My data set contained few disfluencies. Most of those I did find were repeats, or repairs that bordered on repeats. My strategy was therefore to link each disfluency to the phrase it repeated. For example, if a preposition and the following determiner were repeated in an utterance, they would be linked this way:

 (PP
    (PP (IN in)
       (NP (DT this)))
    (PP-LOC (IN in)
       (NP (DT this) (NN shop))))

Notice that the -LOC function tag was added only to the full PP, even though the repeated PP begins with the same words. Because only the full PP carries the complete locative meaning, only the completed PP receives the tag. The repeated phrase and the full phrase are placed under the same phrasal node to link them together and to show that they are related at the discourse level. Here is the example above in a full sentence, portrayed as a tree:

One improvement I did not use while parsing would be a dedicated label for the disfluency. Marking the disfluent phrase as either -REPEAT or -REPAIR would make these phrases easy to search for later and would clearly record the type of disfluency.

This tag would be added on the repeat or repair phrase like so:

 (PP
    (PP-REPEAT (IN in)
       (NP (DT this)))
    (PP-LOC (IN in)
       (NP (DT this) (NN shop))))
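The payoff of a -REPEAT or -REPAIR tag is that disfluencies become searchable. A minimal Python sketch, assuming each tree is stored as one bracketed string in the format above; `parse` and `find_tagged` are hypothetical helpers, not part of any existing tool, and they ignore PTB file extras such as an unlabeled outer bracket or null elements.

```python
import re
from typing import List, Tuple, Union

# Hypothetical representation: (label, children) or (tag, word).
Tree = Tuple[str, Union[str, List["Tree"]]]

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text: str) -> Tree:
    """Parse one bracketed tree; raise ValueError if brackets don't balance."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] == ")":  # empty preterminal such as (NN )
            pos += 1
            return (label, "")
        if tokens[pos] != "(":  # preterminal: (TAG word)
            if tokens[pos + 1] != ")":
                raise ValueError(f"expected ')' after word at token {pos}")
            word = tokens[pos]
            pos += 2  # skip the word and its ')'
            return (label, word)
        children: List[Tree] = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1  # consume this phrase's ')'
        return (label, children)

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets: missing ')'") from None
    if pos != len(tokens):
        raise ValueError("unbalanced or trailing brackets")
    return tree


def find_tagged(tree: Tree, tag: str) -> List[Tree]:
    """Return every subtree whose label carries the function tag, e.g. 'REPEAT'."""
    label, body = tree
    hits = [tree] if tag in label.split("-")[1:] else []
    if isinstance(body, list):
        for child in body:
            hits.extend(find_tagged(child, tag))
    return hits


if __name__ == "__main__":
    tree = parse("(PP (PP-REPEAT (IN in) (NP (DT this))) "
                 "(PP-LOC (IN in) (NP (DT this) (NN shop))))")
    for hit in find_tagged(tree, "REPEAT"):
        print(hit[0])  # PP-REPEAT
```

Because the parser rejects unbalanced input, it also doubles as a quick bracket check before a tree is committed. For full corpus files, NLTK's `Tree.fromstring` is a sturdier reader.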

Fragments and RRCs

Based on my data set, Singlish seems to have an abundance of dropped verbs. Because sentences in the PTB bracketing are built around an NP-VP structure, clauses or utterances without a VP are treated as fragments or RRCs. The PTB guidelines define fragments as follows: “FRAG marks those portions of text that appear to be clauses, but lack too many essential elements for the exact structure to be easily determined (e.g., answers to questions). Predicate argument structure therefore cannot be extracted from FRAGs.”

In Singlish, many predicates are verbless, and a sentence may have several plausible interpretations, which makes it undesirable to insert NULL elements. I have therefore extended the definition of fragments to include predicate phrases that lack verbs or functional elements in a way that leaves the structure hard to determine. The entire clause, including this fragment predicate phrase, is also labeled as a fragment.

A generalized example might look like this, where the predicate is nothing but an NP and a PP:

 (FRAG
    (NP-SBJ (NN ))
    (FRAG
       (NP (NN ))
       (PP-CLR (IN ) (NP (NN )))))
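The extended FRAG rule can be sketched as a small Python function. This is a minimal sketch, assuming the predicate is passed as the list of constituents that would otherwise be the VP's children; `bracket_clause`, `is_verb`, and the verb-tag prefix list are hypothetical choices for illustration. Whether a verb has really been dropped remains the annotator's judgment.

```python
from typing import List, Tuple, Union

# Hypothetical representation: (label, children) or (tag, word).
Tree = Tuple[str, Union[str, List["Tree"]]]

# Assumption: tag prefixes that count as verbs. VB* and MD are PTB tags;
# VV* covers tags like the VVD used in the particle template.
VERB_TAG_PREFIXES = ("VB", "VV", "MD")


def to_bracket(node: Tree) -> str:
    """Render a tree on one line in PTB-style bracket notation."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(to_bracket(child) for child in body) + ")"


def is_verb(node: Tree) -> bool:
    """True if the node is a preterminal carrying a verb tag."""
    label, body = node
    return isinstance(body, str) and label.startswith(VERB_TAG_PREFIXES)


def bracket_clause(subject: Tree, predicate: List[Tree]) -> Tree:
    """Bracket a subject plus the top-level constituents of its predicate.

    With a verb present, build the usual S -> NP-SBJ VP. Without one,
    label the predicate FRAG and the whole clause FRAG as well,
    inserting no NULL elements.
    """
    if any(is_verb(node) for node in predicate):
        return ("S", [subject, ("VP", predicate)])
    return ("FRAG", [subject, ("FRAG", predicate)])


if __name__ == "__main__":
    subj = ("NP-SBJ", [("NN", "")])
    pred = [("NP", [("NN", "")]), ("PP-CLR", [("IN", ""), ("NP", [("NN", "")])])]
    print(to_bracket(bracket_clause(subj, pred)))
    # (FRAG (NP-SBJ (NN )) (FRAG (NP (NN )) (PP-CLR (IN ) (NP (NN )))))
```

With empty words, the verbless case reproduces the generalized template exactly.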

Similarly, the PTB guidelines state about RRCs that “The label RRC is used only if the “reduced relative” is not a VP, but rather some other postmodifier such as NP, PP, ADJP, or ADVP that itself has “sentential” modifiers.” I took this to mean that RRCs are fragment-like phrases attached to NPs. I have chosen to mark clauses as RRCs when they are non-verbal phrases that function as relative clauses. Here are examples of utterances that include FRAGs or RRCs:

[Example trees: Utterance with FRAG; Utterance with RRC]
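The RRC decision described above can be sketched in Python as well. This is a minimal sketch, assuming the annotator has already judged the postmodifier to be a relative clause and passes its constituents as a list; `attach_relative`, `is_verbal`, and the verb-tag prefixes are hypothetical. Verbal reduced relatives follow the ordinary PTB attachment directly under the NP, while verbless ones are grouped under RRC.

```python
from typing import List, Tuple, Union

# Hypothetical representation: (label, children) or (tag, word).
Tree = Tuple[str, Union[str, List["Tree"]]]

# Assumption: tag prefixes that count as verbs (PTB VB*/MD, plus VV*).
VERB_TAG_PREFIXES = ("VB", "VV", "MD")


def to_bracket(node: Tree) -> str:
    """Render a tree on one line in PTB-style bracket notation."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(to_bracket(child) for child in body) + ")"


def is_verbal(node: Tree) -> bool:
    """True for a VP or a preterminal carrying a verb tag."""
    label, body = node
    if isinstance(body, str):
        return label.startswith(VERB_TAG_PREFIXES)
    return label.split("-")[0] == "VP"


def attach_relative(head: Tree, modifier: List[Tree]) -> Tree:
    """Attach a relative-clause postmodifier to a head NP.

    Verbal modifier: attached directly, (NP (NP ...) (VP ...)).
    Verbless modifier (NP, PP, ADJP, ADVP plus any sentential
    modifiers): grouped under RRC, (NP (NP ...) (RRC ...)).
    """
    if any(is_verbal(node) for node in modifier):
        return ("NP", [head] + modifier)
    return ("NP", [head, ("RRC", modifier)])


if __name__ == "__main__":
    head = ("NP", [("NN", "")])
    pp = ("PP", [("IN", ""), ("NP", [("NN", "")])])
    print(to_bracket(attach_relative(head, [pp])))
    # (NP (NP (NN )) (RRC (PP (IN ) (NP (NN )))))
```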

Difficult-to-parse Sentence Structures

If the meaning or syntax of a sentence is difficult to work out, first research Singlish syntax to see whether the difficulty comes from a construction unique to Singlish. Next, try parsing the sentence in different ways to find the most likely and plausible meaning that fits the context. If this fails, combine the phrases that you can and label the entire utterance as a FRAG.

The example below shows one such case where the meaning of the utterance is unclear and is thus placed under a FRAG phrase. A possible interpretation of this utterance is that the “know” was in fact a transcription error and was meant to be “no”, as in “only one serving, no?”. However, without further information, it is impossible to annotate this utterance further, and thus it was left as a fragment.

singlish_constituent_parsing.txt · Last modified: 2018/09/11 10:02 (external edit)