Singlish Part-of-Speech Tagging

Most parts of speech are the same as standard English. Differences that cannot be tagged using PTB guidelines are detailed here.

Interjections



Ya but anyway Bandung has got a lot of T-shirts and shoes and jeans

In spoken Singlish, interjections differ slightly from those of standard English. The most frequent example is ya; others include aye and ah. These words carry little to no semantic meaning and tend to occur in utterance-initial position, as in the example above. They are still tagged “UH”, just as in standard English.

Note: If the interjection is utterance-final or unfamiliar, first check that the word is not in fact a phrasal particle.

Phrasal Particles

That means it's on higher up lah

Particles often occur in utterance-final position in Singlish. They come from the Chinese particle system and, depending on the particle, may serve a tonal or grammatical function; some serve only a pragmatic function. They tend to modify the phrase they are attached to rather than a single word. These phrasal particles, which are unique to Singlish, are each treated as a single token and tagged PAR.

Common or possible particles in Singlish include lah, lar, lor, ma, hor, and lei. This list is not exhaustive and was compiled from a variety of online sources.
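The check for utterance-final particles could be sketched as a small post-processing pass over an already tagged utterance. This is only an illustration: the function name `retag_particles`, the representation of an utterance as (word, tag) pairs, and the tags on the surrounding words are assumptions made for the example, not part of the annotation workflow described here. The particle set is the non-exhaustive list given above.

```python
# Non-exhaustive particle list taken from the guideline text.
PARTICLES = {"lah", "lar", "lor", "ma", "hor", "lei"}


def retag_particles(tagged):
    """Return a copy of one tagged utterance with a final particle tagged PAR.

    `tagged` is a list of (word, tag) pairs for a single utterance.
    Only the last token is checked, because the guideline describes
    these particles as utterance-final.
    """
    if not tagged:
        return []
    result = list(tagged)  # copy, so the input list is left unchanged
    word, _tag = result[-1]
    if word.lower() in PARTICLES:
        result[-1] = (word, "PAR")
    return result


# Illustrative input: the tags on the context words are placeholders,
# and "lah" is assumed to have been mis-tagged as an interjection (UH).
utterance = [("That", "DT"), ("means", "VVZ"), ("it", "PP"), ("'s", "VBZ"),
             ("on", "IN"), ("higher", "JJR"), ("up", "RP"), ("lah", "UH")]

print(retag_particles(utterance)[-1])  # ('lah', 'PAR')
```

Because the list is not exhaustive, a pass like this can only confirm known particles; unfamiliar utterance-final words still need to be checked by hand.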

Plural Nouns

They emphasize on these three thing

Sometimes plural nouns are not marked. This may be due to influence from Chinese, in which plural nouns are not morphologically distinguished from singular nouns. These unmarked plural nouns are tagged NNSX so that they can be searched for as plural nouns while remaining distinct from regular plural nouns (NNS). This allows them to be grouped with all nouns when necessary and searched for separately when needed. I made this decision because, judging from the context of these occurrences, the nouns are meant to be understood as plural despite not being morphologically marked in the standard way.
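The search behaviour that motivates NNSX can be illustrated with a short sketch. The helper `find_plurals`, its `mode` argument, and the (word, tag) representation are hypothetical; the tags on the context words are placeholders, and the token list simply combines words from the two example utterances on this page.

```python
def find_plurals(tagged, mode="all"):
    """Select plural nouns from a list of (word, tag) pairs.

    mode="all"      -> regular and unmarked plurals (NNS and NNSX)
    mode="marked"   -> regular plurals only (NNS)
    mode="unmarked" -> unmarked plurals only (NNSX)
    """
    tagsets = {
        "all": {"NNS", "NNSX"},
        "marked": {"NNS"},
        "unmarked": {"NNSX"},
    }
    wanted = tagsets[mode]
    return [word for word, tag in tagged if tag in wanted]


# Illustrative tokens; only the noun tags matter for this example.
tokens = [("these", "DT"), ("three", "CD"), ("thing", "NNSX"),
          ("shoes", "NNS"), ("and", "CC"), ("jeans", "NNS")]

print(find_plurals(tokens))              # ['thing', 'shoes', 'jeans']
print(find_plurals(tokens, "marked"))    # ['shoes', 'jeans']
print(find_plurals(tokens, "unmarked"))  # ['thing']
```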

One problem that could arise concerns the determiners used with these nouns. In the current data, Singlish treats them as either singular or plural nouns, so an unmarked plural noun may be paired with “these three” or with “a”, which could lead to problems with searching or data analysis.

One possible analysis is that when a quantity is already specified (e.g., “these three” or “a pair of”), the noun is not marked for plurality, similar to nouns used with measure words in Chinese grammar, whereas without such a quantity expression the plural must be marked. This hypothesis has not been tested, however, and should be researched further before it is used to explain the phenomenon.
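One way such a test might be set up is to count, for every plural noun, whether a quantity expression precedes it. The sketch below is an assumption-laden illustration, not a tested method: the names `has_quantity_cue` and `tabulate`, the three-token window, the use of the CD tag as a cue, and the one-word `QUANTITY_WORDS` list are all hypothetical choices. The two toy utterances are fragments of the examples on this page and are far too little data to say anything about the hypothesis.

```python
from collections import Counter

# Hypothetical cue words for expressions such as "a pair of".
QUANTITY_WORDS = {"pair"}


def has_quantity_cue(tagged, i, window=3):
    """True if a cardinal number (CD) or a listed quantity word occurs
    within `window` tokens before position i."""
    start = max(0, i - window)
    return any(tag == "CD" or word.lower() in QUANTITY_WORDS
               for word, tag in tagged[start:i])


def tabulate(utterances):
    """Count plural nouns by (tag, preceded by a quantity cue)."""
    counts = Counter()
    for tagged in utterances:
        for i, (_word, tag) in enumerate(tagged):
            if tag in ("NNS", "NNSX"):
                counts[(tag, has_quantity_cue(tagged, i))] += 1
    return counts


# Toy input only; context tags are placeholders.
toy = [
    [("these", "DT"), ("three", "CD"), ("thing", "NNSX")],
    [("a", "DT"), ("lot", "NN"), ("of", "IN"), ("T-shirts", "NNS")],
]

counts = tabulate(toy)
print(counts[("NNSX", True)])   # 1
print(counts[("NNS", False)])   # 1
```

If the hypothesis held on a larger corpus, NNSX would be expected to cluster in the "cue present" cells and NNS in the "cue absent" cells; counts in the other cells would be evidence against it.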

Tenseless Verbs

Last time we didn't stay but when we reach there we felt that we should have stayed because the drive is four hours

There was only one case of a verb lacking tense in this text. However, this phenomenon is documented in analyses of Singlish and may occur more frequently in other texts. I have chosen to tag these verbs as they appear morphologically (VVP) so that they can be differentiated from the standard form.

This decision is the opposite of my decision for unmarked plural nouns. I made it because, given my rather small data set, I was unsure whether these verbs are still meant to keep their past-tense meaning. It is possible that they are not meant to carry tense at all, mirroring the ability of Chinese to omit tense marking in utterances.

Based on further research, another option for tagging this phenomenon could be to tag these verbs VVDX (as with NNSX for unmarked plural nouns), which would differentiate them from past-tense verbs while still allowing the two to be grouped together.
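The practical difference between the two options shows up when searching for past-tense verbs. The sketch below is illustrative only: the helper `past_tense_verbs` and the (word, tag) representation are hypothetical, the context tags are placeholders, and the tokens are a fragment of the example utterance above.

```python
def past_tense_verbs(tagged):
    """Words whose tag starts with VVD, i.e. VVD and the proposed VVDX."""
    return [word for word, tag in tagged if tag.startswith("VVD")]


# Current scheme: the tenseless verb is tagged as it appears (VVP).
current = [("we", "PP"), ("reach", "VVP"), ("there", "RB"),
           ("we", "PP"), ("felt", "VVD")]

# Alternative scheme: the same verb tagged VVDX instead.
alternative = [(word, "VVDX" if word == "reach" else tag)
               for word, tag in current]

print(past_tense_verbs(current))      # ['felt']
print(past_tense_verbs(alternative))  # ['reach', 'felt']
```

Under the current scheme the tenseless verb is indistinguishable from any other VVP verb in a search, whereas VVDX keeps it retrievable both on its own and together with regular past-tense verbs.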

singlish_parts_of_speech_tagging.txt · Last modified: 2021/02/11 16:44 (external edit)