The PTB tagging guidelines define 36 part-of-speech tags. For ease of use and compatibility, these updated guidelines keep to that tagset, but a few cases in the corpus require new conventions. Those cases are laid out below.
The preceding section on word segmentation introduced several words that occur in the corpus and are treated as standalone tokens, separate from the multi-word sequences from which they derive. These words (gonna, wanna, gotta, kinda, and Imma) must be tagged according to their role in the sentence. For instance, as discussed earlier, Imma contains both a subject and an auxiliary (and some would argue that subject, auxiliary, present participle, and infinitival ‘to’ are all fully present in this one form).
The PTB part-of-speech tagging guidelines distinguish only between inflected and uninflected verbs, not between auxiliaries and main verbs. Auxiliaries therefore receive the tag corresponding to their inflection (here, whether or not they carry third person singular -s).
    I   mma  switch  places  with  you
    PP  VVP  VV      NNS     IN    PP

    We  wanna  know
    PP  VVP    VV

    I   gotta  go
    PP  VVP    VV
The last example, gotta, alternates in this corpus with have gotta. When such strings are encountered, the first verb in the sequence should be assumed to carry the inflection, and all following verbs should be tagged VV, VVG, or VVN, as in more standard sequences of auxiliaries (e.g., will have + main verb).
    You  have  gotta  email  his  name
    PP   VHP   VV     VV     PP$  NN
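The rule that only the first verb in a sequence carries inflection can be checked mechanically. The following is a minimal sketch in Python and is not part of the guidelines themselves. The function name and both tag sets are assumptions, extrapolated from the verb tags that appear in the examples here (VVP, VHP, VBP, VV, VVG, VVN); a "sequence" is taken to mean a run of directly adjacent verb tags.

```python
# Assumed tag inventories, extrapolated from the tags used in the examples.
FINITE_TAGS = {"VVP", "VVZ", "VVD", "VBP", "VBZ", "VBD", "VHP", "VHZ", "VHD"}
NONFINITE_TAGS = {"VV", "VVG", "VVN", "VB", "VBG", "VBN", "VH", "VHG", "VHN"}
VERB_TAGS = FINITE_TAGS | NONFINITE_TAGS


def inflection_violations(tags):
    """Return indices of inflected verb tags that are not first in their verb run."""
    violations = []
    in_run = False  # True while the previous tag was also a verb tag
    for i, tag in enumerate(tags):
        if tag in VERB_TAGS:
            if in_run and tag in FINITE_TAGS:
                violations.append(i)
            in_run = True
        else:
            in_run = False
    return violations


# "You have gotta email his name": only "have" is inflected, so nothing is flagged.
print(inflection_violations("PP VHP VV VV PP$ NN".split()))
```

Because the check looks only at adjacent verb tags, a sequence interrupted by an adverb would be treated as two separate runs; a fuller implementation would need to decide how to handle such cases.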
Contracted forms such as kinda are simpler to analyze. Rather than the NN IN sequence (kind of) from which it derives, kinda functions in the sentence as a pure adverbial modifier and is tagged RB.
    Y’all  telling  stories  that     kinda  now  put  together  the  pieces
    PP     VVG      NNS      IN/that  RB     RB   VVP  RB        DT   NNS
The most complicated of the five novel forms laid out above is gonna. In the LCDC, it sometimes appears with a preceding auxiliary (suggesting that it is not inflected), and sometimes the auxiliary is dropped. The auxiliary is dropped only in one context: when the auxiliary in Standard English would be a present form of be other than the first person singular. These cases should therefore be treated as dropping of the preceding be auxiliary, not as gonna sometimes carrying inflection and sometimes being bare. The earlier rule (only the first verb in a sequence of verbs carries inflection; all others are tagged as uninflected, or as VVG or VVN) still holds for gonna, even though it is hard to view gonna as a traditional infinitival form. The word gone, when used in a context where gonna could replace it, should be treated in the same manner.
    I   m    gonna  take  you  to  see  this  show
    PP  VBP  VV     VV    PP   TO  VV   DT    NN

    We  gonna  give  you  our  personal  email  addresses
    PP  VV     VV    PP   PP$  JJ        NN     NNS

    Where  you  gone  meet
    WRB    PP   VV    VV
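The tagging decisions for the five novel forms can be summarized as a small lookup. The sketch below is only an illustration read off the examples in this section, not an official mapping: the function name and the `follows_verb` parameter are hypothetical, and it assumes Imma has already been segmented into I and mma. The word gone is deliberately left out, since deciding whether gonna could replace it requires context that a lookup cannot see.

```python
def novel_form_tag(token, follows_verb=False):
    """Suggest a tag for a novel form; return None for any other token.

    follows_verb: True when the token comes after another verb in the same
    verb sequence (e.g. "have gotta"), so it cannot carry the inflection.
    """
    form = token.lower()
    if form == "kinda":
        return "RB"  # pure adverbial modifier
    if form == "gonna":
        return "VV"  # never inflected; a preceding "be" may have been dropped
    if form in {"mma", "wanna", "gotta"}:
        # Only the first verb in a sequence carries the inflection.
        return "VV" if follows_verb else "VVP"
    return None


print(novel_form_tag("gotta"))                     # as in "I gotta go"
print(novel_form_tag("gotta", follows_verb=True))  # as in "You have gotta email"
```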
Auxiliary dropping in certain contexts, a feature common in African American English, was introduced in the previous subsection; the following section on constituency annotation discusses it in far more depth. Two distinct phenomena in African American English result in bare-seeming verbs: (1) variable auxiliary dropping, and (2) -s dropping in third person singular present verbs.
Third person singular present forms without -s should be tagged VVP, for consistency with the other forms in the present tense paradigm. The VVZ tag is used only when the standard third person singular present -s marking is actually present, which was most likely the original reason for distinguishing it from VVP. Such forms are rare in the transcription analyzed, although they are common in general corpora of African American English.
    what  he  look  like
    WP    PP  VVP   IN
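Put differently, the choice between VVP and VVZ depends on the surface form of the verb, not on the person of its subject. A minimal sketch of that decision follows, assuming the lemma of the verb is already known. The function name is hypothetical, and the spelling rules cover only regular lexical verbs, not be or have.

```python
def present_tense_tag(surface, lemma):
    """Return VVZ only when the -s marking is actually written, else VVP."""
    surface, lemma = surface.lower(), lemma.lower()
    # Candidate spellings of the third person singular -s form (regular verbs only).
    s_forms = {lemma + "s", lemma + "es"}
    if lemma.endswith("y"):
        s_forms.add(lemma[:-1] + "ies")
    return "VVZ" if surface in s_forms else "VVP"


print(present_tense_tag("look", "look"))   # "what he look like"
print(present_tense_tag("looks", "look"))  # standard -s marking present
```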
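When an auxiliary that would appear in Standard English is absent from the text, the following verb should be tagged as a bare verb, that is, according to its own form (e.g., VVG for a present participle) rather than with an inflected tag standing in for the missing auxiliary.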
    y’all  telling  stories
    PP     VVG      NNS

    Wait  ,  what  ?
    VVP   ,  WP    SENT

    When  you  going  on  your  date  ,  boo  boo  ?
    WRB   PP   VVG    IN  PP$   NN    ,  NN   NN   SENT
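For review purposes, it can help to list the places where an auxiliary has probably been dropped. The sketch below uses a rough heuristic that is an assumption of this example rather than a rule from these guidelines: it flags a VVG verb, or the token gonna, that directly follows a personal pronoun (PP). The function name is hypothetical, and both false positives and misses are to be expected.

```python
def dropped_aux_candidates(tagged):
    """Return indices of tokens that may follow a dropped auxiliary.

    tagged: list of (token, tag) pairs for one sentence.
    Heuristic (assumed, not from the guidelines): a VVG verb or "gonna"
    immediately preceded by a personal pronoun tagged PP.
    """
    hits = []
    for i, (token, tag) in enumerate(tagged):
        if i == 0:
            continue
        looks_bare = tag == "VVG" or token.lower() == "gonna"
        if looks_bare and tagged[i - 1][1] == "PP":
            hits.append(i)
    return hits


sentence = [("When", "WRB"), ("you", "PP"), ("going", "VVG"), ("on", "IN"),
            ("your", "PP$"), ("date", "NN"), ("?", "SENT")]
print(dropped_aux_candidates(sentence))
```

The function only reports candidates; the tags themselves stay as annotated, since the verb keeps the tag of its own form whether or not the auxiliary is present.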