Constituent parsing

Any annotation schema for data from speakers of African American English must be able to take into account morphosyntactic variation between African American English and General American English forms. Common features of African American English include the variably null auxiliary (as mentioned in a previous section); the null copula (which did not appear in the training corpus, although it is highly likely to appear in a larger corpus of African American English); aspect markers such as been, be, and done, which have different properties from the homophonous forms found in General American English; and structural differences such as yes-no questions without a raised verb.

African American English phenomena in the LCDC

Variable null auxiliary

An important property of the PTB constituent annotation coding is that it allows null elements to occupy nodes of a tree. This is beneficial for the type of work that variationists commonly do on corpora of sociolinguistic data, where it can be just as important to preserve the context where a variant doesn't appear as the context where it does.

The PTB annotation guidelines specify the use of 0 to represent where a complementizer such as that could have appeared but does not. Of the available PTB null elements, this alternation between an overt item and a null item is the most similar to the cases encountered in the LCDC, so the guidelines that follow extend the use of 0 to more contexts than the one to which it was originally applied.

The following is an example of an utterance containing a null auxiliary from the LCDC: Where you gone meet?. The annotation mimics the PTB guidelines for the formation of a wh- question, which specify that the wh- word appears in the SBARQ node, while the subordinate SQ node contains the fronted auxiliary followed by the subject and predicate. The wh- word is assumed to move out of the predicate but the auxiliary is not, so a trace is left behind in the predicate for the wh- word but not for the auxiliary. The only difference here is the lack of an overt auxiliary in the original utterance: the place in the tree where an auxiliary would be is occupied by a 0 with part of speech NONE, which keeps the structure parallel to that of wh- questions in General American English.

Ex:
  • Where you gone meet?
(ROOT
    (SBARQ (WHADVP-LOC-1 (WRB Where) )
        (SQ (NONE 0)
            (NP-SBJ (PRP you) )
            (VP (VV gone)
                (VP (VV meet)
                    (ADVP-LOC (NONE *T*-1) )))))
    (SENT ?))

Where the PTB guidelines place the auxiliary inside the VP, the null auxiliary should also be posited inside the VP, in the same position an overt auxiliary would occupy.

Ex:
  • y'all telling, y'all telling stories
(S
    (FRAG
        (NP-SBJ-1 (PRP y'all) )
        (VP (NONE 0)
            (VP (VVG telling) )))
    (, ,)
    (NP-SBJ (PRP y'all) )
    (VP (NONE 0)
        (VP (VVG telling)
            (NP (NNS stories)))))
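
Because the null auxiliary is always written as a 0 with part of speech NONE, the contexts annotated this way can be retrieved automatically. The following is a minimal Python sketch, not part of any LCDC tooling: the helper names parse_tree and null_zero_parents are hypothetical, and the sketch assumes each tree is a well-formed bracketed string.

```python
import re

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes balanced brackets (hypothetical helper).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read_node()

def null_zero_parents(tree):
    """Return the label of every node that directly dominates (NONE 0)."""
    label, children = tree
    found = []
    for child in children:
        if isinstance(child, tuple):
            if child == ("NONE", ["0"]):
                found.append(label)
            else:
                found.extend(null_zero_parents(child))
    return found

# The "y'all telling, y'all telling stories" tree from the example above.
YALL_TELLING = """
(S
    (FRAG
        (NP-SBJ-1 (PRP y'all) )
        (VP (NONE 0)
            (VP (VVG telling) )))
    (, ,)
    (NP-SBJ (PRP y'all) )
    (VP (NONE 0)
        (VP (VVG telling)
            (NP (NNS stories)))))
"""

if __name__ == "__main__":
    print(null_zero_parents(parse_tree(YALL_TELLING)))
```

The parent label distinguishes the two placements described in this section: SQ for a null auxiliary in fronted position and VP for one inside the predicate.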

Questions without a raised verb

This structure occurs only once in the training data, but it is a legitimate structural variant in African American English. General American English is usually seen to permit the verb to remain in place in yes/no questions as long as the intonation of the utterance aligns with English question-formation intonation. African American English, in contrast, also permits the verb to remain in place in the predicate in wh- questions, so that only the wh- word fronts.

To follow the PTB guidelines, a sentence of this kind should still have an SBARQ top node containing the wh- word. The auxiliary position in the SQ level, which according to the PTB guidelines must appear directly underneath SBARQ, is empty in this data, and the verb still appears inside the VP.

Ex:
  • what the deal is?
(SBARQ (WHNP-1 (WP what) )
    (SQ (NONE 0)
        (NP-SBJ (DT the)  (NN deal) )
        (VP (VBZ is) 
            (NP (NONE *T*-1)))))
(SENT ?) 

A trace appears between where the wh- word was generated in the predicate and its position in SBARQ, but no trace appears between the null element in SQ and the verb in the predicate that has failed to raise. This is mainly due to (1) the fact that the verb itself would not be seen as a constituent under the PTB guidelines, thus A'-movement (which uses *T*) cannot be assumed to have taken place, and (2) the position of the trace would be higher in the tree than its overt element, which is questionable.

In any case, it is crucial to any study of morphosyntactic variation in the LCDC to preserve this context where the verb could have raised but didn't, and thus the null marker 0 is used for convenience.
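
Since each *T* trace carries the index of the fronted constituent it belongs to, the coindexation can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the function name check_traces is hypothetical, indices are taken to be the trailing digits of a node label (as in WHNP-1), and the second tree is an invented example of a mismatch, not LCDC data.

```python
import re

def check_traces(tree_text):
    """Compare *T* trace indices against indexed node labels.

    Returns (traces_without_antecedent, indices_without_trace),
    each as a sorted list of index strings.
    """
    # Node labels are the tokens immediately after an opening bracket.
    labels = re.findall(r"\(\s*([^\s()]+)", tree_text)
    antecedents = set()
    for label in labels:
        match = re.search(r"-(\d+)$", label)
        if match:
            antecedents.add(match.group(1))
    traces = set(re.findall(r"\*T\*-(\d+)", tree_text))
    return sorted(traces - antecedents), sorted(antecedents - traces)

# The "what the deal is?" tree from the example above.
WHAT_THE_DEAL = """
(SBARQ (WHNP-1 (WP what) )
    (SQ (NONE 0)
        (NP-SBJ (DT the)  (NN deal) )
        (VP (VBZ is)
            (NP (NONE *T*-1)))))
"""

# Invented tree with a deliberately mismatched index (not from the corpus).
MISMATCHED = "(SBARQ (WHNP-1 (WP what)) (SQ (NP-SBJ (PRP you)) (VP (VV see) (NP (NONE *T*-2)))))"

if __name__ == "__main__":
    print(check_traces(WHAT_THE_DEAL))
    print(check_traces(MISMATCHED))
```

An empty pair of lists means every trace has an antecedent and every index is used; the null element 0 is ignored by design, because no trace is posited for the unraised verb.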

General conversational phenomena

Clausal discourse particles

The PTB guidelines indicate that discourse particles tagged with part of speech UH can appear in flat relation with the subject and the predicate of a sentence. Some discourse particles in the LCDC, though, are full clauses unto themselves. The structure of a clausal discourse particle should be easy to decompose into its parts. Two examples of this follow. The first example contains the clausal discourse element you know what, and the second contains the discourse particle wait, which also functions as an imperative (and thus has a null element * in the subject position).

Ex:
  • You know what, I forgot about Hains Point.
(ROOT
    (S
        (NP-SBJ (PRP You) )
        (VP (VVP know)
            (WHNP (WP what) )))
    (, ,)
    (NP-SBJ (PRP I) )
    (VP (VVD forgot)
        (PP (IN about)
            (NP (NNP Hains) (NNP Point) )))
    (SENT .))
    
  • Wait what?
(ROOT
    (FRAG
        (S
            (NP-SBJ (NONE *) )
            (VP (VV Wait) ))
        (WHNP (WP what) ))
        (SENT ?))
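
Null elements such as the * subject above exist only in the annotation, so dropping them should recover the original utterance. The following is a minimal Python sketch of that check; the function name surface_tokens is hypothetical, and it assumes every null element carries the part of speech NONE.

```python
import re

def surface_tokens(tree_text):
    """Return the overt words of a bracketed tree, skipping null elements."""
    # A preterminal is a bracket holding exactly one tag and one word.
    preterminals = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\s*\)", tree_text)
    return [word for tag, word in preterminals if tag != "NONE"]

# The "Wait what?" tree from the example above.
WAIT_WHAT = """
(ROOT
    (FRAG
        (S
            (NP-SBJ (NONE *) )
            (VP (VV Wait) ))
        (WHNP (WP what) ))
        (SENT ?))
"""

if __name__ == "__main__":
    print(" ".join(surface_tokens(WAIT_WHAT)))
```

Comparing this output against the transcript line is a quick way to catch a tree in which a word was dropped or a null element was mistagged.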

Deletion of stranded preposition with wh- questions

Conversational data sometimes contains utterances that are hard to explain; one such utterance follows. Here, it appears that the stranded preposition in a wh- question deletes utterance-finally. This is creatively treated as a case where the wh- pronoun has taken on an adverbial function, and thus only leaves an adverbial trace in the predicate.

Ex:
  • What she do that and then point to me?
 
(ROOT
    (SBARQ (WHADVP-1 (WP What) )
        (SQ (NONE 0)
            (NP-SBJ (PRP she) )
            (VP
                (VP (VV do)
                    (NP (DT that) ))
                (CC and)
                (ADVP (RB then) )
                (VP (VV point)
                    (PP-DIR
                        (TO to)
                        (NP (PRP me) )))
                (ADVP (NONE *T*-1)))))
    (SENT ?) )

This example serves as a guideline in case a similar utterance is encountered again. It is also a reminder that annotating conversational data requires both in-depth knowledge of the tagset in use and the ability to apply that tagset to newly discovered contexts in creative and logical ways.

constituent_parsing.txt · Last modified: 2018/09/11 10:02 (external edit)