
Syntactic Dependency Annotation

Dependency annotation generally follows the Stanford Typed Dependencies, using the Basic, non-collapsed dependency inventory (prep → pobj, no crossing edges), as described in de Marneffe & Manning (2013). This corpus also adopts one frequent later addition, the 'vocative' label; vocatives are attached to the root of the sentence or utterance.

Instructions for some special cases follow below the list of labels.

List of dependency function labels used in GUM

  1. acomp
  2. advcl
  3. advmod
  4. amod
  5. appos
  6. aux
  7. auxpass
  8. cc
  9. ccomp
  10. conj
  11. cop
  12. csubj
  13. dep
  14. det
  15. discourse
  16. dobj
  17. expl
  18. iobj
  19. mark
  20. mwe
  21. neg
  22. nn
  23. npadvmod
  24. nsubj
  25. nsubjpass
  26. num
  27. parataxis
  28. pcomp
  29. pobj
  30. poss
  31. possessive
  32. preconj
  33. predet
  34. prep
  35. prt
  36. punct
  37. quantmod
  38. rcmod
  39. root
  40. tmod
  41. vmod
  42. vocative
  43. xcomp
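The label inventory above can be checked mechanically. The sketch below is a minimal illustration and not part of any GUM tooling. It assumes each token is a (token_id, form, head_id, label) tuple, a simplified stand-in for a CoNLL-style line, and the function name unknown_labels is hypothetical.

```python
# The 43 dependency function labels listed above.
GUM_LABELS = frozenset({
    "acomp", "advcl", "advmod", "amod", "appos", "aux", "auxpass", "cc",
    "ccomp", "conj", "cop", "csubj", "dep", "det", "discourse", "dobj",
    "expl", "iobj", "mark", "mwe", "neg", "nn", "npadvmod", "nsubj",
    "nsubjpass", "num", "parataxis", "pcomp", "pobj", "poss", "possessive",
    "preconj", "predet", "prep", "prt", "punct", "quantmod", "rcmod",
    "root", "tmod", "vmod", "vocative", "xcomp",
})


def unknown_labels(rows):
    """Return the sorted labels in `rows` that are not in GUM_LABELS.

    Each row is a (token_id, form, head_id, label) tuple (hypothetical format).
    """
    return sorted({label for _, _, _, label in rows if label not in GUM_LABELS})


# 'compound' is a Universal Dependencies label; this scheme uses 'nn' for compounds.
rows = [(1, "New", 2, "compound"), (2, "York", 0, "root")]
print(unknown_labels(rows))  # ['compound']
```

A check like this catches labels borrowed from other schemes before they reach the corpus.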

Handling copula verbs

The copula 'be' appears primarily in three constructions:

A is B

In the normal predicative construction, the nominal predicate 'B' is the root, and 'A' is the nsubj. The verb 'be' itself is a dependent of the predicate B and takes the label cop.

There is A

In the existential construction 'there is A', the verb 'be' is taken to mean 'exist', and is labeled as the root. The subject is A (nsubj), and the expletive 'there' is labeled expl.

A is in B

When the predicate is a prepositional phrase, the convention is to analyze 'be' as the root. The predicate's preposition is attached to it as prep, and the preposition in turn has a normal pobj. The motivation is to avoid having to decide whether 'be' is used existentially with a locative, or in some other sense closer to the 'A is B' construction.
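The three constructions can be compared side by side. In this sketch, the example sentences and the (token_id, form, head_id, label) row format are illustrative assumptions. A head_id of 0 marks the root, and root_form is a hypothetical helper.

```python
# Rows are (token_id, form, head_id, label); head_id 0 marks the root.
A_IS_B = [  # "Kim is a doctor": the nominal predicate is the root
    (1, "Kim", 4, "nsubj"),
    (2, "is", 4, "cop"),
    (3, "a", 4, "det"),
    (4, "doctor", 0, "root"),
]
THERE_IS_A = [  # "There is a problem": existential 'be' is the root
    (1, "There", 2, "expl"),
    (2, "is", 0, "root"),
    (3, "a", 4, "det"),
    (4, "problem", 2, "nsubj"),
]
A_IS_IN_B = [  # "Kim is in Berlin": 'be' is the root, the PP attaches to it
    (1, "Kim", 2, "nsubj"),
    (2, "is", 0, "root"),
    (3, "in", 2, "prep"),
    (4, "Berlin", 3, "pobj"),
]


def root_form(rows):
    """Return the form of the single token attached to the artificial root."""
    roots = [form for _, form, head, _ in rows if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


for name, rows in [("A is B", A_IS_B), ("There is A", THERE_IS_A), ("A is in B", A_IS_IN_B)]:
    print(f"{name}: root = {root_form(rows)}")
```

Only 'A is B' makes the predicate the root. In the other two constructions, 'be' itself heads the sentence.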


Dates

Dates with multiple coreferent parts are handled as appositions (appos). For example, “Sunday, the 13th” contains two mentions of the same day. By the 'rule of first dibs', the apposition goes from 'Sunday' to '13th'. Months with dates are treated as nn, so 'October 15' is analyzed as a type of '15' (note that October 15 is an instance of 'day', not 'month'). Years added to dates are seen as temporal modifiers of the day expression, and are labeled tmod.
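The date rules can be sketched using hypothetical (token_id, form, head_id, label) rows. Both expressions are shown as standalone fragments, so their heads attach to the root. Punctuation is left out because this section does not say where commas attach. The helper dependents is illustrative.

```python
# Rows are (token_id, form, head_id, label); punctuation omitted.
SUNDAY_THE_13TH = [  # "Sunday, the 13th": the first mention heads the apposition
    (1, "Sunday", 0, "root"),
    (2, "the", 3, "det"),
    (3, "13th", 1, "appos"),
]
OCTOBER_15_2013 = [  # "October 15, 2013": the day number is the head
    (1, "October", 2, "nn"),
    (2, "15", 0, "root"),
    (3, "2013", 2, "tmod"),
]


def dependents(rows, head_id):
    """Return (form, label) pairs for all tokens attached to head_id."""
    return [(form, label) for _, form, head, label in rows if head == head_id]


print(dependents(SUNDAY_THE_13TH, 1))  # [('13th', 'appos')]
print(dependents(OCTOBER_15_2013, 2))  # [('October', 'nn'), ('2013', 'tmod')]
```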

Image credits and quotation attribution

Image credits of the type 'Image: XYZ' are treated as a construction of their own, and are not analyzed as parataxis or nominal predication (root+nsubj). Instead, the convention is to use the dep label, pointing from the first part ('Image') to the head of the second part. This keeps these constructions out of the results when searching, e.g., for subjects or nominal sentences.

The same logic applies to quotation attribution without a speech verb. For example, in:

“To be or not to be” – Hamlet

The root is inside the quotation, and 'Hamlet' is attached to it as dep. This guideline does not apply if a speech verb is present. For example, 'said' is the root in:

“To be or not to be”, said Hamlet.
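The motivation for dep can be illustrated with a label count over a toy treebank. In this hypothetical sketch, 'Image: John Smith' is analyzed once with the dep convention and, for contrast, once with the rejected root+nsubj analysis. Only the rejected analysis inflates a search for subjects. Quotation attributions without a speech verb behave the same way. The rows and count_label are illustrative.

```python
# Rows are (token_id, form, head_id, label); punctuation omitted.
IMAGE_CREDIT = [  # "Image: John Smith" with the dep convention
    (1, "Image", 0, "root"),
    (2, "John", 3, "nn"),
    (3, "Smith", 1, "dep"),
]
IMAGE_CREDIT_AS_PREDICATION = [  # rejected alternative: root + nsubj
    (1, "Image", 3, "nsubj"),
    (2, "John", 3, "nn"),
    (3, "Smith", 0, "root"),
]
NOMINAL_SENTENCE = [  # "Kim is a doctor"
    (1, "Kim", 4, "nsubj"),
    (2, "is", 4, "cop"),
    (3, "a", 4, "det"),
    (4, "doctor", 0, "root"),
]


def count_label(treebank, label):
    """Count tokens carrying `label` across a list of analyzed sentences."""
    return sum(1 for rows in treebank for *_, lab in rows if lab == label)


print(count_label([IMAGE_CREDIT, NOMINAL_SENTENCE], "nsubj"))                 # 1
print(count_label([IMAGE_CREDIT_AS_PREDICATION, NOMINAL_SENTENCE], "nsubj"))  # 2
```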

Internal analysis of complex names

Although the proper noun tag is applied even to (capitalized) adjectives in complex names, syntactic analysis should still treat them as adjectives etc. The rationale is that the POS tag can help find names, while a function label such as amod allows us to identify the internal structure of the name in question.

For complex personal names, the last name is the head, and every other part of the name is attached to it as nn.
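Both points in this section can be sketched using hypothetical (token_id, form, head_id, label) rows. The helper attach_personal_name and the example names are illustrative. The second example adds a POS column to show how tag queries and label queries differ.

```python
def attach_personal_name(forms, start_id=1, head_id=0, label="root"):
    """Analyze a multi-token personal name: the last token heads, others are nn.

    start_id is the id of the first name token; head_id and label give the
    attachment of the whole name to the rest of the sentence.
    """
    last_id = start_id + len(forms) - 1
    rows = []
    for token_id, form in enumerate(forms, start=start_id):
        if token_id == last_id:
            rows.append((token_id, form, head_id, label))
        else:
            rows.append((token_id, form, last_id, "nn"))
    return rows


print(attach_personal_name(["John", "Fitzgerald", "Kennedy"]))
# [(1, 'John', 3, 'nn'), (2, 'Fitzgerald', 3, 'nn'), (3, 'Kennedy', 0, 'root')]

# Inside a complex name every token is tagged as a proper noun,
# but the adjective keeps its amod label. Rows: (id, form, pos, head, label).
NORTHERN_IRELAND = [
    (1, "Northern", "NNP", 2, "amod"),
    (2, "Ireland", "NNP", 0, "root"),
]
name_tokens = [form for _, form, pos, _, _ in NORTHERN_IRELAND if pos == "NNP"]
adjectives = [form for _, form, _, _, label in NORTHERN_IRELAND if label == "amod"]
print(name_tokens, adjectives)  # ['Northern', 'Ireland'] ['Northern']
```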

Make A B

The apparent 'double object' construction with 'make' and similar verbs is given a small clause analysis, in which the object of the verb 'make' is seen as the subject of an embedded predication. In other words, 'make A a B' is analyzed as making it so that A is a B. As a result, the analysis uses an xcomp label emanating from 'make' to the predicate B. This signifies that the accusative object of 'make' is the same as the subject of the clausal predicate, while the 'thing being made' (A) is internally labeled as the subject (nsubj) of the small clause predication. This can be seen in the image below:

Another way of thinking of this is that the analysis means: make(that woman is the president)
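The small clause reading can be recovered mechanically from the tree. This sketch uses a hypothetical analysis of 'They made that woman the president' in (token_id, form, head_id, label) rows. The helper small_clause_reading is illustrative.

```python
# 'president' is an xcomp of 'made', and 'woman' is its nsubj (not a dobj of 'made').
MADE = [
    (1, "They", 2, "nsubj"),
    (2, "made", 0, "root"),
    (3, "that", 4, "det"),
    (4, "woman", 6, "nsubj"),
    (5, "the", 6, "det"),
    (6, "president", 2, "xcomp"),
]


def small_clause_reading(rows, verb_id):
    """Render verb(subject is predicate) from the verb's xcomp and that xcomp's nsubj."""
    forms = {tid: form for tid, form, _, _ in rows}
    pred_id = next(tid for tid, _, head, lab in rows if head == verb_id and lab == "xcomp")
    subj_id = next(tid for tid, _, head, lab in rows if head == pred_id and lab == "nsubj")
    return f"{forms[verb_id]}({forms[subj_id]} is {forms[pred_id]})"


print(small_clause_reading(MADE, 2))  # made(woman is president)
```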

Let N V

Verbs such as 'let' in “let someone do something” or 'allow' in “allow A to do B” are analyzed as governing an xcomp clause. The noun following the verb acts as the subject of the subordinate small clause, not as the object of 'let', etc.

Call, name, etc.

Verbs like 'call' or 'name' appear to take a double accusative object, e.g. “John called [Mary] [a saint]”. This makes it hard to distinguish the name argument from the named theme argument. The guidelines instead favor a different analysis using xcomp. The idea is that the naming action creates a small clause with the named as subject and the name as predicate: John performs a naming act, whose content is: “Mary is a saint”.
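'Let' and 'call' share the same shape: the post-verbal noun is the nsubj of an xcomp of the verb, not one of its objects. The sketch below checks that shape on two hypothetical analyses. The name check_small_clause is illustrative and does not refer to an existing tool.

```python
# Rows are (token_id, form, head_id, label).
LET = [  # "Let him go"
    (1, "Let", 0, "root"),
    (2, "him", 3, "nsubj"),
    (3, "go", 1, "xcomp"),
]
CALLED = [  # "John called Mary a saint"
    (1, "John", 2, "nsubj"),
    (2, "called", 0, "root"),
    (3, "Mary", 5, "nsubj"),
    (4, "a", 5, "det"),
    (5, "saint", 2, "xcomp"),
]


def check_small_clause(rows, verb_id, noun_id):
    """True if the noun is the nsubj of an xcomp that depends on the verb.

    A noun attached directly to the verb (e.g. as dobj) fails the check.
    """
    by_id = {tid: (head, lab) for tid, _, head, lab in rows}
    noun_head, noun_label = by_id[noun_id]
    if noun_head == verb_id or noun_label != "nsubj":
        return False
    return by_id[noun_head] == (verb_id, "xcomp")


print(check_small_clause(LET, 1, 2), check_small_clause(CALLED, 2, 3))  # True True
```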

Complex phrases as 'words' (compound modifiers etc.)

In some cases, a whole phrase can be used in place of a single word, e.g. as a compound modifier. In these cases, the complex modifier should be analyzed internally, and its local root is attached to the token it modifies with the normal label.

In the example, 'what to buy' is an infinitive + object with an internal analysis, but it functions much like a compound modifier (cf. 'the shopping section'). For this reason, it is attached at its head (the verb) with the function nn.
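The original example is not reproduced here, so this sketch uses a hypothetical phrase, 'the what to buy section'. It shows that the nn dependent 'buy' heads the whole internally analyzed modifier. The helper subtree_ids and the aux label on 'to' are illustrative assumptions.

```python
# Rows are (token_id, form, head_id, label); 'to' as aux is an assumption.
PHRASE = [
    (1, "the", 5, "det"),
    (2, "what", 4, "dobj"),
    (3, "to", 4, "aux"),
    (4, "buy", 5, "nn"),
    (5, "section", 0, "root"),
]


def subtree_ids(rows, head_id):
    """Collect head_id and all of its descendants, in token order."""
    children = {}
    for tid, _, head, _ in rows:
        children.setdefault(head, []).append(tid)
    stack, found = [head_id], []
    while stack:
        tid = stack.pop()
        found.append(tid)
        stack.extend(children.get(tid, []))
    return sorted(found)


forms = {tid: form for tid, form, _, _ in PHRASE}
modifier = " ".join(forms[t] for t in subtree_ids(PHRASE, 4))
print(modifier)  # what to buy
```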

mark vs. advmod

In adverbial clauses, the subordinating conjunction is labeled 'mark' by convention for 'if' and 'whether' clauses. However, other conjunctions have an adverbial function within the clause. A relative pronoun like 'whom' is not a mark but a dobj inside its relative clause. In the same way, adverbial conjunctions with temporal or locative meaning, as well as manner adverbials, are labeled advmod inside subordinate clauses. This applies to 'when', 'where' and 'how', paralleling adverbs such as 'then', 'there', and 'thus'.
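The distinction can be written as a small lookup restricted to the words this section names. Anything else returns None rather than guessing. The example sentence and helper names are hypothetical.

```python
# Labels for the clause-introducing words named in this section only.
MARK_WORDS = {"if", "whether"}
ADVMOD_WORDS = {"when", "where", "how"}


def subordinator_label(word):
    """Return 'mark' or 'advmod' for the words covered above, else None."""
    w = word.lower()
    if w in MARK_WORDS:
        return "mark"
    if w in ADVMOD_WORDS:
        return "advmod"
    return None  # not specified by this section


# "I left when Kim arrived": 'when' is an advmod of the clause's own verb,
# and that verb ('arrived') is an advcl of the main verb.
WHEN_CLAUSE = [
    (1, "I", 2, "nsubj"),
    (2, "left", 0, "root"),
    (3, "when", 5, subordinator_label("when")),
    (4, "Kim", 5, "nsubj"),
    (5, "arrived", 2, "advcl"),
]
print(WHEN_CLAUSE[2])  # (3, 'when', 5, 'advmod')
```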


Sentence initial 'and'

Sentence initial coordinating conjunctions are attached to the root, pointing backwards, with the cc function.

Raising verbs (seem, happen...)

Raising verbs appear to take a subject that semantically belongs to a subordinate predicate. They can be identified by alternations such as “John seems sick” vs. “it seems John is sick”, or “I happen to own a boat” vs. “It so happens I own a boat”. In both cases, the subject is semantically an argument of the embedded predicate (e.g. happens(I own a boat), not happen(I), or “I happen”). In these cases, the subject is attached to the subordinate predicate, and the main predicate dominates the subordinate predicate. If the subordinate predicate is an infinitive, it is labeled xcomp. If it is a full finite clause with its own subject, it is labeled ccomp. This allows both constructions (with/without 'it') to receive the same analysis with respect to who the subject is, as shown below.
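The 'happen' pair can be sketched in hypothetical (token_id, form, head_id, label) rows. The point is the edges this section fixes: nsubj on the embedded verb, and xcomp vs. ccomp for that verb. Labels marked as assumed in the comments are illustrative only.

```python
INFINITIVAL = [  # "I happen to own a boat"
    (1, "I", 4, "nsubj"),
    (2, "happen", 0, "root"),
    (3, "to", 4, "aux"),      # assumed
    (4, "own", 2, "xcomp"),
    (5, "a", 6, "det"),
    (6, "boat", 4, "dobj"),
]
FINITE = [  # "It so happens I own a boat"
    (1, "It", 3, "expl"),     # assumed
    (2, "so", 3, "advmod"),   # assumed
    (3, "happens", 0, "root"),
    (4, "I", 5, "nsubj"),
    (5, "own", 3, "ccomp"),
    (6, "a", 7, "det"),
    (7, "boat", 5, "dobj"),
]


def subject_and_link(rows, pred_form):
    """Return (subject form, label linking pred_form to its governor)."""
    pred_id, _, _, link = next(r for r in rows if r[1] == pred_form)
    subj = next(form for _, form, head, lab in rows if head == pred_id and lab == "nsubj")
    return subj, link


print(subject_and_link(INFINITIVAL, "own"))  # ('I', 'xcomp')
print(subject_and_link(FINITE, "own"))       # ('I', 'ccomp')
```

In both analyses 'I' is the subject of 'own', which is the shared analysis the section aims for.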

Attaching footnote markers

Footnote markers (the footnote number) should be attached as dep to the root of the constituent that the footnote refers to. If the footnote refers to the entire sentence, it attaches to the sentence root. If the footnote refers to a smaller constituent, the head of that constituent is the source of the dep arrow.
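The two cases amount to one rule with a default target. In this hypothetical sketch, attach_footnote takes the head of the referenced constituent. When the footnote covers the whole sentence, it falls back to the sentence root. The row format and example sentence are illustrative.

```python
def attach_footnote(rows, marker_id, target_id=None):
    """Attach a footnote marker as dep.

    target_id is the head of the constituent the footnote refers to; if it
    is None, the footnote refers to the whole sentence and goes to the root.
    """
    if target_id is None:
        target_id = next(tid for tid, _, head, _ in rows if head == 0)
    return [
        (tid, form, target_id, "dep") if tid == marker_id else (tid, form, head, lab)
        for tid, form, head, lab in rows
    ]


# "He cited Smith 1", where '1' is a footnote marker not yet attached.
SENT = [
    (1, "He", 2, "nsubj"),
    (2, "cited", 0, "root"),
    (3, "Smith", 2, "dobj"),
    (4, "1", None, None),
]
print(attach_footnote(SENT, 4)[3])     # (4, '1', 2, 'dep')  whole sentence
print(attach_footnote(SENT, 4, 3)[3])  # (4, '1', 3, 'dep')  just 'Smith'
```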

In order to

'In order' is treated as a multi-word expression, which may or may not appear with 'to' (cf. 'in order that'). The function of 'in order' is mark, and it is attached via 'in'. The token 'order' is attached to 'in' with mwe, as shown below:

The verb of 'in order to' clause is attached as advcl to the main clause.
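The analysis can be sketched for a hypothetical sentence, 'She trained hard in order to win'. The aux label on 'to' and the helper marker_phrase are illustrative assumptions.

```python
# Rows are (token_id, form, head_id, label).
SENT = [
    (1, "She", 2, "nsubj"),
    (2, "trained", 0, "root"),
    (3, "hard", 2, "advmod"),
    (4, "in", 7, "mark"),     # 'in order' as mark, attached via 'in'
    (5, "order", 4, "mwe"),   # 'order' hangs off 'in'
    (6, "to", 7, "aux"),      # assumed
    (7, "win", 2, "advcl"),   # the purpose verb is an advcl of the main verb
]


def marker_phrase(rows, verb_id):
    """Return the verb's mark dependent plus any mwe tokens attached to it."""
    mark_id = next(tid for tid, _, head, lab in rows if head == verb_id and lab == "mark")
    ids = [mark_id] + [tid for tid, _, head, lab in rows if head == mark_id and lab == "mwe"]
    forms = {tid: form for tid, form, _, _ in rows}
    return " ".join(forms[t] for t in sorted(ids))


print(marker_phrase(SENT, 7))  # in order
```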

Clausal subjects (csubj)

Subject clauses can be full finite clauses, as in “[that they came] annoyed me”. But the csubj label can also apply to gerund clauses, as in “[doing that] can cause trouble”. In both of these cases, the subordinate clause verb is labeled as csubj to the main clause predicate.
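Both example sentences can be encoded so that the subordinate verb is the csubj of the main predicate. Labels other than csubj in this sketch follow ordinary Stanford basic conventions and are assumptions. The helper clausal_subject is hypothetical.

```python
# Rows are (token_id, form, head_id, label).
FINITE_SUBJ = [  # "That they came annoyed me"
    (1, "That", 3, "mark"),
    (2, "they", 3, "nsubj"),
    (3, "came", 4, "csubj"),
    (4, "annoyed", 0, "root"),
    (5, "me", 4, "dobj"),
]
GERUND_SUBJ = [  # "Doing that can cause trouble"
    (1, "Doing", 4, "csubj"),
    (2, "that", 1, "dobj"),
    (3, "can", 4, "aux"),
    (4, "cause", 0, "root"),
    (5, "trouble", 4, "dobj"),
]


def clausal_subject(rows):
    """Return the form of the csubj dependent of the root predicate."""
    root_id = next(tid for tid, _, head, _ in rows if head == 0)
    return next(form for _, form, head, lab in rows if head == root_id and lab == "csubj")


print(clausal_subject(FINITE_SUBJ), clausal_subject(GERUND_SUBJ))  # came Doing
```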

Indirect objects of saying verbs

Verbs of saying can have two objects, direct (dobj) and indirect (iobj). Both are present in

  • John told Mary (iobj) the story (dobj)

In this case, Mary is the indirect object. Note that even if what is said is missing, the person being told is still labeled iobj. For example, the following has an iobj only:

  • He told the police (iobj).
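The two examples above can be written as (token_id, form, head_id, label) rows, an illustrative format. The hypothetical helper objects shows that 'police' stays iobj even when no dobj is present.

```python
TOLD_STORY = [  # "John told Mary the story"
    (1, "John", 2, "nsubj"),
    (2, "told", 0, "root"),
    (3, "Mary", 2, "iobj"),
    (4, "the", 5, "det"),
    (5, "story", 2, "dobj"),
]
TOLD_POLICE = [  # "He told the police": no dobj, but 'police' is still iobj
    (1, "He", 2, "nsubj"),
    (2, "told", 0, "root"),
    (3, "the", 4, "det"),
    (4, "police", 2, "iobj"),
]


def objects(rows, verb_id):
    """Map object labels (iobj/dobj) of verb_id to their forms."""
    return {lab: form for _, form, head, lab in rows
            if head == verb_id and lab in ("iobj", "dobj")}


print(objects(TOLD_STORY, 2))   # {'iobj': 'Mary', 'dobj': 'story'}
print(objects(TOLD_POLICE, 2))  # {'iobj': 'police'}
```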

out of

The expression 'out of [something]', where [something] is a noun (NN), is analyzed as prep + pcomp + pobj.


Compound nouns that are generally written as one word, or as two words joined by a hyphen, and that you feel have been incorrectly split apart, should be attached with nn.

more than

If 'more than' modifies a quantity, the lexical word is the head. 'More than' is then a modifier that is internally an mwe.

If 'more than' is used to compare things ('A is more than B'), it is not an mwe, and the analysis reverts to prep/pobj.

Using mwe and goeswith

The multi-word expression relation is used for certain multi-word idioms that behave as one function word. MWEs are always annotated head-initially.

List of mwe expressions

The current list of mwes includes 41 expressions:

  • according to
  • all but
  • all in all
  • as if
  • as in (in the sense: “as in: I like it”, not literally “as cold as in Oslo”)
  • as of
  • as opposed to
  • as such
  • as to
  • as well
  • as well as (but not in “we didn't play as well as we thought”)
  • at least (when not used with quantities)
  • because of (and alternate forms, e.g. b/c of)
  • depending on
  • depending upon
  • due to
  • had better (and 'd better)
  • how come
  • instead of
  • in between
  • in case
  • in case of
  • in order
  • kind of (but not a kind of)
  • less than (with quantities)
  • let alone
  • more than (with quantities)
  • not to mention
  • of course
  • out of
  • per se
  • prior to
  • rather than (with 'rather' labeled cc)
  • so as to
  • so that
  • sort of (but not a sort of)
  • such as
  • that is
  • up to (with quantities)
  • vice versa
  • whether or not
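Head-initial annotation of a listed expression can be sketched as below. The set mirrors the list above but drops its usage conditions (e.g. 'with quantities') and alternate spellings. The helper annotate_mwe is hypothetical. The example reuses 'in order' with the mark label described in the 'In order to' section.

```python
# The listed expressions; usage conditions are NOT checked here.
MWES = {
    "according to", "all but", "all in all", "as if", "as in", "as of",
    "as opposed to", "as such", "as to", "as well", "as well as", "at least",
    "because of", "depending on", "depending upon", "due to", "had better",
    "how come", "instead of", "in between", "in case", "in case of",
    "in order", "kind of", "less than", "let alone", "more than",
    "not to mention", "of course", "out of", "per se", "prior to",
    "rather than", "so as to", "so that", "sort of", "such as", "that is",
    "up to", "vice versa", "whether or not",
}


def annotate_mwe(forms, start_id, head_id, label):
    """Annotate a listed MWE head-initially.

    The first token takes the external head and label; every later token
    attaches to the first token as mwe.
    """
    if " ".join(forms).lower() not in MWES:
        raise ValueError(f"not a listed mwe: {' '.join(forms)!r}")
    rows = [(start_id, forms[0], head_id, label)]
    rows += [(start_id + i, f, start_id, "mwe") for i, f in enumerate(forms[1:], start=1)]
    return rows


print(annotate_mwe(["in", "order"], start_id=4, head_id=7, label="mark"))
# [(4, 'in', 7, 'mark'), (5, 'order', 4, 'mwe')]
```

Raising an error for unlisted strings, such as a split 'with out', reflects the rule that mwe is limited to these expressions.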

mwe vs goeswith

MWE dependencies should be limited to these specific expressions. If a word seems to have been incorrectly split apart, such as 'with out' for 'without', use goeswith instead. The head is whichever part you feel is the “main” part of the word. goeswith should only be used as a last resort, when you feel you have exhausted all other possible dependencies.

gum/dependencies.txt · Last modified: 2018/09/11 10:02 (external edit)