RST Signaling Guidelines

Lexical Chains

Lexical chains are annotated for words with the same lemma. For example:

  • Interest rates below … The rate … rate-based …

Note how 'rate-based' is not considered part of the lexical chain.
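
The same-lemma rule can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the corpus: the function name lexical_chains and the (form, lemma) pairs are hypothetical, and lemmas are assumed to come from an external lemmatizer. The sketch groups every repeated lemma and makes no attempt to filter out function words.

```python
from collections import defaultdict

def lexical_chains(tokens):
    """Group token positions by lemma and keep lemmas that occur more than once."""
    positions = defaultdict(list)
    for index, (form, lemma) in enumerate(tokens):
        positions[lemma].append(index)
    return {lemma: idx for lemma, idx in positions.items() if len(idx) > 1}

# Hypothetical (form, lemma) pairs for the example above.
tokens = [
    ("Interest", "interest"),
    ("rates", "rate"),
    ("below", "below"),
    ("The", "the"),
    ("rate", "rate"),
    ("rate-based", "rate-based"),  # different lemma, so not part of the chain
]
chains = lexical_chains(tokens)
```

Here 'rates' and 'rate' share the lemma 'rate' and form a chain, while 'rate-based' keeps its own lemma and stays outside it.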

In some cases, lexical_chain is annotated for synonyms or other non-identical terms. In such cases, if the similar words can be identified, we annotate lexical_chain as usual, but add 'non_ident' in the notes column. For example:

  • [not comparable] … [vary] - lexical_chain (notes: non_ident)

Reported speech

  • Annotate any verb of saying (e.g. “said”, “reported”) and any subordinator introducing reported speech (e.g. “that”)

Annotations not included in our scheme

  • 'Unsure' signals are ignored

Choosing source and target

  • For satellite-nucleus relations, the satellite is the source and the nucleus is the target
  • For multinucs, the child is the source, and the non-terminal multinuc node is the target

Position of the annotation

  • Normal anchored signals are placed on all signaling words, in one contiguous span if possible and otherwise in multiple contiguous spans
    • Discontinuous spans receive an automatically incremented co-index in a column 'discontinuous' (e.g. all parts of the first discontinuous item receive co-index 1, all parts of the next discontinuous item receive 2, etc.)
    • The special co-index 0 is used for non-discontinuous annotations that share a row with a discontinuous annotation (e.g. '3|0', marking a row that carries discontinuous index '3' and a second, non-discontinuous annotation signified by '0').
    • '0' is also used for all other annotations that would otherwise be empty when a '|' is used to separate multiple annotations. For example, if we have 'note' applying to one of two annotations, we use 'some_note|0' to indicate the note applying to the first annotation.
  • Unanchored signals are placed on the single first token after the position of the annotation
  • Multiple signals annotated at the same token are separated by pipe in ALL cells of that row's signaling annotation, including:
    • signal (but|items_in_sequence)
    • type (dm|graphical)
    • anchoring (e.g. no|no)
    • relation (List|List)
    • source and target
  • If a multi-token signal (e.g. several words on multiple rows in GitDox) overlaps a smaller signal (e.g. a single word), we split the larger span into multiple identical annotations, since we can't use the '|' syntax for only part of the span.
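
The pipe convention above can be illustrated with a small Python sketch that splits one row into its separate annotations. The function name split_row and the dictionary layout of a row are assumptions made for this example, not part of GitDox. The sketch also assumes that a bare '0' never occurs as a real value in any cell, so it can always be read as the "empty" placeholder.

```python
def split_row(cells):
    """Split the pipe-separated cells of one row into one dict per annotation."""
    parts = {column: value.split("|") for column, value in cells.items()}
    counts = {len(values) for values in parts.values()}
    if len(counts) != 1:
        # Every cell of the row must carry the same number of values.
        raise ValueError("all cells must hold the same number of '|'-separated values")
    (count,) = counts
    annotations = []
    for i in range(count):
        annotation = {}
        for column, values in parts.items():
            # '0' marks a slot that would otherwise be empty.
            annotation[column] = None if values[i] == "0" else values[i]
        annotations.append(annotation)
    return annotations

# Hypothetical row with two signals on the same token.
row = {
    "signal": "but|items_in_sequence",
    "type": "dm|graphical",
    "anchoring": "no|no",
    "relation": "List|List",
    "discontinuous": "3|0",
    "notes": "some_note|0",
}
first, second = split_row(row)
```

With this row, the first annotation keeps discontinuous index '3' and the note 'some_note', while the second has neither. A row whose cells disagree on the number of values is rejected, which mirrors the requirement that the pipe appear in ALL cells.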

Morphological tense

  • 'Tense' signals cover all aspects of tense, aspect and mood, including periphrastic constructions in English
  • It is not necessary to automatically annotate every verb in the source and target spans - only occurrences of tenses that matter for the relations being signaled should be annotated.
    • In particular, tenses in relative clauses are often not relevant to tense signals affecting the main clause
  • For all tense/aspect/mood signals, the entire verbal complex should be annotated, creating parity between simple lexical verbs, periphrastic tenses, and passives:
    • John [went] there (simple past - annotate just the verb)
    • John [had gone] there (periphrastic tense, annotate auxiliary and lexical verb)
    • John [was brought] there (passive, annotate auxiliary and lexical verb)
    • John [was] happy (non-verbal predicate, only the verb should be annotated)

Interpreted explicit signals

Some words are interpreted as explicit anchored signals of e.g. genre-based signaling. Examples:

  • newspaper_style_attribution - the word 'source', when the text explicitly names the source.

Signal labels

  • Labels containing + for combined signals are always alphabetized (e.g. always semantic+syntactic, not syntactic+semantic)
  • We found the distinction between 'past_participial_clause' and 'nominal_modifier' questionable. The former is used in a non-restrictive vmod clause: [The average of interbank offered rates for dollar deposits in the London market] [based on quotations at five major banks .]
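
The alphabetization rule for combined labels is easy to check mechanically. A minimal sketch, assuming labels are plain lowercase strings joined by '+' (the helper name normalize_label is hypothetical):

```python
def normalize_label(label):
    """Return a combined label with its '+'-joined parts in alphabetical order."""
    return "+".join(sorted(label.split("+")))

normalized = normalize_label("syntactic+semantic")
```

Labels without a '+' pass through unchanged, so the helper can be applied to every signal label.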

Correcting annotation errors

If the Signaling Corpus contains a clear annotation error, we do not include that signal, but add a note structured as follows: rem:TYPE:SIGNAL. For example: rem:semantic:lexical_chain
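
For illustration, the note format can be built and read back with two small Python helpers. Both function names are hypothetical, and the sketch assumes that TYPE does not itself contain a colon.

```python
def removal_note(signal_type, signal):
    """Build a note of the form rem:TYPE:SIGNAL."""
    return f"rem:{signal_type}:{signal}"

def parse_removal_note(note):
    """Return (TYPE, SIGNAL) from a rem:TYPE:SIGNAL note, or raise ValueError."""
    prefix, signal_type, signal = note.split(":", 2)
    if prefix != "rem":
        raise ValueError(f"not a removal note: {note!r}")
    return signal_type, signal

note = removal_note("semantic", "lexical_chain")
parsed = parse_removal_note(note)
```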

general_guidelines.txt · Last modified: 2018/07/25 10:30 by amir