RST Signaling Guidelines

Lexical Chains

Lexical chains are annotated for words with the same lemma. For example:

  • Interest rates below … The rate … rate-based …

Note how 'rate-based' is not considered part of the lexical chain.
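
The same-lemma rule can be sketched as a grouping of token positions by lemma. This is only an illustration: the (form, lemma) pairs below are hypothetical hand-written input rather than output of any particular lemmatizer, the function name lexical_chains is invented here, and the sketch makes no attempt to filter out function words.

```python
from collections import defaultdict


def lexical_chains(tokens):
    """Group token positions by lemma and keep lemmas seen more than once."""
    positions = defaultdict(list)
    for index, (_form, lemma) in enumerate(tokens):
        positions[lemma].append(index)
    # A chain needs at least two members.
    return {lemma: idxs for lemma, idxs in positions.items() if len(idxs) > 1}


# Hypothetical (form, lemma) pairs for the example above.
tokens = [
    ("Interest", "interest"),
    ("rates", "rate"),
    ("below", "below"),
    ("The", "the"),
    ("rate", "rate"),
    ("rate-based", "rate-based"),  # different lemma, so not part of the chain
]

chains = lexical_chains(tokens)
```

Here 'rates' and 'rate' share the lemma 'rate' and end up in one chain, while 'rate-based' keeps its own lemma and stays outside it.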

In some cases, lexical_chain is annotated for synonyms or other non-identical terms. In such cases, if the similar words can be identified, we annotate lexical_chain as usual, but add 'non_ident' in the notes column. For example:

  • [not comparable] … [vary] - lexical_chain (notes: non_ident)

Annotations not included in our scheme

  • 'Unsure' signals are ignored

Choosing source and target

  • For satellite-nucleus relations, the satellite is the source and the nucleus is the target
  • For multinucs, the child is the source, and the non-terminal multinuc node is the target
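
The two rules above amount to a small lookup. A minimal sketch follows, assuming each relation is available as a dictionary; the keys 'multinuc', 'child', 'parent', 'satellite' and 'nucleus' and the node ids are hypothetical and not part of the annotation format.

```python
def choose_source_target(relation):
    """Return (source, target) for a signal attached to the given relation."""
    if relation["multinuc"]:
        # Multinuclear relation: the child is the source and the
        # non-terminal multinuc node is the target.
        return relation["child"], relation["parent"]
    # Satellite-nucleus relation: satellite is source, nucleus is target.
    return relation["satellite"], relation["nucleus"]


# Hypothetical node ids, for illustration only.
sat_nuc = {"multinuc": False, "satellite": "7", "nucleus": "8"}
multinuc = {"multinuc": True, "child": "3", "parent": "20"}

sat_nuc_ends = choose_source_target(sat_nuc)
multinuc_ends = choose_source_target(multinuc)
```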

Position of the annotation

  • Normal anchored signals are placed on all signaling words, in one contiguous span if possible and otherwise in multiple contiguous spans
    • Discontinuous spans receive an automatically incremented co-index in the 'discontinuous' column (e.g. all parts of the first discontinuous item receive co-index 1, all parts of the next discontinuous item receive 2, etc.)
    • The special co-index 0 is used for non-discontinuous annotations that share a row with a discontinuous annotation (e.g. '3|0' marks a row carrying discontinuous index '3' together with a second, non-discontinuous annotation signified by '0').
  • Unanchored signals are placed on the single first token after the position of the annotation
  • Multiple signals annotated at the same token are separated by pipe in ALL cells of that row's signaling annotation, including:
    • signal (but|items_in_sequence)
    • type (dm|graphical)
    • anchoring (e.g. no|no)
    • relation (List|List)
    • source and target
  • If a multi-token signal (e.g. several words on multiple rows in GitDox) overlaps a smaller signal (e.g. a single word), we split the larger span into multiple identical annotations, since the '|' syntax cannot be applied to only part of a span.
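
Because every signaling cell in a row carries the same number of pipe-separated values, a row can be unpacked into one record per signal. The sketch below assumes a row is available as a dictionary keyed by the column names listed above; the function name split_row and the source and target ids are hypothetical, and this is not GitDox's own export code.

```python
SIGNAL_COLUMNS = ("signal", "type", "anchoring", "relation", "source", "target")


def split_row(row):
    """Split a row's pipe-separated signaling cells into one dict per signal."""
    parts = {col: row[col].split("|") for col in SIGNAL_COLUMNS}
    counts = {len(values) for values in parts.values()}
    if len(counts) != 1:
        # Every cell must list the same number of signals.
        raise ValueError(f"cells disagree on the number of signals: {parts}")
    n = counts.pop()
    return [{col: parts[col][i] for col in SIGNAL_COLUMNS} for i in range(n)]


# Cell values follow the examples above; source and target ids are made up.
row = {
    "signal": "but|items_in_sequence",
    "type": "dm|graphical",
    "anchoring": "no|no",
    "relation": "List|List",
    "source": "12|12",
    "target": "15|15",
}

signals = split_row(row)
```

A row in which one cell holds fewer values than the others raises ValueError, which makes it easy to spot rows where the pipe was added to only some of the cells.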

Interpreted explicit signals

Some words are interpreted as explicit anchored signals of e.g. genre-based signaling. Examples:

  • newspaper_style_attribution - the word 'source', when the text explicitly names the source.

Signal labels

  • Labels containing + for combined signals are always alphabetized (e.g. always semantic+syntactic, not syntactic+semantic)
  • We found the distinction between 'past_participial_clause' and 'nominal_modifier' questionable; the former is used in a non-restrictive vmod clause: [The average of interbank offered rates for dollar deposits in the London market] [based on quotations at five major banks .]
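
The alphabetization rule for combined labels is easy to enforce mechanically. A minimal sketch, assuming labels are plain lowercase strings joined by '+'; the helper name normalize_label is hypothetical.

```python
def normalize_label(label):
    """Return a combined signal label with its '+'-joined parts alphabetized."""
    return "+".join(sorted(label.split("+")))


normalized = normalize_label("syntactic+semantic")
```

Labels without a '+' pass through unchanged, so the helper can be applied to every label in a column.
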
general_guidelines.txt · Last modified: 2018/05/21 13:28 by amir