RST Signaling Guidelines
Lexical chains are annotated for words with the same lemma. For example:
Note how 'rate-based' is not considered part of the lexical chain.
In some cases, lexical_chain is annotated for synonyms or other non-identical terms. In such cases, if the similar words can be identified, we annotate lexical_chain as usual but add 'non_ident' in the notes column. For example:
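The same-lemma rule can be sketched as grouping token positions by lemma. This is a minimal illustration assuming a hypothetical input format of (word, lemma) pairs. The functions `lexical_chains` and `chain_note` are invented for this sketch and are not part of any annotation tool. Deciding that two different lemmas are synonyms still needs human judgment, so `chain_note` only fills the notes column once the chain members are known.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def lexical_chains(tokens: List[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Group token positions by lemma; a lemma that recurs forms a candidate chain.

    `tokens` is a list of (word, lemma) pairs (a hypothetical input format).
    """
    by_lemma: Dict[str, List[int]] = defaultdict(list)
    for i, (_word, lemma) in enumerate(tokens):
        by_lemma[lemma.lower()].append(i)
    # Only lemmas occurring more than once can form a chain.
    return {lemma: idxs for lemma, idxs in by_lemma.items() if len(idxs) > 1}


def chain_note(lemmas: List[str]) -> str:
    """Notes-column value for a chain: 'non_ident' if its members differ in lemma."""
    return "non_ident" if len({l.lower() for l in lemmas}) > 1 else ""


tokens = [("Interest", "interest"), ("rates", "rate"), ("rose", "rise"),
          ("the", "the"), ("rate", "rate"), ("rate-based", "rate-based")]
print(lexical_chains(tokens))  # {'rate': [1, 4]}
```

Because 'rate-based' has its own lemma, it is not grouped with 'rates' and 'rate' under this sketch.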
Annotations not included in our scheme
Choosing source and target
For satellite-nucleus relations, the satellite is the source and the nucleus is the target
For multinucs, the child is the source, and the non-terminal multinuc node is the target
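A minimal sketch of the source/target choice, assuming a hypothetical tree encoding in which a satellite's `parent` is its nucleus and a multinuc child's `parent` is the non-terminal multinuc node. The `RSTNode` class, its `kind` labels and `source_and_target` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RSTNode:
    id: str
    kind: str              # hypothetical labels: "satellite", "nucleus", "multinuc_child"
    parent: Optional[str]  # satellite -> its nucleus; multinuc child -> the multinuc node


def source_and_target(node: RSTNode) -> Tuple[str, str]:
    """Return (source, target) for the relation this node attaches with."""
    if node.parent is None:
        raise ValueError(f"node {node.id} has no parent, so no relation to signal")
    if node.kind in ("satellite", "multinuc_child"):
        # The attaching node is the source; what it attaches to is the target.
        return node.id, node.parent
    raise ValueError(f"node {node.id} of kind {node.kind!r} is not a relation source")


print(source_and_target(RSTNode("5", "satellite", "4")))        # ('5', '4')
print(source_and_target(RSTNode("7", "multinuc_child", "12")))  # ('7', '12')
```

Under this encoding both rules reduce to "the attaching node is the source", which is why the two branches return the same pair.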
Position of the annotation
Normal anchored signals are placed on all signalling words in one contiguous span if possible; otherwise, they are placed on multiple contiguous spans
Discontinuous spans receive an automatically incremented co-index in a column 'discontinuous' (e.g. all parts of the first discontinuous signal receive co-index 1, all parts of the next discontinuous signal receive 2, etc.)
The special co-index 0 is used for non-discontinuous annotations that share a row with a discontinuous annotation (e.g. '3|0' marks a row shared by a discontinuous annotation with co-index '3' and a second, non-discontinuous annotation marked by '0').
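The co-indexing rules can be sketched as follows, assuming each signal is given as a list of token row indices. A signal that splits into more than one contiguous run is discontinuous and gets the next co-index; every other signal on such a row contributes '0'. The function names are hypothetical, and leaving the column empty on rows without any discontinuous annotation is an assumption of this sketch.

```python
from typing import Dict, List


def contiguous_runs(token_ids: List[int]) -> List[List[int]]:
    """Split token indices into maximal contiguous spans."""
    runs: List[List[int]] = []
    for t in sorted(token_ids):
        if runs and t == runs[-1][-1] + 1:
            runs[-1].append(t)
        else:
            runs.append([t])
    return runs


def discontinuous_column(signals: List[List[int]]) -> Dict[int, str]:
    """Map token rows to their 'discontinuous' cell value."""
    coindex: Dict[int, int] = {}
    next_index = 1
    for n, toks in enumerate(signals):
        if len(contiguous_runs(toks)) > 1:  # more than one span: discontinuous
            coindex[n] = next_index
            next_index += 1
    cells: Dict[int, List[str]] = {}
    for n, toks in enumerate(signals):
        for t in toks:
            # Non-discontinuous signals contribute the special co-index 0.
            cells.setdefault(t, []).append(str(coindex.get(n, 0)))
    # Assumption: rows with no discontinuous annotation leave the column empty.
    return {t: "|".join(v) for t, v in sorted(cells.items()) if any(x != "0" for x in v)}


print(discontinuous_column([[2, 3, 7], [3], [10, 12]]))
# {2: '1', 3: '1|0', 7: '1', 10: '2', 12: '2'}
```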
Unanchored signals are placed on a single token: the first token after the position of the annotation
Multiple signals annotated at the same token are separated by a pipe ('|') in ALL cells of that row's signaling annotation, including:
If a multi-token signal (e.g. several words on multiple rows in GitDox) overlaps a smaller signal (e.g. a single word), we split the larger span into multiple identical annotations, since the '|' syntax cannot be used for only part of a span.
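The pipe and splitting rules can be sketched by building one row of cells per token. This assumes hypothetical column names in `FIELDS` (the actual GitDox columns may differ) and a signal representation as a dict with a `tokens` set. Copying the larger signal onto every row it covers is what allows the overlapped row to hold both values joined by '|'.

```python
from typing import Dict, List

# Hypothetical signaling columns; the real GitDox column names may differ.
FIELDS = ["type", "subtype", "source", "target"]


def signal_rows(signals: List[dict], n_tokens: int) -> List[Dict[str, str]]:
    """Build one row of signaling cells per token.

    Each signal is a dict with the FIELDS keys plus a 'tokens' set. A signal
    spanning several rows is copied onto each row it covers.
    """
    rows = []
    for t in range(n_tokens):
        here = [s for s in signals if t in s["tokens"]]
        # Every cell gets the pipe-joined values, even when the values repeat.
        rows.append({f: "|".join(s[f] for s in here) for f in FIELDS})
    return rows


chain = {"type": "semantic", "subtype": "lexical_chain", "source": "5", "target": "4", "tokens": {0, 1, 2}}
dm = {"type": "dm", "subtype": "dm", "source": "5", "target": "4", "tokens": {1}}
for row in signal_rows([chain, dm], 3):
    print(row)
```

Row 1 comes out as `semantic|dm`, `lexical_chain|dm`, `5|5`, `4|4`, while rows 0 and 2 carry identical single copies of the lexical chain; the signal values here are illustrative only.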
Interpreted explicit signals
Some words are interpreted as explicit anchored signals of, for example, genre-based signaling. Examples:
Labels containing + for combined signals are always alphabetized (e.g. always semantic+syntactic, not syntactic+semantic)
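The alphabetization rule amounts to sorting the components of a combined label. A minimal sketch, with `normalize_label` as a hypothetical helper name; Python's `sorted` uses code-point order, which matches alphabetical order for lowercase ASCII labels like these.

```python
def normalize_label(label: str) -> str:
    """Put the components of a combined label such as 'syntactic+semantic' in alphabetical order."""
    return "+".join(sorted(label.split("+")))


print(normalize_label("syntactic+semantic"))  # semantic+syntactic
```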
We found the distinction between 'past_participial_clause' and 'nominal_modifier' questionable; the former is used for a non-restrictive vmod clause, as in: [The average of interbank offered rates for dollar deposits in the London market] [based on quotations at five major banks .]
Correcting annotation errors
If the Signaling Corpus contains a clear annotation error, we do not include that signal, but add a note structured as follows:
rem:TYPE:SIGNAL. For example:
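A note in the rem:TYPE:SIGNAL format could be built and checked as sketched below. This assumes TYPE is the signal type and SIGNAL the specific signal (subtype), and that neither contains a colon; the helper names and the values used are hypothetical placeholders, not entries from the Signaling Corpus.

```python
from typing import NamedTuple


class Removal(NamedTuple):
    type: str    # assumed: the signal type
    signal: str  # assumed: the specific signal (subtype)


def make_removal_note(sig_type: str, signal: str) -> str:
    """Build a removal note; fields must not contain ':' or parsing becomes ambiguous."""
    if ":" in sig_type or ":" in signal:
        raise ValueError("TYPE and SIGNAL must not contain ':'")
    return f"rem:{sig_type}:{signal}"


def parse_removal_note(note: str) -> Removal:
    """Split a note back into its parts; raises ValueError on malformed notes."""
    prefix, sig_type, signal = note.split(":")  # wrong field count raises ValueError
    if prefix != "rem":
        raise ValueError(f"not a removal note: {note!r}")
    return Removal(sig_type, signal)


note = make_removal_note("semantic", "lexical_chain")
print(note)                      # rem:semantic:lexical_chain
print(parse_removal_note(note))  # Removal(type='semantic', signal='lexical_chain')
```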