The main sentences (or equivalent fragments/utterances) identified as <s> tags in the markup phase, which are also the basis of syntactic analysis in the dependency annotation phase, are always separate segments for the purpose of RST analysis.
Additional segments may be needed, often to delineate subordinate clauses that match an RST relation function. In practice, the following types of clauses are usually made into separate segments:
Subject and object clauses are not segmented, with the exception of attribution, in which reported speech or thoughts are more central than the speech verb:
Using elaboration for speech verb object clauses means that the main rhetorical act is taken to be reporting the fact that something was said. If instead the nested rhetorical structure of what was said is itself the main point, we should use attribution and treat the speech itself as the nucleus, with its own discourse function.
Relative and adverbial clauses modifying nouns (using a relative pronoun, zero relative or participial clauses) are all segmented and typically act as elaboration:
To-infinitives or that clauses modifying a noun and prepositional adnominal clauses are also segmented, and analyzed similarly to relative clauses (by default as elaboration, but other options are possible in context). Note that for prepositional modifiers, the head word of the adnominal clause must be a verb (typically a gerund in -ing):
Pure prepositional noun modifiers are not segmented. In the following example 'survival' is a noun and therefore not eligible for EDU status:
Infrequently, the adnominal clause can be the nucleus:
But note that if the modified noun is only a small part of the main clause, the adnominal clause is usually a satellite:
If a relative or other adnominal clause interrupts a larger EDU, we join both parts of that unit using the same-unit relation, but if such a unit is followed only by the sentence's final punctuation, there is no need for same-unit (i.e. we do not make a segment just for the final period after a sentence-final relative clause).
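The punctuation condition above can be sketched as a small check. This is only an illustration, not part of any annotation tool: the function name `needs_same_unit` and the token-list input are hypothetical, and punctuation is approximated with Python's ASCII-only `string.punctuation`, so curly quotes and similar characters would need to be added for real data.

```python
import string

def needs_same_unit(tokens_after_clause):
    """Decide whether the material after an embedded clause needs its own
    segment, joined to the first part of the EDU by same-unit.

    Assumption: tokens are plain strings, and a token counts as punctuation
    if every character in it is in string.punctuation (ASCII only).
    """
    def is_punct(tok):
        return len(tok) > 0 and all(ch in string.punctuation for ch in tok)

    # If only punctuation (or nothing) remains, no extra segment is made,
    # so no same-unit relation is needed.
    return any(not is_punct(tok) for tok in tokens_after_clause)
```

With this sketch, `needs_same_unit(["."])` is false, while a continuation such as `["arrived", "late", "."]` gives true.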
Trailing non-opening punctuation (i.e. not '(') is attached to the modifier clause if present:
Most full clauses coordinated by 'and', 'or' or 'but' are made into independent EDUs. The coordinating conjunction belongs to the second EDU, e.g.:
Exceptions to splitting EDUs include:
Elliptical coordinate VPs are segmented (roughly corresponding to gapping constructions or Right-Node-Raising, or cases analyzed as orphan in Universal Dependencies):
Clefts and pseudo clefts are left unsegmented, following RST-DT guidelines:
Similarly, extraposed clauses with dummy pronouns are not segmented, because the expletive 'it' can be treated as equivalent to the extraposed clause:
In this example, we treat the whole EDU as “To water plants is important”. Even though 'important' is evaluative, “to water plants” is equivalent to the subject, and subject clauses are generally not segmented, meaning this is identical to the unsegmented treatment of [Watering plants is important].
Following RST-DT guidelines, parentheticals set apart by parentheses or dashes are segmented, even if they are otherwise syntactically ineligible to be EDUs, such as appositions. Compare:
Syntactically unintegrated citations are segmented, but integrated ones which function as an argument are not:
Following RST-DT guidelines:
Text fragments followed by colons are treated as separate EDUs, even when the fragment is a word or phrase, as long as the text that follows the colon provides further elaboration on the topic introduced by the colon.
Notice that this does not mean that all colons separate EDUs. Specifically, “:” inside an NP is not an EDU break point:
But colons are used as segmentation points when introducing a new idea, elaborating on a previous point, etc.:
In accordance with RST-DT guidelines, EDUs without a verbal predicate (e.g. prepositional phrases) are segmented in the presence of a strong discourse marker. The RST-DT guidelines list the following exhaustive set of markers:
In clauses with 'every time' or 'by the time', we segment not at the relative clause boundary, but before the 'time' expression:
In other words, we do not segment […time] [you…]; we do segment before 'every time', 'by the time', etc., which are treated the same as 'whenever', 'when', etc. (cf. two instances of by the time/every time in RST-DT).
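The boundary placement for these 'time' expressions could be sketched as follows. This is a minimal illustration under stated assumptions: the function name `time_marker_boundaries`, the whitespace-tokenized input, and the two-item marker list are all hypothetical (the guidelines say "etc." without giving a full list), and the example sentence is invented.

```python
# Hypothetical marker list for illustration; only these two expressions
# are named explicitly in the guidelines.
TIME_MARKERS = [("every", "time"), ("by", "the", "time")]

def time_marker_boundaries(tokens):
    """Return the token indices at which an EDU boundary is placed before
    a 'time' expression. No boundary is placed after 'time' itself."""
    lowered = [t.lower() for t in tokens]
    boundaries = []
    for i in range(1, len(lowered)):  # index 0 already starts an EDU
        for marker in TIME_MARKERS:
            if tuple(lowered[i:i + len(marker)]) == marker:
                boundaries.append(i)
    return boundaries

# Invented example sentence, split on whitespace.
tokens = "I smile every time you call".split()
print(time_marker_boundaries(tokens))  # [2]: the break falls before 'every'
```

The sketch only locates the break before the marker; it deliberately returns nothing for the position between 'time' and the following clause.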
The complex conjunction 'as soon as' is taken to introduce a single temporal EDU (usually circumstance), similarly to 'when'. It is NOT segmented into [as soon] [as], but left whole:
Initial conjunctions before a subordinator are segmented, and same-unit is used to join them to their predicate:
Note that in this example, the 'and' belongs to the verb 'continued', and the circumstance could be dropped, leaving the 'and' in the main clause: “and […] we continued”. The same can happen with other coordinations and subordinating conjunctions (“but [if …] then we will…”).
A satellite which has a satellite will have a span grouping the lower satellite before it modifies something else. In the example below, if we think that 33 is an elaboration of 32, and 32 is an elaboration of 30, then by extension, 33 is also part of the complex elaboration of 30. This means that 32+33 need to be grouped together first under a span, and then form the higher elaboration. The bad example at the top is a case of what we call 'chaining' (a flat sequence of arrows). Relative clause elaborations are also usually grouped with their main clause into a span (30-31) before the span is modified.
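The grouping step can be sketched in code. This is a simplified illustration, not an actual RST tool: the function name `satellite_span` and the dictionary representation are hypothetical, grouped EDUs are assumed to be contiguous, and the sketch does not model the order in which several satellites attach to the same nucleus (so EDU 31 is left out).

```python
def satellite_span(edu, satellites_of):
    """Return (first, last) EDU numbers of the span that groups `edu` with
    every satellite attached to it, directly or indirectly.

    Assumption: `satellites_of` maps an EDU number to the list of EDUs that
    are its satellites, and the grouped EDUs are contiguous.
    """
    first = last = edu
    for sat in satellites_of.get(edu, []):
        sat_first, sat_last = satellite_span(sat, satellites_of)
        first, last = min(first, sat_first), max(last, sat_last)
    return first, last

# Numbers follow the example in the text: 33 elaborates 32, 32 elaborates 30.
satellites_of = {30: [32], 32: [33]}
print(satellite_span(32, satellites_of))  # (32, 33)
```

The result for EDU 32 is the span 32-33, which is the unit that then elaborates 30, rather than 32 and 33 each pointing onward in a flat chain.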
When a single EDU is interrupted, for example by a relative clause, a same-unit multinuclear relation is created to contain the embedded unit. That embedded unit is attached based on syntactic criteria to the part of the same-unit which has its head (for a relative clause: attach the relative clause EDU to the part that contains the noun being modified).
If a same-unit construct has a modifier which applies to the entire interrupted EDU, then it is attached above the same-unit, not inside. For example in the image below, the purpose EDU is attached to the entire same-unit, not to part 2 (the sub-unit on the right), since it would have modified the entire, single EDU if it hadn't needed the same-unit split:
You should not have spans that have no incoming connections except for another span or multinuc above them. Spans are there to group elements, so that they can have an incoming or outgoing connection relating to some other node. In particular, EDUs should not have a span containing only themselves:
(e.g. [image of a magician] ←elaboration– [photo: Paul Budd])
In academic articles, the paper is often preceded by contact details for the authors, such as affiliations and e-mail addresses. These can be seen as ‘background’ information to the entire article, and usually attach to the top level node unifying all subsequent nodes.
Some discourse markers are ambiguous or behave in ways which are initially hard to interpret. The following guidelines help with some common dilemmas.
When used with past tense predicates, 'until' is often temporal and therefore circumstantial:
But with non-past tense, it often marks condition, for example:
Is equivalent to:
'Unless' is generally seen as signaling a negative conditional:
This is similar to:
'Instead' is often indicative of antithesis, but can appear either in the antithesis satellite itself, or in the nucleus:
'Rather' or 'rather than' work very similarly to 'instead', and can also indicate antithesis in either the satellite or nucleus.
Depending clauses are often interpreted as conditionals:
This construction is conditional, as it corresponds roughly to “if the weather is a certain way…”.
If an acronym for an expression within a sentence is specified in parentheses, it is considered a satellite restatement, but not multinuclear (since the parentheses only repeat part of the main sentence):
Translations can be analyzed in the same way; if they have a language specified before a colon, that is segmented based on EDU segmentation guidelines, and can be considered a preparation (but only if there is a colon). Compare:
When considering two similar relations between sentences without an explicit connective like 'because' or 'if', sometimes inserting a connective or phrase can help to disambiguate. Useful phrases include:
Comparative correlatives are interpreted as conditional constructions:
References forming an EDU (i.e. non-syntactically integrated, see segmentation guidelines) typically function as evidence:
Parenthetical currency and other measurement unit conversions are taken to be restatements. If the parent EDU contains more than just the unit term, then the restatement is satellite-nucleus, otherwise multinuclear:
Tag questions, including negative tag questions, are interpreted as restatements (and not as contrast), since they presuppose and re-assert the initial statement:
They are usually satellite-nucleus, since the tag question conveys less explicit information than the initial statement, though it is possible to have multinuclear constructions when the tag question expresses the full content of the initial statement.
Free relatives, which are predicates attached to a WH word which simultaneously occupies a grammatical function in the matrix and relative clause, are not segmented:
Note that these can be identified by the non-insertability of a relative pronoun:
Date EDUs in parentheses can be circumstance if they specify the date when something happened:
But if the date provides more information about an entity, such as years of life in a biography, it is an elaboration: