Entity annotation concerns the annotation of referring expressions in a text, i.e. spans of text that refer to things in the world, and their classification into entity types. Entity annotation in Coptic Scriptorium serves three purposes: to facilitate searches that include specific entity types (e.g. finding a certain epithet using linguistic annotations, such as 'ⲟⲩⲁⲁⲃ' “holy”, but only when applied to a person); to inventory entities (e.g. finding all places mentioned in the Apophthegmata Patrum); and to function as a gateway for entity linking, enabling searches for specific persons (“John the Baptist”) regardless of the exact expression used to mention them. The task of entity linking itself is outside the scope of the current guidelines.
Entity annotation can be applied to three types of referring expressions:
Almost all nouns and proper nouns correspond to referring expressions, with the exception of non-referring nouns, such as:
One test for referentiality is whether a pronominal or nominal subsequent mention is possible/plausible. For example, the following sounds odd:
We distinguish 11 entity types:
Groups of entities are interpreted as the entity type of their constituents, for example, a herd of animals is of the type animal:
An exception to this guideline is groups of people who form an organization, e.g. ⲥⲩⲛⲁⲅⲱⲅⲏ, ⲥⲧⲣⲁⲧⲉⲩⲙⲁ etc., which are annotated as 'organization', not 'person'.
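The group rule and its exception can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the lookup set, and the English placeholder "herd" are hypothetical, and a real project would take the list of organization nouns from its own lexicon.

```python
# Hypothetical lookup set for illustration; only the two nouns named in the
# guideline are listed here.
ORGANIZATION_NOUNS = {"ⲥⲩⲛⲁⲅⲱⲅⲏ", "ⲥⲧⲣⲁⲧⲉⲩⲙⲁ"}


def group_entity_type(head_noun: str, member_type: str) -> str:
    """Return the entity type to assign to a group expression.

    head_noun   -- the noun that names the group
    member_type -- the entity type of the group's members
    """
    # Exception: groups of people that form an organization.
    if member_type == "person" and head_noun in ORGANIZATION_NOUNS:
        return "organization"
    # Default: a group takes the entity type of its constituents.
    return member_type


# "herd" is an English placeholder for a group noun, not corpus data.
herd_type = group_entity_type("herd", "animal")
army_type = group_entity_type("ⲥⲧⲣⲁⲧⲉⲩⲙⲁ", "person")
```

Here `herd_type` falls through to the default and stays 'animal', while `army_type` is caught by the exception and becomes 'organization'.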
In morphologically complex items containing a noun inside a larger token, that noun cannot be annotated:
Container and substance form two entities, for example:
Entity expressions interrupted by a copula or particle are spanned to contain the copula or particle. For example, the following span includes the intervening copula:
The following are considered idioms, in which the constituent nouns are not construed as referential:
In projects where pronouns are annotated (note: currently we do not annotate pronouns!), we recommend that correlative/expletive pronouns not be annotated as entities at all:
The relative converter ⲉⲧ is not considered referential. In relative clauses with explicit subject pronouns, those pronouns are annotated as usual:
Note that this results in the second pronoun pointing back to the span that contains it; this is allowed in WebAnno.
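The self-containing configuration can be pictured with a minimal data-structure sketch, assuming mentions are stored as inclusive token ranges with an optional pointer to an earlier mention. The class, its field names, and the token offsets are hypothetical and do not reflect WebAnno's own data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    start: int  # index of the first token, inclusive
    end: int    # index of the last token, inclusive
    antecedent: Optional["Span"] = None  # mention this span points back to

    def contains(self, other: "Span") -> bool:
        """True if `other` lies entirely within this span."""
        return self.start <= other.start and other.end <= self.end


# Hypothetical offsets: an entity span covering a noun and its relative
# clause, and a pronoun inside that relative clause.
outer = Span(start=0, end=5)
pronoun = Span(start=3, end=3, antecedent=outer)

# The pronoun points back to the very span that contains it.
points_to_container = (
    pronoun.antecedent is outer and outer.contains(pronoun)
)
```

Nothing in such a representation rules out a mention whose antecedent is its own containing span, which matches the behaviour described above.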