Entity annotation concerns the annotation of referring expressions in a text, i.e. spans of text that refer to things in the world, and their classification into entity types. The purpose of entity annotation in Coptic Scriptorium is threefold: to facilitate searches which include specific entity types (e.g. finding a certain epithet using linguistic annotations, such as 'ⲟⲩⲁⲁⲃ' “holy”, but only when applied to a person); to inventory entities (e.g. find all places mentioned in the Apophthegmata Patrum); and to function as a gateway for entity linking, enabling searches for specific persons (“John the Baptist”) regardless of the exact expression used to mention them. Entity linking itself is outside the scope of the current guidelines.

Entity annotation can be applied to three types of referring expressions:

  • Named entities, which are headed by a proper noun (e.g. “Apa Papnoute”)
  • Non-named entities, headed by a common noun (e.g. “the angel”)
  • Pronouns – these are currently not annotated by our schema (e.g. “she” is a person)

Referring expressions

Almost all nouns and proper nouns correspond to referring expressions, with the exception of non-referring nouns, such as:

  • ⲁϩⲉ ⲣⲁⲧ.. - “stand, set foot” - does not actually refer to the foot of a person
  • ϩⲛ ⲟⲩ ⲙⲉ - “truly” - does not actually introduce a referenceable 'truth'

One test for referentiality is whether a pronominal or nominal subsequent mention is possible/plausible. For example, the following sounds odd:

  • ⲁϥⲁϩⲉⲣⲁⲧϥ ⲁⲩⲱ ⲡⲉⲓⲣⲁⲧ … - “he stood on foot, and this foot…”

Entity Types

We distinguish the following entity types:

  • abstract - intangible entities not covered by other classes (incl. ideas, emotions)
  • animal
  • event - an occurrence, e.g. “the death of the king”, “the arrival of a monk”
  • object - concrete inanimate object
  • organization - organized body of people, e.g. ⲧⲉⲕⲕⲗⲏⲥⲓⲁ, ⲧⲉⲥⲧⲣⲁⲧⲉⲓⲁ
  • person
  • place
  • quantity - a unit of weight or measure, e.g. “six pounds”, “many miles”
  • substance - mass noun indicating a material, e.g. “sand”, “water”, “wine”
  • time
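The label inventory above can be enforced mechanically when annotations are exported. The following is a minimal sketch, not part of the Coptic Scriptorium tooling: the names `ENTITY_TYPES` and `EntitySpan` are hypothetical, spans are assumed to be inclusive token offsets, and the label set is simply copied from the list above.

```python
from dataclasses import dataclass

# Labels copied from the list above (assumption: no other labels are in use).
ENTITY_TYPES = frozenset({
    "abstract", "animal", "event", "object", "organization",
    "person", "place", "quantity", "substance", "time",
})


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical record for one annotated referring expression."""
    start: int   # index of the first token in the span (inclusive)
    end: int     # index of the last token in the span (inclusive)
    label: str   # one of ENTITY_TYPES

    def __post_init__(self):
        # Reject labels that are not in the inventory.
        if self.label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.label!r}")
        # Reject spans whose boundaries are inverted.
        if self.start > self.end:
            raise ValueError("span start must not exceed span end")
```

A span such as `EntitySpan(0, 3, "object")` is accepted, while an unlisted label raises `ValueError`, which catches typos in labels before they reach a search index.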

Specific entity guidelines


Repeated mentions of the same entity in apposition are considered a single span, and do not contain more mentions of the same entity:

  • [ⲓⲱϩⲁⲛⲛⲉⲥ ⲡ ⲃⲁⲡⲧⲓⲥⲧⲏⲥ]
  • [ⲡ ⲣⲣⲟ ⲍⲏⲛⲱⲛ]
  • [ⲡⲉⲛ ⲡ ⲉⲧ ⲟⲩⲁⲁⲃ ⲕⲁⲧⲁ ⲥⲙⲟⲧ ⲛⲓⲙ ⲁⲡⲁ ⲕⲩⲣⲟⲥ ⲡ ⲉⲛⲧ ⲁ ϥ …]

Although outwardly very similar, appositions must be distinguished from dislocations, in which a pronominal subject or object is repeated separately. For personal pronouns, the pronoun is simply left out of the nominal span:

  • [ⲡⲉϥ ⲉⲓⲱⲧ] ϥ ⲛⲁⲩ ⲉⲣⲟ ⲟⲩ - “[his father], he sees them”
  • ϥ ⲛⲁⲩ ⲉⲣⲟ ⲟⲩ ⲛϭⲓ [ⲡⲉϥ ⲉⲓⲱⲧ] - “he sees them, that is [his father]”

If the pronoun is a substitutive demonstrative (ⲡⲁⲓ, ⲧⲁⲓ, ⲛⲁⲓ), then two spans are annotated:

  • [ⲡⲉϥ ⲉⲓⲱⲧ] [ⲡⲁⲓ] ⲛⲁⲩ ⲉⲣⲟ ⲟⲩ - “[his father], [this one] sees them”
  • [ⲡⲁⲓ] ⲛⲁⲩ ⲉⲣⲟ ⲟⲩ ⲛϭⲓ [ⲡⲉϥ ⲉⲓⲱⲧ] - “[this one] sees them, that is [his father]”

But note that it is also possible for a substitutive demonstrative to stand in true apposition to a noun without dislocation, in which case a single span is annotated as for any apposition:

  • ⲁ ⲓ ⲛⲁⲩ ⲉ [ⲡⲉϥ ⲉⲓⲱⲧ , ⲡⲁⲓ ⲉⲧ ⲙⲉⲣⲓⲧ ⲥ] - “I saw [his father, the one who loves her]”

See the UD Coptic guidelines for more information on identifying dislocation vs. apposition.

Expanded Relative Constructions

The relative construction expanding an article is annotated as an entity:

  • [ⲡ ⲉⲧ ⲟⲩ ⲥⲱⲧⲙ ⲉⲣⲟ ϥ] “the one they listened to” (person)

However, if the ⲡ is tagged as a copula, that part of the construction is not part of the entity span, since it belongs to a predication. In these instances, we treat the predicate noun phrase as an entity, and the relative clause as a subject clause (compare the Universal Dependencies annotation guidelines):

  • [ⲡ ⲛⲟⲩⲧⲉ] ⲡ ⲉⲛⲧ ⲁ ϥ ⲁⲩⲝⲁⲛⲉ “It is God who made them grow”

In this example, “God” receives a span, but “who made them grow” is considered a subject clause (i.e. 'who made them grow is God'), which is not nominal and hence not annotated. Note that according to the tagging guidelines, the second ⲡ should be tagged as COP and lemmatized ⲡⲉ in this sentence.

Body Parts

Most body parts are marked as objects, since they are tangible:

  • ⲟⲩ ϭⲓϫ - “a hand”
  • ⲡⲉϥ ⲃⲁⲗ - “his eye”

However, some referential body parts are considered abstract, notably ϩⲏⲧ “heart”:

  • ϯ ⲛⲁ ⲧⲣⲉ ⲡⲟⲩ ϩⲏⲧ ⲙⲕⲁϩ - “I will make your [heart] suffer”

Other uses of body parts may be totally figurative or idiomatic, in which case they are not annotated.

Non-numeral modifiers

  • ϩⲁϩ n-X - lots of X
  • ⲛⲟⲩⲟⲉⲓϣ ⲛⲓⲙ - always - Mark these with one referent only

Possessive Constructions

  • ⲡⲁⲡⲁⲩⲗⲟⲥ - the one belonging to Paul - Such constructions include the possessor as a nested span: [ⲡⲁ[ⲡⲁⲩⲗⲟⲥ]]


Groups of entities are interpreted as the entity type of their constituents, for example, a herd of animals is of the type animal:

  • [ⲟⲩ ⲁⲅⲉⲗⲏ ⲛ ϣⲟϣ] - a herd of buffaloes. Note that there is no nested entity for 'buffalo' in this case.

An exception to this guideline is groups of people who form an organization: e.g. ⲥⲩⲛⲁⲅⲱⲅⲏ, ⲥⲧⲣⲁⲧⲉⲩⲙⲁ etc. are 'organization', not 'person'.

No reference inside compounds

In morphologically complex items containing a noun incorporated into a larger verbal token, that noun cannot be annotated:

  • ⲁ ϥ ϫⲓⲃⲁⲡⲧⲓⲥⲙⲁ - he received-baptism (baptism cannot be annotated as a markable, since it's part of an incorporated verb 'to baptize')


We do not mark coordinate entities in addition to their constituents:

  • [ⲓⲱϩⲁⲛⲛⲏⲥ] ⲙⲛ [ⲁⲛⲧⲱⲛⲓⲟⲥ] (but not also [ⲓⲱϩⲁⲛⲛⲏⲥ ⲙⲛ ⲁⲛⲧⲱⲛⲓⲟⲥ] as a third mentioned entity)
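A check for this guideline can be sketched as follows. This is an illustrative assumption, not an existing validator: the function name `redundant_coordination_spans` is hypothetical, spans are inclusive token offsets, only binary coordination with ⲙⲛ is handled, and both conjuncts must already be annotated for the outer span to be flagged.

```python
def redundant_coordination_spans(tokens, spans, coordinators=("ⲙⲛ",)):
    """Return spans that merely wrap two annotated conjuncts.

    tokens: list of token strings
    spans:  list of (start, end) token offsets, both inclusive
    """
    span_set = set(spans)
    redundant = []
    for start, end in spans:
        # Look for a coordinator strictly inside the span.
        for i in range(start + 1, end):
            left = (start, i - 1)
            right = (i + 1, end)
            if tokens[i] in coordinators and left in span_set and right in span_set:
                # The span is exactly "left + coordinator + right".
                redundant.append((start, end))
                break
    return redundant


tokens = ["ⲓⲱϩⲁⲛⲛⲏⲥ", "ⲙⲛ", "ⲁⲛⲧⲱⲛⲓⲟⲥ"]
# The third span repeats the coordination as a whole and should be flagged.
flagged = redundant_coordination_spans(tokens, [(0, 0), (2, 2), (0, 2)])
```

Longer coordinations (three or more conjuncts) and other coordinators would need additional handling.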

Container and substance

Container and substance form two entities, for example:

  • [ⲟⲩ ⲡⲩⲅⲏ ⲙ [ⲙⲟⲟⲩ]] - a fountain of water (the fountain is 'object', the water is 'substance', and the water can be referred to separately later on)
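The bracket notation used throughout these guidelines maps directly onto nested token spans. As a minimal sketch (the function name `parse_brackets` is hypothetical, and it assumes whitespace-separated tokens with brackets attached to token edges, as in the examples here), the nesting can be recovered like this:

```python
def parse_brackets(annotated):
    """Return (tokens, spans) for a bracketed, whitespace-tokenized string.

    spans are (start, end) token offsets, both inclusive, outermost first.
    """
    tokens = []
    spans = []
    stack = []  # start offsets of spans that are still open
    for raw in annotated.split():
        # Each leading "[" opens a span at the next token offset.
        while raw.startswith("["):
            stack.append(len(tokens))
            raw = raw[1:]
        # Count trailing "]" characters; each one closes the innermost open span.
        closers = 0
        while raw.endswith("]"):
            closers += 1
            raw = raw[:-1]
        if raw:
            tokens.append(raw)
        for _ in range(closers):
            if not stack:
                raise ValueError("unbalanced closing bracket")
            spans.append((stack.pop(), len(tokens) - 1))
    if stack:
        raise ValueError("unbalanced opening bracket")
    # Sort so that containing spans precede the spans nested inside them.
    spans.sort(key=lambda s: (s[0], -s[1]))
    return tokens, spans


tokens, spans = parse_brackets("[ⲟⲩ ⲡⲩⲅⲏ ⲙ [ⲙⲟⲟⲩ]]")
# Types assigned by the annotator, following the example above.
labels = {spans[0]: "object", spans[1]: "substance"}
```

Here the outer span covers all four tokens (the fountain) and the inner span covers only the last token (the water), so each can be referred to separately later in the text.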

Peoples and demonyms

Pluralized demonyms indicating members of a people are labeled person:

  • [ⲛ ϩⲉⲗⲗⲏⲛ]

However, peoples mentioned as a people (not as a group of individuals) are labeled organization:

  • [ⲡⲉⲕ ⲗⲁⲟⲥ ⲓⲥⲣⲁⲏⲗ]

These cases are usually singular and involve a named people. This guideline does not apply to ad-hoc groups of people who do not form an organized entity, e.g. ⲙⲏⲏϣⲉ 'crowd' is still usually 'person'.

Predicate of unchanging identity

  • ⲡ ⲟⲩⲁ ⲡ ⲟⲩⲁ - each man - Mark as a single entity

Singular noun referring to a group of people

  • ⲡ ⲙⲏⲏϣⲉ - “the crowd” - is labeled “person”

Two entity types in one chain

  • ⲡⲉ ⲭⲣⲓⲥⲧⲟⲥ… ⲧⲉⲓ ⲥⲛⲧⲉ - the Christ… this foundation - Mark each entity with its own type, e.g. Christ as person and foundation as object

Interruption by copula or particle

Entity expressions interrupted by a copula or particle are spanned to contain the copula or particle. For example, the following span includes the intervening copula:

  • [ⲛⲉϥ ⲁⲡⲟⲥⲧⲟⲗⲟⲥ ⲛⲉ ⲉⲧⲟⲩⲁⲁⲃ]


  • [ϥⲧⲟⲟⲩ ⲇⲉ ⲛ ϩⲟⲟⲩ]
  • [ⲟⲩ ϣⲃⲏⲣ · ϩⲱⲱ ⲕ ⲟⲛ ⲛⲧⲉ ⲡ ⲛⲟⲩⲧⲉ]

Non-adjacent relative clauses are included, unless the interruption contains the verb controlling the head noun (this prevents some possibly very long 'hermeneutical' relatives inside mentions):

  • [ⲣⲱⲙⲉ ⲛⲓⲙ ⲟⲛ ⲉⲧ ⲥⲱⲧⲙ] - and also [any man who hears] (note the interruption 'ⲟⲛ')

But not:

  • ⲉⲣϣⲁⲛ [ⲧ ⲃⲁϣⲟⲣ] ⲁϣⲕⲁⲕ ⲉⲃⲟⲗ ⲁⲛ ⲉⲧⲉ ⲛⲧⲟⲕ ⲡⲉ … - it is not when [the fox] barks, which is you, …

The interruption by the verb 'bark' which is the predicate of 'fox' triggers the guideline to omit the relative clause. Otherwise, the mention could potentially cover the entire clause '[ⲧ ⲃⲁϣⲟⲣ ⲁϣⲕⲁⲕ ⲉⲃⲟⲗ … ]'.

Non-referring cases


  • No annotations are needed for interrogatives (ⲛⲓⲙ, ⲟⲩ)
  • Complex interrogatives follow the same guidelines, but note the subject of a question predicate *can* be an entity. In the following example we have one (non-interrogative) person entity span:
    • ⲛⲓⲙ ⲅⲁⲣ ⲛ ⲣⲱⲙⲉ [ⲡ ⲉⲧ ⲥⲟⲟⲩⲛ ⲛ [ⲛⲁ ⲛ ⲣⲱⲙⲉ]] = “what human is [he who knows [those things which are human]]?”

Note that ⲣⲱⲙⲉ without an article functions adjectivally here, and is not an entity; the phrase with ⲛⲓⲙ is interrogative and therefore not an entity; but the 'p-et-…' phrase is still annotated, as none of these exceptions apply to it.

Figurative body parts and other fixed expressions

The following are considered idioms, in which the constituent nouns are not construed as referential:

  • ⲁϩⲉ ⲣⲁⲧ ϥ - 'stand, set foot' - “foot” is not an entity mention
  • ϯ ⲧⲟⲟⲧ ϥ - 'help, give a hand'
  • ⲉ ⲡ ⲉⲥⲏⲧ - 'down', lit. 'to the ground'
  • ⲟⲩⲏⲣ ⲛ ⲟⲩⲟⲉⲓϣ - 'how long'
  • ⲛ ⲟⲩ ϩⲟⲩⲟ - 'more'
  • (ϩⲱⲃ) ⲛ ϭⲓϫ - 'handiwork' - the whole phrase (handiwork) is 'abstract' or 'object' in context, but 'hand' is not a referent
  • ⲣ ϩⲛⲁ ϥ - 'want, do one's will' - the word ϩⲛⲁ / ϩⲛⲉ 'will' is figurative, this is a fixed expression for 'desire'
  • ⲉⲡⲧⲏⲣϥ meaning 'at all' is not referential
  • ϩⲁ ⲉⲟⲟⲩ - 'glorious' - the ⲉⲟⲟⲩ is not referential
  • ⲛ ⲟⲩ ⲕⲟⲩⲓ - 'a little' (manner adverbial)
  • ⲛ ⲧ ϩⲉ - meaning 'like'
  • ⲛ ϣⲟⲣⲡ - 'first'
  • ϭⲟⲙ - meaning 'capable' in constructions like ⲛⲧⲕ ϭⲟⲙ ⲁⲛ 'you are not capable'
  • ⲛ ⲟⲩⲱⲧ - together
  • ϩⲓ ⲟⲩ ⲥⲟⲡ - at once
  • ⲙ ⲙⲏⲛⲉ - daily
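A pre-annotation pass could flag tokens inside such fixed expressions so that annotators are not prompted to mark them. The sketch below is illustrative only: `FIXED_EXPRESSIONS` and `non_referential_indices` are hypothetical names, the list holds just a handful of the idioms above, and the final ϥ of the source examples is assumed to stand for a variable suffix pronoun, so it is left out of the match.

```python
# A few of the idioms listed above, as token tuples (not the full list).
FIXED_EXPRESSIONS = [
    ("ⲁϩⲉ", "ⲣⲁⲧ"),       # 'stand, set foot'
    ("ϯ", "ⲧⲟⲟⲧ"),        # 'help, give a hand'
    ("ⲉ", "ⲡ", "ⲉⲥⲏⲧ"),   # 'down'
    ("ϩⲓ", "ⲟⲩ", "ⲥⲟⲡ"),  # 'at once'
    ("ⲙ", "ⲙⲏⲛⲉ"),        # 'daily'
]


def non_referential_indices(tokens):
    """Return the set of token offsets that fall inside a listed idiom."""
    blocked = set()
    for i in range(len(tokens)):
        for expr in FIXED_EXPRESSIONS:
            # Compare the window starting at i with the idiom's tokens.
            if tuple(tokens[i:i + len(expr)]) == expr:
                blocked.update(range(i, i + len(expr)))
    return blocked


# "ⲁ ϥ ⲁϩⲉ ⲣⲁⲧ ϥ": the idiom occupies offsets 2 and 3.
blocked = non_referential_indices(["ⲁ", "ϥ", "ⲁϩⲉ", "ⲣⲁⲧ", "ϥ"])
```

Plain string matching over-generates: several entries in the list are non-referential only in a particular meaning (e.g. ⲉⲡⲧⲏⲣϥ when it means 'at all'), so flagged tokens would still need a human decision.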

Special guidelines for projects annotating pronouns

(Note: currently we do not annotate pronouns!)

Expletive pronouns

In projects where pronouns are annotated, we recommend that correlative/expletive pronouns not be annotated as entities at all:

  • ⲁ [ϥ] ϫⲱ ⲙⲙⲟ ⲥ - he said: - ⲥ is not annotated

Dynamic Passive

  • ⲥⲉ ⲕⲁ ⲛⲉⲕ ⲛⲟⲃⲉ - Either “They forgive your sins” or “Your sins are forgiven”: Always annotate the subject (ⲥⲉ) as an entity.


  • ⲛⲉⲥ ⲉⲣⲏⲩ - itself - Mark with one entity

Relative Clauses

The relative converter ⲉⲧ is not considered referential. In relative clauses with explicit subject pronouns, those pronouns are annotated as usual:

  • ⲙ [ⲡ ⲙⲁ ⲉⲧ [ϥ] ⲛϩⲏⲧ [ϥ]] - The place that he is in (it)

Note that this results in the second pronoun pointing back to the span that contains it - this is allowed in WebAnno.

coptic_entities.txt · Last modified: 2021/02/11 16:44 (external edit)