coptic_entities

Preamble

Entity annotation concerns the annotation of referring expressions in a text, i.e. spans of text that refer to things in the world, and their classification into entity types. The purpose of entity annotation in Coptic Scriptorium is threefold: to facilitate searches which include specific entity types (e.g. finding a certain epithet using linguistic annotations, such as 'ⲟⲩⲁⲁⲃ' “holy”, but only when applied to a person); to inventory entities (e.g. find all places mentioned in the Apophthegmata Patrum); and to function as a gateway for entity linking, enabling searches for specific persons (“John the Baptist”) regardless of the exact expression used to mention them. The task of entity linking itself is outside the scope of the current guidelines.

Entity annotation can be applied to three types of referring expressions:

  • Named entities, which are headed by a proper noun (e.g. “Apa Papnoute”)
  • Non-named entities, headed by a common noun (e.g. “the angel”)
  • Pronouns (e.g. “she”, referring to a person); these are currently not annotated in our schema

Referring expressions

Almost all nouns and proper nouns correspond to referring expressions, with the exception of non-referring nouns, such as:

  • ⲁϩⲉ ⲣⲁⲧ.. - “stand, set foot” - does not actually refer to the foot of a person
  • ϩⲛ ⲟⲩ ⲙⲉ - “truly” - does not actually introduce a referenceable 'truth'

One test for referentiality is whether a pronominal or nominal subsequent mention is possible/plausible. For example, the following sounds odd:

  • ⲁϥⲁϩⲉⲣⲁⲧϥ ⲁⲩⲱ ⲡⲉⲓⲣⲁⲧ …“he stood on foot, and this foot…”

Entity Types

We distinguish the following entity types:

  • abstract - intangible entities not covered by other classes (incl. ideas, emotions)
  • animal
  • event - an occurrence, e.g. “the death of the king”, “the arrival of a monk”
  • object - concrete inanimate object
  • organization - organized body of people, e.g. ⲧⲉⲕⲕⲗⲏⲥⲓⲁ, ⲧⲉⲥⲧⲣⲁⲧⲉⲓⲁ
  • person
  • place
  • quantity - a unit of weight or measure, e.g. “six pounds”, “many miles”
  • substance - mass noun indicating a material, e.g. “sand”, “water”, “wine”
  • time
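
As an illustration only, the type inventory listed above could be encoded as a closed set of labels so that annotation files can be checked for misspelled or unknown types. The names `EntityType` and `is_valid_label` are hypothetical and are not part of any Coptic Scriptorium tool; this is a minimal sketch that assumes labels are stored as lowercase strings.

```python
from enum import Enum


class EntityType(Enum):
    """Entity types as listed in the guidelines above (hypothetical encoding)."""
    ABSTRACT = "abstract"
    ANIMAL = "animal"
    EVENT = "event"
    OBJECT = "object"
    ORGANIZATION = "organization"
    PERSON = "person"
    PLACE = "place"
    QUANTITY = "quantity"
    SUBSTANCE = "substance"
    TIME = "time"


def is_valid_label(label: str) -> bool:
    """Return True if `label` exactly matches one of the listed entity types."""
    return label in {t.value for t in EntityType}


print(is_valid_label("person"))   # True
print(is_valid_label("Person"))   # False: the check is case-sensitive
```

A check of this kind only catches labels outside the inventory; it cannot tell whether a valid label was applied to the right span.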

Specific entity guidelines

Body Parts

  • ϭⲓϫ - Mark body parts as objects.
  • figurative body parts are also considered objects (e.g. “in my heart”)

Non-numeral modifiers

  • ϩⲁϩ n-X - lots of X
  • ⲛⲟⲩⲟⲉⲓϣ ⲛⲓⲙ - always - Mark these with one referent only

Possessive Constructions

  • ⲡⲁⲡⲁⲩⲗⲟⲥ - the one belonging to Paul - The span of such constructions includes the possessor, which is also annotated as a nested entity: [ⲡⲁ[ⲡⲁⲩⲗⲟⲥ]]

Groups

Groups of entities are interpreted as the entity type of their constituents, for example, a herd of animals is of the type animal:

  • [ⲟⲩ ⲁⲅⲉⲗⲏ ⲛ ϣⲟϣ] - a herd of buffaloes. Note that there is no nested entity for 'buffalo' in this case.

An exception to this guideline is groups of people who form an organization, e.g. ⲥⲩⲛⲁⲅⲱⲅⲏ, ⲥⲧⲣⲁⲧⲉⲩⲙⲁ, etc., which are 'organization', not 'person'.
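
The group rule and its exception can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `group_entity_type` and the flag `is_organized_body` are hypothetical, and whether a group counts as an organized body remains the annotator's judgment.

```python
def group_entity_type(constituent_type: str, is_organized_body: bool = False) -> str:
    """Return the entity type for a group, given the type of its members.

    Groups inherit the type of their constituents, except that groups of
    people forming an organized body are typed 'organization'.
    """
    if constituent_type == "person" and is_organized_body:
        return "organization"
    return constituent_type


print(group_entity_type("animal"))                          # animal (a herd)
print(group_entity_type("person"))                          # person
print(group_entity_type("person", is_organized_body=True))  # organization
```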

No reference inside compounds

In morphologically complex items containing a noun inside a larger token (for example, a noun incorporated into a verb), that noun cannot be annotated:

  • ⲁ ϥ ϫⲓⲃⲁⲡⲧⲓⲥⲙⲁ - he received-baptism (baptism cannot be annotated as a markable, since it's part of an incorporated verb 'to baptize')

Container and substance

Container and substance form two entities, for example:

  • [ⲟⲩ ⲡⲩⲅⲏ ⲙ [ⲙⲟⲟⲩ]] - a fountain of water (the fountain is 'object', the water is 'substance', and the water can be referred to separately later on)
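
The bracket notation used in these examples maps directly onto nested token spans. The following is a minimal sketch of a reader for that notation, assuming whitespace-separated tokens and square brackets as the only markup; `parse_brackets` is a hypothetical helper, not part of any existing annotation tool.

```python
from typing import List, Tuple


def parse_brackets(text: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (tokens, spans) for a bracketed example.

    Each span is a (start, end) pair of token offsets, end exclusive.
    """
    tokens: List[str] = []
    spans: List[Tuple[int, int]] = []
    open_starts: List[int] = []
    # Pad brackets with spaces so they split off as separate items.
    for item in text.replace("[", " [ ").replace("]", " ] ").split():
        if item == "[":
            open_starts.append(len(tokens))
        elif item == "]":
            if not open_starts:
                raise ValueError("unbalanced closing bracket")
            spans.append((open_starts.pop(), len(tokens)))
        else:
            tokens.append(item)
    if open_starts:
        raise ValueError("unbalanced opening bracket")
    return tokens, sorted(spans)


tokens, spans = parse_brackets("[ⲟⲩ ⲡⲩⲅⲏ ⲙ [ⲙⲟⲟⲩ]]")
# Types assigned by hand, following the container/substance guideline.
labels = {(0, 4): "object", (3, 4): "substance"}
for start, end in spans:
    print(" ".join(tokens[start:end]), "->", labels[(start, end)])
```

Here the outer span covers all four tokens (the fountain, 'object') and the inner span covers only the last token (the water, 'substance'), so each can be referred to separately later on.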

Predicate of unchanging identity

  • ⲡ ⲟⲩⲁ ⲡ ⲟⲩⲁ - each man - Mark as two separate entities

Singular noun referring to a group of people

  • ⲡ ⲙⲏⲏϣⲉ - 'the crowd' - is marked as 'person'

Two entity types in one chain

  • ⲡⲉ ⲭⲣⲓⲥⲧⲟⲥ… ⲧⲉⲓ ⲥⲛⲧⲉ - the Christ… this foundation - Mark each entity with its own type, e.g. Christ as 'person' and the foundation as 'object'

Interruption by copula or particle

Entity expressions interrupted by a copula or particle are annotated as a single span that contains the copula or particle. For example, the following span includes the intervening copula:

  • [ⲛⲉϥ ⲁⲡⲟⲥⲧⲟⲗⲟⲥ ⲛⲉ ⲉⲧⲟⲩⲁⲁⲃ]

Similarly:

  • [ϥⲧⲟⲟⲩ ⲇⲉ ⲛ ϩⲟⲟⲩ]
  • [ⲟⲩ ϣⲃⲏⲣ · ϩⲱⲱ ⲕ ⲟⲛ ⲛⲧⲉ ⲡ ⲛⲟⲩⲧⲉ]
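
In terms of token offsets, this guideline means that the annotated span is the smallest contiguous range covering every part of the entity expression, so any intervening copula or particle falls inside it. A minimal sketch, assuming whitespace tokenization; `covering_span` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def covering_span(part_indices: Iterable[int]) -> Tuple[int, int]:
    """Return the (start, end) token span, end exclusive, covering all parts."""
    indices = list(part_indices)
    if not indices:
        raise ValueError("an entity expression needs at least one token")
    return min(indices), max(indices) + 1


tokens = ["ϥⲧⲟⲟⲩ", "ⲇⲉ", "ⲛ", "ϩⲟⲟⲩ"]
# Tokens 0, 2 and 3 belong to the entity expression; token 1 is the particle ⲇⲉ.
start, end = covering_span([0, 2, 3])
print(tokens[start:end])  # all four tokens, particle included
```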

Non-referring cases

Interrogatives

  • No annotations are needed for interrogatives (ⲛⲓⲙ, ⲟⲩ)

Figurative body parts and other fixed expressions

The following are considered idioms, in which the constituent nouns are not construed as referential:

  • ⲁϩⲉ ⲣⲁⲧ ϥ - 'stand, set foot' - “foot” is not an entity mention
  • ϯ ⲧⲟⲟⲧ ϥ - 'help, give a hand'
  • ⲕⲱ ⲛ ⲣⲱϥ - 'be silent' - “mouth” is not tagged as an entity
  • ⲉ ⲡ ⲉⲥⲏⲧ - 'down', lit. 'to the ground'
  • ⲟⲩⲏⲣ ⲛ ⲟⲩⲟⲉⲓϣ - 'how long'
  • ⲛ ⲟⲩ ϩⲟⲩⲟ - 'more'
  • (ϩⲱⲃ) ⲛ ϭⲓϫ - 'handiwork' - the whole phrase is 'abstract', but 'hand' is not a referent
  • ⲣ ϩⲛⲁ ϥ - 'want, do one's will' - the word ϩⲛⲁ / ϩⲛⲉ 'will' is figurative, this is a fixed expression for 'desire'
  • ⲉⲡⲧⲏⲣϥ - 'at all' - not referential in this meaning
  • ϩⲁ ⲉⲟⲟⲩ - 'glorious' - the ⲉⲟⲟⲩ is not referential
  • ⲛ ⲟⲩ ⲕⲟⲩⲓ - 'a little' (manner adverbial)
  • ⲛ ⲧ ϩⲉ - meaning 'like'
  • ⲛ ϣⲟⲣⲡ - 'first'
  • ϭⲟⲙ - meaning 'capable' in constructions like ⲛⲧⲕ ϭⲟⲙ ⲁⲛ 'you are not capable'
  • ⲛ ⲟⲩⲱⲧ - 'together'
  • ϩⲓ ⲟⲩ ⲥⲟⲡ - 'at once'
  • ⲙ ⲙⲏⲛⲉ - 'daily'
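
A list of fixed expressions like this one could be used to flag tokens that should not receive an entity annotation. The sketch below is an assumption-laden illustration, not an existing tool: `FIXED_EXPRESSIONS` contains only a few of the items above, `non_referring_positions` is a hypothetical name, the sample token sequence is constructed for illustration, and matching is purely on surface tokens. Several entries (e.g. ⲛ ⲧ ϩⲉ, ⲉⲡⲧⲏⲣϥ) are non-referential only in a particular meaning, so any match would still need to be reviewed by an annotator.

```python
from typing import List, Set, Tuple

# A small, illustrative subset of the fixed expressions listed above.
FIXED_EXPRESSIONS: List[Tuple[str, ...]] = [
    ("ⲉ", "ⲡ", "ⲉⲥⲏⲧ"),   # 'down'
    ("ϩⲓ", "ⲟⲩ", "ⲥⲟⲡ"),  # 'at once'
    ("ⲙ", "ⲙⲏⲛⲉ"),        # 'daily'
    ("ⲛ", "ϣⲟⲣⲡ"),        # 'first'
]


def non_referring_positions(tokens: List[str]) -> Set[int]:
    """Return token positions that fall inside a listed fixed expression."""
    blocked: Set[int] = set()
    for expr in FIXED_EXPRESSIONS:
        n = len(expr)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == expr:
                blocked.update(range(i, i + n))
    return blocked


# Constructed example: "he went down".
sample = ["ⲁ", "ϥ", "ⲃⲱⲕ", "ⲉ", "ⲡ", "ⲉⲥⲏⲧ"]
print(sorted(non_referring_positions(sample)))  # [3, 4, 5]
```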

Special guidelines for projects annotating pronouns

(note: currently we do not annotate pronouns!)

Expletive pronouns

In projects where pronouns are annotated, we recommend that correlative/expletive pronouns not be annotated as entities at all:

  • ⲁ [ϥ] ϫⲱ ⲙⲙⲟ ⲥ - he said: - ⲥ is not annotated

Dynamic Passive

  • ⲥⲉ ⲕⲁ ⲛⲉⲕ ⲛⲟⲃⲉ - Either “They forgive your sins” or “Your sins are forgiven”: Always annotate the subject (ⲥⲉ) as an entity.

Reflexivity

  • ⲛⲉⲥ ⲉⲣⲏⲩ - itself - Mark with one entity

Relative Clauses

The relative converter ⲉⲧ is not considered referential. In relative clauses with explicit subject pronouns, those pronouns are annotated as usual:

  • ⲙ [ⲡ ⲙⲁ ⲉⲧ [ϥ] ⲛϩⲏⲧ [ϥ]] - The place that he is in (it)

Note that this results in the second pronoun pointing back to the span that contains it - this is allowed in WebAnno.
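
The self-containing reference described here can be made concrete with token offsets. This is a minimal sketch, assuming whitespace tokenization and (start, end) spans with an exclusive end; the dictionary used for the link is a hypothetical representation and not WebAnno's storage format.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def contains(outer: Span, inner: Span) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


tokens = ["ⲙ", "ⲡ", "ⲙⲁ", "ⲉⲧ", "ϥ", "ⲛϩⲏⲧ", "ϥ"]
place = (1, 7)        # [ⲡ ⲙⲁ ⲉⲧ ϥ ⲛϩⲏⲧ ϥ], the place
subject = (4, 5)      # first [ϥ], 'he'
resumptive = (6, 7)   # second [ϥ], 'it', referring back to the place

# Hypothetical link record: the mention points to the span that contains it.
link = {"mention": resumptive, "antecedent": place}
print(contains(link["antecedent"], link["mention"]))  # True
```

A validation script that rejects mentions nested inside their own antecedent would therefore wrongly flag this construction.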

coptic_entities.txt · Last modified: 2020/03/24 19:06 by amir