Coptic NLP Service

Enter Coptic text in UTF-8 (XML markup is also allowed, 10,000 characters max).
Bound groups should be separated by spaces or underscores.
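
As a rough illustration of the input conventions above, the following Python sketch checks the length limit and splits plain text into bound groups on spaces or underscores. The function name `split_bound_groups` and the constant `MAX_CHARS` are hypothetical and are not part of the service. The sketch assumes plain text without XML markup and does not reflect how the service itself parses input.

```python
import re

MAX_CHARS = 10_000  # character limit stated on the form


def split_bound_groups(text: str) -> list[str]:
    """Split plain text into bound groups on whitespace or underscores.

    Hypothetical helper for pre-checking input; XML markup is not handled.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"input has {len(text)} characters; the limit is {MAX_CHARS}")
    # Runs of spaces, newlines, or underscores all count as one separator.
    return [group for group in re.split(r"[\s_]+", text) if group]


if __name__ == "__main__":
    print(split_bound_groups("ⲁⲩⲱ_ⲡⲉϫⲁϥ ⲛⲁⲩ"))
```

A check like this can be run locally before pasting text into the form, so that over-long input is caught early.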

If you need to analyze longer texts or multiple texts automatically, you can log in to the secure area or use the API. To request a login, please contact Amir Zeldes.


My data contains meaningful linebreaks
This inserts <line>..</line> tags around each line of text.
If you already have <lb/> tags or your data is already tokenized, you probably want to ignore line breaks.

Ignore linebreaks in my data
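
The effect of the two linebreak options can be sketched in Python as follows. This is only an approximation of the behaviour described above, not the service's own code: the function names `wrap_lines` and `ignore_linebreaks` are hypothetical, and skipping empty lines is an assumption.

```python
def wrap_lines(text: str) -> str:
    """Wrap each non-empty line in <line>..</line> tags (meaningful linebreaks)."""
    lines = [line.strip() for line in text.splitlines()]
    # Assumption: blank lines carry no content and are dropped.
    return "\n".join(f"<line>{line}</line>" for line in lines if line)


def ignore_linebreaks(text: str) -> str:
    """Join all lines with single spaces, discarding the original linebreaks."""
    lines = [line.strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "ⲁⲩⲱ ⲡⲉϫⲁϥ\nⲛⲁⲩ"
    print(wrap_lines(sample))
    print(ignore_linebreaks(sample))
```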


Use machine learning tokenizer [stk-1.0.0] Highly experimental.
It should be more accurate but is less stable. Crashes are possible.

SGML pipeline
Just piped and dashed morphemes