Welcome to the GU ANNIS Web Interface

For all questions and details about obtaining a login to restricted corpora, see this page. For larger/flat annotated corpora, also see our CQP web interface.

This page is maintained by the Corpus Linguistics lab, Corpling@GU.

Multilayer Corpora

  • Georgetown University Multilayer Corpus (GUM)
     eng-us / 228,399 / 235
           
  • OntoNotes 3.0 - WSJ section (OntoNotes)  
     eng-us / 370,789 / 597
        
  • OntoNotes 5.0 Chinese Dependencies (OntoNotes5_Chinese_dep)  
     zho / 1,050,841 / 2,036
      
  • OntoNotes 5.0 Coref Section (OntoNotes5_coref)  
     eng-us / 1,590,885 / 3,393
       
  • OntoNotes 5.0 Dependencies (OntoNotes5_dep)  
     eng-us / 2,589,499 / 12,721
      
  • The Potsdam Commentary Corpus Sampler (pcc2)
     deu / 399 / 2
          
  • Gendered Ambiguous Pronouns (UA_English-GAP)
     eng / 374,975 / 4,454
     
Treebanks

  • Arabic Treebank (Buckwalter vocalized) (arabic.treebank)  
     ara / 177,950 / 734
      
  • Chinese Treebank 9.0 (Chinese Treebank 9.0)  
     zho / 2,287,073 / 3,726
      
  • English Web Treebank (English.Web.Treebank)
     eng-us / 272,779 / 1,174
      
  • English Web Treebank - Universal Dependencies (English.Web.Treebank_UD)
     eng-us / 254,830 / 1,174
      
  • English Web Treebank - Universal Dependencies V2 (English.Web.Treebank_UD2)
     eng-us / 254,829 / 1,174
      
  • Foreebank En - English Web Support Forum Treebank (Foreebank-en)  
     eng / 15,613 / 1
      
  • Foreebank Fr - French Web Support Forum Treebank (Foreebank-fr)  
     fra / 19,667 / 1
      
  • IAHLT UD Hebrew Treebank (IAHLT_HTB)
     he / 155,919 / 203
  • Open American National Corpus - Manually Annotated Subcorpus (Court Transcripts) (MASC_court)
     eng-us / 37,756 / 39
      
  • Switchboard Telephone Conversation Constituent Corpus (switchboard_const)
     eng-us / 1,095,089 / 646
      
  • Switchboard Telephone Conversation Dependency Corpus (Switchboard (dep))
     eng-us / 1,287,379 / 649
      
  • The Tiger Treebank version 2 (tiger2)  
     deu / 888,578 / 1,971
      
  • French and English Coreference Databases and Corpora (UA_French-Democrat1921)
     fr / 284,885 / 126
     
  • Potsdam Commentary Corpus (UA_German-PCC)
     deu / 33,222 / 176
     
  • UD Hebrew IAHLT Wikipedia section (UD_Hebrew-IAHLTwiki)
     he / 140,949 / 39
      
  • UD Spanish AnCora (UD_Spanish-AnCora)
     es / 559,782 / 1,635
      
  • Spanish Universal Dependency Treebank 2.0 (unidep.es)
     spa / 375,180 / 369
      
  • Japanese Universal Dependency Treebank 2.0 (unidep.jp)
     jap / 80,172 / 80
      
  • Wall Street Journal Dependency Corpus (Wall Street Journal (dep))  
     eng-us / 1,173,766 / 2,312
      
  • Wall Street Journal Constituent Treebank (wsj.const_ptb)  
     eng-us / 1,209,785 / 2,235
      
  • CALLHOME Mandarin Telephone Conversation Treebank (zh.callhome.tb)  
     zho / 108,531 / 41
      
  • Xinhua Mandarin News Treebank (zh.xinhua.tb)  
     zho / 106,934 / 325
      
Historical Corpora

  • Penn Parsed Corpus of Early Modern English - Helsinki Subcorpus (PPCEME_helsinki)  
     eng-eme / 627,993 / 147
      
  • Penn Parsed Corpus of Early Modern English - Penn Subcorpus 1 (PPCEME_penn1)  
     eng-eme / 636,421 / 152
      
  • T-CODEX Tatian V2.1 (Tatian 2.1)
     ohg / 11,295 / 2,030
       
  • TraCES Corpus of the Classical Ethiopic Language (Ge'ez) (Traces_SGML)
     gez / 181,577 / 23
     
Parallel Corpora

  • SMULTRON Parallel Treebank Sampler (SMULTRON_Banana)
     eng-us,deu / 3,782 / 2
       
Learner Corpora

  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08A)  
     eng-L2 / 600,031 / 1,018
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08B)  
     eng-L2 / 1,173,329 / 1,696
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09A)  
     eng-L2 / 3,428,414 / 3,872
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09B)  
     eng-L2 / 2,151,821 / 4,046
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2009-10B)  
     eng-L2 / 424,841 / 532
       
  • The MERLIN corpus - L2 Czech (MERLIN_Czech)
     cze-L2 / 79,969 / 441
       
  • The MERLIN corpus - L2 German (MERLIN_German)
     deu-L2 / 154,335 / 1,033
       
  • The MERLIN corpus - L2 Italian (MERLIN_Italian)
     ita-L2 / 107,211 / 813
       
Miscellaneous Corpora

  • VU Amsterdam Metaphor Corpus (VUAMC)
     eng-uk / 238,905 / 117
       
Hausa Corpora

  • SFB632 A5 Hausa News Corpus (a5.hausa.news)  
     hau / 2,017 / 4
     
  • SFB632 A5 Hausa Film Corpus [Umarnin Uwa] (a5.hausa.umarnin.uwa_V2)
     hau / 10,194 / 47
     
Discourse Treebanks

  • COVID Discourse Treebank (CovidDTB)
     eng / 60,849 / 300
     
  • Georgetown Chinese Discourse Treebank (GCDT)
     zho / 62,905 / 50
       
  • Instructional Discourse Treebank (Instr-DT)
     eng / 56,337 / 176
     
  • The Penn Discourse Treebank 3.0 (PDTB)  
     eng-us / 1,156,308 / 2,161
     
  • RST Discourse Treebank (RST-DT)  
     eng-us / 203,352 / 385
     
  • RST Discourse Treebank (dependencies) (RST-DT_rsd)  
     eng-us / 203,352 / 385
     
  • RST Spanish Treebank (rst.spanish.treebank)
     spa / 57,895 / 267
      
  • The Chinese Science Discourse Treebank (Sci-CDTB)
     zh / 18,761 / 109
     
  • Science Discourse Treebank (SciDTB)
     eng / 102,493 / 798
     
Coptic SCRIPTORIUM Corpora

  • Apophthegmata Patrum (apophthegmata.patrum)
     cop / 12,117 / 94
        
  • Besa - Letters (besa.letters)
     cop / 4,543 / 5
      
  • Coptic Universal Dependency Treebank (coptic.treebank)
     cop / 55,016 / 80
      
  • Documentary Papyri (doc.papyri)  
     cop / 289 / 3
      
  • Dormition of John (dormition.john)
     cop / 3,211 / 1
      
  • Canons of Apa Johannes (johannes.canons)  
     cop / 22,509 / 14
      
  • Life of Aphou (life.aphou)
     cop / 4,848 / 2
      
  • Life of Cyrus (life.cyrus)
     cop / 3,559 / 2
      
  • The History of Eustathius and Theopiste (life.eustathius.theopiste)
     cop / 11,000 / 2
      
  • Life of John the Kalybites (life.john.kalybites)
     cop / 8,373 / 2
      
  • Life of Longinus and Lucius (life.longinus.lucius)
     cop / 11,903 / 5
      
  • Life of Onnophrius (life.onnophrius)
     cop / 8,677 / 4
      
  • Life of Paul of Tamma (life.paul.tamma)
     cop / 4,147 / 2
      
  • Life of Phib (life.phib)
     cop / 4,691 / 2
      
  • Life of Pisentius (life.pisentius)
     cop / 23,057 / 3
      
  • Coptic SCRIPTORIUM, Coptic Magical Papyri (magical.papyri)
     cop / 578 / 4
      
  • Martyrdom of Victor (martyrdom.victor)  
     cop / 18,253 / 8
      
  • Mysteries of John the Evangelist (mysteries.john)
     cop / 6,458 / 2
      
  • Instructions of Apa Pachomius (pachomius.instructions)
     cop / 12,986 / 2
      
  • Coptic SCRIPTORIUM, Marcion (pistis.sophia)
     cop / 39,271 / 8
      
  • Pistis Sophia (proclus.homilies)
     cop / 5,214 / 2
      
  • Pseudo-Athanasius Discourses (pseudo.athanasius.discourses)
     cop / 13,976 / 3
      
  • Pseudo-Basil of Caesarea Discourse (pseudo.basil)
     cop / 3,837 / 1
      
  • Encomium on Victor (pseudo.celestinus)
     cop / 23,584 / 3
      
  • Pseudo-Chrysostom (pseudo.chrysostom)
     cop / 8,606 / 2
      
  • Pseudo-Ephrem Writings (pseudo.ephrem)
     cop / 11,245 / 3
      
  • Encomium on Demetrius Archbishop of Alexandria (pseudo.flavianus)
     cop / 7,639 / 2
      
  • Pseudo-Theophilus on the Cross (pseudo.theophilus)  
     cop / 4,974 / 4
      
  • Pseudo-Timothy of Alexandria Discourses (pseudo.timothy)
     cop / 9,749 / 2
      
  • Sahidica Bible - 1 Corinthians (sahidica.1corinthians)
     cop / 12,454 / 16
      
  • Sahidica Bible - Mark (sahidica.mark)
     cop / 20,278 / 16
      
  • Sahidica Coptic New Testament (sahidica.nt)
     cop / 248,718 / 259
      
  • The Book of Ruth (OT) of the Old and New Testament (sahidic.ot)
     cop / 464,977 / 729
      
  • The Gospel of Mark (NT) of the Old and New Testament (sahidic.ruth)
     cop / 3,503 / 4
      
  • Shenoute - Acephalous 22 (shenoute.a22)
     cop / 8,351 / 6
      
  • Shenoute - Abraham Our Father: YA 535-40 (shenoute.abraham)
     cop / 7,696 / 7
      
  • Shenoute - Some Kinds of People Sift Dirt (shenoute.dirt)  
     cop / 6,236 / 6
      
  • Shenoute - I See Your Eagerness (shenoute.eagerness)
     cop / 18,368 / 17
      
  • Shenoute - Not Because a Fox Barks (shenoute.fox)  
     cop / 2,812 / 1
       
  • Shenoute - In the Night: BV278-282 (shenoute.night)
     cop / 1,180 / 1
      
  • Shenoute - Because of You Too O Prince of Evil XH 185-194 (shenoute.prince)
     cop / 4,613 / 2
      
  • Shenoute - Whoever Seeks God Will Find CZ 129-137 (shenoute.seeks)
     cop / 2,195 / 1
      
  • Shenoute - God Says Through Those Who Are His: GF 259-262 (shenoute.those)
     cop / 9,488 / 13
      
  • Shenoute - Unknown Work 5-1: GF 381-88 (shenoute.unknown5_1)
     cop / 2,602 / 2
      