Welcome to the GU ANNIS Web Interface


For questions and details about obtaining a login for restricted corpora, see this page. For larger corpora or corpora with flat annotations, see also our CQP web interface.

This page is maintained by the Corpus Linguistics lab, Corpling@GU.

Multilayer Corpora

  • Georgetown University Multilayer Corpus (GUM)
     eng-us / 85,353 / 101
           
  • OntoNotes 3.0 - WSJ section (OntoNotes)  
     eng-us / 370,789 / 597
        
  • OntoNotes 5.0 Chinese Dependencies (OntoNotes5_Chinese_dep)  
     zho / 1,050,841 / 2,036
      
  • OntoNotes 5.0 Coref Section (OntoNotes5_coref)  
     eng-us / 1,590,885 / 3,393
       
  • OntoNotes 5.0 Dependencies (OntoNotes5_dep)  
     eng-us / 2,589,499 / 12,721
      
  • The Potsdam Commentary Corpus Sampler (pcc2)
     deu / 399 / 2
          
  • RST Spanish Treebank (rst.spanish.treebank)
     spa / 57,895 / 267
      
  • T-CODEX Tatian V2.1 (Tatian 2.1)
     ohg / 11,295 / 2,030
       
Treebanks

  • Arabic Treebank (Buckwalter vocalized) (arabic.treebank)  
     ara / 177,950 / 734
      
  • Chinese Treebank 9.0 (Chinese Treebank 9.0)  
     zho / 2,287,073 / 3,726
      
  • English Web Treebank (English.Web.Treebank)
     eng-us / 272,779 / 1,174
      
  • English Web Treebank - Universal Dependencies (English.Web.Treebank_UD)
     eng-us / 254,830 / 1,174
      
  • English Web Treebank - Universal Dependencies V2 (English.Web.Treebank_UD2)
     eng-us / 254,829 / 1,174
      
  • Foreebank En - English Web Support Forum Treebank (Foreebank-en)  
     eng / 15,613 / 1
      
  • Foreebank Fr - French Web Support Forum Treebank (Foreebank-fr)  
     fra / 19,667 / 1
      
  • Open American National Corpus - Manually Annotated Subcorpus (Court Transcripts) (MASC_court)
     eng-us / 37,756 / 39
      
  • Switchboard Telephone Conversation Constituent Corpus (switchboard_const)
     eng-us / 1,095,089 / 646
      
  • Switchboard Telephone Conversation Dependency Corpus (Switchboard (dep))
     eng-us / 1,287,379 / 649
      
  • The Tiger Treebank version 2 (tiger2)  
     deu / 888,578 / 1,971
      
  • Spanish Universal Dependency Treebank 2.0 (unidep.es)
     spa / 375,180 / 369
      
  • Japanese Universal Dependency Treebank 2.0 (unidep.jp)
     jpn / 80,172 / 80
      
  • Wall Street Journal Dependency Corpus (Wall Street Journal (dep))  
     eng-us / 1,173,766 / 2,312
      
  • Wall Street Journal Constituent Treebank (wsj.const_ptb)  
     eng-us / 1,209,785 / 2,235
      
  • CALLHOME Mandarin Telephone Conversation Treebank (zh.callhome.tb)  
     zho / 108,531 / 41
      
  • Xinhua Mandarin News Treebank (zh.xinhua.tb)  
     zho / 106,934 / 325
      
Historical Corpora

  • Penn Parsed Corpus of Early Modern English - Helsinki Subcorpus (PPCEME_helsinki)  
     eng-eme / 627,993 / 147
      
  • Penn Parsed Corpus of Early Modern English - Penn Subcorpus 1 (PPCEME_penn1)  
     eng-eme / 636,421 / 152
      
Parallel Corpora

  • SMULTRON Parallel Treebank Sampler (SMULTRON_Banana)
     eng-us,deu / 3,782 / 2
       
Learner Corpora

  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08A)  
     eng-L2 / 600,031 / 1,018
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08B)  
     eng-L2 / 1,173,329 / 1,696
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09A)  
     eng-L2 / 3,428,414 / 3,872
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09B)  
     eng-L2 / 2,151,821 / 4,046
       
  • CityU Corpus of Essay Drafts of English Language Learners (cityu-2009-10B)  
     eng-L2 / 424,841 / 532
       
  • The MERLIN corpus - L2 Czech (MERLIN_Czech)
     cze-L2 / 79,969 / 441
       
  • The MERLIN corpus - L2 German (MERLIN_German)
     deu-L2 / 154,335 / 1,033
       
  • The MERLIN corpus - L2 Italian (MERLIN_Italian)
     ita-L2 / 107,211 / 813
       
Miscellaneous Corpora

  • VU Amsterdam Metaphor Corpus (VUAMC)
     eng-uk / 238,905 / 117
       
Hausa Corpora

  • SFB632 A5 Hausa News Corpus (a5.hausa.news)  
     hau / 2,017 / 4
     
  • SFB632 A5 Hausa Film Corpus [Umarnin Uwa] (a5.hausa.umarnin.uwa_V2)
     hau / 10,194 / 47
     
Coptic SCRIPTORIUM Corpora

  • Apophthegmata Patrum (apophthegmata.patrum)
     cop / 10,351 / 75
        
  • Besa - Letters (besa.letters)
     cop / 2,296 / 3
      
  • Documentary Papyri (doc.papyri)  
     cop / 290 / 3
      
  • Canons of Apa Johannes (johannes.canons)  
     cop / 2,257 / 3
      
  • Martyrdom of Victor (martyrdom.victor)  
     cop / 2,033 / 1
      
  • Pseudo-Theophilus on the Cross (pseudo.theophilus)  
     cop / 4,971 / 4
      
  • Sahidica Bible - 1 Corinthians (sahidica.1corinthians)
     cop / 12,471 / 16
      
  • Sahidica Bible - Mark (sahidica.mark)
     cop / 20,185 / 16
      
  • Sahidica Coptic New Testament (sahidica.nt)
     cop / 244,077 / 259
      
  • Shenoute - Acephalous 22 (shenoute.a22)
     cop / 7,589 / 4
      
  • Shenoute - Abraham our Father (shenoute.abraham.our.father)
     cop / 7,670 / 7
      
  • Shenoute - Some Kinds of People Sift Dirt (shenoute.dirt)  
     cop / 888 / 1
      
  • Shenoute - I See Your Eagerness (shenoute.eagerness)
     cop / 18,353 / 17
      
  • Shenoute - Not Because a Fox Barks (shenoute.fox)  
     cop / 2,814 / 1
       