Welcome to the Georgetown University CQPweb Interface!

Legend: each corpus entry shows language code / number of tokens / number of texts.

For all questions and details about obtaining a login to restricted corpora, see this page. For richly annotated multilayer corpora/treebanks, see our ANNIS interface.

Please select a corpus from the list below to enter.

English Reference Corpora

  • British National Corpus (bnc)  
     eng-uk / 111,246,947 / 4,054
  • The Brown Corpus (brown)
     eng-us / 1,172,053 / 500
  • COCA - Corpus of Contemporary American English - Academic (cocaacademic)  
     eng-us / 99,136,626 / 21,338
  • COCA - Corpus of Contemporary American English - Fiction (cocafiction)  
     eng-us / 101,672,888 / 19,437
  • COCA - Corpus of Contemporary American English - Magazine (cocamagazine)  
     eng-us / 104,565,975 / 53,186
  • COCA - Corpus of Contemporary American English - Newspaper (cocanews)  
     eng-us / 101,423,806 / 57,146
  • COCA - Corpus of Contemporary American English - Spoken (cocaspoken)  
     eng-us / 102,873,602 / 37,758
  • Frown Corpus J (frown_j)  
     eng-uk / 185,276 / 80
  • Penn Treebank CQPfied (ptbcqp)  
     eng-us / 1,678,233 / 2
Spoken Language Corpora

  • HCRC Map Task Corpus (hcrcmap_2)
     eng-sct / 188,898 / 128
  • Switchboard Corpus (switchboard)
     eng-us / 1,159,308 / 649
Literary Corpora

  • Jane Austen Corpus (austen)
     eng-uk / 423,669 / 3
  • Charles Dickens Corpus (dickens)
     eng-uk / 3,407,085 / 14
  • Don Quijote (don_quijote_spa)
     spa / 429,855 / 1
  • Tom Sawyer (tom_sawyer_eng)
     eng-us / 86,747 / 1
Newspaper Corpora

  • Arabic Treebank CQPfied (arabictb)  
     ara / 168,722 / 734
  • Chinese Treebank 9.0 (chinese_treebank9)  
     zho / 2,080,333 / 3,726
  • New York Times - Arts Subcorpus (nyt_arts)  
     eng-us / 101,087,365 / 118,433
  • Slate Magazine Corpus (slate)
     eng-us / 4,957,498 / 4,531
Political Corpora

  • Bush and Kerry Presidential Debate (bush_kerry_debate)
     eng-us / 48,230 / 2
  • Inaugural Address Corpus (inaugural)
     eng-us / 144,980 / 56
  • German Bundestag Protocols (parlament)
     deu / 36,723,139 / 837
  • State of the Union Corpus (stateoftheunion)
     eng-us / 448,968 / 73
Web Corpora

  • DECOW - COrpora from the Web - German - Part 01 (decow01)  
     deu / 300,002,861 / 198,608
  • DECOW - COrpora from the Web - German - Part 02 (decow02)  
     deu / 300,003,990 / 231,532
  • DECOW - COrpora from the Web - German - Part 03 (decow03)  
     deu / 300,008,102 / 325,463
  • DE Web as Corpus - Part 1 (dewac01)  
     deu / 268,848,124 / 289,824
  • DE Web as Corpus - Part 2 (dewac02)  
     deu / 268,848,124 / 288,223
  • DE Web as Corpus - Part 3 (dewac03)  
     deu / 268,884,554 / 290,941
  • DE Web as Corpus - Part 4 (dewac04)  
     deu / 268,931,207 / 289,139
  • DE Web as Corpus - Part 5 (dewac05)  
     deu / 268,908,956 / 288,386
  • DE Web as Corpus - Part 6 (dewac06)  
     deu / 282,733,943 / 305,382
  • Corpus of Web-Based Global English - US Blogs (glowbe_usblog)  
     eng-us / 142,425,833 / 106,385
  • Corpus of Web-Based Global English - US General Web (glowbe_usgenl)  
     eng-us / 272,905,012 / 168,771
  • Russian Internet Corpus Sampler (i_ru_sample)  
     rus / 5,231,112 / 843
  • Stanford Sentiment Analyzed Twitter Corpus (sentiment140)  
     eng / 24,473,485 / 2
  • UK Web as Corpus - Part 1 (ukwac01)  
     eng-uk / 277,566,848 / 330,390
  • UK Web as Corpus - Part 2 (ukwac02)  
     eng-uk / 277,590,843 / 331,233
  • UK Web as Corpus - Part 3 (ukwac03)  
     eng-uk / 277,580,108 / 332,942
  • UK Web as Corpus - Part 4 (ukwac04)  
     eng-uk / 277,586,138 / 332,744
  • UK Web as Corpus - Part 5 (ukwac05)  
     eng-uk / 277,569,079 / 333,030
  • UK Web as Corpus - Part 6 (ukwac06)  
     eng-uk / 277,609,694 / 331,553
  • UK Web as Corpus - Part 7 (ukwac07)  
     eng-uk / 277,587,296 / 330,634
  • UK Web as Corpus - Part 8 (ukwac08)  
     eng-uk / 308,472,497 / 370,107
Learner Corpora and Native Controls

  • Hong Kong City University Corpus of English Learner Academic Drafts (cityu)  
     eng-L2 / 7,720,912 / 11,170
  • The Gachon Korean EFL Learner Corpus (gachon)
     eng-L2 / 1,824,373 / 16,111
  • International Corpus of Learner English (icle)  
     eng-L2 / 2,808,577 / 3,701
  • International Corpus Network of Asian Learners of English (icnale)  
     eng-L2 / 1,963,147 / 9,836
  • Louvain Corpus of Native English Student Essays (LOCNESS)  
     eng-uk/us / 346,906 / 388
World English Corpora

  • ICE Jamaica (ice_ja)  
     eng-ja / 1,156,149 / 500
  • ICE Singapore (ice_sg)  
     eng-sg / 1,163,008 / 500
  • National University of Singapore SMS Corpus (nus_sms)
     eng-sg / 150,397 / 10,117
Bible Corpora

  • The King James Bible Corpus (biblekjv)
     eng-uk / 915,179 / 66
  • The Luther Bible Corpus (bibleluther)
     deu / 813,333 / 66
  • The Dutch Statenvertaling Bible (biblestv)
     nld / 920,759 / 66
  • The World English Bible Corpus (bibleweb)
     eng-us / 901,701 / 66
Historical Corpora

  • Corpus of Historical American English (coha)  
     eng-us / 448,200,483 / 116,773
  • Penn Parsed Corpus of Middle English v2 (ppcme2)  
     eng-enm / 1,351,054 / 56
Coptic Scriptorium Corpora

  • Besa Letters Corpus (besa)
     cop / 1,907 / 2
Uncategorised

  • Chatino Zapotec Corpus (chazap)  
     zap / 721,976 / 290,292
  • Gradable Modal Expressions (for CQP, build 76) (GMEv01b76)  
     eng-us / 301,243 / 533
  • Gradable Modal Expressions (for CQP, build 78) (GMEv01b78)
     eng-us / 301,243 / 533
  • Gradable Modal Expressions (Version 1.0, Build 79) (GMEv10b79)
     eng-us / 301,090 / 534
  • Georgetown University Multilayer Corpus (gum)
     eng / 42,414 / 54

Indexing a total of 6,470,983,860 tokens in 64 corpora.

CQPweb v3.0.16 © 2008-2013