Welcome to CQPweb!

For all questions and details about obtaining a login to restricted corpora, see this page. For richly annotated multilayer corpora/treebanks, see our ANNIS interface.

This page is maintained by the Corpus Linguistics lab, Corpling@GU.

English Reference Corpora

  • ACL Anthology (1983-2022) (aclanthology)
     eng / 294,834,441 / 64,755
  • A Free, Balanced, Multilayer English Web Corpus (amalgum)
     eng / 3,852,110 / 4,723
  • British National Corpus (bnc)  
     eng-uk / 111,246,947 / 4,054
  • The Brown Corpus (brown)
     eng-us / 1,172,053 / 500
  • COCA - Corpus of Contemporary American English - Academic (cocaacademic)  
     eng-us / 122,445,373 / 26,203
  • COCA - Corpus of Contemporary American English - Blog (cocablog)  
     eng-us / 108,622,152 / 82,021
  • COCA - Corpus of Contemporary American English - Fiction (cocafiction)  
     eng-us / 126,484,317 / 25,993
  • COCA - Corpus of Contemporary American English - Magazine (cocamagazine)  
     eng-us / 129,459,423 / 86,225
  • COCA - Corpus of Contemporary American English - News (cocanews)  
     eng-us / 126,231,664 / 90,243
  • COCA - Corpus of Contemporary American English - Spoken (cocaspoken)  
     eng-us / 130,081,788 / 44,803
  • COCA - Corpus of Contemporary American English - TV and Movies (cocatvmovies)  
     eng-us / 153 / 23,975
  • COCA - Corpus of Contemporary American English - Web (cocaweb)  
     eng-us / 139 / 88,989
  • Frown Corpus J (frown_j)  
     eng-uk / 185,276 / 80
  • Penn Treebank CQPfied (gold tagged) (ptbcqp_gold)  
     eng-us / 2,467,944 / 3,153

Spoken Language Corpora

  • CIEMPIESS: A New Open-Sourced Mexican Spanish Radio Corpus (ciempiess)
     spa / 349,321 / 174
  • HCRC Map Task Corpus (hcrcmap_2)
     eng-sct / 188,898 / 128
  • Switchboard Corpus (switchboard)
     eng-us / 1,159,308 / 649
  • TED talks English (ted)
     eng / 5,159,586 / 2,085

Non-English Reference Corpora

  • Lancaster Corpus of Mandarin Chinese (lcmc)
     zho / 1,002,340 / 500

Literary Corpora

  • The Complete Works of Jane Austen (austen_complete)
     eng-uk / 1,012,185 / 9
  • Charles Dickens Corpus (dickens)
     eng-uk / 3,407,085 / 14
  • Don Quijote (don_quijote_spa)
     spa / 429,855 / 1
  • Tom Sawyer (tom_sawyer_eng)
     eng-us / 86,747 / 1

Political Corpora

  • Bush and Kerry Presidential Debate (bush_kerry_debate)
     eng-us / 48,230 / 2
  • Inaugural Address Corpus (inaugural)
     eng-us / 144,980 / 56
  • The Mueller Report Corpus (mueller)
     eng-us / 228,799 / 2
  • German Bundestag Protocols (parlament)
     deu / 36,723,139 / 837
  • State of the Union Corpus (1790-2021) (sotu2021)
     eng-us / 1,962,452 / 235
  • State of the Union (1790-2023) (sotu2023)
     eng-us / 2,004,687 / 237

Web Corpora

  • DECOW - COrpora from the Web - German - Part 01 (decow01)  
     deu / 300,002,861 / 198,608
  • DECOW - COrpora from the Web - German - Part 02 (decow02)  
     deu / 300,003,990 / 231,532
  • DECOW - COrpora from the Web - German - Part 03 (decow03)  
     deu / 300,008,102 / 325,463
  • DE Web as Corpus - Part 1 (dewac01)  
     deu / 268,848,124 / 289,824
  • DE Web as Corpus - Part 2 (dewac02)  
     deu / 268,848,124 / 288,223
  • DE Web as Corpus - Part 3 (dewac03)  
     deu / 268,884,554 / 290,941
  • DE Web as Corpus - Part 4 (dewac04)  
     deu / 268,931,207 / 289,139
  • DE Web as Corpus - Part 5 (dewac05)  
     deu / 268,908,956 / 288,386
  • DE Web as Corpus - Part 6 (dewac06)  
     deu / 282,733,943 / 305,382
  • ENCOW2016 - COrpora from the Web - English - Part 01 (encow01)  
     eng / 300,004,068 / 225,073
  • ENCOW2016 - COrpora from the Web - English - Part 02 (encow02)  
     eng / 300,000,358 / 222,638
  • ENCOW2016 - COrpora from the Web - English - Part 03 (encow03)  
     eng / 214,108,271 / 163,651
  • ENCOW2016 - COrpora from the Web - English - Part 04 (encow04)  
     eng / 289,139 / 289,139
  • ENCOW2016 - COrpora from the Web - English - Part 05 (encow05)  
     eng / 288,386 / 288,386
  • ENCOW2016 - COrpora from the Web - English - Part 06 (encow06)  
     eng / 161,936 / 161,936
  • Corpus of Web-Based Global English - US Blogs (glowbe_usblog)  
     eng-us / 142,425,833 / 106,385
  • Corpus of Web-Based Global English - US General Web (glowbe_usgenl)  
     eng-us / 272,905,012 / 168,771
  • Russian Internet Corpus Sampler (i_ru_sample)  
     rus / 5,231,112 / 843
  • Stanford Sentiment Analyzed Twitter Corpus (sentiment140)  
     eng / 24,473,485 / 2
  • UK Web as Corpus - Part 1 (ukwac01)  
     eng-uk / 277,566,848 / 330,390
  • UK Web as Corpus - Part 2 (ukwac02)  
     eng-uk / 277,590,843 / 331,233
  • UK Web as Corpus - Part 3 (ukwac03)  
     eng-uk / 277,580,108 / 332,942
  • UK Web as Corpus - Part 4 (ukwac04)  
     eng-uk / 277,586,138 / 332,744
  • UK Web as Corpus - Part 5 (ukwac05)  
     eng-uk / 277,569,079 / 333,030
  • UK Web as Corpus - Part 6 (ukwac06)  
     eng-uk / 277,609,694 / 331,553
  • UK Web as Corpus - Part 7 (ukwac07)  
     eng-uk / 277,587,296 / 330,634
  • UK Web as Corpus - Part 8 (ukwac08)  
     eng-uk / 308,472,497 / 370,107

Newspaper Corpora

  • Arabic Treebank CQPfied (arabictb)  
     ara / 168,722 / 734
  • Chinese Treebank 9.0 (chinese_treebank9)  
     zho / 2,080,333 / 3,726
  • New York Times - Arts Subcorpus (nyt_arts)  
     eng-us / 101,087,365 / 118,433
  • Slate Magazine Corpus (slate_alt)
     eng-us / 4,929,752 / 4,531

Learner Corpora and Native Controls

  • Arabic Learner Corpus (arablearn)  
     ara / 444,321 / 1,585
  • Hong Kong City University Corpus of English Learner Academic Drafts (cityu)  
     eng-L2 / 7,720,912 / 11,170
  • English Language Questions and Answers (elqa)
     eng / 36,656,346 / 71,052
  • The Gachon Korean EFL Learner Corpus (gachon)
     eng-L2 / 1,824,373 / 16,111
  • International Corpus of Learner English (icle)  
     eng-L2 / 2,808,577 / 3,701
  • International Corpus Network of Asian Learners of English (icnale)  
     eng-L2 / 1,963,147 / 9,836
  • Louvain Corpus of Native English Student Essays (LOCNESS)  
     eng-uk/us / 346,906 / 388
  • Spanish Learner Language Oral Corpora (SPLLOC) (splloc)
     spa-L2 / 372,567 / 561

Bible Corpora

  • The King James Bible Corpus (biblekjv)
     eng-uk / 915,179 / 66
  • The Luther Bible Corpus (bibleluther)
     deu / 813,333 / 66
  • The Dutch Statenvertaling Bible (biblestv)
     nld / 920,759 / 66
  • The World English Bible Corpus (bibleweb)
     eng-us / 901,701 / 66

World English Corpora

  • Corpus of Web-Based Global English - Hong Kong (glowbe_hk)  
     eng-hk / 42,979,217 / 43,936
  • ICE Jamaica (ice_ja)  
     eng-ja / 1,156,149 / 500
  • ICE Singapore (ice_sg)  
     eng-sg / 1,163,008 / 500
  • National University of Singapore SMS Corpus (nus_sms)
     eng-sg / 150,397 / 10,117

Historical Corpora

  • Ancient Chinese Corpus - Zuozhuan (acc)  
     zho-lzh / 194,258 / 2
  • Corpus of Historical American English (coha)  
     eng-us / 448,200,483 / 116,773
  • Georgetown University Historical Reddit Corpus 2020 (guhrc) (guhrc2020)  
     eng / 43,858,955 / 557,579
  • Penn Parsed Corpus of Middle English v2 (ppcme2)  
     eng-enm / 1,351,054 / 56
  • Sheffield Corpus of Chinese - Historical Chinese Sample (sheffieldchinese)
     zho / 14,282 / 3

Coptic Scriptorium Corpora

  • Besa Letters Corpus (besa)
     cop / 1,907 / 2

Uncategorised

  • Chatino Zapotec Corpus (chazap)  
     zap / 721,976 / 290,292
  • Gradable Modal Expressions (for CQP, build 76) (GMEv01b76)  
     eng-us / 301,243 / 533
  • Gradable Modal Expressions (for CQP, build 78) (GMEv01b78)
     eng-us / 301,243 / 533
  • Gradable Modal Expression (Version 1.0, Build 79) (GMEv10b79)
     eng-us / 301,090 / 534
  • Georgetown University Multilayer Corpus v9 (gum9)
     eng-us / 0 / 0
  • Rohingya News Corpus (rohingya)
     eng / 168,372 / 412

Indexing a total of 7,954,429,283 tokens in 88 corpora.

CQPweb v3.2.11 © 2008-2016