License and citation information

AMALGUM annotations are made available under a Creative Commons Attribution (CC-BY) 4.0 license. For the underlying licenses of the source texts, please see the terms of use for the individual sources listed on the main page. Reddit plain text data, in particular, must be obtained separately (please contact us for details).

To cite the corpus in scholarly articles, please use this paper:

  • Gessler, Luke, Peng, Siyao, Liu, Yang, Zhu, Yilun, Behzad, Shabnam and Zeldes, Amir (2020) "AMALGUM – A Free, Balanced, Multilayer English Web Corpus". In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, 5267–5275.
 @InProceedings{GesslerEtAl2020,
   author    = {Gessler, Luke and Peng, Siyao
     and Liu, Yang and Zhu, Yilun
     and Behzad, Shabnam and Zeldes, Amir},
   title     = {{AMALGUM} -- A Free, Balanced, 
     Multilayer {E}nglish {W}eb Corpus},
   booktitle   = {Proceedings of the 12th 
     Conference on Language Resources 
     and Evaluation (LREC 2020)},
   year      = {2020},
   pages     = {5267--5275},
   url       = {https://www.aclweb.org/anthology/2020.lrec-1.648/}
 }

The ISLRN for the corpus is 903-099-580-537-2.