Overview

I am a computational linguist specializing in work on and with corpora, including corpus linguistics studies, building corpora, and creating annotation interfaces and NLP tools that make corpus creation easier. I also run the Georgetown University Corpus Linguistics lab, Corpling@GU.

My main research interests are at the syntax-semantics interface: I study "how we say what we want to say". In particular, I have been working on predictive computational models of referentiality and discourse relations. For example, which entities do we track in conversation? How are they introduced into the discourse and referred back to? How do we recognize discourse relations which signal how a current utterance relates to preceding or subsequent utterances, such as by contrasting with other claims, or supporting them with evidence?

I am also interested in how we learn to be productive in our first, second and subsequent languages, producing some (but not only, and not just any) utterances and combinations we have never heard before. I believe that very many factors constantly and concurrently influence the choice between competing constructions, which means that we need multifactorial methods and multilayer corpus data in order to understand what it is that we do when we produce and understand language.

Research Interests

  • Corpus Linguistics
  • Building and using multilayer corpora
  • Predictive modelling of syntactic alternations
  • Productivity in argument selection
  • Information structure
  • Digital Humanities for Coptic studies
  • Coreference and entity resolution
  • Discourse annotation (especially in Rhetorical Structure Theory)
  • Corpus search and annotation interfaces

Stuff I work on

News and events

Send me an e-mail if you'd like to join corpinfo, the GU mailing list for information on corpus linguistics events, jobs and corpus releases at GU and in the DC area. For more news, check out the Corpling@GU page.

  • The Corpus Linguistics lab now has a new website: Corpling@GU
  • Our GUM corpus is now part of Universal Dependencies
  • Check out our new XML and spreadsheet annotation tool: GitDox
  • The KELLIA project's new Coptic Online Dictionary interface for the BBAW's Coptic Lemma List is online! Many thanks to Frank Feder, Maxim Kupreyev and Tonio Sebastian Richter for making this data available!
