You can explore the GUM corpus online using the ANNIS search and visualization platform. ANNIS is an open-source database and front-end query system built to handle multilayer corpora. It lets users search for sequences of words, part-of-speech annotations, syntactic categories and relations, entity types, discourse graphs, and much more, and form complex queries that combine information from all annotation layers in the corpus.
To query GUM in ANNIS, we use the ANNIS Query Language (AQL), a graph-based formalism that expresses relations between search nodes, such as words, parts of speech, or entities, and their annotations. Some example queries:
- Find adverbs in the Penn tagset that are ordinals in the CLAWS5 tagset:
pos="RB" _=_ claws5="ORD"
- Search for "one" anaphora (e.g. "this... the other one"):
entity ->bridge entity & #1 ->head lemma="one"
- Find an ADJP modified adverbially by an NP:
cat="ADJP" >[func="ADV"] cat="NP"
- Find places mentioned in the same sentence as the United Nations:
s_type _i_ identity="United_Nations" ^* entity="place" _=_ identity!="United_Nations" & #1 _i_ #4
- Search for clauses supplying evidence relations in interviews:
rst:kind >[relname="evidence"] rst:kind & meta::type="interview"
- Find places referred to as "there" and their antecedents:
lemma="there" _=_ entity="place" ->coref entity="place"
- Search for headings beginning with a transitive verb:
head _l_ pos=/V.*/ ->dep[func="obj"] tok
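
To make the idea of relating search nodes more concrete, here is a minimal Python sketch of what the `_=_` operator in the first query above expresses: two annotations that cover exactly the same text. This is not how ANNIS is implemented and it does not call any ANNIS API. The `Span` class, the `identical_coverage` helper, and the three-token toy sentence are all invented for illustration and are not taken from GUM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotation covering tokens start..end (inclusive) on a named layer.

    Hypothetical structure for illustration only, not an ANNIS data type.
    """
    layer: str
    value: str
    start: int
    end: int


def identical_coverage(spans, left, right):
    """Return (a, b) pairs where a matches `left`, b matches `right`,
    and both cover exactly the same tokens (the idea behind `_=_`)."""
    return [
        (a, b)
        for a in spans
        if (a.layer, a.value) == left
        for b in spans
        if (b.layer, b.value) == right
        and (a.start, a.end) == (b.start, b.end)
    ]


# Invented toy annotations for the tokens: 0 "She", 1 "arrived", 2 "first"
toy_spans = [
    Span("pos", "PRP", 0, 0),
    Span("claws5", "PNP", 0, 0),
    Span("pos", "VBD", 1, 1),
    Span("claws5", "VVD", 1, 1),
    Span("pos", "RB", 2, 2),
    Span("claws5", "ORD", 2, 2),
]

# Rough analogue of: pos="RB" _=_ claws5="ORD"
matches = identical_coverage(toy_spans, ("pos", "RB"), ("claws5", "ORD"))
for a, b in matches:
    print(f"token {a.start}: {a.layer}={a.value} _=_ {b.layer}={b.value}")
```

In this toy sentence only the last token satisfies both conditions. The other operators in the examples (`_i_` for inclusion, `_l_` for left alignment, `>` for dominance, `->` for pointing relations) constrain pairs of nodes in a similar way, each with its own condition on how the two nodes relate.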