Tokenization Guidelines

Each of the following forms a separate token:

  • The genitive 's and the bare apostrophe ' (tokens: 's and ')
  • The contraction n't is split off as its own token (so "won't" also becomes 2 tokens: wo n't)
  • Acronyms and abbreviations are kept together as a single token (U.S., etc.)
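
The rules above can be sketched as a small regular-expression tokenizer. This is only an illustration under stated assumptions, not the tool used to produce any tokenized data: the function name tokenize, the pattern TOKEN_RE, the short abbreviation list, and the handling of all other punctuation (one token per character) are hypothetical choices that the guidelines do not specify.

```python
import re

# Alternatives are tried left to right at each position.
# Everything except the three guideline rules is an assumption.
TOKEN_RE = re.compile(r"""
    \b(?:[A-Za-z]\.){2,}      # acronyms with internal periods, e.g. U.S.
  | \b(?:etc|Mr|Dr)\.         # assumed (hypothetical) abbreviation list
  | [A-Za-z]+(?=n't\b)        # word stem before n't: wo in won't
  | n't\b                     # the contraction n't as its own token
  | 's\b                      # genitive 's
  | [A-Za-z0-9]+              # ordinary words and numbers
  | \S                        # any other character, including a bare '
""", re.VERBOSE)


def tokenize(text):
    """Split text into tokens following the guideline rules (sketch)."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("The U.S. won't sign John's or the boys' treaty."))
```

The lookahead in the third alternative is what leaves "wo" rather than "won" in front of n't. A period that both ends an abbreviation and closes a sentence stays attached to the abbreviation in this sketch; the guidelines do not say how that case should be handled.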
tokenization.txt · Last modified: 2018/09/11 10:02 (external edit)