GuaraniTextProcessing

Alex Rudnick edited this page Sep 26, 2013 · 1 revision

This may end up as a separate package.

sentence segmentation

  • The default NLTK sentence segmenter might not be quite what we want for Guarani text; this needs investigation.

text normalization

  • If we have text with circumflex accents, we want to change those to tildes.
  • what else?
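
A minimal sketch of the circumflex-to-tilde step, assuming the circumflexes appear as accents on letters (precomposed or combining) and not as standalone "^" characters. The function name circumflex_to_tilde is hypothetical, and only the Python standard library is used. The text is decomposed so each accent becomes a separate combining mark, the combining circumflex is swapped for a combining tilde, and the result is recomposed.

```python
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"
COMBINING_TILDE = "\u0303"


def circumflex_to_tilde(text):
    """Replace circumflex accents with tildes (hypothetical helper)."""
    # NFD splits accented letters into base letter + combining mark,
    # so precomposed and combining forms are handled the same way.
    decomposed = unicodedata.normalize("NFD", text)
    swapped = decomposed.replace(COMBINING_CIRCUMFLEX, COMBINING_TILDE)
    # NFC recomposes wherever a precomposed character exists.
    return unicodedata.normalize("NFC", swapped)


if __name__ == "__main__":
    print(circumflex_to_tilde("ñe'ê"))
```

Note that the whole string comes back in NFC form, so letters with no precomposed tilde variant (such as g) keep a separate combining tilde, and a literal "^" typed after a letter is left untouched and would need its own rule.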