Currently, text preprocessing is performed with third-party command-line tools (Moses and SentencePiece), which makes preprocessing inconvenient, especially when processing one sentence at a time.
We will need to switch to their Python implementations (i.e., sacremoses and the Python interface for SentencePiece) and wrap them in a single interface, similar to the Tokenizer classes in the transformers package, that is responsible for all text preprocessing.
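A minimal sketch of what such a wrapper could look like is below. `TextPreprocessor` and `from_moses_and_spm` are hypothetical names, not an existing API. The sketch assumes the pipeline is Moses punctuation normalization (sacremoses `MosesPunctNormalizer`) followed by SentencePiece encoding; which Moses steps are actually needed, and in what order, is an assumption to be checked against the current scripts. The steps are injected as callables, so each sentence is processed in-process without spawning command-line tools.

```python
from typing import Callable, List


class TextPreprocessor:
    """Hypothetical Tokenizer-style wrapper that chains per-sentence steps in-process."""

    def __init__(
        self,
        normalize: Callable[[str], str] = lambda s: s,
        encode_pieces: Callable[[str], List[str]] = str.split,
        decode_pieces: Callable[[List[str]], str] = " ".join,
    ) -> None:
        self.normalize = normalize          # e.g. Moses punctuation normalization
        self.encode_pieces = encode_pieces  # e.g. SentencePiece encoding to pieces
        self.decode_pieces = decode_pieces  # e.g. SentencePiece decoding

    def __call__(self, sentence: str) -> List[str]:
        # One sentence in, subword pieces out: no subprocess or temporary files.
        return self.encode_pieces(self.normalize(sentence))

    def batch(self, sentences: List[str]) -> List[List[str]]:
        return [self(s) for s in sentences]

    def decode(self, pieces: List[str]) -> str:
        return self.decode_pieces(pieces)

    @classmethod
    def from_moses_and_spm(cls, lang: str, spm_model_path: str) -> "TextPreprocessor":
        # Lazy imports keep the class usable (e.g. with stub steps in tests)
        # even when sacremoses or sentencepiece is not installed.
        import sentencepiece as spm
        from sacremoses import MosesPunctNormalizer

        normalizer = MosesPunctNormalizer(lang=lang)
        sp = spm.SentencePieceProcessor(model_file=spm_model_path)
        return cls(
            normalize=normalizer.normalize,
            encode_pieces=lambda s: sp.encode(s, out_type=str),
            decode_pieces=sp.decode,
        )
```

Usage would then be something like `pp = TextPreprocessor.from_moses_and_spm("en", "spm.model")` followed by `pp("Hello world!")`, where `spm.model` is a placeholder path for a trained SentencePiece model. Keeping the steps as plain callables makes it easy to add or swap a step (for example, a re-implemented Moses script) without changing the calling code.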
Some of the Moses scripts may be available in the Stopes repository; others may need to be re-implemented from scratch.
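As an illustration of a re-implementation, here is a sketch of a Python counterpart to Moses' `remove-non-printing-char.perl`. It assumes the script's core behavior is the substitution `s/\p{C}/ /g`, i.e. replacing every Unicode "Other" character (control, format, surrogate, private-use, unassigned) with a space; the function name is hypothetical, and its output should be compared against the Perl script on real data before relying on it.

```python
import unicodedata


def remove_non_printing_char(line: str) -> str:
    r"""Replace every Unicode category C* character with a single space.

    Assumed to mirror the Perl substitution s/\p{C}/ /g. Strip the trailing
    newline before calling, as the Perl script's chomp does; otherwise the
    newline (category Cc) is also turned into a space.
    """
    return "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in line
    )
```

A function like this can be passed as the `normalize` step of a per-sentence wrapper, or composed with other normalization steps.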