We have a recently created huggingface_pipelines directory with some nice code, but no obvious examples of how to use it.
One could create a documentation page that explains the purpose of the pipelines and illustrates, with code examples, how they can be applied.
An example task would be to use the FLORES dataset (https://huggingface.co/datasets/facebook/flores) to compare translation quality from various source languages into a single target language (e.g. into English or into Spanish).
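For instance, loading one FLORES language pair from the Hugging Face hub might look like the sketch below; the config name and column names are assumptions based on the facebook/flores dataset card, so adjust them as needed:

```python
# Minimal sketch: load the French-English pair of FLORES-200 from the hub.
# The per-pair config name ("fra_Latn-eng_Latn") and the "sentence_<lang>"
# columns follow the facebook/flores dataset card; newer versions of the
# datasets library may require trust_remote_code=True for script-based datasets.
from datasets import load_dataset

pair = load_dataset("facebook/flores", "fra_Latn-eng_Latn", split="devtest")
source_sentences = pair["sentence_fra_Latn"]
reference_sentences = pair["sentence_eng_Latn"]
```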
Motivation for the task
A typical way to evaluate SONAR models for a particular language is to encode a dataset of sentences and then decode the embeddings back into the same language (reconstruction) or into another language (translation). The generated texts are then compared with the reference texts using numeric scores such as BLEU (from the sacrebleu package).
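A rough sketch of that recipe is below. It uses the text pipelines documented in the SONAR README; the pipeline and model card names are assumptions taken from there (not from huggingface_pipelines), so treat this as illustrative rather than definitive:

```python
# Sketch of the evaluation recipe: encode, decode, then score with BLEU.
import sacrebleu
from sonar.inference_pipelines.text import (
    EmbeddingToTextModelPipeline,
    TextToEmbeddingModelPipeline,
)

encoder = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder"
)
decoder = EmbeddingToTextModelPipeline(
    decoder="text_sonar_basic_decoder", tokenizer="text_sonar_basic_encoder"
)

# source_sentences / reference_sentences come from the FLORES snippet above.
# Decoding into English measures translation quality; decoding back into
# French (target_lang="fra_Latn", scored against the inputs) would measure
# reconstruction quality instead.
embeddings = encoder.predict(source_sentences, source_lang="fra_Latn")
hypotheses = decoder.predict(embeddings, target_lang="eng_Latn", max_seq_len=512)

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [reference_sentences])
print(f"BLEU: {bleu.score:.2f}")
```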
We want to use this task as an opportunity to learn more about the pipelines, which act as the glue connecting the models to the data (e.g. by batching the data before feeding it to the models).
How to approach
All or most of the code elements probably already exist somewhere in the repo; the goal is to put them together with the new Hugging Face pipeline, covering segmentation, encoding, decoding, and BLEU computation.
A good entry point might be the tests (e.g. https://github.com/facebookresearch/SONAR/blob/main/tests/unit_tests/huggingface_pipelines/text.py), which illustrate some of the potential use cases of the HF pipeline.
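To make the target more concrete, here is a hypothetical end-to-end comparison across several source languages, translating everything into English and scoring with BLEU. It uses the TextToTextModelPipeline from the SONAR README rather than the huggingface_pipelines API, whose exact interface is best taken from the tests linked above:

```python
# Hypothetical end-to-end comparison: translate several languages into English
# and report corpus BLEU per source language. Model names follow the SONAR
# README; FLORES config/column names follow the facebook/flores dataset card.
import sacrebleu
from datasets import load_dataset
from sonar.inference_pipelines.text import TextToTextModelPipeline

model = TextToTextModelPipeline(
    encoder="text_sonar_basic_encoder",
    decoder="text_sonar_basic_decoder",
    tokenizer="text_sonar_basic_encoder",
)

for src in ["fra_Latn", "spa_Latn", "deu_Latn"]:
    data = load_dataset("facebook/flores", f"{src}-eng_Latn", split="devtest")
    hypotheses = model.predict(
        data[f"sentence_{src}"], source_lang=src, target_lang="eng_Latn"
    )
    bleu = sacrebleu.corpus_bleu(hypotheses, [data["sentence_eng_Latn"]])
    print(f"{src} -> eng_Latn: BLEU = {bleu.score:.2f}")
```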