We have a recently created huggingface_pipelines directory with some nice code, but no obvious examples of how to use it.
One could create a documentation page that explains the purpose of the pipelines and illustrates, with code examples, how they can be applied.
An example task would be to use the FLORES dataset (https://huggingface.co/datasets/facebook/flores) to compare translation quality from various source languages into a single target language (e.g. into English or into Spanish).
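For instance, loading one FLORES language pair from the Hugging Face hub might look like the sketch below; the config name and column names are assumptions based on the facebook/flores dataset card, so adjust them as needed:

```python
# Minimal sketch: load the French-English pair of FLORES-200 from the hub.
# The per-pair config name ("fra_Latn-eng_Latn") and the "sentence_<lang>"
# columns follow the facebook/flores dataset card; newer versions of the
# datasets library may require trust_remote_code=True for script-based datasets.
from datasets import load_dataset

pair = load_dataset("facebook/flores", "fra_Latn-eng_Latn", split="devtest")
source_sentences = pair["sentence_fra_Latn"]
reference_sentences = pair["sentence_eng_Latn"]
```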
Motivation for the task
A typical way to evaluate SONAR models for a particular language is to encode a dataset of sentences and then decode the embeddings back into the same language (reconstruction) or into another language (translation). The generated texts are then compared with the reference texts using numeric scores such as BLEU (from the sacrebleu package).
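A rough sketch of that recipe is below. It uses the text pipelines documented in the SONAR README; the pipeline and model card names are assumptions taken from there (not from huggingface_pipelines), so treat this as illustrative rather than definitive:

```python
# Sketch of the evaluation recipe: encode, decode, then score with BLEU.
import sacrebleu
from sonar.inference_pipelines.text import (
    EmbeddingToTextModelPipeline,
    TextToEmbeddingModelPipeline,
)

encoder = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder"
)
decoder = EmbeddingToTextModelPipeline(
    decoder="text_sonar_basic_decoder", tokenizer="text_sonar_basic_encoder"
)

# source_sentences / reference_sentences come from the FLORES snippet above.
# Decoding into English measures translation quality; decoding back into
# French (target_lang="fra_Latn", scored against the inputs) would measure
# reconstruction quality instead.
embeddings = encoder.predict(source_sentences, source_lang="fra_Latn")
hypotheses = decoder.predict(embeddings, target_lang="eng_Latn", max_seq_len=512)

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [reference_sentences])
print(f"BLEU: {bleu.score:.2f}")
```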
We want to use this task as an opportunity to learn more about the pipelines, which act as the glue connecting the models to the data (e.g. by batching the data before feeding it to the models).
How to approach
All or most of the code elements probably already exist somewhere in the repo; the goal is to put them together with the new Hugging Face pipeline, covering segmentation, encoding, decoding, and BLEU computation.
A good entry point might be the tests (e.g. https://github.com/facebookresearch/SONAR/blob/main/tests/unit_tests/huggingface_pipelines/text.py), which illustrate some of the potential use cases of the HF pipeline.
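To make the target more concrete, here is a hypothetical end-to-end comparison across several source languages, translating everything into English and scoring with BLEU. It uses the TextToTextModelPipeline from the SONAR README rather than the huggingface_pipelines API, whose exact interface is best taken from the tests linked above:

```python
# Hypothetical end-to-end comparison: translate several languages into English
# and report corpus BLEU per source language. Model names follow the SONAR
# README; FLORES config/column names follow the facebook/flores dataset card.
import sacrebleu
from datasets import load_dataset
from sonar.inference_pipelines.text import TextToTextModelPipeline

model = TextToTextModelPipeline(
    encoder="text_sonar_basic_encoder",
    decoder="text_sonar_basic_decoder",
    tokenizer="text_sonar_basic_encoder",
)

for src in ["fra_Latn", "spa_Latn", "deu_Latn"]:
    data = load_dataset("facebook/flores", f"{src}-eng_Latn", split="devtest")
    hypotheses = model.predict(
        data[f"sentence_{src}"], source_lang=src, target_lang="eng_Latn"
    )
    bleu = sacrebleu.corpus_bleu(hypotheses, [data["sentence_eng_Latn"]])
    print(f"{src} -> eng_Latn: BLEU = {bleu.score:.2f}")
```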