Create documentation for the HF pipelines. #41

Open
avidale opened this issue Sep 17, 2024 · 0 comments
Labels
documentation (Improvements or additions to documentation), good first issue (Good for newcomers)

Comments

@avidale
Contributor

avidale commented Sep 17, 2024

We have a recently created huggingface_pipelines directory with some nice code, but no obvious examples of how to use it.

One could create a documentation page that explains the purpose of the pipelines and shows, with code, how to apply them.

An example of the task would be to use the FLORES dataset (https://huggingface.co/datasets/facebook/flores) to compare the quality of translation from various languages to one (e.g. to English or to Spanish).

Motivation for the task

A typical way to evaluate SONAR models for a particular language would be to encode some dataset of sentences and then to decode it to the same language (reconstruction) or to another language (translation). Then the generated texts get compared with the reference texts using numeric scores such as BLEU (from the sacrebleu package).
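To make the scoring step concrete, here is a minimal sketch of a corpus-level BLEU computation in plain Python. It is a simplified stand-in and not the sacrebleu implementation: the function name `simple_corpus_bleu` is hypothetical, tokenization is plain whitespace splitting, there is one reference per hypothesis, and no smoothing is applied. For real evaluations the sacrebleu package should be used so that scores are comparable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on whitespace tokens, scaled to 0..100.

    Assumptions: one reference per hypothesis, no smoothing.
    This is NOT equivalent to sacrebleu, which applies its own tokenization.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

In the reconstruction setting the references are the original input sentences; in the translation setting they are the target-language sentences aligned with the inputs.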

We want to use this task as an opportunity to learn more about the pipelines, which are a kind of glue that connects the models to the data (e.g. by batching the data to feed to the models).
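The batching role mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the code in huggingface_pipelines: `batched`, `apply_in_batches`, and `model_fn` are hypothetical names, and `model_fn` stands for any callable that maps a list of inputs to a list of outputs of the same length.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch


def apply_in_batches(
    model_fn: Callable[[List[T]], List[U]],  # hypothetical stand-in for a model call
    items: Iterable[T],
    batch_size: int,
) -> List[U]:
    """Feed items to `model_fn` batch by batch and return the outputs in input order."""
    outputs: List[U] = []
    for batch in batched(items, batch_size):
        outputs.extend(model_fn(batch))
    return outputs
```

For example, `apply_in_batches(lambda b: [s.upper() for s in b], sentences, 32)` would process `sentences` in chunks of 32 while keeping the outputs aligned with the inputs.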

How to approach

All or most of the code elements are (probably) already somewhere in the repo. The goal is to put them together with the new Hugging Face pipeline, using segmentation, encoding, decoding, and BLEU computation.
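One possible shape for the combined flow is sketched below. Every name here is hypothetical and none of it is the repo's API: the four stages are passed in as plain callables, and the toy stand-ins exist only so that the sketch runs without any model. In a real documentation page the stages would be replaced by the corresponding pieces from huggingface_pipelines and by a BLEU scorer.

```python
from typing import Callable, List, Sequence


def evaluate_round_trip(
    documents: Sequence[str],
    references: Sequence[str],
    segment: Callable[[str], List[str]],            # text -> sentences
    encode: Callable[[List[str]], list],            # sentences -> embeddings
    decode: Callable[[list], List[str]],            # embeddings -> sentences
    score: Callable[[List[str], Sequence[str]], float],
) -> float:
    """Segment, encode, decode, then score the outputs against sentence-level references."""
    sentences = [s for doc in documents for s in segment(doc)]
    embeddings = encode(sentences)
    hypotheses = decode(embeddings)
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per generated sentence")
    return score(hypotheses, references)


# Toy stand-ins (assumptions for illustration only, not real SONAR components).
def toy_segment(doc: str) -> List[str]:
    return [s.strip() for s in doc.split(".") if s.strip()]


def toy_encode(sentences: List[str]) -> list:
    return [s[::-1] for s in sentences]  # the "embedding" is just the reversed string


def toy_decode(embeddings: list) -> List[str]:
    return [e[::-1] for e in embeddings]


def exact_match_rate(hypotheses: List[str], references: Sequence[str]) -> float:
    return sum(h == r for h, r in zip(hypotheses, references)) / len(references)
```

Keeping the stages as separate callables makes it easy to swap reconstruction for translation: only the decoding stage and the references change.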

A good entry point might be the tests (e.g. https://github.com/facebookresearch/SONAR/blob/main/tests/unit_tests/huggingface_pipelines/text.py), which illustrate some of the potential use cases of the HF pipeline.

@avidale added the documentation and good first issue labels Sep 17, 2024
David-OC17 added a commit to David-OC17/SONAR that referenced this issue Oct 11, 2024