Update the main README file with a mention of `laser_encoders` #266

avidale · 2023-11-16T10:20:30Z

This PR adds information about `laser_encoders` and its installation to the main README file, to make the package more visible.

It also adds a command for installing `laser_encoders` locally to the `install_external_tools` script, so that the old installation process includes the new package.

heffernankevin

LGTM!

heffernankevin · 2023-11-17T08:42:18Z

README.md

@@ -3,6 +3,7 @@
 LASER is a library to calculate and use multilingual sentence embeddings.

 **NEWS**
+* 2023/11/16 Released [**laser_encoders**](laser_encoders), a pip-installable package supporting LASER-2 and LASER-3 models


[nit] Let's remember to update this date when we release

* feat: converted SPMapply function to use python script * modified laserTokenizer class to have a seperate function for tokenizing a file * modified tokenize_file function * removed instances of Path * created new function for opening files * test for LaserTokenizer.tokenize * tests for normalisation, descape and lower_case * deleted test dir because of relative import error * modified test tokenizer function to use the downloaded model before exiting the context manager * test for tokenize_file * added test for is_printable * test for over_write when equal to True and False * added some type hints for tests * added type hint for log function * added header comment * feat: make LASER pip installable (#239) * feat: make LASER pip installable * Added GitHub Actions workflow for tests and linting * upgraded python version due to node depreciation error * removed updated python version * removed poetry * bug fixes * removed dependencies install * updated pyproject and made lint_and_test to install dev and mono dependencies * removed isort and black * removed mono dependencies * removed version from pyproject * removed duplicate of classifiers * removed description * removed dynamic * added src-layout to discover only laser_encoder * added build backend * updated project name * changed license to BSD * removed src-layout to test * added linting to actions * updated linting to only check the laser_encoders folder * fixed linting issues * fixed black linting issues * added white-space * Refactor embedder (#241) * feat: make LASER pip installable * Added GitHub Actions workflow for tests and linting * upgraded python version due to node depreciation error * removed updated python version * removed poetry * bug fixes * removed dependencies install * updated pyproject and made lint_and_test to install dev and mono dependencies * removed isort and black * removed mono dependencies * removed version from pyproject * removed duplicate of classifiers * removed description * 
removed dynamic * added src-layout to discover only laser_encoder * added build backend * updated project name * changed license to BSD * removed src-layout to test * added linting to actions * updated linting to only check the laser_encoders folder * fixed linting issues * fixed black linting issues * added white-space * refactored emmbeder to work in the laser tokenizer package * downgraded numpy version to suit the installled python version * added test for sentence encoder * added whitespace to test workflow * restructured test for sentence encoder * restructured test for sentence encoder * fixed black issues * restructured test for sentence encoder * changed python version because of workflow error * updated dependencies requirements version * removed unneccessary print statement * updated python version * restructured test_sentence_encoder * restructured test_sentence encoder * black linting fixes * restructure calling of tempile module * updated workflow to remove pip cache * removed commented code * refactored code and added type hints * fixed black issues * fixed no module found error by adding Laser environment * feat: Add Python function to download LASER models (#244) * feat: make LASER pip installable * Added GitHub Actions workflow for tests and linting * upgraded python version due to node depreciation error * removed updated python version * removed poetry * bug fixes * removed dependencies install * updated pyproject and made lint_and_test to install dev and mono dependencies * removed isort and black * removed mono dependencies * removed version from pyproject * removed duplicate of classifiers * removed description * removed dynamic * added src-layout to discover only laser_encoder * added build backend * updated project name * changed license to BSD * removed src-layout to test * added linting to actions * updated linting to only check the laser_encoders folder * fixed linting issues * fixed black linting issues * added white-space * refactored 
emmbeder to work in the laser tokenizer package * downgraded numpy version to suit the installled python version * added test for sentence encoder * added whitespace to test workflow * restructured test for sentence encoder * restructured test for sentence encoder * fixed black issues * restructured test for sentence encoder * changed python version because of workflow error * updated dependencies requirements version * removed unneccessary print statement * updated python version * restructured test_sentence_encoder * restructured test_sentence encoder * black linting fixes * restructure calling of tempile module * updated workflow to remove pip cache * removed commented code * refactored code and added type hints * fixed black issues * fixed no module found error by adding Laser environment * feat:created download function for downloading laser models in python * added language list and made some changes to the download models * fixed linting issues * added type hints * fixed linting issues * added progress bar for downloading of models * fixed black issues * updated code to download laser model based on where the language is found * fixed black and linting issues * fixed black issues * fixed bug in sentence encoder * black issues and relative import issues * removed addition of laser path * fixed isort issues * refactored the python entrypoint functions * fixed black issues * updated laguage list with some laser2 and laser3 languages * refactor: added option for laser * added laser2 language list * added laser3 language list * fixed black issues * updated language list * refactoed download function to display total filesize in MB and also made some changes to raise an error when laser is not passed * fixed black issues * refactored download models to move model_dir to the class * fixed black issues * refactored laser tokenizer test to use the laser downloader class methods * documentation for the laser_encoder * added tokenizer part * added some docs for 
tokenize file and download models * updated readme to include supported flore200 langs * corrected readme path and license * added requirements for laser_encoder * added __main__.py file for running download command easily * black and isort fixes, updated docs to effect changes due to creation of __main__.py file * added contributors section * Revert "added requirements for laser_encoder" This reverts commit 431780e. reverting back * reverting creation of main.py * fixed isort and black issues * removed irrelevant comment * moved pyproject to laser direcory and adjust contributors name * workflow issues due to removal of pyproject * pointed workflow to laser_encoders dir * fixed EOF error * fixed EOF error * debuging * debuging * debuging * debuging * debuging * debuging * debuging * debuging * debuging * debuging * debuging * debuging * bug fixes and new implementation of convert_tokens_to_id function * bug fix * bug fix * bug fix * bug fix * bug fix * bug fix * bug fix * bug fix * bug fix * reverting back because of workflow error * reverting back because of workflow error * some extra adjustment * changed ibo to igbo * updated doc to effect the ibo to igbo change * refactore: modified the sentence encoder to tokenize a text before encodingit * debugging failed test * added a call method to seperately handle the tokenization before encodding * added value error for when there is no spm_model * documentation for the new __call__ method for tokenization with encoder * docs: Update docs to include reference to laserembeddings (#254) * Handle Interrupted Model Weight Downloads (#253) * fix: Fix interrupted downloads issue * style: Format code using black * Update download method to use tempfile * style: Remove unnecessary space * Fix OSError by using shutil.move for cross-filesystem moves Using os.rename caused an OSError when trying to move files across different filesystems (e.g., from /tmp to another directory). 
By using shutil.move, we gracefully handle such situations, ensuring files are moved correctly regardless of the source and destination filesystems. * Refactor `initialize_encoder` to `LaserEncoderPipeline` (#256) * Remove 'tokenize' argument from initialize_encoder function * Add LaserEncoderPipeline for streamlined tokenization and encoding * docs: Update README to show use of LaserEncoderPipeline * style: Reformat code using black * refactor: move encoder and tokenizer initialization into repective files * style: run black * test: Add test for LaserEncoderPipeline * test to validate languages * test to validate languages * Delete flores directory * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update .gitignore * added pytest to validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py using mock downloader * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Extend Tokenizer to Support Single Strings and Lists of Strings (#258) * Handle case for both str and list in tokenizer * test: Add test for tokenizer call method * Rename 'sentences' argument to 'text_or_batch' for clarity * Handle string input in call method * Update validate_models.py * Update download_models.py according to 1. 
* Update download_models.py * Update download_models.py * Update download_models.py * Enhance LaserTokenizer with Perl Parity, Optional Punctuation Normalization, and Embedding Normalization (#262) * Introduce pearl compability flag * Add argument `normalize_punct` to `LaserTokenizer` * Add normalize_embeddings option to encode_sentences * Update README on normalize_embeddings option * style: Run black and isort * test: Add tests for normalize_embeddings flag in sentence encoder * style: Run black * Update validate_models.py * Update models.py * Update laser_tokenizer.py * Update download_models.py * Update validate_models.py * Update validate_models.py * Added slow and fast tests to validate_models.py * Update validate_models.py * Update validate_models.py * Create test_validate_models.py * Rename test_validate_models.py to test_models_initialization.py * Update test_models_initialization.py * Update test_models_initialization.py * Update download_models.py * Update test_models_initialization.py * Update test_models_initialization.py * Update download_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update validate_models.py * Update README.md * Update README.md * Decrease versions of numpy and torch required by laser-encoders (#264) * Update requirements to follow fairseq * Update README * Update dependencies in toml file * Remove requirements.txt * Update laser_encoders README * resolve parity with MOSES-4.0 release * update test * Update the main README file with a mention of `laser_encoders` (#266) * update the main readme file * wording changes * update the example in the readme * fix readme text * Update language_list.py (#269) * Update language_list.py * Update language_list.py * Update language_list.py * Updated laser encoder pipeline * Update models.py * Update models.py * Added warning for using 
laser2 with a language * add tests to test_laser_tokenizer.py * Update test_laser_tokenizer.py * Update models.py * Update test_laser_tokenizer.py * Update test_laser_tokenizer.py * Update language_list.py * Update language_list.py * Update language_list.py --------- Co-authored-by: CaptainVee <[email protected]> Co-authored-by: Victor Joseph <[email protected]> Co-authored-by: Kevin Heffernan <[email protected]> Co-authored-by: Okewunmi Paul <[email protected]> Co-authored-by: NIXBLACK11 <[email protected]> Co-authored-by: Siddharth Singh Rana <[email protected]> Co-authored-by: Kevin Heffernan <[email protected]>
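Note on the download fix mentioned in the commit message above (#253): it describes writing downloads to a temporary file and using `shutil.move` instead of `os.rename`, because `os.rename` raises an `OSError` across filesystems. Below is a minimal sketch of that pattern, not the actual `laser_encoders` code. The helper name `save_via_tempfile` and its parameters are hypothetical, and the network download is replaced by an arbitrary iterable of byte chunks.

```python
import os
import shutil
import tempfile
from pathlib import Path


def save_via_tempfile(chunks, destination):
    """Write byte chunks to a temp file, then move it to `destination`.

    Hypothetical helper for illustration; not the laser_encoders API.
    """
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Stage the data in a temporary file so that an interrupted
    # download never leaves a partial file at the final path.
    fd, tmp_name = tempfile.mkstemp(suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # shutil.move falls back to copy-and-delete when the temp dir and
        # the destination are on different filesystems; os.rename would fail.
        shutil.move(tmp_name, str(destination))
    except BaseException:
        # Remove the staged file if writing or moving was interrupted.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return destination
```

With this structure, a file only appears at the final path after all chunks have been written, so a later run can treat "file exists" as "download completed".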


update the main readme file

b943033

avidale requested a review from heffernankevin November 16, 2023 10:20

facebook-github-bot added the `CLA Signed` label ("Do not delete this pull request or issue due to inactivity.") Nov 16, 2023

avidale added 2 commits November 16, 2023 02:22

wording changes

1f5b2e5

update the example in the readme

93bbbad

heffernankevin approved these changes Nov 17, 2023

fix readme text

3f270aa

avidale merged commit 90db293 into MLH-dev Nov 17, 2023
3 checks passed

avidale deleted the update-laser-dependencies branch November 17, 2023 13:30

avidale restored the update-laser-dependencies branch November 17, 2023 13:31

avidale deleted the update-laser-dependencies branch November 17, 2023 13:31
