MLH fellowship contribution: adding the laser_encoders module #249

Merged: 134 commits merged on Nov 21, 2023
134 commits
407a1d0
feat: converted SPMapply function to use python script
CaptainVee Jul 5, 2023
d68131d
Merge branch 'facebookresearch:main' into tokenize
CaptainVee Jul 6, 2023
521ac85
modified laserTokenizer class to have a seperate function for tokeniz…
CaptainVee Jul 9, 2023
23528fb
Merge branch 'facebookresearch:main' into tokenize
CaptainVee Jul 9, 2023
a168556
modified tokenize_file function
CaptainVee Jul 12, 2023
b730274
removed instances of Path
CaptainVee Jul 12, 2023
03b521b
created new function for opening files
CaptainVee Jul 17, 2023
2a9b30f
test for LaserTokenizer.tokenize
CaptainVee Jul 17, 2023
14c4336
tests for normalisation, descape and lower_case
CaptainVee Jul 18, 2023
199671d
deleted test dir because of relative import error
CaptainVee Jul 18, 2023
ab614b8
modified test tokenizer function to use the downloaded model before e…
CaptainVee Jul 18, 2023
fb1d213
test for tokenize_file
CaptainVee Jul 18, 2023
1871cc0
added test for is_printable
CaptainVee Jul 18, 2023
9c5503a
test for over_write when equal to True and False
CaptainVee Jul 19, 2023
29b3f32
added some type hints for tests
CaptainVee Jul 19, 2023
3031db6
added type hint for log function
CaptainVee Jul 19, 2023
296656b
added header comment
CaptainVee Jul 20, 2023
4424b35
Merge pull request #238 from CaptainVee/tokenize
avidale Jul 24, 2023
e45f7b6
feat: make LASER pip installable (#239)
CaptainVee Jul 26, 2023
7bb9822
Refactor embedder (#241)
CaptainVee Aug 2, 2023
fc0fd16
feat: Add Python function to download LASER models (#244)
CaptainVee Aug 18, 2023
f6e557d
documentation for the laser_encoder
CaptainVee Aug 21, 2023
2c90bbc
added tokenizer part
CaptainVee Aug 22, 2023
7059758
added some docs for tokenize file and download models
CaptainVee Aug 22, 2023
4e3c42b
updated readme to include supported flore200 langs
CaptainVee Aug 24, 2023
54a7d92
corrected readme path and license
CaptainVee Aug 24, 2023
431780e
added requirements for laser_encoder
CaptainVee Aug 24, 2023
4234e7b
added __main__.py file for running download command easily
CaptainVee Aug 25, 2023
8e46691
black and isort fixes, updated docs to effect changes due to creation…
CaptainVee Aug 25, 2023
8d5a192
added contributors section
CaptainVee Aug 25, 2023
eb4fdcb
Revert "added requirements for laser_encoder"
CaptainVee Aug 28, 2023
76843f7
reverting creation of main.py
CaptainVee Aug 28, 2023
676f3e1
fixed isort and black issues
CaptainVee Aug 28, 2023
013fcbd
removed irrelevant comment
CaptainVee Aug 28, 2023
83b2e01
moved pyproject to laser direcory and adjust contributors name
CaptainVee Aug 30, 2023
2a073f6
workflow issues due to removal of pyproject
CaptainVee Aug 30, 2023
c30c6aa
pointed workflow to laser_encoders dir
CaptainVee Aug 30, 2023
fdb5ffd
fixed EOF error
CaptainVee Aug 30, 2023
cccb24f
fixed EOF error
CaptainVee Aug 30, 2023
b1d1138
debuging
CaptainVee Aug 30, 2023
8276b5b
debuging
CaptainVee Aug 30, 2023
ba2e8c6
debuging
CaptainVee Aug 30, 2023
976cbed
debuging
CaptainVee Aug 30, 2023
8e3e19b
debuging
CaptainVee Aug 30, 2023
af8d095
debuging
CaptainVee Aug 30, 2023
726fb28
debuging
CaptainVee Aug 30, 2023
d953140
debuging
CaptainVee Aug 30, 2023
f253487
debuging
CaptainVee Aug 30, 2023
793756e
debuging
CaptainVee Aug 30, 2023
ee6def4
debuging
CaptainVee Aug 30, 2023
2f73b9e
debuging
CaptainVee Aug 30, 2023
bb768e6
bug fixes and new implementation of convert_tokens_to_id function
CaptainVee Sep 5, 2023
b79b15b
bug fix
CaptainVee Sep 5, 2023
6684564
bug fix
CaptainVee Sep 5, 2023
6966a5e
bug fix
CaptainVee Sep 5, 2023
d9b8882
bug fix
CaptainVee Sep 5, 2023
7d68522
bug fix
CaptainVee Sep 5, 2023
24cd881
bug fix
CaptainVee Sep 5, 2023
d5a4829
bug fix
CaptainVee Sep 5, 2023
acbbc36
bug fix
CaptainVee Sep 5, 2023
c4129dc
bug fix
CaptainVee Sep 5, 2023
c889d82
reverting back because of workflow error
CaptainVee Sep 5, 2023
5a1c476
reverting back because of workflow error
CaptainVee Sep 5, 2023
5d649c5
some extra adjustment
CaptainVee Sep 5, 2023
c69e749
changed ibo to igbo
CaptainVee Sep 6, 2023
b97fd24
updated doc to effect the ibo to igbo change
CaptainVee Sep 6, 2023
94bc7aa
Merge pull request #246 from CaptainVee/documentation
heffernankevin Sep 6, 2023
d8e6983
refactore: modified the sentence encoder to tokenize a text before en…
CaptainVee Sep 8, 2023
af224c6
debugging failed test
CaptainVee Sep 8, 2023
2ac3362
added a call method to seperately handle the tokenization before enco…
CaptainVee Sep 18, 2023
c2f66cd
added value error for when there is no spm_model
CaptainVee Sep 21, 2023
0858676
documentation for the new __call__ method for tokenization with encoder
CaptainVee Sep 21, 2023
51b4293
Merge pull request #248 from CaptainVee/refactor-sentence-encoder
heffernankevin Sep 22, 2023
0976ee8
docs: Update docs to include reference to laserembeddings (#254)
Paulooh007 Oct 11, 2023
e3257c1
Handle Interrupted Model Weight Downloads (#253)
Paulooh007 Oct 13, 2023
e6f4805
Refactor `initialize_encoder` to `LaserEncoderPipeline` (#256)
Paulooh007 Oct 31, 2023
8fc4b9a
test to validate languages
NIXBLACK11 Oct 31, 2023
9a3228b
test to validate languages
NIXBLACK11 Oct 31, 2023
ad9a588
Delete flores directory
NIXBLACK11 Oct 31, 2023
7f32d7a
Update validate_models.py
NIXBLACK11 Oct 31, 2023
ff3254b
Update validate_models.py
NIXBLACK11 Oct 31, 2023
cb2d91a
Update validate_models.py
NIXBLACK11 Oct 31, 2023
f4e84d2
Update validate_models.py
NIXBLACK11 Oct 31, 2023
109eac2
Update .gitignore
NIXBLACK11 Oct 31, 2023
2236fe0
added pytest to validate_models.py
NIXBLACK11 Nov 1, 2023
472657b
Update validate_models.py
NIXBLACK11 Nov 1, 2023
c744030
Update validate_models.py
NIXBLACK11 Nov 1, 2023
c71aec7
Update validate_models.py using mock downloader
NIXBLACK11 Nov 4, 2023
c816d79
Update validate_models.py
NIXBLACK11 Nov 6, 2023
31aa252
Update validate_models.py
NIXBLACK11 Nov 6, 2023
c34279d
Update validate_models.py
NIXBLACK11 Nov 6, 2023
8b25a3d
Update validate_models.py
NIXBLACK11 Nov 6, 2023
c5b6f60
Extend Tokenizer to Support Single Strings and Lists of Strings (#258)
Paulooh007 Nov 7, 2023
302d068
Update validate_models.py
NIXBLACK11 Nov 7, 2023
73f873f
Update download_models.py according to 1.
NIXBLACK11 Nov 7, 2023
5e04a2a
Update download_models.py
NIXBLACK11 Nov 7, 2023
e3552a7
Update download_models.py
NIXBLACK11 Nov 7, 2023
1d74246
Update download_models.py
NIXBLACK11 Nov 7, 2023
3c5f5ed
Enhance LaserTokenizer with Perl Parity, Optional Punctuation Normali…
Paulooh007 Nov 8, 2023
1bddd81
Update validate_models.py
NIXBLACK11 Nov 8, 2023
e4f3fd0
Update models.py
NIXBLACK11 Nov 8, 2023
03284a2
Update laser_tokenizer.py
NIXBLACK11 Nov 8, 2023
43f4d1a
Update download_models.py
NIXBLACK11 Nov 8, 2023
6ef54c2
Update validate_models.py
NIXBLACK11 Nov 8, 2023
89c9dde
Update validate_models.py
NIXBLACK11 Nov 8, 2023
d883ee0
Added slow and fast tests to validate_models.py
NIXBLACK11 Nov 9, 2023
e1e22a3
Update validate_models.py
NIXBLACK11 Nov 9, 2023
a8f4135
Update validate_models.py
NIXBLACK11 Nov 9, 2023
4cd83e8
Create test_validate_models.py
NIXBLACK11 Nov 9, 2023
e0be04f
Rename test_validate_models.py to test_models_initialization.py
NIXBLACK11 Nov 9, 2023
9ec012f
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
fbbc6fc
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
99ebbfd
Update download_models.py
NIXBLACK11 Nov 9, 2023
6356c4d
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
eac3674
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
d3935f9
Update download_models.py
NIXBLACK11 Nov 9, 2023
18c1657
Update validate_models.py
NIXBLACK11 Nov 14, 2023
c26e775
Update validate_models.py
NIXBLACK11 Nov 14, 2023
023eab2
Update validate_models.py
NIXBLACK11 Nov 14, 2023
3944556
Update validate_models.py
NIXBLACK11 Nov 14, 2023
0a4d983
Update validate_models.py
NIXBLACK11 Nov 14, 2023
e5823d6
Update validate_models.py
NIXBLACK11 Nov 14, 2023
92345be
Update validate_models.py
NIXBLACK11 Nov 14, 2023
87a08e9
Update validate_models.py
NIXBLACK11 Nov 14, 2023
b0131d9
Merge pull request #257 from NIXBLACK11/Language_model_validation
heffernankevin Nov 14, 2023
89ec5f3
Update README.md
NIXBLACK11 Nov 15, 2023
30856cc
Update README.md
NIXBLACK11 Nov 15, 2023
6360627
Merge pull request #265 from NIXBLACK11/Laser_readme_update
heffernankevin Nov 15, 2023
cd6118e
Decrease versions of numpy and torch required by laser-encoders (#264)
Paulooh007 Nov 15, 2023
ea7691c
resolve parity with MOSES-4.0 release
Nov 17, 2023
77bf7fb
update test
Nov 17, 2023
90db293
Update the main README file with a mention of `laser_encoders` (#266)
avidale Nov 17, 2023
b4aed58
Merge pull request #268 from facebookresearch/fix-parity
heffernankevin Nov 20, 2023
9cde37a
Update language_list.py (#269)
NIXBLACK11 Nov 21, 2023
corrected readme path and license
CaptainVee committed Aug 24, 2023
commit 54a7d92902c34d1f1f473e5a15e4dd0102b87f25
pyproject.toml: 6 changes (3 additions, 3 deletions)
@@ -7,7 +7,7 @@ name = "laser_encoders"
 version = "0.0.1"
 authors = [{name = "Facebook AI Research"}]
 description = "LASER Language-Agnostic SEntence Representations is a toolkit to calculate multilingual sentence embeddings and to use them for document classification, bitext filtering and mining"
-readme = "README.md"
+readme = "laser_encoders/README.md"
 requires-python = ">=3.8"
 
 dependencies = [
@@ -19,7 +19,7 @@ dependencies = [
 ]
 
 classifiers=[
-"License :: BSD License",
+"License :: OSI Approved :: BSD License",
 "Topic :: Scientific/Engineering",
 "Development Status :: 4 - Beta",
 ]
@@ -66,4 +66,4 @@ ignore_missing_imports = true
 testpaths = ["laser_encoders"]
 python_files = [
 "test_*.py",
-]
+]
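This commit changes two `[project]` fields. The `readme` path is resolved relative to the directory that contains `pyproject.toml`, so with the file at the repository root the old value picked up the root-level `README.md` rather than the package's own `laser_encoders/README.md`. The license classifier is replaced with the `License :: OSI Approved :: BSD License` string used in the trove classifier list. Below is a minimal sketch of how one might check both fields, assuming Python 3.11+ `tomllib` (or the `tomli` backport on older interpreters). The helpers `load_project`, `readme_path`, `unknown_classifiers` and the `KNOWN_CLASSIFIERS` subset are hypothetical illustrations, not part of `laser_encoders`. The full classifier list is published in the `trove-classifiers` package.

```python
"""Sketch (hypothetical helpers): inspect the two [project] fields changed by commit 54a7d92."""
from __future__ import annotations

import tempfile
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters: the third-party "tomli" backport
    import tomli as tomllib

# Illustrative subset only; the authoritative list ships in the `trove-classifiers` package.
KNOWN_CLASSIFIERS = {
    "License :: OSI Approved :: BSD License",
    "Topic :: Scientific/Engineering",
    "Development Status :: 4 - Beta",
}


def load_project(pyproject: Path) -> dict:
    """Return the [project] table of a pyproject.toml file."""
    with pyproject.open("rb") as fh:  # tomllib.load needs a binary file object
        return tomllib.load(fh).get("project", {})


def readme_path(pyproject: Path) -> Path | None:
    """Resolve `readme`, which is relative to the directory holding pyproject.toml."""
    readme = load_project(pyproject).get("readme")
    if isinstance(readme, dict):  # table form: readme = {file = "...", content-type = "..."}
        readme = readme.get("file")
    return pyproject.parent / readme if isinstance(readme, str) else None


def unknown_classifiers(pyproject: Path, known: set[str] = KNOWN_CLASSIFIERS) -> list[str]:
    """List classifiers that are not in the `known` set."""
    return [c for c in load_project(pyproject).get("classifiers", []) if c not in known]


# Trimmed-down before/after versions of the fields touched by the diff.
OLD = """[project]
name = "laser_encoders"
readme = "README.md"
classifiers = ["License :: BSD License", "Topic :: Scientific/Engineering"]
"""
NEW = OLD.replace('"README.md"', '"laser_encoders/README.md"').replace(
    '"License :: BSD License"', '"License :: OSI Approved :: BSD License"'
)

results = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "laser_encoders").mkdir()
    (root / "README.md").write_text("# repository-level README\n")
    (root / "laser_encoders" / "README.md").write_text("# laser_encoders\n")
    for label, text in (("before", OLD), ("after", NEW)):
        pyproject = root / "pyproject.toml"
        pyproject.write_text(text)
        readme = readme_path(pyproject)
        results[label] = (readme.relative_to(root).as_posix(), unknown_classifiers(pyproject))
        print(label, results[label])
```

Run against the simulated layout, the "before" version resolves to the root `README.md` and flags `License :: BSD License`. The "after" version resolves to `laser_encoders/README.md` and flags nothing. Note that the readme check only shows *which* file is picked up. It cannot tell whether that is the intended README, which is why the fix in this commit was a path change and not a missing-file error.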