Fix streams #350
Coverage Report
Files without new missing coverage
264 files skipped due to complete coverage. Coverage success: total of 97.99% is above 97.98% 🎉
Quality Gate passed
Description
Added
`edsnlp.data.read_parquet` now accepts a `work_unit="fragment"` option to split tasks between workers by parquet fragment instead of by row. When this is enabled, workers no longer read every fragment while skipping 1 in n rows; instead, each worker reads all rows of 1/n of the fragments, which should be faster.
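To make the difference between the two splitting strategies concrete, here is a minimal pure-Python sketch. It is not edsnlp's implementation: the parquet dataset is modelled as a list of fragments (each a list of rows), and the helper names `rows_by_row` and `rows_by_fragment` are hypothetical. The point is that with a row-based split every worker still opens every fragment, whereas with a fragment-based split each worker only opens its own share.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical stand-in for a parquet dataset: each inner list is one fragment (file).
Row = Dict[str, Any]
Fragment = List[Row]


def rows_by_row(fragments: List[Fragment], worker_id: int, num_workers: int) -> Iterator[Row]:
    """Row-based split (hypothetical helper): every worker scans every fragment
    and keeps only the rows whose global index is assigned to it."""
    index = 0
    for fragment in fragments:  # each worker has to open all fragments
        for row in fragment:
            if index % num_workers == worker_id:
                yield row
            index += 1


def rows_by_fragment(fragments: List[Fragment], worker_id: int, num_workers: int) -> Iterator[Row]:
    """Fragment-based split (hypothetical helper): each worker opens only the
    fragments assigned to it (about 1/n of them) and yields all of their rows."""
    for frag_index, fragment in enumerate(fragments):
        if frag_index % num_workers == worker_id:
            yield from fragment


if __name__ == "__main__":
    # 4 fragments of 3 rows each, ids 0..11, shared between 2 workers
    fragments = [[{"id": k * 3 + j} for j in range(3)] for k in range(4)]
    for worker in range(2):
        print("row split     ", worker, [r["id"] for r in rows_by_row(fragments, worker, 2)])
        print("fragment split", worker, [r["id"] for r in rows_by_fragment(fragments, worker, 2)])
```

Both strategies hand every row to exactly one worker; they only differ in how much I/O each worker performs. One trade-off worth keeping in mind is that a fragment-based split balances work well only when there are enough fragments of roughly similar size.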
Fixed
… (`random.RandomState()`) when shuffling in data readers: this is important for …
Checklist