Faster Ragged Reading of Markers #169
Conversation
@ericpre Right now I just have the chunks span the entire ragged dataset. I'm not sure that this is the best approach, but you can always set the chunks explicitly using a lazy dataset if you really want to. Since there isn't a good way to automate setting the chunks, I think this is a reasonable solution.
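A minimal sketch of what I mean by setting the chunks via a lazy dataset (the file name is hypothetical): loading lazily keeps the data as a dask array, so the default chunks, which span the whole ragged dataset, can be overridden afterwards.

```python
import hyperspy.api as hs

# Hedged sketch; "markers_data.hspy" is a hypothetical file name.
# Loading lazily keeps the data as a dask array, so the default
# chunking can be replaced with whatever suits the access pattern.
s = hs.load("markers_data.hspy", lazy=True)
s.data = s.data.rechunk(1000)  # e.g. chunks of 1000 ragged entries each
```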
Codecov Report

Attention:

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   85.59%   85.59%   -0.01%
==========================================
  Files          76       76
  Lines       10148    10154       +6
  Branches     2216     2217       +1
==========================================
+ Hits         8686     8691       +5
  Misses        944      944
- Partials      518      519       +1
```

☔ View full report in Codecov by Sentry.
This looks good. When I tried, I used `len` instead of `np.prod`, and it seems that was the reason why it was still slow!
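For illustration (not the actual code from either attempt): on a shape tuple, `len` returns the number of dimensions, while `np.prod` returns the total element count, which is what a chunk-size calculation needs; the two can differ by orders of magnitude.

```python
import numpy as np

shape = (1024, 1024)
print(len(shape))      # 2 -- the number of dimensions
print(np.prod(shape))  # 1048576 -- the total number of elements
```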
I am not sure how useful the benchmark file would be, because we may well forget about it if there isn't something that runs regularly and checks how long it takes. Maybe a test with an upper limit on the execution time would be more useful?
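A hedged sketch of what such a time-limited test could look like, assuming the `pytest-timeout` plugin is available; the test name and the file contents are hypothetical stand-ins.

```python
import numpy as np
import pytest

# Assumes the pytest-timeout plugin; the test fails if it exceeds 5 s.
@pytest.mark.timeout(5)
def test_ragged_markers_read_quickly(tmp_path):
    # Hypothetical stand-in for saving/loading a file with ragged markers.
    path = tmp_path / "ragged.npy"
    data = np.empty(10_000, dtype=object)
    for i in range(data.size):
        data[i] = np.zeros((i % 5 + 1, 2))
    np.save(path, data, allow_pickle=True)
    loaded = np.load(path, allow_pickle=True)
    assert loaded.shape == (10_000,)
```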
Force-pushed from 62e5e5e to d7857fd
@ericpre I'll just remove the Jupyter notebook for now. I think the information needed to recreate the tests is in this PR and #168, so that should be good; maybe something to come back to in time. I think some sort of benchmarking for the different file readers would be helpful, as it would identify which ones are faster or slower and whether certain readers could be made faster. A sketch of one possible approach follows.
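For example, per-reader benchmarks could be written with the `pytest-benchmark` plugin, so they run under the normal test runner. This is a hedged sketch under that assumption; the loader function and file are hypothetical stand-ins for the rosettasciio readers.

```python
import numpy as np

# Assumes the pytest-benchmark plugin; the loader and file are
# hypothetical stand-ins for the actual rosettasciio readers.
def _load_ragged(path):
    return np.load(path, allow_pickle=True)

def test_ragged_loader_speed(benchmark, tmp_path):
    path = tmp_path / "ragged.npy"
    data = np.empty(1000, dtype=object)
    for i in range(data.size):
        data[i] = np.zeros(i % 7 + 1)
    np.save(path, data, allow_pickle=True)
    result = benchmark(_load_ragged, path)  # timed repeatedly by the fixture
    assert result.shape == (1000,)
```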
Description of the change
This fixes some of the bugs brought up in #168 and #164.
Progress of the PR
- [ ] add a changelog entry in the `upcoming_changes` folder (see `upcoming_changes/README.rst`)
- [ ] check the `docs/readthedocs.org:rosettasciio` build of this PR (link in GitHub checks)

This should be much faster than before (for large arrays, this makes ragged arrays actually usable). I wanted to start adding some benchmarks, but I'm not sure of the best way to do that. Jupyter notebooks don't seem like the right fit; maybe something similar to the examples?
Minimal example of the bug fix or the new feature
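Below is a minimal, hedged sketch (not taken from the PR itself) of the kind of ragged, variable-length HDF5 dataset whose reading this change speeds up; the file and dataset names are hypothetical.

```python
import h5py
import numpy as np

# Hypothetical file and dataset names; writes then reads a ragged
# (variable-length) dataset of the kind used to store markers.
vlen = h5py.vlen_dtype(np.float64)
with h5py.File("ragged_example.h5", "w") as f:
    d = f.create_dataset("markers", (1000,), dtype=vlen)
    for i in range(1000):
        d[i] = np.arange(i % 10 + 1, dtype=np.float64)

with h5py.File("ragged_example.h5", "r") as f:
    markers = f["markers"][...]  # one read of the whole ragged dataset
print(markers[3])  # array([0., 1., 2., 3.])
```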