Updated pargen.utils.load_data to use LH5Iterator and field_mask to be more memory efficient #589

iguinn · 2024-08-25T17:05:01Z

load_data was using too much memory (at least for P08, which is a larger dataset) and crashing my attempts to process it on NERSC. This switches it to use the field mask when reading from the input files, and the LH5Iterator to limit the number of entries held in memory at once. Also improved the comments and docstring.
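As a rough illustration of the idea only (this is not the pygama or legend-pydataobj code), the sketch below reads just the requested fields and holds a bounded number of entries in memory at a time. The names `iter_chunks`, `load_selected`, `fake_file`, and the plain-list `field_mask` and integer `buffer_len` arguments are hypothetical stand-ins; the real LH5Iterator interface may differ.

```python
from typing import Dict, Iterator, List, Sequence


def iter_chunks(
    columns: Dict[str, Sequence],
    field_mask: List[str],
    buffer_len: int,
) -> Iterator[Dict[str, list]]:
    """Yield chunks of at most buffer_len entries, restricted to field_mask.

    `columns` is a hypothetical in-memory stand-in for an input file.
    """
    if buffer_len <= 0:
        raise ValueError("buffer_len must be positive")
    n_entries = len(columns[field_mask[0]]) if field_mask else 0
    for start in range(0, n_entries, buffer_len):
        stop = min(start + buffer_len, n_entries)
        # Only the masked fields are copied; other fields are never touched.
        yield {name: list(columns[name][start:stop]) for name in field_mask}


def load_selected(columns, field_mask, cut_field, threshold, buffer_len=2):
    """Accumulate only entries passing a cut, one chunk at a time.

    `cut_field` must be one of the names in `field_mask`.
    """
    out = {name: [] for name in field_mask}
    for chunk in iter_chunks(columns, field_mask, buffer_len):
        for i, value in enumerate(chunk[cut_field]):
            if value > threshold:
                for name in field_mask:
                    out[name].append(chunk[name][i])
    return out


# Hypothetical data: "waveform" stands in for a large field we never read.
fake_file = {
    "energy": [10.0, 250.0, 30.0, 400.0, 500.0],
    "timestamp": [0.0, 1.0, 2.0, 3.0, 4.0],
    "waveform": [[0] * 4 for _ in range(5)],
}
result = load_selected(fake_file, ["energy", "timestamp"], "energy", 100.0)
```

In this sketch the peak memory is set by `buffer_len` plus the entries that survive the cut, rather than by the full size of the input.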

Note: this should not be merged until legend-exp/legend-pydataobj#100 is also merged.

codecov · 2024-08-25T17:11:28Z

Codecov Report

Attention: Patch coverage is 0% with 29 lines in your changes missing coverage. Please review.

Project coverage is 48.91%. Comparing base (981877e) to head (ee73d2a).

Files	Patch %	Lines
src/pygama/pargen/utils.py	0.00%	29 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #589      +/-   ##
==========================================
+ Coverage   48.80%   48.91%   +0.11%     
==========================================
  Files          59       59              
  Lines        7846     7821      -25     
==========================================
- Hits         3829     3826       -3     
+ Misses       4017     3995      -22

gipert · 2024-10-21T17:43:42Z

Has this been tested, @ggmarshall? @iguinn, can you bump the pydataobj version in pyproject.toml if this is not backward compatible?

ggmarshall · 2024-10-21T18:59:20Z

Not on my end, but I can have a look later this week.

Updated pargen.utils.load_data to use LH5Iterator and field_mask to be more memory efficient

023addd

iguinn and others added 3 commits August 25, 2024 14:07

Whoops pushed the wrong version of this file

c28dfed

Bug fixes

087b028

style: pre-commit fixes

ee73d2a

Change default parallel to false

fa52dec
