
Merge LH5 files from multiple threads #190

Open
gipert opened this issue Dec 9, 2024 · 4 comments
Labels: discussion (Further information is requested), output (Output Schemes)

Comments

@gipert
Member

gipert commented Dec 9, 2024

Is it possible to handle this in pure C++ in a reasonable way? If not, we should provide some Python remage wrapper that performs the concatenation.

gipert added the discussion and output labels Dec 9, 2024
@ManuelHu
Collaborator

There are "hyperslabs" available in the C++ API to read and write partial datasets, but their usage is very verbose compared to equivalent Python slices. You also need to take care of all the low-level details (chunking, ...) yourself.

I would really not like to maintain such code (I found an implementation of dataset merging - certainly much more sophisticated and feature-rich - that has >2k lines of code).
That is much more than just the "name and attribute juggling" we are currently doing...

@Yurivanderburg

Not sure if this helps, but I found that loading the LH5 files as awkward arrays and using ak.concatenate works very nicely. I can share some code if you want.

@tdixon97
Collaborator

This is exactly the approach we suggest people use.
However, combining the LH5 files directly in C++ / with an extra Python script is a bit trickier, since you do not want to just concatenate but also sort by g4_evtid (each file contains a random subset of g4_evtids). This also has to be done in a memory-efficient way (without reading the full data into memory), and it should be fast.

@gipert
Member Author

gipert commented Dec 12, 2024

We also want to do some merging in the simulation production workflow to avoid cluttering the filesystem with a huge amount of files.


4 participants