
Ragged Array Reading is Slow(er than it should be) #168

Closed
CSSFrancis opened this issue Oct 5, 2023 · 1 comment
Labels
type: bug Something isn't working

Comments

@CSSFrancis
Member

Describe the bug

There appears to be a bug somewhere in the zarr reader with regard to loading ragged arrays (and, judging by the similar loading speed, probably in the h5py reader as well).

To Reproduce

Steps to reproduce the behavior:

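import numpy as np
import hyperspy.api as hs

n = 1000  # number of ragged entries (arbitrary; the original snippet used an undefined `i` here)
data = np.array([np.random.randint(0, 100, size=np.random.randint(0, 20)).astype(np.float64)
                 for _ in range(n)], dtype=object)
s = hs.signals.BaseSignal(data)
s.save("data.zspy", overwrite=True)
%prun hs.load("data.zspy")  # IPython magic: profile the load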

Expected behavior

Ideally, each chunk would be compressed once on save and decompressed once on load. What actually happens differs between .zspy and .hspy.

For .hspy, each index in the ragged array is compressed individually and then decompressed individually. This is inefficient, but it isn't the worst-case scenario.

For .zspy, multiple indexes in the ragged array are compressed together, but it seems that only one index is decompressed at a time. As a result, as the number of indexes compressed together (n) increases, the time to decompress is multiplied by n.
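To make the scaling described for .hspy and .zspy concrete, here is a toy model using only the Python standard library (`zlib`, `pickle`, `random`). It is a sketch under stated assumptions, not the zarr or h5py code path: every helper name (`make_ragged`, `save_chunked`, `load_chunked_naive`, `DecompressCounter`, etc.) is hypothetical, and `load_chunked_naive` simply assumes the suspected behaviour, namely that a whole chunk is decompressed once per index. The model counts decompression calls and decompressed bytes for each strategy.

```python
import pickle
import random
import zlib

# Toy model only: hypothetical helpers that do NOT reproduce the real
# zarr/h5py readers. They count zlib.decompress calls to show how cost scales.


class DecompressCounter:
    """Wraps zlib.decompress and records how much work was done."""

    def __init__(self):
        self.calls = 0
        self.bytes_out = 0

    def decompress(self, blob):
        out = zlib.decompress(blob)
        self.calls += 1
        self.bytes_out += len(out)
        return out


def make_ragged(n_entries, seed=0):
    """Ragged list of float lists, mimicking the object-dtype array in the report."""
    rng = random.Random(seed)
    return [[float(rng.randint(0, 99)) for _ in range(rng.randint(0, 19))]
            for _ in range(n_entries)]


def save_per_index(entries):
    # .hspy-like: every entry is compressed on its own
    return [zlib.compress(pickle.dumps(e)) for e in entries]


def load_per_index(blobs, counter):
    return [pickle.loads(counter.decompress(b)) for b in blobs]


def save_chunked(entries, chunk_size):
    # .zspy-like: several entries are compressed together into one chunk
    return [zlib.compress(pickle.dumps(entries[i:i + chunk_size]))
            for i in range(0, len(entries), chunk_size)]


def load_chunked_naive(chunks, n_entries, chunk_size, counter):
    # Assumed (suspected) behaviour: the full chunk is decompressed for every index
    out = []
    for idx in range(n_entries):
        chunk = pickle.loads(counter.decompress(chunks[idx // chunk_size]))
        out.append(chunk[idx % chunk_size])
    return out


def load_chunked_once(chunks, counter):
    # Desired behaviour: decompress each chunk once, then take all of its entries
    out = []
    for blob in chunks:
        out.extend(pickle.loads(counter.decompress(blob)))
    return out


if __name__ == "__main__":
    entries = make_ragged(1000)
    chunk_size = 100
    chunks = save_chunked(entries, chunk_size)
    naive, once = DecompressCounter(), DecompressCounter()
    assert load_chunked_naive(chunks, len(entries), chunk_size, naive) == entries
    assert load_chunked_once(chunks, once) == entries
    print(naive.calls, once.calls)  # 1000 10
```

With 1000 entries and 100 entries per chunk, the naive loader performs 1000 decompressions against 10 for the decompress-once loader, and it decompresses exactly 100 times as many bytes, which mirrors the n-fold slowdown described for .zspy. The per-index (.hspy-like) loader also makes 1000 calls, but each one only decompresses a single small entry.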

Python environment:

  • RosettaSciIO version: 0.0.2dec
  • Python version: 3.9

Additional context

See #164 for more context

CSSFrancis added the type: bug label Oct 5, 2023
@ericpre
Member

ericpre commented Oct 6, 2023

Done in #169.

ericpre closed this as completed Oct 6, 2023
ericpre added this to the v0.1.0 initial release milestone Oct 6, 2023