Support pcodec v0.3 #639
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #639   +/-   ##
=======================================
  Coverage   99.92%   99.92%
=======================================
  Files          62       62
  Lines        2721     2750   +29
=======================================
+ Hits         2719     2748   +29
  Misses          2        2
|
Maybe we should expand the numcodecs API to support future specs?
delta_spec: Literal["auto", "none", "try_consecutive", "try_lookback"] = "auto"  # ignored if <v0.3
delta_encoding_order: Optional[int] = None  # ignored if >=v0.3 and spec != try_consecutive? Or accept actual |
Could you say more about what specifically you mean here? I don't quite get the context for this comment. |
@mwlon presumably understands the underlying changes much better than I do. But the gist is pcodec now takes a more flexible This broke the numcodecs interface on the latest release, so I'm just trying to figure out the right path to support any future additions without breaking existing numcodecs users. |
Can we just bump the pcodec version to >=0.3,<0.4 so we don't need to support both code paths? You're correct: I added a new type of delta encoding, now handled by the delta spec. This should be harder to break API-wise in the future. |
Yeah that makes a lot more sense 👍 |
A key consideration here, and an important priority for numcodecs, is backwards compatibility. Ideally any data written by Zarr will be readable for a long time into the future. This means that breaking changes in the codec parameters, which would cause decoding of existing data to fail, should be avoided. This could be relaxed if we had some sort of versioning system for codecs. But unfortunately we don't. For any of the proposals above, I would ask these questions:
|
I believe these are purely API changes and feature additions, since the Would be good for @mwlon to confirm though. And I do wonder if the new modes (e.g.
Seems all of the config API has evolved to these spec objects, so maybe now is a good time to make the numcodecs API match fully:
and the user is responsible for passing through an explicit pcodec object if they want custom behavior. |
Yes! Pcodec will always be able to decode older format versions. And I think I agree with @slevang's last comment: I'm in favor of passing the pcodec specs straight through to keep everything simple. Presumably we can convert non-spec kwargs into these (based on type) for backward compatibility of the numcodecs API. @rabernat do you agree? |
It's entirely possible Ryan and I are the only ones actually using |
Ah, never mind, I think, because numcodecs expects the codec config to be JSON-serializable, which is a requirement of Zarr. I guess we'll have to just pile on the kwargs to support additional features. |
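The constraint raised in this comment can be illustrated with the standard library alone. The following is a minimal sketch, assuming a codec config is a plain dict of keyword arguments; `FakeDeltaSpec` is a hypothetical stand-in for an opaque spec object (it is not the pcodec class), and the `"id"` value and key names are assumptions taken from the argument names discussed in this thread.

```python
import json


class FakeDeltaSpec:
    """Hypothetical stand-in for an opaque spec object (not the pcodec class)."""

    def __init__(self, kind: str, order: int) -> None:
        self.kind = kind
        self.order = order


# Plain string/int kwargs survive a JSON round trip unchanged.
string_config = {
    "id": "pcodec",  # assumed codec id, for illustration only
    "delta_spec": "try_consecutive",
    "delta_encoding_order": 1,
}
round_tripped = json.loads(json.dumps(string_config))

# An arbitrary object inside the config cannot be serialized by json.dumps.
object_config = {"id": "pcodec", "delta_spec": FakeDeltaSpec("try_consecutive", 1)}
try:
    json.dumps(object_config)
    object_is_serializable = True
except TypeError:
    object_is_serializable = False
```

This is why string-style arguments fit the config more naturally than passing spec objects straight through: the stored metadata has to be reconstructible from JSON alone.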
Definitely not the case. We have customers with petabytes of Zarr data (now a little bit less! 🙌 ) encoded with pcodec. |
Ok new approach, keep the string style args but also support |
Looks good to me, and I think these configurations will be pretty robust to future changes.
Pretty sure |
Thanks a bunch for this PR, and sorry for taking so long to review.
This looks great overall. I've left some questions inline, and in addition to those please could you add a changelog that outlines:
- That pcodec 0.3 is now supported
- What this means for forward and backward compatibility with data compressed using older versions of pcodec and numcodecs
- That the order of arguments in PCodec.__init__ has changed
Compatibility for the new delta_spec argument and modes in pcodec.ChunkConfig. Maintains backwards compatibility if only delta_encoding_order is passed. Also adds the paging_spec arg.
Closes #623
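The backward-compatibility rule in the description could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `normalize_delta_kwargs` is a hypothetical helper, and both the mapping of a lone delta_encoding_order to "try_consecutive" and the rule that the order is ignored for other specs are assumptions based on the argument names used in this thread.

```python
from typing import Any, Dict, Optional

# String choices quoted earlier in this thread.
_DELTA_SPECS = ("auto", "none", "try_consecutive", "try_lookback")


def normalize_delta_kwargs(
    delta_spec: Optional[str] = None,
    delta_encoding_order: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: turn old- or new-style kwargs into one form."""
    if delta_spec is None:
        # Old-style call: only an order (or nothing) was given.
        # Assumption: a lone order means consecutive delta encoding.
        delta_spec = "auto" if delta_encoding_order is None else "try_consecutive"
    if delta_spec not in _DELTA_SPECS:
        raise ValueError(f"unknown delta_spec: {delta_spec!r}")
    if delta_spec != "try_consecutive":
        # Assumption: the order is ignored for every other spec.
        delta_encoding_order = None
    return {"delta_spec": delta_spec, "delta_encoding_order": delta_encoding_order}


# Old-style usage keeps working and is normalized to the new form.
legacy = normalize_delta_kwargs(delta_encoding_order=2)
```

Because the result contains only strings, integers, and None, it stays JSON-serializable and can be stored in the codec config as-is.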
TODO: