Adding the python bitinfo to numcodecs #257
Sounds great to me, but I'll let Hauke comment.
Hi @thodson-usgs,
Exactly right, of course. I'll proceed with … Along that tack, a possible outcome might be to implement such a function in …: `filters = [BitRound(xbitinfo.helper_function)]`
zarr-developers/numcodecs#503 (comment)

I'm by no means an expert, but looking through the respective code bases, a reimplementation might be the simplest option, and we could refer users to …
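To give a feel for how small a stripped-down reimplementation could be, here is a minimal NumPy sketch of the core estimate: for each bit position, the mutual information between that bit in one value and the same bit in its neighbour. It is a simplification under stated assumptions: float32 input only, the array is flattened instead of working along a chosen dimension, and it omits the refinements a production version would need (such as discarding statistically insignificant information). `bitinformation_float32` is a hypothetical name, not part of xbitinfo or numcodecs.

```python
import numpy as np


def bitinformation_float32(values):
    """Per-bit mutual information of float32 data (simplified, hypothetical helper).

    For each of the 32 bit positions, estimate the mutual information (in bits)
    between that bit in element j and the same bit in element j + 1 of the
    flattened array. Index 0 is the sign bit, 1-8 the exponent bits and
    9-31 the mantissa bits. Input is cast to float32.
    """
    a = np.ascontiguousarray(values, dtype=np.float32).ravel()
    if a.size < 2:
        raise ValueError("need at least two values to form neighbouring pairs")
    # Extract all 32 bits of every value, most significant (sign) bit first.
    shifts = np.arange(31, -1, -1, dtype=np.uint32)
    bits = (a.view(np.uint32)[:, None] >> shifts) & np.uint32(1)  # shape (n, 32)
    first, second = bits[:-1], bits[1:]  # neighbouring pairs
    n_pairs = first.shape[0]

    info = np.zeros(32)
    for i in range(32):
        # 2x2 joint histogram of (bit in value j, bit in value j + 1).
        codes = (2 * first[:, i] + second[:, i]).astype(np.intp)
        joint = np.bincount(codes, minlength=4).reshape(2, 2) / n_pairs
        p_first = joint.sum(axis=1, keepdims=True)
        p_second = joint.sum(axis=0, keepdims=True)
        # Empty cells produce 0 * log(0) = nan; nansum treats them as 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = joint * np.log2(joint / (p_first * p_second))
        info[i] = np.nansum(terms)
    return info
```

Its output has the same layout as the `info_per_bit` vector used later in this thread: smooth data yields high information in the leading mantissa bits, while independent random values yield close to zero in every bit.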
...true, this might reduce direct traffic to ... or
@martindurant, I'm bringing you into this discussion with the … The two options I'd like both camps to mull over are:
Rather than including this in … Here's my draft proposal (ah, still a couple of bugs here, but you'll get the gist).
Okay, it works now. Going for multithreading.
Zarr is commonly used with dask, so it should be enough for your algorithms to not hold the GIL.
Thanks @thodson-usgs for all your work! If I understand correctly, you copy/reimplement part of the xbitinfo package as a numcodecs codec. Can't we find a different solution where, e.g., the xbitinfo package provides an entry point to numcodecs? This would require xbitinfo to be installed to provide the codec, but, as mentioned above, that should only be necessary for users who write data, not for those who read it, and it would make it easier to implement new features across the board.
I sense that could get complicated, but I defer to the respective maintainers for guidance. Currently, usage looks like this:

```python
import xarray as xr
from numcodecs import Blosc, BitInfo  # BitInfo: the draft codec proposed in this thread

ds = xr.tutorial.open_dataset("air_temperature")

compressor = Blosc(cname="zstd", clevel=3)
# info_level=0.99: keep enough bits to retain 99% of the information (filters run per chunk)
filters = [BitInfo(info_level=0.99)]
encoding = {"air": {"compressor": compressor, "filters": filters}}
ds.to_zarr("xbit.zarr", mode="w", encoding=encoding)
```
@observingClouds,

```python
def _cdf_from_info_per_bit(info_per_bit):
    """Convert info_per_bit to cumulative distribution function"""
    # I suspect something is wrong with tol
    # tol = info_per_bit[-4:].max() * 1.5
    # info_per_bit[info_per_bit < tol] = 0
    cdf = info_per_bit.cumsum()
    return cdf / cdf[-1]
```

I think the first objective is to write a good codec. Once that's done, we can assess how and to what extent this gets rolled back into …
Maybe
That does look promising. Thanks for the pointer! I actually played with an external BitRounding codec before it was implemented directly in numcodecs. This should be relatively straightforward to add to xbitinfo.
I take that back,
Please let me know if I've misjudged the necessity of a |
I've been looking into modifying the numcodecs `bitround` codec to accept a user-defined function that determines the number of bits to round, namely `xbitinfo.bitinformation`. This would streamline the process of chunk-wise bitrounding to something like …, where `custom_function` would essentially wrap `get_bitinformation` and `get_keepbits`. All in all, that may not offer much over your current chunk-wise approach, except enabling us to use `bitinformation` in `pangeo-forge` compression and rechunking pipelines.

Alternatively, we could add a stripped-down `bitinfo` implementation to numcodecs and avoid the need for `custom_function()`. I'd be happy to help with that, but I don't want to advance it without permission from the `xbitinfo` team. Both projects use the MIT license.
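The "bitround with a user-defined function" idea can be sketched without touching numcodecs itself. Below, a hypothetical `CallableBitRound` class mimics the `encode`/`decode` methods of a numcodecs codec and asks a callable for `keepbits` on every chunk, then applies the standard mantissa bit-rounding trick (round to nearest, ties to even) to float32 data. It deliberately does not subclass `numcodecs.abc.Codec`, so it cannot be stored in zarr metadata as-is; in practice `keepbits_fn` would wrap `get_bitinformation` and `get_keepbits` as described above, whereas the lambdas here are stand-ins.

```python
import numpy as np


def bitround_float32(values, keepbits):
    """Round float32 values to `keepbits` mantissa bits (round to nearest, ties to even)."""
    a = np.ascontiguousarray(values)
    if a.dtype != np.float32:
        raise TypeError("this sketch only handles float32")
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    if keepbits == 23:
        return a.copy()  # nothing to round away
    maskbits = 23 - keepbits
    b = a.view(np.uint32).copy()
    # Mask that keeps the sign, the exponent and the first `keepbits` mantissa bits.
    mask = np.uint32((0xFFFFFFFF >> maskbits) << maskbits)
    # Adding half a quantum minus one, plus the lowest kept bit, gives ties-to-even.
    half_quantum_minus_one = np.uint32((1 << (maskbits - 1)) - 1)
    b += ((b >> np.uint32(maskbits)) & np.uint32(1)) + half_quantum_minus_one
    b &= mask
    return b.view(np.float32)


class CallableBitRound:
    """Hypothetical codec that chooses keepbits per chunk with a user function."""

    def __init__(self, keepbits_fn):
        self.keepbits_fn = keepbits_fn  # callable: chunk -> number of mantissa bits

    def encode(self, buf):
        chunk = np.asarray(buf)
        keepbits = int(self.keepbits_fn(chunk))
        return bitround_float32(chunk, keepbits)

    def decode(self, buf, out=None):
        # Rounding is irreversible, so decoding passes the data through.
        # (`out` is accepted for interface parity but ignored in this sketch.)
        return buf


x = np.array([1.0, 3.14159265, -2.7182817, 1234.5678], dtype=np.float32)
print(CallableBitRound(lambda chunk: 0).encode(x).tolist())  # [1.0, 4.0, -2.0, 1024.0]
```

Keeping the per-chunk decision inside `encode` is what would let such a filter run unchanged inside chunk-wise pipelines: each chunk gets its own `keepbits` without a separate pass over the dataset.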