new zarr release / numcodecs #868

sofroniewn · 2020-01-11T03:49:40Z

🐛 Bug

Looks like zarr's new release requires numcodecs > 0.6.4, but we pinned to exclude it see discussion #666. I think we need to resolve this ASAP and then make the 0.2.10 release (which also includes the #866 bug fix). Thoughts @tlambert03 @jni? Has the 0.6.4 numcodecs install problem been resolved? You can see our failing tests in #867.

tlambert03 · 2020-01-11T03:53:34Z

that's funny... I was literally about to hit "submit" on a similar issue.
here is the commit where they bump the dependency, and they do it " to use ensure_text text handling functionality from it". I don't know zarr deeply enough to know how frequently that is likely to happen...

numcodecs 0.6.4 is still not working, (this actually was just checked in the dependabot fork today 😊)... so not sure what to do about that

tlambert03 · 2020-01-11T03:56:12Z

actually... looks like it was just the windows builds that broke with numcodecs... so maybe something has changed in the meantime?

sofroniewn · 2020-01-11T03:57:32Z

ok - also if I remember zarr-developers/numcodecs#210 correctly the issue is just with pip install on our CI, so people in the wild should still be installing ok. Maybe then this can wait till tomorrow

jni · 2020-01-11T04:11:28Z

pip install works fine on my machine. imho the fix is to stop depending on zarr. We can use dask.array.from_zarr instead of zarr.open, and it should work whenever the file has standard compression, if I understand correctly.

jni · 2020-01-11T04:20:47Z

Ah. I just saw why we started using zarr directly: we can tell whether there is a dataset instead of an array. Guh. =)

sofroniewn · 2020-01-11T06:35:06Z

hmm - relevant part of the codebase is here

napari/napari/utils/io.py

Line 118 in e2b90e9

if isinstance(zr, zarr.core.Array):

I agree that we should figure out a way to drop our dependency on both zarr and numcodecs, and just leverage dask, possibly with try catches to do everything, or say that additional functionality belongs in a plugin. We can still have zarr as one of our test dependencies (if that works with our CI) or just use dask functions for all our tests too. I'll think more about this tomorrow too

sofroniewn · 2020-01-11T20:40:16Z

so i'm still really not sure how to deal with this one. On the one hand I like the idea of not explicitly depending on zarr and numcodecs and just leveraging fileIO options here in dask, but then there is no good way to deal with the pyramid case where we want a list of dask arrays out. We could try raising an issue on dask to see if they'd ever be interested in adding the functionality we have here to their from_zarr method, but it might be strange for them to return a list of dask arrays, or a dask array where each subarray along the first axis doesn't have the same shape.

I will also note that when I do pyramid loading from my laptop it worked fine, but when I did it from s3 it couldn't find the components properly. I need to look into that more, but it makes me think that we might not have a perfect solution even now.

tlambert03 · 2020-01-11T22:48:44Z

just played with it a little bit, and it looks like the main thing that fails when using dask.from_zarr on a zarr group is the fact that the .zarray file doesn't exist. Here's a function that works on both zarr single arrays and groups of arrays, without importing zarr:

def read_zarr_dataset(filename):
    if os.path.exists(os.path.join(filename, '.zarray')):
        # load zarr array
        image = da.from_zarr(filename)
        shape = image.shape
    elif os.path.exists(os.path.join(filename, '.zgroup')):
        # else load zarr all arrays inside file, useful for pyramid data
        image = []
        for subd in os.listdir(filename):
            full_subd = os.path.join(filename, subd)
            if os.path.exists(os.path.join(full_subd, '.zarray')):
                image.append(da.from_zarr(full_subd))
        shape = image[0].shape
    else:
        raise ValueError("Not a zarr dataset or group")
    return image, shape

(improvements that could be made to that function to provide dask with the proper component string)... and I think folder structure assumption is probably pretty safe since it's in the Zarr spec
thoughts?

tlambert03 · 2020-01-12T00:08:23Z

one last thought and then I have to sign off for the night...
regarding our CI strategy, I think we can still pip install zarr on CI, (making sure that windows has the required build tools) so we can at least test zarr even if it's not a strict dependency.

sofroniewn · 2020-01-12T00:09:34Z

This looks great! I will play with now and try and break, but I think it is the way to go

sofroniewn · 2020-01-12T00:44:12Z

This is really great - I don't think we need to use the dask component syntax. I had to add a sorted call to the os.listdir to make everything work there, but otherwise its great. I have the code ready to go in a PR that I can make right now if you're out for a bit, but feel free to then take over if you needed.

I'll add here that this solution won't now work for things like s3 links, but that didn't really work properly before either, so not sure we're in a worse place, and we can make a separate push around that remote stuff later

sofroniewn added the bug Something isn't working as expected label Jan 11, 2020

sofroniewn added this to the 0.2.0 milestone Jan 11, 2020

sofroniewn modified the milestones: 0.2, 0.2.10 Jan 11, 2020

tlambert03 mentioned this issue Jan 12, 2020

fix aspect ratio press #871

Merged

1 task

sofroniewn mentioned this issue Jan 12, 2020

Drop zarr and numcodecs dependency #873

Merged

1 task

jni closed this as completed in #873 Jan 12, 2020

tlambert03 mentioned this issue Jan 22, 2020

Bump tests to python 3.8, general tests fix #878

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new zarr release / numcodecs #868

new zarr release / numcodecs #868

sofroniewn commented Jan 11, 2020

tlambert03 commented Jan 11, 2020

tlambert03 commented Jan 11, 2020

sofroniewn commented Jan 11, 2020

jni commented Jan 11, 2020

jni commented Jan 11, 2020

sofroniewn commented Jan 11, 2020 •

edited

Loading

sofroniewn commented Jan 11, 2020

tlambert03 commented Jan 11, 2020 •

edited

Loading

tlambert03 commented Jan 12, 2020

sofroniewn commented Jan 12, 2020

sofroniewn commented Jan 12, 2020

new zarr release / numcodecs #868

new zarr release / numcodecs #868

Comments

sofroniewn commented Jan 11, 2020

🐛 Bug

tlambert03 commented Jan 11, 2020

tlambert03 commented Jan 11, 2020

sofroniewn commented Jan 11, 2020

jni commented Jan 11, 2020

jni commented Jan 11, 2020

sofroniewn commented Jan 11, 2020 • edited Loading

sofroniewn commented Jan 11, 2020

tlambert03 commented Jan 11, 2020 • edited Loading

tlambert03 commented Jan 12, 2020

sofroniewn commented Jan 12, 2020

sofroniewn commented Jan 12, 2020

sofroniewn commented Jan 11, 2020 •

edited

Loading

tlambert03 commented Jan 11, 2020 •

edited

Loading