Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jzarr read/write (close #36) #37

Merged
merged 16 commits into from
Jun 16, 2021
Merged

Conversation

joshmoore
Copy link
Member

The only real issue (repeated from #25) is that I don't see a way to return an array from the read_from_jzarr method which means that non-Python implementations may need to handle the verification code themselves internally.

cc: @constantinpape

String dsname = args[2];
ZarrArray verification = ZarrGroup.open(fpath).openArray(dsname);
int[] shape = verification.getShape();
if (!Arrays.equals(new int[]{}, shape)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am reading this wrong, but why does it look like this line is comparing a new empty array to shape? I was expecting Array.equals(SHAPE, shape)? (I am not a Java dev)

Am I correct that when called with arguments, this program only checks the shape, but not the actual image contents?

What do you think about writing out a NumPy .npy or .npz file from Java? (A quick search indicates there is at least one library available for this (GitHub). That is what the xtensor-zarr executable does (using an .npz writer that was already available in xtensor-io). The array comparison to the reference can then done in Python. Or maybe it is just easier to do the verification in java itself instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am reading this wrong, but why does it look like this line is comparing a new empty array to shape? I was expecting Array.equals(SHAPE, shape)? (I am not a Java dev)

Argh. No, A) you're completely right. I added this before the troubles with <u1 began in order to force a failure so B) why isn't it forcing a failure?!

What do you think about writing out a NumPy .npy or .npz file from Java?

I'm personally skeptical. I think we're going to want to increase the amount of verification done in each language and it seems like that will hit limits via NumPy. But I assume I'm going to lose this one to you and @constantinpape 🙂

Copy link
Member Author

@joshmoore joshmoore Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see: 1ad9af0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally skeptical. I think we're going to want to increase the amount of verification done in each language and it seems like that will hit limits via NumPy. But I assume I'm going to lose this one to you and @constantinpape slightly_smiling_face

I am pretty agnostic on this one. I think it's fine if you validate in java. But we should somehow return the correct error codes to the python test to populate the test summary. (I haven't checked the rest of this PR yet, so maybe you are doing that already ;)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't it forcing a failure?!

I opened joshmoore#1 with a simplified way of calling the subprocess that I am sure is actually running it (it takes a few seconds per test case). That one will fail if the exit code is not zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not have a problem with just doing the validation in Java

use subprocess.check_output to call jzarr test script
test/test_read_all.py Outdated Show resolved Hide resolved
@joshmoore
Copy link
Member Author

Looking at one of these failures (reading js.zr/blosc/lz4/ with Zarr) the first few values differ:
Screen Shot 2021-05-04 at 21 37 17
Screen Shot 2021-05-04 at 21 37 26
Modifying |u1 back to jzarr's <u1 doesn't correct the issue.

cc: @SabineEmbacher

@grlee77
Copy link
Contributor

grlee77 commented May 4, 2021

Looking at one of these failures (reading js.zr/blosc/lz4/ with Zarr) the first few values differ:

Yeah, the values seen there are what would be obtained when trying to convert uint8 values to int8 so that values >=128 overflow and become negative.

For example

import numpy as np
np.asarray([154, 147, 151, 109], dtype=np.int8)

gives

array([-102, -109, -105,  109], dtype=int8)

@SabineEmbacher
Copy link

JZarr is bound to the primitive data types of Java. There is no primitive unsigned byte data type in Java.
Data types which are offered in the JZarr API (see com.bc.zarr.DataType) only serve to be written into the .zarray file so that the reader of the data knows how to interpret it. When the written data is read with Python, the interpretation happens automatically.
If the data is read again with JZarr, the correct interpretation of the raw data is up to the user.

Here is an example of interpreting byte data as unsigned bytes. org.esa.snap.core.datamodel.ProductData.UByte

The same overflow effect will be observed with the u2 and u4 data types.

@joshmoore
Copy link
Member Author

Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types

@joshmoore
Copy link
Member Author

Failures now look to all be related to nesting:

9 failures
FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, blosc]
10268
FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, raw]
10269
FAILED test/test_read_all.py::test_correct_read[read xtensor_zarr (nested) zarr using jzarr, zlib]
10270
FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, blosc]
10271
FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, raw]
10272
FAILED test/test_read_all.py::test_correct_read[read zarr (FSStore, nested) zarr using jzarr, zlib]
10273
FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, blosc]
10274
FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, raw]
10275
FAILED test/test_read_all.py::test_correct_read[read zarr (NestedDirectoryStore, nested) zarr using jzarr, zlib]
which should be fixed once JZarr releases https://github.com/zarr-developers/zarr-python/pull/715

@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.

@grlee77
Copy link
Contributor

grlee77 commented May 5, 2021

@grlee77: don't know if you want to hold off on this PR until then, or introduce a way to exclude/skip/tolerate that part of the matrix.

I am fine with waiting, but if you would rather merge sooner we could introduce an environment variable to enable skipping the known failures.

@SabineEmbacher
Copy link

Thanks for the explanation, @SabineEmbacher. I clearly misunderstood https://jzarr.readthedocs.io/en/latest/datatype.html?highlight=unsigned#data-types

Do you have a suggestion how I could write this in the documentation to avoid misinterpretation?

@joshmoore
Copy link
Member Author

Hey @SabineEmbacher. I guess the question is if you see a way forward here. Or is this simply an incompatibility that will remain between zarr-python and jzarr?

@joshmoore
Copy link
Member Author

Whew. 💚 @grlee77, turning this over to you. I imagine we can continue the discussion around signedness (#37 (comment)) elsewhere.

Copy link
Contributor

@grlee77 grlee77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshmoore, this looks good to me now.

@joshmoore joshmoore merged commit b02f2b8 into zarr-developers:master Jun 16, 2021
@joshmoore joshmoore deleted the jzarr branch June 16, 2021 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants