Support for bioformats2raw transitional layout, etc. #36

Open
krokicki opened this issue Aug 27, 2024 · 8 comments

Comments

@krokicki
Member

When you use bioformats2raw it produces one or more images at the root:
https://ngff.openmicroscopy.org/0.4/#bf2raw
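
For context, here is a minimal sketch of how that layout can be recognized from group attributes alone. Per the linked section of the 0.4 spec, the root group carries a `bioformats2raw.layout` attribute with value 3, and the images live in child groups numbered 0, 1, ...; the root itself has no `multiscales` key. The function names and the plain dict/list inputs are hypothetical stand-ins for illustration and are not part of pydantic-ome-ngff.

```python
def is_bf2raw_root(attrs: dict) -> bool:
    """Return True if the attributes look like a bioformats2raw root group."""
    return attrs.get("bioformats2raw.layout") == 3


def image_paths(child_names: list[str]) -> list[str]:
    """Return the consecutively numbered image groups ("0", "1", ...)."""
    paths = []
    index = 0
    while str(index) in child_names:
        paths.append(str(index))
        index += 1
    return paths


# Hypothetical root attributes and child names; note there is no "multiscales" key.
root_attrs = {"bioformats2raw.layout": 3}
children = ["OME", "0", "1"]

print(is_bf2raw_root(root_attrs))
print(image_paths(children))
```

Only the numbered children are expected to be multiscale images, which is why validating the root as one fails.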

When I try to open one of these zarrs:

from pydantic_ome_ngff.v04.multiscale import Group
import zarr
url = "s3://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr"
zgroup = zarr.open(url)
group_model = Group.from_zarr(zgroup)

it results in an error:

KeyError: 'Failed to find mandatory `multiscales` key in the attributes of the Zarr group at <zarr.storage.FSStore object at 0x7fd55c0f0160>://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr://.'

I think that a Group should never need a multiscales attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with multiscales, so maybe that is a better way to model it.

In any case, I think it should be possible to parse any valid OME-Zarr and it should just fall back to standard Zarr constructs whenever a concept is missing. For example, even if it doesn't explicitly model Plate and Well as classes, they could still be expressed as Group objects.

@d-v-b
Collaborator

d-v-b commented Aug 27, 2024

> I think that a Group should never need a multiscales attribute. The OME-Zarr spec does not have high level types, but one way to interpret the spec is that there is a concept of an "Image" which is a type of group with multiscales, so maybe that is a better way to model it.

Maybe this comes down to me making a bad naming decision: the multiscale.Group class is designed to model exactly the structure described for a multiscale image in the OME-NGFF spec, i.e. a zarr group whose attributes contain a multiscales key with a particular structure. By contrast, the zarr group created by bioformats2raw is not a multiscale group, so multiscale.Group does not model it. To model a zarr group that contains OME-NGFF groups or non-OME-NGFF groups, I would do something like this:

# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "pydantic-ome-ngff",
#   "fsspec[s3]",
# ]
# ///

from typing import Any, Union
from pydantic_ome_ngff.v04.multiscale import Group
from pydantic_zarr.v2 import GroupSpec

import zarr
url = "s3://janelia-flylight-imagery/Fly-eFISH/EASI-FISH_NP_SS/NP01_R1_20230906/NP01_R1_1_1_SS00790_AstA_546_CCHa1_647_100x_LOL.zarr"
zgroup = zarr.open(url)
# a model of a zarr group with any attributes that could contain any zarr group OR an ome-ngff multiscale group
# (Union is used instead of `GroupSpec | Group` because `|` between classes needs Python 3.10+)
ContainsOmeGroup = GroupSpec[Any, Union[GroupSpec, Group]]
group_model = ContainsOmeGroup.from_zarr(zgroup)

It seems like pydantic handles the union properly in my case: group_model.members has 2 elements, one of which is a GroupSpec and the other is a multiscale.Group

I do think my choice of the name Group was unfortunate. Do you think MultiscaleGroup would make things more clear?

@d-v-b
Collaborator

d-v-b commented Aug 27, 2024

And in case it wasn't clear, multiscale.Group is just a subclass of GroupSpec, with some additional validation logic that ensures that the group attributes and the array members are consistent.
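
To make "consistent" concrete, here is a rough sketch of one plausible check of that kind: every dataset path listed in the `multiscales` metadata should name a member of the group. This is an assumption about what such validation could look like, written against plain dicts; `missing_dataset_paths` is a hypothetical helper and is not the validator that pydantic-ome-ngff actually runs.

```python
def missing_dataset_paths(attributes: dict, member_names: set[str]) -> list[str]:
    """List dataset paths declared in the metadata that have no matching member."""
    missing = []
    for multiscale in attributes.get("multiscales", []):
        for dataset in multiscale.get("datasets", []):
            if dataset["path"] not in member_names:
                missing.append(dataset["path"])
    return missing


# Hypothetical metadata declaring two resolution levels, with only one array present.
attrs = {"multiscales": [{"datasets": [{"path": "s0"}, {"path": "s1"}]}]}
print(missing_dataset_paths(attrs, {"s0"}))  # ['s1']
```

A group with no `multiscales` key yields an empty list here, whereas multiscale.Group rejects it outright, as the KeyError at the top of this issue shows.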

@krokicki
Member Author

Yes, MultiscaleGroup would be better naming. I think what would be really nice is a Group (or maybe OmeZarr) class that I can use to import the top level of any Zarr I have, and for it to provide access to any multiscale.Group images underneath it.

@d-v-b
Collaborator

d-v-b commented Aug 27, 2024

> I think what would be really nice is a Group (or maybe OmeZarr) class that I can use to import the top level of any Zarr I have, and for it to provide access to any multiscale.Group images underneath it.

That's a cool idea. I'm not sure there's a simple way to define this as a generic pydantic model, i.e. to get the behavior you want from Model.from_zarr (I will keep thinking about this though), but it would definitely be straightforward to create a function that produces a GroupSpec instance where all sub-groups are either vanilla GroupSpecs or instances of multiscale.Group.
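
A rough sketch of the shape such a function could take, using plain dataclasses as stand-ins for GroupSpec and multiscale.Group so that it runs without any Zarr store. Everything here is hypothetical: `PlainGroup`, `MultiscaleImage`, `build`, and the nested-dict input format are illustrative names, arrays are left out for brevity, and a group is treated as multiscale simply when its attributes contain a `multiscales` key.

```python
from dataclasses import dataclass, field


@dataclass
class PlainGroup:
    """Stand-in for a vanilla GroupSpec."""
    attributes: dict
    members: dict = field(default_factory=dict)


@dataclass
class MultiscaleImage(PlainGroup):
    """Stand-in for multiscale.Group."""


def build(node: dict) -> PlainGroup:
    """Recursively convert a nested dict into typed group objects."""
    attrs = node.get("attributes", {})
    members = {
        name: build(child) for name, child in node.get("members", {}).items()
    }
    # Pick the more specific class only when the multiscales key is present.
    cls = MultiscaleImage if "multiscales" in attrs else PlainGroup
    return cls(attributes=attrs, members=members)


# Hypothetical bioformats2raw-style hierarchy: a plain root with one image.
hierarchy = {
    "attributes": {"bioformats2raw.layout": 3},
    "members": {
        "0": {"attributes": {"multiscales": []}},
        "OME": {"attributes": {"series": ["0"]}},
    },
}
tree = build(hierarchy)
print(type(tree).__name__)
print({name: type(member).__name__ for name, member in tree.members.items()})
```

With real models, the branch would attempt to validate the sub-group as a multiscale.Group and fall back to GroupSpec, instead of checking for the key by hand.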

@d-v-b
Collaborator

d-v-b commented Aug 29, 2024

@krokicki take a look at #37, in particular the docs changes. I put your specific use case in as an example in the docs. Let me know if there's anything I should add or remove there.

@krokicki
Member Author

Nice! That's most of what I wanted. The only thing left is to make a from_zarr method that creates those from disk representations.

@d-v-b
Collaborator

d-v-b commented Aug 30, 2024

> The only thing left is to make a from_zarr method that creates those from disk representations.

The example in the docs does a full round-trip to and from disk, albeit just for a hierarchy defined as a Zarr group that contains OME-NGFF groups OR regular zarr groups. Here's a commented, abridged form of the relevant part of the docs example:

# data structure in memory
multi_image_group = GroupOfMultiscales(members=groups)
# memory -> disk
zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
# disk -> memory
GroupOfMultiscales.from_zarr(zgroup)

let me know if I should make this more clear in the docs.

So the specific use case that motivated you to open this issue should be addressed in #37, but the general problem of defining a model of a Zarr group that could contain an OME-NGFF group at any level remains open.

@d-v-b
Collaborator

d-v-b commented Aug 30, 2024

Actually, we can do this with self-referential types. Here I amend my original example to show how to express the general case of a zarr hierarchy which might contain ome-ngff groups at any level:

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "pydantic-ome-ngff==0.6.0",
#   "fsspec[s3]",
# ]
# ///

from typing import Any, Union
from pydantic_ome_ngff.v04 import MultiscaleGroup, Axis
from pydantic_zarr.v2 import GroupSpec, ArraySpec
import zarr
import numpy as np

# this class is self-referential
class ContainsOmeGroup(GroupSpec[Any, Union[MultiscaleGroup, GroupSpec, ArraySpec, "ContainsOmeGroup"]]):
    ...

axes = [Axis(name='x', type='space'), Axis(name='y', type='space')]

m_group_a = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint8'),
    shapes = [(10,10)],
    paths=['s0'],
    axes=axes,
    scales=[[1,1]],
    translations=[[0,0]],
    order='C')

m_group_c = MultiscaleGroup.from_array_props(
    dtype=np.dtype('uint16'),
    shapes = [(20,20)],
    paths=['s0'],
    axes=axes,
    scales=[[10,10]],
    translations=[[5,5]], 
    order='C')

# this is a sub-group that contains a multiscale group
group_b = GroupSpec(attributes={'foo': 10}, members={'b_c': m_group_c})
multi_image_group = ContainsOmeGroup(members={'a': m_group_a, 'b': group_b})
store = zarr.MemoryStore()

zgroup = multi_image_group.to_zarr(store, path='multi_image_group')
g = ContainsOmeGroup.from_zarr(zgroup)
print(f"{type(g.members['a'])=}")
print(f"{type(g.members['b'].members['b_c'])=}")
"""
type(g.members['a'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
type(g.members['b'].members['b_c'])=<class 'pydantic_ome_ngff.v04.multiscale.MultiscaleGroup'>
"""
