v3 core spec: Consider to drop /meta prefix, have file at URI #177
Comments
Due to the global storage transformers proposal #182 from @rabernat, my opinion on this has changed a bit:
In summary, I think zarr v3 should just use a zarr v2-like naming scheme by default, and rely on a storage transformer extension to provide an alternative naming scheme if it is desired. This would also allow the root metadata file, and the concept of a root, to be eliminated entirely --- instead just the group metadata file could be used. |
There's a lot to unpack here, but let me just respond to one specific point first.
Even on traditional POSIX filesystems, there is a strong benefit to separating metadata and chunks into separate directories. Here's an example: we frequently use Zarr on NASA's Pleiades supercomputer. This system has a large shared filesystem which performs well for some operations. However, listing directories with a large number of files in them can be extremely slow. Separating metadata from chunks allows us to quickly discover the structure of a Zarr hierarchy without having to list millions of chunks. |
Interesting idea, I guess that's also where the motivation for #184 is coming from, @jbms?
True, especially with something like consolidated metadata.
There's a similar discussion going on for OME-NGFF: 144 |
While it might be tricky to accomplish with regular shell utilities like GNU find, I think it can be done from zarr implementations relatively easily: start at the root, check if the current directory is an array based on presence of array metadata, if not, list it and recurse on subdirectories. |
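The traversal described above can be sketched roughly as follows. This is only an illustration on a local filesystem: the metadata file name `.zarray` is an assumption borrowed from the v2 layout, and `find_arrays` is a made-up helper, not part of any zarr implementation.

```python
import os

# Assumption: arrays are marked by a v2-style metadata file inside their directory.
ARRAY_META = ".zarray"


def find_arrays(root):
    """Return the paths (relative to root) of all arrays below root.

    A directory that contains the array metadata file is treated as an array
    and is never listed, so its chunks are never enumerated.
    """
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        if os.path.exists(os.path.join(current, ARRAY_META)):
            found.append(os.path.relpath(current, root))
            continue  # do not descend into the array's chunk files
        # Not an array: list it and recurse into its subdirectories.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir():
                    stack.append(entry.path)
    return sorted(found)
```

The existence check costs one extra request per directory, but it is what keeps the walk from ever listing a directory full of chunks.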
I agree with this.
I like this idea in general. However, I have found it to be problematic that the entry point to a zarr store is a directory, not an actual file, because in some contexts (e.g. S3) it is impossible to tell whether a directory exists or not. Given that the first thing that all V3 stores must do is open the root metadata document, wouldn't it make more sense to use that as the actual file / URI identifying the zarr store, e.g.
This opens the question of whether the root group actually needs to be named
This is an interesting suggestion which I think we should examine closely in today's call. It would be particularly helpful if anyone could recall the original use cases that motivated the new layout proposal. I am trying to think of one (e.g. a case where you are forced to list a directory with millions of chunks in it just to find a metadata document), but I am not coming up with anything. 🤔
What is this hypothetical store that doesn't support directory-based listing? As you mentioned above, it's not actually S3 / GCS, since they effectively support directory listing. So is it even an important case to consider? In your previous point, you used the fact that S3 does support directory listing as an argument for reverting to the unified storage layout. If your million arrays are sufficiently nested, traversing the directory tree should be feasible, no?
This makes sense to me. Basically a generalization of the global storage transformers idea (#182) to act at the group level. Perhaps one path forward would be to go back to the unified storage layout as the default, and then redefine the separate data / meta layout as a global (or possibly group-level) storage transformer (here called {
"storage_transformers": [
{
"extension": "https://purl.org/zarr/spec/storage_transformers/storage_layout/1.0",
"configuration": {
"metadata_path": "./meta/root/",
"chunk_path": "./data/root/"
}
}
]
} This could be used to provide all sorts of redirection to different storage locations, e.g. pointing at a completely different service. (I would actually prefer this over being able to specify a separate |
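To make the idea concrete, here is a minimal sketch of how a transformer with the configuration above might map keys from a unified layout onto the two prefixes. Nothing here is specified anywhere: `redirect_key` is a hypothetical helper, and the rule "a key ending in `.json` is metadata" is an assumption made only for the illustration.

```python
import posixpath

# The "configuration" object from the proposed transformer above.
config = {"metadata_path": "./meta/root/", "chunk_path": "./data/root/"}


def redirect_key(key, config, base=""):
    """Map a key from the unified layout to its redirected storage location."""
    # Assumption: metadata documents are recognised by a ".json" suffix.
    is_metadata = key.endswith(".json")
    prefix = config["metadata_path"] if is_metadata else config["chunk_path"]
    # Resolve the "./" prefix relative to the location holding the config.
    return posixpath.normpath(posixpath.join(base, prefix, key))
```

With this rule, `redirect_key("foo/bar/zarr.json", config)` resolves under `meta/root/` while `redirect_key("foo/bar/0/0", config)` resolves under `data/root/`. Pointing at a different service would need a richer notion of location than a path prefix.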
Crosslinking the discussion from the ZEP meeting yesterday: Two action items regarding this issue: The agreement seemed to be to have something similar to the v2 format, but with an extra folder for the chunk-data per array. For an array
|
Before I create a PR, I wanted to discuss a few options for the new storage layout: The basic assumptions I'm making for the default storage layout are:
In my mind, the key question to consider is whether the directory name should encode whether a group member is an array, a group, or perhaps something else, like a bare key-value store prefix to be used by some extension, such as for storing meshes. If we don't encode the group member type in the name (as in zarr v2), then we have a layout like: Option 1:
Pros:
Cons:
If we encode the group member type in the name, then we have: Option 2:
Option 2A: (use a file extension only for arrays, not for subgroups, or vice versa)
Pros:
Cons:
With this storage layout, there are two possible solutions to the issue of avoiding having a group and array with the same name: It is not clear to me how important the "list group members including their type" operation is; if that operation is important, then Option 2 is better. But currently I lean towards Option 1 since it is simpler, and Option 2 could potentially be provided via an extension. Note that the chunks of an array could optionally be nested within a
This naming choice appears to be orthogonal to the choice between Option 1 and Option 2. I don't think it provides any benefit to a zarr implementation, but may provide a better user experience for users interactively browsing a store, e.g. using path completion --- without a "chunks" prefix, it is easy for a user to accidentally list all the chunks of an array. The "chunks" prefix would serve as a warning, and users could avoid accidentally listing the chunks. I don't have a strong opinion on whether to use the "chunks/" prefix. |
Thanks for the great write-up @jbms!
+1 from me.
I'd prefer to use a prefix (maybe rather PS: I just opened a separate issue about dropping the entrypoint metadata / explicit root: #192 I think it's fair to discuss those two things separately, but 👍 for waiting on a decision there before preparing a PR. |
Another argument in favor of this is that you still have to read each individual metadata file to determine such information as data type, shape, etc., which in practice is likely to be more useful even than checking for group vs. array, and chicken's consolidated metadata solves group vs array also.
You could have an empty array with no chunks, and on s3/gcs where there are no real directories, there would be no _chunks directory. Also due to the layout, to avoid also listing chunks on s3 and gcs, a separate list request would be required for each array, and list operations cost 10x as much as read operations.
|
As we discussed at the last ZEP meeting, a central tradeoff when deciding whether to keep the split
To understand this tradeoff, I did a benchmarking experiment. In this experiment, I create very large hierarchies of documents in S3 using different levels of nesting and compare the time it takes to list all documents vs. traversing the hierarchy. The code is all async and is probably as performant as we can get without a lot more work. Here are some of the interesting results:
Listing time as a function of number of objects
Listing time sensitivity to nesting
In the next figures, we have the same number of objects nested in different ways, from flat (depth = 1) to as deep as possible (using a binary tree). We can compare how the different strategies perform.
Conclusions
The concern about the cost of recursive listing is real. However, I am not convinced that it is really necessary for a client to discover the entire hierarchy immediately upon opening a store. There is no way to present such a large hierarchy to a user all at once anyways. Instead, it would be better if clients would use a lazy strategy, only performing Based on this, I think we can move forward with the proposal to drop the separate |
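A lazy strategy of the kind suggested here could look roughly like the sketch below. It is not the benchmark code: `CountingStore` is an in-memory stand-in for an object store with delimiter-style listing, `LazyGroup` is a hypothetical client class, and the metadata name `zarr.json` is an assumption.

```python
class CountingStore:
    """In-memory key-value store that counts delimiter-style list requests."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.list_calls = 0

    def list_dir(self, prefix):
        """Return the immediate child names under prefix (one list request)."""
        self.list_calls += 1
        children = set()
        for key in self.keys:
            if key.startswith(prefix):
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)


class LazyGroup:
    """Group handle that lists its members only when they are first requested."""

    def __init__(self, store, path=""):
        self.store = store
        self.path = path  # "" for the root, otherwise ends with "/"
        self._children = None

    def children(self):
        if self._children is None:  # list once, then serve from the cache
            names = self.store.list_dir(self.path)
            self._children = [n for n in names if n != "zarr.json"]
        return self._children

    def __getitem__(self, name):
        # Opening a member does not trigger any listing.
        return LazyGroup(self.store, f"{self.path}{name}/")
```

Opening the root costs no list request at all, and each group is listed at most once, only when a user actually asks for its members.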
Thanks a ton for the experiments, @rabernat!
Very good point! I find the argument that the |
Can we talk about this a bit more? I think I understand the reason why this is desirable: you can copy an array or group by copying a directory. On the other hand, I find myself quite liking the V3 layout for other reasons
Having the metadata document just above the directory in the hierarchy resolves many of the issues related to exploring and concurrently writing stores raised in #177 (comment). The main pros:
Cons:
I feel like we should have a bit more discussion about these tradeoffs before abandoning this layout. I have serious concerns about Option 1:
The requirement that the metadata document must be read to discover the contents of a store, together with the V3 change that array / group metadata and user attributes are all stored in the same file, can I believe lead to unacceptable performance degradations in very common use cases. It's typical for NetCDF type datasets to be stored as a flat group with 100s of arrays. Each of these arrays can have a lot of user metadata. This proposal would require that all of that be read in just to open the store. So if we are going with one of the proposals above, I would favor option 2. But I actually think I favor option 0 (sticking with the existing V3 layout), minus the |
In regards to exploring and concurrently writing a zarr hierarchy, it seems to me that it has similar trade-offs as my option 2 --- we can distinguish groups and arrays when listing a group, but we then have to worry about a group and array having the same name. Are there things I'm missing?
I guess the idea here is that when listing the location of a dataset, e.g. on a website, you would provide a URL to the metadata file, rather than a URL to the directory? That way if they attempt to just navigate to the URL in a browser, it will return the JSON metadata rather than either a directory listing or an error? This can already be done with zarr v2 and with both my proposed option 1 and 2, but I guess the difference here is that under your proposed scheme zarr v3 implementations might expect to be passed the path to the metadata file rather than the path to the directory? I think there is a risk that this could be more confusing to users not familiar with zarr v3: specifying a directory very clearly indicates that the group/array is represented as a collection of files. If you specify a URL to just a single json file, someone may visit it in their browser, download it to their local machine, and then later open it and find an empty group or empty array since they did not download the rest of the files.
One slightly awkward aspect of this approach is that even a standalone array or the root group, needs to have a "name". For example, with zarr v2 or my proposed option 1 and option 2, we can have a cloud storage bucket or a zip file that contains just an array or a group at its root. With this approach we instead would need to give it a name, like "array" or "root".
We also lose the ability to atomically rename an array or group on stores, such as the filesystem, that support atomic renames of individual files/directories. With two things to rename, if the program crashes in the middle, we end up with a corrupt store.
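The difference can be illustrated on a local filesystem. This is only a sketch: both helpers are hypothetical, and the `.array.json` suffix for a metadata document stored next to the directory is taken from the example URI in the issue description.

```python
import os


def rename_colocated(root, old, new):
    """Metadata lives inside the node's directory: one rename moves everything."""
    os.rename(os.path.join(root, old), os.path.join(root, new))


def rename_sibling_metadata(root, old, new, suffix=".array.json"):
    """Metadata document sits next to the directory: two separate renames."""
    os.rename(os.path.join(root, old + suffix), os.path.join(root, new + suffix))
    # A crash at this point leaves the renamed metadata document without its
    # directory, which is the corrupt state described above.
    os.rename(os.path.join(root, old), os.path.join(root, new))
```

On POSIX filesystems a single `os.rename` of a directory is atomic, so the first variant either happens completely or not at all. The second has no such guarantee across its two calls.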
Merely opening a group should not necessarily involve listing its contents. Furthermore, it seems that in many cases where you do want to list the contents of a group, you would also care about other information, such as the data types and shapes of each array, not merely the names of the arrays. If the default representation makes that too expensive, consolidated metadata would allow you to both efficiently determine the names of the arrays, and also determine their shapes, etc. However, you have a good point that in some cases the metadata file may become large and then it is problematic to read the whole thing just for one piece of information. That issue also applies more generally, though --- we may only wish to read the array, and not care about its user-defined attributes. Or we may only wish to determine its shape and data type and not read it at all. Splitting the user-defined attributes from the zarr-defined metadata, as in v2, would be one solution, but there may be others.
|
Just to summarize, I think this is set, right? We are in favor of dropping the I have no strong opinion about the filenames. I just think we should include a clear argument for any deviations from v2 in the ZEP. |
I am 👍 on dropping the separation, as long as we create a group-level storage transformer / extension that allows you to bring it back. However, given the fundamental nature of this change, I would love to hear a few more opinions from e.g. @zarr-developers/python-core-devs. Does anyone strongly want to keep the separation as the default in V3? |
I wasn't sure what this means exactly, so I don't know if it is or isn't possible. Certainly, traversing the sets of directories and/or possibly listing big directories is not something we want to do. In fact, if we can do without any directory listing actions ever, that's the best. |
From an implementation point of view, and kerchunk's interaction with this, an unseparated layout more similar to v2 is more convenient. That, by itself, is not a great motivator. |
At today's ZEP meeting, @jstriebel, @joshmoore, @jbms and myself all seemed to be in agreement that option 1 is preferable. I also favor placing chunks in a separate subdir within an array directory. We also agreed it would be useful to sketch out the algorithm one would use to recursively browse / explore a store. |
Option 1 means a zarr.json file in every group and array alongside the chunks, and that subgroups or array members of a group are subdirectories? I can support this model, it is very similar to v2. I assume the plan is to have the option for an extension to implement things like consolidated metadata. Question: does a group metadata list its submembers, or is this still done by directory listing? |
Still done by directory listing, otherwise adding members to a group concurrently from multiple machines may be problematic. An extension/storage transformer/storage adapter like kerchunk |
To summarize, we agreed to drop the
for a group One important point @joshmoore emphasized is to add a prefix for the |
A group or array called exactly "zarr.json" seems unlikely, but one called "chunks" is definitely possible. |
|
OK, fair point. "_" is also used by OSs, I think, no? And we shouldn't use capitals, URL-sensitive chars or non-ascii, just in case that's a problem :) |
Not that I'm aware of, but that doesn't mean much ;)
👍 |
@martindurant Do you have an example where |
I thought the DS_Store, but I see it is a dot. Maybe windows does this for its directory config files? Since I can't find it, assume I am wrong and |
Following the discussion at the ZEP meeting on 2023-01-12, @jstriebel and I discussed this further over email and have the following proposal:
Rationale: On some filesystems, especially distributed filesystems, directories are somewhat expensive; in fact we have run into issues with this at Google when creating a large number of arrays at once. On those systems, it would be unfortunate to have to create two or three directories for every array instead of just one (one extra for chunks, and possibly one extra for extension metadata, if using e.g. "zarr.extensions/" as a prefix). By slightly tweaking the chunk key encoding compared to the current v3 proposal, the chunks end up in a separate directory by default, but users can choose Key properties:
|
I really like this proposal except for one thing:
Can we consider decoupling the questions of should ALL chunks be in a subdirectory? from should we store chunks in a nested series of directories? I can imagine that I might like to do The rationale is as follows. We can imagine that the array directory might have other documents in it related to extensions that are discovered by listing, e.g.
I want to absolutely avoid having to list a directory with millions of chunks in it. But I still might not want a nested hierarchy of chunks. Does that make sense? |
As far as extensions, I imagined that they would always be specified in the |
Yes this makes sense. If we require this I can drop my objection.
Solving this in a general way (beyond the extension example) requires us to resolve the issues that motivated consolidated metadata (#136). For our applications, we must be able to discover the children of a group over HTTP. All solutions I can think of here require explicitly enumerating the children in |
Not sure there is a single best/perfect solution. Options I can think of, other than the solution you mentioned, are:
|
Thanks Jeremy! One thing to mention is that this HTTP use case is mostly a sort of archival, read-only scenario. So I think it should be acceptable for our purposes to enumerate the children explicitly. I'm going to propose an extension to enable this. |
fsspec supports directory listing on HTTP servers that produce a list of links for child folders/files like this |
@jbms and I discussed that this might not be as easy as we thought first, since we don't have a clear entrypoint for a group anymore. This means that the following fs path can be opened differently:
Depending on the (user-defined) entry-point of the hierarchy the escaping needs to be done or not, which seems confusing. Also, an array in a path Since this would be a problem with all escaping schemes we tend towards disallowing a prefix again. Just
|
Citing @jbms from #149 (comment):
Some more comments from discussion rounds I remember:
s3://bucket-name/key-name/name-of-the-zarr-path.zarr/hierarchy/path/my-data.array.json could be a URI to point to the my-data array at the path hierarchy/path/my-data of the zarr hierarchy which is placed under s3://bucket-name/key-name/name-of-the-zarr-path.zarr/. (Just made up a URI here as an example, feel free to discuss this in v3: Define standard "URL" syntax for referencing a specific array, group, attribute within a zarr repository #132). Using such a URI schema and dropping the /meta prefix, one could find the relevant file (at least for filesystem or http stores or using appropriate clients for other stores).
The reason to keep /meta and /data separate is to be able to list all meta keys without also listing the chunk files, for efficiency reasons. If it's possible to exclude directories for key-listings for most relevant stores, only using a prefix for the chunk files would still give this efficiency, but it's unclear if that's the case.
Pinging discussion participants I remember so far: @joshmoore @jbms @rabernat @WardF