-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing a name with colon :
fails in zarr in windows
#219
Comments
Thanks for reporting this issue. Unfortunately I don't think there is much we can do about this. ":" is not allowed in file names on Windows, and since Zarr translates the TimeSeries to a folder on disk, the name must be a valid name. |
Perhaps we can make user experience better? A similar error is hdf5 limitation about slashes in names: from pynwb.testing.mock.file import mock_NWBFile
from pynwb.image import TimeSeries
nwbfile = mock_NWBFile()
time_series = TimeSeries(name="Name/name", rate=1.0, data=[1, 2, 3], unit="unit")
nwbfile.add_acquisition(time_series)
nwbfile_path = "test.nwb"
from pynwb import HDMFIO
with HDMFIO(path=nwbfile_path, mode="w") as io:
io.write(nwbfile) Which outputs a value error:
Perhaps we could have something similar? More deeply this seems like a leaky abstraction where an implementation detail (the naming scheme in the backend) is leaked to something that the user can control directly. It is an interesting coupling. |
Adding an error check in ZarrIO to improve reporting sounds reasonable. We would need to add a check for both groups and datasets, so we'd probably need a helper function to check if a path-name is permitted.
There are certain limitations of storage backends that are tricky to work around. One approach could be to have docval validate names when creating new objects (i.e., before we call write) to prevent creation of bad names. |
Make sense. Thinking about this, this raises other questions:
|
@rly @bendichter thoughts on this? |
I think for 1 yes, we should prohibit ":" on other operating systems, as I would argue cross-OS compatibility is a core feature of NWB. For 2, I think maybe not. I don't think interoperability between backends is necessarily a core feature, though perhaps we should put this as an NWB Best Practice. there's another question that comes up for me: |
I agree with @bendichter on 1 and 3 and am ambivalent on 2. Allowing but discouraging people from putting ":" in the name feels a little in conflict with preventing people from requiring the name to have a ":". I understand the reason. A simpler rule is to just not allow ":" in the name for all backends moving forward. I would say most, if not all, people do not care much about the names of their NWB objects. |
I think that one advantage of disallowing |
My thoughts on (2): I'm thinking of a use-case where the user wants to write HDF5-based NWB files. They want to use object names that reflect their source data structure in some specific way that includes the On the other hand, we may down the line decide it's better to store all NWB data on DANDI as Zarr objects, and it might then be much nicer if we already had constraints in place that allowed us to do that more easily. I see some advantages and disadvantages. I tend to err on the side of freedom, since you can never fully anticipate every use-case and you are potentially causing a big headache for someone by making constraints you don't need to. |
We discussed this at our issue-a-thon today and 4 out of 4 voted for strictness: raise an error when trying to create an |
This should be implemented in HDMF in a major release. |
This produces an error on windows (see the name of the TimeSeries)
Error trace:
Click here for error trace
{ "name": "KeyError", "message": "'acquisition/Name: name/.zgroup'", "stack": "--------------------------------------------------------------------------- NotADirectoryError Traceback (most recent call last) File c:\\Users\\heberto\\miniconda3\\envs\ euro\\Lib\\site-packages\\zarr\\storage.py:1136, in DirectoryStore.__setitem__(self, key, value) 1135 try: -> 1136 os.makedirs(dir_path) 1137 except OSError as e:File :225, in makedirs(name, mode, exist_ok)
NotADirectoryError: [WinError 267] The directory name is invalid: 'c:\\Users\\heberto\\development\
euroconv\\test.nwb\\acquisition/Name: name'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Cell In[5], line 13
10 nwbfile_path = "test.nwb"
12 with NWBZarrIO(path=nwbfile_path, mode="w") as io:
---> 13 io.write(nwbfile)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\utils.py:668, in docval..dec..func_call(*args, **kwargs)
666 def func_call(*args, **kwargs):
667 pargs = _check_args(args, kwargs)
--> 668 return func(args[0], **pargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf_zarr\backend.py:277, in ZarrIO.write(self, **kwargs)
267 cache_spec, number_of_jobs, max_threads_per_process, multiprocessing_context = popargs(
268 "cache_spec", "number_of_jobs", "max_threads_per_process", "multiprocessing_context", kwargs
269 )
271 self.__dci_queue = ZarrIODataChunkIteratorQueue(
272 number_of_jobs=number_of_jobs,
273 max_threads_per_process=max_threads_per_process,
274 multiprocessing_context=multiprocessing_context,
275 )
--> 277 super(ZarrIO, self).write(**kwargs)
278 if cache_spec:
279 self.__cache_spec()
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\utils.py:668, in docval..dec..func_call(*args, **kwargs)
666 def func_call(*args, **kwargs):
667 pargs = _check_args(args, kwargs)
--> 668 return func(args[0], **pargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\backends\io.py:99, in HDMFIO.write(self, **kwargs)
97 """Write a container to the IO source."""
98 f_builder = self.__manager.build(container, source=self.__source, root=True)
---> 99 self.write_builder(f_builder, **kwargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\utils.py:668, in docval..dec..func_call(*args, **kwargs)
666 def func_call(*args, **kwargs):
667 pargs = _check_args(args, kwargs)
--> 668 return func(args[0], **pargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf_zarr\backend.py:440, in ZarrIO.write_builder(self, **kwargs)
436 f_builder, link_data, exhaust_dci, export_source, consolidate_metadata = getargs(
437 'builder', 'link_data', 'exhaust_dci', 'export_source', 'consolidate_metadata', kwargs
438 )
439 for name, gbldr in f_builder.groups.items():
--> 440 self.write_group(
441 parent=self.__file,
442 builder=gbldr,
443 link_data=link_data,
444 exhaust_dci=exhaust_dci,
445 export_source=export_source,
446 )
447 for name, dbldr in f_builder.datasets.items():
448 self.write_dataset(
449 parent=self.__file,
450 builder=dbldr,
(...)
453 export_source=export_source,
454 )
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\utils.py:668, in docval..dec..func_call(*args, **kwargs)
666 def func_call(*args, **kwargs):
667 pargs = _check_args(args, kwargs)
--> 668 return func(args[0], **pargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf_zarr\backend.py:527, in ZarrIO.write_group(self, **kwargs)
525 if subgroups:
526 for subgroup_name, sub_builder in subgroups.items():
--> 527 self.write_group(
528 parent=group,
529 builder=sub_builder,
530 link_data=link_data,
531 exhaust_dci=exhaust_dci,
532 )
534 datasets = builder.datasets
535 if datasets:
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf\utils.py:668, in docval..dec..func_call(*args, **kwargs)
666 def func_call(*args, **kwargs):
667 pargs = _check_args(args, kwargs)
--> 668 return func(args[0], **pargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\hdmf_zarr\backend.py:522, in ZarrIO.write_group(self, **kwargs)
520 group = parent[builder.name]
521 else:
--> 522 group = parent.require_group(builder.name)
524 subgroups = builder.groups
525 if subgroups:
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\hierarchy.py:1025, in Group.require_group(self, name, overwrite)
1000 def require_group(self, name, overwrite=False):
1001 """Obtain a sub-group, creating one if it doesn't exist.
1002
1003 Parameters
(...)
1022
1023 """
-> 1025 return self._write_op(self._require_group_nosync, name, overwrite=overwrite)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\hierarchy.py:952, in Group._write_op(self, f, *args, **kwargs)
949 lock = self._synchronizer[group_meta_key]
951 with lock:
--> 952 return f(*args, **kwargs)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\hierarchy.py:1032, in Group._require_group_nosync(self, name, overwrite)
1030 # create terminal group if necessary
1031 if not contains_group(self._store, path):
-> 1032 init_group(
1033 store=self._store, path=path, chunk_store=self._chunk_store, overwrite=overwrite
1034 )
1036 return Group(
1037 self._store,
1038 path=path,
(...)
1043 zarr_version=self._version,
1044 )
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\storage.py:677, in init_group(store, overwrite, path, chunk_store)
674 store["zarr.json"] = store._metadata_class.encode_hierarchy_metadata(None) # type: ignore
676 # initialise metadata
--> 677 _init_group_metadata(store=store, overwrite=overwrite, path=path, chunk_store=chunk_store)
679 if store_version == 3:
680 # TODO: Should initializing a v3 group also create a corresponding
681 # empty folder under data/root/? I think probably not until there
682 # is actual data written there.
683 pass
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\storage.py:738, in _init_group_metadata(store, overwrite, path, chunk_store)
736 key = _prefix_to_group_key(store, _path_to_prefix(path))
737 if hasattr(store, "_metadata_class"):
--> 738 store[key] = store._metadata_class.encode_group_metadata(meta)
739 else:
740 store[key] = encode_group_metadata(meta)
File c:\Users\heberto\miniconda3\envs
euro\Lib\site-packages\zarr\storage.py:1139, in DirectoryStore.setitem(self, key, value)
1137 except OSError as e:
1138 if e.errno != errno.EEXIST:
-> 1139 raise KeyError(key)
1141 # write to temporary file
1142 # note we're not using tempfile.NamedTemporaryFile to avoid restrictive file permissions
1143 temp_name = file_name + "." + uuid.uuid4().hex + ".partial"
KeyError: 'acquisition/Name: name/.zgroup'"
}
Whereas this works prefectly:
The text was updated successfully, but these errors were encountered: