a collection of .nwb files of small sizes with various "features" #1087

Closed
yarikoptic opened this issue Oct 8, 2019 · 5 comments
Labels
category: enhancement improvements of code or code behavior


@yarikoptic
Contributor

At the moment there is already a wide range of .nwb files to be found in the wild. They differ in NWB version, in the data types they contain, etc. For demonstration and testing purposes it would be nice to collate a collection of sample .nwb files, including ones representing NWB 1.0 flavors (possibly with conversion scripts from 1.0 to 2.0) and modern NWB files.
Unfortunately, all of the files in the wild are too large to form a collection that would be feasible to use in regression tests etc.

I wondered if it would be feasible to "minimize" existing files. E.g., given an .nwb file (possibly 1.0), would there be a legitimate way to minimize it (strip multiple sessions/subjects down to a single one, reduce the number of measurements to a single one, etc.) while preserving at least the top levels of the hierarchy?
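A minimal sketch of the "strip multiple sessions/subjects to a single one" idea, under a loudly simplifying assumption: the file's hierarchy is modeled as nested Python dicts (groups) whose non-dict values are datasets, instead of being read through h5py or pynwb. The function `prune_repeated` and the group name `sessions` are hypothetical; in a real NWB file one would first have to decide which groups hold repeated children.

```python
from typing import Any, Dict, Iterable

# Hypothetical in-memory stand-in for an HDF5/NWB hierarchy:
# dicts are groups; any other value (e.g. a list) is a dataset.
Tree = Dict[str, Any]


def prune_repeated(tree: Tree, repeated: Iterable[str]) -> Tree:
    """Return a pruned copy of `tree`.

    Every group whose name is in `repeated` keeps only its first child
    (in insertion order); all other groups are copied in full, so the
    top levels of the hierarchy are preserved. Dataset objects are
    shared with the input, not copied.
    """
    names = frozenset(repeated)

    def walk(node: Any, name: str) -> Any:
        if not isinstance(node, dict):
            return node  # dataset: returned unchanged
        items = list(node.items())
        if name in names:
            items = items[:1]  # e.g. keep one session out of many
        return {key: walk(child, key) for key, child in items}

    return walk(tree, "")


example = {
    "sessions": {"s1": {"data": [1, 2, 3]}, "s2": {"data": [4, 5]}},
    "general": {"subject": {"species": "mouse"}},
}
print(prune_repeated(example, ["sessions"]))
# {'sessions': {'s1': {'data': [1, 2, 3]}}, 'general': {'subject': {'species': 'mouse'}}}
```

Groups are matched by name at any depth, which keeps the sketch short; a real tool would more likely match on full paths or on NWB neurodata types.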

If not, how could such a collection of representative .nwb files be established?

yarikoptic added the category: enhancement improvements of code or code behavior label Oct 8, 2019
yarikoptic mentioned this issue Oct 8, 2019
@oruebel
Contributor

oruebel commented Oct 8, 2019

including the ones representing NWB 1.0 flavors

It will be hard to support all flavors of NWB 1.0 as the flavors are just too different, and most files I have encountered are also only NWB 1.0 in spirit.

it would be feasible to "minimize" existing files

For testing you could just strip out the bulk data, i.e., replace the large arrays that store the majority of the data with smaller versions.
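A minimal sketch of that suggestion, using a similar simplifying assumption: groups are nested dicts and datasets are plain Python lists, so "a smaller version" is just the first `max_len` elements. The function `strip_bulk` and its `max_len` default are hypothetical. With h5py one would instead copy a slice such as `dset[:max_len]` of each large dataset into a new file, together with its attributes.

```python
from typing import Any, Dict, List, Tuple


def strip_bulk(
    tree: Dict[str, Any], max_len: int = 10
) -> Tuple[Dict[str, Any], List[Tuple[str, int]]]:
    """Copy `tree`, truncating every list-valued dataset longer than
    `max_len` to its first `max_len` elements.

    Returns the copy plus a list of (path, original_length) pairs for
    each dataset that was shortened. Non-list values are kept as-is.
    """
    truncated: List[Tuple[str, int]] = []

    def walk(node: Any, path: str) -> Any:
        if isinstance(node, dict):
            return {key: walk(child, f"{path}/{key}") for key, child in node.items()}
        if isinstance(node, list) and len(node) > max_len:
            truncated.append((path, len(node)))
            return node[:max_len]  # keep only the leading elements
        return node

    return walk(tree, ""), truncated


original = {"acquisition": {"ts": {"data": list(range(1000)), "unit": "V"}}}
small, report = strip_bulk(original, max_len=5)
print(small["acquisition"]["ts"]["data"])  # [0, 1, 2, 3, 4]
print(report)  # [('/acquisition/ts/data', 1000)]
```

Truncating every dataset from index 0 keeps equally long parallel arrays, such as `data` and `timestamps`, aligned with each other, and the report records the original lengths so a minimized file can be traced back to its source.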

how potentially such a collection of representative .nwb files could be established?

Currently we create test files from the integration test suite. I think a good way may be to have tutorials that show best practices on how to generate good NWB files and then use those for testing.

@bendichter
Contributor

We have created something like this already. Maybe it will fit your needs. Near the bottom of https://www.nwb.org/example-datasets/ you'll find

PyNWB Test Data
PyNWB generates as part of its test suite a collection of small NWB files with synthetic test data.

The download link leads to this Google Drive folder, which has NWB files generated by previous versions of pynwb. The idea is to cache them in the CI and use them to ensure that we do not break backwards compatibility, though I admit I never followed up to check whether that was implemented in the CI tests. We could add an NWB 1.0 file and make sure that we catch it and throw an informative error message (#1086). Would these work for your purposes?
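One way the "catch it and throw an informative error" check could look is sketched below. It assumes the version string has already been read from the file: NWB 2.x files store it in the root-level `nwb_version` attribute, while the `NWB-1.0.6`-style string used here for 1.x files is an assumption about how older files label themselves. `check_nwb_version` and `UnsupportedNWBVersion` are hypothetical names, not part of the pynwb API.

```python
import re
from typing import Tuple


class UnsupportedNWBVersion(ValueError):
    """Hypothetical error raised for files that predate NWB 2.0."""


def check_nwb_version(version: str) -> Tuple[int, int, int]:
    """Parse strings like "2.1.0" or "NWB-1.0.6" into (major, minor, patch).

    Raises UnsupportedNWBVersion with an actionable message for NWB 1.x,
    and a plain ValueError if no version number can be found at all.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"cannot parse NWB version string {version!r}")
    # A missing patch component (e.g. "2.0b") is treated as 0.
    major, minor, patch = (int(g) if g is not None else 0 for g in match.groups())
    if major < 2:
        raise UnsupportedNWBVersion(
            f"file declares NWB version {version!r}; only NWB 2.x files are "
            "supported, so convert the file to NWB 2.x before reading it"
        )
    return major, minor, patch
```

Subclassing `ValueError` lets existing callers that already catch `ValueError` keep working, while a regression test can assert on the more specific type.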

@yarikoptic
Contributor Author

Yes! Thank you @bendichter, that sounds like exactly what I need! Adding an NWB 1 file there to use in a test would be great!

@yarikoptic
Contributor Author

And thank you @oruebel for your feedback! I might come back to the idea of a utility to strip bulk data from files; it would be useful to be able to share minimized versions of files found in the wild for troubleshooting purposes.
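Before stripping anything, such a utility would need to know which datasets actually dominate a file. A minimal sketch, again assuming a nested-dict model with list-valued datasets and measuring size as an element count; `largest_datasets` is a hypothetical helper. With h5py, a per-dataset byte estimate could instead come from `dset.size * dset.dtype.itemsize`, visiting every dataset with `File.visititems`.

```python
from typing import Any, Dict, List, Tuple


def largest_datasets(tree: Dict[str, Any], top: int = 5) -> List[Tuple[str, int]]:
    """Return up to `top` (path, element_count) pairs for the list-valued
    datasets in `tree`, largest first (ties keep traversal order)."""
    sizes: List[Tuple[str, int]] = []

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            sizes.append((path, len(node)))

    walk(tree, "")
    sizes.sort(key=lambda item: item[1], reverse=True)  # stable sort
    return sizes[:top]


example = {"acquisition": {"ts": {"data": list(range(5000))}}, "general": {"notes": ["a"]}}
print(largest_datasets(example, top=1))  # [('/acquisition/ts/data', 5000)]
```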

@yarikoptic
Contributor Author

FWIW, I pushed the contents of that Google Drive folder directly to a git repo at https://github.com/dandi-datasets/nwb_test_data (the top-level .zip is still under annex, as it is larger than 100kb; I decided not to bother providing its extracted contents since I'm not sure that would be of benefit).

3 participants