nwb_version is neither populated, nor validation causes any error #1085

Closed
yarikoptic opened this issue Oct 8, 2019 · 3 comments
Labels
category: bug errors in the code or code behavior


@yarikoptic
Contributor

Looking at the schema:

(git)hopa:~/proj/dandi/nwb/pynwb[dev]git
$> git -C src/pynwb/nwb-schema grep -A5 '"nwb_version'
orig/schema.json:                    "nwb_version": {
orig/schema.json-                        "description": "File version string. COMMENT: Eg, NWB-1.0.0. This will be the name of the format with trailing major, minor and patch numbers.",
orig/schema.json-                        "data_type": "text"
orig/schema.json-                    }
orig/schema.json-                },
orig/schema.json-                "PupilTracking/": {

it doesn't suggest that the field is optional (a fix is submitted for the example version), so I assume that it is mandatory (is it?);
and within #1077 it came up as the field to use when deciding whether a given file is NWB 2.0 or not.
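
For illustration, a minimal sketch of such a decision point, assuming the version string is already in hand. The helper names `nwb_major_version` and `is_nwb2` are hypothetical (they are not part of pynwb), and the optional `NWB-` prefix is taken only from the schema description quoted above:

```python
import re


def nwb_major_version(version):
    """Return the leading major version number as an int, or None.

    Accepts strings with or without the "NWB-" prefix mentioned in the
    schema description, e.g. "NWB-1.0.0" or "2.1.0".
    """
    match = re.match(r"(?:NWB-)?(\d+)", version.strip())
    return int(match.group(1)) if match else None


def is_nwb2(version):
    """Hypothetical decision point: does the string claim NWB 2.x?"""
    return nwb_major_version(version) == 2
```

Only the leading integer is inspected, so pre-release style strings are still classified by their major number; anything that does not start with a digit (after the optional prefix) yields `None` rather than a guess.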

While looking at a sample file from https://github.com/dandi/najafi-2018-nwb (generated with pynwb 1.0.3), I saw that it does not carry this field, although it does carry specifications/core/:
(git-annex)hopa:~/proj/dandi/nwb-datasets/najafi-2018-nwb[master]data/FN_dataSharing/nwb
$> h5ls -S  mouse1_fni16_150818_001_ch2-PnevPanResults-170808-180842.nwb/specifications/core
2.0.2                    Group
1 34090 [1].....................................:Mon 07 Oct 2019 08:53:49 PM EDT:.
(git-annex)hopa:~/proj/dandi/nwb-datasets/najafi-2018-nwb[master]data/FN_dataSharing/nwb
$> h5ls -S mouse1_fni16_150818_001_ch2-PnevPanResults-170808-180842.nwb/                   
acquisition              Group
analysis                 Group
file_create_date         Dataset {1}
general                  Group
identifier               Dataset {SCALAR}
intervals                Group
processing               Group
session_description      Dataset {SCALAR}
session_start_time       Dataset {SCALAR}
specifications           Group
stimulus                 Group
timestamps_reference_time Dataset {SCALAR}
And apparently pynwb (as of the current 1.1.1) does not bother to populate it when creating a simple NWBFile:
exit:1 /home/yoh/proj/dandi/trash > cat mksample_nwb.py
import os                           
import time
from datetime import datetime
from dateutil.tz import tzlocal, tzutc

tm1 = time.time()

from pynwb import NWBFile, TimeSeries
from pynwb import NWBHDF5IO
from pynwb.file import Subject, ElectrodeTable
from pynwb.epoch import TimeIntervals

t0 = time.time()
print("Took %.2f sec to import pynwb" % (t0-tm1))

filename ="testfile.nwb"
nwbfile = NWBFile('session description',
                  'identifier',
                  datetime.now(tzlocal()),
                  file_create_date=datetime.now(tzlocal()),
                  lab='a Lab',
                  # Keywords cause that puke upon `repr` ValueError: Not a dataset (not a dataset)
                  keywords=('these', 'are', 'keywords')
                 )
t1 = time.time()
print("Took %.2f sec to create NWB instance" % (t1-t0))

with NWBHDF5IO(filename, 'w') as io:
    io.write(nwbfile, cache_spec=False)

t2 = time.time()
print("Took %.2f sec to write NWB instance" % (t2-t1))

with NWBHDF5IO(filename, 'r', load_namespaces=True) as reader:
    nwbfile = reader.read()
    t3 = time.time()
    print("Took %.2f sec to read NWB instance" % (t3-t2))
    print(nwbfile)
    t4 = time.time()
    print("Took %.2f sec to print repr of NWB instance" % (t4-t3))
print(filename)

/home/yoh/proj/dandi/trash > python mksample_nwb.py
Took 4.09 sec to import pynwb
Took 0.00 sec to create NWB instance
Took 0.04 sec to write NWB instance
/home/yoh/deb/gits/pkg-exppsy/hdmf/src/hdmf/backends/hdf5/h5tools.py:99: UserWarning: No cached namespaces found in testfile.nwb
  warnings.warn(msg)
Took 0.03 sec to read NWB instance
root pynwb.file.NWBFile at 0x139706436159248
Fields:
  file_create_date: [datetime.datetime(2019, 10, 7, 20, 48, 14, 387764, tzinfo=tzoffset(None, -14400))]
  identifier: identifier
  keywords: <HDF5 dataset "keywords": shape (3,), type "|O">
  lab: a Lab
  session_description: session description
  session_start_time: 2019-10-07 20:48:14.387678-04:00
  timestamps_reference_time: 2019-10-07 20:48:14.387678-04:00

Took 0.00 sec to print repr of NWB instance
testfile.nwb

/home/yoh/proj/dandi/trash > h5ls -v testfile.nwb/nwb_version
Opened "testfile.nwb" with sec2 driver.
nwb_version**NOT FOUND** %                                                                                                                           
exit:1 /home/yoh/proj/dandi/trash > h5ls -v testfile.nwb/           
Opened "testfile.nwb" with sec2 driver.
acquisition              Group
    Location:  1:800
    Links:     1
analysis                 Group
    Location:  1:1832
    Links:     1
file_create_date         Dataset {1/1}
    Location:  1:11576
    Links:     1
    Storage:   8 logical bytes, 16 allocated bytes, 50.00% utilization
    Type:      variable-length null-terminated ASCII string
general                  Group
    Location:  1:5680
    Links:     1
identifier               Dataset {SCALAR}
    Location:  1:11848
    Links:     1
    Storage:   8 logical bytes, 16 allocated bytes, 50.00% utilization
    Type:      variable-length null-terminated UTF-8 string
processing               Group
    Location:  1:2536
    Links:     1
session_description      Dataset {SCALAR}
    Location:  1:12120
    Links:     1
    Storage:   8 logical bytes, 16 allocated bytes, 50.00% utilization
    Type:      variable-length null-terminated UTF-8 string
session_start_time       Dataset {SCALAR}
    Location:  1:12392
    Links:     1
    Storage:   8 logical bytes, 16 allocated bytes, 50.00% utilization
    Type:      variable-length null-terminated ASCII string
stimulus                 Group
    Location:  1:3240
    Links:     1
timestamps_reference_time Dataset {SCALAR}
    Location:  1:15400
    Links:     1
    Storage:   8 logical bytes, 16 allocated bytes, 50.00% utilization
    Type:      variable-length null-terminated ASCII string

I think PyNWB should by default populate it with the version of the standard it supports! Otherwise there is no way to figure out which version of NWB a given file follows.

yarikoptic added the category: bug errors in the code or code behavior label Oct 8, 2019
@yarikoptic
Contributor Author

oh, forgot to include the detail that validation using pynwb also does not cause any error (here I am using `dandi validate`, which merely prints output from `pynwb.validate`):
/home/yoh/proj/dandi/trash > dandi validate testfile.nwb 
/home/yoh/deb/gits/pkg-exppsy/hdmf/src/hdmf/backends/hdf5/h5tools.py:99: UserWarning: No cached namespaces found in testfile.nwb
  warnings.warn(msg)
No validation errors among 1 files

yarikoptic added a commit to dandi/dandi-cli that referenced this issue Oct 8, 2019
I went for a shorter name in the ls command since space is scarce.

Unfortunately I could not immediately figure out how to even populate
that field e.g. using NWBFile.  And PyNWB does not populate it
either ATM: NeurodataWithoutBorders/pynwb#1085

The only example I found quickly with nwb_version field populated was from
///crcns/ssc-7 dataset -- and in particular
data/L4E_whole_cell/Exp_2015-09-05_001_0001-0162.nwb
file there.

I really think we should come up with some good, and not that large collection
of sample .nwb files to test on.  Unfortunately any "real" .nwb file seems to be
quite large
@bendichter
Contributor

The version is still populated. In NWB 1 it's a dataset and in 2 it's an attribute.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, NWBHDF5IO
from h5py import File

nwb = NWBFile(session_description='session', identifier='1', session_start_time=datetime.now(tzlocal()))

with NWBHDF5IO('test.nwb', 'w') as io:
    io.write(nwb)

with File('test.nwb','r') as file:
    print(file.attrs['nwb_version'])
2.1.0

I don't know how to read this using pynwb, though.
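
To see the dataset/attribute distinction on a given file, root-level attributes and root-level members can be listed side by side. This is only a sketch using h5py directly: `describe_root` and `describe_file` are hypothetical helper names, and nothing here goes through pynwb.

```python
def describe_root(h5obj):
    """List attribute names and member names of an open HDF5 group.

    `h5obj` is expected to behave like an open h5py.File: a mapping of
    member names with an `.attrs` mapping of attribute names.
    """
    return {
        "attributes": sorted(h5obj.attrs.keys()),
        "members": sorted(h5obj.keys()),
    }


def describe_file(path):
    """Open `path` read-only with h5py and describe its root group."""
    import h5py  # imported lazily so describe_root stays dependency-free

    with h5py.File(path, "r") as f:
        return describe_root(f)
```

Calling `describe_file('test.nwb')` on the file written above should place `nwb_version` under `"attributes"` rather than `"members"`, if the attribute is stored as described.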

@yarikoptic
Contributor Author

xo xo @bendichter -- thank you! I will never trust h5ls now and will resort to h5dump! It even confirmed that the sample .nwb files you pointed to do not have a stored version matching the version of the directory they are under, e.g.

/tmp/nwb_test_data > h5dump v2.0.1/test_timestamps_linking.nwb | grep -A10 nwb_version
   ATTRIBUTE "nwb_version" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2.0b"
      }

but that would be a separate issue! Thanks again, I am tuning up dandi ls now to try both locations when gathering the version.
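
A minimal sketch of trying both locations with h5py, assuming the version is either a root attribute named `nwb_version` (the NWB 2 case shown by @bendichter) or a root-level dataset of the same name (the NWB 1 case). The function names are hypothetical and this is not necessarily how dandi implements it:

```python
def version_from_h5(h5obj):
    """Return the stored nwb_version as str, or None if absent.

    `h5obj` is expected to behave like an open h5py.File. The root
    attribute is checked first, then a root-level dataset.
    """
    if "nwb_version" in h5obj.attrs:
        value = h5obj.attrs["nwb_version"]
    elif "nwb_version" in h5obj:
        value = h5obj["nwb_version"][()]  # read a scalar dataset
    else:
        return None
    # Depending on the h5py version, strings may come back as bytes.
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return str(value)


def read_nwb_version(path):
    """Open `path` read-only with h5py and look up the version."""
    import h5py  # imported lazily so version_from_h5 stays dependency-free

    with h5py.File(path, "r") as f:
        return version_from_h5(f)
```

Returning `None` instead of raising keeps a listing command usable on files that carry neither form; a non-scalar dataset would need extra handling that this sketch leaves out.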

yarikoptic added a commit to dandi/dandi-cli that referenced this issue Oct 8, 2019