-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
timezone-less NWB files are failing DANDI-side validation #1944
Comments
hm, not yet sure if that was a good step forward. @bendichter you said
was there some discussion of concern and/or why you would think they were incorrect? so far I seems to spot only technical considerations/problems which could have possibly inspired such a change. |
@yarikoptic see discussion here: NeurodataWithoutBorders/nwb-schema#321 |
Ben can you test with current validate function in dandischema. The server
essentially uses this. In the original discussion I only checked the
snippet that was generated.
…On Thu, May 23, 2024, 2:23 PM Yaroslav Halchenko ***@***.***> wrote:
hm, not yet sure if that was a good step forward. @bendichter
<https://github.com/bendichter> you said
, but were concerned that this was often incorrect.
was there some discussion of concern and/or why you would think they were
incorrect? so far I seems to spot only technical considerations/problems
which could have possibly inspired such a change.
—
Reply to this email directly, view it on GitHub
<#1944 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABM574CJAJYCT2XEJUFGCTZDYX2BAVCNFSM6AAAAABIGE2SKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRXG44DGNZVHA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
i downloaded this example and ran it through dandischema validate and it does not error. so we need to check if versions are matched between the archive and the schema. |
@satra I can pull the metadata down using the API into an client = DandiAPIClient(api_url="https://api-staging.dandiarchive.org/api")
metadata = client.get_asset("d5c1c760-2d7e-4a52-bbf0-52225e88cc3c").get_metadata()
type(metadata)
|
The from typing import Optional
from datetime import datetime
from pydantic import BaseModel
class A(BaseModel):
t: Optional[datetime]
a = A(t='2024-05-06T10:56:41') # no error from here |
ok, thank you! I think it would still be great may be at the level of nwb inspector now to warn on times without timezone and possibly loosing ability to align multiple data files/points. |
We agree, already a WIP PR about that (awaiting some final debugging): NeurodataWithoutBorders/nwbinspector#458 |
Summary so far from my looking at the problem - it is due to jsonschema formatting check requiring date-time to be standard compliant RFC3339 and thus with time zone. More info etc |
It would be nice to fix this upstream in pydantic, but until then, we could create our own special datetime type that uses a regex pattern in the JSON schema, rather than the built-in RFC 3339 date-time validator, to validate a datetime that may or may not have an offset/timezone. We could decide on whatever pattern is best here - ISO 8601, or RFC 3339 where offset is optional, or something else. From what I can gather, there is no good or consensus regex for ISO 8601. Code seems to be preferred. But I think we are limited to using regex within JSON schema. The below code uses one regex for ISO 8601 that I found on the internet. regex = r"(\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d\.\d+)|(\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d)|(\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d)"
from pydantic import WithJsonSchema, BaseModel
from typing import Annotated
import datetime
BetterDatetime = Annotated[
datetime.datetime,
WithJsonSchema({"pattern": regex, "type": "string"}, mode="serialization"),
WithJsonSchema({"pattern": regex, "type": "string"}, mode="validation")
]
class Model(BaseModel):
dt: BetterDatetime
schema = Model.model_json_schema()
schema
# {'properties': {'dt': {'pattern': '(\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\d:[0-5]\\d\\.\\d+)|(\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\d:[0-5]\\d)|(\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\d)', 'title': 'Dt', 'type': 'string'}}, 'required': ['dt'], 'title': 'Model', 'type': 'object'}
m1 = Model(dt="2032-04-23T10:20:30")
j1 = m1.model_dump_json()
j1
# '{"dt":"2032-04-23T10:20:30"}'
m2= Model(dt="2032-04-23T10:20:30+02:30")
j2 = m2.model_dump_json()
j2
# '{"dt":"2032-04-23T10:20:30+02:30"}'
import json
import jsonschema
jsonschema.validate(schema, json.loads(j1)) # success
jsonschema.validate(schema, json.loads(j2)) # success |
I think it is a good approach. This allows custom JSON schema for a particular set of BetterDatetime = Annotated[
datetime.datetime,
WithJsonSchema({"pattern": regex, "type": "string"}),
] |
We have a server-side issue with validation of NWB files. Until this is resolved, most NWB files submitted to DANDI will be unpublishable.
NWB recently removed the requirement for NWB session_start_time to have a timezone (PR). We previously had been defaulting to assigning the timezone of where the conversion was run, but were concerned that this was often incorrect. It’s also very unusual for recording systems or any other metadata to include the timezone, so the timezone requirement made it challenging to automate conversion.
The necessary changes have been made to the software, and these new timezone-less files propagate just fine through NWB GUIDE, NeuroConv, PyNWB, HDMF, NWB Inspector, and the dandi CLI. Creation, validation, and upload work well.
However, these new files are failing validation on DANDI server side, which means it is currently impossible to publish these files. See here for an example.
Our preferred solution would be to relax the validation on DANDI server side to allow for NWB files that have datetime that do not have timezones. We're not sure where this line would be. The schema just lists the requirement as
datetime
, and it's not clear how that is meant to be serealized/deserialized and if that requires a timezone to be present.https://github.com/dandi/dandi-schema/blob/e135307e38e8fccf83676cf1d6faba9b142a8de0/dandischema/models.py#L1186
The text was updated successfully, but these errors were encountered: