dandiset.get_metadata not working for some dandisets #1205
ah - right, now I remembered it, thank you @bendichter, and I searched its previous instantiation in #1181 which is a twin of #1189 ;) So the issue here is pretty much about what our "guarantee" is while returning metadata -- is it valid according to the schema or not? We have closed two prior instances without discussing, but maybe let's discuss here. What would you, Ben, prefer it to behave like? Maybe we somehow inform the user that if they are interested in possibly invalid metadata, they could use
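A minimal sketch of that distinction, assuming the Python client's `get_metadata()` returns a schema-validated model while a separate `get_raw_metadata()` accessor returns the stored dict as-is (the accessor name and the dandiset identifier below are assumptions/placeholders):

```python
from dandi.dandiapi import DandiAPIClient
from pydantic import ValidationError

with DandiAPIClient.for_dandi_instance("dandi") as client:
    # "000000" is a placeholder identifier, not a specific dandiset
    dandiset = client.get_dandiset("000000", "draft")
    try:
        meta = dandiset.get_metadata()      # validated dandischema model
    except ValidationError:
        meta = dandiset.get_raw_metadata()  # plain dict, returned even if schema-invalid
```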
for the specific issue, we can add a keyword arg to
of course a user can do that too. the bigger question is around what it should return. in this particular case
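Purely as a sketch of what such a keyword argument could look like -- the `validate` flag and this standalone helper are hypothetical, not part of the current client; it only assumes a raw-dict accessor and the dandischema `Dandiset` model:

```python
from dandischema import models

def get_metadata(dandiset, validate: bool = True):
    """Hypothetical sketch: return schema-validated metadata by default,
    or the raw stored dict when validate=False."""
    raw = dandiset.get_raw_metadata()   # assumed raw-dict accessor
    if not validate:
        return raw                      # possibly schema-invalid dict
    return models.Dandiset(**raw)       # raises pydantic.ValidationError if invalid
```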
@satra that would make the query code more convoluted than it already is, because we'd have to handle both cases. Also, if I understand correctly, it would prevent users from editing metadata using the
i don't see why this would be the case. it's still a
for 1, there is already a field that reflects validity of a dandiset: @jwodder - is the dandiset version
i just checked 2, and no published dandiset is invalid under the current schema:

```python
from pydantic import ValidationError

for idx, dandiset in enumerate(dandisets):
    if dandiset.version.identifier == 'draft':
        continue
    try:
        metadata = dandiset.get_metadata()
    except ValidationError as e:
        print(idx, e)
```
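For context, the `dandisets` iterable in the snippet above can be built with the Python client roughly like this (a sketch; authentication and instance selection may differ in practice):

```python
from dandi.dandiapi import DandiAPIClient

with DandiAPIClient.for_dandi_instance("dandi") as client:
    dandisets = list(client.get_dandisets())
```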
That endpoint is exposed via
@satra I'd like to homogenize the dandisets to facilitate reasonably terse and simple query code. Ideally users would not have to worry about cases where metadata does not match our schema. Your code would not allow the search to include dandisets that fail validation. It is true that many of these are empty or test datasets of little value, but there are also a substantial number that are high-value contributions, including from the Allen Institute and others. I'd prefer to repair these if possible -- at least the large ones. Also, if I understand your code correctly, you are skipping all non-published dandisets. That's fine as a user option, but I would not expect everyone to want to do that.

I went through all the validation errors for existing dandisets. Many are empty dandisets. Most validation errors are a missing license. Some are contributor validation errors. I would really prefer if these contributor validation errors were caught in the metadata editor form. Let's use these to try to patch the UI to prevent users from creating invalid metadata.

18, 24, 30, 31, 32, 33, 38, 42, 46, 47, 63, 71, 72, 106, 112, 113, 114, 116, 120, 124, 131, 132 (and probably more, I started skipping them): missing a license but also empty. 32 does not even have an owner. I'm not sure how that happened. It's not too hard to skip these with something like:

```python
if dandiset.version.size == 0:
    continue
```
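Putting that together with the earlier loop, a sketch that uses only the attributes already shown in this thread:

```python
from pydantic import ValidationError

for idx, dandiset in enumerate(dandisets):
    if dandiset.version.size == 0:
        continue                  # skip empty (often test) dandisets
    try:
        metadata = dandiset.get_metadata()
    except ValidationError as e:
        print(idx, e)             # non-empty dandiset whose metadata still fails validation
```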
my code was simply to show that published dandisets don't have validation errors, nothing beyond that. regarding the issues, it's going to be twofold:

for the general issue, we will not necessarily have valid draft dandisets, for whatever reason. hence i don't think any code can assume they are valid at that stage. we can keep improving all kinds of interfaces, but we have not imposed a zero-tolerance model for all aspects of validity; we have only done so for triggering publication. also, these are only the metadata issues -- once we trigger layout validation, several of these dandisets will be invalid as well. we need to add that to the validation services. (@yarikoptic - this was done outside of dandi-schema and also relates to web-based validation, rather than local validation.)
ah ok, I see.

It makes sense to me why we would allow improper metadata for old datasets that were created before certain rules were in place. It seems to me the best approach there is to work with these groups to update the metadata, which ideally would be a simple task. For new dandisets, I don't see why we would allow the creation of metadata that does not follow our schema. Why aren't we properly validating this data in the metadata editor form? In my opinion, every way a user is able to use the web UI to create improper metadata is a bug in the metadata editor form validation. I also think this is true for the API, but I feel a bit less strongly about that. In the current state, there is not enough enforcement on the metadata to facilitate simple structured queries, particularly if you are requiring metadata to fit the schema perfectly just to build the metadata object I am meant to be querying. I can run queries on the
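A sketch of the kind of query that works regardless of validity by operating on the raw metadata dicts (assuming a `get_raw_metadata()` accessor that returns the stored dict; `license` is a standard dandischema field):

```python
from dandi.dandiapi import DandiAPIClient

with DandiAPIClient.for_dandi_instance("dandi") as client:
    missing_license = [
        d.identifier
        for d in client.get_dandisets()
        if not d.get_raw_metadata().get("license")  # works even if the dict is schema-invalid
    ]
print(len(missing_license), "dandisets without a license")
```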
NB I think the discussion is great but has potential to derail. Hence filed dandi/dandi-schema#157, so we better continue on that aspect there.