Add proper migration for "Organization" -> "Affiliation" change #276

Open
yarikoptic opened this issue Jan 22, 2025 · 2 comments
@yarikoptic
Member

In #266 (comment) @candleindark identified an oddity in our metadata records: Affiliation records include fields that are not part of the Affiliation model, e.g.

https://api.dandiarchive.org/api/dandisets/000029/versions/draft/info/ currently has

                "affiliation": [
                    {
                        "name": "An Institution",
                        "roleName": [],
                        "schemaKey": "Affiliation",
                        "contactPoint": [],
                        "includeInCitation": false
                    }
                ],

After some archaeological digging through the metadata, we figured that it is 99% likely due to the change where affiliations got their own Affiliation class. The migrate() function, however, was not adjusted to filter out the leftover fields. Here we might not even need an explicit migration, since pydantic is likely to do the right thing:

In [10]: Affiliation.model_construct(**{
    ...:                         "name": "An Institution",
    ...:                         "roleName": [],
    ...:                         "schemaKey": "Organization",
    ...:                         "contactPoint": [],
    ...:                         "includeInCitation": False
    ...:                                             }).model_dump()
Out[10]: 
{'id': None,
 'schemaKey': 'Organization',
 'identifier': None,
 'name': 'An Institution'}

and here it is with full validation (and without the stale schemaKey):

In [11]: Affiliation(**{
    ...:                         "name": "An Institution",
    ...:                         "roleName": [],
    ...:                         "contactPoint": [],
    ...:                         "includeInCitation": False
    ...:                                             }).model_dump()
Out[11]: 
{'id': None,
 'schemaKey': 'Affiliation',
 'identifier': None,
 'name': 'An Institution'}

So the hypothesis is that the absence of metadata migration on the dandi-archive side keeps records at old schema versions, and indeed it does:

dandi@drogon:/mnt/backup/dandi/dandisets$ grep -h schemaVersion */dandiset.yaml  | sort | uniq -c
      8 schemaVersion: 0.4.4
    139 schemaVersion: 0.6.0
     26 schemaVersion: 0.6.2
     85 schemaVersion: 0.6.3
    311 schemaVersion: 0.6.4
     12 schemaVersion: 0.6.6
    111 schemaVersion: 0.6.7
    109 schemaVersion: 0.6.8

This would prevent us from validating against stricter models, such as ones that disallow extra fields, and could also simply leave "bugs" in place because the migration was never carried out.
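To illustrate why stale fields matter for stricter models, here is a minimal sketch using two toy pydantic v2 models. `LenientAffiliation` and `StrictAffiliation` are hypothetical stand-ins, not the actual dandischema classes; the only assumption is that a stricter model would set `extra="forbid"`.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LenientAffiliation(BaseModel):
    # pydantic's default is extra="ignore": unknown keys are silently dropped.
    schemaKey: str = "Affiliation"
    name: str


class StrictAffiliation(BaseModel):
    # Hypothetical stricter variant: unknown keys become validation errors.
    model_config = ConfigDict(extra="forbid")
    schemaKey: str = "Affiliation"
    name: str


# The stale record from the API response quoted earlier in this issue.
stale = {
    "name": "An Institution",
    "roleName": [],
    "schemaKey": "Affiliation",
    "contactPoint": [],
    "includeInCitation": False,
}

# The lenient model accepts the record and drops the leftovers.
lenient_dump = LenientAffiliation(**stale).model_dump()

# The strict model rejects it and names each offending field.
try:
    StrictAffiliation(**stale)
    rejected_fields = []
except ValidationError as exc:
    rejected_fields = sorted(err["loc"][0] for err in exc.errors())

print(lenient_dump)
print(rejected_fields)
```

With the lenient default the leftovers vanish on load, which is probably why they went unnoticed; with `extra="forbid"` every unmigrated record would fail validation.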

On the dandi-schema side, I would like us to check what would happen if we .migrate() the metadata records for dandisets: would the migration succeed or fail, and would it get rid of those irrelevant values?

@yarikoptic
Member Author

OK, migrate takes a dict and returns a dict, so we should "manually" code up the removal of those extra fields from Affiliation records.
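A rough sketch of what such a dict-level cleanup could look like. The helper name `strip_affiliation_extras` is hypothetical, and the set of allowed keys is an assumption taken from the `model_dump()` output shown above; a real migration step would more likely derive it from the Affiliation model itself.

```python
# Assumption: the Affiliation model only knows the fields visible in the
# model_dump() output earlier in this issue.
AFFILIATION_FIELDS = {"id", "schemaKey", "identifier", "name"}


def strip_affiliation_extras(node):
    """Return a cleaned copy of `node`; the input is left unmodified.

    Walks nested dicts/lists and, for every dict whose schemaKey is
    "Affiliation", drops keys that are not in AFFILIATION_FIELDS.
    """
    if isinstance(node, list):
        return [strip_affiliation_extras(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {key: strip_affiliation_extras(value) for key, value in node.items()}
    if cleaned.get("schemaKey") == "Affiliation":
        cleaned = {k: v for k, v in cleaned.items() if k in AFFILIATION_FIELDS}
    return cleaned


# Made-up record for illustration; only the affiliation entry mirrors the
# example quoted in this issue.
record = {
    "contributor": [
        {
            "schemaKey": "Person",
            "name": "Doe, Jane",
            "roleName": ["dcite:Author"],
            "affiliation": [
                {
                    "name": "An Institution",
                    "roleName": [],
                    "schemaKey": "Affiliation",
                    "contactPoint": [],
                    "includeInCitation": False,
                }
            ],
        }
    ]
}

migrated = strip_affiliation_extras(record)
print(migrated["contributor"][0]["affiliation"])
```

Keying on `schemaKey` means `roleName` and friends are kept on Person and Organization records and removed only where they do not belong.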

@candleindark
Member

Per @yarikoptic's suggestion, generate a basic table that goes through all the public dandisets and prints the validation status of each both before and after the .migrate() function is applied.
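One possible shape for that table, as a sketch only. `migration_report` and `format_table` are hypothetical helpers that take the validate and migrate callables as arguments, so nothing is assumed here about the real dandischema signatures; `fake_validate`, `fake_migrate`, the version string, and the record identifiers below are made-up stand-ins.

```python
def status(validate, record):
    """Run `validate` on a record and summarize the outcome as a short string."""
    try:
        validate(record)
    except Exception as exc:
        return f"FAIL ({type(exc).__name__})"
    return "ok"


def migration_report(records, validate, migrate):
    """Build (id, schemaVersion, before, after) rows for a dict of records."""
    rows = []
    for identifier, record in sorted(records.items()):
        before = status(validate, record)
        try:
            after = status(validate, migrate(record))
        except Exception as exc:
            # migrate() itself blew up, which is worth reporting separately.
            after = f"migrate error ({type(exc).__name__})"
        rows.append((identifier, record.get("schemaVersion", "?"), before, after))
    return rows


def format_table(rows):
    """Render rows as a plain-text table with left-aligned columns."""
    header = ("dandiset", "schemaVersion", "before", "after")
    table = [header, *rows]
    widths = [max(len(str(row[i])) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(str(cell).ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )


# Stand-ins for the real validation and migration functions.
TARGET = "9.9.9"  # made-up target schema version


def fake_validate(record):
    if record.get("schemaVersion") != TARGET:
        raise ValueError("outdated schemaVersion")


def fake_migrate(record):
    if "schemaVersion" not in record:
        raise KeyError("schemaVersion")
    return {**record, "schemaVersion": TARGET}


records = {"B": {"schemaVersion": "0.0.1"}, "A": {"schemaVersion": TARGET}, "C": {}}
rows = migration_report(records, fake_validate, fake_migrate)
print(format_table(rows))
```

Recording the exception type rather than the full message keeps the table compact; the full errors could be logged separately for the dandisets that fail.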
