Data Lookup Fails for doi:10.5063/F1BV7DV0 #397

Open
ThomasThelen opened this issue Feb 20, 2020 · 7 comments
@ThomasThelen
Member

doi:10.5063/F1BV7DV0 is a DataONE DOI that resolves to the link at the bottom of this issue. When we attempt to register it, we get a file that points to the package landing page rather than a folder.

To reproduce:

  1. Navigate to the Manage page
  2. Attempt to register the dataset with the DOI doi:10.5063/F1BV7DV0
  3. Note that you get a file instead of a folder

https://knb.ecoinformatics.org/view/doi:10.5063/F1BV7DV0

@ThomasThelen ThomasThelen self-assigned this Feb 20, 2020
@Xarthisius
Collaborator

To reproduce:

#!/usr/bin/env girder-shell
# -*- coding: utf-8 -*-

from girder.plugins.wholetale.lib.dataone.provider import DataOneImportProvider
from girder.plugins.wholetale.lib.entity import Entity
from girder.plugins.wholetale.lib.dataone import DataONELocations

uri = "https://knb.ecoinformatics.org/view/doi:10.5063/F1BV7DV0"
entity = Entity(uri, None)
entity["base_url"] = DataONELocations.prod_cn
data_map = DataOneImportProvider().lookup(entity)

yields:

Traceback (most recent call last):
  File "<string>", line 11, in <module>
  File "server/lib/dataone/provider.py", line 43, in lookup
    dm = D1_lookup(entity.getValue(), entity['base_url'])
  File "server/lib/dataone/register.py", line 244, in D1_lookup
    package_pid = get_package_pid(path, base_url)
  File "server/lib/dataone/register.py", line 210, in get_package_pid
    pid = find_resource_pid(initial_pid, base_url)
  File "server/lib/dataone/register.py", line 114, in find_resource_pid
    base_url=base_url)
  File "server/lib/dataone/register.py", line 156, in find_nonobsolete_resmaps
    raise RestException('No results were found for identifier(s): {}.'.format(", ".join(pids)))
girder.exceptions.RestException: No results were found for identifier(s): resource_map_doi:10.5063/F1BV7DV0, urn:uuid:73cb0fbb-2ff2-452d-bc7b-d946968d1aad.

@ThomasThelen
Member Author

Here's the full story of what's happening. The DOI that I was attempting to register was obsoleted by a newer version. When a package has newer versions, we attempt to register the latest one. However, the latest version has a private resource map, which we can't deal with because it's private.

Per @mbjones, we shouldn't be trying to register the latest version; we should instead register the one that was entered into the field.

The behaviour that was encountered here shouldn't happen after we remove the logic for locating new versions.

@Xarthisius
Collaborator

Xarthisius commented Feb 20, 2020

We will have to take care of the names in that case. For Zenodo we prepend the version number to the name of the dataset. See e.g.:

https://girder.stage.wholetale.org/#collection/596793f2ebde2c0001b03dbe/folder/5e3879ff8bec16c2663dd119

Is it possible to get info about the number of preceding versions in DataONE, or is it just a linked list of "ObsoletedBy" and "Replaces"?

@mbjones

mbjones commented Feb 20, 2020

Right now it's just the list of obsoletes/obsoletedBy adjacency pairs. But we were just talking today about the need to be able to provide full version chain metadata through our API without a client having to walk the chain. If that is critical to your implementation, we should discuss what you would need so we can incorporate it into the API. Note, of course, that we don't use simple serial version numbers.

@ThomasThelen
Member Author

ThomasThelen commented Feb 20, 2020

It's a doubly linked list with obsoletedBy and obsoletes in the system metadata.

Example:

Initial package which has its system metadata here
is obsoleted by
this package which has a system metadata doc here
which is in turn obsoleted by this package with this system metadata document.

Looking at the system metadata document of the first package, which has two newer versions, we can see that it only points to the one directly after it:
<obsoletedBy>urn:uuid:cd77eb64-a9bf-4989-af6a-15d3f981188c</obsoletedBy>
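
For reference, pulling both links out of a system metadata document could look like the sketch below. The XML string is a trimmed, hypothetical stand-in for a real document: only the `obsoletedBy` value comes from this thread, while the `identifier` and the namespace URI are assumptions. `version_links` is an illustrative helper, not something that exists in the wholetale plugin. It matches on local tag names, so it does not depend on whether the child elements are namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical system metadata document. Only the obsoletedBy
# value is taken from this issue; the identifier is a placeholder and the
# namespace URI is an assumption.
SAMPLE_SYSMETA = """<d1:systemMetadata xmlns:d1="http://ns.dataone.org/service/types/v2.0">
  <identifier>urn:uuid:placeholder-first-version</identifier>
  <obsoletedBy>urn:uuid:cd77eb64-a9bf-4989-af6a-15d3f981188c</obsoletedBy>
</d1:systemMetadata>
"""


def version_links(sysmeta_xml):
    """Return the obsoletes/obsoletedBy pids found in a sysmeta document.

    A missing element is reported as None, which marks an end of the chain.
    """
    links = {"obsoletes": None, "obsoletedBy": None}
    for element in ET.fromstring(sysmeta_xml).iter():
        # Drop a "{namespace}" prefix, if any, and compare local names only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in links and element.text:
            links[local_name] = element.text.strip()
    return links


links = version_links(SAMPLE_SYSMETA)
```

Here `links["obsoletes"]` stays `None` because the first package in a chain has no predecessor.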

We can ask Solr for the pid that obsoletes another pid. If Solr gives us a result, we run the same query on the new pid to see whether it is obsoleted in turn, and repeat. It's O(n), but long obsoletion chains aren't common. If we walk the obsoletion chain backwards (looking for things that were obsoletedBy our package), we can get a count, which could serve as a version number.
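
A minimal sketch of that walk, with the actual query abstracted away. `walk_chain`, `version_number`, and the `pkg-v*` identifiers are all hypothetical; the two dicts stand in for whatever lookup (a Solr query or a system metadata fetch) returns the neighbouring pid. The `seen` check is only there so that a malformed chain can't loop forever.

```python
from typing import Callable, List, Optional

# A lookup takes a pid and returns the neighbouring pid in one direction,
# or None when the chain ends there.
Lookup = Callable[[str], Optional[str]]


def walk_chain(pid: str, neighbour: Lookup) -> List[str]:
    """Follow one direction of the chain from pid until it ends."""
    chain = [pid]
    seen = {pid}
    current = neighbour(pid)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = neighbour(current)
    return chain


def version_number(pid: str, obsoletes: Lookup) -> int:
    """1-based position of pid, counted by walking back to the first version."""
    return len(walk_chain(pid, obsoletes))


# Hypothetical three-version chain standing in for the remote lookups.
OBSOLETED_BY = {"pkg-v1": "pkg-v2", "pkg-v2": "pkg-v3"}
OBSOLETES = {"pkg-v3": "pkg-v2", "pkg-v2": "pkg-v1"}

latest = walk_chain("pkg-v1", OBSOLETED_BY.get)[-1]  # walk forwards
label = version_number("pkg-v2", OBSOLETES.get)      # walk backwards
```

Each hop costs one lookup, so the walk is linear in the length of the chain in either direction.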

@Xarthisius
Collaborator

Appending some sort of unique identifier to denote a version is purely a user-facing change. On the backend we match our internal uuid with an external uuid, so it is easy to detect that dataset A and dataset B differ, even though they have the same name. On the other hand, users only see the Catalog with the names and have to somehow know which one they want to pick.

While the "version" doesn't have to be a number corresponding to the position in the chain, I don't think we can use something as complicated as a urn:uuid.

There are only two requirements: it has to be unique and "pretty" :)

@ThomasThelen ThomasThelen removed their assignment Sep 10, 2022