http or https for canonical URIs of Getty, Wiki*, and others? #577

Open
beaudet opened this issue Mar 6, 2024 · 8 comments
Labels
API The issue is about an API or service discuss Discussion of this topic needed

@beaudet
Collaborator

beaudet commented Mar 6, 2024

We say here: https://linked.art/api/1.0/protocol/

that https is the preferred protocol for Linked Art implementations and presumably that also means for the canonical URIs of published entities.

Getty is still reporting http:// at the top of their concept pages, e.g.: http://vocab.getty.edu/aat/300311458

Will referencing the AAT with https:// create problems when linking data sets, because the URI schemes differ? I see that Yale is using http for both Getty and Wikidata. Wikimedia, by the way, might have officially switched theirs to https. So my question for those familiar with processing data sets from multiple institutions: is this problem so common that any system consuming linked data must solve it, or should implementations pay close attention to the examples given by the authorities when those examples are present? Or maybe, so long as an http -> https redirect is in place at the authority, either will work?

@azaroth42
Collaborator

Yes to Linked Art implementations for https if at all possible. At this point in the evolution of the web, I think it's borderline irresponsible to not use HTTPS.

However, the URI (as opposed to the URL) of existing instances and ontological terms needs to be consistent; otherwise the graph doesn't connect properly and applications relying on the exact URI don't process things as expected.
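
To make the consistency point concrete, here is a minimal Python sketch. The two datasets and their values are invented for illustration; the only real identifier is the AAT URI quoted earlier in this thread. URIs are compared as opaque strings, so the http and https forms of the same term do not join.

```python
# Two hypothetical datasets that reference the same AAT concept,
# one using the http scheme and one using https.
dataset_a = {"http://vocab.getty.edu/aat/300311458": "record from institution A"}
dataset_b = {"https://vocab.getty.edu/aat/300311458": "record from institution B"}

# A graph join is effectively a lookup on the exact URI string.
shared = set(dataset_a) & set(dataset_b)
print(shared)  # set(): the schemes differ, so nothing matches
```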

The namespaces for instances we might refer to that I believe are correct, and please correct if not:

The issue that would come up is if you dereference an http URI in a browser via XHR/fetch from within an environment that is served via HTTPS, you get the mixed active content error. This comes up in IIIF relatively often when some organizations serve content via HTTP and others via HTTPS, and then Mirador via HTTPS won't load the HTTP manifest.
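
As a rough illustration of when that error fires, the sketch below reduces the browser rule to its simplest form: an HTTPS page requesting a plain-HTTP resource. Real browsers apply more nuance (the Mozilla docs linked below cover the details), and the function name and example.org URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_mixed_active_content(page_url: str, request_url: str) -> bool:
    """Simplified check: True when an HTTPS page fetches a plain-HTTP URL."""
    page_scheme = urlsplit(page_url).scheme.lower()
    request_scheme = urlsplit(request_url).scheme.lower()
    return page_scheme == "https" and request_scheme == "http"


# Hypothetical viewer served over HTTPS loading a manifest served over HTTP.
blocked = is_mixed_active_content(
    "https://viewer.example.org/",
    "http://images.example.org/manifest.json",
)
print(blocked)  # True
```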

Docs from Mozilla on mixed content are very good: https://developer.mozilla.org/en-US/docs/Web/Security/Mixed_content

@azaroth42 azaroth42 added API The issue is about an API or service discuss Discussion of this topic needed labels Mar 6, 2024
@beaudet
Collaborator Author

beaudet commented Mar 6, 2024

I think Wikidata recommends https, but I think that's from a security perspective, not necessarily a change in the scheme of canonical URIs.

It looks like RKD's permalinks use https:

https://rkd.nl/artists/10024

@azaroth42
Collaborator

Wikidata asserts their namespace as:

@prefix wd: <http://www.wikidata.org/entity/> .

in the RDF serializations.

e.g. (and I don't recommend this as it's LONG ... don't say I didn't warn you ...)

curl -L -H "Accept: text/n3" http://www.wikidata.org/entity/Q42

(You can | head -35 to grab the prefixes off the top)
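
If you'd rather check the declared namespaces in code, here is a small Python sketch that pulls `@prefix` declarations out of Turtle/N3 text. It runs on an inline sample containing only the `wd:` line quoted above rather than the live (long) response, and the `parse_prefixes` helper is a made-up name, not part of any library. A real pipeline would more likely use a proper RDF parser.

```python
import re

# Matches lines such as: @prefix wd: <http://www.wikidata.org/entity/> .
PREFIX_RE = re.compile(r"^@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.\s*$")


def parse_prefixes(turtle_text: str) -> dict[str, str]:
    """Return a mapping of prefix label to namespace URI."""
    prefixes = {}
    for line in turtle_text.splitlines():
        match = PREFIX_RE.match(line.strip())
        if match:
            prefixes[match.group(1)] = match.group(2)
    return prefixes


# Inline sample; the real serialization declares many more prefixes.
sample = "@prefix wd: <http://www.wikidata.org/entity/> .\n"
prefixes = parse_prefixes(sample)
print(prefixes["wd"])  # http://www.wikidata.org/entity/
```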

@edwardanderson
Collaborator

edwardanderson commented Mar 7, 2024

It seems like the "Semantic View" URIs for AAT resources discovered through search have recently (?) switched from http to https, but these then point at the http one.

@azaroth42
Collaborator

@edwardanderson True, but in the representation it's still http. E.g.: https://vocab.getty.edu/aat/300194222.jsonld
The subject of this representation, although served via https, is http://vocab.getty.edu/aat/300194222
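
A tiny Python sketch of that distinction follows. The `document` dict is a heavily trimmed, hypothetical stand-in for the JSON-LD response, and the `id` / `@id` key names are an assumption about its shape. The point is only that the URL you fetched and the URI the data describes can differ in scheme.

```python
from urllib.parse import urlsplit

# URL the representation was retrieved from (quoted above).
representation_url = "https://vocab.getty.edu/aat/300194222.jsonld"

# Hypothetical, trimmed stand-in for the returned JSON-LD document.
document = {"id": "http://vocab.getty.edu/aat/300194222"}

# Assumption: the subject is carried under "id" or "@id".
subject_uri = document.get("id") or document.get("@id")

print(urlsplit(representation_url).scheme)  # https
print(urlsplit(subject_uri).scheme)  # http
```

Consumers that key their graph on the fetched URL rather than on the subject in the data would end up with the https form, which is how mismatches creep in.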

It's a pervasive and super frustrating issue :( And compounded by people using the human view rather than the canonical URI in the data (e.g. aat/page/300194222 instead of aat/300194222)

In terms of "solve at the right level" this is a data concern, and we could create some helpful tooling around it to fix URIs in bad data (e.g. consider: https://github.com/project-lux/data-pipeline/blob/main/pipeline/config.py#L168-L234 ) and document what we expect ... but data is what it is, and all we can do is whack the moles when they show up.
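
As a sketch of what such tooling could look like (this is not the LUX pipeline code linked above), the Python below rewrites a couple of known variants to the canonical forms quoted in this thread. The rule list, the `canonicalize` name, and the treatment of Wikidata `/wiki/` page URLs as a human view are assumptions for illustration; a real rule set would need one entry per authority and per known mistake.

```python
import re

# Hypothetical rewrite rules: (pattern for known variants, canonical template).
RULES = [
    # AAT: accept http or https, with or without the human-view "page/" segment.
    (re.compile(r"^https?://vocab\.getty\.edu/aat/(?:page/)?(\d+)$"),
     r"http://vocab.getty.edu/aat/\1"),
    # Wikidata: accept http or https, entity URI or (assumed) /wiki/ page URL.
    (re.compile(r"^https?://www\.wikidata\.org/(?:entity|wiki)/(Q\d+)$"),
     r"http://www.wikidata.org/entity/\1"),
]


def canonicalize(uri: str) -> str:
    """Return the canonical URI for known namespaces, else the input unchanged."""
    uri = uri.strip()
    for pattern, replacement in RULES:
        if pattern.match(uri):
            return pattern.sub(replacement, uri)
    return uri


print(canonicalize("https://vocab.getty.edu/aat/page/300194222"))
# http://vocab.getty.edu/aat/300194222
```

URIs from namespaces without a rule, such as the RKD permalink mentioned earlier, pass through untouched, so the function is safe to run over mixed data.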

@workergnome
Contributor

And here at Getty we're aware of this, and as we work to improve the Vocabs over the next year or so we're trying to think of what we can do to solve the UX issues here. As Rob said, the conflict between Cool URLs and the changes in browser/internet security practices over the past decade is a thorny issue.

@azaroth42
Collaborator

@workergnome Let's chat next week -- and we could have that chat in public (again) too if you want, given the topic of the meeting :)

@beaudet
Collaborator Author

beaudet commented Mar 7, 2024

sounds informative. where do I get tickets?

@azaroth42 azaroth42 added this to the Questions milestone Apr 2, 2024

4 participants