Extending the spec to include content creation/modification #140
First, let me say it pleases me so much to read such a thing. I think this one will need more time than the current fixes / small improvements we are drafting. It was originally thought about when we started working on DTS, but we decided that the first release should focus on delivery rather than both ingestion + delivery. I think it would be nice to start thinking about it, leaving space for it without making it mandatory (for reasons that might be too obvious but, basically, not everyone has the ability or the desire to handle this kind of input flow, while they might still want to serve). I think any draft will be welcome :). On the side, pure curiosity: can we know where the paper will be given? :)
For epigraphy.info as currently planned, this would be very useful. As some of you know, the plan since the Heidelberg meeting was exactly to rely on DTS fully, and this would require that the API specification at least include POST. At the moment (please don't faint...) my demo applications for that use an intermediate bespoke non-standard model, which is POSTed and contains links to GET the changed data from the DTS API.
Sure, I'm happy to share the draft once I've got an initial version pulled together. Is this the best forum for doing that and discussing it?
The paper is being delivered (with Ken Penner) at the Society of Biblical Literature meeting in Denver this November, in the Humanities Computing section. The proposed DTS extensions are part of a broader interest in moving toward more modular and interoperable editing/publishing tools. Any feedback I can get before then would be fantastic.
(My own opinion) I think the best approach would be some kind of personal repository, which could be a clone of this one, with a new page about "Extending services with publishing options", or some much better title than that one?
Yes, I could set up a cloned repo. In the meantime, since there seems to be some interest, I'd be glad for input here on two basic questions:
Thanks for any input you can offer.
There will be potentially huge complexities involved here. Not a reason to avoid doing it, just a warning from someone with a few scars.
I would definitely up the use of JSON-LD and look at the
And obviously, we need to at least accept TEI on the update/PUT front. But this one feels particularly hard to specify without stepping on project boundaries: we'll need to agree on error codes like "InvalidSchema" so that service providers can explain why they did not accept the content.
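To make the error-code idea concrete, here is a minimal Python sketch of how an "InvalidSchema" rejection might be expressed as a Hydra-style Status body. The 422 status code, the field names, and the error text are all illustrative assumptions, not part of any published spec:

```python
import json

# Hypothetical error body a DTS server might return when rejecting a
# POST/PUT payload that fails TEI schema validation. The "InvalidSchema"
# title, the 422 status, and the field names are illustrative only.
def invalid_schema_error(detail):
    return {
        "@context": "http://www.w3.org/ns/hydra/context.jsonld",
        "@type": "Status",
        "statusCode": 422,
        "title": "InvalidSchema",
        "description": detail,
    }

body = invalid_schema_error("Element <div> is not allowed inside <teiHeader>")
print(json.dumps(body, indent=2))
```

The point is only that a machine-readable title plus a human-readable description would let clients react programmatically while still surfacing the validator's explanation.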
Thanks for the input so far. I'm going to go ahead and fork this repository, adding some proposed extensions as the basis for further discussion. Just two follow-up questions for the moment.
I will reply quickly to 1., a little about 2, and will need to think a bit more about 3.
I am not sure I am clear, so feel free to tell me; English is definitely not my first language :)
Thanks @PonteIneptique. No worry about language. If I were trying to write this in French it would take me all day! I like the idea of having the collection object returned with modifications.

I may have been unclear in what I said about the TEI header. I didn't mean that it was designed to be the main storage medium for every purpose. As you say, an application will often want to store some kinds of metadata in other places as well. I just meant that the TEI header is explicitly designed to organize extremely rich and varied information about a document. And the TEI spec seems to encourage that, wherever else metadata is stored, it also be stored in the TEI header.

I'm uncomfortable in principle with an API that lets me update the same information about a document through two different endpoints (document and collection). That seems to me to break the semantics of the endpoint. I don't think the API should have any opinion on how or where metadata is stored. It should just provide a semantically rational endpoint for sending the data. It doesn't make sense to me to send "creator" data to one endpoint if it's in JSON and to a different endpoint if it's in XML. Similarly, I'm not comfortable with using one endpoint to update "creator" information and a different endpoint to update information on (say) a document's normalization scheme or orthography. Both are metadata, and so it seems to me that both should go to the same endpoint.

I also think we should distinguish generally between the format of a request payload and the internal storage mechanisms of the application. Right now the document endpoint returns TEI XML, but that doesn't mean a project is storing the document as XML. The API just requires that output because it's standard. Similarly, I think we should choose a payload format for modifying metadata based on what format is (a) standard, and (b) semantically rich enough.
It's then up to the project to decide how they want to represent and store that data internally. I hope this is a bit clearer. I'll look forward to hearing your thoughts on my 3.

Basically, I want to be able to modify metadata that is as rich as the TEI syntax allows. To the extent that DC supports some of those semantics, I'm all for sending it as a modified JSON-LD object. But where DC doesn't support the TEI semantics, I think we need to find a way to support it. Since XML is already an exchange format, and you're already using XML in some responses, it would make sense to me in the short run to support those semantics by allowing XML upload as well. The user could then decide whether to send basic metadata using DC or richer metadata using TEI XML. I hope my reasoning is a bit clearer now.
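As a concrete illustration of the "basic metadata via DC" option, a metadata-update payload expressed as JSON-LD with Dublin Core terms might look like the following. The identifier and the values are invented for the example:

```python
import json

# Hypothetical JSON-LD payload for updating basic document metadata with
# Dublin Core terms; richer TEI semantics would travel as XML instead.
# The @id and the literal values are invented for this example.
payload = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "https://example.org/texts/mytext",
    "dc:creator": "Jane Editor",
    "dc:title": "A Sample Edition",
}
print(json.dumps(payload, indent=2))
```

A client could then send this body with a `Content-Type: application/ld+json` header, reserving the XML route for metadata that DC cannot express.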
Sorry for the wordiness of my replies, too. Can you tell that I'm on sabbatical right now?
Sorry for the delayed answer.
To that, I'd also add that it would definitely feel weird, as a client, to provide an input mimetype different from the output one. Going to
In an ideal world, probably. Unfortunately, some projects do not store rich metadata in TEI, and some do. Mostly, what we decided is that the metadata endpoint should differ from the document endpoint in output type, also because
That's also an issue though, isn't it? TEI is so flexible in how it handles its metadata that there is a metadata scheme per project, if not more. If some metadata are too rich for the output format of
I just want to add again that in
The question still stands though: how should we translate TEI input into the LD+JSON that is the default output format of
Is my point of view clearer?
I agree with Thibault. While I could see the usefulness of a "single source" implementation, where you could upload a TEI doc and it would automatically get a record in the collection endpoint, I would definitely not want to extract and attempt to represent all of the teiHeader metadata there. That way lies madness. We might end up issuing recommendations for providers of TEI documents on ways to represent information so that DTS can leverage it, though.
Thanks @PonteIneptique and @hcayless. Your responses are helpful and I've got some thinking to do. What I'm struggling with as I read what you're saying is that it seems like we need an extended API to serve a very common use case in text editing: recording extended information about the text, its transcription, related publications, etc. I don't want to simply leave all of this up to each implementation to do ad hoc, because I'm envisioning tools that are highly interoperable. So I don't want to just abandon the use case. It sounds like there are a couple of principles guiding both of your responses:
What about adding another endpoint for extended document data? Something like "docinfo." Semantically it makes sense to me if we distinguish rich background information from the basic metadata listed in a collection catalogue (i.e., the collection endpoint). It also seems to me that the data format(s) we allow in sending and fetching that broader information might be different (JSON-LD?) than what we want to use for sections of document text (i.e., the document endpoint). If we add a "docinfo" endpoint, then would it make more sense to allow it to accept TEI header fragments? Or maybe allow it to return and accept either TEI header XML or JSON-LD?

By the way, @PonteIneptique, when you mention using other namespaces, are you suggesting that such namespaces already exist or that we create one? I think some of the TEI-header semantics should translate fairly easily to JSON. So if we had a "docinfo" endpoint I could look at starting to build such a namespace. But, again, if it already partially exists somewhere (beyond DC) I definitely don't want to reinvent the wheel.
For clarification: I'm thinking that a "docinfo" endpoint could return either a full TEI header or JSON-LD data, based on a url parameter in the request. Then the user could edit and return the same object, whether it's the TEI header or the JSON-LD object.
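To sketch how that could work: a single "docinfo" endpoint dispatching on a format parameter might look like the following. The endpoint, the parameter name, and the response bodies are all hypothetical; only the mimetypes are real registered types:

```python
# Sketch of the proposed (hypothetical) "docinfo" endpoint choosing its
# representation from a ?format= URL parameter. Everything here except
# the mimetypes is invented for illustration.
def docinfo_response(doc_id, fmt="json"):
    if fmt == "tei":
        # A TEI header serialization of the document's rich metadata.
        return ("application/tei+xml",
                "<teiHeader><fileDesc/></teiHeader>")
    if fmt == "json":
        # A JSON-LD object carrying the same information.
        return ("application/ld+json", {"@id": doc_id})
    raise ValueError("unsupported format: %s" % fmt)

mimetype, body = docinfo_response("urn:example:doc1", fmt="tei")
print(mimetype)  # prints "application/tei+xml"
```

The round-trip property you describe falls out naturally: whichever representation the client fetched is the one it edits and sends back, so the server never has to translate between the two on update.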
I think I would be more inclined to add selected information from the
I just submitted a pull request (to dev branch) for a draft of the expanded Documents endpoint documentation. I'm still integrating my separate document into the Collections-Endpoint.md file. So I'll submit a second pull request when that's done.
Okay, I merged the (revised) pull request to the dev branch today. |
I've finally made the lingering fix to the Link headers in the revised and expanded Document endpoint docs. The issue was that Hydra requires every response to include the URL of the JSON-LD API documentation in the Link header. So I've added this to the Link header of every response in the docs:

</dts/api/document/documentation>; rel="apiDocumentation"

I'll be committing the updated version shortly and then I'll submit a pull request against the dev branch. Before we can merge that with master I'm going to have to go through and more or less manually merge dev with the changes to master made since June. (GitHub won't automerge them.) I'll work on getting that ready for final approval (with a pull request against master) for the next committee meeting.
Oh, quick question before I make the PR: Hydra includes a properly namespaced term to use in the "rel" value for those Link headers: http://www.w3.org/ns/hydra/core#apiDocumentation. I'm assuming that we should be using that full URL in that Link header:

</dts/api/document/documentation>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"
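For what it's worth, the full-URL form serializes mechanically; a trivial sketch (the helper name is mine, not from the docs):

```python
# Building the Link header with the fully namespaced Hydra rel value,
# as discussed above. The helper function name is invented.
HYDRA_DOC_REL = "http://www.w3.org/ns/hydra/core#apiDocumentation"

def api_doc_link(doc_url="/dts/api/document/documentation"):
    return '<{0}>; rel="{1}"'.format(doc_url, HYDRA_DOC_REL)

print(api_doc_link())
# </dts/api/document/documentation>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"
```

One consideration either way: unregistered ("extension") relation types are supposed to be full URIs, which argues for the namespaced form over the bare "apiDocumentation" token.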
I'm wondering whether there's interest in the DTS group in the idea of extending the API spec to provide for creating and modifying content, in addition to fetching it. The context for this is that I'm part of a project looking at how we might create standard APIs to allow better modularity, reusability, and interoperability for TEI editing tools. It strikes me that some of the DTS endpoints could naturally be exposed for POST, PUT, PATCH, and DELETE requests. That would fairly naturally allow the API to cover a much broader range of use cases. I'm actually going to be drafting a proposal for that extension for a paper I'm giving in November.
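To give a rough sense of what exposing the endpoints for writes could mean, here is one possible verb-to-operation mapping for the Document endpoint. This is purely a discussion aid; none of it is in the published DTS spec:

```python
# Hypothetical mapping of HTTP verbs onto editing operations for the
# DTS Document endpoint; a sketch for discussion, not specification.
DOCUMENT_OPERATIONS = {
    "GET": "retrieve a passage of the text (already in the spec)",
    "POST": "create a new document resource",
    "PUT": "replace a passage with new TEI content",
    "PATCH": "modify part of an existing passage",
    "DELETE": "remove a document or passage",
}

for verb, action in sorted(DOCUMENT_OPERATIONS.items()):
    print(verb, "->", action)
```

Similar tables could be drawn up for the Collection and (perhaps) Navigation endpoints, which is where most of the design questions in this thread arise.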