How Metadata Works in the Publishing World #57
This is generally why it's a bad idea for publication specifications to dive so deeply into metadata vocabularies. We wanted to provide a framework for metadata expressions with EPUB 3, but got sucked into the metadata vortex of despair by introducing some "essential" metadata that didn't seem to exist but that looked like the non-ONIX folks would need. That's led to EPUB being looked at as having to define the essential metadata, when metadata expressions really should be figured out at the publishing/publication level. I opened w3c/wpub#429 in part because I see a lot of the same happening here. The more we layer in, the more it looks like what we exclude doesn't matter, and that leads to more requests to include things. Plus, the more we recommend for certain areas of publishing, the more annoying we make metadata for others. The manifest is somewhat unencumbered now that most metadata is optional, but we still define a whole lot of concepts that really aren't essential to user agents (dates, etc.). During 3.1, we started to look at defining prescriptive metadata guidelines for publishers using alternative means, like best practices documents. Call me old-fashioned, but it still strikes me as the best balance. Let each community define what it wants to express and how it wants to express it outside the specification. |
Well... we should be careful. The presence of, e.g., ONIX is clearly important for trade publishing. But we also know from our discussions that, at this moment, trade publishing will not consider Publication Manifests for some time and will keep to EPUB 3.x. What is the situation in other areas? From the little I know about scholarly publishing, the "publication" content (at least for journals) is not dominated by packaging, which puts things in a very different perspective. |
Right, I'm not suggesting we pick a side in that debate. I just look at the publication manifest and I see a "format" that is itself also one big metadata expression language so we're already in an ideal scenario. We don't need to turn the specification into a rehash of schema.org or dcterms or any other scheme, as these are already accommodated. I just feel we're better off staying out of the metadata sphere as much as we can. If we can't pin a property to something the user agent needs for some specific purpose, then the property probably doesn't belong in the specification. We need to find a way to empower the publishing communities (hint, hint) to work out the details of what metadata belongs in the manifest when an ONIX record isn't the primary source of that descriptive detail, and for these communities to publish notes or guides for each relevant publishing realm. |
Since I came into this publishing domain, it has struck me that EPUB metadata lacks a clear use case: who is this metadata made for? The same question now applies to publication manifest metadata. IMO it should be made for end users, helping them classify and find publications they have "acquired" and keep on their large "bookshelf" or "personal library". This is not discovery / commercial data to be used by booksellers (ONIX is made for that). This is not classification data to be used by academic libraries (MARC is good for that). Once the use case is clear, we can decide which metadata is useful and which is not. |
As a note, for possible new versions of the specifications: comparing the metadata available in EPUB and those available in the pub-manifest, I noticed the lack of some information, which is used in real use cases. These are:
|
@gregoriopellegrino the only metadata in your list whose use for end users (readers) I don't see is the rights information: if a user has a publication in their hands, what use is rights information to them? Would it contain things like "you, reader, have the right to do this, but do not have the right to do that, with the publication you have acquired"? |
In EPUBs it was used for copyright information, e.g. "© 2019 Publisher". |
@gregoriopellegrino if a consensus is found around a copyright notice (I would support it), then a "copyrightNotice" property would be more interesting than a "rights" property. We can have a look at the news industry, where a copyrightNotice property is defined (https://www.iptc.org/std/NewsML-G2/guidelines/#copyright-notice) as a child of a bigger "rightsInfo" structure (https://www.iptc.org/std/NewsML-G2/guidelines/#rights-metadata). schema.org takes another approach: copyrightHolder + copyrightYear. If there is consensus around the concept, we'll have to choose between these approaches. |
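To make the schema.org option concrete, here is a minimal Python sketch. It assumes a manifest fragment carrying `copyrightHolder` and `copyrightYear`; the helper name `copyright_notice` and the sample values are hypothetical, chosen only to show how a reading system could derive a display string such as the "© 2019 Publisher" notice mentioned above. It is not taken from any specification.

```python
def copyright_notice(work: dict) -> str:
    """Derive a display notice from schema.org copyrightYear/copyrightHolder.

    Hypothetical helper for illustration; not defined by any specification.
    """
    holder = work.get("copyrightHolder")
    # The holder may be a plain string or an object carrying a "name".
    if isinstance(holder, dict):
        holder = holder.get("name")
    year = work.get("copyrightYear")
    parts = [str(p) for p in (year, holder) if p]
    return "© " + " ".join(parts) if parts else ""


# Sample manifest fragment (all values are made up for the example).
manifest_fragment = {
    "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type": "Book",
    "name": "Example Title",
    "copyrightYear": 2019,
    "copyrightHolder": {"type": "Organization", "name": "Publisher"},
}

print(copyright_notice(manifest_fragment))
```

A dedicated notice property would make the helper unnecessary, at the cost of storing a preformatted string instead of structured data.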
This issue was discussed in a meeting.
issue 57, metadata in the publishing world
Wendy Reid: #57
Wendy Reid: dave raised an issue about how metadata works in the publishing world
Dave Cramer: Everybody knows I worry about a lot of things, our experience with EPUB has been spent in metadata rabbit holes, new vocabularies
… everyone has a property that is important to them, we spend a lot of effort
… metadata is not always exposed to the reader, and it travels separately from the EPUB itself
Charles LaPierre: except VitalSource is starting to expose the Accessibility Metadata
Dave Cramer: I raised this so we could be thoughtful about the metadata we expose
Ben Schroeter: to add to that
… we do supply a11y metadata in the epub that is used by distributors
Charles LaPierre: I’m on the a11y metadata thing
… vitalsource is exposing EPUB a11y metadata to users
Wendy Reid: that metadata is in ONIX, too?
Charles LaPierre: some of it; it’s not a 1:1 mapping; there’s more in ONIX 3
… but it’s not in ONIX 2.1, which is still widely used in US
Brady Duga: what Dave said is true for publisher-supplied ebooks, but not so much from user supplied epub
… but the metadata that matters is mostly author and title
Ivan Herman: I’ve said several times, if this manifest exercise becomes successful, it may not be in the worlds where EPUB is already successful
… we should not be bound by EPUB or ONIX
Laurent Le Meur: in the thorium reader we try to present metadata in the OPDS feed or in EPUB
… but there is no consistent set of user-oriented metadata
… but we would like to get it right… publisher, language, category, subject, narrator…
… all would be useful
Matt Garrish: there’s room to do metadata standardization outside of the standard itself
… rather than putting every metadata scheme in the spec itself, leave it to the communities
… there should be some core stuff
… it’s probably more efficient to do things outside
Bill Kasdorf: will there be a generic way to incorporate community-specific metadata?
… so scholarly publishers can include what they think is essential?
Matt Garrish: that’s exactly how it would work and how it is set up right now
… we’re a proxy for schema.org; we can use anything there without having it directly in our spec
… and our context files include more prefixes
… we are very flexible
Avneesh Singh: +1 Matt
Matt Garrish: there should be a clear purpose to list metadata in the core spec
Bill Kasdorf: I am anti-bloat
Gregorio Pellegrino: I understand what Matt says but some metadata is essential, like description
… we should suggest to use some metadata, because otherwise reading systems won’t implement
Laurent Le Meur: I agree that in the core spec maybe we don’t need an extensive set
… communities can define their own community
… who is the community? Is the audiobook community literally this working group?
… but we need something defined somewhere
Ivan Herman: one of the reasons we took JSON-LD is because schema.org used it
… but JSON-LD is ideally suited for this… you can just add things and it’s OK.
… laurent is right; for different areas there should be communities defining metadatas
… I don’t know if there is additional metadata required by audiobooks, if so let’s add it
… it depends… a CG might be able to define some of these things
… the main goal is to provide a framework
Dave Cramer: I’m not opposed to metadata, we seem to think that embedding metadata is always good but past experience shows this data is rarely used, I’m aware of few reading systems that use title and author but we’ve made many EPUBs using copyright statements
Avneesh Singh: let this spec go to CR as is
Ivan Herman: See example to link external metadata (ONIX in this case)
Avneesh Singh: and we have the audiobooks spec; we need to know what audio publishers want
Ivan Herman: +1 to Avneesh
Avneesh Singh: and we can do a note or registry with metadata
Gregorio Pellegrino: I agree with avneesh
… the possibility to define the role of contributor in schema.org are very poor
… we need ways to add that
Laurent Le Meur: adding metadata can be done step-by-step
… we need a group that can host these needs
… which group is it? This WG working on audiobooks?
… then we can wait for needs from publishers of audiobooks
Matt Garrish: to what gregorio said, we can request that schema.org add stuff that’s missing
Ivan Herman: +1 to matt
Gregorio Pellegrino: +1 to matt
Matt Garrish: it never ends well when we add metadata to our own standards
Ivan Herman: a partial answer to laurent
… there are two issues. one is, who are the groups that develop metadata? I don’t think there is one answer.
… two: how do you find the metadata that has already been developed?
… we may need a registry
Bill Kasdorf: in some sectors of publishing there are organizations that govern metadata
… IPTC, JATS, etc
… as we reach out to other sectors, we will find there are already metadata standards
Wendy Reid: this is not a question for us to solve today
… we have sufficient metadata in our specs for now. We’ll see how it goes in CR.
… so let’s move on
Ivan Herman: is it OK if we close this issue?
Dave Cramer: refresh your github
Gregorio Pellegrino: if we close the issue, how can we say we are thinking about this?
Gregorio Pellegrino: Fine
Ivan Herman: we could say it’s deferred
Wendy Reid: #98
Gregorio Pellegrino: Defer |
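The "proxy for schema.org" point and the idea of linking to an external record can be sketched as follows. This is a minimal Python illustration, not a normative example: the URLs and titles are made up, the `rel` value `onix-record` and the media type `application/onix+xml` are assumptions, and `external_records` is a hypothetical helper. The manifest adds schema.org properties (`description`, `about`) without the specification having to define them, and points to an out-of-band ONIX record through `links`.

```python
import json

# Hypothetical manifest; every value below is made up for illustration.
manifest = {
    "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type": "Book",
    "name": "Example Publication",
    "author": "Example Author",
    # schema.org properties a community might add on its own:
    "description": "A short description shown on the reader's bookshelf.",
    "about": "Digital publishing",
    # Pointer to an external metadata record (rel and media type are assumptions).
    "links": [
        {
            "type": "LinkedResource",
            "url": "https://publisher.example.org/records/example-onix.xml",
            "encodingFormat": "application/onix+xml",
            "rel": "onix-record",
        }
    ],
    "readingOrder": [{"url": "chapter1.html"}],
}


def external_records(manifest: dict, media_type: str) -> list:
    """Return URLs of linked resources with the given media type (hypothetical helper)."""
    return [
        link["url"]
        for link in manifest.get("links", [])
        if link.get("encodingFormat") == media_type
    ]


print(json.dumps(manifest, indent=2))
print(external_records(manifest, "application/onix+xml"))
```

A user agent that only cares about title and author can ignore everything else, while a community profile can document which extra properties it expects.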
JFI we have a practical case where something like Schema.org is very unclear about the start year as: The EPUB license manifest template stored on a license server managed by the Readium platform covers this case; it's possible to check on the server, and it also defines a time cap for any bundled EUL manifest file. Internally, to describe license status resolution, we are using a model very close to But the TransferAction Schema.org concept seems to be a good and compatible equivalent. Maybe it is worth considering as a possible recommended description practice, and it may be an answer to the question of where to put all these parties, dates, rights and so on. It's really hard to define a perfect set of fields for these needs. CreativeWork itself is not that good for this. The legal side of business knowledge is usually about linkage, forming the legal status of all entities in their sum with temporal bindings, not about one thing with limitless properties. From my side, I see coming up with the idea of adding more aspects there as a sanity crime, so I see some sense in using the existing part of the model related to business activity linked with the work, as intended, and preserving a place only for the EULA and information about responsible parties/authorities (usable in practice, like reg num/contact info) for the case of transmission or bundling. The other thing that has turned out to be very important for our digital publishing activity is the public domain status of a work, and the (ongoing/past) date of this status transfer. But this is a question as complex as the whole current topic. |
In much of the EPUB world, the metadata that matters is not inside the EPUB, but outside (in the form of ONIX). The metadata inside EPUBs is often wrong, is difficult to change, and there is very little incentive to make it accurate since it's mostly unused.
In the web world, page metadata directly affects search ranking, Google rich snippets, etc. There is no out-of-band transmission of metadata. There is strong incentive to make it accurate.
How do we avoid the situation with EPUB, where we've spent decades worrying about metadata, continually changing how it's expressed, without really benefiting users?