titles handling #14

Open
7 tasks
ilovan opened this issue Mar 18, 2024 · 1 comment


@ilovan
Contributor

ilovan commented Mar 18, 2024

  1. Simplest scenario: only one titleInfo per level (main or related item), with no qualifying attributes and no subTitle element (by my count, 94,123 titles):
    //*[local-name()="relatedItem"]/*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and not(following-sibling::*[local-name()='subTitle'])]
    (Check the length: if it is over 252 characters, truncate it and add the corresponding full_title field.)

  2. Slightly more complicated: only one titleInfo per level (main or related item), with no qualifying attributes and a subTitle child (by my count, 3,748 titles):
    //*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and following-sibling::*[local-name()='subTitle' and text()]]
    (Concatenate title and subTitle, separated by a dot, then calculate the total length and truncate / add the full_title field as described above.)

  3. Even more complicated: multiple titleInfo elements per level.
    There are a total of 5 objects in CWRC that contain a titleInfo element that is preceded by another titleInfo sibling but has no type attribute (//*[local-name()='titleInfo' and preceding-sibling::*[local-name()="titleInfo"] and not(@type)]).
    These, along with the titles typed 'alternative' (//*[local-name()='titleInfo' and @type='alternative']), should go in the corresponding alternative title field. About 4 alternative titles also have subtitles, so those should be concatenated like all the other title/subtitle pairs. There is no need to test the number of characters, since this is not the main title field and can exceed 253.
    There are also 1,560 instances of @type='abbreviated', which should also be mapped to an alternative title field.

  4. titleInfo with nonSort children (2,595 objects): concatenate the nonSort content with the title content. There is no need to fiddle with capitalization: in the title values I have seen, capitalization is consistent with the conventions of the title's language. Count the length and truncate if needed.

  5. 620 descendants of titleInfo are enclosed in TEI elements: @ilovan to add a "Display title" field with full HTML formatting and provide mappings for the TEI elements.
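Scenarios 1, 2 and 4 can be sketched as one small function that builds the main title string from a single titleInfo and applies the length rule. This is a minimal sketch, not the actual implementation: it assumes truncation to exactly 252 characters with no ellipsis, ". " as the title/subTitle separator, and a space between nonSort and title unless nonSort ends in an apostrophe. The "title" output key is a hypothetical field name; only full_title comes from the issue.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}

# Assumption: truncate anything longer than 252 characters, as in scenario 1.
MAX_TITLE_LEN = 252


def _text(el):
    """Whitespace-normalized text of an element and its descendants ('' if missing)."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())


def build_title(title_info):
    """Join nonSort, title and subTitle from a single mods:titleInfo."""
    non_sort = _text(title_info.find("mods:nonSort", NS))
    title = _text(title_info.find("mods:title", NS))
    sub_title = _text(title_info.find("mods:subTitle", NS))

    # Scenario 4: prepend nonSort; capitalization is left untouched.
    # Assumption: add a space unless nonSort ends in an apostrophe (e.g. "L'").
    if non_sort:
        title = non_sort + ("" if non_sort.endswith("'") else " ") + title

    # Scenario 2: title and subTitle separated by a dot (". " is an assumed form).
    if sub_title:
        title = f"{title}. {sub_title}"
    return title


def map_main_title(title_info):
    """Scenario 1: truncate long titles and keep the untruncated text in full_title.

    The "title" key is a hypothetical field name; "full_title" comes from the issue.
    """
    full = build_title(title_info)
    fields = {"title": full[:MAX_TITLE_LEN]}
    if len(full) > MAX_TITLE_LEN:
        fields["full_title"] = full
    return fields


example = ET.fromstring(
    f'<titleInfo xmlns="{MODS_NS}">'
    "<nonSort>The</nonSort><title>Diary</title><subTitle>1850-1860</subTitle>"
    "</titleInfo>"
)
print(map_main_title(example))  # {'title': 'The Diary. 1850-1860'}
```

Because the length check runs on the concatenated string, a title that only exceeds the limit once its nonSort or subTitle is added still gets a full_title field.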
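For scenario 5, one possible shape is to detect titleInfo elements with TEI descendants and serialize their title content to HTML for the "Display title" field. This is only a sketch: the TEI-to-HTML mapping below is a hypothetical placeholder (the real mappings are still to be provided), unmapped elements keep only their text, and all text is HTML-escaped. The TEI namespace URI is the standard TEI P5 one.

```python
import html
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"mods": MODS_NS}

# Hypothetical placeholder: the real TEI-to-HTML mappings are still to be provided.
TEI_TO_HTML = {"title": "i", "emph": "em"}


def is_tei(el):
    """True if el is an element in the TEI namespace."""
    return isinstance(el.tag, str) and el.tag.startswith("{" + TEI_NS + "}")


def has_tei_descendants(title_info):
    """True if a mods:titleInfo contains TEI elements and so needs a display title."""
    return any(is_tei(el) for el in title_info.iter())


def to_display_html(el, mapping=TEI_TO_HTML):
    """Serialize the content of el as HTML, wrapping mapped TEI elements in HTML tags.

    Unmapped elements contribute their text only; all text is HTML-escaped.
    """
    parts = [html.escape(el.text or "")]
    for child in el:
        inner = to_display_html(child, mapping)
        local = child.tag.split("}", 1)[-1] if isinstance(child.tag, str) else ""
        if is_tei(child) and local in mapping:
            tag = mapping[local]
            parts.append(f"<{tag}>{inner}</{tag}>")
        else:
            parts.append(inner)
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


ti = ET.fromstring(
    f'<titleInfo xmlns="{MODS_NS}" xmlns:tei="{TEI_NS}">'
    "<title>Review of <tei:title>Middlemarch</tei:title></title>"
    "</titleInfo>"
)
if has_tei_descendants(ti):
    print(to_display_html(ti.find("mods:title", NS)))  # Review of <i>Middlemarch</i>
```

The plain (non-display) title for the same record can still be built from the flattened text, so the TEI markup only affects the display field.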

To-dos:

  • check whether we can have more than one alternative title.
  • handle modsCollection better
  • handle nonSort (concat with title)
  • check titleInfo types to see if all are handled properly (some are in place but I'm not certain of coverage)
  • check usage and type attributes
  • check relatedItem containing descendant relatedItem
  • general checks to verify the current work (see test_column_title.xquery - currently filtering on orlando namespace)

Spreadsheet with mappings and objects inventory: https://docs.google.com/spreadsheets/d/1S-TYcNnv3g8EQPUwqbJDVO5xpDwIHVTL/edit#gid=2097076917

@jefferya
Contributor

A basic implementation is in place; however, titles need a more thorough review. I'm finding places where the above does not strictly hold: for example, the not(@*) test excludes titleInfo elements that carry a valueURI attribute.
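One way to keep such titleInfo elements in the "no qualifying attributes" bucket is to ignore authority-linking attributes when testing for attributes. A minimal sketch, assuming that authority, authorityURI and valueURI identify a title without qualifying it; which attributes really count as qualifying is still to be decided. The comment shows an XPath 1.0 predicate that could stand in for not(@*) in the queries above.

```python
import xml.etree.ElementTree as ET

# Assumption: authority-linking attributes identify a title without qualifying it.
# XPath 1.0 predicate equivalent to `not has_qualifying_attributes(el)`:
#   not(@*[local-name()!='authority' and local-name()!='authorityURI'
#          and local-name()!='valueURI'])
IGNORED_ATTRS = {"authority", "authorityURI", "valueURI"}


def _local(name):
    """Strip any '{namespace}' prefix from an ElementTree attribute name."""
    return name.split("}", 1)[-1]


def has_qualifying_attributes(el):
    """True if el carries any attribute other than the ignored linking attributes."""
    return any(_local(name) not in IGNORED_ATTRS for name in el.attrib)


linked = ET.fromstring(
    '<titleInfo xmlns="http://www.loc.gov/mods/v3" valueURI="http://example.org/t1"/>'
)
typed = ET.fromstring('<titleInfo xmlns="http://www.loc.gov/mods/v3" type="alternative"/>')
print(has_qualifying_attributes(linked), has_qualifying_attributes(typed))  # False True
```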

Another open question is what to do when an item has multiple mods:relatedItem elements with a titleInfo; I'm not sure what the end result should be. This query lists those items:

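declare namespace mods = "http://www.loc.gov/mods/v3";
(: Count relatedItem elements that contain a titleInfo across all mods:mods in the
   record (standalone or inside a modsCollection); keep records with two or more :)
for $item in 
  /metadata[count(resource_metadata/(mods:mods|mods:modsCollection/mods:mods)/mods:relatedItem[mods:titleInfo])>=2]
let $id := $item/@pid/data()
order by $id
return $item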
