First CTS edition of Abbott #14

Merged: 6 commits, Jul 23, 2024

cwulfman (Collaborator) commented Jul 2, 2024

Uses simple paragraph reference scheme.

lcerrato (Contributor) commented Jul 2, 2024

@cwulfman I think that if reference works are not going to use EpiDoc, you would need a different repo. This repo depends on the hook tests passing, and the hook tests use EpiDoc, so anything that does not pass will not be part of the package and therefore won't be part of the build.
The Perseus lexica are not EpiDoc (and are not visible as texts at present).

There are other failures here: the divs include sections/subsections but the refsDecl says paragraph, there is no top-level div for the edition, no xml:base, etc., so a test other than EpiDoc will also fail.

You may want to do section/paragraph, which would permit a section="front" here and then a section="body", but the current structure is not going to parse.

@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="UTF-8"?>
<ti:textgroup xmlns:ti="http://chs.harvard.edu/xmlns/cts" urn="urn:cts:engLit:abbott">
<ti:groupname xml:lang="eng">Abbott</ti:groupname>
Contributor

We typically include the full name here so I would use: Edwin Abbott Abbott.

</ti:title>
<ti:edition urn="urn:cts:engLit:abbott.grammar.perseus-eng1" workUrn="urn:cts:engLit:abbott.grammar" xml:lang="eng">
<ti:label xml:lang="eng">A Shakespearean Grammar</ti:label>
<ti:description xml:lang="eng">Edwin Abbott Abbott. A Shakespearean Grammar. London and New York: Macmillan and Company, 1870.</ti:description>
Contributor

Some extra spaces here before the publisher city.

AlisonBabeu (Contributor)

Seconding Lisa's thoughts, but also a few other questions on the TEI-XML file:

  1. File is missing a publication statement, though I really like:
    <publicationStmt><p>later</p></publicationStmt>

  2. Whenever possible, at the end of the <biblStruct><monogr> section we typically include a URL location with a link to the page level if we can find the book online.

  3. This is just my own observation, but I didn't realize that we encoded the front pages of reference works so I was surprised by the whole <front> section since it seemed a duplication of the structured bibliographic data.

  4. Also, I think the title is misspelled; all online editions I have found (https://archive.org/details/shakespeariangra0000abbo/page/n5/mode/2up) have "Shakespearian" rather than "Shakespearean".

cwulfman (Collaborator, Author) commented Jul 2, 2024

> Seconding Lisa's thoughts, but also a few other questions on the TEI-XML file:
>
> 1. File is missing a publication statement, though I really like:
>    `<publicationStmt><p>later</p></publicationStmt>`

This was a lazy convention back in the day. Fixed.

cwulfman (Collaborator, Author) commented Jul 2, 2024

> This is just my own observation, but I didn't realize that we encoded the front pages of reference works so I was surprised by the whole <front> section since it seemed a duplication of the structured bibliographic data.

As we discussed in the meeting this morning, these works were encoded as first-class texts in and of themselves. We can certainly remove the title page, prefaces, and introduction.

cwulfman (Collaborator, Author) commented Jul 2, 2024

> 2. Whenever possible, at the end of the <biblStruct><monogr> section we typically include a URL location with a link to the page level if we can find the book online.

Fixed

cwulfman (Collaborator, Author) commented Jul 2, 2024

> There are other failures here: the divs include sections/subsections but the refsDecl says paragraph, there is no top-level div for the edition, no xml:base, etc., so a test other than EpiDoc will also fail.

I can certainly add patterns to the refsDecl for sect.subsect.para. But I would imagine any real citation would be to the paragraph number. Could you show me an example of how you would use xml:base in this instance?

cwulfman (Collaborator, Author) commented Jul 2, 2024

> I think that if reference works are not going to use EpiDoc, you would need a different repo. This repo depends on the hook tests passing, and the hook tests use EpiDoc, so anything that does not pass will not be part of the package and therefore won't be part of the build.
> The Perseus lexica are not EpiDoc (and are not visible as texts at present).

This is obviously a serious issue. I'd like to hear @jtauber's thoughts on this. Does this mean the Shakespeare texts won't pass either?

lcerrato (Contributor) commented Jul 3, 2024

> > There are other failures here: the divs include sections/subsections but the refsDecl says paragraph, there is no top-level div for the edition, no xml:base, etc., so a test other than EpiDoc will also fail.
>
> I can certainly add patterns to the refsDecl for sect.subsect.para. But I would imagine any real citation would be to the paragraph number. Could you show me an example of how you would use xml:base in this instance?

I don't understand the structure well enough at present. Is it section/subsection/paragraph or just paragraph?

@@ -41,8 +41,12 @@
</correction>
</editorialDecl>
<refsDecl n="CTS">
<cRefPattern matchPattern="(\w+)"
replacementPattern="//tei:div[@subtype='paragraph' and @n=$1">
<cRefPattern matchPattern="(\w).(\w+).(\w+)"
Contributor

This cRefPattern is unfamiliar to me. I don't understand it.
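
For readers unfamiliar with the three-part pattern in the diff above, a minimal Python sketch of what its matchPattern accepts may help. It assumes a resolver that requires the whole reference to match (an assumption; the resolver behind the hook tests is not shown in this thread), and `STRICT` is a hypothetical alternative, not something proposed in the PR. Because the dots are unescaped they match any character, and the first group `(\w)` takes exactly one character:

```python
import re

# matchPattern exactly as it appears in the diff above.
AS_WRITTEN = r"(\w).(\w+).(\w+)"
# Hypothetical alternative (not from the PR): literal dots and a
# first part that may be longer than one character.
STRICT = r"(\w+)\.(\w+)\.(\w+)"


def split_reference(pattern, reference):
    """Return the captured parts if the whole reference matches, else None.

    Assumption: the resolver anchors the pattern to the full reference.
    """
    match = re.fullmatch(pattern, reference)
    return match.groups() if match else None


if __name__ == "__main__":
    for ref in ("4.1.1", "4x1y1", "12.1.1"):
        print(ref, split_reference(AS_WRITTEN, ref), split_reference(STRICT, ref))
```

With the pattern as written, `4x1y1` is accepted and `12.1.1` is rejected; escaping the dots and widening the first group reverses both outcomes.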

lcerrato (Contributor), Jul 3, 2024

Why would a simple paragraph capture not be something like this?

<refsDecl n="CTS">
  <cRefPattern n="paragraph" matchPattern="(\w+)"
               replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
    <p>This pointer pattern extracts paragraph</p>
  </cRefPattern>
</refsDecl>

But this makes all of the front matter invisible. The front matter could be a section n="front" with several subsections (preface n="1", etc.), and you have no structure when you get to here:

<div type="edition" n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">

  <div type="textpart" subtype="section" n="4">
    <head>GRAMMAR.</head>
    <div type="textpart" subtype="subsection" n="1">
      <head>ADJECTIVES.</head>
      <div type="textpart" subtype="paragraph" n="1">
        <head>ADJECTIVES used as adverbs</head>

So the reader will not be in Grammar, then Adjectives, then "Adjectives used as adverbs", but will simply be dropped into "Adjectives used as adverbs" with no hierarchy.

I see a hierarchy in the front matter that can be included as part of the text if you just stick with the 3 levels throughout. The front matter is also going to be important for understanding references.

I would have to pull this offline to see the whole file (too big for GitHub) so I feel like I am only getting part of the picture in this format. I just don't think that paragraph as the sole container is how I would present this when there is a clear hierarchy.

I see, for example,

Body
  Div
    Section = Front (or 0)
      Subsection = Pref_1 (3rd ed)
        paragraph (...)
      Subsection = Pref_2 (1st ed)
        ...
      Subsection = Refs
        ...
      Subsection = Abbrevs
        ...
      Subsection = Intro
        ...
then
    Section = 1 (Grammar)
      Subsection = 1 Adjectives
        Subsubsection or Paragraph = 1 Adjectives used as adverbs

Or this could be Part/Section/Subsection or whatever convention makes sense in the literature.
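
As a rough illustration of what the three-level scheme outlined above would give a reader, here is a small Python sketch that walks nested textpart divs and builds dotted references. Everything in it is an assumption for illustration: the sample document, the `n` values (`front`, `pref_1`), and the helper names `citations` and `edition_citations` are hypothetical and not taken from the PR.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical document using the three-level scheme outlined above;
# the n values ("front", "pref_1") are illustrative only.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="edition" n="urn:cts:engLit:abbott.grammar.perseus-eng1">
        <div type="textpart" subtype="section" n="front">
          <div type="textpart" subtype="subsection" n="pref_1">
            <div type="textpart" subtype="paragraph" n="1"><p>...</p></div>
          </div>
        </div>
        <div type="textpart" subtype="section" n="1">
          <head>GRAMMAR.</head>
          <div type="textpart" subtype="subsection" n="1">
            <head>ADJECTIVES.</head>
            <div type="textpart" subtype="paragraph" n="1">
              <head>ADJECTIVES used as adverbs</head>
            </div>
          </div>
        </div>
      </div>
    </body>
  </text>
</TEI>
"""


def citations(div, prefix=()):
    """Yield dotted references for the innermost textpart divs under div."""
    for child in div.findall(f"{TEI}div"):
        if child.get("type") != "textpart":
            continue
        path = prefix + (child.get("n"),)
        if child.find(f"{TEI}div") is None:
            # No nested div: this is a citable leaf.
            yield ".".join(path)
        else:
            yield from citations(child, path)


def edition_citations(xml_text):
    """Return all leaf references found under the first div in the body."""
    root = ET.fromstring(xml_text)
    edition = root.find(f"{TEI}text/{TEI}body/{TEI}div")
    return list(citations(edition))


if __name__ == "__main__":
    print(edition_citations(SAMPLE))
```

Under these assumptions the front matter stays citable (`front.pref_1.1`) alongside the body (`1.1.1`), and each reference carries its place in the hierarchy.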

Collaborator Author

The paragraphs are not <p> paragraphs but <div type="textpart" subtype="paragraph">. As in a legal document, the term "paragraph" is being used to denote a type of subsection.

I doubt there are many instances of citations to Abbott's Grammar, so there may not be conventions to follow.

Contributor

Yes, I hear you: the citation unit as paragraph is certainly fine.

Here is an existing example:

<refsDecl n="CTS">
<cRefPattern n="paragraph" matchPattern="(\w+)"
replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
<p>This pointer pattern extracts paragraph</p>
</cRefPattern>
</refsDecl>

I think the overall structure tree of the work was what I was wondering about.
It didn't strike me that it was desirable to have a flat container structure, but I don't think that is a bad approach.

If a nested structure is undesirable, you can always put the front matter into preliminary paragraph containers (n="pref_1", n="refs", etc.). That's what I would do to ensure it's visible.

This may not be relevant anyhow due to the nature of these files (agree with the comment below) but happy to help if you want to hammer this out further.
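
To make the trade-off around the paragraph pattern concrete, here is a rough Python sketch of how a single-part reference could be expanded into an XPath and evaluated against the excerpt quoted earlier in this thread. Assumptions: the `resolve` helper and the closing tags added to the excerpt are illustrative only; the standard-library ElementTree supports just a subset of XPath, so the paths are written relative to the root `TEI` element instead of using the `#xpath(/tei:TEI/...)` form; and the `STRICT` variant with the extra `@subtype` test is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Excerpt quoted earlier in the thread, closed off so that it parses.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="edition" n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">
        <div type="textpart" subtype="section" n="4">
          <head>GRAMMAR.</head>
          <div type="textpart" subtype="subsection" n="1">
            <head>ADJECTIVES.</head>
            <div type="textpart" subtype="paragraph" n="1">
              <head>ADJECTIVES used as adverbs</head>
            </div>
          </div>
        </div>
      </div>
    </body>
  </text>
</TEI>
"""

MATCH = r"(\w+)"
# Same selection logic as the quoted pattern, relative to the root element.
LOOSE = "./tei:text/tei:body/tei:div//tei:div[@n='$1']"
# Hypothetical variant that also requires subtype="paragraph".
STRICT = "./tei:text/tei:body/tei:div//tei:div[@subtype='paragraph'][@n='$1']"


def resolve(root, match_pattern, replacement_pattern, reference):
    """Expand $1, $2, ... in the replacement pattern and run the XPath."""
    match = re.fullmatch(match_pattern, reference)
    if match is None:
        return []
    xpath = replacement_pattern
    for index, value in enumerate(match.groups(), start=1):
        xpath = xpath.replace(f"${index}", value)
    return root.findall(xpath, TEI_NS)


def matched_subtypes(replacement_pattern, reference):
    """Return the subtype of every div the reference resolves to."""
    root = ET.fromstring(SAMPLE)
    hits = resolve(root, MATCH, replacement_pattern, reference)
    return [div.get("subtype") for div in hits]


if __name__ == "__main__":
    print(matched_subtypes(LOOSE, "1"))
    print(matched_subtypes(STRICT, "1"))
```

In this excerpt the subsection and the paragraph both carry n="1", so the loose path returns two divs for the reference `1`, while the variant restricted by subtype returns only the paragraph.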

cwulfman (Collaborator, Author) commented Jul 3, 2024

> As we discussed in the meeting this morning, these works were encoded as first-class texts in and of themselves. We can certainly remove the title page, prefaces, and introduction.

I suspect we are going down the wrong path here anyway. Abbott, Onions, Schmidt, Dyce: these should no longer be encoded as first-class texts but instead reimagined as classical commentaries anchored to primary-source citations. I'm going to review James's Atlas guidelines and see what I can do, though again, several of these resources are not dictionaries.

cwulfman merged commit 452d522 into dictionaries on Jul 23, 2024
2 checks passed
cwulfman deleted the abbott branch on July 23, 2024 19:22
3 participants