First CTS edition of Abbott #14

cwulfman · 2024-07-02T15:41:52Z

Uses simple paragraph reference scheme.

lcerrato · 2024-07-02T18:36:13Z

@cwulfman I think, if reference works are not going to use EpiDoc, you would need a different repo. As this repo is dependent on the hook tests passing, and the hook tests use EpiDoc, then anything that does not pass will not be a part of the package and therefore won't be part of the build.
The Perseus lexica are not EpiDoc (and are not visible as texts at present).

There are other failures here such as the divs including sections/subsections, but the refsDecl saying paragraph, no top level div for the edition, no xml:base, etc. so a test apart from EpiDoc will also fail.

You may want to do section/paragraph, which would permit a section="front" here and then a section="body", but the current structure is not going to parse.
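For illustration, the mismatch described above (nested section/subsection divs against a refsDecl that only declares a paragraph level, and a missing top-level edition div) can be checked with a short script. This is a minimal sketch and not the repository's actual hook tests: the function names `textpart_depth` and `structural_problems`, the sample document, and the rule that the number of capture groups should equal the nesting depth are all assumptions made for the example. It does not check xml:base, since the thread does not say where that attribute is expected.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def textpart_depth(div):
    """Return the deepest nesting of textpart divs below `div`."""
    children = [c for c in div.findall(f"{TEI}div") if c.get("type") == "textpart"]
    return 0 if not children else 1 + max(textpart_depth(c) for c in children)

def structural_problems(xml_text):
    """Collect human-readable problems; an empty list means these checks passed."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{TEI}text/{TEI}body")
    edition = None
    if body is not None:
        edition = next(
            (d for d in body.findall(f"{TEI}div") if d.get("type") == "edition"),
            None,
        )
    if edition is None:
        return ["no top-level div[@type='edition']"]

    problems = []
    depth = textpart_depth(edition)
    patterns = root.findall(f".//{TEI}refsDecl/{TEI}cRefPattern")
    # Assumed rule: one capture group per citation level.
    group_counts = {re.compile(p.get("matchPattern", "")).groups for p in patterns}
    if depth not in group_counts:
        problems.append(
            f"textparts nest {depth} deep, but refsDecl declares "
            f"{sorted(group_counts)} level(s)"
        )
    return problems

# Hypothetical sample: three nested levels, one-level refsDecl.
SAMPLE = r"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><refsDecl n="CTS">
    <cRefPattern matchPattern="(\w+)" replacementPattern="..."/>
  </refsDecl></encodingDesc></teiHeader>
  <text><body>
    <div type="edition">
      <div type="textpart" subtype="section" n="4">
        <div type="textpart" subtype="subsection" n="1">
          <div type="textpart" subtype="paragraph" n="1"/>
        </div>
      </div>
    </div>
  </body></text>
</TEI>"""

for problem in structural_problems(SAMPLE):
    print(problem)
```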

AlisonBabeu · 2024-07-02T18:46:00Z

data/abbott/__cts__.xml

@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<ti:textgroup xmlns:ti="http://chs.harvard.edu/xmlns/cts" urn="urn:cts:engLit:abbott">
+    <ti:groupname xml:lang="eng">Abbott</ti:groupname>


We typically include the full name here so I would use: Edwin Abbott Abbott.

AlisonBabeu · 2024-07-02T18:46:32Z

data/abbott/grammar/__cts__.xml

+    </ti:title>
+  <ti:edition urn="urn:cts:engLit:abbott.grammar.perseus-eng1" workUrn="urn:cts:engLit:abbott.grammar" xml:lang="eng">
+    <ti:label xml:lang="eng">A Shakespearean Grammar</ti:label>
+    <ti:description xml:lang="eng">Edwin Abbott Abbott. A Shakespearean Grammar.  London and New York: Macmillan and Company, 1870.</ti:description>


Some extra spaces here before the publisher city.

AlisonBabeu · 2024-07-02T19:06:16Z

Seconding Lisa's thoughts, but also a few other questions on the TEI-XML file:

1. File is missing a publication statement, though I really like:
   `<publicationStmt><p>later</p></publicationStmt>`
2. Whenever possible, at the end of the `<biblStruct><monogr>` section we typically include a URL location with a link to the page level if we can find the book online.
3. This is just my own observation, but I didn't realize that we encoded the front pages of reference works, so I was surprised by the whole `<front>` section since it seemed a duplication of the structured bibliographic data.
4. Also, I think the title is misspelled: all online editions I have found (https://archive.org/details/shakespeariangra0000abbo/page/n5/mode/2up) have "Shakespearian" rather than "Shakespearean".

cwulfman · 2024-07-02T20:39:04Z

Seconding Lisa's thoughts, but also a few other questions on the TEI-XML file:
1. File is missing a publication statement, though I really like:
   `<publicationStmt><p>later</p></publicationStmt>`

This was a lazy convention back in the day. Fixed.

cwulfman · 2024-07-02T20:42:21Z

This is just my own observation, but I didn't realize that we encoded the front pages of reference works so I was surprised by the whole <front> section since it seemed a duplication of the structured bibliographic data.

As we discussed in the meeting this morning, these works were encoded as first-class texts in and of themselves. We can certainly remove the title page, prefaces, and introduction.

cwulfman · 2024-07-02T20:51:10Z

2. Whenever possible at the end of the `<biblStruct><monogr>` section we typically include a URL location with a link to the page level if we can find the book online.

Fixed

cwulfman · 2024-07-02T21:02:44Z

There are other failures here such as the divs including sections/subsections, but the refsDecl saying paragraph, no top level div for the edition, no xml:base, etc. so a test apart from EpiDoc will also fail.

I can certainly add patterns to the refsDecl for sect.subsect.para. But I would imagine any real citation would be to the paragraph number. Could you show me an example of how you would use xml:base in this instance?
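As a rough illustration of what a sect.subsect.para match pattern has to do, the sketch below splits a dotted reference into its three levels. The dotted form and the labels are assumptions for the example, not something the thread settles. It also shows why the separators should be escaped: an unescaped `.` in a regular expression matches any character, so a pattern written without the backslashes also accepts strings that contain no dots at all.

```python
import re

# Assumed dotted form for a sect.subsect.para reference. The dots are escaped
# so that they match only a literal "." separator.
THREE_LEVEL = re.compile(r"^(\w+)\.(\w+)\.(\w+)$")

# Same shape with unescaped dots: each "." matches any single character.
LOOSE = re.compile(r"^(\w+).(\w+).(\w+)$")

def split_reference(ref):
    """Return (section, subsection, paragraph), or None if `ref` is not dotted."""
    m = THREE_LEVEL.match(ref)
    return m.groups() if m else None

print(split_reference("4.1.1"))
print(split_reference("front.pref_1.2"))   # hypothetical front-matter labels
print(split_reference("12345"))            # not a dotted reference
print(LOOSE.match("12345") is not None)    # the loose pattern still accepts it
```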

cwulfman · 2024-07-02T21:05:45Z

I think, if reference works are not going to use EpiDoc, you would need a different repo. As this repo is dependent on the hook tests passing, and the hook tests use EpiDoc, then anything that does not pass will not be a part of the package and therefore won't be part of the build.
The Perseus lexica are not EpiDoc (and are not visible as texts at present).

This is obviously a serious issue. I'd like to hear @jtauber's thoughts on this. Does this mean the Shakespeare texts won't pass either?

lcerrato · 2024-07-03T15:25:55Z

There are other failures here such as the divs including sections/subsections, but the refsDecl saying paragraph, no top level div for the edition, no xml:base, etc. so a test apart from EpiDoc will also fail.

I can certainly add patterns to the refsDecl for sect.subsect.para. But I would imagine any real citation would be to the paragraph number. Could you show me an example of how you would use xml:base in this instance?

I don't understand the structure well enough at present. Is it section/subsection/paragraph or just paragraph?

lcerrato · 2024-07-03T15:29:32Z

data/abbott/grammar/grammar.xml

@@ -41,8 +41,12 @@
 				</correction>
 			</editorialDecl>
 			<refsDecl n="CTS">
-				<cRefPattern matchPattern="(\w+)"
-					replacementPattern="//tei:div[@subtype='paragraph' and @n=$1">
+				<cRefPattern matchPattern="(\w).(\w+).(\w+)"


This cRefPattern is unfamiliar to me. I don't understand it.

Why would a simple paragraph capture not be ...??

<refsDecl n="CTS">
  <cRefPattern n="paragraph" matchPattern="(\w+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
    <p>This pointer pattern extracts paragraph</p>
  </cRefPattern>
</refsDecl>

But this makes all of the front matter invisible, which could be a section n="front" with several subsections, like preface n="1" etc. and you have no structure when you get to here:

<div type="edition" n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">
  <div type="textpart" subtype="section" n="4">
    <head>GRAMMAR.</head>
    <div type="textpart" subtype="subsection" n="1">
      <head>ADJECTIVES.</head>
      <div type="textpart" subtype="paragraph" n="1">
        <head>ADJECTIVES used as adverbs</head>

As a result, the reader will not be in Grammar, Adjectives, Adjectives used as adverbs, but will simply be dropped into "Adjectives used as adverbs" with no hierarchy.
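To make the point about hierarchy concrete, here is a small sketch that walks nested textpart divs and reports, for each paragraph-level div, both its dotted position and the chain of heads above it. It is an illustration only: the sample is a closed-off, namespace-free copy of the fragment above, and the function name `breadcrumbs` is hypothetical. With a flat paragraph-only scheme, only the last head in each chain would be available to the reader.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the fragment above, with the divs closed,
# so the example stays short.
SAMPLE = """<div type="edition">
  <div type="textpart" subtype="section" n="4">
    <head>GRAMMAR.</head>
    <div type="textpart" subtype="subsection" n="1">
      <head>ADJECTIVES.</head>
      <div type="textpart" subtype="paragraph" n="1">
        <head>ADJECTIVES used as adverbs</head>
      </div>
    </div>
  </div>
</div>"""

def breadcrumbs(root):
    """Yield (dotted reference, list of heads) for every paragraph-level div."""
    def walk(div, ns, heads):
        for child in div.findall("div"):
            n_path = ns + [child.get("n")]
            head_path = heads + [child.findtext("head", default="")]
            if child.get("subtype") == "paragraph":
                yield ".".join(n_path), head_path
            else:
                yield from walk(child, n_path, head_path)
    yield from walk(root, [], [])

for ref, heads in breadcrumbs(ET.fromstring(SAMPLE)):
    print(ref, "|", " > ".join(heads))
```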

I see a hierarchy in the front matter that can be included as part of the text if you just stick with the 3 levels throughout. The front matter is also going to be important for understanding references.

I would have to pull this offline to see the whole file (too big for GitHub) so I feel like I am only getting part of the picture in this format. I just don't think that paragraph as the sole container is how I would present this when there is a clear hierarchy.

I see, for example,

Body
  Div
    Section = Front (or 0)
      Subsection = Pref_1 (3rd ed)
        paragraph (...)
      Subsection = Pref_2 (1st ed)
        ...
      Subsection = Refs
        ...
      Subsection = Abbrevs
        ...
      Subsection = Intro
        ...
then
    Section = 1 (Grammar)
      Subsection = 1 Adjectives
        Subsubsection or Paragraph = 1 Adjectives used as adverbs

Or this could be Part/Section/Subsection or whatever convention makes sense in the literature.

The paragraphs are not <p> paragraphs but <div type="textpart" subtype="paragraph">. As in a legal document, the term "paragraph" is being used to denote a type of subsection.

I doubt there are many instances of citations to Abbott's Grammar, so there may not be conventions to follow.

Yes, I hear you: the citation unit as paragraph is certainly fine.

Here is an existing example:

<refsDecl n="CTS">
  <cRefPattern n="paragraph" matchPattern="(\w+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
    <p>This pointer pattern extracts paragraph</p>
  </cRefPattern>
</refsDecl>
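For readers unfamiliar with cRefPattern, the sketch below shows roughly how the example above turns a reference into an XPath: the reference is matched against matchPattern, and each captured group is substituted for `$1`, `$2`, and so on in replacementPattern. This is an approximation for illustration, not the code any particular CTS tool runs: the helper name `expand`, the whole-string match, and the stripping of the `#xpath(...)` wrapper are assumptions.

```python
import re

MATCH_PATTERN = r"(\w+)"
REPLACEMENT = "#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])"

def expand(reference, match_pattern=MATCH_PATTERN, replacement=REPLACEMENT):
    """Return the XPath for `reference`, or None if it does not fit the pattern."""
    m = re.fullmatch(match_pattern, reference)  # assumed: whole reference must match
    if m is None:
        return None
    xpath = replacement
    # Substitute the highest-numbered groups first so "$1" cannot clobber "$10".
    for i in range(len(m.groups()), 0, -1):
        xpath = xpath.replace(f"${i}", m.group(i))
    # Drop the "#xpath(...)" wrapper to leave a bare XPath expression.
    if xpath.startswith("#xpath(") and xpath.endswith(")"):
        xpath = xpath[len("#xpath("):-1]
    return xpath

print(expand("1"))
print(expand("4.1.1"))  # a dotted reference does not fit a one-level pattern
```

Under these assumptions a one-level pattern accepts only a single label, so a dotted reference such as 4.1.1 would need its own cRefPattern with three groups.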

I think the overall structure tree of the work was what I was wondering about.
It didn't strike me that it was desirable to have a flat container structure, but I don't think that is a bad approach.

If a nested structure is undesirable, you can always put the front matter into preliminary paragraph containers (n="pref_1", n="refs", etc.). That's what I would do to ensure it's visible.
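A quick way to see that labels such as pref_1 and refs would work with a flat scheme is sketched below. The document is a hypothetical miniature, not the Abbott file, and `lookup` is a made-up helper; it evaluates the same path shape as the replacementPattern quoted earlier, relative to the TEI root, using Python's limited XPath support. Because `\w` includes the underscore, `(\w+)` matches pref_1 just as it matches a paragraph number.

```python
import re
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical flat edition: front matter and numbered paragraphs side by side.
FLAT = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition">
  <div type="textpart" subtype="paragraph" n="pref_1"><p>(preface text)</p></div>
  <div type="textpart" subtype="paragraph" n="refs"><p>(references)</p></div>
  <div type="textpart" subtype="paragraph" n="1"><p>(first numbered paragraph)</p></div>
</div>
</body></text></TEI>"""

def lookup(root, label):
    """Find textpart divs whose @n equals `label`, relative to the TEI root."""
    return root.findall(f"tei:text/tei:body/tei:div//tei:div[@n='{label}']", NS)

root = ET.fromstring(FLAT)
for label in ("pref_1", "refs", "1"):
    fits_pattern = re.fullmatch(r"\w+", label) is not None
    hits = lookup(root, label)
    print(label, fits_pattern, len(hits))
```

Each label should resolve to exactly one container; a label that appears more than once would make the reference ambiguous under a flat scheme.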

This may not be relevant anyhow due to the nature of these files (agree with the comment below) but happy to help if you want to hammer this out further.

cwulfman · 2024-07-03T20:51:50Z

As we discussed in the meeting this morning, these works were encoded as first-class texts in and of themselves. We can certainly remove the title page, prefaces, and introduction.

I suspect we are going down the wrong path here anyway. Abbott, Onions, Schmidt, Dyce: these should no longer be encoded as first-class texts but instead reimagined as classical commentaries anchored to primary-source citations. I'm going to review James's Atlas guidelines and see what I can do, though again, several of these resources are not dictionaries.

First CTS edition of Abbott

e51ff71

cwulfman requested review from jtauber, lcerrato and AlisonBabeu July 2, 2024 15:41

AlisonBabeu reviewed Jul 2, 2024

cwulfman added 2 commits July 2, 2024 16:25

Corrects spelling in title; full name in __cts__.xml

e4e7ece

adds publicationStmt to Abbott Grammar

54fcae2

cwulfman added 2 commits July 2, 2024 16:44

Removes titlePage element

cc43911

adds ref to text on archive.org in sourceDesc.

775ed55

Adds refsDecl for section.subsection.paragraph

452d522

lcerrato reviewed Jul 3, 2024

cwulfman merged commit 452d522 into dictionaries Jul 23, 2024
2 checks passed

cwulfman deleted the abbott branch July 23, 2024 19:22
