First CTS edition of Abbott #14
@cwulfman I think that if reference works are not going to use EpiDoc, you will need a different repo. This repo depends on the hook tests passing, and the hook tests use EpiDoc, so anything that does not pass will not be part of the package and therefore won't be part of the build. There are other failures here as well: the divs include sections and subsections but the refsDecl declares only paragraph, there is no top-level div for the edition, there is no xml:base, and so on, so a test other than EpiDoc would also fail. You may want to use section/paragraph, which would permit a section="front" here and then a section="body", but the current structure is not going to parse.
data/abbott/__cts__.xml
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="UTF-8"?>
<ti:textgroup xmlns:ti="http://chs.harvard.edu/xmlns/cts" urn="urn:cts:engLit:abbott">
<ti:groupname xml:lang="eng">Abbott</ti:groupname>
We typically include the full name here so I would use: Edwin Abbott Abbott.
data/abbott/grammar/__cts__.xml
</ti:title>
<ti:edition urn="urn:cts:engLit:abbott.grammar.perseus-eng1" workUrn="urn:cts:engLit:abbott.grammar" xml:lang="eng">
<ti:label xml:lang="eng">A Shakespearean Grammar</ti:label>
<ti:description xml:lang="eng">Edwin Abbott Abbott. A Shakespearean Grammar. London and New York: Macmillan and Company, 1870.</ti:description>
Some extra spaces here before the publisher city.
Seconding Lisa's thoughts, but also a few other questions on the TEI-XML file:
This was a lazy convention back in the day. Fixed.
As we discussed in the meeting this morning, these works were encoded as first-class texts in and of themselves. We can certainly remove the title page, prefaces, and introduction.
Fixed
I can certainly add patterns to the refsDecl for sect.subsect.para. But I would imagine any real citation would be to the paragraph number. Could you show me an example of how you would use xml:base in this instance?
This is obviously a serious issue. I'd like to hear @jtauber's thoughts on this. Does this mean the Shakespeare texts won't pass either?
I don't understand the structure well enough at present. Is it section/subsection/paragraph or just paragraph?
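On the xml:base question above, a minimal sketch of one way the edition URN could be declared is given here. It assumes that the test suite looks for the URN either in xml:base on the text element or in n on the edition div; that assumption should be checked against the hook test configuration actually used by this repo. The URN is taken from the edition entry in data/abbott/grammar/__cts__.xml.

```xml
<!-- Sketch only: where the URN is expected is an assumption, not confirmed in this thread. -->
<text xml:base="urn:cts:engLit:abbott.grammar.perseus-eng1">
  <body>
    <!-- Top-level edition div that wraps every citable textpart -->
    <div type="edition" n="urn:cts:engLit:abbott.grammar.perseus-eng1" xml:lang="eng">
      <!-- textpart divs (section / subsection / paragraph) go here -->
    </div>
  </body>
</text>
```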
@@ -41,8 +41,12 @@
</correction>
</editorialDecl>
<refsDecl n="CTS">
<cRefPattern matchPattern="(\w+)"
replacementPattern="//tei:div[@subtype='paragraph' and @n=$1">
<cRefPattern matchPattern="(\w).(\w+).(\w+)"
This cRefPattern is unfamiliar to me. I don't understand it.
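Two details of the matchPattern `(\w).(\w+).(\w+)` in the diff are worth noting: an unescaped `.` in a regular expression matches any character rather than a literal dot, and the first group `(\w)` matches exactly one character. A hedged sketch of a three-level declaration follows, with escaped dots and one pattern per citation level listed deepest first. It assumes the section/subsection/paragraph nesting discussed in this thread and a single edition div directly under body; the XPaths and descriptions are illustrative, not taken from the file under review.

```xml
<refsDecl n="CTS">
  <!-- Deepest level first: section.subsection.paragraph, e.g. 1.1.1 -->
  <cRefPattern n="paragraph" matchPattern="(\w+)\.(\w+)\.(\w+)"
      replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2']/tei:div[@n='$3'])">
    <p>This pointer pattern extracts section, subsection, and paragraph.</p>
  </cRefPattern>
  <!-- section.subsection, e.g. 1.1 -->
  <cRefPattern n="subsection" matchPattern="(\w+)\.(\w+)"
      replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2'])">
    <p>This pointer pattern extracts section and subsection.</p>
  </cRefPattern>
  <!-- section only, e.g. 1 -->
  <cRefPattern n="section" matchPattern="(\w+)"
      replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])">
    <p>This pointer pattern extracts section.</p>
  </cRefPattern>
</refsDecl>
```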
Why would a simple paragraph capture not be something like the following?
<refsDecl n="CTS">
<cRefPattern n="paragraph" matchPattern="(\w+)"
replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
<p>This pointer pattern extracts paragraph</p>
</cRefPattern>
</refsDecl>
But this makes all of the front matter invisible. The front matter could be a section n="front" with several subsections (preface n="1", etc.). It also leaves you with no structure by the time you get here:
<div type="edition" n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">
<div type="textpart" subtype="section" n="4">
<head>GRAMMAR.</head>
<div type="textpart" subtype="subsection" n="1">
<head>ADJECTIVES.</head>
<div type="textpart" subtype="paragraph" n="1">
<head>ADJECTIVES used as adverbs</head>
The reader will then not be placed in Grammar, Adjectives, Adjectives used as..., but will simply be dropped into "Adjectives used as adverbs..." with no hierarchy.
I see a hierarchy in the front matter that can be included as part of the text if you just stick with the 3 levels throughout. The front matter is also going to be important for understanding references.
I would have to pull this offline to see the whole file (too big for GitHub) so I feel like I am only getting part of the picture in this format. I just don't think that paragraph as the sole container is how I would present this when there is a clear hierarchy.
I see, for example,
Body
  Div
    Section = Front (or 0)
      Subsection = Pref_1 (3rd ed)
        paragraph (...)
      Subsection = Pref_2 (1st ed)
        ...
      Subsection = Refs
        ...
      Subsection = Abbrevs
        ...
      Subsection = Intro
        ...
then
    Section = 1 (Grammar)
      Subsection = 1 Adjectives
        Subsubsection or Paragraph = 1 Adjectives used as adverbs
Or this could be Part/Section/Subsection or whatever convention makes sense in the literature.
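As a rough illustration of the outline above, the nested version might be encoded along these lines. This is a sketch under assumptions: the n values for the front matter (front, pref_1) follow the outline, the preface head text is a placeholder, and the edition URN is the one from the __cts__.xml entry.

```xml
<body>
  <div type="edition" n="urn:cts:engLit:abbott.grammar.perseus-eng1" xml:lang="eng">
    <!-- Front matter kept citable as its own section -->
    <div type="textpart" subtype="section" n="front">
      <div type="textpart" subtype="subsection" n="pref_1">
        <head>Preface (placeholder head)</head>
        <div type="textpart" subtype="paragraph" n="1">
          <p>...</p>
        </div>
      </div>
      <!-- pref_2, refs, abbrevs, intro would follow the same shape -->
    </div>
    <!-- Main text: Grammar > Adjectives > Adjectives used as adverbs -->
    <div type="textpart" subtype="section" n="1">
      <head>GRAMMAR.</head>
      <div type="textpart" subtype="subsection" n="1">
        <head>ADJECTIVES.</head>
        <div type="textpart" subtype="paragraph" n="1">
          <head>ADJECTIVES used as adverbs</head>
        </div>
      </div>
    </div>
  </div>
</body>
```

With this shape a reference such as front.pref_1.1 or 1.1.1 carries its own hierarchy, at the cost of longer citations than a bare paragraph number.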
The paragraphs are not <p> paragraphs but <div type="textpart" subtype="paragraph">.
As in a legal document, the term Paragraph is being used to denote a type of subsection.
I doubt there are many instances of citations to Abbott's Grammar, so there may not be conventions to follow.
Yes, I hear you: the citation unit as paragraph is certainly fine.
Here is an existing example:
<refsDecl n="CTS">
<cRefPattern n="paragraph" matchPattern="(\w+)"
replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
<p>This pointer pattern extracts paragraph</p>
</cRefPattern>
</refsDecl>
I think the overall structure tree of the work was what I was wondering about.
It didn't strike me that it was desirable to have a flat container structure, but I don't think that is a bad approach.
If a nested structure is undesirable, you can always put the front matter into preliminary paragraph containers (n="pref_1", n="refs", etc.). That's what I would do to ensure it's visible.
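A minimal sketch of that flat alternative, assuming the simple `(\w+)` paragraph pattern is kept: the front matter sits in paragraph-level containers alongside the numbered paragraphs. The n values pref_1 and refs come from the suggestion above; the rest is illustrative.

```xml
<div type="edition" n="urn:cts:engLit:abbott.grammar.perseus-eng1" xml:lang="eng">
  <!-- Front matter as preliminary paragraph containers; \w also matches the underscore -->
  <div type="textpart" subtype="paragraph" n="pref_1">
    <p>...</p>
  </div>
  <div type="textpart" subtype="paragraph" n="refs">
    <p>...</p>
  </div>
  <!-- Numbered paragraphs of the main text follow at the same level -->
  <div type="textpart" subtype="paragraph" n="1">
    <head>ADJECTIVES used as adverbs</head>
  </div>
</div>
```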
This may not be relevant anyhow due to the nature of these files (agree with the comment below) but happy to help if you want to hammer this out further.
I suspect we are going down the wrong path here anyway. Abbott, Onions, Schmidt, Dyce: these should no longer be encoded as first-class texts but instead reimagined as classical commentaries anchored to primary-source citations. I'm going to review James's Atlas guidelines and see what I can do, though again, several of these resources are not dictionaries.
Uses simple paragraph reference scheme.