First CTS edition of Abbott #14

Merged (6 commits) on Jul 23, 2024
8 changes: 6 additions & 2 deletions data/abbott/grammar/grammar.xml
@@ -41,8 +41,12 @@
</correction>
</editorialDecl>
<refsDecl n="CTS">
<cRefPattern matchPattern="(\w+)"
replacementPattern="//tei:div[@subtype='paragraph' and @n=$1">
<cRefPattern matchPattern="(\w).(\w+).(\w+)"
Contributor

This cRefPattern is unfamiliar to me. I don't understand it.

Contributor

@lcerrato commented on Jul 3, 2024

Why wouldn't a simple paragraph capture look like this?

<refsDecl n="CTS">
<cRefPattern n="paragraph" matchPattern="(\w+)"
replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
<p>This pointer pattern extracts paragraph</p>
</cRefPattern>
</refsDecl>

But this makes all of the front matter invisible. The front matter could be a section n="front" with several subsections (a preface n="1", etc.), and you have no structure by the time you get here:

<div type="edition" n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">

		<div type="textpart" subtype="section" n="4">
			<head>GRAMMAR.</head>
			<div type="textpart" subtype="subsection" n="1">
				<head>ADJECTIVES.</head>
				<div type="textpart" subtype="paragraph" n="1">
				<head>ADJECTIVES used as adverbs</head>

So the reader will not be in Grammar, Adjectives, Adjectives used as..., but will simply be dropped into "Adjectives used as adverbs..." with no hierarchy.
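To make the hierarchy concern concrete, here is a minimal sketch in Python (standard library only) that recovers the enclosing heads for a paragraph div in the fragment quoted above. The `breadcrumb` helper and the `SAMPLE` string are illustrative assumptions, not part of the PR or of any CTS tooling; the closing tags in `SAMPLE` were added so the fragment parses.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The fragment quoted in the comment, closed off so that it is well-formed.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition"
     n="urn:cts:engLit:abbott.perseus-eng1" xml:lang="eng">
  <div type="textpart" subtype="section" n="4">
    <head>GRAMMAR.</head>
    <div type="textpart" subtype="subsection" n="1">
      <head>ADJECTIVES.</head>
      <div type="textpart" subtype="paragraph" n="1">
        <head>ADJECTIVES used as adverbs</head>
      </div>
    </div>
  </div>
</div>
"""


def breadcrumb(root, paragraph_n):
    """Return the heads of the paragraph and its ancestors, outermost first."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = root.find(
        f".//tei:div[@subtype='paragraph'][@n='{paragraph_n}']", NS
    )
    trail = []
    while node is not None:
        head = node.find("tei:head", NS)
        if head is not None and head.text:
            trail.append(head.text.strip())
        node = parent_of.get(node)
    return list(reversed(trail))


root = ET.fromstring(SAMPLE)
trail = breadcrumb(root, "1")
```

A paragraph-only citation still resolves to the right div, but a reader only sees the full trail if the display layer walks back up the tree in this way.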

I see a hierarchy in the front matter that can be included as part of the text if you just stick with the 3 levels throughout. The front matter is also going to be important for understanding references.

I would have to pull this offline to see the whole file (too big for GitHub) so I feel like I am only getting part of the picture in this format. I just don't think that paragraph as the sole container is how I would present this when there is a clear hierarchy.

I see, for example,

Body
  Div
    Section = Front (or 0)
      Subsection = Pref_1 (3rd ed)
        paragraph (...)
      Subsection = Pref_2 (1st ed)
        ...
      Subsection = Refs
        ...
      Subsection = Abbrevs
        ...
      Subsection = Intro
        ...
then
    Section = 1 (Grammar)
      Subsection = 1 Adjectives
        Subsubsection or Paragraph = 1 Adjectives used as adverbs

Or this could be Part/Section/Subsection or whatever convention makes sense in the literature.
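As a rough illustration of the three-level scheme outlined above, the following Python sketch resolves a dotted reference by walking section, subsection, and paragraph in turn. The `resolve` helper, the dotted reference syntax, and the paragraph numbers inside the front matter are assumptions made for the example; only the labels `front`, `pref_1`, and the Grammar/Adjectives numbering come from the outline.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
LEVELS = ("section", "subsection", "paragraph")

# Hypothetical skeleton following the outline; the text content is a placeholder.
OUTLINE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="edition">
    <div type="textpart" subtype="section" n="front">
      <div type="textpart" subtype="subsection" n="pref_1">
        <div type="textpart" subtype="paragraph" n="1"><p>(placeholder)</p></div>
      </div>
    </div>
    <div type="textpart" subtype="section" n="1">
      <div type="textpart" subtype="subsection" n="1">
        <div type="textpart" subtype="paragraph" n="1">
          <head>Adjectives used as adverbs</head>
        </div>
      </div>
    </div>
  </div>
</body>
"""


def resolve(root, reference):
    """Return the paragraph div for 'section.subsection.paragraph', or None."""
    parts = reference.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {reference!r}")
    node = root.find("tei:div", NS)  # the edition div
    for subtype, n in zip(LEVELS, parts):
        if node is None:
            return None
        # Direct children only, so each level is matched at its own depth.
        node = node.find(f"tei:div[@subtype='{subtype}'][@n='{n}']", NS)
    return node


root = ET.fromstring(OUTLINE)
front_par = resolve(root, "front.pref_1.1")
grammar_par = resolve(root, "1.1.1")
```

Because each step matches direct children of the previous level, front matter and body share one addressing scheme and no `n` value has to be unique across the whole file.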

Collaborator Author

The paragraphs are not <p> paragraphs but <div type="textpart" subtype="paragraph"> elements. As in a legal document, the term "paragraph" is being used to denote a type of subsection.

I doubt there are many instances of citations to Abbott's Grammar, so there may not be conventions to follow.

Contributor

Yes, I hear you: the citation unit as paragraph is certainly fine.

Here is an existing example:

<refsDecl n="CTS">
<cRefPattern n="paragraph" matchPattern="(\w+)"
replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div//tei:div[@n='$1'])">
<p>This pointer pattern extracts paragraph</p>
</cRefPattern>
</refsDecl>

What I was wondering about was the overall structure tree of the work.
A flat container structure did not strike me as desirable, but I don't think it is a bad approach.

If a nested structure is undesirable, you can always put the front matter into preliminary paragraph containers (n="pref_1", n="refs", etc.). That's what I would do to ensure it's visible.

This may not be relevant anyhow due to the nature of these files (agree with the comment below) but happy to help if you want to hammer this out further.
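For the flat alternative with preliminary paragraph containers, a quick Python check shows which labels the existing `(\w+)` match pattern would accept. This is a sketch under two assumptions: that the resolver requires the whole reference to match, and that it uses Python `re` semantics, where `\w` includes the underscore (other regex dialects may differ). The `is_citable` helper and the hyphenated label `pref-1` are hypothetical.

```python
import re

# The match pattern from the existing paragraph-level example.
PARAGRAPH_REF = re.compile(r"(\w+)")


def is_citable(label):
    """True if the whole label is matched by the paragraph pattern."""
    return PARAGRAPH_REF.fullmatch(label) is not None


# 'pref_1' and 'refs' come from the suggestion above; 'pref-1' is a
# hypothetical variant included to show that a hyphen would not match.
results = {label: is_citable(label) for label in ("pref_1", "refs", "1", "pref-1")}
```

Under those assumptions, underscore-style labels work with the pattern unchanged, while hyphenated labels would need a wider character class.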

replacementPattern="//tei:div[@subtype='section' and @n=$1]/tei:div[@subtype='subsection' and @n=$2]/tei:div[@subtype='paragraph' and @n=$3]">
<p>This pointer extracts paragraph number.</p>
</cRefPattern>
<cRefPattern matchPattern="para (\w+)"
replacementPattern="//tei:div[@subtype='paragraph' and @n=$1]">
<p>This pointer extracts paragraph number.</p>
</cRefPattern>
</refsDecl>
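The two match patterns in this refsDecl can be exercised in isolation. The Python sketch below assumes the resolver applies each pattern to the whole reference with Python `re` semantics; the `groups` helper and the stricter variant `THREE_LEVEL_STRICT` are illustrative assumptions, not part of the PR.

```python
import re

# Patterns copied from the refsDecl in the diff.
THREE_LEVEL = re.compile(r"(\w).(\w+).(\w+)")
PARA_ONLY = re.compile(r"para (\w+)")

# Hypothetical stricter variant: literal dots and multi-character first level.
THREE_LEVEL_STRICT = re.compile(r"(\w+)\.(\w+)\.(\w+)")


def groups(pattern, reference):
    """Return the captured groups if the whole reference matches, else None."""
    match = pattern.fullmatch(reference)
    return match.groups() if match else None


# An unescaped '.' matches any character, so 'x' and 'y' pass as separators.
loose_dots = groups(THREE_LEVEL, "4x1y1")
# The first group is a single \w, so a two-digit section does not match.
two_digit_section = groups(THREE_LEVEL, "12.1.1")
# The stricter variant accepts multi-character sections and rejects 'x'/'y'.
strict_two_digit = groups(THREE_LEVEL_STRICT, "12.1.1")
strict_loose = groups(THREE_LEVEL_STRICT, "4x1y1")
```

If section numbers above 9 or labels such as `front` are ever needed at the first level, the stricter form is one option; whether that matters depends on how the file is actually numbered.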