# Tokenized CTSized Ancient Greek Texts

This repository contains the texts of the following two repositories, tokenized by graphic word and stored in XML format:
* https://github.com/PerseusDL/canonical-greekLit
* https://github.com/OpenGreekAndLatin/First1KGreek

The texts have been generated fully automatically from the original XML files, which are well-formed and CTS-compliant (with some exceptions). Some conversion errors are already known to stem from annotation inconsistencies or errors in the original files, which I have not tried to fix. For example, an inconsistent CTS URN location in an XML file, or a poem whose verses are not individually numbered, will cause errors (typically missing text).

Check the XQuery module in the ```scripts``` folder for details. | ||

Each file contains the following information:
* the ```@p``` attribute gives the passage; the full CTS URN is obtained by combining this value with the CTS URN of the text, found in the ```@text-cts``` attribute of the ```text``` element
* the ```@n``` attribute gives a running number for each word (numbering restarts at each new passage)
* the ```text()``` of each ```t``` element contains the word form
* the optional ```@join``` attribute specifies whether a punctuation mark should be attached to the preceding (b) or the following (a) word
* the optional ```@tag``` attribute records which special elements enclosed the given word, namely ```add```, ```del```, ```unclear```, ```surplus```, ```supplied``` and ```seg```; these can help identify editorial interventions.
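
The attributes above are enough to regroup words by passage and rebuild readable text. The sketch below is a minimal, hypothetical Python example, not part of this repository's XQuery scripts. It assumes that each file has a single ```text``` element containing ```t``` elements with no XML namespace, that the full CTS URN is formed by appending ```@p``` to ```@text-cts``` with a colon (the usual CTS passage convention), and that ```@tag``` holds space-separated element names. The sample document, its URN and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the layout described above; the URN, the tokens
# and the nesting of <t> directly inside <text> are assumptions.
SAMPLE = """<text text-cts="urn:cts:greekLit:tlg9999.tlg001.example-grc1">
  <t p="1.1" n="1">μῆνιν</t>
  <t p="1.1" n="2">ἄειδε</t>
  <t p="1.1" n="3" tag="supplied">θεὰ</t>
  <t p="1.2" n="1">οὐλομένην</t>
  <t p="1.2" n="2" join="b">,</t>
  <t p="1.2" n="3">ἣ</t>
  <t p="1.3" n="1" join="a">(</t>
  <t p="1.3" n="2">ἄλγεα</t>
  <t p="1.3" n="3" join="b">)</t>
</text>"""


def read_passages(xml_text):
    """Group <t> elements by full CTS URN, preserving document order.

    Assumption: the full URN is @text-cts + ':' + @p.
    """
    root = ET.fromstring(xml_text)
    text_el = next(root.iter("text"))  # iter() also yields the root itself
    base_urn = text_el.get("text-cts")
    passages = {}
    for t in text_el.iter("t"):
        passages.setdefault(f"{base_urn}:{t.get('p')}", []).append(t)
    return passages


def detokenize(tokens):
    """Rebuild running text from <t> elements, honouring @join:
    'b' glues a token to the preceding word, 'a' to the following word."""
    parts = []
    no_space = True  # never put a space before the first token
    for t in tokens:
        join = t.get("join")
        if not no_space and join != "b":
            parts.append(" ")
        parts.append(t.text or "")
        no_space = join == "a"
    return "".join(parts)


def words_with_tag(tokens, name):
    """Return word forms whose @tag mentions `name` (e.g. 'supplied').

    Assumption: @tag holds one or more element names separated by spaces.
    """
    return [t.text for t in tokens if name in (t.get("tag") or "").split()]


if __name__ == "__main__":
    for urn, tokens in read_passages(SAMPLE).items():
        print(urn, "|", detokenize(tokens))
```

Run as a script, it prints each passage URN followed by its rebuilt text, so the line for passage ```1.2``` ends in ```οὐλομένην, ἣ```. Grouping on the full URN rather than on ```@n``` follows from the fact that ```@n``` restarts at every passage.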

# License
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.