Update README.md
gcelano authored Mar 20, 2017
1 parent 85c61a1 commit 2579f11
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions docs/README.md
@@ -1,17 +1,21 @@
# Tokenized CTSized Ancient Greek Literature Texts
# Tokenized CTSized Ancient Greek Texts

This repository contains the graphic-word tokenized texts of the following two repositories (I also provide them in zipped format):

This repository contains the tokenized (by graphic word) texts of the following two repositories (XML format):
* https://github.com/PerseusDL/canonical-greekLit
* https://github.com/OpenGreekAndLatin/First1KGreek

The texts have been automatically generated from the original XML files which are well-formed and CTS-compliant (some may not). Some
conversion errors are due to annotation inconsistencies in the original files, which I have not tried to solve).
The texts have been generated fully automatically from the original XML files, which are well-formed and CTS-compliant (though some are not). Some conversion errors are known to stem from annotation inconsistencies or errors in the original files, which I have not tried to fix. For example, an inconsistent CTS URN location in the XML file, or missing numbering for the verses of a poem, will generate errors (typically missing text).

Check the XQuery module in the ```scripts``` folder for details.

Each file contains the following information:
* the @p attribute lists the passage (the full cts urn derives from merging this value and the cts urn of the text in the @text-cts attribute in the text element)
* the @n attribute shows the running number id for each word (numeration starts again as the passage changes)
* the text() of each t element contains the word form
* the optional @join attribute specifies whether a punctuation mark should be attached to either the preceding (b) or the following (a) word.
* the optional @tag element shows some special elements which contained the given word: more precisely, the add, del, unclear, surplus, supplied and seg elements, which can be of interest to identify editorial interventions

* the ```@p``` attribute lists the passage (the full cts urn derives from merging this value and the cts urn of the text in the ```@text-cts``` attribute in the text element)
* the ```@n``` attribute shows the running number id for each word (numeration starts again as the passage changes)
* the ```text()``` of each ```t``` element contains the word form
* the optional ```@join``` attribute specifies whether a punctuation mark should be attached to either the preceding (b) or the following (a) word.
* the optional ```@tag``` element shows some special elements which contained the given word: more precisely, the ```add```, ```del```, ```unclear```, ```surplus```, ```supplied``` and ```seg``` elements, which can be of interest to identify editorial interventions.

# License
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.
