From 2579f11e2ed6e959cf05ca12e99bdbd846d7596b Mon Sep 17 00:00:00 2001
From: "Giuseppe G. A. Celano"
Date: Mon, 20 Mar 2017 21:11:04 +0100
Subject: [PATCH] Update README.md

---
 docs/README.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index efff1ad..5013629 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,17 +1,21 @@
-# Tokenized CTSized Ancient Greek Literature Texts
+# Tokenized CTSized Ancient Greek Texts
+
+This repository contains the graphic-word tokenized texts of the following two repositories (I also provide them in zipped format):
 
-This repository contains the tokenized (by graphic word) texts of the following two repositories (XML format):
 * https://github.com/PerseusDL/canonical-greekLit
 * https://github.com/OpenGreekAndLatin/First1KGreek
 
-The texts have been automatically generated from the original XML files which are well-formed and CTS-compliant (some may not). Some
-conversion errors are due to annotation inconsistencies in the original files, which I have not tried to solve).
+The texts have been generated completely automatically from the original XML files which are well-formed and CTS-compliant (some are not). Some conversion errors are already known to be ascribable to annotation inconsistencies/errors in the original files (which errors I have not tried to solve). For example, an inconsistent cts-urn location in the xml file or lack of numeration for each verse in a poem will generate errors (typically missing text).
+
+Check the XQuery module in the ```scripts``` folder for details.
+
 Each file contains the following information:
-* the @p attribute lists the passage (the full cts urn derives from merging this value and the cts urn of the text in the @text-cts attribute in the text element)
-* the @n attribute shows the running number id for each word (numeration starts again as the passage changes)
-* the text() of each t element contains the word form
-* the optional @join attribute specifies whether a punctuation mark should be attached to either the preceding (b) or the following (a) word.
-* the optional @tag element shows some special elements which contained the given word: more precisely, the add, del, unclear, surplus, supplied and seg elements, which can be of interest to identify editorial interventions
+
+* the ```@p``` attribute lists the passage (the full cts urn derives from merging this value and the cts urn of the text in the ```@text-cts``` attribute in the text element)
+* the ```@n``` attribute shows the running number id for each word (numeration starts again as the passage changes)
+* the ```text()``` of each ```t``` element contains the word form
+* the optional ```@join``` attribute specifies whether a punctuation mark should be attached to either the preceding (b) or the following (a) word.
+* the optional ```@tag``` element shows some special elements which contained the given word: more precisely, the ```add```, ```del```, ```unclear```, ```surplus```, ```supplied``` and ```seg``` elements, which can be of interest to identify editorial interventions.
 
 # License
 Creative Commons License
 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
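
The token format described in the updated README could be read roughly as follows. This is only a sketch under stated assumptions: the sample document, the exact nesting of ```t``` elements under the ```text``` element, the absence of an XML namespace, the colon used to merge ```@text-cts``` with ```@p```, and the helper names ```read_tokens``` and ```detokenize``` are all hypothetical and not taken from the repository. The XQuery module in the ```scripts``` folder remains the reference for how the files are actually produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file; real files may be structured differently.
SAMPLE = """<text text-cts="urn:cts:greekLit:tlg0012.tlg001.perseus-grc1">
  <t p="1.1" n="1">μῆνιν</t>
  <t p="1.1" n="2">ἄειδε</t>
  <t p="1.1" n="3" join="b">,</t>
  <t p="1.1" n="4" tag="supplied">θεὰ</t>
  <t p="1.2" n="1">οὐλομένην</t>
  <t p="1.2" n="2" join="b">,</t>
</text>"""


def read_tokens(xml_string):
    """Return one dict per t element, with the passage URN rebuilt."""
    root = ET.fromstring(xml_string)
    base = root.get("text-cts")  # CTS URN of the whole text
    tokens = []
    for t in root.iter("t"):
        tokens.append({
            # Assumption: text URN and passage are joined with a colon.
            "urn": f"{base}:{t.get('p')}",
            "n": int(t.get("n")),    # running number, restarts per passage
            "form": t.text or "",    # the word form
            "join": t.get("join"),   # "b", "a", or None
            "tag": t.get("tag"),     # e.g. "supplied", or None
        })
    return tokens


def detokenize(tokens):
    """Rebuild running text, honouring the join attribute."""
    out = ""
    glue_next = False  # True when the previous token had join="a"
    for tok in tokens:
        # join="b": attach to the preceding word, so no space before it.
        if out and not glue_next and tok["join"] != "b":
            out += " "
        out += tok["form"]
        # join="a": attach to the following word, so no space after it.
        glue_next = tok["join"] == "a"
    return out


if __name__ == "__main__":
    toks = read_tokens(SAMPLE)
    print(toks[0]["urn"])
    print(detokenize(toks))
```

Keeping ```@join``` as data rather than pre-attaching punctuation means the same token list can serve both word-level processing and reconstruction of the running text.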