Tokenized and sentence-splitted CTSized Ancient Greek Texts (v1.1.0)

This repository contains the graphic-word tokenized texts of the following two repositories (I also provide them in zipped format):

The texts have been generated fully automatically from those original XML files that are well-formed and CTS-compliant (some are not). Some known conversion errors are attributable to annotation inconsistencies or errors in the original files, which I have not tried to fix. For example, an inconsistent CTS URN location in the XML file, or missing numbering for the verses of a poem, will generate errors (typically missing text).

Check the XQuery module in the scripts folder for details.

Each file contains the following information:

the @p attribute gives the passage (the full CTS URN is obtained by merging this value with the CTS URN of the text, found in the @text-cts attribute of the text element)
the @n attribute gives the running number id of each word (numbering restarts whenever the passage changes)
the text() of each t element contains the word form
the optional @join attribute specifies whether a punctuation mark should be attached to the preceding (b) or the following (a) word
the optional @tag attribute records special elements that contained the given word: more precisely, the add, del, unclear, surplus, supplied and seg elements, which can help identify editorial interventions
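
As an illustration of the attributes listed above, here is a minimal Python sketch that reads such a file and rebuilds running text. It is not part of the repository: the sample document, its URN and passage values, the flat nesting of t elements under text, and the use of ":" to merge @text-cts with @p are all assumptions made for the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: element and attribute names follow the README, but the
# URN, the passages, the words and the exact nesting are assumptions.
SAMPLE = """<text text-cts="urn:cts:greekLit:tlg0000.tlg000.example-grc1">
  <t p="1.1" n="1">λόγος</t>
  <t p="1.1" n="2">καὶ</t>
  <t p="1.1" n="3" tag="supplied">ἔργον</t>
  <t p="1.1" n="4" join="b">.</t>
  <t p="1.2" n="1">λόγος</t>
</text>"""


def read_tokens(xml_string):
    """Return one dict per t element, with the full passage URN resolved."""
    root = ET.fromstring(xml_string)
    base_urn = root.get("text-cts")
    tokens = []
    for t in root.iter("t"):
        tokens.append({
            # Assumption: text URN and passage are merged with a colon.
            "urn": f"{base_urn}:{t.get('p')}",
            "n": int(t.get("n")),
            "form": t.text or "",
            "join": t.get("join"),  # "b", "a", or None when absent
            "tag": t.get("tag"),    # e.g. "supplied", or None when absent
        })
    return tokens


def detokenize(tokens):
    """Rebuild running text, honouring @join on punctuation marks."""
    out = ""
    glue_next = False  # True when the previous token had join="a"
    for tok in tokens:
        if not out or glue_next or tok["join"] == "b":
            out += tok["form"]        # attach without a space
        else:
            out += " " + tok["form"]
        glue_next = tok["join"] == "a"
    return out


if __name__ == "__main__":
    toks = read_tokens(SAMPLE)
    print(toks[0]["urn"])
    print(detokenize(toks))
```

Tokens carrying a @tag value can then be filtered (for instance, all words inside supplied) to inspect editorial interventions.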

Changes from previous releases

From release 1.0.0:

Corrected the CTS URN structure by also considering the elements seg and p (currently div, seg, p, and l are considered)
Added sentence splitting (based on the following characters: ".", "·", ";", ":")
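
The sentence split on those four characters can be pictured with the short Python sketch below. It is only an approximation for illustration, not the XQuery code in the scripts folder: the function name is hypothetical, and it assumes that punctuation marks arrive as separate tokens and that a sentence ends right after any of the four characters.

```python
# Characters that close a sentence, copied from the release note above.
# In Greek texts ";" serves as the question mark.
SENTENCE_END = {".", "·", ";", ":"}


def split_sentences(forms):
    """Group a flat list of word forms into sentences (lists of forms)."""
    sentences, current = [], []
    for form in forms:
        current.append(form)
        if form in SENTENCE_END:
            # The closing mark stays with the sentence it ends.
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens with no closing mark form a final sentence.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    print(split_sentences(["a", "b", ".", "c", ";", "d"]))
```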

Cite

Cite the following work thus:

Giuseppe G. A. Celano. (2017). Tokenized and sentence-splitted CTSized Ancient Greek texts (v1.1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.438311

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
