dariok/w2tei

Word to TEI conversion - w2tei

A web service and collection of basic scripts to convert Word (XML or DOCX) to TEI.

Incompatible change: As of 2020-11-23, paragraph style names (w:pStyle/@w:val) are recorded in tei:p/@rendition; the definitions in w:rPr are kept in tei:p/@style.

Incompatible change: As of 2021-09-06, heading levels and div structure are evaluated based upon the values given by w:outlineLvl in a paragraph preset or paragraph style; empty lines or a style called “Heading \d” will not suffice. Paragraphs not having a defined outline level are not recognized as headings.
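To illustrate the rule above, here is a minimal Python sketch of how an outline level could be looked up for a paragraph: first in the paragraph's own w:pPr, then in the paragraph style it references. This is not the project's XSLT; the function names style_levels and outline_level are hypothetical, and style inheritance via w:basedOn is deliberately ignored.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def style_levels(styles_root):
    """Map style IDs to the outline level declared in the style (hypothetical helper)."""
    levels = {}
    for style in styles_root.iter(f"{{{W}}}style"):
        lvl = style.find("w:pPr/w:outlineLvl", NS)
        if lvl is not None:
            levels[style.get(f"{{{W}}}styleId")] = int(lvl.get(f"{{{W}}}val"))
    return levels


def outline_level(paragraph, levels):
    """Return the outline level of a w:p element, or None if none is defined."""
    ppr = paragraph.find("w:pPr", NS)
    if ppr is None:
        return None
    # A level set directly on the paragraph takes precedence in this sketch.
    direct = ppr.find("w:outlineLvl", NS)
    if direct is not None:
        return int(direct.get(f"{{{W}}}val"))
    # Otherwise fall back to the referenced paragraph style, if any.
    style = ppr.find("w:pStyle", NS)
    if style is not None:
        return levels.get(style.get(f"{{{W}}}val"))
    return None
```

A paragraph for which outline_level returns None would, under the rule above, not be treated as a heading, whatever its style is called.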

How to use (docx to Flat XML)

Oxygen

  1. Import the scenarios into “transformation scenarios”.
  2. Applying “DOCX → Flat OPC” to a docx file creates a (very basic) flat XML file.
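For readers unfamiliar with Flat OPC, the following Python sketch shows the general idea: each XML part of the docx archive is wrapped in a pkg:part element inside one pkg:package document. It is an illustration only and may differ from what the Oxygen scenario produces; the function name docx_to_flat_opc is hypothetical, and binary parts such as images are simply skipped here.

```python
import zipfile
import xml.etree.ElementTree as ET

PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"
CT = "http://schemas.openxmlformats.org/package/2006/content-types"


def docx_to_flat_opc(docx):
    """Wrap the XML parts of a docx (path or file object) in a pkg:package element."""
    ET.register_namespace("pkg", PKG)
    package = ET.Element(f"{{{PKG}}}package")
    with zipfile.ZipFile(docx) as zf:
        # [Content_Types].xml declares the content type of every part.
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        overrides = {o.get("PartName"): o.get("ContentType")
                     for o in types.findall(f"{{{CT}}}Override")}
        defaults = {d.get("Extension").lower(): d.get("ContentType")
                    for d in types.findall(f"{{{CT}}}Default")}
        for name in zf.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue
            part_name = "/" + name
            extension = name.rsplit(".", 1)[-1].lower()
            content_type = overrides.get(part_name) or defaults.get(extension, "")
            if not content_type.endswith("xml"):
                continue  # assumption of this sketch: binary parts are left out
            part = ET.SubElement(package, f"{{{PKG}}}part", {
                f"{{{PKG}}}name": part_name,
                f"{{{PKG}}}contentType": content_type,
            })
            data = ET.SubElement(part, f"{{{PKG}}}xmlData")
            data.append(ET.fromstring(zf.read(name)))
    return package
```

The result can be serialized with ET.tostring(package, encoding="unicode") and inspected alongside the output of the scenario.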

Web service

Build the package using ant on build.xml. You can upload docx files via a web form or by POSTing them to cnvert2.xql
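As a rough sketch of the POST route, the Python snippet below prepares a request that sends a docx file as the raw request body. Several things here are assumptions rather than documented behaviour: the endpoint URL (host, port and app path depend on your installation), the Content-Type header, and whether the service expects a raw body or a multipart form. The helper name build_upload_request is hypothetical.

```python
import urllib.request

# Standard media type for docx files; whether the service checks it is an assumption.
DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def build_upload_request(docx_bytes, url):
    """Prepare (but do not send) a POST request carrying the docx as its body."""
    return urllib.request.Request(
        url,
        data=docx_bytes,
        method="POST",
        headers={"Content-Type": DOCX_TYPE},
    )


# Usage (hypothetical URL, adjust to your installation):
# with open("letter.docx", "rb") as f:
#     request = build_upload_request(f.read(), "http://localhost:8080/.../cnvert2.xql")
# with urllib.request.urlopen(request) as response:
#     result = response.read()
```

If the service rejects the request, compare it with what the web form submits, since the form's encoding is the safest reference for the expected format.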

CLI

! The older zip:xml-entry function is available in Saxon ⩽ 9.5.1.1, while file:read-binary and archive:extract-text require Saxon PE or EE ⩾ 9.6. If you do not have Saxon PE or EE, you currently have to use a different XSLT processor, use Oxygen (which includes a recent version of Saxon PE and EE), or use the web service implementation from within eXist-db or BaseX, as these implement the EXPath modules independently.

To convert a docx, apply docx.xsl to word-empty.xml (or any XML file) and pass the parameter filename with a path to the docx to be converted.
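The invocation described above can be sketched as a small Python wrapper that assembles a Saxon command line. The -s:, -xsl: and -o: options and the name=value parameter syntax are standard Saxon command-line usage; the jar location, the output file and the helper name saxon_command are assumptions, and whether filename must be an absolute path or a URI is not stated here.

```python
import subprocess


def saxon_command(saxon_jar, docx_path, output=None):
    """Build the argument list for running docx.xsl with Saxon (hypothetical helper)."""
    cmd = [
        "java", "-jar", saxon_jar,   # path to a Saxon PE or EE jar (assumption)
        "-s:word-empty.xml",         # any XML file works as the source document
        "-xsl:docx.xsl",
    ]
    if output is not None:
        cmd.append(f"-o:{output}")   # optional: write the result to a file
    cmd.append(f"filename={docx_path}")  # stylesheet parameter naming the docx
    return cmd


# Usage (paths are placeholders):
# subprocess.run(saxon_command("saxon-pe.jar", "letter.docx", "letter-flat.xml"), check=True)
```

Building the command as a list avoids shell quoting problems when the docx path contains spaces.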

How to use (XML to TEI)

Apply the basic transformation scenario to a docx file, or apply wt0.xsl and wt1.xsl to any flat Word XML file.
