Skip to content

v3.0.0a2

Pre-release
Pre-release
Compare
Choose a tag to compare
@kba kba released this 22 Aug 09:12
· 195 commits to master since this release

Changed:

  • 🔥 OcrdPage as proxy of PcGtsType instead of alias; also contains etree and mapping now
  • 🔥 Processor.zip_input_files now can throw ocrd.NonUniqueInputFile and ocrd.MissingInputFile
    (the latter only if OCRD_MISSING_INPUT=ABORT)
  • 🔥 Processor.zip_input_files does not by default use require_first anymore
    (so the first file in any input file tuple per page can be None as well)
  • 🔥 no more Workspace.overwrite_mode, merely delegate to OCRD_EXISTING_OUTPUT=OVERWRITE
  • 🎨 improve on docs result for ocrd_utils.config

Added:

  • 👉 OCRD_DOWNLOAD_INPUT for whether input files should be downloaded before processing
  • 👉 OCRD_MISSING_INPUT for how to handle missing input files (SKIP or ABORT)
  • 👉 OCRD_MISSING_OUTPUT for how to handle processing failures (SKIP or ABORT or COPY)
    the latter behaves like ocrd-dummy for the failed page(s)
  • 👉 OCRD_EXISTING_OUTPUT for how to handle existing output files (SKIP or ABORT or OVERWRITE)
  • new CLI option --debug as short-hand for ABORT choices above
  • Processor.logger set up by constructor already (for re-use by processor implementors)
  • default-expand and validate ocrd_tool.json in Processor constructor, log invalidities
  • handle JSON deprecation in ocrd_tool.json by reporting warnings