All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- SPDX license headers were added to source files
- Official support and testing for Python 3.13 (#25)
- Fixed the README referring to the wrong license text
- Fixed the creation of loggers for the library which were never utilized
- Bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.2 (by @dependabot in #22)
- Update `pre-commit` to 4.0.1 (#23)
- Use pytest fixtures effectively (#24)
- Use pytest-docker in place of manual Docker (#26)
- Updated development tools
- Bump pypa/gh-action-pypi-publish from 1.8.12 to 1.8.14 (by @dependabot in #16)
- Update development to use `hatch test` and `hatch fmt` (#17)
- Included `mypy` typing in the linting checks
- Typo in README codeblock by @Chaostheorie (#19)
- Testing on PyPy 3.10
- Testing on released Python 3.12
- `.github` and `.docker` folders are no longer included in the source distribution
- Changed the license to Mozilla Public License Version 2.0
- `pypa/gh-action-pypi-publish` updated to v1.8.10
- CI testing now uses the official Apache Tika image (minimal) instead of the paperless-ngx image
- More extensive testing of date and time strings in various formats, including RFC-3339, ISO-8601 and things in between
- Date parsing no longer assumes a timezone if none is provided (the parsed datetime will be naive)
- `pypa/gh-action-pypi-publish` updated to v1.8.8
- Restricted action permissions to minimal requirements to function
- GitHub CI now also creates a GitHub release with sdist, wheel and changelog
- Additional classifiers to the project on PyPI
- Handling of ISO-8601 dates with fractional seconds, which Python doesn't support natively
- Handling of filenames in the `Content-Disposition` header with non-ASCII characters
- All endpoints now return a `TikaResponse`, which will have many of the common keys parsed into Python native data types where possible, based on the list from the Tika wiki. If a key is not in the response, the value will be `None`
- Fixes an incorrect key when parsing new content types
- Fixed handling of message/rfc822 content type documents
- Further refinements to the Tika response data models
- Testing against a .doc format file
- Testing against JPEG and PNG format files
- The plain text and HTML versions of the Tika endpoint have been renamed to `as_html` and `as_text`, hopefully to make the response type clearer
- The plain text and HTML versions of the recursive endpoint were renamed to `as_html` and `as_text`
- Optional gzip compression for use when parsing from a buffer instead of a file
- The optional dependencies have been removed as Tika does not support HTTP/2 or Brotli
- Printing of the Python version during the test coverage run
- Optional dependencies for HTTP/2 and Brotli support in httpx
- `add_headers` to allow users to update the client's headers
- Support for Tika endpoint with a string or byte buffer instead of a file
- Built wheels are now retained for 7 days instead of 90 days
- Reduces the frequency of CodeQL runs
- Support for Tika metadata, tika and recursive metadata endpoints
- Full test coverage
- Full typing
- A changelog
- Comprehensive CI configuration
- Code coverage through codecov.io
- CodeQL scanning
- Fixes the GitHub Actions test workflow concurrency setting
- Fixes workflow name and file name to reflect what it actually does