Breaking Changes
- Deprecating messytables; headers now default to the string datatype
- New datatype_overrides object / dictionary to set specific datatypes for headers. e.g. "datatype_overrides":{"administration_number":"integer","BSA":"number"}
- Adding CI tests and MinIO to run integration tests against a pseudo S3 bucket
- Fix to exclude Glacier objects from ingestion
- Add tox framework for testing
- Moving from setuptools to Poetry via pyproject.toml
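The string default and the datatype_overrides setting can be pictured with a short sketch. The helper name build_schema, the JSON-schema shape and the patient_id column are assumptions made for illustration only, not the tap's actual implementation.

```python
def build_schema(headers, datatype_overrides=None):
    """Hypothetical helper: default every header to string, then apply overrides."""
    overrides = datatype_overrides or {}
    properties = {}
    for header in headers:
        # Each column is a nullable string unless the config names another datatype.
        datatype = overrides.get(header, "string")
        properties[header] = {"type": ["null", datatype]}
    return {"type": "object", "properties": properties}


# The override mapping is the example given in the release note above.
config = {"datatype_overrides": {"administration_number": "integer", "BSA": "number"}}
schema = build_schema(
    ["patient_id", "administration_number", "BSA"],
    config["datatype_overrides"],
)
```

Under these assumptions, only the two overridden headers get a non-string type; any header not listed stays a string.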
Changes
- Updating to a pypi version of singer-python (realit-singer-python)
Changes
- Update pylint requirement from >=2.12,<3.1 to >=2.12,<3.3 #66
Changes
- Patching boto3, voluptuous, messytables, pylint
- Replace ujson with msgspec
- Changing json serialisation with updated pipelinewise-singer-python (using msgspec)
Changes
- Moving to local copy of messytables to resolve Python 3.10 issues
- See okfn/messytables#196
Bumping Versions
- boto3==1.28.30
- pytest>=7.1,<7.5
- more_itertools>=8.12,<10.2
- ujson==5.8.0
- pytest-cov>=3.0,<4.2
Changes
- Will output an empty file if there is just a header row and no records can be sampled.
Bumping Versions
- boto3==1.26.138
- ipdb==0.13.13
- more_itertools>=8.12,<9.2
- pylint>=2.12,<2.18
- pytest-cov>=3.0,<4.1
Changes
- Using a List rather than a Set when obtaining a unique list of columns in the spreadsheet. This allows the column order to be retained as per the original csv file.
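The difference between a list and a set here can be shown with a minimal sketch. The function name unique_columns and the sample rows are hypothetical; the tap's own code may collect columns differently.

```python
def unique_columns(rows):
    """Collect column names in first-seen order across sampled rows."""
    columns = []
    for row in rows:
        for name in row:
            # A list preserves insertion order; a set gives no ordering guarantee.
            if name not in columns:
                columns.append(name)
    return columns


sampled = [
    {"id": "1", "name": "a"},
    {"id": "2", "name": "b", "created": "2023-01-01"},
]
ordered = unique_columns(sampled)
```

The membership test on a list is linear, which is usually acceptable for the number of columns in a csv file.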
Changes
- Providing an optional set_empty_values_null setting. When set to true, the tap emits null (the JSON equivalent of None) instead of an empty string.
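A rough sketch of the effect, assuming a hypothetical clean_row helper applied to each row before it is serialised (the real tap may apply the setting elsewhere):

```python
import json


def clean_row(row, set_empty_values_null=False):
    """Hypothetical helper: optionally replace empty strings with None."""
    if not set_empty_values_null:
        return dict(row)
    return {key: (None if value == "" else value) for key, value in row.items()}


row = {"id": "7", "comment": ""}
as_default = json.dumps(clean_row(row))
as_null = json.dumps(clean_row(row, set_empty_values_null=True))
```

With the setting off the empty string is kept as "", and with it on json.dumps writes the None value as null.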
Changes
- Providing an optional s3_proxies dict config to set the use of a proxy server. Set to {} to avoid using a proxy server for s3 traffic.
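The three cases for s3_proxies can be sketched as below. The helper name resolve_proxies, the bucket name and the proxy URL are placeholders, and treating a missing key as "leave the default behaviour alone" is an assumption rather than documented behaviour. A protocol-to-URL mapping like this is the shape that botocore's Config accepts through its proxies argument.

```python
def resolve_proxies(config):
    """Hypothetical helper: decide which proxy mapping to use for S3 traffic.

    None  -> key absent, leave the client's default behaviour untouched (assumed)
    {}    -> explicitly use no proxy server
    {...} -> use the given proxy servers
    """
    if "s3_proxies" not in config:
        return None
    proxies = config["s3_proxies"]
    if not isinstance(proxies, dict):
        raise ValueError("s3_proxies must be a dict")
    return proxies


no_setting = resolve_proxies({"bucket": "my-bucket"})
no_proxy = resolve_proxies({"bucket": "my-bucket", "s3_proxies": {}})
with_proxy = resolve_proxies(
    {"bucket": "my-bucket", "s3_proxies": {"https": "http://proxy.example.com:8080"}}
)
```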
Changes
- Bump boto3 from 1.23.10 to 1.24.26
- Bump ujson from 5.2.0 to 5.4.0 because of vulnerabilities
The tap-s3-csv enhancements deal with scenarios where the csv files are not loading correctly due to various quality issues or assumptions about the data being read, e.g. data types.
Changes
- Allows columns to be overridden to have a string data-type regardless of what has been discovered
- Supports reading UTF-8-BOM (byte order mark) files, e.g. csv files saved by Microsoft tools
- Support a suffix being added to streams / tables to make them unique e.g. a date or provider_id
- Provides an option to warn rather than error if a file isn't discovered for the search criteria
- Support the ability to remove a character from the csv file being read e.g. strip out all double-quotes.
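Two of the items above, byte order mark handling and character removal, can be illustrated with the standard library alone. This is a sketch under assumptions: read_rows and its remove_character parameter are hypothetical names, and the tap may implement both features differently.

```python
import csv
import io


def read_rows(raw_bytes, remove_character=None):
    """Hypothetical helper: decode csv bytes, tolerating a leading BOM."""
    # "utf-8-sig" drops a leading byte order mark if present and otherwise
    # behaves like plain UTF-8.
    text = raw_bytes.decode("utf-8-sig")
    if remove_character:
        # Strip every occurrence of the unwanted character, e.g. double quotes.
        text = text.replace(remove_character, "")
    return list(csv.DictReader(io.StringIO(text)))


# A small file as a Microsoft tool might save it: BOM first, then the csv body.
raw = b'\xef\xbb\xbfid,name\r\n1,"Ann"\r\n'
rows = read_rows(raw, remove_character='"')
```

Decoding the same bytes as plain "utf-8" would leave the mark attached to the first header name, which is the kind of loading problem these enhancements address.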
Changes
- Dropped support for python 3.6
- Bump ujson from 4.3.0 to 5.1.0
Fix
- Set time_extracted when creating singer records.
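For context, a Singer RECORD message can carry a time_extracted timestamp alongside the record itself. The sketch below builds such a message as a plain dict rather than through singer-python's own classes; the function name, stream name and timestamp are arbitrary illustrations.

```python
import json
from datetime import datetime, timezone


def record_message(stream, record, time_extracted=None):
    """Hypothetical helper: build a RECORD message with time_extracted set."""
    # Default to the current UTC time when the caller supplies no timestamp.
    time_extracted = time_extracted or datetime.now(timezone.utc)
    return {
        "type": "RECORD",
        "stream": stream,
        "record": record,
        "time_extracted": time_extracted.isoformat(),
    }


fixed = datetime(2021, 7, 19, 12, 0, tzinfo=timezone.utc)
message = record_message("orders", {"id": "1"}, fixed)
line = json.dumps(message)
```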
Changes
- Migrate CI to github actions
- bump dependencies
Fix
- Make use of start_date when doing discovery
- Discovery to run on more recent files to be able to detect new columns.
- Bumping dependencies
- Add aws_profile option to support Profile based authentication to S3
- Add option to authenticate to S3 using AWS_PROFILE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables
- Make logging configurable
- Updated generated json schema to be more in sync with fast sync in PipelineWise
- New data type guesser by messytables
- Add aws_endpoint_url to support non-aws S3 account
- License classifier and project description update
- Raise exception when file(s) cannot be sampled
- Better error messages when no files found
- Initial release
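The aws_profile, environment variable and aws_endpoint_url entries listed above can be sketched together. The helper below is hypothetical and only builds keyword arguments; the precedence it applies (config profile, then AWS_PROFILE, then explicit keys) is an assumption, not the tap's documented order. The resulting dicts use argument names that boto3's Session and client calls accept (profile_name, aws_access_key_id, aws_secret_access_key, aws_session_token, endpoint_url).

```python
import os


def s3_connection_kwargs(config, environ=os.environ):
    """Hypothetical helper: split settings into session and client kwargs."""
    session_kwargs = {}
    profile = config.get("aws_profile") or environ.get("AWS_PROFILE")
    if profile:
        session_kwargs["profile_name"] = profile
    else:
        # No profile given: fall back to explicit keys from the environment.
        for kwarg, variable in (
            ("aws_access_key_id", "AWS_ACCESS_KEY_ID"),
            ("aws_secret_access_key", "AWS_SECRET_ACCESS_KEY"),
            ("aws_session_token", "AWS_SESSION_TOKEN"),
        ):
            if environ.get(variable):
                session_kwargs[kwarg] = environ[variable]

    client_kwargs = {}
    if config.get("aws_endpoint_url"):
        # Points the client at a non-AWS, S3-compatible service.
        client_kwargs["endpoint_url"] = config["aws_endpoint_url"]
    return session_kwargs, client_kwargs


session_kwargs, client_kwargs = s3_connection_kwargs(
    {"aws_endpoint_url": "http://localhost:9000"},
    environ={"AWS_PROFILE": "analytics"},
)
```

Passing environ explicitly keeps the helper easy to test without touching the real process environment.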