Releases: s7clarke10/tap-rest-api-msdk
Patching OAuth Authentication to align with SDK changes
A change to the Meltano SDK affected the OAuth authentication.
This release updates the tap and auth components to adjust to the changes in the Meltano SDK.
Updating sdk and dependencies
What's Changed
updated SDK and deps by @jlloyd-widen in Widen#56
Full Changelog: Widen/tap-rest-api-msdk@1.3.10...1.3.11
Use correct argument for page_size #52
This resolves a small bug with the simple_offset_paginator where the page_size positional argument was incorrectly set.
1.3.9 - SimpleOffsetPaginator and Drop Python 3.7 add 3.12
The two features have been added from the upstream fork.
- add SimpleOffsetPaginator (Widen#48)
- Support Python 3.12 and drop support for EOL Python 3.7
New Features from upstream
Adding the following features
- Pagination offset starting counter is configurable
- If JSON Path next token is set, this is the default paginator
Add Rate Limit Logic and Cache Authenticator
This PR contains three features to deliver required functionality for a new API.
- It contains enhanced logic in the tap discovery to cache credentials to avoid having to re-authenticate for each stream. The API was erroring due to too many OAuth requests in quick succession.
- Optional backoff logic has been built in, allowing tap-rest-api-msdk to respond to HTTP retry-after messages. This is configurable to use either the header or message responses. There is also a new setting which adds some additional time, because sometimes the requested wait time is not long enough. This feature was built in to meet the API's backoff requirements.
- Optionally provides the ability to store the whole raw message. The feature is enabled by setting store_raw_json_message to true. This is useful if you wish to offload the flattening functionality to the likes of dbt. Where I have used this feature I tend to select only the primary key, the replication key, and the _sdc_raw_json field.
The use case for this was a dynamic schema with optional fields/columns which were not available in every record. In this situation the schema discovery did not pick up every field, leading to missing data. Storing the raw JSON record was chosen to ensure all data is preserved.
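As a rough illustration of the raw-message idea, the sketch below keeps only the primary key, the replication key, and the whole record under _sdc_raw_json. It is not the tap's implementation: the function name and the sample field names (id, updated_at) are hypothetical, and storing the record as a nested object rather than a JSON string is an assumption.

```python
from typing import Any, Dict


def to_raw_record(
    record: Dict[str, Any], primary_key: str, replication_key: str
) -> Dict[str, Any]:
    """Keep the two key columns and carry the full record along untouched.

    Assumption: the raw message is stored as a nested object; the tap may
    serialise it differently.
    """
    return {
        primary_key: record.get(primary_key),
        replication_key: record.get(replication_key),
        "_sdc_raw_json": record,
    }


# Two records with different optional fields: neither field is lost,
# even though a discovered schema might only have seen one of them.
records = [
    {"id": 1, "updated_at": "2024-01-01", "name": "a"},
    {"id": 2, "updated_at": "2024-01-02", "nickname": "b"},
]
raw_rows = [to_raw_record(r, "id", "updated_at") for r in records]
```

Flattening can then happen downstream (for example in dbt) against the _sdc_raw_json column.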
Points 2-3 are optional, and so the tap's behaviour does not change. For Point 1, the caching will speed up discovery and ingestion as credentials are cached.
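The optional backoff in Point 2 could be sketched roughly as follows. This is a minimal illustration, not the tap's code: the function name, the source and extra_seconds arguments, and the "first number in the message" rule are all hypothetical stand-ins for the real settings. Only the delay-in-seconds form of Retry-After is handled here; the HTTP-date form returns None.

```python
import re
from typing import Mapping, Optional


def retry_wait_seconds(
    headers: Mapping[str, str],
    message: str = "",
    source: str = "header",      # hypothetical switch: "header" or "message"
    extra_seconds: float = 0.0,  # hypothetical padding added to the wait
) -> Optional[float]:
    """Return how long to wait before retrying, or None if unknown."""
    if source == "header":
        # Header names are case-insensitive.
        raw = next(
            (v for k, v in headers.items() if k.lower() == "retry-after"), None
        )
    else:
        # Assumption: the wait time is the first number in the message text.
        match = re.search(r"\d+(?:\.\d+)?", message)
        raw = match.group(0) if match else None
    if raw is None:
        return None
    try:
        return float(raw) + extra_seconds
    except ValueError:
        # e.g. an HTTP-date value, which this sketch does not parse
        return None
```

The padding exists because some APIs reject a retry that arrives exactly when the advertised wait ends.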
Resolving pagination bug and dependabot securities
This release resolves a bug affecting offset pagination. It also fully addresses one of two recent Dependabot security vulnerabilities and provides a partial fix for the other.
Bug Fix:
- Removing the defaulting of pagination_page_size to 0. This is an optional parameter.
Security Fixes:
- pyca/cryptography's wheels include vulnerable OpenSSL
- ReDoS in py library when used with subversion
The second security issue has been resolved by a pytest version upgrade; however, it also needs an upgrade of tox to be fully resolved. Currently a Meltano SDK dependency prevents tox from being bumped to a higher version. When the SDK is updated, a bump of tox will fully resolve the "ReDoS in py library when used with subversion" issue.
Syncing tap-rest-api-msdk with upstream repo
This release brings the tap in line with the upstream repository, now that the PR merging our changes into the main repo has been accepted.
Release includes
- library dependencies
- linting of the code
- resolution to Dependabot issues
Adding SDK support for Pagination and Authentication.
This PR introduces new Authenticators and Paginators to tap-rest-api-msdk (it is a refactored approach to previous PR's). With this feature there is greater support for a range of API's, making this tap the Swiss Army knife for accessing API's.
Summary
- Support for most Meltano SDK Authenticators.
- Support for all Meltano Paginators.
- Flexibility to support many new API's by new settings to adjust request parameter names. See README.md for more details on settings.
- Ability to send parameters in the request body rather than request parameters (if required).
- Moving from deprecated get_next_page_token to support get_new_paginator. This removes the warnings in the logs.
- Enhanced incremental replication (includes support for API query templates).
- New modules auth and pagination, keeping a clean design.
- New auth method aws, to support ingestion from an AWS REST end-point e.g. OpenSearch.
Paginators
Each REST API is different. This PR builds on the concept of picking an appropriate request and response style for the API. Select an appropriate pagination_request_style to pick the paginator you require. In most cases this needs to be coupled with an appropriate paginator_response_style to process the response and pick the next page location in the body or headers.
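To make the "body or headers" distinction concrete, here is a small sketch of the response side only. It is illustrative rather than the tap's code: the function names and the 'body' / 'header' style values are hypothetical, and a plain top-level key lookup stands in for a real JSONPath expression.

```python
from typing import Any, Mapping, Optional


def next_page_from_body(
    body: Mapping[str, Any], key: str = "next_page"
) -> Optional[str]:
    """Read the next-page token from a top-level key of the JSON body.

    A plain key lookup stands in for JSONPath here (roughly '$.next_page').
    """
    value = body.get(key)
    return str(value) if value else None


def next_page_from_headers(
    headers: Mapping[str, str], name: str = "x-next-page"
) -> Optional[str]:
    """Read the next-page token from a response header (case-insensitive)."""
    for header, value in headers.items():
        if header.lower() == name.lower() and value:
            return value
    return None


def next_page(
    body: Mapping[str, Any], headers: Mapping[str, str], response_style: str
) -> Optional[str]:
    """Dispatch on a hypothetical response style: 'body' or 'header'."""
    if response_style == "body":
        return next_page_from_body(body)
    if response_style == "header":
        return next_page_from_headers(headers)
    raise ValueError(f"unknown response style: {response_style}")
```

A return value of None means there is no further page to request.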
Supported Paginators as part of this PR include:
- jsonpath_paginator or default - This style obtains the token for the next page from a specific location in the response body via JSONPath notation. In many situations the jsonpath_paginator is a more appropriate paginator than the hateoas_paginator.
  - next_page_token_path - The JSONPath to the next page token. Example: "$['@odata.nextLink']", which locates the token returned via the Microsoft Graph API. Default '$.next_page' for the jsonpath_paginator only, otherwise None.
- offset_paginator or style1 - This style uses URL parameters named offset and limit.
  - offset is calculated from the previous response, or not set if there is no previous response.
  - pagination_page_size - Sets a limit to number of records per page / response. Default 25 records.
  - pagination_limit_per_page_param - The name of the API parameter to limit number of records per page. Default parameter name limit.
  - pagination_total_limit_param - The name of the param that indicates the total limit e.g. total, count. Defaults to total.
  - next_page_token_path - Used to locate an appropriate link in the response. Default None - but looks in the pagination section of the JSON response by default. Example, JSONPath to get the offset from the NOAA API: '$.metadata.resultset'.
- simple_header_paginator - This style uses links in the Header Response to locate the next page. Example: the x-next-page link used by the Gitlab API.
- header_link_paginator - This style uses the default header link paginator from the Meltano SDK.
- restapi_header_link_paginator - This style is a variant on the header_link_paginator. It supports the ability to read from the GitHub API.
  - pagination_page_size - Sets a limit to number of records per page / response. Default 25 records.
  - pagination_limit_per_page_param - The name of the API parameter to limit number of records per page. Default parameter name per_page.
  - pagination_results_limit - Restricts the total number of records returned from the API. Default None i.e. no limit.
- hateoas_paginator - This style parses the next_token response for the parameters to pass. It is used by API's utilising the HATEOAS REST style, including FHIR API's.
  - pagination_page_size - Sets a limit to number of records per page / response. Default None.
  - pagination_limit_per_page_param - The name of the API parameter to limit number of records per page e.g. _count for FHIR API's. Default None.
- single_page_paginator - A paginator that works with single-page endpoints.
- page_number_paginator - Paginator class for APIs that use page numbers. Looks at the response link to determine whether there are more pages.
  - next_page_token_path - Used to locate an appropriate link in the response. Default "hasMore".
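As an illustration of the offset style listed above, the sketch below walks a fake API using the documented defaults (page size 25, limit parameter named limit, total parameter named total, position read from the pagination section of the body). It is a simplified model rather than the tap's paginator: the function names are hypothetical, and stopping when no total is reported is a conservative assumption.

```python
from typing import Any, Dict, List, Optional


def build_params(
    offset: Optional[int], page_size: int = 25, limit_param: str = "limit"
) -> Dict[str, int]:
    """Build URL parameters; offset is omitted when there is no previous response."""
    params = {limit_param: page_size}
    if offset is not None:
        params["offset"] = offset
    return params


def next_offset(
    body: Dict[str, Any], page_size: int = 25, total_param: str = "total"
) -> Optional[int]:
    """Compute the next offset from the 'pagination' section of the previous body."""
    pagination = body.get("pagination", {})
    candidate = pagination.get("offset", 0) + page_size
    total = pagination.get(total_param)
    # Assumption: with no reported total, stop rather than loop forever.
    if total is None or candidate >= total:
        return None
    return candidate


def walk(total: int, page_size: int = 25) -> List[Dict[str, int]]:
    """Simulate the requests sent to an API that reports `total` records."""
    sent: List[Dict[str, int]] = []
    offset: Optional[int] = None
    while True:
        sent.append(build_params(offset, page_size))
        # Fake response: echoes the current offset and the overall total.
        body = {"pagination": {"offset": offset or 0, "total": total}}
        offset = next_offset(body, page_size)
        if offset is None:
            return sent
```

With 60 records and the default page size, this model issues three requests: one without an offset, then offsets 25 and 50.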
Authentication
This PR introduces many additional forms of authentication that weren't possible with just headers in the request (for example OAuth).
The Meltano SDK introduced a number of authentication methods, which have been supported with this feature. The feature utilizes the available SDK Authenticators https://github.com/meltano/sdk/blob/main/singer_sdk/authenticators.py.
While new auth methods are supported, for legacy support you can still pass authentication via headers by default, so there are no breaking changes as a result. New supported authenticators:
- oauth: for OAuth2 authentication
- basic: Basic Header authentication - base64-encoded username + password config items
- api_key: for API Keys in the header e.g. X-API-KEY.
- bearer_token: for Bearer token authentication.
- aws: for AWS authentication. Works with the aws_credentials parameter.
Please note that support for OAuthJWTAuthentication has not been developed.
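For the three header-based methods in the list above, the resulting request headers can be sketched as follows. This is a standalone illustration that does not use the SDK authenticator classes; the function name and the config keys (auth_method, username, password, bearer_token, api_key, api_key_header) are hypothetical, so check README.md for the real setting names. The oauth and aws methods involve token exchange or request signing and are left out.

```python
import base64
from typing import Dict, Mapping


def build_auth_headers(config: Mapping[str, str]) -> Dict[str, str]:
    """Return the HTTP headers for a simple header-based auth method."""
    method = config.get("auth_method")
    if method == "basic":
        # Basic auth: base64-encode "username:password".
        raw = f"{config['username']}:{config['password']}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    if method == "bearer_token":
        return {"Authorization": f"Bearer {config['bearer_token']}"}
    if method == "api_key":
        # Header name is configurable in this sketch; X-API-KEY is the example given.
        header = config.get("api_key_header", "X-API-KEY")
        return {header: config["api_key"]}
    raise ValueError(f"unsupported auth method in this sketch: {method!r}")
```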
Other Changes:
- Fixes to the meltano.yml kind / data types.
- Updated meltano.yml with all the available parameters.
- Adds a config.json.sample file for illustrating how to construct a config.json file when using the tap stand-alone for development purposes.
- Documentation for new settings and examples of use against a number of API's.
Note: I am aware that there are no supported API tests, as they are time consuming to build and test. I have, however, with my limited time tested against a variety of API's available to me. Perhaps the faker Python package could help simulate tests for a variety of API's and responses. This appears to be used by tap-dbt https://github.com/MeltanoLabs/tap-dbt/blob/main/tests/test_core.py