Adding SDK support for Pagination and Authentication.
This PR introduces new Authenticators and Paginators to tap-rest-api-msdk (it is a refactored approach to previous PRs). With this feature there is greater support for a range of APIs, making this tap the Swiss Army knife for accessing APIs.
Summary
- Support for most Meltano SDK Authenticators.
- Support for all Meltano Paginators.
- Flexibility to support many new APIs through new settings that adjust request parameter names. See README.md for more details on settings.
- Ability to send parameters in the request body rather than request parameters (if required).
- Moving from the deprecated get_next_page_token to support get_new_paginator. This removes the warnings in the logs.
- Enhanced incremental replication (including support for API query templates).
- New modules auth and pagination, keeping a clean design.
- New auth method aws, to support ingestion from AWS REST endpoints, e.g. OpenSearch.
Paginators
Each REST API is different. This PR builds on the concept of picking an appropriate request and response style for the API. Select an appropriate pagination_request_style to pick the paginator you require. In most cases this needs to be coupled with an appropriate paginator_response_style to process the response and pick the next page location in the body or headers.
Supported Paginators as part of this PR include:
- jsonpath_paginator or default - This style obtains the token for the next page from a specific location in the response body via JSONPath notation. In many situations the jsonpath_paginator is a more appropriate paginator than the hateoas_paginator.
  - next_page_token_path - The JSONPath to the next page token. Example: "$['@odata.nextLink']", which locates the token returned via the Microsoft Graph API. Default '$.next_page' for the jsonpath_paginator only, otherwise None.
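As a rough illustration of the idea (not the tap's actual implementation), the sketch below looks up a next-page token in a decoded response body. The helper name find_next_token and its simplified dotted-path handling are assumptions made for this example; a bracketed expression such as "$['@odata.nextLink']" needs a real JSONPath library.

```python
from typing import Any, Optional


def find_next_token(body: dict, path: str = "$.next_page") -> Optional[Any]:
    """Resolve a simple '$.a.b' style path against a decoded JSON body.

    Hypothetical helper: it handles dotted keys only, not full JSONPath.
    """
    if not path.startswith("$."):
        raise ValueError("this sketch only supports simple '$.key' paths")
    node: Any = body
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # no token found, so there is no further page
        node = node[key]
    return node


# Illustrative response bodies (made up for this example).
first_page = {"items": [1, 2], "next_page": "abc123"}
last_page = {"items": [3]}
```

When the path resolves to nothing, pagination stops; otherwise the returned value is passed to the next request.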
- offset_paginator or style1 - This style uses URL parameters named offset and limit.
  - offset is calculated from the previous response, or not set if there is no previous response.
  - pagination_page_size - Sets a limit on the number of records per page / response. Default 25 records.
  - pagination_limit_per_page_param - The name of the API parameter to limit the number of records per page. Default parameter name limit.
  - pagination_total_limit_param - The name of the param that indicates the total limit, e.g. total, count. Defaults to total.
  - next_page_token_path - Used to locate an appropriate link in the response. Default None - but looks in the pagination section of the JSON response by default. Example: the jsonpath to get the offset from the NOAA API is '$.metadata.resultset'.
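The offset arithmetic can be sketched as follows. This is a minimal illustration under assumptions, not the tap's code: the function names are hypothetical, and the exact way the tap reads the current offset and total from the response is not shown here.

```python
from typing import Optional


def next_offset(current_offset: int, page_size: int, total: int) -> Optional[int]:
    """Return the offset for the next request, or None once all records are read."""
    candidate = current_offset + page_size
    return candidate if candidate < total else None


def request_params(offset: Optional[int], page_size: int = 25,
                   limit_param: str = "limit") -> dict:
    """Build URL parameters; offset is omitted when there is no previous response."""
    params = {limit_param: page_size}
    if offset is not None:
        params["offset"] = offset
    return params
```

Here limit_param plays the role of pagination_limit_per_page_param and page_size the role of pagination_page_size.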
- simple_header_paginator - This style uses links in the response headers to locate the next page. Example: the x-next-page link used by the GitLab API.
- header_link_paginator - This style uses the default header link paginator from the Meltano SDK.
- restapi_header_link_paginator - This style is a variant of the header_link_paginator. It supports the ability to read from the GitHub API.
  - pagination_page_size - Sets a limit on the number of records per page / response. Default 25 records.
  - pagination_limit_per_page_param - The name of the API parameter to limit the number of records per page. Default parameter name per_page.
  - pagination_results_limit - Restricts the total number of records returned from the API. Default None, i.e. no limit.
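To make the header-based styles concrete, here is a small sketch of pulling the rel="next" URL out of a Link response header. It is an assumption-laden simplification rather than the SDK's paginator: the regular expression expects rel to be the first parameter after each URL, and the example URLs are made up.

```python
import re
from typing import Optional

# Matches '<url>; rel="name"' pairs in a Link header (simplified).
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="?([^",;]+)"?')


def next_link(link_header: str) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


# Illustrative header value.
header = (
    '<https://api.example.com/items?page=2&per_page=25>; rel="next", '
    '<https://api.example.com/items?page=9&per_page=25>; rel="last"'
)
```

A missing rel="next" entry signals the last page.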
- hateoas_paginator - This style parses the next_token response for the parameters to pass. It is used by APIs utilising the HATEOAS REST style, including FHIR APIs.
  - pagination_page_size - Sets a limit on the number of records per page / response. Default None.
  - pagination_limit_per_page_param - The name of the API parameter to limit the number of records per page, e.g. _count for FHIR APIs. Default None.
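Parsing a next link into request parameters might look like the sketch below. It only illustrates the general technique with the standard library; the function name, the example URL, and the _getpagesoffset parameter are assumptions for the example, not taken from the tap.

```python
from urllib.parse import parse_qsl, urlparse


def params_from_next_url(next_url: str) -> dict:
    """Split the query string of a 'next' link into request parameters."""
    return dict(parse_qsl(urlparse(next_url).query))


# Illustrative next link, as a FHIR-style server might return it.
example_next = "https://fhir.example.com/Patient?_count=50&_getpagesoffset=50"
```

The resulting dictionary can then be sent as the parameters of the following request.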
- single_page_paginator - A paginator that works with single-page endpoints.
- page_number_paginator - Paginator class for APIs that use page numbers. Looks at the response link to determine whether there are more pages.
  - next_page_token_path - Used to locate an appropriate link in the response. Default "hasMore".
Authentication
This PR introduces many additional forms of authentication that weren't possible with just headers in the request (for example OAuth).
The Meltano SDK provides a number of authentication methods, which this feature now supports by using the available SDK Authenticators: https://github.com/meltano/sdk/blob/main/singer_sdk/authenticators.py.
While new auth methods are supported, you can still pass authentication via headers by default for legacy support, so there are no breaking changes as a result. Newly supported authenticators:
- oauth: for OAuth2 authentication
- basic: Basic Header authentication - base64-encoded username + password config items
- api_key: for API Keys in the header e.g. X-API-KEY.
- bearer_token: for Bearer token authentication.
- aws: for AWS authentication. Works with the aws_credentials parameter.
Please note that support for OAuthJWTAuthentication has not been developed.
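For readers less familiar with the header-based methods, the sketch below shows what the basic, bearer_token and api_key styles amount to at the HTTP level. These helper functions are hypothetical and written only for illustration; the tap itself delegates to the SDK Authenticators.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Basic auth: base64-encode 'username:password' into the Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_header(token: str) -> dict:
    """Bearer token auth: the token is sent as-is in the Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def api_key_header(key: str, name: str = "X-API-KEY") -> dict:
    """API key auth: the key is sent under a configurable header name."""
    return {name: key}
```

OAuth and AWS signing involve token exchange and request signing respectively, so they are not reducible to a static header like these three.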
Other Changes:
- Fixes to the meltano.yml kind / data types.
- Updated meltano.yml with all the available parameters.
- Adds a config.json.sample file for illustrating how to construct a config.json file when using the tap stand-alone for development purposes.
- Documentation for new settings and examples of use against a number of APIs.
Note: I am aware that there are no supported API tests, as they are time-consuming to build and test. I have, however, with my limited time tested against a variety of APIs available to me. Perhaps the faker Python package could help simulate tests for a variety of APIs and responses. This appears to be used by tap-dbt https://github.com/MeltanoLabs/tap-dbt/blob/main/tests/test_core.py