Adding SDK support for Pagination and Authentication.

@s7clarke10 s7clarke10 released this 25 Jul 01:59
· 20 commits to main since this release
adb41b5

This PR introduces new Authenticators and Paginators to tap-rest-api-msdk (it is a refactored approach to previous PRs). With this feature the tap supports a much wider range of APIs, making it a Swiss Army knife for accessing them.

Summary

  • Support for most Meltano SDK Authenticators.
  • Support for all Meltano Paginators.
  • Flexibility to support many new APIs through new settings that adjust request parameter names. See README.md for more details on settings.
  • Ability to send parameters in the request body rather than request parameters (if required).
  • Moved from the deprecated get_next_page_token to get_new_paginator. This removes the deprecation warnings from the logs.
  • Enhanced incremental replication (including support for API query templates).
  • New auth and pagination modules, keeping the design clean.
  • New auth method aws, to support ingestion from AWS REST endpoints, e.g. OpenSearch.

Paginators

Each REST API is different. This PR builds on the concept of picking an appropriate request and response style for the API. Select an appropriate pagination_request_style to pick the paginator you require. In most cases this needs to be coupled with an appropriate pagination_response_style to process the response and pick the next page location in the body or headers.
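The split between a request style and a response style can be pictured as two small callables: one that turns a page token into request parameters, and one that pulls the next token out of a response. The following is a minimal sketch of that idea, not the tap's implementation. The names `paginate`, `request_style`, `response_style`, and the in-memory `fake_api` are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Iterator, Optional

Response = dict[str, Any]


def paginate(
    fetch: Callable[[dict[str, Any]], Response],
    build_params: Callable[[Optional[Any]], dict[str, Any]],
    extract_next: Callable[[Response], Optional[Any]],
) -> Iterator[Response]:
    """Yield responses until extract_next reports that no page follows."""
    token: Optional[Any] = None
    while True:
        response = fetch(build_params(token))
        yield response
        token = extract_next(response)
        if token is None:
            break


# Hypothetical in-memory API: three pages selected by a "page" parameter.
PAGES = {
    1: {"records": ["a", "b"], "next_page": 2},
    2: {"records": ["c", "d"], "next_page": 3},
    3: {"records": ["e"], "next_page": None},
}


def fake_api(params: dict[str, Any]) -> Response:
    return PAGES[params.get("page", 1)]


def request_style(token: Optional[Any]) -> dict[str, Any]:
    # "Request style": how the token is sent back to the API.
    return {} if token is None else {"page": token}


def response_style(response: Response) -> Optional[Any]:
    # "Response style": where the next token lives in the response body.
    return response.get("next_page")


records = [
    record
    for page in paginate(fake_api, request_style, response_style)
    for record in page["records"]
]
print(records)
```

Swapping `response_style` for a function that reads a response header instead of the body would change where the next page is found without touching the loop, which is the reason the two styles are configured separately.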

Supported Paginators as part of this PR include:

  • jsonpath_paginator or default - This style obtains the token for the next page from a specific location in the response body via JSONPath notation. In many situations the jsonpath_paginator is a more appropriate choice than the hateoas_paginator.
    • next_page_token_path - The JSONPath to the next page token. Example: "$['@odata.nextLink']", which locates the token returned by the Microsoft Graph API. Defaults to '$.next_page' for the jsonpath_paginator only; otherwise None.
  • offset_paginator or style1 - This style uses URL parameters named offset and limit.
    • offset is calculated from the previous response, or not set if there is no previous response
    • pagination_page_size - Sets the maximum number of records per page / response. Default 25 records.
    • pagination_limit_per_page_param - The name of the API parameter that limits the number of records per page. Default parameter name limit.
    • pagination_total_limit_param - The name of the parameter that indicates the total limit, e.g. total, count. Defaults to total.
    • next_page_token_path - Used to locate an appropriate link in the response. Default None, in which case the pagination section of the JSON response is used. For example, the JSONPath to get the offset from the NOAA API is '$.metadata.resultset'.
  • simple_header_paginator - This style uses links in the response headers to locate the next page. For example, the x-next-page link used by the GitLab API.
  • header_link_paginator - This style uses the default header link paginator from the Meltano SDK.
  • restapi_header_link_paginator - This style is a variant of the header_link_paginator. It adds support for reading from the GitHub API.
    • pagination_page_size - Sets the maximum number of records per page / response. Default 25 records.
    • pagination_limit_per_page_param - The name of the API parameter that limits the number of records per page. Default parameter name per_page.
    • pagination_results_limit - Restricts the total number of records returned from the API. Default None i.e. no limit.
  • hateoas_paginator - This style parses the next_token response for the parameters to pass. It is used by APIs that follow the HATEOAS REST style, including FHIR APIs.
    • pagination_page_size - Sets the maximum number of records per page / response. Default None.
    • pagination_limit_per_page_param - The name of the API parameter that limits the number of records per page, e.g. _count for FHIR APIs. Default None.
  • single_page_paginator - A paginator that works with single-page endpoints.
  • page_number_paginator - Paginator for APIs that use page numbers. Looks at the response link to determine whether there are more pages.
    • next_page_token_path - Used to locate an appropriate link in the response. Default "hasMore".
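To make the offset calculation for the offset_paginator concrete, here is a minimal sketch of how a next offset could be derived from a response whose body carries a pagination section. It illustrates the arithmetic under stated assumptions and is not the tap's code: the function name `next_offset`, the `pagination` key, and the `offset` field are assumed, while `limit` and `total` mirror the default parameter names listed above.

```python
from typing import Any, Optional


def next_offset(
    body: dict[str, Any],
    limit_param: str = "limit",
    total_param: str = "total",
) -> Optional[int]:
    """Return the offset for the next request, or None when finished."""
    pagination = body.get("pagination")  # assumed location in the body
    if not pagination:
        return None

    offset = int(pagination.get("offset", 0))  # assumed field name
    limit = int(pagination[limit_param])
    total = int(pagination[total_param])

    # A non-positive limit would never advance, so stop instead of looping.
    if limit <= 0:
        return None

    candidate = offset + limit
    return candidate if candidate < total else None


first = next_offset({"pagination": {"offset": 0, "limit": 25, "total": 60}})
last = next_offset({"pagination": {"offset": 50, "limit": 25, "total": 60}})
print(first, last)
```

An API that reports its total under a different name, such as count, would be handled by passing `total_param="count"`, which is the role pagination_total_limit_param plays in the settings.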

Authentication

This PR introduces many additional forms of authentication that weren't possible with just headers in the request (for example OAuth).

The Meltano SDK provides a number of authentication methods, which this feature now supports. It uses the available SDK Authenticators: https://github.com/meltano/sdk/blob/main/singer_sdk/authenticators.py.

While new auth methods are supported, you can still pass authentication via headers by default for legacy support, so there are no breaking changes as a result. Newly supported authenticators:

  • oauth: for OAuth2 authentication
  • basic: for Basic header authentication, using the base64-encoded username and password config items.
  • api_key: for API Keys in the header e.g. X-API-KEY.
  • bearer_token: for Bearer token authentication.
  • aws: for AWS authentication. Works with the aws_credentials parameter.

Please note that support for OAuthJWTAuthentication has not been developed.
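For readers unfamiliar with what the header-based methods put on the wire, the sketch below builds the request headers for the basic, bearer_token, and api_key methods using only the standard library. It is an illustration, not the SDK's or the tap's implementation: `build_auth_headers` and the keyword names `username`, `password`, `token`, `key_name`, and `key_value` are hypothetical and may not match the tap's actual setting names (see README.md for those).

```python
import base64


def build_auth_headers(method: str, **settings: str) -> dict[str, str]:
    """Return the HTTP headers a given auth method would add to a request."""
    if method == "basic":
        # Basic auth: base64 of "username:password" in the Authorization header.
        raw = f"{settings['username']}:{settings['password']}".encode("utf-8")
        encoded = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    if method == "bearer_token":
        return {"Authorization": f"Bearer {settings['token']}"}
    if method == "api_key":
        # The header name is configurable; X-API-KEY is used as the fallback here.
        return {settings.get("key_name", "X-API-KEY"): settings["key_value"]}
    raise ValueError(f"Unsupported auth method in this sketch: {method}")


basic_headers = build_auth_headers("basic", username="user", password="pass")
print(basic_headers)  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```

The oauth and aws methods are left out because they involve a token exchange or request signing rather than a static header.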

Other Changes:

  • Fixes to the meltano.yml kind / data types.
  • Updated meltano.yml with all the available parameters.
  • Adds a config.json.sample file for illustrating how to construct a config.json file when using the tap stand-alone for development purposes.
  • Documentation for new settings and examples of use against a number of APIs.
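As a rough idea of what the pagination part of a stand-alone config.json might contain, the sketch below assembles a fragment in Python and prints it as JSON. It uses only setting names and defaults mentioned in these notes for the restapi_header_link_paginator. It is deliberately incomplete: a working config needs further settings that are not covered here, so treat config.json.sample and README.md as the authoritative references.

```python
import json

# Pagination-only fragment; a real config.json needs additional settings.
config_fragment = {
    "pagination_request_style": "restapi_header_link_paginator",
    "pagination_page_size": 25,
    "pagination_limit_per_page_param": "per_page",
}

config_text = json.dumps(config_fragment, indent=2)
print(config_text)
```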

Note: I am aware that there are no supported API tests, as they are time-consuming to build and test. I have, however, tested against a variety of APIs available to me in my limited time. Perhaps the faker Python package could help simulate tests for a variety of APIs and responses. It appears to be used by tap-dbt: https://github.com/MeltanoLabs/tap-dbt/blob/main/tests/test_core.py