WIP: Add missing parts for rest client docs #1397

Closed
wants to merge 6 commits into from
167 changes: 141 additions & 26 deletions docs/website/docs/general-usage/http/rest-client.md
@@ -1,10 +1,18 @@
---
title: RESTClient
description: Learn how to use the RESTClient class to interact with RESTful APIs
-keywords: [api, http, rest, request, extract, restclient, client, pagination, json, response, data_selector, session, auth, paginator, jsonresponsepaginator, headerlinkpaginator, offsetpaginator, jsonresponsecursorpaginator, queryparampaginator, bearer, token, authentication]
+keywords:
+  [
+    api, http, rest, request, extract, restclient, client,
+    pagination, json, response, data_selector, session, auth,
+    paginator, jsonresponsepaginator, headerlinkpaginator, offsetpaginator,
+    jsonresponsecursorpaginator, queryparampaginator, bearer, token,
+    authentication, reverse etl, json path, openapi, swagger
+  ]
---

The `RESTClient` class offers an interface for interacting with RESTful APIs, including features like:

- automatic pagination,
- various authentication mechanisms,
- customizable request/response handling.
@@ -72,31 +80,31 @@ For example, if the API response looks like this:

```json
{
-    "posts": [
-        {"id": 1, "title": "Post 1"},
-        {"id": 2, "title": "Post 2"},
-        {"id": 3, "title": "Post 3"}
-    ]
+  "posts": [
+    { "id": 1, "title": "Post 1" },
+    { "id": 2, "title": "Post 2" },
+    { "id": 3, "title": "Post 3" }
+  ]
}
```

-The `data_selector` should be set to `"posts"` to extract the list of posts from the response.
+The `data_selector` should be set to `"posts"` or `"$.posts"` to extract the list of posts from the response.
Collaborator

What's the rationale for this change?

Contributor Author

Some people are used to writing JSONPath expressions that start with `$.`, so this is just to relate the selector to that syntax.

Collaborator

Good point, but I think if they know JSONPath already they would know that extended syntax anyway. I would try to optimize here for those unfamiliar with JSONPath. We also link JSONPath docs twice for those who need advanced JSONPath.


For a nested structure like this:

```json
{
-    "results": {
-        "posts": [
-            {"id": 1, "title": "Post 1"},
-            {"id": 2, "title": "Post 2"},
-            {"id": 3, "title": "Post 3"}
-        ]
-    }
+  "results": {
+    "posts": [
+      { "id": 1, "title": "Post 1" },
+      { "id": 2, "title": "Post 2" },
+      { "id": 3, "title": "Post 3" }
+    ]
+  }
}
```

-The `data_selector` needs to be set to `"results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors.
+The `data_selector` needs to be set to `"results.posts"` or `"$.results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors.
Collaborator

And for this. Why would we need to have an alternative declaration here?

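As a rough illustration of what the selectors discussed above do, the sketch below resolves a dotted path such as `results.posts` against a parsed response. The `select` helper is hypothetical and only handles plain dotted keys with an optional `$.` prefix. It is not how `RESTClient` or jsonpath-ng evaluates selectors, which support the full JSONPath syntax.

```py
from typing import Any


def select(document: Any, selector: str) -> Any:
    """Resolve a dotted selector like "results.posts" (hypothetical helper)."""
    # Treat "$.results.posts" and "results.posts" the same way.
    path = selector[2:] if selector.startswith("$.") else selector
    current = document
    for key in path.split("."):
        current = current[key]  # raises KeyError if the key is missing
    return current


flat = {"posts": [{"id": 1, "title": "Post 1"}, {"id": 2, "title": "Post 2"}]}
nested = {"results": {"posts": [{"id": 1, "title": "Post 1"}]}}

flat_posts = select(flat, "posts")
nested_posts = select(nested, "$.results.posts")
```

In this sketch both spellings of the nested selector return the same list, which is the equivalence the proposed wording describes.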

### PageData

@@ -133,14 +141,14 @@ Suppose the API response for `https://api.example.com/posts` looks like this:

```json
{
-    "data": [
-        {"id": 1, "title": "Post 1"},
-        {"id": 2, "title": "Post 2"},
-        {"id": 3, "title": "Post 3"}
-    ],
-    "pagination": {
-        "next": "https://api.example.com/posts?page=2"
-    }
+  "data": [
+    { "id": 1, "title": "Post 1" },
+    { "id": 2, "title": "Post 2" },
+    { "id": 3, "title": "Post 3" }
+  ],
+  "pagination": {
+    "next": "https://api.example.com/posts?page=2"
+  }
}
```

@@ -161,7 +169,6 @@ def get_data():
        yield page
```

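The idea of following a `pagination.next` link, as in the response above, can be sketched without any network access. `FAKE_API` and `fetch` are stand-ins invented for this illustration, and the loop is a simplification rather than the `RESTClient` implementation.

```py
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical in-memory "API": URL -> JSON body (not part of dlt).
FAKE_API: Dict[str, Dict[str, Any]] = {
    "https://api.example.com/posts": {
        "data": [{"id": 1}, {"id": 2}],
        "pagination": {"next": "https://api.example.com/posts?page=2"},
    },
    "https://api.example.com/posts?page=2": {
        "data": [{"id": 3}],
        "pagination": {},
    },
}


def fetch(url: str) -> Dict[str, Any]:
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return FAKE_API[url]


def paginate_by_next_link(start_url: str) -> Iterator[List[Dict[str, Any]]]:
    """Yield one page of records at a time until no next link is present."""
    url: Optional[str] = start_url
    while url:
        body = fetch(url)
        yield body["data"]
        # A missing or empty "next" value ends the loop.
        url = body.get("pagination", {}).get("next")


pages = list(paginate_by_next_link("https://api.example.com/posts"))
```

The loop stops on the second page because its `pagination` object carries no `next` key.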

#### HeaderLinkPaginator

This paginator handles pagination based on a link to the next page in the response headers (e.g., the `Link` header, as used by GitHub).
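
To show what "a link to the next page in the response headers" means in practice, here is a minimal sketch that pulls the `rel="next"` URL out of a `Link` header value. The helper name and the example URLs are assumptions for illustration, and the regular expression only covers the simple `<url>; rel="..."` form, not every variant the header format allows.

```py
import re
from typing import Optional

# Matches one entry of the form: <https://example.com/page2>; rel="next"
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_url_from_link_header(link_header: str) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    for url, rel in LINK_ENTRY.findall(link_header):
        if rel == "next":
            return url
    return None


header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=5>; rel="last"'
)
next_url = next_url_from_link_header(header)
```

A paginator built on this idea would keep requesting `next_url` until the helper returns `None`.
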
@@ -432,6 +439,26 @@ for page in client.paginate("/protected/resource"):
    print(page)
```

## Common resource defaults

In `RESTAPIConfig` you can provide common settings via `resource_defaults`, which are then applied to all requests:

```py
my_params = {
    "from_year": 2018,
    "end_year": 2024,
}

source_config: RESTAPIConfig = {
    "client": {...},
    "resource_defaults": {
        "endpoint": {
            "params": my_params,
        }
    }
}
```

Comment on lines +442 to +461
Collaborator

This part does not belong to this document. This is documentation for RESTClient and not rest_api.

Contributor Author

Got it, I will remove it

### API key authentication

API Key Authentication (`ApiKeyAuth`) is an auth method where the client sends an API key in a custom header (e.g. `X-API-Key: <key>`, or as a query parameter).
@@ -481,11 +508,13 @@ response = client.get("/protected/resource")

You can implement custom authentication by subclassing the `AuthConfigBase` class and implementing the `__call__` method:

**Custom bearer auth:**

```py
from dlt.sources.helpers.rest_client.auth import AuthConfigBase

class CustomAuth(AuthConfigBase):
-    def __init__(self, token):
+    def __init__(self, token: str):
        self.token = token

    def __call__(self, request):
@@ -494,6 +523,24 @@ class CustomAuth(AuthConfigBase):
        return request
```

**Custom combined auth:**
Sometimes you need to pass authentication parameters via both headers and query parameters:

```py
from dlt.sources.helpers.rest_client.auth import AuthConfigBase

class CombinedAuth(AuthConfigBase):
    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret

    def __call__(self, request):
        # Modify the request object to include the necessary authentication headers and request params
        request.headers["Authorization"] = f"Bearer {self.client_secret}"
        request.prepare_url(request.url, {"client_id": self.client_id})
        return request
```
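
To make the effect of the combined approach easier to see without installing anything, the sketch below replays the same two steps on a stand-in object. `FakeRequest`, `with_query_params`, and `CombinedAuthSketch` are hypothetical names used only here. The real `__call__` receives a prepared request from the Requests library, and this sketch merely imitates setting a header and merging a query parameter into the URL.

```py
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the prepared request passed to __call__."""

    url: str
    headers: Dict[str, str] = field(default_factory=dict)


def with_query_params(url: str, params: Dict[str, str]) -> str:
    """Return url with params merged into its existing query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


class CombinedAuthSketch:
    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret

    def __call__(self, request: FakeRequest) -> FakeRequest:
        # One credential goes into a header, the other into the query string.
        request.headers["Authorization"] = f"Bearer {self.client_secret}"
        request.url = with_query_params(request.url, {"client_id": self.client_id})
        return request


request = CombinedAuthSketch("my-id", "my-secret")(
    FakeRequest(url="https://api.example.com/posts?page=2")
)
```

The existing `page=2` parameter is preserved and `client_id` is appended, which is the behavior one would want from any helper that adds authentication parameters to a URL.
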
Comment on lines +526 to +542
Collaborator

I'm not convinced we need this example here: the difference that I see here from the previous example is that it shows that request is a PreparedRequest instance (because we can call prepare_url on it). I think showing that request is a PreparedRequest has a value, so instead of the example I would add explicit types in the previous example (like you did with token: str) and add a text description elaborating on what __call__ actually is receiving and linking Requests docs from PreparedRequest so the reader can quickly see what's possible.


Then, you can use your custom authentication class with the `RESTClient`:

```py
@@ -518,6 +565,74 @@ client.paginate("/posts", hooks={"response": [custom_response_handler]})

The handler function may raise `IgnoreResponseException` to exit the pagination loop early. This is useful for endpoints that return a 404 status code when there are no items to paginate.
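
The control flow described here can be sketched with stand-ins. `IgnoreResponse`, `FakeResponse`, and `paginate_until_ignored` are hypothetical names for this illustration and are not the dlt classes; the point is only that the hook's exception ends the loop quietly instead of surfacing as an error.

```py
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


class IgnoreResponse(Exception):
    """Hypothetical stand-in for IgnoreResponseException."""


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response."""

    status_code: int
    items: List[int]


def stop_on_404(response: FakeResponse) -> None:
    # Treat 404 as "no more items" instead of an error.
    if response.status_code == 404:
        raise IgnoreResponse


def paginate_until_ignored(
    pages: Dict[int, FakeResponse], hook: Callable[[FakeResponse], None]
) -> Iterator[List[int]]:
    """Request page 1, 2, 3, ... until the hook asks to stop."""
    page_number = 1
    while True:
        # Pages that do not exist answer with a 404, as some APIs do.
        response = pages.get(page_number, FakeResponse(404, []))
        try:
            hook(response)
        except IgnoreResponse:
            return  # leave the loop early without raising to the caller
        yield response.items
        page_number += 1


fake_pages = {1: FakeResponse(200, [1, 2]), 2: FakeResponse(200, [3])}
collected = list(paginate_until_ignored(fake_pages, stop_on_404))
```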

### Incremental loading

You often need to load only new data based on an incremental property, such as a timestamp, an integer identifier, or a cursor value.
Fortunately, `RESTClient` lets you express this behavior concisely.

Suppose we use a slightly modified example response and want to load new posts as they appear, without reloading all the data.

```json
{
  "data": [
    { "id": 1, "title": "Post 1", "created_at": "2010-08-21T17:11:27-0400" },
    { "id": 2, "title": "Post 2", "created_at": "2010-09-21T17:11:27-0400" },
    { "id": 3, "title": "Post 3", "created_at": "2010-10-21T17:11:27-0400" }
  ]
}
```

To achieve this, add a parameter of type `incremental` to `endpoint.params`.
The following examples use `id` (the primary key) and `created_at` (the creation datetime) as cursors.

**Incremental loading by id**

```py
source_config: RESTAPIConfig = {
    "resources": [
        {
            "name": "get_posts_list",
            "table_name": "posts",
            "endpoint": {
                "data_selector": "$.data",
                "path": "/posts",
                "params": {
                    "post_id": {
                        "type": "incremental",
                        "cursor_path": "id",
                        "initial_value": 1,
                    }
                },
            },
        }
    ]
}
```

**Incremental loading by creation date**

```py
source_config: RESTAPIConfig = {
    "resources": [
        {
            "name": "get_posts_list",
            "table_name": "posts",
            "endpoint": {
                "data_selector": "$.data",
                "path": "/posts",
                "params": {
                    "creation_date": {
                        "type": "incremental",
                        "cursor_path": "created_at",
                        "initial_value": "2010-08-21T17:11:27-0400",
                    }
                },
            },
        }
    ]
}
```
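
The general idea behind a cursor-based incremental load can be sketched in plain Python. `load_new_rows` is a hypothetical helper, and the strict `>` comparison and the starting value of `0` are assumptions of this sketch. Whether boundary values are included, and how state is stored between runs, depends on the actual implementation.

```py
from typing import Any, Dict, List, Tuple


def load_new_rows(
    rows: List[Dict[str, Any]], cursor_path: str, last_value: Any
) -> Tuple[List[Dict[str, Any]], Any]:
    """Return rows whose cursor exceeds last_value, plus the updated cursor."""
    new_rows = [row for row in rows if row[cursor_path] > last_value]
    if new_rows:
        # Remember the highest cursor seen so the next run can skip these rows.
        last_value = max(row[cursor_path] for row in new_rows)
    return new_rows, last_value


first_response = [{"id": 1}, {"id": 2}, {"id": 3}]
second_response = first_response + [{"id": 4}]

# First run: everything is new relative to the assumed starting value 0.
batch1, cursor = load_new_rows(first_response, "id", 0)
# Second run: only rows beyond the stored cursor are kept.
batch2, cursor = load_new_rows(second_response, "id", cursor)
```

The same function works with the `created_at` strings from the example response only because they share one format and UTC offset, so comparing them as text matches chronological order. Mixed formats would need parsing into datetimes first.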

Comment on lines +568 to +635
Collaborator

This is not the right place for this section: rest-client.md is only for documenting RESTClient class & relevant functionality and not rest_api source. Incremental loading is covered in rest_api here: https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#incremental-loading

## Shortcut for paginating API responses

The `paginate()` function provides a shorthand for paginating API responses. It takes the same parameters as the `RESTClient.paginate()` method but automatically creates a RESTClient instance with the specified base URL:
@@ -560,7 +675,7 @@ RUNTIME__LOG_LEVEL=INFO python my_script.py
```

2. Use the [`PageData`](#pagedata) instance to inspect the [request](https://docs.python-requests.org/en/latest/api/#requests.Request)
-and [response](https://docs.python-requests.org/en/latest/api/#requests.Response) objects:
+   and [response](https://docs.python-requests.org/en/latest/api/#requests.Response) objects:

```py
from dlt.sources.helpers.rest_client import RESTClient