Add an example for the incremental configuration to the rest_api docs

dlt-hub · Jun 24, 2024 · 9b1004d
1 parent 8eba834
commit 9b1004d
Showing 1 changed file with 65 additions and 33 deletions.
diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/rest_api.md b/docs/website/docs/dlt-ecosystem/verified-sources/rest_api.md
@@ -560,49 +560,81 @@ This will include the `id`, `title`, and `created_at` fields from the `issues` r
 Some APIs provide a way to fetch only new or changed data (most often by using a timestamp field like `updated_at`, `created_at`, or incremental IDs).
 This is called [incremental loading](../../general-usage/incremental-loading.md) and is very useful as it allows you to reduce the load time and the amount of data transferred.
 
-When the API endpoint supports incremental loading, you can configure the source to load only the new or changed data using these two methods:
+When the API endpoint supports incremental loading, you can configure dlt to load only new or changed data using one of these two methods:
 
-1. Defining a special parameter in the `params` section of the [endpoint configuration](#endpoint-configuration):
+1. Defining a special parameter in the `params` section.
+2. Specifying the `incremental` field.
 
-    ```py
-    {
-        "<parameter_name>": {
-            "type": "incremental",
-            "cursor_path": "<path_to_cursor_field>",
-            "initial_value": "<initial_value>",
-        },
-    }
-    ```
+Both are configured in the [endpoint configuration](#endpoint-configuration). Let's start with the first method.
 
-    For example, in the `issues` resource configuration in the GitHub example, we have:
+### Incremental loading in `params`
 
-    ```py
-    {
-        "since": {
+Imagine an endpoint `https://api.example.com/posts` that:
+1. Accepts a `created_since` query parameter to fetch posts created after a certain date.
+2. Returns a list of posts, each with a `created_at` field.
+
+For example, if we query the endpoint with `https://api.example.com/posts?created_since=2024-01-25`, we get the following response:
+
+```json
+{
+    "results": [
+        {"id": 1, "title": "Post 1", "created_at": "2024-01-26"},
+        {"id": 2, "title": "Post 2", "created_at": "2024-01-27"},
+        {"id": 3, "title": "Post 3", "created_at": "2024-01-28"}
+    ]
+}
+```
+
+To enable incremental loading for this endpoint, you can use the following configuration:
+
+```py
+{
+    "path": "posts",
+    "data_selector": "results",  # Optional JSONPath to select the list of posts
+    "params": {
+        "created_since": {
             "type": "incremental",
-            "cursor_path": "updated_at",
-            "initial_value": "2024-01-25T11:21:28Z",
+            "cursor_path": "created_at", # The JSONPath to the field we want to track in each post
+            "initial_value": "2024-01-25",
         },
-    }
-    ```
+    },
+}
+```
 
-    This configuration tells the source to create an incremental object that will keep track of the `updated_at` field in the response and use it as a value for the `since` parameter in subsequent requests.
+After you run the pipeline, dlt keeps track of the latest `created_at` value among all the posts fetched and uses it as the `created_since` parameter in the next request.
+In our case, the next request will be made to `https://api.example.com/posts?created_since=2024-01-28` to fetch only the posts created after `2024-01-28`.
 
-2. Specifying the `incremental` field in the [endpoint configuration](#endpoint-configuration):
+Now, let's break down the configuration. The `created_since` parameter is defined as an incremental parameter with the following fields:
 
-    ```py
-    {
-        "incremental": {
-            "start_param": "<parameter_name>",
-            "end_param": "<parameter_name>",
-            "cursor_path": "<path_to_cursor_field>",
-            "initial_value": "<initial_value>",
-            "end_value": "<end_value>",
-        }
-    }
-    ```
+```py
+{
+    "<parameter_name>": {
+        "type": "incremental",
+        "cursor_path": "<path_to_cursor_field>",
+        "initial_value": "<initial_value>",
+    },
+}
+```
+
+- `type`: The type of the incremental parameter. Set to `incremental`.
+- `cursor_path`: The JSONPath to the field within each item in the list that will be used as the cursor value. In this case, it's `created_at`. Note that the path starts from the root of the item (dict) and not from the root of the response.
+- `initial_value`: The initial value for the cursor, used to initialize the state of incremental loading. In this case, it's `2024-01-25`.
+
+### Incremental loading using the `incremental` field
 
-    This configuration is more flexible and allows you to specify the start and end conditions for the incremental loading.
+The alternative method is to use the `incremental` field in the [endpoint configuration](#endpoint-configuration). This method is more flexible and lets you specify both the start and end conditions for incremental loading:
+
+```py
+{
+    "incremental": {
+        "start_param": "<parameter_name>",
+        "end_param": "<parameter_name>",
+        "cursor_path": "<path_to_cursor_field>",
+        "initial_value": "<initial_value>",
+        "end_value": "<end_value>",
+    }
+}
+```
 
 See the [incremental loading](../../general-usage/incremental-loading.md#incremental-loading-with-a-cursor-field) guide for more details.
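
The cursor behaviour described in the new `params` section can be illustrated with a small standalone sketch. It is not dlt's implementation and does not import dlt; `build_url` and `advance_cursor` are hypothetical helpers written for this illustration. It assumes that `cursor_path` is a plain top-level key and that the cursor values are ISO dates, which compare correctly as strings.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # endpoint used in the docs example


def build_url(path, param_name, cursor_value):
    """Build the request URL with the cursor value as a query parameter."""
    return f"{BASE_URL}/{path}?{urlencode({param_name: cursor_value})}"


def advance_cursor(items, cursor_path, last_value):
    """Return the greatest cursor value seen so far.

    Hypothetical helper: treats cursor_path as a top-level key and relies on
    ISO date strings sorting in chronological order.
    """
    seen = [item[cursor_path] for item in items if cursor_path in item]
    return max(seen + [last_value])


# Response body copied from the docs example.
response = {
    "results": [
        {"id": 1, "title": "Post 1", "created_at": "2024-01-26"},
        {"id": 2, "title": "Post 2", "created_at": "2024-01-27"},
        {"id": 3, "title": "Post 3", "created_at": "2024-01-28"},
    ]
}

cursor = "2024-01-25"  # plays the role of initial_value
first_url = build_url("posts", "created_since", cursor)

# After the first run, the cursor moves to the latest created_at seen.
cursor = advance_cursor(response["results"], "created_at", cursor)
next_url = build_url("posts", "created_since", cursor)

print(first_url)
print(next_url)
```

The second URL matches the one quoted in the docs text (`created_since=2024-01-28`). If a run returns no items, the cursor stays at its previous value in this sketch.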