Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a topic guide for the RESTClient class #1151

Merged
merged 24 commits into from
May 7, 2024
Merged

Conversation

burnash
Copy link
Collaborator

@burnash burnash commented Mar 26, 2024

Description

This PR adds documentation for the RESTClient class, including a brief example and instructions on how to handle paginated endpoints in APIs.

Related Issues

@burnash burnash self-assigned this Mar 26, 2024
Copy link

netlify bot commented Mar 26, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 6fcd820
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/663a6701a812060007ef7727
😎 Deploy Preview https://deploy-preview-1151--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@sh-rp sh-rp added the documentation Improvements or additions to documentation label Apr 3, 2024
@@ -0,0 +1,259 @@
# Reading data from RESTful APIs

RESTful APIs are a common way to interact with web services. They are based on the HTTP protocol and are used to read and write data from and to a web server. dlt provides a simple way to read data from RESTful APIs using two helper methods: [a wrapper around the `requests`](../reference/performance#using-the-built-in-requests-client) library and a `RESTClient` class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest splitting this in two pages:

  1. RESTClient -- takes care of pagination, authentification, customization
  2. requests module (migrate stuff from performance to separate page) -- takes care of retries

Maybe even abstract the requests helper and make retry customization available through RESTClient so users have one entry point.


## Quick example

Here's a simple pipeline that reads issues from the [dlt GitHub repository](https://github.com/dlt-hub/dlt/issues). The API endpoint is `https://api.github.com/repos/dlt-hub/dlt/issues`. The result is "paginated", meaning that the API returns a limited number of issues per page. The `paginate()` iterates over all pages and yields the results which are then processed by the pipeline.
Copy link
Contributor

@VioletM VioletM Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add that in this example authentification and pagination are guessed automatically. And after this example put another one that shows how you can specify the paginator and auth if they are special/not guessed correctly.

Honestly, these two examples would be enough to give users an understanding of how to use RESTClient.

So overall I suggest:

  • Quick example simple
  • Quick example with specific paginator/auth
  • Available Paginators
    • How to write my own if nothing fits
  • Available Auths
    • How to write my own if nothing fits
  • Advanced usage

In this case user WOW with this doc would be:

  1. Copy-past code which guesses everything automatically
  2. If it doesn't work, copy-past code with a specification of pagination and auth.
  3. Search in docs which auth and paginator is right for you
  4. If nothing fits -> How-to write custom auth/paginator.

@rudolfix rudolfix added the sprint Marks group of tasks with core team focus at this moment label Apr 25, 2024
rudolfix
rudolfix previously approved these changes May 3, 2024
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@burnash if you want to top up this then add add an example with incremental loading. this will be the most common "advanced" task people have

it maybe a separate ticket - this is really good

@rudolfix rudolfix dismissed their stale review May 3, 2024 07:48

one topic is really missing

@burnash
Copy link
Collaborator Author

burnash commented May 3, 2024

one topic is really missing

@rudolfix which one?

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not explain data_selector. It requires a separate chapter because it does a lot of detection magic. this must be explained - at least briefly

IMO we should do a followup ticket on this:

  • I agree with @VioletM - we should explain the pagination detection better and say when it works (links paginator, next field at known places) and when it does not (offset pagination using non standard names) and what happens if we cannot detect.
  • more in depth on data_selector response detections
  • I'd add an example with incremental loading. this will be the most common "advanced" task people have

@burnash burnash force-pushed the enh/docs/rest_client branch from ae5508d to 9a3d9bb Compare May 7, 2024 08:52
@burnash burnash requested review from VioletM and rudolfix May 7, 2024 09:04
@burnash
Copy link
Collaborator Author

burnash commented May 7, 2024

@VioletM @rudolfix please take a look at the reworked version. I've covered missing sections and moved the docs for requests wrapper to this guide. I agree with @VioletM we may consider going even further and streamlining the code so there's one entry point. Open for discussion.

VioletM
VioletM previously approved these changes May 7, 2024
Copy link
Contributor

@VioletM VioletM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Everything is explained with examples, this is a very good quality docs :)


`JSONResponsePaginator` is designed for APIs where the next page URL is included in the response's JSON body. This paginator uses a JSON path to locate the next page URL within the JSON response.

##### Parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:
The 5th and 6th headings looks very strange, smaller than a regular text :D
Maybe just use bold?
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@burnash burnash merged commit 386e581 into devel May 7, 2024
40 of 41 checks passed
@burnash burnash deleted the enh/docs/rest_client branch May 7, 2024 17:47
@rudolfix rudolfix removed the sprint Marks group of tasks with core team focus at this moment label Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants