[experiment] Add resource time limit and rate limiting #1485

sh-rp · 2024-06-18T09:45:16Z

Description

This PR extends the add_limit function to add time limits and rate limits to resources. The approach is up for discussion, but it is very straightforward, easy to test, and works for both sync and async resources.
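
To make the time-limit idea concrete, here is a minimal standalone sketch of a generator wrapper that stops after a given number of seconds. `limit_by_time` is a hypothetical helper written for illustration; it is not the add_limit implementation from this PR and does not use any dlt API.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_by_time(items: Iterable[T], max_time: float) -> Iterator[T]:
    """Yield from `items` until `max_time` seconds have elapsed.

    Hypothetical helper, not part of dlt. The deadline is only checked
    after the upstream has produced an item, so a slow upstream item can
    overrun the limit; that item is then dropped instead of yielded.
    """
    deadline = time.monotonic() + max_time
    for item in items:
        if time.monotonic() >= deadline:
            return
        yield item


# A generous limit lets a short input through unchanged
print(list(limit_by_time([1, 2, 3], max_time=60)))
```

Because the wrapper is a plain generator, it composes with any sync iterable; an async variant would need the same check inside an `async for` loop.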

TODOs (if we go this route):

More tests covering pipe iterators with multiple resources (for example, we want to allow a global rate limit for APIs)
Docs
Extend source-level add_limit to have the same functionality as the resource-level one
Improve the logger warning if add_limit is declared on non-incremental resources
Investigate fifo extractor strategy, maybe do not go to round robin if None is yielded...

Other thoughts:

We might want to apply the rate limit wait also once before the original generator is used plus allow rate limiting on the transformers, otherwise global rate limiting for APIs will not work.
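
One way to picture a global rate limit is a limiter object shared by several generators, with the wait applied before the upstream is touched (including the very first call). The sketch below is an assumption about how that could look; `SharedRateLimiter` and `rate_limited` are hypothetical names, and unlike the PR's loop this version blocks with `time.sleep` instead of yielding control back to the pipe iterator.

```python
import time
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class SharedRateLimiter:
    """Enforces a minimum interval between calls across all of its users.

    Hypothetical helper for single-threaded use; not part of dlt.
    """

    def __init__(self, min_wait: float) -> None:
        self.min_wait = min_wait
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> None:
        if self._last is not None:
            remaining = self._last + self.min_wait - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def rate_limited(items: Iterable[T], limiter: SharedRateLimiter) -> Iterator[T]:
    """Wrap `items` so every upstream call first passes through `limiter`."""
    iterator = iter(items)
    while True:
        # Wait before touching the upstream generator, including the first call
        limiter.wait()
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item


# Two wrapped iterables share one limiter, so their calls are spaced globally
limiter = SharedRateLimiter(min_wait=0.02)
a = rate_limited([1, 2], limiter)
b = rate_limited(["x"], limiter)
print(next(a), next(b), next(a))
```

The same limiter instance could be handed to a transformer step as well, which is the case the note above is concerned with.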

netlify · 2024-06-18T09:45:32Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`5fc2324`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/6734869165314d0008ffccee

sh-rp · 2024-06-18T09:48:46Z

dlt/extract/resource.py

+            if max_items >= 0 or max_time and not self.incremental:
+                from dlt.common import logger
+
+                logger.warning(


this will need to be improved a bit, but I think this is quite a nice solution.
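
A side note on the quoted condition: in Python, `not` binds tighter than `and`, which binds tighter than `or`, so `a or b and not c` is evaluated as `a or (b and (not c))`. The sketch below compares that with an explicitly grouped form. The function names are hypothetical, and whether the grouped form is what the PR intends is an assumption; the thread only says the warning needs improvement.

```python
from typing import Optional


def warns_as_quoted(max_items: int, max_time: Optional[float], incremental: bool) -> bool:
    # Same shape as the quoted condition: `a or b and not c`
    return bool(max_items >= 0 or max_time and not incremental)


def warns_grouped(max_items: int, max_time: Optional[float], incremental: bool) -> bool:
    # Explicit grouping: warn only when a limit is set AND the resource
    # is not incremental (assumed intent, not confirmed by the thread)
    return bool((max_items >= 0 or max_time) and not incremental)


# With an item limit on an incremental resource the two forms disagree
print(warns_as_quoted(10, None, True), warns_grouped(10, None, True))
```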

also: generally speaking it would be cool to have some wrapper around log messages to be able to test them, maybe mocking would be enough, not sure
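
For the mocking option, the standard library's `unittest.mock` is enough to assert that a warning was emitted. This is a minimal sketch using a stand-in logger and a stand-in function; `logger`, `add_limit_demo`, and `captured_warnings` are hypothetical names and do not reflect how `dlt.common.logger` is structured.

```python
import logging
from typing import List
from unittest import mock

# Stand-in logger for the demo; not dlt's logger
logger = logging.getLogger("limits_demo")


def add_limit_demo(max_items: int, incremental: bool) -> None:
    """Stand-in for the code under test: warn on non-incremental resources."""
    if max_items >= 0 and not incremental:
        logger.warning("Limit set on a non-incremental resource")


def captured_warnings(max_items: int, incremental: bool) -> List[str]:
    """Run the code under test and return the warning messages it logged."""
    with mock.patch.object(logger, "warning") as warning:
        add_limit_demo(max_items, incremental)
    # call.args requires Python 3.8 or newer
    return [call.args[0] for call in warning.call_args_list]


print(captured_warnings(5, incremental=False))
```

In a pytest suite, the built-in `caplog` fixture is an alternative when the logger is a standard `logging` logger.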

sh-rp · 2024-06-18T10:04:50Z

dlt/extract/resource.py

+                        while (last_iteration + min_wait) - time.time() > 0:
+                            # we give control back to the pipe iterator
+                            yield None
+                            time.sleep(0.1)


we could make this configurable, I am not sure whether it is needed though.
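
As a sketch of what "configurable" could mean for the hard-coded `time.sleep(0.1)` in the quoted loop, the wrapper below takes the polling interval as a parameter. `throttle` and `poll_interval` are hypothetical names, and the surrounding structure is a guess based only on the four quoted lines, not the PR's actual code.

```python
import time
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttle(
    items: Iterable[T], min_wait: float, poll_interval: float = 0.1
) -> Iterator[Optional[T]]:
    """Yield items at most once per `min_wait` seconds.

    While waiting, yields None so the consumer (the pipe iterator in the
    PR) can service other generators. Note that in this sketch the
    upstream item is fetched before the wait, not after it.
    """
    last_iteration: Optional[float] = None
    for item in items:
        while (
            last_iteration is not None
            and (last_iteration + min_wait) - time.monotonic() > 0
        ):
            # Give control back to the consumer, then sleep briefly
            yield None
            time.sleep(poll_interval)
        last_iteration = time.monotonic()
        yield item


out = list(throttle([1, 2, 3], min_wait=0.05, poll_interval=0.01))
print([x for x in out if x is not None])
```

A smaller `poll_interval` reduces overshoot past `min_wait` at the cost of more None round trips through the consumer.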

sh-rp · 2024-06-18T10:05:19Z

tests/extract/test_sources.py

@@ -788,52 +788,6 @@ def test_add_transformer_right_pipe() -> None:
        iter([1, 2, 3]) | dlt.resource(lambda i: i * 3, name="lambda")


-def test_limit_infinite_counter() -> None:


these three tests were moved to the new location where a couple more tests will be added specifically for the limits

joscha · 2024-11-28T16:52:10Z

We might want to apply the rate limit wait also once before the original generator is used plus allow rate limiting on the transformers, otherwise global rate limiting for APIs will not work.

🎉

I like where this is going. What are your thoughts on how this could be used in conjunction with headers returned from API endpoints?
I have this API for example: https://developer.affinity.co/#section/Getting-Started/Rate-Limits

which returns headers in each response:

It basically allows you to adjust your requests based on a per-API-key/user limit and an org-wide limit. The current REST client handles 429 responses, but my plan is to preempt them as much as I can, so my goal would be to use these headers dynamically (e.g. update the throttling behavior after each response).
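
A rough sketch of the "update throttling after each response" idea: derive the next wait from the remaining quota and the time until the window resets. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset`, and the reading of the reset value as "seconds until reset", are assumptions for illustration; the real API's header names and semantics should be taken from its documentation.

```python
from typing import Mapping


def wait_from_headers(headers: Mapping[str, str], floor: float = 0.0) -> float:
    """Return the number of seconds to wait before the next request.

    Hypothetical helper. Header names and the meaning of the reset value
    are assumed, and a plain dict lookup is case-sensitive.
    """
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_in = float(headers["X-RateLimit-Reset"])  # assumed: seconds until reset
    except (KeyError, ValueError):
        # Missing or malformed headers: fall back to the static floor
        return floor
    if remaining <= 0:
        # Quota exhausted: wait for the window to reset
        return max(floor, reset_in)
    # Spread the remaining calls evenly over the rest of the window
    return max(floor, reset_in / remaining)


print(wait_from_headers({"X-RateLimit-Remaining": "10", "X-RateLimit-Reset": "30"}))
```

With both a per-user and an org-wide limit, one option would be to compute a wait for each pair of headers and take the larger of the two.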

sh-rp · 2024-12-15T19:15:57Z

Closing this PR in favor of:

@joscha the API rate limiting features you have suggested would be a layer above in the rest_api implementation. There might already be a ticket for this, or you could open a new one.

joscha · 2024-12-15T19:40:42Z

would be a layer above in the rest_api implementation

Okay. How so, if it relies on a 429 answer or an explicit rest API resource returning the limits? Would you assume each resource to have some sort of hook to report back any time limits?

sh-rp force-pushed the exp/more-limits branch from e345079 to 6126895 Compare June 20, 2024 08:37

sh-rp added the enhancement New feature or request label Jun 20, 2024

rudolfix force-pushed the devel branch 2 times, most recently from 2ee3eab to e48f641 Compare September 16, 2024 13:20

sh-rp force-pushed the devel branch from ec730e8 to fcc4c45 Compare September 17, 2024 10:04

first implementation of limits with some tests

5fc2324

sh-rp force-pushed the exp/more-limits branch from 6126895 to 5fc2324 Compare November 13, 2024 10:59

sh-rp self-assigned this Nov 19, 2024

rudolfix mentioned this pull request Dec 10, 2024

Filesystem Source incremental loading with S3 not working correctly #2124

Closed

sh-rp closed this Dec 15, 2024

joscha mentioned this pull request Dec 16, 2024

Simple source rate limiting #2149

Open

rudolfix deleted the exp/more-limits branch December 19, 2024 14:48
