Make batched loading more convenient #2136
I think this would be a great addition! I've experimented a bit with something like the script below, but I've found that the generator seems to be "restarted" (passed by value rather than by reference somewhere, maybe?), so it outputs 0-99 twice instead of 0-199. I don't know how easy that would be to fix, but maybe a pattern like that could be used?
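The script referenced above isn't included in this thread. As an illustration only, here is a minimal sketch of the kind of setup being described, assuming dlt's add_limit on a plain generator resource; the resource and pipeline names are made up:

```python
import dlt

@dlt.resource(name="numbers")  # hypothetical resource for illustration
def numbers():
    # a plain generator yielding 0, 1, 2, ...
    for i in range(10_000):
        yield {"n": i}

pipeline = dlt.pipeline(pipeline_name="batched_demo", destination="duckdb")

# first run: add_limit(100) loads items 0-99 and then closes the generator
pipeline.run(numbers().add_limit(100))

# second run: the resource re-creates the generator from scratch,
# so this loads 0-99 again instead of continuing with 100-199
pipeline.run(numbers().add_limit(100))
```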
@loveeklund-osttra on the next pipeline run your generator will be re-opened. For what you want to achieve, you'll have to use an incremental and start the generator at the right point on the next run.
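As an illustration of that pattern (not code from the thread), a minimal sketch using dlt.sources.incremental so each run restarts the generator at the right point; the cursor field, initial value, and batch size are assumptions:

```python
import dlt

@dlt.resource(name="numbers", primary_key="n")  # hypothetical resource
def numbers(cursor=dlt.sources.incremental("n", initial_value=-1)):
    # resume where the previous run left off instead of restarting at 0
    start = cursor.last_value + 1
    for i in range(start, 10_000):
        yield {"n": i}

pipeline = dlt.pipeline(pipeline_name="batched_demo", destination="duckdb")

# each run loads the next batch of 100 items: 0-99, then 100-199, ...
pipeline.run(numbers().add_limit(100))
```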
I have linked a PR here which improves add_limit and also adds a nice example for the sql_database source.
Yeah, that works for most use cases. I guess there are some edge cases where it won't work, like if you have more records with the same incremental key than you have memory in your runtime (could be a problem for small lambdas etc.). I don't think these are huge issues, and I appreciate your answers a lot. I just wanted to make you aware of some issues I've thought about that might arise from this solution, maybe to be included in some documentation so people are aware.
A common use case seems to be people loading large sources (such as big sql_database instances) and wanting to split the load into multiple smaller runs. We should probably make this more convenient by: