feat(bigquery): add streaming inserts support #1123
Conversation
@rudolfix, if you're wondering whether it's working: yes, it is. Here is a rough proof: I inserted the data and then read it back from the BigQuery backend. I don't yet see why it would be a bad solution, so I'm tidying it up 👌
good direction! but it can't be so hacky:
- do not import from another destination. Move the jobs you need into `job_impl.py`; we need a small refactor here. Also support parquet via the other job.
- the config option is cool, but we need a per-table definition. Look at `bigquery_adapter`: there you can add a table hint with the loader type and interpret it during loading. Should be very easy.

Tasks:
- add a streaming arg to `bigquery_adapter`
- support parquet format
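The per-table hint approach suggested above could look roughly like the sketch below. The hint key `x-insert-api`, the function names, and the loader names are illustrative assumptions, not the actual dlt implementation:

```python
# Hypothetical sketch of a per-table loader hint. The "x-insert-api" key,
# function names, and loader names are made up for illustration.

def apply_streaming_hint(table_schema: dict, insert_api: str = "streaming") -> dict:
    # the adapter attaches the hint to the table definition
    table_schema["x-insert-api"] = insert_api
    return table_schema

def choose_load_job(table_schema: dict) -> str:
    # the destination interprets the hint during loading: streaming inserts
    # go through the streaming API, everything else through a batch load job
    if table_schema.get("x-insert-api") == "streaming":
        return "streaming_insert"
    return "batch_load"
```

This keeps the choice of insert API out of global config and attached to the table itself, which is what the review asks for.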
OK, we are almost there. I have one comment regarding the hint. There's one more thing:

`google-cloud-bigquery = {version = ">=2.26.0", optional = true}`

was changed to

`google-cloud-bigquery = {version = ">=3.14", optional = true}`

which is good, because we should work with the newest version. But we have two problems:
1. I use some kind of hack to access the query job from the dbapi cursor to load a data frame: https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:921. Are you able to make it work for both versions? I have another PR where I do it without any hacks: https://github.com/dlt-hub/dlt/pull/998/files#diff-94cc93a6e16c7508322344071cf19788557e561db8b55b6755261155bce91cb8R48. Please take the code from there.
2. Something is going on here: https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:932
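A version-tolerant way to get the query job off the dbapi cursor might look like the sketch below. The attribute names `query_job` and `_query_job` are assumptions about the two library versions, so check them against the code referenced in PR #998 rather than taking them as given:

```python
def get_query_job(cursor):
    """Fetch the QueryJob backing a BigQuery dbapi cursor, tolerating both
    a public attribute and a private one. The exact attribute names here are
    assumptions, not verified against specific google-cloud-bigquery releases."""
    for attr in ("query_job", "_query_job"):
        job = getattr(cursor, attr, None)
        if job is not None:
            return job
    raise RuntimeError("cursor has no query job attached (run a query first)")
```

Falling back through both names avoids pinning the code to one library version, which is the cross-version concern raised above.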
OK, I've found a few inconsistencies.
@rudolfix, there is a failure; I suspect there is something in the CI test table. Is it possible to clear it?
- please read my comment on the MERGE problem
- see my comments on the failing tests

Test tables and datasets are dropped in the test fixture. What happens on CI is weird, because there's a record but one of the values is None... my take:
Code is good! Please update the BigQuery docs:
- the BigQuery adapter part
- rewrite the "Data Loading" chapter to add information on streaming writes; please mention that only append mode works and that there is a buffer, etc.
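The append-only restriction mentioned above could be enforced with a small guard like this sketch; the function name and error wording are made up for illustration. The underlying reason is that streamed rows first sit in BigQuery's streaming buffer, where they cannot be updated, merged, or replaced in place:

```python
def validate_streaming_disposition(write_disposition: str) -> None:
    # rows inserted via the streaming API land in a streaming buffer first,
    # where they cannot be modified, hence only "append" is supported
    if write_disposition != "append":
        raise ValueError(
            "BigQuery streaming inserts support only the 'append' write "
            f"disposition, got {write_disposition!r}"
        )
```

Failing fast like this at pipeline setup gives a clearer error than a late rejection from BigQuery itself.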
LGTM!
Towards #1037