
feat(bigquery): add streaming inserts support #1123

Merged (22 commits into devel, Apr 4, 2024)

Conversation

IlyaFaer (Contributor)
Towards #1037


netlify bot commented Mar 21, 2024

Deploy Preview for dlt-hub-docs canceled.

Latest commit: 35fc641
Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/660e8a9472662f0008bef8ad

IlyaFaer (Contributor, Author)

@rudolfix, in case you're wondering whether it works: yes, it does. Here is a rough proof:

(screenshot attached)

So, I inserted the data and then read it back from the BigQuery backend. I don't yet see why this would be a bad solution, so I'm tidying it up 👌

rudolfix (Collaborator) left a comment

Good direction! But it can't be this hacky:

  1. Do not import from another destination. Move the jobs you need to job_impl.py; we need a small refactor here.
  2. Also support Parquet via the other job.
  3. The config option is cool, but we need a per-table definition. Look at bigquery_adapter: there you can add a table hint with the loader type and interpret it during loading. Should be very easy.
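The per-table hint idea in point 3 can be sketched roughly as follows. The names here (`insert_api`, the `x-insert-api` hint key, the dict-based resource) are illustrative stand-ins, not dlt's actual adapter API:

```python
# Sketch: an adapter attaches a loader-type hint to a resource, and the load
# step later interprets that hint to pick the job type. All names are
# hypothetical placeholders for illustration.

def bigquery_adapter(resource: dict, insert_api: str = "default") -> dict:
    """Attach a per-table loader-type hint to a resource-like dict."""
    if insert_api not in ("default", "streaming"):
        raise ValueError(f"unknown insert_api: {insert_api}")
    resource.setdefault("hints", {})["x-insert-api"] = insert_api
    return resource

def pick_load_job(resource: dict) -> str:
    """During loading, read the hint to choose between job implementations."""
    hint = resource.get("hints", {}).get("x-insert-api", "default")
    return "streaming_insert_job" if hint == "streaming" else "load_job"

events = bigquery_adapter({"name": "events"}, insert_api="streaming")
print(pick_load_job(events))  # -> streaming_insert_job
```

The point of the hint approach over a global config option is that one pipeline can mix tables that stream with tables that use regular load jobs.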

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, dlt/destinations/impl/bigquery/configuration.py, dlt/common/schema/typing.py (all resolved)
IlyaFaer marked this pull request as ready for review on March 26, 2024, 07:39.
rudolfix (Collaborator) left a comment

OK, we are almost there. I have one comment regarding the hint.

There's one more thing, the dependency bump:

google-cloud-bigquery = {version = ">=2.26.0", optional = true}
google-cloud-bigquery = {version = ">=3.14", optional = true}

which is good, because we should work with the newest version. But we have two problems:

  1. I use a kind of hack to access the query job from the dbapi cursor in order to load a data frame:
    https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:921
    Are you able to make it work for both versions? I have another PR where I do it without any hacks:
    https://github.com/dlt-hub/dlt/pull/998/files#diff-94cc93a6e16c7508322344071cf19788557e561db8b55b6755261155bce91cb8R48
    Please take the code from there.

  2. Something is going on here:
    https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:932
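One way to tolerate both library versions in point 1 is a small accessor that tries the attribute names under which a dbapi cursor might expose its underlying query job. The attribute names below are assumptions for illustration, not the verified google-cloud-bigquery surface; the referenced PR #998 is the authoritative fix:

```python
# Sketch of a version-tolerant accessor: different client versions may expose
# the QueryJob under different attribute names on the dbapi cursor. The names
# tried here are assumptions, not a verified API.

def get_query_job(cursor):
    """Return the query job backing a dbapi cursor, trying known attribute names."""
    for attr in ("query_job", "_query_job"):
        job = getattr(cursor, attr, None)
        if job is not None:
            return job
    raise AttributeError("cursor does not expose its query job")

class FakeCursor:  # stand-in for a real dbapi cursor, for demonstration only
    _query_job = "job-handle"

job = get_query_job(FakeCursor())
```

This keeps the version difference in one place, so the rest of the loading code never touches private attributes directly.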

Review comment on: dlt/common/schema/typing.py (resolved)
rudolfix (Collaborator) left a comment

OK, I've found a few inconsistencies.

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, tests/load/bigquery/test_bigquery_streaming_insert.py (resolved)
IlyaFaer (Contributor, Author) commented Apr 1, 2024

@rudolfix, there is a failure in tests/load/bigquery/test_bigquery_streaming_insert.py::test_bigquery_streaming_nested_data. It works fine locally.

(screenshot attached)

I suspect something is left over in the CI test table. Is it possible to clear it?

rudolfix (Collaborator) left a comment

  1. Please read my comment on the MERGE problem.
  2. See my comments on the failing tests.

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, tests/load/bigquery/test_bigquery_streaming_insert.py (resolved)
rudolfix (Collaborator) commented Apr 1, 2024

> @rudolfix, there is a failure in tests/load/bigquery/test_bigquery_streaming_insert.py::test_bigquery_streaming_nested_data. It works fine locally.
> I suspect something is left over in the CI test table. Is it possible to clear it?

Test tables and datasets are dropped in a test fixture. What happens on CI is weird, because there's a record but one of its values is None... My take:
both of your tests write to the same dataset and table. Because a stream insert goes through a buffer, you can still get a row written to the table from the previous test, even though the table was dropped in the meantime and recreated with a different but compatible schema.
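A common way to rule out this kind of cross-test bleed is to give each test run a unique table name, so buffered rows from a previous run can never land in a recreated table. `make_test_table` below is a hypothetical helper, not part of dlt's test suite:

```python
# Sketch: derive a unique per-run table name so streaming-buffer rows from an
# earlier test cannot surface in a table recreated under the same name.
import uuid

def make_test_table(base: str) -> str:
    """Append a random 8-hex-char suffix to a base table name."""
    return f"{base}_{uuid.uuid4().hex[:8]}"

print(make_test_table("streaming_items"))  # e.g. streaming_items_3f9a1c2d
```

The trade-off is that the fixture must still clean these tables up afterwards, since drop-and-recreate no longer collides with them.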

rudolfix (Collaborator) left a comment

Code is good! Please update the BigQuery docs:

  1. the BigQuery adapter part
  2. rewrite the "Data Loading" chapter to add information on streaming writes. Please mention that only append mode works, that there is a buffer, etc.
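The buffer caveat can be illustrated with a toy model of why streaming inserts pair only with append: rows pass through a buffer and only become visible later, so a replace-style truncate (or a MERGE) can race with rows still in flight. This is purely illustrative and not BigQuery's actual implementation:

```python
# Toy model of a table fed by streaming inserts: rows sit in a buffer before
# becoming visible, and truncating the table does not clear that buffer.

class StreamingTable:
    def __init__(self):
        self.rows = []      # visible rows
        self.buffer = []    # accepted but not yet visible

    def stream_insert(self, row):
        self.buffer.append(row)

    def truncate(self):
        self.rows.clear()   # a replace-style load drops visible rows only

    def flush(self):
        self.rows.extend(self.buffer)   # buffered rows arrive later
        self.buffer.clear()

t = StreamingTable()
t.stream_insert({"id": 1})
t.truncate()   # e.g. a "replace" disposition empties the table...
t.flush()      # ...but the buffered row still lands afterwards
print(t.rows)  # -> [{'id': 1}]  stale row resurfaces: only append is safe
```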

rudolfix previously approved these changes on Apr 4, 2024.
rudolfix (Collaborator) left a comment

LGTM!

rudolfix merged commit 20664b2 into devel on Apr 4, 2024 (38 of 44 checks passed).
rudolfix deleted the bigquery_streaming branch on April 4, 2024, 11:14.
2 participants