
feat(bigquery): add streaming inserts support #1123

Merged (22 commits into devel, Apr 4, 2024)

Conversation

IlyaFaer (Contributor)
Towards #1037


netlify bot commented Mar 21, 2024

Deploy Preview for dlt-hub-docs canceled.

Latest commit: 35fc641
Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/660e8a9472662f0008bef8ad

IlyaFaer (Contributor, Author)

@rudolfix, in case you're wondering whether it works: yes, it does. Here is a rough proof:

(screenshot attached)

So, I inserted the data and then read it back from the BigQuery backend. I don't yet see why this would be a bad solution, so I'm tidying it up 👌

rudolfix (Collaborator) left a comment

Good direction! But it can't be this hacky:

  1. Do not import from another destination. Move the jobs you need to job_impl.py; we need a small refactor here.
  2. Also support Parquet via the other job.
  3. The config option is cool, but we need a per-table definition. Look at bigquery_adapter: there you can add a table hint with the loader type and interpret it during loading. Should be very easy.
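The per-table hint idea in point 3 can be sketched roughly as follows. The names here (`insert_api`, the `x-insert-api` hint key, the dict-based resource) are illustrative stand-ins, not dlt's actual adapter API:

```python
# Sketch: an adapter attaches a loader-type hint to a resource, and the load
# step later interprets that hint to pick the job type. All names are
# hypothetical placeholders for illustration.

def bigquery_adapter(resource: dict, insert_api: str = "default") -> dict:
    """Attach a per-table loader-type hint to a resource-like dict."""
    if insert_api not in ("default", "streaming"):
        raise ValueError(f"unknown insert_api: {insert_api}")
    resource.setdefault("hints", {})["x-insert-api"] = insert_api
    return resource

def pick_load_job(resource: dict) -> str:
    """During loading, read the hint to choose between job implementations."""
    hint = resource.get("hints", {}).get("x-insert-api", "default")
    return "streaming_insert_job" if hint == "streaming" else "load_job"

events = bigquery_adapter({"name": "events"}, insert_api="streaming")
print(pick_load_job(events))  # -> streaming_insert_job
```

The point of the hint approach over a global config option is that one pipeline can mix tables that stream with tables that use regular load jobs.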

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, dlt/destinations/impl/bigquery/configuration.py, dlt/common/schema/typing.py (all resolved)
IlyaFaer marked this pull request as ready for review on March 26, 2024, 07:39.
rudolfix (Collaborator) left a comment

OK, we are almost there. I have one comment regarding the hint.

There's one more thing, the dependency bump:

google-cloud-bigquery = {version = ">=2.26.0", optional = true}
google-cloud-bigquery = {version = ">=3.14", optional = true}

which is good, because we should work with the newest version. But we have two problems:

  1. I use a kind of hack to access the query job from the dbapi cursor in order to load a data frame:
    https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:921
    Are you able to make it work for both versions? I have another PR where I do it without any hacks:
    https://github.com/dlt-hub/dlt/pull/998/files#diff-94cc93a6e16c7508322344071cf19788557e561db8b55b6755261155bce91cb8R48
    Please take the code from there.

  2. Something is going on here:
    https://github.com/dlt-hub/dlt/actions/runs/8431727786/job/23089624139?pr=1123#step:8:932
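One way to tolerate both library versions in point 1 is a small accessor that tries the attribute names under which a dbapi cursor might expose its underlying query job. The attribute names below are assumptions for illustration, not the verified google-cloud-bigquery surface; the referenced PR #998 is the authoritative fix:

```python
# Sketch of a version-tolerant accessor: different client versions may expose
# the QueryJob under different attribute names on the dbapi cursor. The names
# tried here are assumptions, not a verified API.

def get_query_job(cursor):
    """Return the query job backing a dbapi cursor, trying known attribute names."""
    for attr in ("query_job", "_query_job"):
        job = getattr(cursor, attr, None)
        if job is not None:
            return job
    raise AttributeError("cursor does not expose its query job")

class FakeCursor:  # stand-in for a real dbapi cursor, for demonstration only
    _query_job = "job-handle"

job = get_query_job(FakeCursor())
```

This keeps the version difference in one place, so the rest of the loading code never touches private attributes directly.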

Review comment on: dlt/common/schema/typing.py (resolved)
rudolfix (Collaborator) left a comment

OK, I've found a few inconsistencies.

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, tests/load/bigquery/test_bigquery_streaming_insert.py (resolved)
IlyaFaer (Contributor, Author) commented Apr 1, 2024

@rudolfix, there is a failure in tests/load/bigquery/test_bigquery_streaming_insert.py::test_bigquery_streaming_nested_data. It works fine locally.

(screenshot attached)

I suspect something is left over in the CI test table. Is it possible to clear it?

rudolfix (Collaborator) left a comment

  1. Please read my comment on the MERGE problem.
  2. See my comments on the failing tests.

Review comments on: dlt/destinations/impl/bigquery/bigquery.py, tests/load/bigquery/test_bigquery_streaming_insert.py (resolved)
rudolfix (Collaborator) commented Apr 1, 2024

> @rudolfix, there is a failure in tests/load/bigquery/test_bigquery_streaming_insert.py::test_bigquery_streaming_nested_data. It works fine locally.
> I suspect something is left over in the CI test table. Is it possible to clear it?

Test tables and datasets are dropped in a test fixture. What happens on CI is weird, because there's a record but one of its values is None... My take:
both of your tests write to the same dataset and table. Because a stream insert goes through a buffer, you can still get a row written to the table from the previous test, even though the table was dropped in the meantime and recreated with a different but compatible schema.
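A common way to rule out this kind of cross-test bleed is to give each test run a unique table name, so buffered rows from a previous run can never land in a recreated table. `make_test_table` below is a hypothetical helper, not part of dlt's test suite:

```python
# Sketch: derive a unique per-run table name so streaming-buffer rows from an
# earlier test cannot surface in a table recreated under the same name.
import uuid

def make_test_table(base: str) -> str:
    """Append a random 8-hex-char suffix to a base table name."""
    return f"{base}_{uuid.uuid4().hex[:8]}"

print(make_test_table("streaming_items"))  # e.g. streaming_items_3f9a1c2d
```

The trade-off is that the fixture must still clean these tables up afterwards, since drop-and-recreate no longer collides with them.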

rudolfix (Collaborator) left a comment

Code is good! Please update the BigQuery docs:

  1. the BigQuery adapter part
  2. rewrite the "Data Loading" chapter to add information on streaming writes. Please mention that only append mode works, that there is a buffer, etc.
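The buffer caveat can be illustrated with a toy model of why streaming inserts pair only with append: rows pass through a buffer and only become visible later, so a replace-style truncate (or a MERGE) can race with rows still in flight. This is purely illustrative and not BigQuery's actual implementation:

```python
# Toy model of a table fed by streaming inserts: rows sit in a buffer before
# becoming visible, and truncating the table does not clear that buffer.

class StreamingTable:
    def __init__(self):
        self.rows = []      # visible rows
        self.buffer = []    # accepted but not yet visible

    def stream_insert(self, row):
        self.buffer.append(row)

    def truncate(self):
        self.rows.clear()   # a replace-style load drops visible rows only

    def flush(self):
        self.rows.extend(self.buffer)   # buffered rows arrive later
        self.buffer.clear()

t = StreamingTable()
t.stream_insert({"id": 1})
t.truncate()   # e.g. a "replace" disposition empties the table...
t.flush()      # ...but the buffered row still lands afterwards
print(t.rows)  # -> [{'id': 1}]  stale row resurfaces: only append is safe
```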

rudolfix previously approved these changes on Apr 4, 2024.
rudolfix (Collaborator) left a comment

LGTM!

rudolfix merged commit 20664b2 into devel on Apr 4, 2024 (38 of 44 checks passed).
rudolfix deleted the bigquery_streaming branch on April 4, 2024, 11:14.
2 participants