Issue with BigQuery counts in Spark after writing to table #1290
Comments
What happens if you omit the first count check? I'm trying to narrow down whether there is an issue with spark.read being called twice.
Interesting: it appears that the last count returned is still incorrect even if I omit the first count. However, after a couple of minutes it returned the proper count as a DataFrame.
Hi @rchynoweth,
@MichalBogoryja - do you have to set it in the Spark settings rather than in the DataFrame read options? It doesn't work for me in the DataFrame read.
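For later readers, here is a minimal sketch of the two places a connector option can be supplied. The suggested option itself was not captured in this thread, so cacheExpirationTimeInMinutes from the issue description is used as a stand-in; whether the global form takes effect for that particular option is part of what is being debugged here.

```python
# Sketch: two ways to pass a spark-bigquery-connector option.
# `spark` is the SparkSession provided by the Databricks runtime.

# 1. Globally, in the Spark session configuration:
spark.conf.set("cacheExpirationTimeInMinutes", "0")

# 2. Per read, as a DataFrame reader option:
df = (
    spark.read.format("bigquery")
    .option("cacheExpirationTimeInMinutes", "0")
    .load("mydataset.test_read_write_table")
)
```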
This has been fixed and will be available in the next release. In the meantime, you can test it using the nightly build. E.g.
Fixed in 0.41.1
Awesome. Thanks @vishalkarve15!
I am having an issue getting accurate counts when reading from and writing to BigQuery from Databricks after installing the connector.
Connector version: spark-3.5-bigquery-0.39.1.jar
Apache Spark 3.5.0
Scala 2.12
Databricks 14.3 LTS
Code to replicate:
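(The original snippet was not captured in this thread; the following is a reconstruction from the description, so the schema, row values, and write method are assumptions.)

```python
# Reconstruction of the reported repro.
# `spark` is the SparkSession provided by the Databricks runtime.
table = "mydataset.test_read_write_table"

# Write an initial batch of rows to the BigQuery table.
df1 = spark.createDataFrame([(i,) for i in range(10)], ["id"])
(df1.write.format("bigquery")
     .option("writeMethod", "direct")
     .mode("overwrite")
     .save(table))

# First count: read the table back and count its rows.
print(spark.read.format("bigquery").load(table).count())

# Append a second batch of rows.
df2 = spark.createDataFrame([(i,) for i in range(10, 20)], ["id"])
(df2.write.format("bigquery")
     .option("writeMethod", "direct")
     .mode("append")
     .save(table))

# Second count: the BigQuery UI shows the new rows, but this count can
# still return the pre-append value (the stale behavior described below).
print(spark.read.format("bigquery").load(table).count())
```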
When I check the BQ table, the rows are updated, but the change is not reflected in my DataFrame. If I use the query option instead of the table and run "select count(1) from mydataset.test_read_write_table", the counts are accurate. This looks like a caching problem, so I tried setting the cacheExpirationTimeInMinutes option to 0, but that does not seem to take effect. However, if I set it to a positive integer, the counts do become correct once that interval has elapsed.
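A sketch of the two behaviors described above. Query reads in the connector also require viewsEnabled and a materialization dataset; reusing "mydataset" for materialization here is an assumption for illustration.

```python
# Workaround 1: read through a SQL query instead of the table option.
accurate = (
    spark.read.format("bigquery")
    .option("viewsEnabled", "true")
    .option("materializationDataset", "mydataset")  # assumed dataset
    .option("query", "select count(1) from mydataset.test_read_write_table")
    .load()
)
accurate.show()

# Workaround 2: shorten the read-session cache lifetime. Per the report
# above, 0 did not take effect, but a positive value produced correct
# counts once that many minutes had elapsed.
df = (
    spark.read.format("bigquery")
    .option("cacheExpirationTimeInMinutes", "1")
    .load("mydataset.test_read_write_table")
)
print(df.count())
```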