DAY partitioned BQ table data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC #1325

Open · soumikdas-oa opened this issue Dec 13, 2024 · 4 comments
@soumikdas-oa

We have a BigQuery table partitioned by DAY on a date (YYYY-MM-DD) column. We want to overwrite the data of a specific partition using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the spark-bigquery-connector documentation, but it still deleted the other partitions' data, which should not happen.

To give more context:

  • The dataframe is filtered on a partition condition beforehand and then written to BigQuery. So the dataframe contains only the data for the partition that is supposed to be overwritten, but that partition is not the only one affected.
  • 'spark.sql.sources.partitionOverwriteMode' is set to 'DYNAMIC' in the dataframe writer options, which did not work (see the code below).
  • The same config was also set in the cluster's advanced Spark config, which did not work either.
  • Even if the 'partitionField' and 'partitionType' options are removed from the code below, the result is still not the expected one, i.e. the whole table's data is deleted instead of only the specific partition's data.

```python
df.write.format("bigquery") \
    .option("table", f"{bq_table}") \
    .option("dataset", f"{bq_dataset}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("partitionField", f"{partition_date_col}") \
    .option("partitionType", f"{bq_partition_type}") \
    .option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()
```

Databricks Runtime Version: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

@davidrabinowitz (Member)

Please verify which connector version this Databricks runtime uses.
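
For example, from a notebook you can list the BigQuery jars on the cluster's system classpath. A minimal sketch, assuming the standard /databricks/jars directory:

```python
import os

# List every BigQuery-related jar on the Databricks system classpath.
for jar in sorted(os.listdir("/databricks/jars")):
    if "bigquery" in jar.lower():
        print(jar)
```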

@soumikdas-oa (Author)

> Please verify which connector version this Databricks runtime uses.

Please refer to the entries below, taken from the Databricks cluster's System Classpath, where I found the spark-bigquery-connector jars:

```
/databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar | System Classpath
/databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-upgrade_scala-2.12--118181791--spark-bigquery-with-dependencies_2.12-0.41.0.jar | System Classpath
```

@davidrabinowitz (Member)

It is very strange - usually you can't have two connectors in the same Spark application. Also, version 0.22 is very old. Can you please replace those jars with our latest spark-3.5-bigquery-0.41.0.jar?
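
For reference, a minimal sketch of pinning the released connector through its Maven coordinate when you build the session yourself (on Databricks you would normally attach the jar as a cluster library instead; the coordinate below is based on the published 0.41.0 artifacts):

```python
from pyspark.sql import SparkSession

# Pull the released connector from Maven Central instead of relying on the
# runtime's bundled (and here, conflicting) jars.
spark = (
    SparkSession.builder
    .appName("bq-partition-overwrite")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-3.5-bigquery:0.41.0",
    )
    .getOrCreate()
)
```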

@isha97 (Member) commented Dec 13, 2024

Also, you don't need to use partitionField and partitionType while using dynamic partition overwrite mode.
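
A minimal sketch of the simplified write, reusing the placeholder variables from the snippet above and assuming the target table already exists with its DAY partitioning:

```python
# Enable dynamic partition overwrite at the session level, then write
# without partitionField/partitionType; only the partitions present in df
# should be replaced.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")

df.write.format("bigquery") \
    .option("table", f"{bq_table}") \
    .option("dataset", f"{bq_dataset}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()
```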

isha97 added the waiting for information label (Waiting for additional information from the issue opener) on Dec 18, 2024