I'm a user of the Spark BigQuery Connector, and I've been using it in two different scenarios:
Directly executing a SQL query using option("query", sql).
Loading a table using load("project.dataset.table"), creating a temporary view, and then querying the view using Spark SQL.
In the first scenario, I observed a new BigQuery job being created, and the bytes billed were visible in the BigQuery console. However, in the second scenario, where I loaded the table and queried a temporary view, I didn't see a dedicated BigQuery job. Despite this, I'm uncertain whether there are still billing implications for the operations performed.
Request for Clarification:
Does the loading of the entire table into Spark using load("project.dataset.table") incur any billing?
How does the Spark BigQuery Connector optimize operations, and are there scenarios where operations don't result in explicit BigQuery jobs but still incur billing?
Upon transitioning from the free tier to a billing account, what are the potential costs associated with creating temporary tables in BigQuery and querying from them using the Spark BigQuery Connector?
Example 1: Direct SQL Query
df_direct_query = (
    spark.read.format("bigquery")
    .option("credentialsFile", "creds.json")
    .option("parentProject", "your-project")
    .option("query", "SELECT * FROM project.dataset.table")
    .load()
)
df_direct_query.show()
Example 2: Loading Table and Querying Temporary View
df_load_table = (
    spark.read.format("bigquery")
    .option("credentialsFile", "creds.json")
    .option("parentProject", "your-project")
    .load("your-project.dataset.table")
)
df_load_table.createOrReplaceTempView("temp_view")
spark.sql("SELECT * FROM temp_view").show()
Documentation Enhancement:
For the benefit of new users, could the README.md file provide clearer information on the billing process, especially in scenarios where no explicit job is created but operations might still incur costs?
Additional Information:
Account Type: I'm using a free tier account, and I'm seeking clarification on the billing implications for different operations.
Observations: I noticed bytes billed for explicit BigQuery jobs but not for loading the entire table into Spark.
As a note to others who come across this before the docs are updated:
Example 1: as noted, this creates a BigQuery query job, so you are billed either by bytes scanned (if using on-demand pricing) or through query slot usage (if using reservations).
Example 2: if you read a table directly, with no intermediate materialization required (i.e. not using a query and not reading a view), the data is read using the Storage Read API. No query job appears, but the read is still billed. You can confirm that a read session was created by looking in Cloud Logging (protoPayload.methodName="google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession").
See the BQ pricing docs for more info on query and Read API pricing.
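For anyone who wants to check this repeatedly, here is a minimal sketch that builds the Cloud Logging filter mentioned above, optionally narrowed to entries after a given time. The helper name `read_session_log_filter` and the `since` parameter are hypothetical, made up for illustration; only the `protoPayload.methodName` value comes from this thread. The sketch only produces the filter string and does not call any Google Cloud API.

```python
from datetime import datetime, timezone
from typing import Optional

# Method name quoted in the comment above; a matching log entry indicates
# that a Storage Read API session was created.
CREATE_READ_SESSION = (
    "google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession"
)


def read_session_log_filter(since: Optional[datetime] = None) -> str:
    """Build a Cloud Logging filter for Storage Read API session creation.

    `since` is an optional lower bound on the entry timestamp. It should be
    timezone-aware; a naive datetime is interpreted as local time by
    `astimezone`.
    """
    clauses = [f'protoPayload.methodName="{CREATE_READ_SESSION}"']
    if since is not None:
        # Cloud Logging compares timestamps in RFC 3339 format.
        ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        clauses.append(f'timestamp>="{ts}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(read_session_log_filter(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

The resulting string can be pasted into the Logs Explorer query box or passed to `gcloud logging read`. If entries show up for the time window in which Example 2 ran, the table load went through the Storage Read API and is billed under Read API pricing rather than as a query job.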