Clarification on Billing and Improved README.md Explanation #1172

Open
LakshmanKishore opened this issue Jan 17, 2024 · 1 comment

@LakshmanKishore

Billing Concerns:

I'm a user of the Spark BigQuery Connector, and I've been using it in two different scenarios:

  1. Directly executing a SQL query using option("query", sql).
  2. Loading a table using load("project.dataset.table"), creating a temporary view, and then querying the view using Spark SQL.

In the first scenario, I observed a new BigQuery job being created, and the bytes billed were visible in the BigQuery console. In the second scenario, where I loaded the table and queried a temporary view, I didn't see a dedicated BigQuery job. Even so, I'm unsure whether those operations are still billed.

Request for Clarification:

  1. Does the loading of the entire table into Spark using load("project.dataset.table") incur any billing?
  2. How does the Spark BigQuery Connector optimize operations, and are there scenarios where operations don't result in explicit BigQuery jobs but still incur billing?
  3. Upon transitioning from the free tier to a billing account, what are the potential costs associated with creating temporary tables in BigQuery and querying from them using the Spark BigQuery Connector?

Example 1: Direct SQL Query

df_direct_query = spark.read.format("bigquery").option("credentialsFile", "creds.json").option("parentProject", "your-project").option("query", "SELECT * FROM project.dataset.table").load()
df_direct_query.show()

Example 2: Loading Table and Querying Temporary View

df_load_table = spark.read.format("bigquery").option("credentialsFile", "creds.json").option("parentProject", "your-project").load("your-project.dataset.table")
df_load_table.createOrReplaceTempView("temp_view")
spark.sql("SELECT * FROM temp_view").show()

Documentation Enhancement:

For the benefit of new users, could the README.md file provide clearer information on the billing process, especially in scenarios where no explicit job is created but operations might still incur costs?

Additional Information:

  • Account Type: I'm using a free tier account, and I'm seeking clarification on the billing implications for different operations.
  • Observations: I noticed bytes billed for explicit BigQuery jobs but not for loading the entire table into Spark.
@dmedora
Contributor

dmedora commented Mar 19, 2024

As a note to others who come across this before the docs are updated:

  • Example 1: as noted, this creates a BigQuery query job, so you are billed either by bytes scanned (if using on-demand pricing) or through the query slots it consumes (if using reservations).
  • Example 2: if you read a table directly and no intermediate materialization is required (i.e., you are not using a query and not reading a view), then the table is read using the Storage Read API. This is still billed. You can confirm that a read session was created by looking in Cloud Logging (protoPayload.methodName="google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession").
    See the BQ pricing docs for more info on query and Read API pricing.
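
For anyone who wants to check this in their own project, here is a minimal sketch that builds the Cloud Logging filter mentioned above. The method name is taken from this comment; the helper name `build_read_session_filter`, its `since_iso` parameter, and the optional timestamp clause are illustrative assumptions, not part of the connector or of any Google client library.

```python
# Sketch only: the method name comes from the comment above; the function
# name, its parameter, and the timestamp clause are illustrative assumptions.
READ_SESSION_METHOD = (
    "google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession"
)


def build_read_session_filter(since_iso=None):
    """Return a Cloud Logging filter matching Storage Read API session creation."""
    clauses = [f'protoPayload.methodName="{READ_SESSION_METHOD}"']
    if since_iso is not None:
        # Optional lower bound on the log entry time, in RFC 3339 format,
        # for example "2024-03-01T00:00:00Z".
        clauses.append(f'timestamp>="{since_iso}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Print a filter that can be pasted into the Logs Explorer query box.
    print(build_read_session_filter("2024-03-01T00:00:00Z"))
```

The printed string can be pasted into the Logs Explorer, or passed as the filter argument to `gcloud logging read`. If entries show up around the time of a `load("project.dataset.table")` call, that read went through the Storage Read API even though no query job appears in the BigQuery console.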
