feat!: migrating to TheLook Ecommerce dataset #257

Merged
Changes from 59 commits
70 commits
eb68dcb
Initial changes
shanecglass Aug 21, 2023
dd0e158
initial changes
shanecglass Aug 21, 2023
5831fa2
Merge pull request #1 from shanecglass/scg-dev
shanecglass Aug 21, 2023
22271ea
Add RANK() to sample query
shanecglass Aug 22, 2023
dd4ac1d
Add waiter after APIs and clarify dependencies
shanecglass Aug 24, 2023
d27e903
Convert tables to Biglake tables
shanecglass Aug 24, 2023
d8a45b4
Updating outputs
shanecglass Sep 7, 2023
9a559f8
Fixing bug
shanecglass Sep 7, 2023
cd6ad43
Debug
shanecglass Sep 7, 2023
4b452cd
Typo
shanecglass Sep 7, 2023
359148c
Adding max staleness refresh
shanecglass Sep 7, 2023
9f77d83
correcting errors
shanecglass Sep 7, 2023
1eb414f
Testing
shanecglass Sep 7, 2023
2277866
testing
shanecglass Sep 7, 2023
0fddd9d
testing bug fix
shanecglass Sep 7, 2023
2e94bb0
bug testing
shanecglass Sep 7, 2023
e0e76da
bug testing
shanecglass Sep 7, 2023
0fc4a22
testing max staleness
shanecglass Sep 7, 2023
7b63f3f
staleness testing
shanecglass Sep 7, 2023
d7bc72d
staleness testing
shanecglass Sep 7, 2023
ed0dd4a
staleness bug testing
shanecglass Sep 8, 2023
8ffee33
staleness bug testing
shanecglass Sep 8, 2023
559b426
updating terraform version required
shanecglass Sep 8, 2023
7353572
staleness bug testing
shanecglass Sep 8, 2023
afb5483
staleness bug testing
shanecglass Sep 8, 2023
7e553ad
Correcting caching interval error
shanecglass Sep 26, 2023
b3a8516
Cleanup sample queries
shanecglass Sep 26, 2023
bf63e23
Clean up sample queries
shanecglass Sep 26, 2023
db37112
Correcting typo
shanecglass Sep 26, 2023
051c1f2
Update Looker Studio template link
shanecglass Sep 26, 2023
e6fba8e
Correct lookup table provisioning
shanecglass Sep 26, 2023
98e55e4
Correct lookup tables query
shanecglass Sep 26, 2023
4150cd0
Correcting lookup table sql
shanecglass Sep 26, 2023
dbf6672
Update deletion_protection
shanecglass Sep 26, 2023
8bdd2ed
Update deletion_protection
shanecglass Sep 26, 2023
c55a63d
Fix typo in BigLake table configs
shanecglass Sep 26, 2023
bafe1db
Correct typos
shanecglass Sep 26, 2023
6e9a7ed
Remove negative processing_hours values
shanecglass Sep 26, 2023
3219110
Update SQL for view creation
shanecglass Sep 26, 2023
6aa8f8e
Update looker studio link
shanecglass Sep 26, 2023
562fd2e
Update looker studio link
shanecglass Sep 26, 2023
e95dc23
Staleness testing
shanecglass Sep 26, 2023
667eeb9
Staleness testing
shanecglass Sep 26, 2023
765fa96
Staleness testing
shanecglass Sep 26, 2023
91c8923
Staleness testing
shanecglass Sep 26, 2023
990d813
Update versions.tf
shanecglass Sep 27, 2023
ddcace7
Merge pull request #3 from shanecglass/scg-dev
shanecglass Sep 27, 2023
4e77194
Manually deconflicting merges
shanecglass Sep 27, 2023
727ddf5
Merge pull request #4 from shanecglass/scg-dev
shanecglass Sep 27, 2023
bbe9467
Deconflicting branches
shanecglass Sep 27, 2023
da0cf37
Deconflicting branches
shanecglass Sep 27, 2023
e14b5aa
Deconflicting branches
shanecglass Sep 27, 2023
292289f
Update readme and Looker Studio report ID
shanecglass Sep 27, 2023
592babc
Rollback version constraints to match Cloud Shell
shanecglass Sep 27, 2023
3abcf76
Merge pull request #5 from shanecglass/scg-dev
shanecglass Sep 27, 2023
83bd6f1
Terraform formatting
shanecglass Sep 27, 2023
f13318c
Merge pull request #7 from shanecglass/master
shanecglass Oct 2, 2023
3e585a2
Merge branch 'master' of https://github.com/terraform-google-modules/…
shanecglass Oct 2, 2023
87ec2ba
Merging changes from previous branch
shanecglass Oct 2, 2023
1f03801
Changes for review and lint
shanecglass Oct 5, 2023
7180b9b
Merging in changes for review and lint corrections
shanecglass Oct 5, 2023
747ffa9
Correcting count_of_orders calculation
shanecglass Oct 5, 2023
37aa014
Merge branch 'master' into terraform-google-modules-master
davenportjw Oct 9, 2023
3faa38f
Change BigQuery dataset name
shanecglass Oct 9, 2023
9b40a01
Update renovate.json
shanecglass Oct 9, 2023
080b796
Merge remote-tracking branch 'refs/remotes/origin/terraform-google-mo…
shanecglass Oct 9, 2023
ec2527d
Merge branch 'master' into terraform-google-modules-master
davenportjw Oct 9, 2023
fb52bc6
Standardize parameters for dataset & project id
shanecglass Oct 9, 2023
781c1e4
Correcting parameter error
shanecglass Oct 9, 2023
9e7aa8b
Parameterize dataset_id in workflow
shanecglass Oct 9, 2023
16 changes: 12 additions & 4 deletions .github/renovate.json
@@ -30,15 +30,23 @@
},
{"matchDepTypes": ["module"], "groupName": "TF modules"},
{
"matchDepTypes": ["require"],
"matchDepTypes": [
"require"
],
"groupName": "GO modules",
"postUpdateOptions": ["gomodTidy"]
"postUpdateOptions": [
"gomodTidy"
]
},
{
"matchDatasources": ["golang-version"],
"matchDatasources": [
"golang-version"
],
"rangeStrategy": "bump",
"allowedVersions": "<1.21.0",
"postUpdateOptions": ["gomodTidy"]
"postUpdateOptions": [
"gomodTidy"
]
},
{
"matchDepNames": ["google", "google-beta"],
2 changes: 1 addition & 1 deletion modules/data_warehouse/README.md
@@ -12,7 +12,7 @@ The resources/services/activations/deletions that this module will create/trigge
- Creates a BigQuery Dataset
- Creates a BigQuery Table
- Creates a Google Cloud Storage bucket
- Loads the Google Cloud Storage bucket with data from https://console.cloud.google.com/marketplace/product/city-of-new-york/nyc-tlc-trips
- Loads the Google Cloud Storage bucket with data from [TheLook eCommerce Public Dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce)
- Provides SQL examples
- Creates and inferences with a BigQuery ML model
- Creates a Looker Studio report
4 changes: 2 additions & 2 deletions modules/data_warehouse/assets/data-warehouse-architecture.svg
(File diff not displayed.)
183 changes: 155 additions & 28 deletions modules/data_warehouse/bigquery.tf
@@ -24,6 +24,8 @@ resource "google_bigquery_dataset" "ds_edw" {
location = var.region
labels = var.labels
delete_contents_on_destroy = var.force_destroy

depends_on = [time_sleep.wait_after_apis]
}
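
The `time_sleep.wait_after_apis` resource referenced in these hunks is defined outside the visible diff. A minimal sketch of what such a waiter could look like, assuming the `hashicorp/time` provider is configured; the 120-second duration is a placeholder and not a value taken from this PR:

```hcl
# Hypothetical sketch: pause after the project-services module enables APIs,
# so resources that depend on this waiter are created only after the delay.
resource "time_sleep" "wait_after_apis" {
  # Placeholder duration (assumption); tune to how long API enablement takes.
  create_duration = "120s"

  # Start the timer only once the module that enables the APIs has finished.
  depends_on = [module.project-services]
}
```

Resources that list `time_sleep.wait_after_apis` in `depends_on`, as the dataset above does, are then created only after the delay has elapsed.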

# # Create a BigQuery connection
@@ -33,6 +35,7 @@ resource "google_bigquery_connection" "ds_connection" {
location = var.region
friendly_name = "Storage Bucket Connection"
cloud_resource {}
depends_on = [time_sleep.wait_after_apis]
}

# # Grant IAM access to the BigQuery Connection account for Cloud Storage
@@ -48,22 +51,127 @@ resource "google_storage_bucket_iam_binding" "bq_connection_iam_object_viewer" {
]
}

# # Create a BigQuery external table
resource "google_bigquery_table" "tbl_edw_taxi" {
# # Create a Biglake table for events with metadata caching
resource "google_bigquery_table" "tbl_edw_events" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "events"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"
Collaborator

If we aren't planning to use these, I'd probably just remove them to keep the code simple.

Contributor Author

I kept them in here because I thought it would be good for folks to see the syntax, but commented them out so that the Looker Studio report works immediately.


schema = file("${path.module}/src/schema/events_schema.json")

external_data_configuration {
autodetect = true
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/events.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

labels = var.labels

depends_on = [
Collaborator

we can remove these since the solution calls out the resources specifically above (applies to other tables below).

Contributor Author

Cleaned up these dependencies, as well as those across all .tf files

google_bigquery_connection.ds_connection,
google_storage_bucket.raw_bucket,
]
}
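
For readers who want to see the syntax discussed in the review thread above, here is a sketch of the events table with metadata caching switched on. It simply uncomments the two lines from the diff; the interval string is copied as-is from the commented-out line, so the accepted `max_staleness` format should be checked against the provider documentation before use. Per the author's note, enabling this may mean the Looker Studio report does not work immediately after deployment.

```hcl
# Sketch only: the events table from this PR with metadata caching enabled.
# All referenced resources and variables are the ones defined in this module.
resource "google_bigquery_table" "tbl_edw_events" {
  dataset_id          = google_bigquery_dataset.ds_edw.dataset_id
  table_id            = "events"
  project             = module.project-services.project_id
  deletion_protection = var.deletion_protection

  # Uncommented from the diff (assumption: the provider accepts this interval string).
  max_staleness = "1:0:0"

  schema = file("${path.module}/src/schema/events_schema.json")

  external_data_configuration {
    autodetect    = true
    connection_id = google_bigquery_connection.ds_connection.name
    source_format = "PARQUET"
    source_uris   = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/events.parquet"]

    # Uncommented from the diff: let BigQuery refresh the metadata cache automatically.
    metadata_cache_mode = "AUTOMATIC"
  }

  labels = var.labels
}
```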

# # Create a Biglake table for inventory_items
resource "google_bigquery_table" "tbl_edw_inventory_items" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "taxi_trips"
table_id = "inventory_items"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"

schema = file("${path.module}/src/schema/inventory_items_schema.json")

external_data_configuration {
autodetect = true
connection_id = "${module.project-services.project_id}.${var.region}.ds_connection"
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/new-york-taxi-trips/tlc-yellow-trips-2022/taxi-*.Parquet"]
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/inventory_items.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

labels = var.labels

depends_on = [
google_bigquery_connection.ds_connection,
google_storage_bucket.raw_bucket,
]
}
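
As the review thread on the events table notes, the explicit `depends_on` entries repeat what Terraform already infers from attribute references. A sketch of this table without them, assuming nothing else in the module relies on the explicit ordering:

```hcl
# Sketch only: inventory_items with the redundant depends_on block removed.
resource "google_bigquery_table" "tbl_edw_inventory_items" {
  dataset_id          = google_bigquery_dataset.ds_edw.dataset_id
  table_id            = "inventory_items"
  project             = module.project-services.project_id
  deletion_protection = var.deletion_protection

  schema = file("${path.module}/src/schema/inventory_items_schema.json")

  external_data_configuration {
    autodetect = true
    # Referencing the connection's name orders this table after the connection.
    connection_id = google_bigquery_connection.ds_connection.name
    source_format = "PARQUET"
    # Referencing the bucket's name orders this table after the bucket.
    source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/inventory_items.parquet"]
  }

  labels = var.labels
  # No depends_on: both dependencies are implied by the references above.
}
```

An implicit dependency only orders creation of the referenced resources. It does not wait for objects to be copied into the bucket.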

# # Create a Biglake table with metadata caching for order_items
resource "google_bigquery_table" "tbl_edw_order_items" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "order_items"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"

schema = file("${path.module}/src/schema/order_items_schema.json")

external_data_configuration {
autodetect = true
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/order_items.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

labels = var.labels

depends_on = [
google_bigquery_connection.ds_connection,
google_storage_bucket.raw_bucket,
]
}

# # Create a Biglake table for orders
resource "google_bigquery_table" "tbl_edw_orders" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "orders"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"

schema = file("${path.module}/src/schema/orders_schema.json")

external_data_configuration {
autodetect = true
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/orders.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

schema = file("${path.module}/src/taxi_trips_schema.json")
labels = var.labels

depends_on = [
google_bigquery_connection.ds_connection,
google_storage_bucket.raw_bucket,
]
}

# # Create a Biglake table for products
resource "google_bigquery_table" "tbl_edw_products" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "products"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"

schema = file("${path.module}/src/schema/products_schema.json")

external_data_configuration {
autodetect = true
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/products.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

labels = var.labels

@@ -73,8 +181,33 @@ resource "google_bigquery_table" "tbl_edw_taxi" {
]
}

# # Create a Biglake table for users
resource "google_bigquery_table" "tbl_edw_users" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
table_id = "users"
project = module.project-services.project_id
deletion_protection = var.deletion_protection
# max_staleness = "1:0:0"

schema = file("${path.module}/src/schema/users_schema.json")

external_data_configuration {
autodetect = true
connection_id = google_bigquery_connection.ds_connection.name
source_format = "PARQUET"
source_uris = ["gs://${google_storage_bucket.raw_bucket.name}/thelook-ecommerce/users.parquet"]
# metadata_cache_mode = "AUTOMATIC"
}

labels = var.labels
depends_on = [
google_bigquery_connection.ds_connection,
google_storage_bucket.raw_bucket,
]
}

# Load Queries for Stored Procedure Execution
# # Load Lookup Data Tables
# # Load Distribution Center Lookup Data Tables
resource "google_bigquery_routine" "sp_provision_lookup_tables" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
@@ -88,9 +221,8 @@ resource "google_bigquery_routine" "sp_provision_lookup_tables" {
]
}


# # Add Looker Studio Data Report Procedure
resource "google_bigquery_routine" "sproc_sp_demo_datastudio_report" {
# Add Looker Studio Data Report Procedure
resource "google_bigquery_routine" "sproc_sp_demo_lookerstudio_report" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_lookerstudio_report"
@@ -99,7 +231,7 @@ resource "google_bigquery_routine" "sproc_sp_demo_datastudio_report" {
definition_body = templatefile("${path.module}/src/sql/sp_lookerstudio_report.sql", { project_id = module.project-services.project_id })

depends_on = [
google_bigquery_table.tbl_edw_taxi,
google_bigquery_table.tbl_edw_inventory_items,
davenportjw marked this conversation as resolved.
]
}

@@ -113,11 +245,12 @@ resource "google_bigquery_routine" "sp_sample_queries" {
definition_body = templatefile("${path.module}/src/sql/sp_sample_queries.sql", { project_id = module.project-services.project_id })

depends_on = [
google_bigquery_table.tbl_edw_taxi,
google_bigquery_table.tbl_edw_inventory_items,
]
}

# # Add Bigquery ML Model

# Add Bigquery ML Model
resource "google_bigquery_routine" "sp_bigqueryml_model" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
@@ -127,21 +260,7 @@ resource "google_bigquery_routine" "sp_bigqueryml_model" {
definition_body = templatefile("${path.module}/src/sql/sp_bigqueryml_model.sql", { project_id = module.project-services.project_id })

depends_on = [
google_bigquery_table.tbl_edw_taxi,
]
}

# # Add Translation Scripts
resource "google_bigquery_routine" "sp_sample_translation_queries" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_sample_translation_queries"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_sample_translation_queries.sql", { project_id = module.project-services.project_id })

depends_on = [
google_bigquery_table.tbl_edw_taxi,
google_bigquery_table.tbl_edw_inventory_items,
]
}

@@ -151,6 +270,8 @@ resource "google_project_service_identity" "bigquery_data_transfer_sa" {
provider = google-beta
project = module.project-services.project_id
service = "bigquerydatatransfer.googleapis.com"

depends_on = [time_sleep.wait_after_apis]
}

# # Grant the DTS service account access
@@ -162,6 +283,8 @@ resource "google_project_iam_member" "dts_service_account_roles" {
project = module.project-services.project_id
role = each.key
member = "serviceAccount:${google_project_service_identity.bigquery_data_transfer_sa.email}"

depends_on = [time_sleep.wait_after_apis]
}

# Create specific service account for DTS Run
@@ -182,6 +305,8 @@ resource "google_project_iam_member" "dts_roles" {
project = module.project-services.project_id
role = each.key
member = "serviceAccount:${google_service_account.dts.email}"

depends_on = [google_service_account.dts]
Collaborator

This one isn't needed, since you already reference an output attribute of the DTS resource on line 307.

}
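
The collaborator's point can be sketched as follows: interpolating `google_service_account.dts.email` already orders this resource after the service account, so the explicit `depends_on` can go. The `for_each` role list below is a hypothetical stand-in, because that part of the resource sits outside the visible hunk.

```hcl
# Sketch only: dts_roles without the explicit depends_on.
resource "google_project_iam_member" "dts_roles" {
  # Hypothetical role set (assumption); the real list is not shown in this diff.
  for_each = toset([
    "roles/bigquery.user",
  ])

  project = module.project-services.project_id
  role    = each.key

  # This reference creates the implicit dependency on google_service_account.dts.
  member = "serviceAccount:${google_service_account.dts.email}"
}
```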

# # Grant the DTS specific service account Token Creator to the DTS Service Identity
@@ -194,6 +319,8 @@ resource "google_service_account_iam_binding" "dts_token_creator" {

depends_on = [
google_project_iam_member.dts_service_account_roles,
google_service_account.dts,
google_project_service_identity.bigquery_data_transfer_sa
]
}
