Development Notes
Sean Lopp edited this page Apr 12, 2023 · 4 revisions
While this example runs locally with no dependencies, the branch and production deployments require Snowflake and S3.
Snowflake:
- All branch deployments hit the Snowflake Database: DEMO_DB2_BRANCH
- The code executes as the DEMO user, with credentials stored as secrets in the hooli-user Kubernetes cluster
- The code will create the necessary tables, but the schemas must exist: analytics, raw_data, forecasting
- Prod hits the Snowflake Database: DEMO_DB2
S3:
- Prod hits the bucket hooli-demo which lives inside the user-cloud sub-account
- Branch deployments hit the bucket hooli-demo-branch which lives inside the user-cloud sub-account
- S3 authentication happens in the Kubernetes cluster using secrets. Currently, S3 requests execute as the hooli-demo user defined in the main elementl account; the appropriate cross-account grants have been applied to the S3 bucket and user permissions.
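The prod/branch bucket split above can be sketched as a small helper. This is a hypothetical illustration, not the actual resource wiring in the repo; it reuses the DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT variable described under "Branching Locally":

```python
import os

def get_s3_bucket() -> str:
    """Pick the S3 bucket based on deployment type.

    Branch deployments use hooli-demo-branch; prod uses hooli-demo.
    """
    if os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
        return "hooli-demo-branch"
    return "hooli-demo"
```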
Sensor:
- One of the example sensors, watch_s3, watches the hooli-demo S3 bucket for a file called customers.txt. To touch this file and force a run, execute `jobs/touch_s3_file.py`. You should have a user-cloud-admin S3 profile configured for this to work.
- To run the branch locally, you'll want to update the sensor resource to use this profile:
```python
# remove
# s3 = boto3.client('s3', region_name = self._region_name)
# replace with
dev = boto3.session.Session(profile_name='user-cloud-admin')
s3 = dev.client('s3', region_name=self._region_name)
```
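A minimal sketch of what `jobs/touch_s3_file.py` presumably does: upload (or overwrite) customers.txt so the sensor fires. The client is injected here so the logic is illustrated without AWS access; in practice it would be a boto3 client built from the user-cloud-admin profile, e.g. `boto3.session.Session(profile_name="user-cloud-admin").client("s3")`. The function name is hypothetical:

```python
def touch_sensor_file(s3_client, bucket: str = "hooli-demo", key: str = "customers.txt") -> None:
    """Upload an empty customers.txt, updating its last-modified time
    so the watch_s3 sensor detects a change and launches a run."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
```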
Branching Locally:
- To test the branch deployment locally, run `export DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT=1` prior to launching dagit.
SES Alerting:
- One of the sensors sends an alert email via SES if either asset with a freshness policy is delayed more than 2 minutes. Right now the SES setup is hard-coded to send an email to [email protected] from [email protected] using SES configured in the AWS user-cloud account in us-west-2. To run locally, you'll need to set the environment variables SMTP_USERNAME and SMTP_PASSWORD to the username and password of the SES credentials, which you can also pull from the k8s secret.
Backfills: If you make a code change, you may need to run a backfill. I recommend the following process:
- Temporarily disable the scheduled job and the freshness SLA sensor
- Launch a backfill run without flakiness. This can be done via the launchpad. Supply the run config:

```yaml
resources:
  data_api:
    config:
      flaky: False
```

Also narrow the asset selection by searching for "orders" and "users". Finally, supply the tag `dagster/asset_partition_range_start` equal to 2022-04-11-00:00 and the tag `dagster/asset_partition_range_end` equal to .
- Once orders and users are backfilled, you will need to launch a backfill for the dbt assets that uses multiple runs.
- After the backfill is complete, re-enable the sensors and schedules
Secrets:
- Secrets are made available as environment variables via a secret in the hooli-user-cluster. To get secrets:
- Be sure you are setup with the appropriate k8s contexts (documented in the internal infrastructure guide)
- To read a secret's prod value, e.g. SNOWFLAKE_USER:

```
kubectl --context hooli-user-cluster --namespace data-eng-prod get secret demo-secrets --template="{{ .data.SNOWFLAKE_USER }}" | base64 -d
```
- To set a new secret:
  - Base64 encode the value: `echo -n "my-secret-value" | base64`
  - Add the value to the k8s secret: `kubectl --context hooli-user-cluster --namespace data-eng-prod edit secret demo-secrets`
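The encode/decode steps above, mirrored in Python for reference: Kubernetes stores secret values base64-encoded, so writing a secret encodes it and reading it back decodes it (the helper names are just for illustration):

```python
import base64

def encode_secret(value: str) -> str:
    """Equivalent of: echo -n "my-secret-value" | base64"""
    return base64.b64encode(value.encode()).decode()

def decode_secret(encoded: str) -> str:
    """Equivalent of the `| base64 -d` step when reading a secret back."""
    return base64.b64decode(encoded).decode()
```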