
Development Notes


While this example runs locally with no dependencies, the branch and production deployments require Snowflake and S3.

Snowflake:

  • All branch deployments hit the Snowflake database DEMO_DB2_BRANCH
  • The code executes as the DEMO user, with credentials stored as secrets in the hooli-user Kubernetes cluster
  • The code will create the necessary tables, but the schemas must already exist: analytics, raw_data, forecasting
  • Prod hits the Snowflake database DEMO_DB2
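If the schemas are missing, something like the following can create them. This is a sketch using snowflake-connector-python; SNOWFLAKE_USER matches the key in the demo-secrets k8s secret, while SNOWFLAKE_PASSWORD and SNOWFLAKE_ACCOUNT are assumed variable names:

```python
import os
import snowflake.connector

# Credentials come from the same environment variables the deployment uses.
# SNOWFLAKE_PASSWORD and SNOWFLAKE_ACCOUNT are assumed names; SNOWFLAKE_USER
# is confirmed in the demo-secrets k8s secret.
conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    database="DEMO_DB2_BRANCH",  # use DEMO_DB2 for prod
)
with conn.cursor() as cur:
    for schema in ("analytics", "raw_data", "forecasting"):
        cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema}")
conn.close()
```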

S3:

  • Prod hits the bucket hooli-demo which lives inside the user-cloud sub-account
  • Branch deployments hit the bucket hooli-demo-branch which lives inside the user-cloud sub-account
  • S3 authentication happens in the Kubernetes cluster using secrets. Currently, S3 requests execute as the hooli-demo user defined in the main elementl account, and the appropriate cross-account grants have been added to the bucket policy and the user's permissions.
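To verify which principal your requests actually execute as, a quick sanity check with boto3's STS client:

```python
import boto3

# Should print the ARN of the hooli-demo user from the main elementl account
# when run with the deployment's credentials.
print(boto3.client("sts").get_caller_identity()["Arn"])
```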

Sensor:

  • One of the example sensors, watch_s3, watches the hooli-demo S3 bucket for a file called customers.txt. To touch this file and force a run, execute jobs/touch_s3_file.py (a sketch of that script follows this list). You should have a user-cloud-admin S3 profile configured for this to work.
  • To run the branch locally, you'll want to update the sensor resource to use this profile:

```python
# remove
# s3 = boto3.client("s3", region_name=self._region_name)

# replace with
dev = boto3.session.Session(profile_name="user-cloud-admin")
s3 = dev.client("s3", region_name=self._region_name)
```
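For reference, a minimal sketch of what jobs/touch_s3_file.py likely does; the bucket, key, and profile come from the bullets above, while the region is an assumption:

```python
import boto3

# Use the user-cloud-admin profile described above.
session = boto3.session.Session(profile_name="user-cloud-admin")
s3 = session.client("s3", region_name="us-west-2")  # region is an assumption

# "Touch" customers.txt by rewriting it, which updates its last-modified
# time and causes the watch_s3 sensor to request a run.
s3.put_object(Bucket="hooli-demo", Key="customers.txt", Body=b"")
```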

Branching Locally:

  • To test the branch deployment locally, run `export DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT=1` prior to launching dagit.
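In code, deployment-aware values can key off this variable. A minimal sketch; the actual resource wiring in the project may differ:

```python
import os

# Branch deployments (and local testing with the export above) set this flag.
IS_BRANCH = os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"

# Mirrors the branch vs. prod targets described in the sections above.
SNOWFLAKE_DATABASE = "DEMO_DB2_BRANCH" if IS_BRANCH else "DEMO_DB2"
S3_BUCKET = "hooli-demo-branch" if IS_BRANCH else "hooli-demo"
```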

SES Alerting:

  • One of the sensors sends an alert email via SES if either asset with a freshness policy is delayed more than 2 minutes. Right now the SES setup is hard-coded to send an email to [email protected] from [email protected] using SES configured in the AWS user-cloud account in us-west-2. To run locally, you'll need to set the environment variables SMTP_USERNAME and SMTP_PASSWORD to the username and password of the SES credentials, which you can also pull from the k8s secret.
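For local testing, a minimal sketch of sending that alert through the SES SMTP interface with the standard library's smtplib; the endpoint follows the usual SES naming for us-west-2, and the subject and body text are assumptions:

```python
import os
import smtplib
from email.message import EmailMessage

# Standard SES SMTP endpoint naming for us-west-2.
SMTP_HOST = "email-smtp.us-west-2.amazonaws.com"
SMTP_PORT = 587

msg = EmailMessage()
msg["Subject"] = "Freshness policy violated"  # subject text is an assumption
msg["From"] = "[email protected]"
msg["To"] = "[email protected]"
msg.set_content("An asset with a freshness policy is more than 2 minutes late.")

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.starttls()
    server.login(os.environ["SMTP_USERNAME"], os.environ["SMTP_PASSWORD"])
    server.send_message(msg)
```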

Backfills: If you make a code change, you may need to run a backfill. I recommend the following process:

  • Temporarily disable the scheduled job and the freshness SLA sensor
  • Launch a backfill run without flakiness. This can be done via the launchpad; supply:

```yaml
resources:
  data_api:
    config:
      flaky: false
```

Also down-select by searching for "orders" and "users".

Finally, supply the tags dagster/asset_partition_range_start equal to 2022-04-11-00:00 and dagster/asset_partition_range_end equal to the end of the desired range.

  • Once orders and users are backfilled, you will need to launch a backfill for the dbt assets that uses multiple runs.
  • After the backfill is complete, re-enable the sensors and schedules.
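If you'd rather script the run than use the launchpad, the same run config and tags can be submitted with the dagster-graphql client. A sketch against a local dagit, assuming a recent dagster-graphql; the job name is hypothetical and the end partition is left as a placeholder since its value isn't recorded above:

```python
from dagster_graphql import DagsterGraphQLClient

client = DagsterGraphQLClient("localhost", port_number=3000)

# "analytics_job" is a hypothetical job name; substitute the real one.
run_id = client.submit_job_execution(
    "analytics_job",
    run_config={"resources": {"data_api": {"config": {"flaky": False}}}},
    tags={
        "dagster/asset_partition_range_start": "2022-04-11-00:00",
        # End partition intentionally left as a placeholder.
        "dagster/asset_partition_range_end": "<end-partition>",
    },
)
print(run_id)
```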

Secrets:

  • Secrets are made available as environment variables via a secret in the hooli-user-cluster. To get secrets:
  1. Be sure you are set up with the appropriate k8s contexts (documented in the internal infrastructure guide)
  2. `kubectl --context hooli-user-cluster --namespace data-eng-prod get secret demo-secrets --template="{{ .data.SNOWFLAKE_USER }}" | base64 -d`
  • To set a new secret:
  1. Base64 encode the value: `echo -n "my-secret-value" | base64`
  2. Add the value to the k8s secret: `kubectl --context hooli-user-cluster --namespace data-eng-prod edit secret demo-secrets`
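For reference, the same two operations from Python: reading a secret the way the deployed code does (as an ordinary environment variable), and preparing a base64 value for the manifest:

```python
import base64
import os

# Deployed code sees the k8s secret as ordinary environment variables.
snowflake_user = os.environ["SNOWFLAKE_USER"]

# Equivalent of `echo -n "my-secret-value" | base64` for a new secret value.
print(base64.b64encode(b"my-secret-value").decode())
```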