
Development Notes


While this example runs locally with no dependencies, the branch and production deployments require Snowflake and S3.

Snowflake:

  • All branch deployments hit the Snowflake database DEMO_DB2_BRANCH
  • The code executes as the DEMO user, with credentials stored as secrets in the hooli-user Kubernetes cluster
  • The code will create the necessary tables, but the schemas must already exist: analytics, raw_data, forecasting
  • Prod hits the Snowflake database DEMO_DB2
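If the schemas are missing, something like the following can create them. This is a sketch using snowflake-connector-python; SNOWFLAKE_USER matches the key in the demo-secrets k8s secret, while SNOWFLAKE_PASSWORD and SNOWFLAKE_ACCOUNT are assumed variable names:

```python
import os
import snowflake.connector

# Credentials come from the same environment variables the deployment uses.
# SNOWFLAKE_PASSWORD and SNOWFLAKE_ACCOUNT are assumed names; SNOWFLAKE_USER
# is confirmed in the demo-secrets k8s secret.
conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    database="DEMO_DB2_BRANCH",  # use DEMO_DB2 for prod
)
with conn.cursor() as cur:
    for schema in ("analytics", "raw_data", "forecasting"):
        cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema}")
conn.close()
```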

S3:

  • Prod hits the bucket hooli-demo which lives inside the user-cloud sub-account
  • Branch deployments hit the bucket hooli-demo-branch which lives inside the user-cloud sub-account
  • S3 authentication happens in the Kubernetes cluster using secrets. Currently, S3 requests execute as the hooli-demo user defined in the main elementl account, and the appropriate cross-account grants have been added to the bucket policy and the user's permissions.
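To verify which principal your requests actually execute as, a quick sanity check with boto3's STS client:

```python
import boto3

# Should print the ARN of the hooli-demo user from the main elementl account
# when run with the deployment's credentials.
print(boto3.client("sts").get_caller_identity()["Arn"])
```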

Sensor:

  • One of the example sensors, watch_s3, watches the hooli-demo S3 bucket for a file called customers.txt. To touch this file and force a run, execute jobs/touch_s3_file.py (a sketch of that script follows this list). You should have a user-cloud-admin S3 profile configured for this to work.
  • To run the branch locally, you'll want to update the sensor resource to use this profile:

```python
# remove
# s3 = boto3.client("s3", region_name=self._region_name)

# replace with
dev = boto3.session.Session(profile_name="user-cloud-admin")
s3 = dev.client("s3", region_name=self._region_name)
```
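For reference, a minimal sketch of what jobs/touch_s3_file.py likely does; the bucket, key, and profile come from the bullets above, while the region is an assumption:

```python
import boto3

# Use the user-cloud-admin profile described above.
session = boto3.session.Session(profile_name="user-cloud-admin")
s3 = session.client("s3", region_name="us-west-2")  # region is an assumption

# "Touch" customers.txt by rewriting it, which updates its last-modified
# time and causes the watch_s3 sensor to request a run.
s3.put_object(Bucket="hooli-demo", Key="customers.txt", Body=b"")
```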

Branching Locally:

  • To test the branch deployment locally, run `export DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT=1` prior to launching dagit.
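In code, deployment-aware values can key off this variable. A minimal sketch; the actual resource wiring in the project may differ:

```python
import os

# Branch deployments (and local testing with the export above) set this flag.
IS_BRANCH = os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"

# Mirrors the branch vs. prod targets described in the sections above.
SNOWFLAKE_DATABASE = "DEMO_DB2_BRANCH" if IS_BRANCH else "DEMO_DB2"
S3_BUCKET = "hooli-demo-branch" if IS_BRANCH else "hooli-demo"
```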

SES Alerting:

  • One of the sensors sends an alert email via SES if either asset with a freshness policy is delayed more than 2 minutes. Right now the SES setup is hard-coded to send an email to [email protected] from [email protected] using SES configured in the AWS user-cloud account in us-west-2. To run locally, you'll need to set the environment variables SMTP_USERNAME and SMTP_PASSWORD to the username and password of the SES credentials, which you can also pull from the k8s secret.
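For local testing, a minimal sketch of sending that alert through the SES SMTP interface with the standard library's smtplib; the endpoint follows the usual SES naming for us-west-2, and the subject and body text are assumptions:

```python
import os
import smtplib
from email.message import EmailMessage

# Standard SES SMTP endpoint naming for us-west-2.
SMTP_HOST = "email-smtp.us-west-2.amazonaws.com"
SMTP_PORT = 587

msg = EmailMessage()
msg["Subject"] = "Freshness policy violated"  # subject text is an assumption
msg["From"] = "[email protected]"
msg["To"] = "[email protected]"
msg.set_content("An asset with a freshness policy is more than 2 minutes late.")

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.starttls()
    server.login(os.environ["SMTP_USERNAME"], os.environ["SMTP_PASSWORD"])
    server.send_message(msg)
```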

Backfills: If you make a code change, you may need to run a backfill. I recommend the following process:

  • Temporarily disable the scheduled job and the freshness SLA sensor
  • Launch a backfill run without flakiness. This can be done via the launchpad; supply:

```yaml
resources:
  data_api:
    config:
      flaky: false
```

Also down-select by searching for "orders" and "users".

Finally, supply the tags dagster/asset_partition_range_start equal to 2022-04-11-00:00 and dagster/asset_partition_range_end equal to the end of the desired range.

  • Once orders and users are backfilled, you will need to launch a backfill for the dbt assets that uses multiple runs.
  • After the backfill is complete, re-enable the sensors and schedules.
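If you'd rather script the run than use the launchpad, the same run config and tags can be submitted with the dagster-graphql client. A sketch against a local dagit, assuming a recent dagster-graphql; the job name is hypothetical and the end partition is left as a placeholder since its value isn't recorded above:

```python
from dagster_graphql import DagsterGraphQLClient

client = DagsterGraphQLClient("localhost", port_number=3000)

# "analytics_job" is a hypothetical job name; substitute the real one.
run_id = client.submit_job_execution(
    "analytics_job",
    run_config={"resources": {"data_api": {"config": {"flaky": False}}}},
    tags={
        "dagster/asset_partition_range_start": "2022-04-11-00:00",
        # End partition intentionally left as a placeholder.
        "dagster/asset_partition_range_end": "<end-partition>",
    },
)
print(run_id)
```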

Secrets:

  • Secrets are made available as environment variables via a secret in the hooli-user-cluster. To get secrets:
  1. Be sure you are set up with the appropriate k8s contexts (documented in the internal infrastructure guide)
  2. `kubectl --context hooli-user-cluster --namespace data-eng-prod get secret demo-secrets --template="{{ .data.SNOWFLAKE_USER }}" | base64 -d`
  • To set a new secret:
  1. Base64 encode the value: `echo -n "my-secret-value" | base64`
  2. Add the value to the k8s secret: `kubectl --context hooli-user-cluster --namespace data-eng-prod edit secret demo-secrets`
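For reference, the same two operations from Python: reading a secret the way the deployed code does (as an ordinary environment variable), and preparing a base64 value for the manifest:

```python
import base64
import os

# Deployed code sees the k8s secret as ordinary environment variables.
snowflake_user = os.environ["SNOWFLAKE_USER"]

# Equivalent of `echo -n "my-secret-value" | base64` for a new secret value.
print(base64.b64encode(b"my-secret-value").decode())
```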