diff --git a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/change-tracking.md b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/change-tracking.md
index f798ce75e47bd..03a4feae766e8 100644
--- a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/change-tracking.md
+++ b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/change-tracking.md
@@ -4,4 +4,210 @@ sidebar_position: 200
 unlisted: true
 ---
 
-{/* TODO copy from https://docs.dagster.io/dagster-plus/managing-deployments/branch-deployments/change-tracking */}
+:::note
+This guide is applicable to Dagster+.
+:::
+
+Branch Deployments Change Tracking makes it easier for you and your team to identify how changes in a pull request will impact data assets. By the end of this guide, you'll understand how Change Tracking works and what types of asset changes can be detected.
+
+## How it works
+
+Branch Deployments compare asset definitions in the branch deployment against the asset definitions in the main deployment. The UI will then mark changed assets accordingly. For example, if the pull request associated with the branch deployment adds a new asset, the UI will display a label indicating the addition.
+
+You can also apply filters to show only new and changed assets in the UI. This makes it easy to understand which assets will be impacted by the changes in the pull request associated with the branch deployment.
+
+{/* **Note:** The default main deployment is `prod`. To configure a different deployment as the main deployment, [create a branch deployment using the dagster-cloud CLI](/dagster-plus/managing-deployments/branch-deployments/using-branch-deployments) and specify it using the optional `--base-deployment-name` parameter. */}
+**Note:** The default main deployment is `prod`. To configure a different deployment as the main deployment, [create a branch deployment using the dagster-cloud CLI](/todo) and specify it using the optional `--base-deployment-name` parameter.
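+
+Conceptually, you can think of the comparison as snapshotting a handful of definition-level fields per asset in each deployment and diffing the two snapshots. The sketch below is purely illustrative and is not Dagster+'s actual implementation; the `AssetSnapshot` type and its fields are invented for the example:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class AssetSnapshot:
+    """Hypothetical snapshot of the definition-level fields Change Tracking inspects."""
+
+    code_version: str = ""
+    deps: frozenset = frozenset()
+    tags: tuple = ()
+    metadata: tuple = ()
+
+
+def diff_deployments(main, branch):
+    """Label each asset in the branch deployment relative to the main deployment."""
+    labels = {}
+    for key, snapshot in branch.items():
+        if key not in main:
+            labels[key] = "New in branch"
+        elif snapshot != main[key]:
+            labels[key] = "Changed in branch"
+    return labels
+
+
+# Example: one changed asset and one brand-new asset in the branch
+main = {"customers": AssetSnapshot(code_version="v1")}
+branch = {"customers": AssetSnapshot(code_version="v2"), "orders": AssetSnapshot()}
+assert diff_deployments(main, branch) == {
+    "customers": "Changed in branch",
+    "orders": "New in branch",
+}
+```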
+
+## Supported change types
+
+Change Tracking can detect the following changes to assets:
+
+- [New assets](#new-assets)
+- [Code versions](#code-versions)
+- [Upstream dependencies](#upstream-dependencies)
+- [Partitions definitions](#partitions-definitions)
+- [Tags](#tags)
+- [Metadata](#metadata)
+
+### New assets
+
+If an asset is new in the branch deployment, the asset will have a **New in branch** label in the asset graph:
+
+![Change tracking new](/images/dagster-cloud/managing-deployments/change-tracking-new.png)
+
+### Code versions
+
+If using the `code_version` argument on the asset decorator, Change Tracking can detect when this value changes.
+
+{/* Some Dagster integrations, like `dagster-dbt`, automatically compute code versions for you. For more information on code versions, refer to the [Code versioning guide](/guides/dagster/asset-versioning-and-caching). */}
+Some Dagster integrations, like `dagster-dbt`, automatically compute code versions for you. For more information on code versions, refer to the [Code versioning guide](/todo).
+
+
+
+In this example, the `customers` asset has a **Changed in branch** label indicating its `code_version` has been changed.
+
+Click the **Asset definition** tab to see the code change that created this label.
+
+![Change tracking code version](/images/dagster-cloud/managing-deployments/change-tracking-code-version.png)
+
+
+
+**In the main branch**, we have a `customers` asset with a code version of `v1`:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_code_version.py startafter=start_main_deployment endbefore=end_main_deployment dedent=4
+@asset(code_version="v1")
+def customers(): ...
+```
+
+**In the pull request**, `customers` is modified to change the code version to `v2`:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_code_version.py startafter=start_branch_deployment endbefore=end_branch_deployment dedent=4
+@asset(code_version="v2")
+def customers(): ...
+```
+
+
+
+### Upstream dependencies
+
+Change Tracking can detect when an asset's upstream dependencies have changed, whether they've been added or removed.
+
+**Note**: If an asset is marked as having changed dependencies, it means that the code defining its upstream dependencies has changed. It doesn't mean that an upstream dependency has new data.
+
+
+
+In this example, the `returns` asset has a **Changed in branch** label indicating it has changed dependencies.
+
+Click the **Asset definition** tab to see the code change that created this label.
+
+![Change tracking dependencies](/images/dagster-cloud/managing-deployments/change-tracking-dependencies.png)
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_dependencies.py startafter=start_branch_deployment endbefore=end_branch_deployment dedent=4
+@asset(deps=[orders, customers])
+def returns(): ...
+```
+
+
+
+### Partitions definitions
+
+Change Tracking can detect if an asset's partitions definition has been changed, whether it's been added, removed, or updated.
+
+
+
+In this example, the `weekly_orders` asset has a **Changed in branch** label indicating a changed partitions definition.
+
+Click the **Asset definition** tab to see the code change that created this label.
+
+![Change tracking partitions](/images/dagster-cloud/managing-deployments/change-tracking-partitions.png)
+
+
+
+**In the main branch**, we have a `weekly_orders` asset:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_partitions_definition.py startafter=start_main_deployment endbefore=end_main_deployment dedent=4
+@asset(partitions_def=WeeklyPartitionsDefinition(start_date="2024-01-01"))
+def weekly_orders(): ...
+```
+
+**In the pull request**, we updated the partitions definition to start one year earlier:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_partitions_definition.py startafter=start_branch_deployment endbefore=end_branch_deployment dedent=4
+@asset(partitions_def=WeeklyPartitionsDefinition(start_date="2023-01-01"))
+def weekly_orders(): ...
+```
+
+
+
+### Tags
+
+{/* Change Tracking can detect when an [asset's tags](/concepts/metadata-tags/tags) have changed, whether they've been added, modified, or removed. */}
+Change Tracking can detect when an [asset's tags](/todo) have changed, whether they've been added, modified, or removed.
+
+
+
+In this example, the `fruits_in_stock` asset has a **Changed in branch** label indicating it has changed tags.
+
+Click the **Asset definition** tab to see the code change that created this label.
+
+![Change tracking tags](/images/dagster-cloud/managing-deployments/change-tracking-tags.png)
+
+
+
+**In the main branch**, we have a `fruits_in_stock` asset:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_tags.py startafter=start_main_deployment endbefore=end_main_deployment dedent=4
+@asset(tags={"section": "produce"})
+def fruits_in_stock(): ...
+```
+
+**In the pull request**, we added the `type: perishable` tag to `fruits_in_stock`:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_tags.py startafter=start_branch_deployment endbefore=end_branch_deployment dedent=4
+@asset(tags={"section": "produce", "type": "perishable"})
+def fruits_in_stock(): ...
+```
+
+
+
+### Metadata
+
+{/* Change Tracking can detect when an [asset's definition metadata](/concepts/metadata-tags/asset-metadata#attaching-definition-metadata) has changed, whether it's been added, modified, or removed. */}
+Change Tracking can detect when an [asset's definition metadata](/todo) has changed, whether it's been added, modified, or removed.
+
+
+
+In this example, the `products` asset has a **Changed in branch** label indicating it has changed metadata.
+
+Click the **Asset definition** tab to see the code change that created this label.
+
+![Change tracking metadata](/images/dagster-cloud/managing-deployments/change-tracking-metadata.png)
+
+
+
+**In the main branch**, we have a `products` asset:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_metadata.py startafter=start_main_deployment endbefore=end_main_deployment dedent=4
+@asset(metadata={"expected_columns": ["sku", "price", "supplier"]})
+def products(): ...
+```
+
+**In the pull request**, we update the value of the `expected_columns` metadata on `products`:
+
+```python file=/dagster_cloud/branch_deployments/change_tracking_metadata.py startafter=start_branch_deployment endbefore=end_branch_deployment dedent=4
+@asset(metadata={"expected_columns": ["sku", "price", "supplier", "backstock"]})
+def products(): ...
+```
+
+
+
+## Related
+
+{/*
+
+
+
+
+
+*/}
diff --git a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/dagster-cloud-cli.md b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/dagster-cloud-cli.md
index f5af56c5b9abe..855cabd09fb49 100644
--- a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/dagster-cloud-cli.md
+++ b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/dagster-cloud-cli.md
@@ -4,4 +4,156 @@ sidebar_position: 300
 unlisted: true
 ---
 
-{/* TODO copy from https://docs.dagster.io/dagster-plus/managing-deployments/dagster-plus-cli#using-the-dagster-cloud-cli */}
\ No newline at end of file
+:::note
+This guide is applicable to Dagster+.
+:::
+
+The `dagster-cloud` CLI is a command-line toolkit designed to work with Dagster+.
+
+In this guide, we'll cover how to install and configure the `dagster-cloud` CLI, get help, and use some helpful environment variables and CLI options.
+
+## Installing the CLI
+
+The Dagster+ Agent library is available on PyPI. To install, run:
+
+```shell
+pip install dagster-cloud
+```
+
+Refer to the [configuration section](#configuring-the-cli) for next steps.
+
+### Completions
+
+Optionally, you can install command-line completions to make using the `dagster-cloud` CLI easier.
+ +To have the CLI install these completions to your shell, run: + +```shell +dagster-cloud --install-completion +``` + +To print out the completion for copying or manual installation: + +```shell +dagster-cloud --show-completion +``` + +## Configuring the CLI + +The recommended way to set up your CLI's config for long-term use is through the configuration file, located by default at `~/.dagster_cloud_cli/config`. + +### Setting up the configuration file + +Set up the config file: + +```shell +dagster-cloud config setup +``` + +Select your authentication method. **Note**: Browser authentication is the easiest method to configure. + +
+BROWSER AUTHENTICATION + +The easiest way to set up is to authenticate through the browser. + +```shell +$ dagster-cloud config setup +? How would you like to authenticate the CLI? (Use arrow keys) + » Authenticate in browser + Authenticate using token +Authorized for organization `hooli` + +? Default deployment: prod +``` + +When prompted, you can specify a default deployment. If specified, a deployment won't be required in subsequent `dagster-cloud` commands. The default deployment for a new Dagster+ organization is `prod`. + +
+ +
+TOKEN AUTHENTICATION + +{/* Alternatively, you may authenticate using a user token. Refer to the [Managing user and agent tokens guide](/dagster-plus/account/managing-user-agent-tokens) for more info. */} +Alternatively, you may authenticate using a user token. Refer to the [Managing user and agent tokens guide](/todo) for more info. + +```shell +$ dagster-cloud config setup +? How would you like to authenticate the CLI? (Use arrow keys) + Authenticate in browser + » Authenticate using token + +? Dagster+ organization: hooli +? Dagster+ user token: ************************************* +? Default deployment: prod +``` + +When prompted, specify the following: + +- **Organization** - Your organization name as it appears in your Dagster+ URL. For example, if your Dagster+ instance is `https://hooli.dagster.cloud/`, this would be `hooli`. +- **User token** - The user token. +- **Default deployment** - **Optional**. A default deployment. If specified, a deployment won't be required in subsequent `dagster-cloud` commands. The default deployment for a new Dagster+ organization is `prod`. + +
+
+### Viewing and modifying the configuration file
+
+To view the contents of the CLI configuration file, run:
+
+```shell
+$ dagster-cloud config view
+
+default_deployment: prod
+organization: hooli
+user_token: '*******************************8214fe'
+```
+
+Specify the `--show-token` flag to show the full user token.
+
+To modify the existing config, re-run:
+
+```shell
+dagster-cloud config setup
+```
+
+## Toggling between deployments
+
+To quickly toggle between deployments, run:
+
+```shell
+dagster-cloud config set-deployment <deployment_name>
+```
+
+## Getting help
+
+To view help options in the CLI:
+
+```shell
+dagster-cloud --help
+```
+
+## Reference
+
+- [Custom configuration file path](#custom-configuration-file-path)
+- [Environment variables and CLI options](#environment-variables-and-cli-options)
+
+### Custom configuration file path
+
+Point the CLI at an alternate config location by specifying the `DAGSTER_CLOUD_CLI_CONFIG` environment variable.
+
+### Environment variables and CLI options
+
+Environment variables and CLI options can be used in place of or to override the CLI configuration file.
+
+The priority of these items is as follows:
+
+- **CLI options** - highest
+- **Environment variables**
+- **CLI configuration** - lowest
+
+| Setting      | Environment variable         | CLI flag               | CLI config value     |
+| ------------ | ---------------------------- | ---------------------- | -------------------- |
+| Organization | `DAGSTER_CLOUD_ORGANIZATION` | `--organization`, `-o` | `organization`       |
+| Deployment   | `DAGSTER_CLOUD_DEPLOYMENT`   | `--deployment`, `-d`   | `default_deployment` |
+| User Token   | `DAGSTER_CLOUD_API_TOKEN`    | `--user-token`, `-u`   | `user_token`         |
+
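+As a toy illustration of this precedence (config file, overridden by environment variables, overridden in turn by CLI options), here is a sketch of the resolution order. It assumes the PyYAML package and a config file at the default path; it is not the CLI's actual source code:
+
+```python
+import os
+
+import yaml  # pip install pyyaml
+
+# Default config location; DAGSTER_CLOUD_CLI_CONFIG can point elsewhere
+config_path = os.getenv(
+    "DAGSTER_CLOUD_CLI_CONFIG",
+    os.path.expanduser("~/.dagster_cloud_cli/config"),
+)
+with open(config_path) as f:
+    config_file = yaml.safe_load(f) or {}
+
+
+def resolve_setting(cli_value, env_var, config_key):
+    """Resolve one setting using the documented priority order."""
+    if cli_value is not None:  # CLI options take highest priority
+        return cli_value
+    if os.getenv(env_var) is not None:  # then environment variables
+        return os.environ[env_var]
+    return config_file.get(config_key)  # config file is the fallback
+
+
+# e.g. the organization, with no --organization/-o flag passed:
+org = resolve_setting(None, "DAGSTER_CLOUD_ORGANIZATION", "organization")
+```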
diff --git a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/testing.md b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/testing.md
index 8106ced129daa..cdc92aeb8bb45 100644
--- a/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/testing.md
+++ b/docs/docs-beta/docs/dagster-plus/features/ci-cd/branch-deployments/testing.md
@@ -4,4 +4,489 @@ sidebar_position: 400
 unlisted: true
 ---
 
-{/* TODO move from https://docs.dagster.io/guides/dagster/branch_deployments#testing-against-production-with-dagster-branch-deployments */}
\ No newline at end of file
+:::note
+This guide is applicable to Dagster+.
+:::
+
+This guide details a workflow to test Dagster code in your cloud environment without impacting your production data. To highlight this functionality, we'll leverage Dagster+ branch deployments and a Snowflake database to:
+
+- Execute code on a feature branch directly on Dagster+
+- Read and write to a unique per-branch clone of our Snowflake data
+
+With these tools, we can merge changes with confidence in the impact on our data platform and with the assurance that our code will execute as intended.
+
+Here's an overview of the main concepts we'll be using:
+
+{/* - [Assets](/concepts/assets/software-defined-assets) - We'll define three assets that each persist a table to Snowflake. */}
+- [Assets](/todo) - We'll define three assets that each persist a table to Snowflake.
+{/* - [Ops](/concepts/ops-jobs-graphs/ops) - We'll define two ops that query Snowflake: the first will clone a database, and the second will drop database clones. */}
+- [Ops](/todo) - We'll define two ops that query Snowflake: the first will clone a database, and the second will drop database clones.
+{/* - [Graphs](/concepts/ops-jobs-graphs/graphs) - We'll build graphs that define the order our ops should run. */}
+- [Graphs](/todo) - We'll build graphs that define the order our ops should run.
+{/* - [Jobs](/concepts/assets/asset-jobs) - We'll define jobs by binding our graphs to resources. */}
+- [Jobs](/todo) - We'll define jobs by binding our graphs to resources.
+{/* - [Resources](/concepts/resources) - We'll use resources to swap in different Snowflake connections to our jobs depending on environment. */}
+- [Resources](/todo) - We'll use resources to swap in different Snowflake connections to our jobs depending on environment.
+{/* - [I/O managers](/concepts/io-management/io-managers) - We'll use a Snowflake I/O manager to persist asset outputs to Snowflake. */}
+- [I/O managers](/todo) - We'll use a Snowflake I/O manager to persist asset outputs to Snowflake.
+
+---
+
+## Prerequisites
+
+:::note
+This guide is an extension of the Transitioning data pipelines from development to production guide, illustrating a workflow for staging deployments. We'll use the examples from this guide to build a workflow atop Dagster+'s branch deployment feature.
+:::
+
+To complete the steps in this guide, you'll need:
+
+- A Dagster+ account
+{/* - An existing Branch Deployments setup that uses [GitHub actions](/dagster-plus/managing-deployments/branch-deployments/using-branch-deployments-with-github) or [Gitlab CI/CD](/dagster-plus/managing-deployments/branch-deployments/using-branch-deployments-with-gitlab). Your setup should contain a Dagster project set up for branch deployments containing: */}
+- An existing Branch Deployments setup that uses [GitHub Actions](/todo) or [GitLab CI/CD](/todo). Your setup should contain a Dagster project set up for branch deployments containing:
+  - Either a GitHub Actions workflow file (e.g. `.github/workflows/branch-deployments.yaml`) or a GitLab CI/CD file (e.g. `.gitlab-ci.yml`)
+  - A Dockerfile that installs your Dagster project
+{/* - User permissions in Dagster+ that allow you to [access Branch Deployments](/dagster-plus/account/managing-users/managing-user-roles-permissions) */}
+- User permissions in Dagster+ that allow you to [access Branch Deployments](/todo)
+
+---
+
+## Overview
+
+We have a `PRODUCTION` Snowflake database with a schema named `HACKER_NEWS`. In our production cloud environment, we'd like to write tables to Snowflake containing subsets of Hacker News data. These tables will be:
+
+- `ITEMS` - A table containing the entire dataset
+- `COMMENTS` - A table containing data about comments
+- `STORIES` - A table containing data about stories
+
+To set up a branch deployment workflow to construct and test these tables, we will:
+
+{/* 1. Define these tables as [assets](/concepts/assets/software-defined-assets). */}
+1. Define these tables as [assets](/todo).
+2. Configure our assets to write to Snowflake using a different connection (credentials and database name) for two environments: production and branch deployment.
+3. Write a job that will clone the production database upon each branch deployment launch. Each clone will be named `PRODUCTION_CLONE_<ID>`, where `<ID>` is the pull request ID of the branch. Then we'll create a branch deployment and test our Hacker News assets against our newly cloned database.
+4. Write a job that will delete the corresponding database clone upon closing the feature branch.
+
+---
+
+## Step 1: Create our assets
+
+{/* In production, we want to write three tables to Snowflake: `ITEMS`, `COMMENTS`, and `STORIES`. We can define these tables as [assets](/concepts/assets/software-defined-assets) as follows: */}
+In production, we want to write three tables to Snowflake: `ITEMS`, `COMMENTS`, and `STORIES`. We can define these tables as [assets](/todo) as follows:
+
+```python file=/guides/dagster/development_to_production/assets.py startafter=start_assets endbefore=end_assets
+# assets.py
+import pandas as pd
+import requests
+
+from dagster import Config, asset
+
+
+class ItemsConfig(Config):
+    base_item_id: int
+
+
+@asset(
+    io_manager_key="snowflake_io_manager",
+)
+def items(config: ItemsConfig) -> pd.DataFrame:
+    """Items from the Hacker News API: each is a story or a comment on a story."""
+    rows = []
+    max_id = requests.get(
+        "https://hacker-news.firebaseio.com/v0/maxitem.json", timeout=5
+    ).json()
+    # Hacker News API is 1-indexed, so adjust range by 1
+    for item_id in range(max_id - config.base_item_id + 1, max_id + 1):
+        item_url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
+        rows.append(requests.get(item_url, timeout=5).json())
+
+    # ITEM_FIELD_NAMES is a list of the column names in the Hacker News dataset
+    result = pd.DataFrame(rows, columns=ITEM_FIELD_NAMES).drop_duplicates(subset=["id"])
+    result.rename(columns={"by": "user_id"}, inplace=True)
+    return result
+
+
+@asset(
+    io_manager_key="snowflake_io_manager",
+)
+def comments(items: pd.DataFrame) -> pd.DataFrame:
+    """Comments from the Hacker News API."""
+    return items[items["type"] == "comment"]
+
+
+@asset(
+    io_manager_key="snowflake_io_manager",
+)
+def stories(items: pd.DataFrame) -> pd.DataFrame:
+    """Stories from the Hacker News API."""
+    return items[items["type"] == "story"]
+```
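+
+The snippet above references `ITEM_FIELD_NAMES`, which the example project defines elsewhere. A hypothetical definition, assuming the standard fields of the Hacker News item API, so the snippet runs standalone:
+
+```python
+# Hypothetical stand-in for the example project's column list; the real
+# project defines its own ITEM_FIELD_NAMES for the Hacker News dataset.
+ITEM_FIELD_NAMES = [
+    "id", "deleted", "type", "by", "time", "text", "dead", "parent",
+    "poll", "kids", "url", "score", "title", "parts", "descendants",
+]
+```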
+
+{/* As you can see, our assets use an [I/O manager](/concepts/io-management/io-managers) named `snowflake_io_manager`. Using I/O managers and other resources allow us to swap out implementations per environment without modifying our business logic. */}
+As you can see, our assets use an [I/O manager](/todo) named `snowflake_io_manager`. Using I/O managers and other resources allows us to swap out implementations per environment without modifying our business logic.
+
+---
+
+## Step 2: Configure our assets for each environment
+
+At runtime, we'd like to determine which environment our code is running in: branch deployment, or production. This information dictates how our code should execute, specifically with which credentials and with which database.
+
+To ensure we can't accidentally write to production from within our branch deployment, we'll use a different set of credentials from production and write to our database clone.
+
+{/* Dagster automatically sets certain [environment variables](/dagster-plus/managing-deployments/reserved-environment-variables) containing deployment metadata, allowing us to read these environment variables to discern between deployments. We can access the `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` environment variable to determine the currently executing environment. */}
+Dagster automatically sets certain [environment variables](/todo) containing deployment metadata, allowing us to read these environment variables to discern between deployments. We can access the `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` environment variable to determine the currently executing environment.
+
+Because we want to configure our assets to write to Snowflake using a different set of credentials and database in each environment, we'll configure a separate I/O manager for each environment:
+
+```python file=/guides/dagster/development_to_production/branch_deployments/repository_v1.py startafter=start_repository endbefore=end_repository
+# definitions.py
+import os
+
+from dagster import Definitions
+from dagster_snowflake_pandas import SnowflakePandasIOManager
+
+from ..assets import comments, items, stories
+
+snowflake_config = {
+    "account": "abc1234.us-east-1",
+    "user": "system@company.com",
+    "password": {"env": "SYSTEM_SNOWFLAKE_PASSWORD"},
+    "schema": "HACKER_NEWS",
+}
+
+resources = {
+    "branch": {
+        "snowflake_io_manager": SnowflakePandasIOManager(
+            **snowflake_config,
+            database=f"PRODUCTION_CLONE_{os.getenv('DAGSTER_CLOUD_PULL_REQUEST_ID')}",
+        ),
+    },
+    "prod": {
+        "snowflake_io_manager": SnowflakePandasIOManager(
+            **snowflake_config,
+            database="PRODUCTION",
+        ),
+    },
+}
+
+
+def get_current_env():
+    # DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT is set to "1" in branch deployments
+    is_branch_depl = os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"
+    return "branch" if is_branch_depl else "prod"
+
+
+defs = Definitions(
+    assets=[items, comments, stories], resources=resources[get_current_env()]
+)
+```
+
+{/* Refer to the [Dagster+ environment variables documentation](/dagster-plus/managing-deployments/environment-variables-and-secrets) for more info about available environment variables. */}
+Refer to the [Dagster+ environment variables documentation](/todo) for more info about available environment variables.
+
+---
+
+## Step 3: Create jobs to manage database cloning per branch deployment
+
+We'll first need to define a job that clones our `PRODUCTION` database for each branch deployment. Later, in our GitHub Actions workflow, we can trigger this job to run upon each redeploy. Each clone will be named `PRODUCTION_CLONE_<ID>`, with `<ID>` representing the pull request ID, ensuring each branch deployment has a unique clone. This job will drop a database clone if it exists and then reclone from production, ensuring each redeployment has a fresh clone of `PRODUCTION`:
+
+:::note
+**Why use ops and jobs instead of assets?** We'll be writing ops to clone the production database for each branch deployment and drop the clone once the branch is merged. In this case, we chose to use ops since we are primarily interested in the task that's being performed: cloning or dropping the database. Additionally, we don't need asset-specific features for these tasks, like viewing them in the Global Asset Graph.
+:::
+
+```python file=/guides/dagster/development_to_production/branch_deployments/clone_and_drop_db.py startafter=start_clone_db endbefore=end_clone_db
+import os
+
+from dagster_snowflake import SnowflakeResource
+
+from dagster import In, Nothing, graph, op
+
+
+@op
+def drop_database_clone(snowflake: SnowflakeResource):
+    with snowflake.get_connection() as conn:
+        cur = conn.cursor()
+        cur.execute(
+            "DROP DATABASE IF EXISTS"
+            f" PRODUCTION_CLONE_{os.environ['DAGSTER_CLOUD_PULL_REQUEST_ID']}"
+        )
+
+
+@op(ins={"start": In(Nothing)})
+def clone_production_database(snowflake: SnowflakeResource):
+    with snowflake.get_connection() as conn:
+        cur = conn.cursor()
+        cur.execute(
+            "CREATE DATABASE"
+            f" PRODUCTION_CLONE_{os.environ['DAGSTER_CLOUD_PULL_REQUEST_ID']} CLONE"
+            ' "PRODUCTION"'
+        )
+
+
+@graph
+def clone_prod():
+    clone_production_database(start=drop_database_clone())
+
+
+@graph
+def drop_prod_clone():
+    drop_database_clone()
+```
+
+We've defined `drop_database_clone` and `clone_production_database` to utilize the `SnowflakeResource`. The Snowflake resource will use the same configuration as the Snowflake I/O manager to generate a connection to Snowflake. However, while our I/O manager writes outputs to Snowflake, the Snowflake resource executes queries against Snowflake.
+
+We now need to define resources that configure our jobs to the current environment. We can modify the resource mapping by environment as follows:
+
+```python file=/guides/dagster/development_to_production/branch_deployments/repository_v2.py startafter=start_resources endbefore=end_resources
+resources = {
+    "branch": {
+        "snowflake_io_manager": SnowflakePandasIOManager(
+            **snowflake_config,
+            database=f"PRODUCTION_CLONE_{os.getenv('DAGSTER_CLOUD_PULL_REQUEST_ID')}",
+        ),
+        "snowflake": SnowflakeResource(
+            **snowflake_config,
+            database=f"PRODUCTION_CLONE_{os.getenv('DAGSTER_CLOUD_PULL_REQUEST_ID')}",
+        ),
+    },
+    "prod": {
+        "snowflake_io_manager": SnowflakePandasIOManager(
+            **snowflake_config,
+            database="PRODUCTION",
+        ),
+        "snowflake": SnowflakeResource(**snowflake_config, database="PRODUCTION"),
+    },
+}
+```
+
+Then, we can add the `clone_prod` and `drop_prod_clone` jobs that now use the appropriate resource to the environment and add them to our definitions:
+
+```python file=/guides/dagster/development_to_production/branch_deployments/repository_v2.py startafter=start_repository endbefore=end_repository
+branch_deployment_jobs = [
+    clone_prod.to_job(),
+    drop_prod_clone.to_job(),
+]
+defs = Definitions(
+    assets=[items, comments, stories],
+    resources=resources[get_current_env()],
+    jobs=(
+        branch_deployment_jobs
+        if os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"
+        else []
+    ),
+)
+```
+
+---
+
+## Step 4: Create our database clone upon opening a branch
+
+
+
+The `branch_deployments.yml` file located in `.github/workflows/branch_deployments.yml` defines a `dagster_cloud_build_push` job with a series of steps that launch a branch deployment. Because we want to queue a run of `clone_prod` within each deployment after it launches, we'll add an additional step at the end of `dagster_cloud_build_push`. This job is triggered on multiple pull request events: `opened`, `synchronize`, `reopened`, and `closed`. This means that upon future pushes to the branch, we'll trigger a run of `clone_prod`.
+The `if` condition below ensures that `clone_prod` will not run if the pull request is closed:
+
+```yaml file=/guides/dagster/development_to_production/branch_deployments/clone_prod.yaml
+# .github/workflows/branch_deployments.yml
+
+name: Dagster Branch Deployments
+on:
+  pull_request:
+    types: [opened, synchronize, reopened, closed]
+env:
+  DAGSTER_CLOUD_URL: ${{ secrets.DAGSTER_CLOUD_URL }}
+
+jobs:
+  dagster_cloud_build_push:
+    runs-on: ubuntu-latest
+    name: Dagster Branch Deployments
+    strategy:
+      ...
+    steps:
+      # Existing steps here
+      ...
+      - name: Clone Snowflake schema upon launch
+        if: github.event.action != 'closed'
+        uses: dagster-io/dagster-cloud-action/actions/utils/run@v0.1
+        with:
+          location_name: ${{ matrix.location.name }}
+          deployment: ${{ steps.deploy.outputs.deployment }}
+          job_name: clone_prod
+        env:
+          DAGSTER_CLOUD_URL: ${{ secrets.DAGSTER_CLOUD_URL }}
+          DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
+```
+
+Opening a pull request for our current branch will automatically kick off a branch deployment. After the deployment launches, we can confirm that the `clone_prod` job has run:
+
+![Instance overview](/images/guides/development_to_production/branch_deployments/instance_overview.png)
+
+Alternatively, the logs for the branch deployment workflow can be found in the **Actions** tab on the GitHub pull request.
+
+We can also view our database in Snowflake to confirm that a clone exists for each branch deployment. When we materialize our assets within our branch deployment, we'll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:
+
+![Instance overview](/images/guides/development_to_production/branch_deployments/snowflake.png)
+
+
+
+The `.gitlab-ci.yml` file contains a `deploy` job that defines a series of steps that launch a branch deployment. Because we want to queue a run of `clone_prod` within each deployment after it launches, we'll add an additional step at the end of `deploy`. This job is triggered when a merge request is created or updated. This means that upon future pushes to the branch, we'll trigger a run of `clone_prod`.
+
+```yaml file=/guides/dagster/development_to_production/branch_deployments/clone_prod.gitlab-ci.yml
+# .gitlab-ci.yml
+
+stages:
+  - setup
+  - build
+  - deploy
+
+workflow:
+  rules:
+    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
+    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
+
+parse-workspace:
+  ...
+
+build-image:
+  ...
+
+deploy-docker:
+  ...
+
+deploy-docker-branch:
+  stage: deploy
+  rules:
+    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
+  dependencies:
+    - build-image
+    - parse-workspace
+  image: ghcr.io/dagster-io/dagster-cloud-action:0.1.23
+  script:
+    # Existing steps here
+    ...
+
+    # Add a step to launch the job cloning the prod db
+    - dagster-cloud job launch
+      --url "$DAGSTER_CLOUD_URL/$DEPLOYMENT_NAME"
+      --api-token "$DAGSTER_CLOUD_API_TOKEN"
+      --location "location_name_containing_clone_prod_job"
+      --job clone_prod
+  environment:
+    name: branch/$CI_COMMIT_REF_NAME
+    on_stop: close_branch
+
+close_branch:
+  ...
+```
+
+Opening a merge request for our current branch will automatically kick off a branch deployment. After the deployment launches, we can confirm that the `clone_prod` job has run:
+
+![Instance overview](/images/guides/development_to_production/branch_deployments/instance_overview.png)
+
+We can also view our database in Snowflake to confirm that a clone exists for each branch deployment.
+When we materialize our assets within our branch deployment, we'll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:
+
+![Instance overview](/images/guides/development_to_production/branch_deployments/snowflake.png)
+
+
+
+---
+
+## Step 5: Delete our database clone upon closing a branch
+
+
+
+Finally, we can add a step to our `branch_deployments.yml` file that queues a run of our `drop_prod_clone` job:
+
+```yaml file=/guides/dagster/development_to_production/branch_deployments/drop_db_clone.yaml
+# .github/workflows/branch_deployments.yml
+
+name: Dagster Branch Deployments
+on:
+  pull_request:
+    types: [opened, synchronize, reopened, closed]
+env:
+  DAGSTER_CLOUD_URL: ${{ secrets.DAGSTER_CLOUD_URL }}
+
+jobs:
+  dagster_cloud_build_push:
+    runs-on: ubuntu-latest
+    name: Dagster Branch Deployments
+    strategy:
+      ...
+    steps:
+      # Existing steps here
+      ...
+      - name: Clone Snowflake schema upon launch
+        ...
+      - name: Delete schema clone upon PR close
+        if: github.event.action == 'closed'
+        uses: dagster-io/dagster-cloud-action/actions/utils/run@v0.1
+        with:
+          location_name: ${{ matrix.location.name }}
+          deployment: ${{ steps.deploy.outputs.deployment }}
+          job_name: drop_prod_clone
+        env:
+          DAGSTER_CLOUD_URL: ${{ secrets.DAGSTER_CLOUD_URL }}
+          DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
+```
+
+
+
+Finally, we can add a step to our `.gitlab-ci.yml` file that queues a run of our `drop_prod_clone` job:
+
+```yaml file=/guides/dagster/development_to_production/branch_deployments/drop_db_clone.gitlab-ci.yml
+# .gitlab-ci.yml
+
+stages:
+  - setup
+  - build
+  - deploy
+
+workflow:
+  rules:
+    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
+    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
+
+parse-workspace:
+  ...
+
+build-image:
+  ...
+
+deploy-docker:
+  ...
+
+deploy-docker-branch:
+  ...
+
+close_branch:
+  stage: deploy
+  image: ghcr.io/dagster-io/dagster-cloud-action:0.1.23
+  when: manual
+  only:
+    - merge_requests
+  script:
+    # Existing steps here
+    ...
+
+    # Add a step to launch the job dropping the cloned db
+    - dagster-cloud job launch
+      --url "$DAGSTER_CLOUD_URL/$DEPLOYMENT_NAME"
+      --api-token "$DAGSTER_CLOUD_API_TOKEN"
+      --location "location_name_containing_drop_prod_clone_job"
+      --job drop_prod_clone
+  environment:
+    name: branch/$CI_COMMIT_REF_NAME
+    action: stop
+```
+
+
+
+After merging our branch, viewing our Snowflake database will confirm that our branch deployment step has successfully deleted our database clone.
+
+We've now built an elegant workflow that gives each future branch deployment its own clone of our production database, and cleans that clone up once the branch is merged!
diff --git a/docs/docs-beta/docs/dagster-plus/features/ci-cd/ci-cd-file-reference.md b/docs/docs-beta/docs/dagster-plus/features/ci-cd/ci-cd-file-reference.md
index 21c585c75361c..a183855854c96 100644
--- a/docs/docs-beta/docs/dagster-plus/features/ci-cd/ci-cd-file-reference.md
+++ b/docs/docs-beta/docs/dagster-plus/features/ci-cd/ci-cd-file-reference.md
@@ -1,6 +1,176 @@
 ---
 title: CI/CD file reference
-unlisted: true
 ---
 
-{/* TODO copy from https://docs.dagster.io/dagster-plus/references/ci-cd-file-reference */}
\ No newline at end of file
+:::note
+This reference is applicable to Dagster+.
+:::
+
+When you import a project into Dagster+ from GitHub or GitLab, a few `.yml` files will be added to the repository.
+These files are essential as they manage the deployments in Dagster+.
+
+## branch_deployments.yml
+
+| Property        | Value                                                                                                                                                       |
+| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Name**        | `branch_deployments.yml`                                                                                                                                      |
+| **Status**      | Active                                                                                                                                                        |
+| **Required**    | Required to use Branch Deployments                                                                                                                            |
+| **Description** | Defines the steps required to use Branch Deployments. **Note**: This file must be manually added to the repository if using a Hybrid deployment.              |
+
+## deploy.yml
+
+| Property        | Value                                                                                                                                                                                                                                                                                                                                                     |
+| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Name**        | `deploy.yml`                                                                                                                                                                                                                                                                                                                                                |
+| **Status**      | Active                                                                                                                                                                                                                                                                                                                                                      |
+| **Required**    | Required for Dagster+                                                                                                                                                                                                                                                                                                                                       |
+| **Description** | Defines the steps required to deploy a project in Dagster+, including running checks, checking out the project directory, and deploying the project. If using a Hybrid deployment, this file must be manually added to the repository. If using dbt, some steps may need to be added to successfully deploy your project; refer to the Using dbt with Dagster+ guide for more information. |
+
+## Related
+
+{/*
+
+
+
+
+
+*/}
diff --git a/docs/docs-beta/docs/integrations/libraries/duckdb.md b/docs/docs-beta/docs/integrations/libraries/duckdb.md
index e8097b5040ed8..a5ea4fd28d53f 100644
--- a/docs/docs-beta/docs/integrations/libraries/duckdb.md
+++ b/docs/docs-beta/docs/integrations/libraries/duckdb.md
@@ -17,9 +17,8 @@ enables:
 tags: [dagster-supported, storage]
 ---
 
-
-This library provides an integration with the DuckDB database, and allows for an out-of-the-box [I/O Manager](https://docs.dagster.io/concepts/io-management/io-managers) so that you can make DuckDB your storage of choice.
+{/* This library provides an integration with the DuckDB database, and allows for an out-of-the-box [I/O Manager](/concepts/io-management/io-managers) so that you can make DuckDB your storage of choice. */}
+This library provides an integration with the DuckDB database, and allows for an out-of-the-box [I/O Manager](/todo) so that you can make DuckDB your storage of choice.
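+
+A minimal sketch of what this looks like in practice, assuming the `dagster-duckdb-pandas` package and a local database file path (`my_project.duckdb`) chosen for the example:
+
+```python
+import pandas as pd
+
+from dagster import Definitions, asset
+from dagster_duckdb_pandas import DuckDBPandasIOManager
+
+
+@asset
+def iris_sample() -> pd.DataFrame:
+    # Returned DataFrames are written to DuckDB by the I/O manager
+    return pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})
+
+
+defs = Definitions(
+    assets=[iris_sample],
+    resources={
+        # Persists asset outputs as tables in the DuckDB file
+        "io_manager": DuckDBPandasIOManager(database="my_project.duckdb")
+    },
+)
+```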
 ### Installation
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-code-version.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-code-version.png
new file mode 100644
index 0000000000000..d31aedbd26f8c
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-code-version.png differ
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-dependencies.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-dependencies.png
new file mode 100644
index 0000000000000..04d957f47a9ec
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-dependencies.png differ
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-metadata.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-metadata.png
new file mode 100644
index 0000000000000..6332aaac0484a
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-metadata.png differ
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-new.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-new.png
new file mode 100644
index 0000000000000..3a3afa8bf1439
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-new.png differ
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-partitions.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-partitions.png
new file mode 100644
index 0000000000000..3e98ecaa4a9f4
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-partitions.png differ
diff --git a/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-tags.png b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-tags.png
new file mode 100644
index 0000000000000..6fcc7a29acf96
Binary files /dev/null and b/docs/docs-beta/static/images/dagster-cloud/managing-deployments/change-tracking-tags.png differ
diff --git a/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/instance_overview.png b/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/instance_overview.png
new file mode 100644
index 0000000000000..f6c13e40a7fa6
Binary files /dev/null and b/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/instance_overview.png differ
diff --git a/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/snowflake.png b/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/snowflake.png
new file mode 100644
index 0000000000000..5fceb084dc5bf
Binary files /dev/null and b/docs/docs-beta/static/images/guides/development_to_production/branch_deployments/snowflake.png differ