Add structure for remaining guides content (#26423)
## Summary & Motivation

Reorganize 'guides' and 'getting started' content (['Docs'
section](https://docs-preview.dagster.io/) of docs) to prepare for
remaining content.

No need for a line-level review on this one; we just need to make sure
the tests are green (except Vale—that's a bigger problem), and that
staging loads and looks basically fine.

## How I Tested These Changes

Local build

## Changelog

> Insert changelog entry or delete this section.

---------

Signed-off-by: nikki everett <[email protected]>
neverett authored Jan 2, 2025
1 parent 24b60b3 commit 9de40ee
Showing 116 changed files with 416 additions and 240 deletions.
@@ -22,11 +22,11 @@ The default I/O manager cannot be used if you are a Serverless user who:
- Are otherwise working with data subject to GDPR or other such regulations
:::

In Serverless, code that uses the default [I/O manager](/guides/build/configure/io-managers) is automatically adjusted to save data in Dagster+ managed storage. This automatic change is useful because the Serverless filesystem is ephemeral, which means the default I/O manager wouldn't work as expected.
In Serverless, code that uses the default [I/O manager](/guides/operate/io-managers) is automatically adjusted to save data in Dagster+ managed storage. This automatic change is useful because the Serverless filesystem is ephemeral, which means the default I/O manager wouldn't work as expected.

However, this automatic change also means potentially sensitive data could be **stored** and not just processed or orchestrated by Dagster+.

To prevent this, you can use [another I/O manager](/guides/build/configure/io-managers#built-in) that stores data in your infrastructure or [adapt your code to avoid using an I/O manager](/guides/build/configure/io-managers#before-you-begin).
To prevent this, you can use [another I/O manager](/guides/operate/io-managers#built-in) that stores data in your infrastructure or [adapt your code to avoid using an I/O manager](/guides/operate/io-managers#before-you-begin).
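
For illustration (a sketch, not part of this diff; the bucket and asset names are hypothetical), swapping in the S3 I/O manager from `dagster-aws` might look like this:

```python
from dagster import Definitions, asset
from dagster_aws.s3 import S3PickleIOManager, S3Resource


@asset
def my_asset():
    return [1, 2, 3]


defs = Definitions(
    assets=[my_asset],
    resources={
        # Outputs are pickled to a bucket you control instead of Dagster+ managed storage
        "io_manager": S3PickleIOManager(
            s3_resource=S3Resource(),
            s3_bucket="my-company-bucket",  # hypothetical bucket name
            s3_prefix="dagster-io",
        ),
    },
)
```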

:::note
You must have [boto3](https://pypi.org/project/boto3/) or `dagster-cloud[serverless]` installed as a project dependency otherwise the Dagster+ managed storage can fail and silently fall back to using the default I/O manager.
@@ -132,4 +132,4 @@ compute_logs:
ServerSideEncryption: "AES256"
show_url_only: true
region: "us-west-1"
```
```
@@ -115,7 +115,7 @@ TODO: add picture previously at "/images/dagster-cloud/user-token-management/cod
| Start and stop [schedules](/guides/automate/schedules) ||||||
| Start and stop [schedules](/guides/automate/sensors) ||||||
| Wipe assets ||||||
| Launch and cancel [schedules](/guides/build/backfill) ||||||
| Launch and cancel [schedules](/guides/automate/schedules) ||||||
| Add dynamic partitions ||||||

### Deployments
@@ -18,7 +18,7 @@ In this guide, we'll walk you through configuring [Okta SCIM provisioning](https
With Dagster+'s Okta SCIM provisioning feature, you can:

- **Create users**. Users that are assigned to the Dagster+ application in the IdP will be automatically added to your Dagster+ organization.
- **Update user attributes.** Updating a users name or email address in the IdP will automatically sync the change to your user list in Dagster+.
- **Update user attributes.** Updating a user's name or email address in the IdP will automatically sync the change to your user list in Dagster+.
- **Remove users.** Deactivating or unassigning a user from the Dagster+ application in the IdP will remove them from the Dagster+ organization
{/* - **Push user groups.** Groups and their members in the IdP can be pushed to Dagster+ as [Teams](/dagster-plus/account/managing-users/managing-teams). */}
- **Push user groups.** Groups and their members in the IdP can be pushed to Dagster+ as
2 changes: 1 addition & 1 deletion docs/docs-beta/docs/dagster-plus/features/catalog-views.md
@@ -17,7 +17,7 @@ In this guide, you'll learn how to create, access, and share catalog views with
<summary>Prerequisites</summary>

- **Organization Admin**, **Admin**, or **Editor** permissions on Dagster+
- Familiarity with [Assets](/guides/build/assets-concepts/index.mdx and [Asset metadata](/guides/build/create-a-pipeline/metadata)
- Familiarity with [Assets](/guides/build/create-asset-pipelines/assets-concepts/index.mdx and [Asset metadata](/guides/build/create-asset-pipelines/metadata)

</details>

@@ -8,7 +8,7 @@ unlisted: true
This guide is applicable to Dagster+.
:::

Branch Deployments Change Tracking makes it eaiser for you and your team to identify how changes in a pull request will impact data assets. By the end of this guide, you'll understand how Change Tracking works and what types of asset changes can be detected.
Branch Deployments Change Tracking makes it easier for you and your team to identify how changes in a pull request will impact data assets. By the end of this guide, you'll understand how Change Tracking works and what types of asset changes can be detected.

## How it works

@@ -8,14 +8,14 @@ unlisted: true
This guide is applicable to Dagster+.
:::

This guide details a workflow to test Dagster code in your cloud environment without impacting your production data. To highlight this functionality, we’ll leverage Dagster+ branch deployments and a Snowflake database to:
This guide details a workflow to test Dagster code in your cloud environment without impacting your production data. To highlight this functionality, we'll leverage Dagster+ branch deployments and a Snowflake database to:

- Execute code on a feature branch directly on Dagster+
- Read and write to a unique per-branch clone of our Snowflake data

With these tools, we can merge changes with confidence in the impact on our data platform and with the assurance that our code will execute as intended.

Here’s an overview of the main concepts we’ll be using:
Here’s an overview of the main concepts we'll be using:

{/* - [Assets](/concepts/assets/software-defined-assets) - We'll define three assets that each persist a table to Snowflake. */}
- [Assets](/todo) - We'll define three assets that each persist a table to Snowflake.
@@ -35,7 +35,7 @@ Here’s an overview of the main concepts we’ll be using:
## Prerequisites

:::note
This guide is an extension of the <a href="/guides/dagster/transitioning-data-pipelines-from-development-to-production"> Transitioning data pipelines from development to production </a> guide, illustrating a workflow for staging deployments. We’ll use the examples from this guide to build a workflow atop Dagster+’s branch deployment feature.
This guide is an extension of the <a href="/guides/dagster/transitioning-data-pipelines-from-development-to-production"> Transitioning data pipelines from development to production </a> guide, illustrating a workflow for staging deployments. We'll use the examples from this guide to build a workflow atop Dagster+’s branch deployment feature.

:::

To complete the steps in this guide, you'll need:
@@ -52,7 +52,7 @@ To complete the steps in this guide, you'll need:

## Overview

We have a `PRODUCTION` Snowflake database with a schema named `HACKER_NEWS`. In our production cloud environment, we’d like to write tables to Snowflake containing subsets of Hacker News data. These tables will be:
We have a `PRODUCTION` Snowflake database with a schema named `HACKER_NEWS`. In our production cloud environment, we'd like to write tables to Snowflake containing subsets of Hacker News data. These tables will be:

- `ITEMS` - A table containing the entire dataset
- `COMMENTS` - A table containing data about comments
@@ -128,14 +128,14 @@ As you can see, our assets use an [I/O manager](/todo) named `snowflake_io_manager`

## Step 2: Configure our assets for each environment

At runtime, we’d like to determine which environment our code is running in: branch deployment, or production. This information dictates how our code should execute, specifically with which credentials and with which database.
At runtime, we'd like to determine which environment our code is running in: branch deployment, or production. This information dictates how our code should execute, specifically with which credentials and with which database.

To ensure we can't accidentally write to production from within our branch deployment, we’ll use a different set of credentials from production and write to our database clone.
To ensure we can't accidentally write to production from within our branch deployment, we'll use a different set of credentials from production and write to our database clone.

{/* Dagster automatically sets certain [environment variables](/dagster-plus/managing-deployments/reserved-environment-variables) containing deployment metadata, allowing us to read these environment variables to discern between deployments. We can access the `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` environment variable to determine the currently executing environment. */}
Dagster automatically sets certain [environment variables](/todo) containing deployment metadata, allowing us to read these environment variables to discern between deployments. We can access the `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` environment variable to determine the currently executing environment.
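
As a rough sketch of that check (not taken from the guide's example files; the exact value comparison is an assumption):

```python
import os


def get_current_environment() -> str:
    # Assumption: Dagster+ sets DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT to "1" in branch
    # deployments; anything else is treated as production here.
    if os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
        return "branch"
    return "prod"
```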

Because we want to configure our assets to write to Snowflake using a different set of credentials and database in each environment, we’ll configure a separate I/O manager for each environment:
Because we want to configure our assets to write to Snowflake using a different set of credentials and database in each environment, we'll configure a separate I/O manager for each environment:

```python file=/guides/dagster/development_to_production/branch_deployments/repository_v1.py startafter=start_repository endbefore=end_repository
# definitions.py
@@ -232,7 +232,7 @@ def drop_prod_clone():
drop_database_clone()
```

We’ve defined `drop_database_clone` and `clone_production_database` to utilize the <PyObject object="SnowflakeResource" module="dagster_snowflake" />. The Snowflake resource will use the same configuration as the Snowflake I/O manager to generate a connection to Snowflake. However, while our I/O manager writes outputs to Snowflake, the Snowflake resource executes queries against Snowflake.
We've defined `drop_database_clone` and `clone_production_database` to utilize the <PyObject object="SnowflakeResource" module="dagster_snowflake" />. The Snowflake resource will use the same configuration as the Snowflake I/O manager to generate a connection to Snowflake. However, while our I/O manager writes outputs to Snowflake, the Snowflake resource executes queries against Snowflake.
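
The full op definitions are collapsed in this diff; a minimal sketch of an op that runs a query through the Snowflake resource (the clone name is hypothetical) could look like:

```python
from dagster import op
from dagster_snowflake import SnowflakeResource


@op
def drop_database_clone(snowflake: SnowflakeResource):
    # The resource yields a Snowflake connection for running queries,
    # rather than loading or storing asset outputs like the I/O manager does.
    with snowflake.get_connection() as conn:
        conn.cursor().execute('DROP DATABASE IF EXISTS "PRODUCTION_CLONE"')  # hypothetical clone name
```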

We now need to define resources that configure our jobs to the current environment. We can modify the resource mapping by environment as follows:

@@ -322,7 +322,7 @@ Opening a pull request for our current branch will automatically kick off a branch

Alternatively, the logs for the branch deployment workflow can be found in the **Actions** tab on the GitHub pull request.

We can also view our database in Snowflake to confirm that a clone exists for each branch deployment. When we materialize our assets within our branch deployment, we’ll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:
We can also view our database in Snowflake to confirm that a clone exists for each branch deployment. When we materialize our assets within our branch deployment, we'll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:

![Instance overview](/images/guides/development_to_production/branch_deployments/snowflake.png)

@@ -383,7 +383,7 @@ Opening a merge request for our current branch will automatically kick off a bra

![Instance overview](/images/guides/development_to_production/branch_deployments/instance_overview.png)

We can also view our database in Snowflake to confirm that a clone exists for each branch deployment. When we materialize our assets within our branch deployment, we’ll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:
We can also view our database in Snowflake to confirm that a clone exists for each branch deployment. When we materialize our assets within our branch deployment, we'll now be writing to our clone of `PRODUCTION`. Within Snowflake, we can run queries against this clone to confirm the validity of our data:

![Instance overview](/images/guides/development_to_production/branch_deployments/snowflake.png)

@@ -489,4 +489,4 @@ close_branch:

After merging our branch, viewing our Snowflake database will confirm that our branch deployment step has successfully deleted our database clone.

We’ve now built an elegant workflow that enables future branch deployments to automatically have access to their own clones of our production database that are cleaned up upon merge!
We've now built an elegant workflow that enables future branch deployments to automatically have access to their own clones of our production database that are cleaned up upon merge!
@@ -21,7 +21,7 @@ You'll need one or more assets that emit the same metadata key at run time. Insi
are most valuable when you have multiple assets that emit the same kind of metadata, such as
the number of rows processed or the size of a file uploaded to object storage.

Follow [the metadata guide](/guides/build/create-a-pipeline/metadata#runtime-metadata) to add numeric metadata
Follow [the metadata guide](/guides/build/create-asset-pipelines/metadata#runtime-metadata) to add numeric metadata
to your asset materializations.
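
For example (a sketch, not the linked guide's exact code; the asset and metadata key names are illustrative), an asset can attach a numeric metadata entry to each materialization, and the key only needs to be consistent across the assets you want to compare:

```python
from dagster import MaterializeResult, asset


@asset
def orders():
    rows_written = 42  # placeholder for the count your pipeline actually computes
    return MaterializeResult(metadata={"rows_written": rows_written})
```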

## Step 2: Enable viewing your metadata in Dagster+ Insights
2 changes: 1 addition & 1 deletion docs/docs-beta/docs/dagster-plus/index.md
@@ -7,7 +7,7 @@ Dagster+ is a managed orchestration platform built on top of Dagster's open sour

Dagster+ is built to be the most performant, reliable, and cost effective way for data engineering teams to run Dagster in production. Dagster+ is also great for students, researchers, or individuals who want to explore Dagster with minimal overhead.

Dagster+ comes in two flavors: a fully [Serverless](/dagster-plus/deployment/deployment-types/serverless) offering and a [Hybrid](/dagster-plus/deployment/deployment-types/hybrid) offering. In both cases, Dagster+ does the hard work of managing your data orchestration control plane. Compared to a [Dagster open source deployment](/guides/), Dagster+ manages:
Dagster+ comes in two flavors: a fully [Serverless](/dagster-plus/deployment/deployment-types/serverless) offering and a [Hybrid](/dagster-plus/deployment/deployment-types/hybrid) offering. In both cases, Dagster+ does the hard work of managing your data orchestration control plane. Compared to a [Dagster open source deployment](guides/deploy/index.md), Dagster+ manages:

- Dagster's web UI at https://dagster.plus
- Metadata stores for data cataloging and cost insights
1 change: 0 additions & 1 deletion docs/docs-beta/docs/getting-started/glossary.md
@@ -1,7 +1,6 @@
---
title: Glossary
sidebar_position: 30
sidebar_label: Glossary
unlisted: true
---

4 changes: 1 addition & 3 deletions docs/docs-beta/docs/getting-started/installation.md
@@ -5,8 +5,6 @@ sidebar_position: 20
sidebar_label: Installation
---

# Installing Dagster

To follow the steps in this guide, you'll need:

- To install Python 3.9 or higher. **Python 3.12 is recommended**.
@@ -72,4 +70,4 @@ If you encounter any issues during the installation process:
## Next steps

- Get up and running with your first Dagster project in the [Quickstart](/getting-started/quickstart)
- Learn to [create data assets in Dagster](/guides/build/create-a-pipeline/data-assets)
- Learn to [create data assets in Dagster](/guides/build/create-asset-pipelines/data-assets)
6 changes: 2 additions & 4 deletions docs/docs-beta/docs/getting-started/quickstart.md
@@ -1,12 +1,10 @@
---
title: "Dagster quickstart"
title: Build your first Dagster project
description: Learn how to quickly get up and running with Dagster
sidebar_position: 30
sidebar_label: "Quickstart"
---

# Build your first Dagster project

Welcome to Dagster! In this guide, you'll use Dagster to create a basic pipeline that:

- Extracts data from a CSV file
@@ -154,4 +152,4 @@ id,name,age,city,age_group
Congratulations! You've just built and run your first pipeline with Dagster. Next, you can:

- Continue with the [ETL pipeline tutorial](/tutorial/tutorial-etl) to learn how to build a more complex ETL pipeline
- Learn how to [Think in assets](/guides/build/assets-concepts/index.md)
- Learn how to [Think in assets](/guides/build/create-asset-pipelines/assets-concepts/index.md)
4 changes: 2 additions & 2 deletions docs/docs-beta/docs/guides/automate/about-automation.md
@@ -3,6 +3,8 @@ title: About Automation
unlisted: true
---

{/* TODO combine with index page and delete this page */}

There are several ways to automate the execution of your data pipelines with Dagster.

The first system, and the most basic, is the [Schedule](/guides/automate/schedules), which responds to time.
@@ -24,8 +26,6 @@ as the schedule is processed.
Schedules were one of the first types of automation in Dagster, created before the introduction of Software-Defined Assets.
As such, you may find that many of the examples can seem foreign if you are used to only working within the asset framework.

For more on how assets and ops inter-relate, read about [Assets and Ops](/guides/build/assets-concepts#assets-and-ops)

The `dagster-daemon` process is responsible for submitting runs by checking each schedule at a regular interval to determine
if it's time to execute the underlying job.
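
As a quick illustration (the job name and cron string are hypothetical, not taken from this commit), a schedule pairs a job with a cron expression that the daemon evaluates:

```python
from dagster import ScheduleDefinition, define_asset_job

daily_refresh_job = define_asset_job("daily_refresh", selection="*")

# The dagster-daemon process checks this schedule and launches a run at 06:00 UTC each day
daily_schedule = ScheduleDefinition(job=daily_refresh_job, cron_schedule="0 6 * * *")
```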

21 changes: 8 additions & 13 deletions docs/docs-beta/docs/guides/automate/asset-sensors.md
@@ -1,22 +1,17 @@
---
title: Triggering cross-job dependencies with Asset Sensors
sidebar_position: 300
sidebar_label: Cross-job dependencies
title: Trigger cross-job dependencies with asset sensors
sidebar_position: 40
---

Asset sensors in Dagster provide a powerful mechanism for monitoring asset materializations and triggering downstream computations or notifications based on those events.

This guide covers the most common use cases for asset sensors, such as defining cross-job and cross-code location dependencies.

<details>
<summary>Prerequisites</summary>
:::note

To follow this guide, you'll need:
This documentation assumes familiarity with [assets](/guides/build/create-asset-pipelines/assets-concepts/index.md) and [ops and jobs](/guides/build/ops-jobs)

- Familiarity with [Assets](/guides/build/assets-concepts/index.mdx
- Familiarity with [Ops and Jobs](/guides/build/ops-jobs)

</details>
:::

## Getting started

@@ -54,7 +49,7 @@ This is an example of an asset sensor that triggers a job when an asset is mater

<CodeExample filePath="guides/automation/simple-asset-sensor-example.py" language="python" />

## Customize evaluation logic
## Customizing the evaluation function of an asset sensor

You can customize the evaluation function of an asset sensor to include specific logic for deciding when to trigger a run. This allows for fine-grained control over the conditions under which downstream jobs are executed.

@@ -83,15 +78,15 @@ In the following example, the `@asset_sensor` decorator defines a custom evaluat

<CodeExample filePath="guides/automation/asset-sensor-custom-eval.py" language="python"/>

## Trigger a job with configuration
## Triggering a job with custom configuration

By providing a configuration to the `RunRequest` object, you can trigger a job with a specific configuration. This is useful when you want to trigger a job with custom parameters based on custom logic you define.

For example, you might use a sensor to trigger a job when an asset is materialized, but also pass metadata about that materialization to the job:

<CodeExample filePath="guides/automation/asset-sensor-with-config.py" language="python" />

## Monitor multiple assets
## Monitoring multiple assets

When building a pipeline, you may want to monitor multiple assets with a single sensor. This can be accomplished with a multi-asset sensor.
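
A hedged sketch of such a sensor (the monitored asset keys and downstream job are hypothetical) that fires only when every monitored asset has a new materialization:

```python
from dagster import (
    AssetKey,
    MultiAssetSensorEvaluationContext,
    RunRequest,
    define_asset_job,
    multi_asset_sensor,
)

summary_job = define_asset_job("summary_job", selection="daily_summary")  # hypothetical downstream job


@multi_asset_sensor(
    monitored_assets=[AssetKey("orders"), AssetKey("users")],
    job=summary_job,
)
def orders_and_users_sensor(context: MultiAssetSensorEvaluationContext):
    # Only fire once both monitored assets have unconsumed materializations
    records = context.latest_materialization_records_by_key()
    if all(records.values()):
        context.advance_all_cursors()
        return RunRequest()
```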

6 changes: 0 additions & 6 deletions docs/docs-beta/docs/guides/automate/declarative-automation.md

This file was deleted.

@@ -0,0 +1,5 @@
---
title: Arbitrary Python automation conditions
sidebar_position: 500
unlisted: true
---
@@ -0,0 +1,11 @@
---
title: Automation conditions operands and operators
sidebar_position: 600
unlisted: true
---

## Operands

## Operators

## Composite conditions

1 comment on commit 9de40ee


@github-actions github-actions bot commented on 9de40ee Jan 2, 2025


Deploy preview for dagster-docs-beta ready!

✅ Preview
https://dagster-docs-beta-ly99z0wge-elementl.vercel.app

Built with commit 9de40ee.
This pull request is being automatically deployed with vercel-action
