[DOCS] Schema change detection (#10755)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
klavavej and pre-commit-ci[bot] authored Dec 18, 2024
1 parent fa00348 commit c06a054
Showing 13 changed files with 70 additions and 63 deletions.
2 changes: 1 addition & 1 deletion docs/docusaurus/docs/cloud/alerts/manage_alerts.md
@@ -20,7 +20,7 @@ Every time a Data Asset fails a validation run, GX Cloud sends an email to all u

1. In GX Cloud, click **Data Assets**.
2. Click a Data Asset in the **Data Assets** list.
3. Click the **Expectations** tab and then **Alerts**.
3. Click **Alerts**.
4. Click the **toggle switch** to enable or disable email alerts for the Data Asset.

If you disabled an alert, you will stop receiving emails for the Data Asset immediately. If you enabled an alert, you will begin receiving the emails as soon as the Data Asset’s next full validation run is complete.
6 changes: 4 additions & 2 deletions docs/docusaurus/docs/cloud/connect/connect_databrickssql.md
@@ -33,6 +33,8 @@ import Tabs from '@theme/Tabs';

5. Select one or more tables to import as Data Assets.

6. Click **Add x Asset(s)**.
6. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.

7. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).
7. Click **Add x Asset(s)**.

8. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).
6 changes: 4 additions & 2 deletions docs/docusaurus/docs/cloud/connect/connect_postgresql.md
@@ -58,7 +58,9 @@ import Tabs from '@theme/Tabs';

5. Select one or more tables to import as Data Assets.

6. Click **Add x Asset(s)**.
6. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.

7. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).
7. Click **Add x Asset(s)**.

8. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).

6 changes: 4 additions & 2 deletions docs/docusaurus/docs/cloud/connect/connect_snowflake.md
@@ -107,8 +107,10 @@ Then, you can use GX Cloud to [add a Data Asset](/cloud/data_assets/manage_data_

6. Select one or more tables to import as Data Assets.

7. Click **Add x Asset(s)**.
7. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.

8. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).
8. Click **Add x Asset(s)**.

9. Add an Expectation. See [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).


10 changes: 5 additions & 5 deletions docs/docusaurus/docs/cloud/data_assets/manage_data_assets.md
@@ -28,19 +28,19 @@ Define the data you want GX Cloud to access.

3. Select one or more tables to import as Data Assets.

4. Click **Add x Asset(s)**.
4. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.

5. Click **Add x Asset(s)**.

Then you can [add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation) for your new Data Asset.

## View Data Asset metrics

Data Asset metrics provide you with insight into the data you can use for your data validations.
Data Asset metrics provide you with insight into the data you can use for your data validations. When you create a new Data Asset, schema data is automatically fetched.

1. In GX Cloud, click **Data Assets** and then select a Data Asset in the **Data Assets** list.

2. Click the **Overview** tab.

When you select a new Data Asset, schema data is automatically fetched.
2. Click the **Metrics** tab.

3. Optional. Select one of the following options:

28 changes: 12 additions & 16 deletions docs/docusaurus/docs/cloud/expectations/manage_expectations.md
@@ -134,6 +134,10 @@ To clear the Expectation condition, click the clear button located on the right-

7. Optional. Run a Validation. See [Run a Validation](/cloud/validations/manage_validations.md#run-a-validation).

:::tip Automate rules for schema change detection
When you [create a new Data Asset](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source), you can choose to automatically generate Expectations that detect column changes in that Data Asset.
:::
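
As a rough illustration of what "detecting column changes" can mean, the sketch below compares a baseline column list with the columns observed later. It is plain Python and does not call the GX Cloud API. `SchemaDiff`, `diff_columns`, and the sample column names are hypothetical, and the Expectations that GX Cloud generates may check different conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDiff:
    """Hypothetical container for the differences between two column lists."""
    added: list[str]
    removed: list[str]
    reordered: bool


def diff_columns(expected: list[str], observed: list[str]) -> SchemaDiff:
    """Compare an expected column list against the columns observed now."""
    expected_set, observed_set = set(expected), set(observed)
    added = [c for c in observed if c not in expected_set]
    removed = [c for c in expected if c not in observed_set]
    # Report a reorder only when the same columns appear in a different order.
    reordered = not added and not removed and expected != observed
    return SchemaDiff(added=added, removed=removed, reordered=reordered)


# Assumed sample data: the baseline captured when the Data Asset was created,
# and the columns seen on a later run.
baseline = ["id", "email", "created_at"]
current = ["id", "created_at", "signup_source"]
result = diff_columns(baseline, current)
print(result)
```

A check like this would fail a validation run whenever `added` or `removed` is non-empty, which is the kind of signal a schema change alert relies on.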


## Optional. Define a Batch

@@ -143,39 +147,33 @@ If your Data Asset has at least one DATE or DATETIME column, you can define a Ba

2. In the **Data Assets** list, click the Data Asset name.

3. Click the **Expectations** tab.
3. Click **Define batch**.

4. Click **Define batch**.
4. Choose how to **Validate by**. Select the **Entire Asset** tab to provide all Data Asset records to your Expectations and validations, or select one of the **Year**/**Month**/**Day** tabs to use subsets of Data Asset records for your Expectations and validations. **Year** partitions Data Asset records by year, **Month** partitions Data Asset records by year and month, **Day** partitions Data Asset records by year, month, and day.

5. Choose how to **Validate by**. Select the **Entire Asset** tab to provide all Data Asset records to your Expectations and validations, or select one of the **Year**/**Month**/**Day** tabs to use subsets of Data Asset records for your Expectations and validations. **Year** partitions Data Asset records by year, **Month** partitions Data Asset records by year and month, **Day** partitions Data Asset records by year, month, and day.

6. Select the **Batch column** that contains the DATE or DATETIME data to partition on.
5. Select the **Batch column** that contains the DATE or DATETIME data to partition on.
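
The **Year**/**Month**/**Day** partitioning described in these steps can be pictured with a small sketch. This is an assumption-laden illustration in plain Python, not how GX Cloud implements Batches: `partition_records`, `KEY_PARTS`, and the sample rows are hypothetical names and data.

```python
from collections import defaultdict
from datetime import date

# Number of date parts that make up the partition key for each option.
KEY_PARTS = {"year": 1, "month": 2, "day": 3}


def partition_records(records, batch_column, validate_by):
    """Group records by the year, year/month, or year/month/day of batch_column."""
    parts = KEY_PARTS[validate_by]
    batches = defaultdict(list)
    for record in records:
        value = record[batch_column]  # a date or datetime value
        key = (value.year, value.month, value.day)[:parts]
        batches[key].append(record)
    return dict(batches)


# Assumed sample rows with a DATE column named "order_date".
rows = [
    {"order_id": 1, "order_date": date(2024, 11, 30)},
    {"order_id": 2, "order_date": date(2024, 12, 1)},
    {"order_id": 3, "order_date": date(2024, 12, 18)},
]
by_month = partition_records(rows, "order_date", "month")
print(sorted(by_month))
```

With the sample rows, **Month** yields two groups keyed by `(year, month)`, while **Day** would yield one group per distinct date.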

## Edit an Expectation

1. In GX Cloud, click **Data Assets**.

2. In the **Data Assets** list, click the Data Asset name.

3. Click the **Expectations** tab.

4. Click **Edit Expectation** for the Expectation that you want to edit.
3. Click **Edit Expectation** for the Expectation that you want to edit.

5. Edit the Expectation configuration.
4. Edit the Expectation configuration.

6. Click **Save**.
5. Click **Save**.

## Delete an Expectation

1. In GX Cloud, click **Data Assets**.

2. In the **Data Assets** list, click the Data Asset name.

3. Click the **Expectations** tab.
3. Click **Delete Expectation** for the Expectation you want to delete.

4. Click **Delete Expectation** for the Expectation you want to delete.

5. Click **Yes, delete Expectation**.
4. Click **Yes, delete Expectation**.

## GX-managed vs. API-managed Expectations

@@ -197,8 +195,6 @@ Here is a comparison of key characteristics of GX-managed and API-managed Expect
| Expectation Suite | Automatically organized in a hidden default Expectation Suite | Manually grouped into [custom Expectation Suites](/core/define_expectations/organize_expectation_suites.md) via the API |
| Delete | [Delete Expectation](/docs/cloud/expectations/manage_expectations/#delete-an-expectation) with the Cloud UI | [Delete Expectation with the API](/reference/api/ExpectationSuite_class.mdx#great_expectations.ExpectationSuite.delete_expectation) or the Cloud UI |



:::note Hidden resources for GX-managed Expectations
To support GX-managed Expectations, we create resources that you typically won't directly interact with. For example, we create a GX-managed Expectation Suite that we use to organize your Expectations. For some workflows you may need to work with these hidden resources, for example, you may need to [find the name of an automatically created Checkpoint](/cloud/connect/connect_airflow.md#create-a-dag-file-for-your-gx-cloud-checkpoint). But, typically you can ignore the existence of these hidden resources.
:::
3 changes: 2 additions & 1 deletion docs/docusaurus/docs/cloud/overview/gx_cloud_overview.md
@@ -46,13 +46,14 @@ There are a variety of GX Cloud features that support additional enhancements to

* **Data Asset profiling.** GX Cloud introspects your data schema by default on Data Asset creation, and also offers one-click fetching of additional descriptive metrics including column type and statistical summaries. Data profiling results are used to suggest parameters for Expectations that you create.

* **Automate schema change detection.** GX Cloud can automatically generate Expectations that detect column changes. This option is available when [you create new Data Assets](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source).

* **Schedule Validations.** GX Cloud enables you to schedule validations, so that you can test and assess your data on a regular cadence and monitor data quality over time. See [Manage schedules](/cloud/schedules/manage_schedules.md) for more detail.

* **Alerting.** GX Cloud provides the ability to send alerts when validations fail, enabling your organization to remain proactively aware of the health of your Data Assets. See [Manage alerts](/cloud/alerts/manage_alerts.md) for more detail.




## GX Cloud architecture

GX Cloud architecture comprises a frontend web UI, storage for entity configuration and metadata, a backend application, and a Python client.
@@ -1,6 +1,6 @@
<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" version="24.8.6" pages="2">
<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36" version="24.8.6" pages="2">
<diagram name="workflows" id="D7wG57QJHLXb0QdnoCGA">
<mxGraphModel dx="1307" dy="1107" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1100" pageHeight="850" math="0" shadow="0">
<mxGraphModel dx="1385" dy="1081" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1100" pageHeight="850" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
@@ -83,7 +83,7 @@
<mxGeometry x="657.05" y="120" width="160.2" height="120" as="geometry" />
</mxCell>
<mxCell id="-vgSHcs_DyUJYBKirj9j-12" value="Schedule recurring Validations" style="rounded=0;whiteSpace=wrap;html=1;fontSize=16;strokeWidth=3;strokeColor=none;fontStyle=0;fillColor=#aae1da;" parent="1" vertex="1">
<mxGeometry x="757.05" y="450" width="120" height="80" as="geometry" />
<mxGeometry x="757.05" y="440" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="MKeaaogaEU1yrKxKLzHu-5" value="" style="rounded=0;whiteSpace=wrap;html=1;fontSize=16;strokeWidth=3;strokeColor=#FF6310;fillColor=none;sketch=1;curveFitting=1;jiggle=2;" parent="1" vertex="1">
<mxGeometry x="20" y="120" width="117.25" height="120" as="geometry" />
@@ -158,7 +158,19 @@
<mxGeometry x="20" y="560" width="140" height="120" as="geometry" />
</mxCell>
<mxCell id="-vgSHcs_DyUJYBKirj9j-15" value="" style="rounded=0;whiteSpace=wrap;html=1;fontSize=16;strokeWidth=3;strokeColor=#3fa298;fontStyle=0;sketch=1;curveFitting=1;jiggle=2;fillColor=none;" parent="1" vertex="1">
<mxGeometry x="757.05" y="450" width="120" height="80" as="geometry" />
<mxGeometry x="757.05" y="440" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="eltq8oMHbreIp49xVEPe-5" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=block;endFill=1;strokeWidth=3;strokeColor=#404041;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;fontSize=16;" edge="1" parent="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="483" y="480" as="sourcePoint" />
<mxPoint x="523" y="480" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="eltq8oMHbreIp49xVEPe-12" value="&lt;span id=&quot;docs-internal-guid-0e7e019a-7fff-dd0c-b0b4-b6bef8b01081&quot;&gt;&lt;span style=&quot;font-family: Arial, sans-serif; background-color: transparent; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-variant-position: normal; font-variant-emoji: normal; vertical-align: baseline; white-space-collapse: preserve;&quot;&gt;&lt;font style=&quot;font-size: 16px;&quot;&gt;Automate schema change detection&lt;/font&gt;&lt;/span&gt;&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fontSize=16;strokeWidth=3;strokeColor=none;fontStyle=0;fillColor=#aae1da;" vertex="1" parent="1">
<mxGeometry x="525" y="440" width="145" height="80" as="geometry" />
</mxCell>
<mxCell id="eltq8oMHbreIp49xVEPe-11" value="" style="rounded=0;whiteSpace=wrap;html=1;fontSize=16;strokeWidth=3;strokeColor=#3fa298;fontStyle=0;sketch=1;curveFitting=1;jiggle=2;fillColor=none;" vertex="1" parent="1">
<mxGeometry x="525" y="440" width="145" height="80" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
14 changes: 5 additions & 9 deletions docs/docusaurus/docs/cloud/schedules/manage_schedules.md
@@ -4,7 +4,7 @@ title: 'Manage schedules'
description: Create and manage schedules for Validations in GX Cloud.
---

Use a schedule to automate data quality checks with GX-managed Expectations. When you add your first Expectation in the GX Cloud UI for a Data Asset, we enable a default schedule for that Asset's GX-managed Expectations. By default, GX-managed Expectations are scheduled to run every 24 hours. The first run will be at the start of the next hour after you add your first Expectation in the Cloud UI. You can keep the default schedule, edit it, or disable it.
Use a schedule to automate data quality checks with GX-managed Expectations. When you add your first Expectation in the GX Cloud UI for a Data Asset, including when you choose to auto-generate Expectations to detect schema changes, we enable a default schedule for that Asset's GX-managed Expectations. By default, GX-managed Expectations are scheduled to run every 24 hours. The first run will be at the start of the next hour after you add your first Expectation in the Cloud UI. You can keep the default schedule, edit it, or disable it.
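
To make the default timing concrete, here is a minimal sketch of the rule described above: the first run at the start of the next hour, then one run every 24 hours. It is plain Python for illustration only. `default_run_times` is a hypothetical helper, and it assumes that an Expectation added exactly on the hour is first run one hour later.

```python
from datetime import datetime, timedelta


def default_run_times(first_expectation_added: datetime, runs: int = 3) -> list[datetime]:
    """Return the first few run times: top of the next hour, then every 24 hours."""
    top_of_hour = first_expectation_added.replace(minute=0, second=0, microsecond=0)
    first_run = top_of_hour + timedelta(hours=1)
    return [first_run + timedelta(hours=24 * i) for i in range(runs)]


# Assumed example: the first Expectation is added at 14:25 on Dec 18, 2024.
times = default_run_times(datetime(2024, 12, 18, 14, 25))
for t in times:
    print(t.isoformat())
```

In this example the first run lands at 15:00 the same day, and later runs repeat at 15:00 on following days until the schedule is edited or disabled.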

:::note Schedules are for GX-managed Expectations only
To automate data quality checks for [API-managed Expectations](/cloud/expectations/manage_expectations.md#gx-managed-vs-api-managed-expectations), use an [orchestrator](/cloud/connect/connect_airflow.md).
@@ -17,20 +17,16 @@ To automate data quality checks for [API-managed Expectations](/cloud/expectatio

2. In the **Data Assets** list, click the Data Asset name.

3. Click the **Expectations** tab.
3. In the Scheduling component, click the **Edit Schedule** icon.

4. In the Scheduling component, click the **Edit Schedule** icon.
4. Change the **Frequency** and/or the **Start time** for the first run of the new schedule.

5. Change the **Frequency** and/or the **Start time** for the first run of the new schedule.

6. Click **Save**.
5. Click **Save**.

## Disable a schedule

1. In GX Cloud, click **Data Assets**.

2. In the **Data Assets** list, click the Data Asset name.

3. Click the **Expectations** tab.

4. Pause the schedule using the toggle in the Scheduling component.
3. Pause the schedule using the toggle in the Scheduling component.
22 changes: 9 additions & 13 deletions docs/docusaurus/docs/cloud/validations/manage_validations.md
@@ -22,39 +22,35 @@ To run a validation for an [API-managed Expectation](/cloud/expectations/manage_

2. Click a Data Asset in the **Data Assets** list.

3. Click the **Expectations** tab.
3. Click **Validate**.

4. Click **Validate**.
4. When the confirmation message appears, click **See results**, or click the **Validations** tab and select the Validation in the **Batches & run history** pane.

5. When the confirmation message appears, click **See results**, or click the **Validations** tab and select the Validation in the **Batches & run history** pane.

6. Optional. Click **Share** to copy the URL for the Validation Results and share them with other users in your organization.
5. Optional. Click **Share** to copy the URL for the Validation Results and share them with other users in your organization.

## Run a Validation on a subset of a Data Asset

If you've [defined a Batch](/cloud/expectations/manage_expectations.md#optional-define-a-batch), you can run a Validation on the latest Batch of data, or you can select a specific year, year and month, or year, month, and day period for the Validation. If a Batch is defined, Batch information appears on the Data Asset **Overview** page and on the **Validations** page in the **Batches & run history** pane.
If you've [defined a Batch](/cloud/expectations/manage_expectations.md#optional-define-a-batch), you can run a Validation on the latest Batch of data, or you can select a specific year, year and month, or year, month, and day period for the Validation. If a Batch is defined, Batch information appears on the Data Asset **Metrics** page and on the **Validations** page in the **Batches & run history** pane.

To run a Validation for a specific Batch, do the following:

1. In GX Cloud, click **Data Assets**.

2. Click a Data Asset in the **Data Assets** list.

3. Click the **Expectations** tab.

4. Click **Validate**.
3. Click **Validate**.

5. Select one of the following options:
4. Select one of the following options:

- **Latest** - Run the Validation on the latest Batch of data.

- **Custom** - Select the **year**, **month**, or **day** to run the Validation on a Batch of data for a specific period.

6. Click **Validate**.
5. Click **Run**.

7. When the confirmation message appears, click **See results**, or click the **Validations** tab and select the Validation in the **Batches & run history** pane.
6. When the confirmation message appears, click **See results**, or click the **Validations** tab and select the Validation in the **Batches & run history** pane.

8. Optional. Click **Share** to copy the URL for the Validation Results and share them with other users in your organization.
7. Optional. Click **Share** to copy the URL for the Validation Results and share them with other users in your organization.

## View Validation run history
