Merge branch 'm/zelda-1183/zelda-1188/adding-databricks-types' of https://github.com/great-expectations/great_expectations into m/zelda-1183/zelda-1188/adding-databricks-types

* 'm/zelda-1183/zelda-1188/adding-databricks-types' of https://github.com/great-expectations/great_expectations:
  [MAINTENANCE] Allow `CheckpointResult` and `ActionContext` to be importable from top-level checkpoint module (#10788)
  [DOCS] Custom Actions (#10772)
  [DOCS] Remove unnecessary escape character in Expectation for Gallery (#10780)
  [MAINTENANCE] Deprecate `DataContext.add_or_update_datasource` (#10784)
Shinnnyshinshin committed Dec 18, 2024
2 parents 4525003 + c06a069 commit 58ff217
Showing 17 changed files with 168 additions and 30 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -306,7 +306,7 @@ jobs:
# We decided to exclude all external HTTP requests except the ones under the domain greatexpectations.io
# The reason is to avoid network errors such as pages that throw 429 after too many requests (like GitHub)
# and to prevent other possible errors related to user agent or lychee capturing hrefs from metadata that don't resolve to a specific page (preconnects in JS)
args: "--exclude='http.*' --include='^https://(.+\\.)?greatexpectations\\.io/' 'docs/docusaurus/build/**/*.html'"
args: "--exclude='http.*' 'docs/docusaurus/build/**/*.html'"

docs-tests:
runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion README.md
@@ -70,7 +70,7 @@ To ensure the long-term quality of the GX Core codebase, we're not yet ready to
| -------------------- | ------------------ | ----- |
| CredentialStore | 🟢 Ready | |
| BatchDefinition | 🟡 Partially ready | Formerly known as splitters |
| Action | 🔴 Not ready | |
| Action | 🟢 Ready | |
| DataSource | 🔴 Not ready | Includes MetricProvider and ExecutionEngine |
| DataContext | 🔴 Not ready | Also known as Configuration Stores |
| DataAsset | 🔴 Not ready | |
15 changes: 8 additions & 7 deletions docs/docusaurus/docs/application_integration_support.md
@@ -25,16 +25,17 @@ The following table defines the GX Cloud, GX Core, and Community Supported integ
| Data Sources<sup>1</sup> | Snowflake<br/>Databricks (SQL)<br/> PostgreSQL<sup>2</sup> | Snowflake<br/>Databricks (SQL)<br/>PostgreSQL<br/>SQLite<br/>BigQuery<br/>Spark<br/>Pandas | MSSQL<br/>MySQL<br/> |
| Configuration Stores<sup>3</sup> | In-app | File system | None |
| Data Doc Stores | In-app | File system | None |
| Actions | Email | Slack <br/>Email <br/>Microsoft Teams | None |
| Credential Stores | Environment variables | Environment variables <br/> YAML<sup>4</sup> | None |
| Orchestrator | Airflow <sup>5</sup> <sup>6</sup> | Airflow <sup>5</sup> <sup>6</sup> | None |
| Actions | Email | Slack <br/>Email <br/>Microsoft Teams <br/>Custom<sup>4</sup> | None |
| Credential Stores | Environment variables | Environment variables <br/> YAML<sup>5</sup> | None |
| Orchestrator | Airflow <sup>6</sup> <sup>7</sup> | Airflow <sup>6</sup> <sup>7</sup> | None |

<sup>1</sup> We've also seen GX work with the following data sources in the past, but we can't guarantee ongoing compatibility. These data sources include ClickHouse, Vertica, Dremio, Teradata, Athena, EMR Spark, AWS Glue, Microsoft Fabric, Trino, Pandas on (S3, GCS, Azure), Databricks (Spark), and Spark on (S3, GCS, Azure).<br/>
<sup>2</sup> Support for BigQuery in GX Cloud will be available in a future release.<br/>
<sup>3</sup> This includes configuration storage for Expectations, Checkpoints, Validation Definitions, and Validation Result<br/>
<sup>4</sup> config_variables.yml<br/>
<sup>5</sup> Although only Airflow is supported, GX Cloud and GX Core should work with any orchestrator that executes Python code.<br/>
<sup>6</sup> Airflow version 2.9.0+ required<br/>
<sup>3</sup> This includes configuration storage for Expectations, Checkpoints, Validation Definitions, and Validation Results.<br/>
<sup>4</sup> We support the general workflow for creating custom Actions but cannot help troubleshoot the domain-specific logic within a custom Action.<br/>
<sup>5</sup> Use `config_variables.yml`.<br/>
<sup>6</sup> Although only Airflow is supported, GX Cloud and GX Core should work with any orchestrator that executes Python code.<br/>
<sup>7</sup> Airflow version 2.9.0+ required.<br/>

### GX components

10 changes: 10 additions & 0 deletions docs/docusaurus/docs/components/examples_under_test.py
@@ -439,6 +439,16 @@
# data_context_dir="",
backend_dependencies=[],
),
# Create a custom Action
IntegrationTestFixture(
# To test, run:
# pytest --docs-tests -k "docs_example_create_a_custom_action" tests/integration/test_script_runner.py
name="docs_example_create_a_custom_action",
user_flow_script="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py",
data_dir="docs/docusaurus/docs/components/_testing/test_data_sets/single_test_file",
# data_context_dir="",
backend_dependencies=[],
),
# Run a Checkpoint
IntegrationTestFixture(
# To test, run:
@@ -0,0 +1,53 @@
"""
This is an example script for how to create a custom Action.
To test, run:
pytest --docs-tests -k "docs_example_create_a_custom_action" tests/integration/test_script_runner.py
"""

# EXAMPLE SCRIPT STARTS HERE:

# <snippet name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - full code example">

from typing import Literal

from typing_extensions import override

from great_expectations.checkpoint import (
ActionContext,
CheckpointResult,
ValidationAction,
)


# 1. Extend the `ValidationAction` class.
# <snippet name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - extend class">
class MyCustomAction(ValidationAction):
# </snippet>

# 2. Set the `type` attribute to a unique string that identifies the Action.
# <snippet name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - set type">
type: Literal["my_custom_action"] = "my_custom_action"
# </snippet>

# 3. Override the `run()` method to perform the desired task.
# <snippet name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - override run">
@override
def run(
self,
checkpoint_result: CheckpointResult,
action_context: ActionContext, # Contains results from prior Actions in the same Checkpoint run.
) -> dict:
# Domain-specific logic
self._do_my_custom_action(checkpoint_result)
# Return information about the Action
return {"some": "info"}

def _do_my_custom_action(self, checkpoint_result: CheckpointResult):
# Perform custom logic based on the validation results.
...

# </snippet>


# </snippet>
@@ -11,14 +11,14 @@ import PrereqValidationDefinition from '../_core_components/prerequisites/_valid

A Checkpoint executes one or more Validation Definitions and then performs a set of Actions based on the Validation Results each Validation Definition returns.

<h2>Prerequisites</h2>
## Prerequisites

- <PrereqPythonInstalled/>.
- <PrereqGxInstalled/>.
- <PrereqPreconfiguredDataContext/>. In this guide the variable `context` is assumed to contain your Data Context.
- <PrereqValidationDefinition/>.

### Procedure
## Procedure

<Tabs
queryString="procedure"
Expand All @@ -40,7 +40,7 @@ A Checkpoint executes one or more Validation Definitions and then performs a set

2. Determine the Actions that the Checkpoint will automate.

After a Checkpoint receives Validation Results from running a Validation Definition, it executes a list of Actions. The returned Validation Results determine what task is performed for each Action. Actions can include updating Data Docs with the new Validation Results or sending alerts when validations fail. The Actions list is executed once for each Validation Definition in a Checkpoint.
After a Checkpoint receives Validation Results from running a Validation Definition, it executes a list of Actions. The returned Validation Results determine what task is performed for each Action. Actions can include updating Data Docs with the new Validation Results, sending alerts when validations fail, or your own [custom logic](/core/trigger_actions_based_on_results/create_a_custom_action.md). The Actions list is executed once for each Validation Definition in a Checkpoint.

Actions can be found in the `great_expectations.checkpoint` module. All Action class names end with `*Action`.

@@ -0,0 +1,65 @@
---
title: Create a custom Action
description: Run custom logic based on Validation Results to integrate with 3rd-party tools and business workflows.
---
import TabItem from '@theme/TabItem';
import Tabs from '@theme/Tabs';

import PrereqPythonInstalled from '../_core_components/prerequisites/_python_installation.md';
import PrereqGxInstalled from '../_core_components/prerequisites/_gx_installation.md';

Great Expectations provides [Actions for common workflows](/application_integration_support.md#integrations) such as sending emails and updating Data Docs. If these don't meet your needs, you can create a custom Action to integrate with different tools or apply custom business logic based on Validation Results. Example use cases for custom Actions include:
- Opening tickets in an issue tracker when Validation runs fail.
- Triggering different webhooks depending on which Expectations fail.
- Running follow-up ETL jobs to fill in missing values.

A custom Action can do anything that can be done with Python code.

To create a custom Action, you subclass the `ValidationAction` class, overriding the `type` attribute with a unique name and the `run()` method with custom logic.


## Prerequisites

- <PrereqPythonInstalled/>.
- <PrereqGxInstalled/>.

## Procedure

<Tabs
queryString="procedure"
defaultValue="instructions"
values={[
{value: 'instructions', label: 'Instructions'},
{value: 'sample_code', label: 'Sample code'}
]}
>
<TabItem value="instructions" label="Instructions">

1. Create a new custom Action class that inherits the `ValidationAction` class.

```python title="Python" name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - extend class"
```

2. Set a unique name for `type`.

```python title="Python" name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - set type"
```

3. Override the `run()` method with the logic for the Action.

```python title="Python" name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - override run"
```

</TabItem>

<TabItem value="sample_code" label="Sample code">

```python title="Python" name="docs/docusaurus/docs/core/trigger_actions_based_on_results/_examples/create_a_custom_action.py - full code example"
```

</TabItem>

</Tabs>

Now you can use your custom Action like you would any built-in Action. [Create a Checkpoint with Actions](/core/trigger_actions_based_on_results/create_a_checkpoint_with_actions.md) to start automating responses to Validation Results.
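
For instance, a minimal sketch of plugging the custom Action into a Checkpoint might look like the following; the module path, Checkpoint name, and Validation Definition name are assumed placeholders, and `MyCustomAction` is the class from the sample code:

```python
import great_expectations as gx

# Hypothetical module path; MyCustomAction is the class defined in the sample code above.
from my_actions import MyCustomAction

context = gx.get_context()
validation_definition = context.validation_definitions.get("my_validation_definition")  # assumed to exist

checkpoint = gx.Checkpoint(
    name="checkpoint_with_custom_action",
    validation_definitions=[validation_definition],
    actions=[MyCustomAction(name="my_custom_action")],  # used exactly like a built-in Action
)
checkpoint.run()
```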
@@ -23,6 +23,14 @@ import OverviewCard from '@site/src/components/OverviewCard';
to="/core/trigger_actions_based_on_results/create_a_checkpoint_with_actions"
icon="/img/expectation_icon.svg"
/>

<LinkCard
topIcon
label="Create a custom Action"
description="Define custom logic to run based on Validation Results."
to="/core/trigger_actions_based_on_results/create_a_custom_action"
icon="/img/expectation_icon.svg"
/>

<LinkCard
topIcon
1 change: 1 addition & 0 deletions docs/docusaurus/sidebars.js
@@ -108,6 +108,7 @@ module.exports = {
link: { type: 'doc', id: 'core/trigger_actions_based_on_results/trigger_actions_based_on_results' },
items: [
{ type: 'doc', id: 'core/trigger_actions_based_on_results/create_a_checkpoint_with_actions' },
{ type: 'doc', id: 'core/trigger_actions_based_on_results/create_a_custom_action' },
{ type: 'doc', id: 'core/trigger_actions_based_on_results/choose_a_result_format/choose_a_result_format' },
{ type: 'doc', id: 'core/trigger_actions_based_on_results/run_a_checkpoint' },
]
2 changes: 1 addition & 1 deletion great_expectations/checkpoint/__init__.py
@@ -9,7 +9,7 @@
UpdateDataDocsAction,
ValidationAction,
)
from .checkpoint import Checkpoint
from .checkpoint import ActionContext, Checkpoint, CheckpointResult

for _module_name, _package_name in [
(".actions", "great_expectations.checkpoint"),
9 changes: 5 additions & 4 deletions great_expectations/checkpoint/actions.py
@@ -88,10 +88,11 @@ def _build_renderer(config: dict) -> Renderer:
return renderer


@public_api
class ActionContext:
"""
Shared context for all actions in a checkpoint run.
Note that order matters in the action list, as the context is updated with each action's result.
Shared context for all Actions in a Checkpoint run.
Note that order matters in the Action list, as the context is updated with each Action's result.
"""

def __init__(self) -> None:
@@ -182,10 +183,10 @@ def __new__(cls, clsname, bases, attrs):
@public_api
class ValidationAction(BaseModel, metaclass=MetaValidationAction):
"""
ValidationActions define a set of steps to be run after a validation result is produced.
Actions define a set of steps to run after a Validation Result is produced. Subclass `ValidationAction` to create a [custom Action](/docs/core/trigger_actions_based_on_results/create_a_custom_action).
Through a Checkpoint, one can orchestrate the validation of data and configure notifications, data documentation updates,
and other actions to take place after the validation result is produced.
and other actions to take place after the Validation Result is produced.
""" # noqa: E501

class Config:
@@ -29,6 +29,7 @@
import great_expectations as gx
import great_expectations.exceptions as gx_exceptions
from great_expectations._docs_decorators import (
deprecated_method_or_class,
new_argument,
new_method_or_class,
public_api,
@@ -761,6 +762,7 @@ def add_or_update_datasource(
...

@new_method_or_class(version="0.15.48")
@deprecated_method_or_class(version="1.3.0")
def add_or_update_datasource(
self,
name: str | None = None,
@@ -778,6 +780,12 @@ def add_or_update_datasource(
Returns:
The Datasource added or updated by the input `kwargs`.
""" # noqa: E501
# deprecated-v1.3.0
warnings.warn(
"add_or_update_datasource() from the DataContext is deprecated and will be removed "
"in a future version of GX. Please use `context.data_sources.add_or_update` instead.",
category=DeprecationWarning,
)
self._validate_add_datasource_args(name=name, datasource=datasource)
return_datasource: FluentDatasource

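A hedged migration sketch for the deprecation above, assuming a Pandas Data Source (the fluent `add_or_update_pandas` call mirrors the test updates later in this commit; other backends expose analogous `add_or_update_*` methods):

```python
import great_expectations as gx

context = gx.get_context()
datasource = context.data_sources.add_pandas(name="my_pandas_datasource")

# Deprecated as of 1.3.0 and now emits a DeprecationWarning:
# datasource = context.add_or_update_datasource(datasource=datasource)

# Preferred: use the fluent data_sources API instead.
datasource = context.data_sources.add_or_update_pandas(datasource=datasource)
```
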
Expand Up @@ -39,7 +39,7 @@ def datasource(
datasource.name == new_datasource_name
), "The datasource was not updated in the previous method call."
datasource.name = datasource_name
datasource = context.add_or_update_datasource( # type: ignore[assignment]
datasource = context.data_sources.add_or_update_pandas(
datasource=datasource,
)
assert (
@@ -75,7 +75,7 @@ def datasource(
), "The datasource was not updated in the previous method call."

datasource.base_directory = original_base_dir
datasource = context.add_or_update_datasource(datasource=datasource) # type: ignore[assignment]
datasource = context.data_sources.add_or_update_pandas_filesystem(datasource=datasource)
assert (
datasource.base_directory == original_base_dir
), "The datasource was not updated in the previous method call."
10 changes: 1 addition & 9 deletions tests/integration/cloud/end_to_end/test_spark_datasource.py
@@ -36,22 +36,14 @@ def datasource(
persist=True,
)
datasource.persist = False
datasource = context.data_sources.add_or_update_spark(datasource=datasource) # type: ignore[call-arg]
assert (
datasource.persist is False
), "The datasource was not updated in the previous method call."
datasource.persist = True
datasource = context.add_or_update_datasource(datasource=datasource) # type: ignore[assignment]
assert datasource.persist is True, "The datasource was not updated in the previous method call."
datasource.persist = False
datasource_dict = datasource.dict()
datasource = context.data_sources.add_or_update_spark(**datasource_dict)
assert (
datasource.persist is False
), "The datasource was not updated in the previous method call."
datasource.persist = True
datasource_dict = datasource.dict()
datasource = context.add_or_update_datasource(**datasource_dict) # type: ignore[assignment]
datasource = context.data_sources.add_or_update_spark(**datasource_dict)
assert datasource.persist is True, "The datasource was not updated in the previous method call."
return datasource

@@ -68,7 +68,7 @@ def datasource(
), "The datasource was not updated in the previous method call."

datasource.base_directory = normalize_directory_path(original_base_dir, context.root_directory)
datasource = context.add_or_update_datasource(datasource=datasource) # type: ignore[assignment]
datasource = context.data_sources.add_or_update_spark_filesystem(datasource=datasource)
assert (
datasource.base_directory == original_base_dir
), "The datasource was not updated in the previous method call."
1 change: 0 additions & 1 deletion tests/test_deprecation.py
@@ -30,7 +30,6 @@ def files_with_deprecation_warnings() -> List[str]:
"great_expectations/compatibility/pyspark.py",
"great_expectations/compatibility/sqlalchemy_and_pandas.py",
"great_expectations/compatibility/sqlalchemy_compatibility_wrappers.py",
"great_expectations/rule_based_profiler/altair/encodings.py", # ignoring because of imprecise matching logic # noqa: E501
]
for file_to_exclude in files_to_exclude:
if file_to_exclude in files: