Demo Scripts - Quickstart, Idempotency #9002
❌ Deploy Preview for niobium-lead-7998 failed.
# <snippet name="tutorials/quickstart/quickstart.py connect_to_data">
batch = context.sources.pandas_default.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)

# What happened in the background?
# Datasource: "default_pandas_datasource"
# Asset: "#ephemeral_pandas_asset" -- CSVAsset (path lives here)
# BatchConfig (Splitters): No Splitters
# BatchOptions: (none needed)

# TODO: ticket We can also use a SQL query as a data source
context.sources.add_postgresql(
    name="postgresql", connection_string="postgresql://localhost"
)
batch = context.sources.postgresql.query_batch("SELECT * FROM taxi LIMIT 1000")
# </snippet>
My original thought was to have the connect to data code, along with any related testing/verification, wrapped in a function that returns whatever is needed for the rest of the script (one function for each datasource demonstrated), and then call each of those functions once before proceeding with the rest of the example script.
Because snippets automatically truncate excess spacing when they are imported into the docs, it won't matter that there is "extra" indentation from the snippet existing in a function rather than in the top level namespace, at least insofar as the docs are concerned.
Then, we would just name each snippet something like:
snippet name="tutorials/quickstart/quickstart.py connect_to_data <specific datasource>"
and reference the appropriate <specific datasource> in each tab in the docs.
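The indentation point can be illustrated with a minimal sketch. It assumes the docs tooling behaves roughly like Python's `textwrap.dedent` when it lifts a snippet out of a function body. `extract_snippet` and the shortened snippet name are hypothetical, written only for this illustration; they are not the project's actual snippet tooling.

```python
import textwrap

# Hypothetical stand-in for an example script; not the real quickstart file.
SCRIPT = '''
def connect_to_data_pandas_csv():
    # <snippet name="connect_to_data pandas_csv">
    batch = context.sources.pandas_default.read_csv(
        "data.csv"
    )
    # </snippet>
    return batch
'''


def extract_snippet(source: str, name: str) -> str:
    """Return the lines between the named snippet tags, minus shared indentation."""
    open_tag = f'# <snippet name="{name}">'
    close_tag = "# </snippet>"
    collected = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == open_tag:
            inside = True  # start collecting on the line after the opening tag
            continue
        if inside and stripped == close_tag:
            break  # stop at the closing tag
        if inside:
            collected.append(line)
    # dedent removes the indentation common to every collected line
    return textwrap.dedent("\n".join(collected))


snippet = extract_snippet(SCRIPT, "connect_to_data pandas_csv")
print(snippet)
```

Under that assumption, the extracted text starts at column zero even though the snippet sits inside a function, while the relative indentation of the argument line is preserved.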
Example:
def connect_to_data_pandas_csv():
    # <snippet name="tutorials/quickstart/quickstart.py connect_to_data pandas_csv">
    batch = context.sources.pandas_default.read_csv(
        "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
    )
    # </snippet>
    return batch


batch = connect_to_data_pandas_csv()
# BatchOptions: (none needed)

batch = context.sources.postgresql.add_query_asset(
    name="top1000" "SELECT * FROM taxi LIMIT 1000"
nit: missing a comma between the name and the query.
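To show why the missing comma matters: Python joins adjacent string literals into one string, so the call in the diff would pass a single merged value as `name` and no query at all. The sketch below uses a hypothetical stand-in function rather than the real `add_query_asset`; the `query` keyword and its default are assumptions made for the illustration.

```python
def add_query_asset(name: str, query: str = "<missing>") -> dict:
    """Hypothetical stand-in that only records the arguments it receives."""
    return {"name": name, "query": query}


# Without the comma, the two literals are concatenated into one argument.
merged = add_query_asset(name="top1000" "SELECT * FROM taxi LIMIT 1000")

# With the comma (and an explicit keyword), the arguments stay separate.
fixed = add_query_asset(name="top1000", query="SELECT * FROM taxi LIMIT 1000")

print(merged)
print(fixed)
```

No error is raised in the merged case, which is what makes this kind of slip easy to miss in review.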
def get_quickstart_batch(datasource_type: QuickstartDatasourceTabs) -> Batch:
    if datasource_type == QuickstartDatasourceTabs.PANDAS_DEFAULT:
        # <snippet name="tutorials/quickstart/quickstart.py connect_to_data pandas_csv">
        batch = context.sources.pandas_default.read_csv(
We need to update read_csv to return a batch.
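The shape of the function in the diff can be sketched with placeholders. Because `get_quickstart_batch` is annotated to return a `Batch`, every branch has to produce one, which is the reason `read_csv` would need to return a batch. Everything below is a stand-in: the `Batch` dataclass, the `SQL` member, and the branch bodies are hypothetical and do not call the Great Expectations API.

```python
from dataclasses import dataclass
from enum import Enum


class QuickstartDatasourceTabs(Enum):
    PANDAS_DEFAULT = "pandas_default"  # member name taken from the diff
    SQL = "sql"  # hypothetical second tab, added for illustration


@dataclass
class Batch:
    """Placeholder for a batch object; not the library's Batch class."""

    source: str


def get_quickstart_batch(datasource_type: QuickstartDatasourceTabs) -> Batch:
    """Return one batch per docs tab, so the rest of the script is shared."""
    if datasource_type == QuickstartDatasourceTabs.PANDAS_DEFAULT:
        # In the real script this branch would wrap the pandas_csv snippet.
        return Batch(source="csv")
    if datasource_type == QuickstartDatasourceTabs.SQL:
        # In the real script this branch would wrap a SQL snippet.
        return Batch(source="query")
    raise ValueError(f"Unsupported datasource type: {datasource_type}")


batch = get_quickstart_batch(QuickstartDatasourceTabs.PANDAS_DEFAULT)
print(batch)
```

A test runner could then call the function once per enum member and run the remainder of the example against each returned batch.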
# <snippet name="tutorials/quickstart/quickstart.py import_gx">
import great_expectations as gx
import great_expectations.expectations as gxe
We need to move the expectations into this namespace.
bc04619 to 334ea64
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: T Pham <[email protected]>
…cusaurus (#9214) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: T Pham <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: T Pham <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…Creating an Asset (#9240)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Anthony Burdi <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Bill Dirks <[email protected]>
Co-authored-by: William Shin <[email protected]>
Co-authored-by: Allen Sallinger <[email protected]>
…in Expectation and ExpectationSuite (#9270)
Closing. This has all been merged into docs / updated based on subsequent timber work.
invoke lint (uses black + ruff)
For more information about contributing, see Contribute.
After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!