
Extending Databricks step launcher config schema to expose policy ID for new clusters #16322

Closed
wants to merge 150 commits into from

Conversation

zyd14
Contributor

@zyd14 zyd14 commented Sep 5, 2023

Summary & Motivation

Community request to update the config schema to expose attributes that have been added to the Databricks API. Allowing users to set a policy ID helps them enforce that certain attributes are configured on a cluster, improving consistency across jobs and teams and simplifying the configuration needed to launch a cluster.

How I Tested These Changes

Spun up a local Dagster instance with dagster==1.4.11 and my branch of dagster-databricks and submitted a job with a policy ID. Confirmed the policy was used in the job that was launched. Also extended the unit tests to ensure policy IDs are added to the new cluster configuration as expected
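A minimal sketch of what supplying a cluster policy ID through the step launcher's run config might look like. The nesting and key names here are assumptions for illustration, not the released `dagster-databricks` schema:

```python
# Hypothetical run config passing a cluster policy ID to the Databricks
# step launcher; key names are illustrative, not the released schema.
run_config = {
    "resources": {
        "pyspark_step_launcher": {
            "config": {
                "run_config": {
                    "cluster": {
                        "new": {
                            "size": {"num_workers": 2},
                            "spark_version": "12.2.x-scala2.12",
                            # Policy enforcing org-wide cluster settings
                            "policy_id": "ABC123POLICY",
                        }
                    }
                }
            }
        }
    }
}

new_cluster = run_config["resources"]["pyspark_step_launcher"]["config"][
    "run_config"
]["cluster"]["new"]
print(new_cluster["policy_id"])
```

With a policy attached, Databricks validates the rest of the cluster config against the policy at launch time, so teams can omit settings the policy already pins down.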

erinkcochran87 and others added 28 commits September 15, 2023 09:21
## Summary & Motivation

This PR adds the Capstone project of the Dagster Essentials module to
DagsterU.

## How I Tested These Changes

👀 , 👻

---------

Co-authored-by: Tim Castillo <[email protected]>
## Summary & Motivation

Databricks has deprecated the `dbfs` method of referencing init scripts,
and now most documentation suggests that `workspace` or `volume` init
scripts be used instead. This conversation on slack indicates community
interest in having these options enabled. I'll work on a more
full-featured PR to include some of the other additions to the
`jobs/runs/submit` endpoint this week.

## How I Tested These Changes

Spun up a local Dagster instance with `dagster==1.4.11` and my version
of `dagster-databricks` installed. Launched jobs with config referencing
init scripts in volumes and workspace. Confirmed that init scripts were
configured in cluster config

Note - this includes commit cherry-picked from [this
PR](dagster-io#16311), necessary to get
the step launcher working again.
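A sketch of the two non-deprecated init script styles mentioned above, as they might appear in a new-cluster config. The exact key names are assumptions based on the Databricks API shape, not the released schema:

```python
# Hypothetical cluster config referencing workspace and volume init
# scripts instead of the deprecated dbfs style; keys are illustrative.
cluster_config = {
    "init_scripts": [
        {"workspace": {"destination": "/Users/someone/init.sh"}},
        {"volumes": {"destination": "/Volumes/catalog/schema/vol/init.sh"}},
    ]
}
print(len(cluster_config["init_scripts"]))
```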

---------

Co-authored-by: Sean Mackesey <[email protected]>
## Summary & Motivation

Use backfill page status tag in the backfill table row for asset
backfills. This is to make the status consistent for these backfills.

<img width="1125" alt="Screenshot 2023-09-01 at 12 48 47 PM"
src="https://github.com/dagster-io/dagster/assets/2823852/0d3f22f6-44bf-4833-81dd-be46b1cb9d67">
<img width="1120" alt="Screenshot 2023-09-01 at 12 35 12 PM"
src="https://github.com/dagster-io/dagster/assets/2823852/50b1b26d-d2e0-48e6-97e7-97b79e3b0497">


## How I Tested These Changes

Kick off an asset backfill, cancel it partway through. Verify that it
appears as "Complete" in the table and on the backfill page.
…ailed to hash during the scheduler tick (dagster-io#16323)

Summary:
Setting this to a set instead of a list caused it to try to hash the
namedtuple, which fails if the cron schedule is set to a list (which
does not hash) rather than a string (which does)

Test Plan:
Modified test case that was failing before, now passes

## Summary & Motivation

## How I Tested These Changes
…6318)

update the default context injector to handle loading the context either directly from an env var or from a file

update the default message writer to support writing to standard streams or to a file

## How I Tested These Changes

existing tests
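The dual-mode context loading described above can be sketched roughly as follows. The env var names here are made up for illustration; the actual variable names live in the `dagster-ext` implementation:

```python
import json
import os

# Sketch of dual-mode context loading: read the context payload either
# directly from an env var, or from a file whose path is named by an
# env var. Variable names are assumptions, not the real protocol.
def load_context():
    if "EXT_CONTEXT" in os.environ:
        return json.loads(os.environ["EXT_CONTEXT"])
    path = os.environ.get("EXT_CONTEXT_PATH")
    if path:
        with open(path) as f:
            return json.load(f)
    raise RuntimeError("no context source configured")

os.environ["EXT_CONTEXT"] = json.dumps({"run_id": "abc"})
print(load_context()["run_id"])
```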
…gster-io#16325)

## Summary & Motivation
Change the following:

```
dagster._core.errors.DagsterInvalidDefinitionError: resource with key 'dbt' required by op 'sca_custom_assets' was not provided. Please provide a <class 'dagster._core.definitions.resource_definition.ResourceDefinition'> to key 'dbt', or change the required key to one of the following keys which points to an <class 'dagster._core.definitions.resource_definition.ResourceDefinition'>: ['io_manager']
```

to:

```
dagster._core.errors.DagsterInvalidDefinitionError: resource with key 'dbt' required by op 'sca_custom_assets' was not provided. Please provide a ResourceDefinition to key 'dbt', or change the required key to one of the following keys which points to an ResourceDefinition: ['io_manager']
```

The "or change the required key..." should probably be changed as well.
Seeing the magic string `"io_manager"` here doesn't make sense, and
recommendations to use an IOManager key should only be made if the
current resource requirement is an io manager.

## How I Tested These Changes
local
…ster-io#16285)

## Summary & Motivation

Self-partitioned assets can cause a recursion error during stale status
resolution if there are too many partitions. This is because resolution
of the Nth partition will recurse back to the 1st partition, since this
is within its ancestor tree.

This PR adds a threshold number of dependencies beyond which no
staleness status will be derived for self-partitioned assets.
This follows the existing approach of opting out of staleness
calculation for partitions with large numbers of "horizontal"
dependencies.

See:
https://elementl-workspace.slack.com/archives/C03D094F23G/p1693573479147479

## How I Tested These Changes

New unit test.
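The opt-out described above amounts to a simple guard on the partition count. The threshold value below is an assumption for illustration, not the constant used in the PR:

```python
# Illustrative guard: skip staleness resolution for self-dependent
# assets whose partition dependency count exceeds a threshold.
# The threshold value here is assumed, not the one in the PR.
SELF_DEP_THRESHOLD = 100

def should_resolve_staleness(num_self_dep_partitions: int) -> bool:
    return num_self_dep_partitions <= SELF_DEP_THRESHOLD

print(should_resolve_staleness(50), should_resolve_staleness(5000))
```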
## Summary & Motivation

This PR adds the Extra Credit lesson of the Dagster Essentials module to
DagsterU.

## How I Tested These Changes

👀 , 🕺
Add severity on AssetCheckResults, and remove for this week the ability
to make downstream assets wait on a check.

Luckily the UI is already well equipped for this; I just had to remove
the severity column:
![Screenshot 2023-09-05 at 3 47 41
PM](https://github.com/dagster-io/dagster/assets/22018973/940ba872-6004-4835-955d-78ba99079d43)

More context:
https://elementl-workspace.slack.com/archives/C05KFG9TETB/p1693929333653329
…agster-io#16317)

Summary:
The uvicorn INFO logs are actually quite verbose, logging the http
output for each request, so the default log level introduced in
https://github.com/dagster-io/dagster/pull/16239/files makes things too
spammy.

## Summary & Motivation

## How I Tested These Changes
## Summary & Motivation

Similar to the newly added "asset feature" context provider, this adds a
"job feature" context provider that allows setting the tabs on the Job
page, as well as setting a fallthrough route to handle custom pages on
the Job permalink.

## How I Tested These Changes

Verified that the Job page loads successfully in OSS. Same for snapshot
pages. Navigate to tabs, verify that routes load properly.

Verified that it also loads successfully in Cloud, including with custom
tab and route behavior.
Context:
https://elementl-workspace.slack.com/archives/C05KFG9TETB/p1693931080865119?thread_ts=1693924159.102829&cid=C05KFG9TETB

This is a breaking change that will always require `@asset_check(` to
have an `asset`. I think this isn't too bad ergonomically, though it
does mean repeating in the io manager case:

```python
@asset_check(asset=my_asset)
def my_check(my_asset):
  ...
```

At present I think this is required, because we use asset groups for
resolving keys to the fully prefixed keys, and we don't know the group
of a check until we resolve which asset it points at.
## Summary & Motivation

Use latest yarn, check in yarn.lock.

## How I Tested These Changes

`yarn`
…ild time (dagster-io#16043)

## Summary & Motivation
Adds documentation about:

- why we need a `manifest.json`
- the different options for creating a `manifest.json`
- how these options show up in `dagster-dbt project scaffold`
- our recommendations for development and production 

This removes the last references to
`load_assets_from_dbt_project/manifest`. Instead, they've been moved to
the `Relevant API's` section, with references to their alternatives.

## How I Tested These Changes
read
…ct in CI/CD (dagster-io#16212)

## Summary & Motivation
Basically includes
dagster-io#13767 as part of the
dbt reference docs.

Also include a snippet of how Dagster Cloud streamlines the monorepo
case.

## How I Tested These Changes
read
…agster-io#16290)

## Summary & Motivation

This PR provides an initial UI for asset checks on the DAG and in the
asset sidebar. It also improves the rendering of asset checks in the run
logs (linking to check, asset key, showing run metadata)

I also updated the subscription to run logs on the asset graph pages so
that the asset graph will live reload when long-running asset checks
complete.


<img width="730" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/24e7b939-e04c-4f5c-bcf3-344313fb1bba">
<img width="1060" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/49e11656-9f65-40d6-91cf-2998c9fcf604">
<img width="1109" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/54cde88d-5074-484d-bb74-3ecd947a3521">



## How I Tested These Changes

I tested these changes manually using the toys repo and by updating our
storybooks for AssetNode and the asset sidebar to run through test
cases.

---------

Co-authored-by: bengotow <[email protected]>
Co-authored-by: Johann Miller <[email protected]>
## Summary & Motivation
Fixes dagster-io#16126 about enabling oauth authentication for Databricks service
principals
## How I Tested These Changes
Unit tests + manually tested that Databricks steps can be launched via
the step launcher using either a personal access token or oauth client
ID + client secret.

During this manual testing I also uncovered a few bugs around the
`submit()` call to the `databricks-sdk` which were preventing jobs from
being launched (databricks-sdk requires a [task_key
parameter](dagster-io@7d2466c),
and reading a file that doesn't exist now raises a
[DatabricksError](dagster-io@fd91538)).
Currently the step launcher is broken as-is and will not work without
these changes, so if this whole PR doesn't get merged before the next
release then those two commits should probably at least get
cherry-picked into a different PR and merged in. I should also note that
it appears that the most recent release 0.20.10 works, and that the
breaking changes must have been made since the last release.
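The two auth modes exercised in the manual testing above can be sketched as a small helper. The key names are assumptions for illustration, not the `dagster-databricks` config schema:

```python
# Sketch of the two Databricks auth modes described above: a personal
# access token, or an oauth client id/secret pair for a service
# principal. Key names are illustrative assumptions.
def build_auth_config(token=None, client_id=None, client_secret=None):
    """Return the auth portion of a Databricks client config.

    Exactly one of a personal access token or an oauth client
    id/secret pair should be supplied.
    """
    if token is not None:
        return {"token": token}
    if client_id is not None and client_secret is not None:
        return {
            "oauth_client_id": client_id,
            "oauth_client_secret": client_secret,
        }
    raise ValueError("supply a token or a client id/secret pair")

print(build_auth_config(token="dapi123")["token"])
print("oauth_client_id" in build_auth_config(client_id="sp-id", client_secret="s3cret"))
```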

---------

Co-authored-by: Sean Mackesey <[email protected]>
Adds a k8s pod external ext protocol implementation which injects
context via env var and sinks messages using stdout/stderr sifting - a
little greasy but I think a good option to support for an easy "out of
the box" working experience.

This PR also aligns docker to the same pattern. The file IO set-ups that
have been made to work are preserved under test, sort of demonstrating
what it would look like to do it "in user land".

`container_kwargs` in docker and `base_pod_spec` / `base_pod_meta` in
k8s have been added as catch-all escape hatch arguments that allow
passing any random thing from the orch layer through to the target
substrate.


## How I Tested These Changes

added tests
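The `base_pod_spec` escape hatch described above boils down to laying user-supplied fields over the defaults the launcher builds. A shallow-merge sketch (the real implementation may merge more deeply; the field names below are illustrative k8s pod-spec keys):

```python
# Sketch of the "escape hatch" pattern: user-supplied pod spec fields
# are overlaid on the defaults the launcher constructs. A real merge
# would likely be recursive; this shallow version shows the idea.
def merge_pod_spec(base: dict, overrides: dict) -> dict:
    merged = dict(base)
    merged.update(overrides)
    return merged

base = {"restartPolicy": "Never", "containers": [{"name": "main"}]}
user = {"nodeSelector": {"disktype": "ssd"}}
print(sorted(merge_pod_spec(base, user)))
```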
currently we hide the exceptions we raise

## How I Tested These Changes
ctrl+C
```
2023-09-06 10:07:48 -0500 - dagster - INFO - KeyboardInterrupt received
2023-09-06 10:07:48 -0500 - dagster - INFO - Shutting down Dagster services...
2023-09-06 10:07:48 -0500 - dagster.daemon - INFO - Daemon threads shut down.
2023-09-06 10:07:48 -0500 - dagster - INFO - Dagster services shut down.
```
kill daemon / webserver process

```
2023-09-06 10:08:21 -0500 - dagster - ERROR - An unexpected exception has occurred
Traceback (most recent call last):
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/_cli/dev.py", line 192, in dev_command
    raise Exception(
Exception: dagster-daemon process shut down unexpectedly with return code 0
2023-09-06 10:08:21 -0500 - dagster - INFO - Shutting down Dagster services...
2023-09-06 10:08:21 -0500 - dagster - INFO - Dagster services shut down.
```
## Summary & Motivation

Condenses `dagster-ext` into a single file `dagster_ext/__init__.py` so
it can be vendored easily.

I didn't go all the way to just `dagster_ext.py` so that I could leave
`dagster_ext/version.py` and `dagster_ext/py.typed`. It should be just
as easy to copy-paste the file contents from `__init__.py`.

## How I Tested These Changes

Existing test suite.
…6340)

## Summary & Motivation
Same as dagster-io#14777. Let the commit
history be explicit about what is being edited in the base jaffle shop
project to support asset checks and pytest assertions against dbt
configuration.

## How I Tested These Changes
N/A
…16324)

## Summary & Motivation
As the title.

Currently, users will have to opt in to modelling their dbt tests as
Dagster asset checks. This is because the APIs for asset checks are
currently experimental, and we do not want to break users of the public
stable APIs for ingesting dbt models as Dagster software-defined
assets.

So, to enable this feature, users will have to configure the following
metadata on their dbt project tests:

```yaml
# dbt_project.yml
tests:
  +meta:
    dagster:
      asset_check: True
```

Or, tests can be individually turned on using the dbt config block or
property file:
https://docs.getdbt.com/reference/test-configs#test-specific-configurations.

## How I Tested These Changes
pytest
## Summary & Motivation

duckdb.InvalidInputException: Invalid Input Error: Required module 'pandas.core.arrays.arrow.dtype' failed to import, due to the following Python exception:
ModuleNotFoundError: No module named 'pandas.core.arrays.arrow.dtype'

## How I Tested These Changes

https://elementl-workspace.slack.com/archives/C03E60FPJCU/p1694013481428789
…er-io#16342)

## Summary & Motivation
As the title.

## How I Tested These Changes
local
clairelin135 and others added 25 commits September 15, 2023 09:21
Existing context methods (`context.partition_key_range` etc.) return
`PartitionKeyRange`s, but there are no existing apidocs.

This PR adds apidocs for PartitionKeyRange. It also fixes an outdated
reference to `context.asset_partition_key_range`, which has since been
deprecated.
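`PartitionKeyRange` is essentially an inclusive start/end pair of partition keys. A minimal stand-in mirroring that shape (the real class lives in `dagster`; this is illustrative only):

```python
from typing import NamedTuple

# Minimal stand-in mirroring the shape of Dagster's PartitionKeyRange:
# an inclusive [start, end] pair of partition keys. Illustrative only.
class PartitionKeyRange(NamedTuple):
    start: str
    end: str

r = PartitionKeyRange(start="2023-09-01", end="2023-09-05")
print(r.start, r.end)
```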
## Summary & Motivation

Floats are imprecise... who would have thought.

## How I Tested These Changes

Was able to repro original issue locally
## Summary & Motivation

This PR adds the "Evaluate checks" button to the right sidebar of the
asset DAG and to the asset checks page. On the asset checks page, you
can invoke them one by one or all together.

This obeys the `canLaunchPipelineExecution` permission - without it, the
button is grayed out with a tooltip explaining why.

Clicking the button gives a loading state with a spinner (similar to
"Materialize") and then you see the "Run launched" banner. Thankfully
this can use the same hook we use for other launch mutations and
triggers a live update to pull in the new states and stuff!

<img width="1412" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/0ad52c8d-7525-448e-a4b4-3de15d51472c">

<img width="1341" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/0dce69a7-f3ea-4dcb-979c-3c90a28e2b7c">

## How I Tested These Changes

I tested this manually with the `asset_checks` toys repo. Will write a
test or two for this as well.

---------

Co-authored-by: bengotow <[email protected]>
To populate the 'target' column of the runs page
**Test plan**
Unit, and played around with toy example. Verified that checks can fail
some assets and not others.
![Screenshot 2023-09-13 at 5 22 35
PM](https://github.com/dagster-io/dagster/assets/22018973/6f58c872-ebe8-44c3-865b-2c8380861843)
![Screenshot 2023-09-13 at 5 22 48
PM](https://github.com/dagster-io/dagster/assets/22018973/bdefa787-f4c0-4e32-8444-2ae1eb1325c5)
Thought I tested this, must not have
## Summary & Motivation

## How I Tested These Changes
…tInterval (dagster-io#16523)

## Summary & Motivation

dagster-io#16245 added opt-in batching
to `useLiveDataForAssetKeys`, but it neglected to thread the custom
refetching logic to `useQueryRefreshAtInterval`, this pr adds that.

## How I Tested These Changes
Locally tested and made sure that when the refresh fires it restarts
batching from index 0 if the batched fetch is already complete
## Summary & Motivation

I've been wanting to do this for a while: simplify the `border` prop on
`Box`.

- Allow a shorthand for specifying the border side, e.g.
`border="bottom"`. This implicitly applies `width: 1, color:
Colors.KeylineGray` for that side.
- Make `width` and `color` optional values in the `BorderSetting`
object. These will also default to `width: 1, color:
Colors.KeylineGray`.
- Replace `"horizontal"` with `"top-and-bottom"`, replace `"vertical"`
with `"left-and-right"`. I am consistently confused by which of these I
should use, so this hopefully eliminates the confusion.

I grepped aggressively to find existing callsites and swap in shorthand
where possible.

## How I Tested These Changes

View app and Storybook, verify that borders look correct throughout.
## Summary & Motivation

Resolves dagster-io#16475.

If a RepoAddress is available for a run, use it to link the
schedule/sensor in the "Launched by" column on the Runs table.

## How I Tested These Changes

View runs launched by schedules and sensors in the Runs table. Verify
that they are properly linked.

Find some runs that are launched by schedules/sensors that no longer
exist in the workspace. Verify that these are not linked.
…ter-io#16519)

## Summary & Motivation

This builds on the GraphQL in dagster-io#16510. Runs that had an
assetCheckSelection now show those asset checks in both the run list and
run details page.

<img width="1114" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/3633b304-d6fb-473d-abcb-5e01b0bfb4d9">
<img width="704" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/eab03c6a-1ac6-4361-b357-96553c881e9a">


## How I Tested These Changes

Surprisingly, we have no storybook coverage of these two run components
-- I will add some stories in a follow-up once the urgent stuff for
tomorrow's release is done!

---------

Co-authored-by: bengotow <[email protected]>
…agster-io#16492)

## Summary & Motivation

This makes three changes to how we deal with self-dependent assets with
AMP.

1. Updates to past partitions of a self-dependent asset no longer
trigger the `AutoMaterializeRule.materialize_on_parent_updated` rule for
subsequent partitions of that asset. This prevents "slow motion
backfills" if a user were to update a random past partition of that asset
(kicking off a chain of computations).
2. Past partitions of a self-dependent asset are not used to determine
if a future partition of that asset is out of date or not. This means
that if a long-past partition of that asset is updated, that won't
prevent future partitions of that asset from being updated.
3. Past partitions of a self-dependent asset do not need to be updated
in order for the `AutoMaterializeRule.skip_on_not_all_parents_updated`
rule to be triggered.

This is not currently toggleable, as it's unclear if the previous
default behavior would be desirable.

## How I Tested These Changes
…odules (dagster-io#16434)

## Summary

PR stack which re-implements dagster-io#16257, in hopes of making it easier to
review.

This PR restructures the pythonic resources+config module into a few
smaller, easier to grasp files - one for the "Config" class, one for
Resources, one for IO managers, and one for misc Pydantic & Dagster
Config Type conversion utilities.

## Test Plan

Impl does not change, existing unit tests.
…6533)

Summary:
I got this right for 'dagster dev' but accidentally left it at WARNING
for 'dagster-webserver'.

Test Plan:
Run 'dagit / dagster-webserver' locally, see message about port 3000
Run `dagster dev`, unchanged

## Summary & Motivation

## How I Tested These Changes
Afaik we don't serialize this; we convert it into an evaluation
To start, disable executing if the user code version is < 1.4.14 (the
version we're shipping with support for it tomorrow).

A follow up will add checks against checks+assets in the same op
## Summary & Motivation

Add `check_results` and `data_version` to `MaterializeResult`. This
supports streaming asset check results back from an ext process added
upstack in dagster-io#16466.

## How I Tested These Changes

New unit test.
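The extended result shape described above can be sketched with a stand-in dataclass. Field names follow the PR text; the shipped `MaterializeResult` API surface may differ:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stand-in for the extended MaterializeResult described
# above; field names follow the PR text, not necessarily the shipped API.
@dataclass
class CheckResult:
    check_name: str
    passed: bool

@dataclass
class MaterializeResult:
    data_version: Optional[str] = None
    check_results: List[CheckResult] = field(default_factory=list)

result = MaterializeResult(
    data_version="v42",
    check_results=[CheckResult("no_nulls", True)],
)
print(result.data_version, result.check_results[0].passed)
```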
## Summary & Motivation

- Add `dagster-ext` integration for Databricks. This is designed to
interfere with the official databricks SDK as little as possible-- you
pass in Databricks SDK data structures to `ExtDatabricks` and the only
modification it makes is injecting the necessary environment variables.
The rest of cluster config etc is left to the user. It is separate from
the rest of the databricks integration.
- Add example usage to `dagster_databricks/README.md`

## How I Tested These Changes

New unit tests (they are skipped on BK though)
## Summary & Motivation

Recently realized that we can fix the left nav of our storybooks by
explicitly putting a "path" in the `title` attribute:

Before: 
<img width="550" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/50123418-a7a7-4e39-9ab6-ae372807ada0">


After:
<img width="544" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/75023cd6-d304-4d3d-9b0d-7b1aafff2502">

## How I Tested These Changes

---------

Co-authored-by: bengotow <[email protected]>
…-io#16522)

Now if you click the execute button on an individual check inside a
multi asset, you'll get

![Screenshot 2023-09-14 at 3 53 59
PM](https://github.com/dagster-io/dagster/assets/22018973/0e2893f1-59d4-436f-89e7-b03e0a86a1b7)

instead of saying they don't exist.

We should gray out the execute button here (especially for dbt), but
this improves things for now. Graying it out requires deciding what
additional info we want to surface over ExternalAssetChecks.
…ter-io#16547)

## Summary & Motivation

This PR reads the new `canExecuteIndividually` value from ticks and
disables the "Execute" button as necessary.

I also changed the rendering of the checks in the asset sidebar slightly
so that they can middle truncate, and so that the status tags are right
justified so they don't move left/right slightly as you switch between
assets.

Note: If an asset hypothetically had one executable check and one
non-executable check, the Execute All button is still shown but only
executes the one that it can. I think that is ok because you can see on
the checks page that some of the execute buttons are disabled (and also
may not be a real scenario?)

## How I Tested These Changes

I updated the stories to include the disabled state so I could see this
scenario.

<img width="620" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/df80de28-e7cf-4bc7-a318-a49b148f5ac6">

<img width="1140" alt="image"
src="https://github.com/dagster-io/dagster/assets/1037212/423a3fa4-0af2-4438-9f64-3a0262266a6c">

Co-authored-by: bengotow <[email protected]>
…o#16542)

## Summary & Motivation

Require kwargs on all params of AssetSpec, except for asset_key. This
prevents a lot of errors around parameter passing and gives us the
option to reorder params later.

## How I Tested These Changes

BK
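The keyword-only convention described above can be demonstrated with Python's bare `*` separator. `AssetSpec` itself lives in `dagster`; this stand-in (with made-up parameters) only shows the calling pattern:

```python
# Stand-in demonstrating the keyword-only pattern: every parameter
# after the key must be passed by name. Parameters are illustrative.
class AssetSpec:
    def __init__(self, asset_key, *, group_name=None, description=None):
        self.asset_key = asset_key
        self.group_name = group_name
        self.description = description

spec = AssetSpec("my_asset", group_name="analytics")
print(spec.group_name)
try:
    AssetSpec("my_asset", "analytics")  # positional group_name rejected
except TypeError:
    print("kwargs required")
```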
@zyd14
Contributor Author

zyd14 commented Sep 15, 2023

I think I messed up this branch when I rebased... I'll close this and reopen with my commits cherry picked

@zyd14 zyd14 closed this Sep 15, 2023
OwenKephart pushed a commit that referenced this pull request Sep 15, 2023
## Summary & Motivation
Retry of [this PR](#16322)
where I think I messed up the git history when I was rebasing master
onto my branch
Policy IDs constrain what configuration values a cluster is allowed
to have and help enforce consistency across an organization. Enabling
this will allow users to take advantage of that feature.

## How I Tested These Changes
Spun up a local test environment with the changed library installed via
wheel build and ran a test job using the step launcher with a policy ID
attached to the cluster.