Extending Databricks step launcher config schema to expose policy ID for new clusters #16322
Closed
## Summary & Motivation This PR adds the Capstone project of the Dagster Essentials module to DagsterU. ## How I Tested These Changes 👀 , 👻 --------- Co-authored-by: Tim Castillo <[email protected]>
## Summary & Motivation Databricks has deprecated the `dbfs` method of referencing init scripts, and most documentation now suggests using `workspace` or `volume` init scripts instead. This conversation on Slack indicates community interest in having these options enabled. I'll work on a more full-featured PR to include some of the other additions to the `jobs/runs/submit` endpoint this week. ## How I Tested These Changes Spun up a local Dagster instance with `dagster==1.4.11` and my version of `dagster-databricks` installed. Launched jobs with config referencing init scripts in volumes and workspace. Confirmed that init scripts were configured in the cluster config. Note: this includes a commit cherry-picked from [this PR](dagster-io#16311), which is necessary to get the step launcher working again. --------- Co-authored-by: Sean Mackesey <[email protected]>
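For reference, a minimal sketch of what the new init-script references look like in a Databricks new-cluster payload; the field shapes follow the Databricks Jobs API, and the specific paths, node type, and Spark version below are placeholders rather than values from this PR:

```python
# Hypothetical new-cluster payload: `init_scripts` can point at workspace files
# or Unity Catalog volumes instead of the deprecated dbfs destinations.
new_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "init_scripts": [
        # Init script stored in the Databricks workspace file tree
        {"workspace": {"destination": "/Users/someone@example.com/init.sh"}},
        # Init script stored in a Unity Catalog volume
        {"volumes": {"destination": "/Volumes/main/default/scripts/init.sh"}},
    ],
}
```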
## Summary & Motivation Use backfill page status tag in the backfill table row for asset backfills. This is to make the status consistent for these backfills. <img width="1125" alt="Screenshot 2023-09-01 at 12 48 47 PM" src="https://github.com/dagster-io/dagster/assets/2823852/0d3f22f6-44bf-4833-81dd-be46b1cb9d67"> <img width="1120" alt="Screenshot 2023-09-01 at 12 35 12 PM" src="https://github.com/dagster-io/dagster/assets/2823852/50b1b26d-d2e0-48e6-97e7-97b79e3b0497"> ## How I Tested These Changes Kick off an asset backfill, cancel it partway through. Verify that it appears as "Complete" in the table and on the backfill page.
…ailed to hash during the scheduler tick (dagster-io#16323) Summary: Setting this to a set instead of a list caused it to try to hash the namedtuple, which fails if the cron schedule is set to a list (which does not hash) rather than a string (which does) Test Plan: Modified test case that was failing before, now passes ## Summary & Motivation ## How I Tested These Changes
…6318) Update the default context injector to handle loading directly from an env var or from a file; update the default message writer to support outputting to standard streams or to a file. ## How I Tested These Changes existing tests
…gster-io#16325) ## Summary & Motivation Change the following:

```
dagster._core.errors.DagsterInvalidDefinitionError: resource with key 'dbt' required by op 'sca_custom_assets' was not provided. Please provide a <class 'dagster._core.definitions.resource_definition.ResourceDefinition'> to key 'dbt', or change the required key to one of the following keys which points to an <class 'dagster._core.definitions.resource_definition.ResourceDefinition'>: ['io_manager']
```

to:

```
dagster._core.errors.DagsterInvalidDefinitionError: resource with key 'dbt' required by op 'sca_custom_assets' was not provided. Please provide a ResourceDefinition to key 'dbt', or change the required key to one of the following keys which points to an ResourceDefinition: ['io_manager']
```

The "or change the required key..." should probably be changed as well. Seeing the magic string `"io_manager"` here doesn't make sense, and recommendations to use an IOManager key should only be made if the current resource requirement is an io manager. ## How I Tested These Changes local
![Screenshot 2023-09-05 at 12 46 06 PM](https://github.com/dagster-io/dagster/assets/22018973/87ba8562-a70b-4ff8-9b3f-9b59e2ee50b1)
…ster-io#16285) ## Summary & Motivation Self-partitioned assets can cause a recursion error during stale status resolution if there are too many partitions. This is because resolution of the Nth partition will recurse back to the 1st partition, since this is within its ancestor tree. This PR adds a threshold number of dependencies beyond which no staleness status will be derived for self-partitioned assets. This follows the existing approach of opting out of staleness calculation for partitions with large numbers of "horizontal" dependencies. See: https://elementl-workspace.slack.com/archives/C03D094F23G/p1693573479147479 ## How I Tested These Changes New unit test.
## Summary & Motivation This PR adds the Extra Credit lesson of the Dagster Essentials module to DagsterU. ## How I Tested These Changes 👀 , 🕺
Add severity on AssetCheckResults, and, for this week, remove the ability to make downstream assets wait on a check. Luckily the UI is already well equipped for this; I just had to remove the severity column: ![Screenshot 2023-09-05 at 3 47 41 PM](https://github.com/dagster-io/dagster/assets/22018973/940ba872-6004-4835-955d-78ba99079d43) More context: https://elementl-workspace.slack.com/archives/C05KFG9TETB/p1693929333653329
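As a rough sketch of what the severity addition looks like from user code (the asset and check here are hypothetical, and the asset check API was experimental at the time, so details may differ):

```python
from dagster import AssetCheckResult, AssetCheckSeverity, asset, asset_check


@asset
def my_asset():
    return [1, 2, 3]


@asset_check(asset=my_asset)
def has_rows(my_asset):
    # A failing result tagged WARN surfaces as a warning rather than an error.
    return AssetCheckResult(
        passed=len(my_asset) > 0,
        severity=AssetCheckSeverity.WARN,
    )
```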
…agster-io#16317) Summary: The uvicorn INFO logs are actually quite verbose and log the HTTP output for each request, so the default log level introduced in https://github.com/dagster-io/dagster/pull/16239/files makes things too spammy. ## Summary & Motivation ## How I Tested These Changes
## Summary & Motivation Similar to the newly added "asset feature" context provider, this adds a "job feature" context provider that allows setting the tabs on the Job page, as well as setting a fallthrough route to handle custom pages on the Job permalink. ## How I Tested These Changes Verified that the Job page loads successfully in OSS. Same for snapshot pages. Navigate to tabs, verify that routes load properly. Verified that it also loads successfully in Cloud, including with custom tab and route behavior.
Context: https://elementl-workspace.slack.com/archives/C05KFG9TETB/p1693931080865119?thread_ts=1693924159.102829&cid=C05KFG9TETB This is a breaking change that will always require `@asset_check(` to have an `asset`. I think this isn't too bad ergonomically, though it does mean repeating the asset name in the io manager case:

```python
@asset_check(asset=my_asset)
def my_check(my_asset):
    ...
```

At present I think this is required, because we use asset groups for resolving keys to the fully prefixed keys, and we don't know the group of a check until we resolve which asset it points at.
## Summary & Motivation Use latest yarn, check in yarn.lock. ## How I Tested These Changes `yarn`
…ild time (dagster-io#16043) ## Summary & Motivation Adds documentation about: - why we need a `manifest.json` - the different options for creating a `manifest.json` - how these options show up in `dagster-dbt project scaffold` - our recommendations for development and production This removes the last references to `load_assets_from_dbt_project/manifest`. Instead, they've been moved to the `Relevant API's` section, with references to their alternatives. ## How I Tested These Changes read
…ct in CI/CD (dagster-io#16212) ## Summary & Motivation Basically includes dagster-io#13767 as part of the dbt reference docs. Also include a snippet of how Dagster Cloud streamlines the monorepo case. ## How I Tested These Changes read
…agster-io#16290) ## Summary & Motivation This PR provides an initial UI for asset checks on the DAG and in the asset sidebar. It also improves the rendering of asset checks in the run logs (linking to check, asset key, showing run metadata) I also updated the subscription to run logs on the asset graph pages so that the asset graph will live reload when long-running asset checks complete. <img width="730" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/24e7b939-e04c-4f5c-bcf3-344313fb1bba"> <img width="1060" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/49e11656-9f65-40d6-91cf-2998c9fcf604"> <img width="1109" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/54cde88d-5074-484d-bb74-3ecd947a3521"> ## How I Tested These Changes I tested these changes manually using the toys repo and by updating our storybooks for AssetNode and the asset sidebar to run through test cases. --------- Co-authored-by: bengotow <[email protected]> Co-authored-by: Johann Miller <[email protected]>
## Summary & Motivation Fixes dagster-io#16126 about enabling oauth authentication for Databricks service principals ## How I Tested These Changes Unit tests + manually tested that Databricks steps can be launched via the step launcher using either a personal access token or oauth client ID + client secret. During this manual testing I also uncovered a few bugs around the `submit()` call to the `databricks-sdk` which were preventing jobs from being launched (databricks-sdk requires a [task_key parameter](dagster-io@7d2466c), and reading a file that doesn't exist now raises a [DatabricksError](dagster-io@fd91538)). Currently the step launcher is broken as-is and will not work without these changes, so if this whole PR doesn't get merged before the next release then those two commits should probably at least get cherry-picked into a different PR and merged in. I should also note that it appears that the most recent release 0.20.10 works, and that the breaking changes must have been made since the last release. --------- Co-authored-by: Sean Mackesey <[email protected]>
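For illustration, the two credential shapes the launcher can now authenticate with, shown directly against the databricks-sdk client rather than the dagster-databricks config schema; the host, token, and client credentials below are placeholders:

```python
from databricks.sdk import WorkspaceClient

# Existing path: personal access token.
pat_client = WorkspaceClient(
    host="https://example.cloud.databricks.com",
    token="dapi-placeholder-token",
)

# New path: OAuth machine-to-machine credentials for a service principal.
oauth_client = WorkspaceClient(
    host="https://example.cloud.databricks.com",
    client_id="placeholder-client-id",
    client_secret="placeholder-client-secret",
)
```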
Adds a k8s pod external ext protocol implementation which injects context via env var and sinks messages using stdout/stderr sifting - a little greasy but I think a good option to support for an easy "out of the box" working experience. This PR also aligns docker to the same pattern. The file IO set-ups that have been made to work are preserved under test, sort of demonstrating what it would look like to do it "in user land". `container_kwargs` in docker and `base_pod_spec` / `base_pod_meta` in k8s have been added as catch-all escape hatch arguments that allow passing any random thing from the orch layer through to the target substrate. ## How I Tested These Changes added tests
currently we hide the exceptions we raise ## How I Tested These Changes ctrl+C

```
2023-09-06 10:07:48 -0500 - dagster - INFO - KeyboardInterrupt received
2023-09-06 10:07:48 -0500 - dagster - INFO - Shutting down Dagster services...
2023-09-06 10:07:48 -0500 - dagster.daemon - INFO - Daemon threads shut down.
2023-09-06 10:07:48 -0500 - dagster - INFO - Dagster services shut down.
```

kill daemon / webserver process

```
2023-09-06 10:08:21 -0500 - dagster - ERROR - An unexpected exception has occurred
Traceback (most recent call last):
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/_cli/dev.py", line 192, in dev_command
    raise Exception(
Exception: dagster-daemon process shut down unexpectedly with return code 0
2023-09-06 10:08:21 -0500 - dagster - INFO - Shutting down Dagster services...
2023-09-06 10:08:21 -0500 - dagster - INFO - Dagster services shut down.
```
## Summary & Motivation Condenses `dagster-ext` into a single file `dagster_ext/__init__.py` so it can be vendored easily. I didn't go all the way to just `dagster_ext.py` so that I could leave `dagster_ext/version.py` and `dagster_ext/py.typed`. It should be just as easy to copy-paste the file contents from `__init__.py`. ## How I Tested These Changes Existing test suite.
…6340) ## Summary & Motivation Same as dagster-io#14777. Let the commit history be explicit about what is being edited in the base jaffle shop project to support asset checks and pytest assertions against dbt configuration. ## How I Tested These Changes N/A
…16324) ## Summary & Motivation As the title. Currently, users will have to opt in to modelling their dbt tests as Dagster asset checks. This is because the APIs for asset checks are currently experimental, and we do not want to break users of the public stable APIs for ingesting dbt models as Dagster software-defined assets. So, to enable this feature, users will have to configure the following metadata on their dbt project tests:

```yaml
# dbt_project.yml
tests:
  +meta:
    dagster:
      asset_check: True
```

Or, tests can be individually turned on using the dbt config block or property file: https://docs.getdbt.com/reference/test-configs#test-specific-configurations. ## How I Tested These Changes pytest
## Summary & Motivation duckdb.InvalidInputException: Invalid Input Error: Required module 'pandas.core.arrays.arrow.dtype' failed to import, due to the following Python exception: ModuleNotFoundError: No module named 'pandas.core.arrays.arrow.dtype' ## How I Tested These Changes https://elementl-workspace.slack.com/archives/C03E60FPJCU/p1694013481428789
…er-io#16342) ## Summary & Motivation As the title. ## How I Tested These Changes local
Existing context methods (`context.partition_key_range` etc.) return `PartitionKeyRange`s, but there are no existing apidocs. This PR adds apidocs for PartitionKeyRange. It also fixes an outdated reference to `context.asset_partition_key_range`, which has since been deprecated.
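A small sketch of how the range is typically accessed; the asset and partitions definition here are hypothetical:

```python
from dagster import DailyPartitionsDefinition, asset


@asset(partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"))
def events(context):
    # PartitionKeyRange exposes inclusive `start` and `end` partition keys
    # for the partitions targeted by the current run.
    key_range = context.partition_key_range
    context.log.info(f"Materializing partitions {key_range.start} to {key_range.end}")
```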
## Summary & Motivation Floats are imprecise... who would have thought. ## How I Tested These Changes Was able to repro original issue locally
## Summary & Motivation This PR adds the "Evaluate checks" button to the right sidebar of the asset DAG and to the asset checks page. On the asset checks page, you can invoke them one by one or all together. This obeys the `canLaunchPipelineExecution` permission - without it, the button is grayed out with a tooltip explianing why. Clicking the button gives a loading state with a spinner (similar to "Materialize") and then you see the "Run launched" banner. Thankfully this can use the same hook we use for other launch mutations and triggers a live update to pull in the new states and stuff! <img width="1412" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/0ad52c8d-7525-448e-a4b4-3de15d51472c"> <img width="1341" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/0dce69a7-f3ea-4dcb-979c-3c90a28e2b7c"> ## How I Tested These Changes I tested this manually with the `asset_checks` toys repo. Will write a test or two for this as well. --------- Co-authored-by: bengotow <[email protected]>
To populate the 'target' column of the runs page
**Test plan** Unit, and played around with toy example. Verified that checks can fail some assets and not others. ![Screenshot 2023-09-13 at 5 22 35 PM](https://github.com/dagster-io/dagster/assets/22018973/6f58c872-ebe8-44c3-865b-2c8380861843) ![Screenshot 2023-09-13 at 5 22 48 PM](https://github.com/dagster-io/dagster/assets/22018973/bdefa787-f4c0-4e32-8444-2ae1eb1325c5)
Thought I tested this, must not have
## Summary & Motivation ## How I Tested These Changes
…tInterval (dagster-io#16523) ## Summary & Motivation dagster-io#16245 added opt-in batching to `useLiveDataForAssetKeys`, but it neglected to thread the custom refetching logic through to `useQueryRefreshAtInterval`; this PR adds that. ## How I Tested These Changes Locally tested and made sure that when the refresh fires, it restarts batching from index 0 if the batched fetch is already complete.
## Summary & Motivation I've been wanting to do this for a while: simplify the `border` prop on `Box`. - Allow a shorthand for specifying the border side, e.g. `border="bottom"`. This implicitly applies `width: 1, color: Colors.KeylineGray` for that side. - Make `width` and `color` optional values in the `BorderSetting` object. These will also default to `width: 1, color: Colors.KeylineGray`. - Replace `"horizontal"` with `"top-and-bottom"`, replace `"vertical"` with `"left-and-right"`. I am consistently confused by which of these I should use, so this hopefully eliminates the confusion. I grepped aggressively to find existing callsites and swap in shorthand where possible. ## How I Tested These Changes View app and Storybook, verify that borders look correct throughout.
## Summary & Motivation Resolves dagster-io#16475. If a RepoAddress is available for a run, use it to link the schedule/sensor in the "Launched by" column on the Runs table. ## How I Tested These Changes View runs launched by schedules and sensors in the Runs table. Verify that they are properly linked. Find some runs that are launched by schedules/sensors that no longer exist in the workspace. Verify that these are not linked.
…ter-io#16519) ## Summary & Motivation This builds on the GraphQL in dagster-io#16510. Runs that had an assetCheckSelection now show those asset checks in both the run list and run details page. <img width="1114" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/3633b304-d6fb-473d-abcb-5e01b0bfb4d9"> <img width="704" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/eab03c6a-1ac6-4361-b357-96553c881e9a"> ## How I Tested These Changes We have surprisingly no storybook coverage of these two run components -- I will add some stories in a follow-up once the urgent stuff for tomorrow's release is done! --------- Co-authored-by: bengotow <[email protected]>
…agster-io#16492) ## Summary & Motivation This makes three changes to how we deal with self-dependent assets with AMP. 1. Updates to past partitions of a self-dependent asset no longer trigger the `AutoMaterializeRule.materialize_on_parent_updated` rule for subsequent partitions of that asset. This prevents "slow motion backfills" if a user were to update a random past partition of that asset (kicking off a chain of computations). 2. Past partitions of a self-dependent asset are not used to determine if a future partition of that asset is out of date or not. This means that if a long-past partition of that asset is updated, that won't prevent future partitions of that asset from being updated. 3. Past partitions of a self-dependent asset do not need to be updated in order for the `AutoMaterializeRule.skip_on_not_all_parents_updated` rule to be triggered. This is not currently toggleable, as it's unclear if the previous default behavior would be desirable. ## How I Tested These Changes
…odules (dagster-io#16434) ## Summary PR stack which re-implements dagster-io#16257, in hopes of making it easier to review. This PR restructures the pythonic resources+config module into a few smaller, easier to grasp files - one for the "Config" class, one for Resources, one for IO managers, and one for misc Pydantic & Dagster Config Type conversion utilities. ## Test Plan Impl does not change, existing unit tests.
…6533) Summary: I got this right for 'dagster dev' but accidentally left it at WARNING for 'dagster-webserver'. Test Plan: Run 'dagit / dagster-webserver' locally, see the message about port 3000. Run `dagster dev`, unchanged. ## Summary & Motivation ## How I Tested These Changes
AFAIK we don't serialize this - we convert it into an evaluation
To start, disable executing if the user code version is < 1.4.14 (the version we're shipping with support for it tomorrow). A follow up will add checks against checks+assets in the same op
## Summary & Motivation It should be imported from react-router-dom and not react-router. We might want to restrict react-router imports. # Test plan Deployed this to dogfood and confirmed [the issue](https://app.datadoghq.com/rum/sessions?query=%40application.id%3A7edc09bb-51d0-4555-90cd-b19ca483d1fe%20%40type%3Aerror&agg_m=count&agg_q=%40issue.id&agg_t=count&cols=&event=AgAAAYqV4UuxuusyKQAAAAAAAAAYAAAAAEFZcVY0YTFMQUFBcXJjLWNWcF9NTEFBRAAAACQAAAAAMDE4YTk1ZjAtZjljOC00M2UyLWI4YTktOWI4ZTMwZmM3Zjkw&tab=error&from_ts=1678802636052&to_ts=1679407436052&live=true) is fixed
## Summary & Motivation Add `check_results` and `data_version` to `MaterializeResult`. This supports streaming asset check results back from an ext process added upstack in dagster-io#16466. ## How I Tested These Changes New unit test.
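Roughly, the shape an asset can now return on the Dagster side; the asset and check names here are hypothetical, and the API was experimental at the time, so details may differ:

```python
from dagster import (
    AssetCheckResult,
    AssetCheckSpec,
    DataVersion,
    MaterializeResult,
    asset,
)


@asset(check_specs=[AssetCheckSpec("non_empty", asset="my_asset")])
def my_asset():
    # The result can carry a data version and check results directly,
    # which is what allows an ext process to stream them back.
    return MaterializeResult(
        data_version=DataVersion("v1"),
        check_results=[AssetCheckResult(check_name="non_empty", passed=True)],
    )
```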
## Summary & Motivation - Add `dagster-ext` integration for Databricks. This is designed to interfere with the official databricks SDK as little as possible-- you pass in Databricks SDK data structures to `ExtDatabricks` and the only modification it makes is injecting the necessary environment variables. The rest of cluster config etc is left to the user. It is separate from the rest of the databricks integration. - Add example usage to `dagster_databricks/README.md` ## How I Tested These Changes New unit tests (they are skipped on BK though)
## Summary & Motivation Recently realized that we can fix the left nav of our storybooks by explicitly putting a "path" in the `title` attribute: Before: <img width="550" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/50123418-a7a7-4e39-9ab6-ae372807ada0"> After: <img width="544" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/75023cd6-d304-4d3d-9b0d-7b1aafff2502"> ## How I Tested These Changes --------- Co-authored-by: bengotow <[email protected]>
…-io#16522) Now if you click the execute button on an individual check inside a multi asset, you'll get ![Screenshot 2023-09-14 at 3 53 59 PM](https://github.com/dagster-io/dagster/assets/22018973/0e2893f1-59d4-436f-89e7-b03e0a86a1b7) instead of saying they don't exist. We should gray out the execute button here (especially for dbt), but this improves things for now. Graying it out requires deciding what additional info we want to surface over ExternalAssetChecks.
…ter-io#16547) ## Summary & Motivation This PR reads the new `canExecuteIndividually` value from ticks and disables the "Execute" button as necessary. I also changed the rendering of the checks in the asset sidebar slightly so that they can middle truncate, and so that the status tags are right justified so they don't move left/right slightly as you switch between assets. Note: If an asset hypothetically had one executable check and one non-executable check, the Execute All button is still shown but only executes the one that it can. I think that is ok because you can see on the checks page that some of the execute buttons are disabled (and also may not be a real scenario?) ## How I Tested These Changes I updated the stories to include the disabled state so I could see this scenario. <img width="620" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/df80de28-e7cf-4bc7-a318-a49b148f5ac6"> <img width="1140" alt="image" src="https://github.com/dagster-io/dagster/assets/1037212/423a3fa4-0af2-4438-9f64-3a0262266a6c"> Co-authored-by: bengotow <[email protected]>
…o#16542) ## Summary & Motivation Require kwargs on all params of AssetSpec, except for asset_key. This reduces a lot of errors around parameter passing and gives us option value for reordering params later. ## How I Tested These Changes BK
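A quick sketch of the calling convention this enforces; the spec fields shown are placeholders:

```python
from dagster import AssetSpec

# The asset key may still be passed positionally; all other params must be keyword args.
spec = AssetSpec("my_asset", group_name="core", skippable=False)
```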
I think I messed up this branch when I rebased... I'll close this and reopen with my commits cherry-picked
OwenKephart pushed a commit that referenced this pull request on Sep 15, 2023
## Summary & Motivation Retry of [this PR](#16322) where I think I messed up the git history when I was rebasing master onto my branch. Policy IDs help enforce what configuration values a cluster is allowed to have and help enforce consistency across an organization. Enabling this will allow users to take advantage of this feature. ## How I Tested These Changes Spun up a local test environment with the changed library installed via wheel build and ran a test job using the step launcher with a policy ID attached to the cluster.
Summary & Motivation
Community request to update the config schema to expose attributes that have been added to the Databricks API. Allowing users to set policy IDs helps them enforce that certain attributes are configured on a cluster, helping achieve consistency across jobs and teams and simplifying the configuration needed to launch a cluster.
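As a rough illustration of the attribute being exposed, a new-cluster payload with a policy attached; the surrounding fields follow the Databricks Jobs API and the values are placeholders, not the exact dagster-databricks config schema:

```python
# Hypothetical new-cluster payload: `policy_id` pins the cluster to an
# admin-defined cluster policy, so any values the policy fixes or restricts
# are enforced and need not be repeated in every job's config.
new_cluster = {
    "policy_id": "ABC123DEF456",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}
```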
How I Tested These Changes
Spun up a local Dagster instance with `dagster==1.4.11` and my branch of `dagster-databricks`, and submitted a job with a policy ID. Confirmed the policy was used in the job that was launched. Also extended the unit tests to ensure policy IDs are added to the new cluster configuration as expected.