[RFC] consolidate partition methods on the context #16255

jamiedemaria · 2023-08-31T21:18:53Z

Summary & Motivation

With the move away from I/O managers, inputs, and outputs, there is now a need for more generic partition methods on the context. In a I/O manager-less world, it doesn't make sense to call asset_partition_key_for_input as there is no input.

When discussing the AssetDep API for specifying PartitionMappings for AssetSpecs, Alex, Sandy, Yuhan, and I also discussed the need for partition context methods that didn't use input/output terminology.

In a parallel discussion Pete brought up similar concerns about the methods on the context. Based on this discussion, I think adding more general versions of the partition context methods would only add to confusion around which partition method to use when.

The PR explores what it would look like to completely remove the input/output partition methods and replace them with methods that take CoercibleToAssetKey instead. This is still in progress and lots of tests are failing, but i'm working on fixing them

Feedback is appreciated!

Some things to note:

names are all placeholders - if you think a different naming scheme would be better, please let me know
The case when someone has a long or complicated AssetKey (ie lots of prefixes) becomes harder to deal with when there are no input/output methods. I think a solution to this would be to expand the type signature of the more generic methods to be Union[CoercibleToAssetKey, AssetSpec] and maybe include SourceAsset and AssetsDefinition in there as well (although I'm not sure what the consequences of including the later two would be)
I'm also looking at the ExtContext for dagster-ext to try to make a single interface that both contexts implement. So the naming conventions are subject to change based on that interface

How I Tested These Changes

Not updating any unit tests in this PR to prove that the old versions of the methods work in the same way. I'm going to write a stacked PR that will change all of the callsites in our code base to the new methods to test those methods

jamiedemaria · 2023-08-31T21:19:05Z

Current dependencies on/for this PR:

master
- PR AssetDep for using PartitionMappings with AssetSpecs #16207
  - PR [small] re-organize partition methods to make review of RFC easier #16365
    - PR [RFC] consolidate partition methods on the context #16255 👈

This comment was auto-generated by Graphite.

jamiedemaria · 2023-09-06T20:06:27Z

python_modules/dagster/dagster/_core/execution/context/compute.py


    @public
-    def asset_partitions_time_window_for_output(self, output_name: str = "result") -> TimeWindow:
-        """The time window for the partitions of the output asset.
+    def partitions_time_window_for_asset(


kinda odd to me that this one is plural partitions. Any objection to renaming partition_time_window_for_asset?

Need to keep the old one around and deprecate it

the old one being asset_partitions_time_window_for_output? I don't think we should rename that, and it's been moved to the bottom of the file and marked deprecated.

ah I misread what was going on 👍🏻

yeah moving the methods around in the file made the diff really annoying. I'll stack a PR before this one that just moves the methods to clean things up

ok stacked a temp PR before this one that should hopefully make the diff a bit better to read. It's still pretty beefy, but at least GitHub is showing the changes in a way that's truer to what the PR is doing

github-actions · 2023-09-06T20:28:28Z

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-syr53iqsz-elementl.vercel.app
https://jamie-partition-context-methods.dagster.dagster-docs.io

Direct link to changed pages:

alangenfeld · 2023-09-06T20:48:58Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+        """
+        asset_key = self.asset_key_for_output(output_name)
+        return self.partitions_time_window_for_asset(asset_key)
+
+    @deprecated(
+        breaking_version="2.0",
+        additional_warn_text="Use `partitions_time_window_for_asset` instead.",
+    )
+    @public
+    def asset_partitions_time_window_for_input(self, input_name: str = "result") -> TimeWindow:
+        """Deprecated. Use partitions_time_window_for_asset instead.
+
+        The time window for the partitions of the input asset.
+
+        Raises an error if either of the following are true:
+        - The input asset has no partitioning.
+        - The input asset is not partitioned with a TimeWindowPartitionsDefinition or a
+        MultiPartitionsDefinition with one time-partitioned dimension.
+        """
+        asset_key = self.asset_key_for_input(input_name)
+        return self.partitions_time_window_for_asset(asset_key, is_dependency=True)
+
+    @deprecated(
+        breaking_version="2.0",
+        additional_warn_text=(
+            "Use `partition_key_range` or `partition_key_range_for_asset` instead."
+        ),
+    )
+    @public
+    def asset_partition_key_range_for_output(
+        self, output_name: str = "result"
+    ) -> PartitionKeyRange:
+        """Deprecated. Use partition_key_range property or partition_key_range_for_asset instead.
+
+        Return the PartitionKeyRange for the corresponding output. Errors if not present.
+        """
+        asset_key = self.asset_key_for_output(output_name)
+        return self.partition_key_range_for_asset(asset_key)
+
+    @deprecated(
+        breaking_version="2.0", additional_warn_text="Use `partition_key_range_for_asset` instead."
+    )
+    @public
+    def asset_partition_key_range_for_input(self, input_name: str) -> PartitionKeyRange:
+        """Deprecated. Use partition_key_range_for_asset instead.
+
+        Return the PartitionKeyRange for the corresponding input. Errors if there is more or less than one.
+        """
+        asset_key = self.asset_key_for_input(input_name)
+        return self.partition_key_range_for_asset(asset_key, is_dependency=True)
+
+    @deprecated(
+        breaking_version="2.0",
+        additional_warn_text="Use `partitions_def` or `partitions_def_for_asset` instead.",
+    )
+    @public
+    def asset_partitions_def_for_output(self, output_name: str = "result") -> PartitionsDefinition:
+        """Deprecated. Use partitions_def property or partitions_def_for_asset instead.
+
+        The PartitionsDefinition on the upstream asset corresponding to this input.
+        """
+        asset_key = self.asset_key_for_output(output_name)
+        return self.partitions_def_for_asset(asset_key)
+
+    @deprecated(
+        breaking_version="2.0", additional_warn_text="Use `partitions_def_for_asset` instead."
+    )
+    @public
+    def asset_partitions_def_for_input(self, input_name: str) -> PartitionsDefinition:
+        """Deprecated. Use partitions_def_for_asset instead.
+
+        The PartitionsDefinition on the upstream asset corresponding to this input.
+        """
+        asset_key = self.asset_key_for_input(input_name)
+        return self.partitions_def_for_asset(asset_key)
+
+    @deprecated(
+        breaking_version="2.0", additional_warn_text="Use `partition_keys_for_asset` instead."
+    )
+    @public
+    def asset_partition_keys_for_output(self, output_name: str = "result") -> Sequence[str]:
+        """Deprecated. Use partition_keys_for_asset instead.
+
+        Returns a list of the partition keys for the given output.
+        """
+        asset_key = self.asset_key_for_output(output_name)
+        return self.partition_keys_for_asset(asset_key)
+
+    @deprecated(
+        breaking_version="2.0", additional_warn_text="Use `partition_keys_for_asset` instead."
+    )
+    @public
+    def asset_partition_keys_for_input(self, input_name: str) -> Sequence[str]:
+        """Deprecated. Use partition_keys_for_asset instead.
+
+        Returns a list of the partition keys of the upstream asset corresponding to the
+        given input.
+        """
+        asset_key = self.asset_key_for_input(input_name)
+        return self.partition_keys_for_asset(asset_key, is_dependency=True)


unclear that we will want to deprecate these as long as AssetOut / AssetIn world is still supported. Since you define your objects in terms of output names there and the keys are derived for you it seem less than ideal to force users to translate

Agree with alex. We're aren't completely divorcing ourselves from I/O manager, but we are making them more more "opt-in" and making non-io-manager land feel more first-class. For io-manager-forward users, this is a big regression.

ok. My main concern with keeping all of these different versions around is how confusing all of the names will become. Seeing three to four variations of a method (i.e. asset_partitions_keys_for_input(), asset_partition_keys_for_output(), partition_keys_for_asset(), and partition_keys property) would be so daunting and would require the user to learn about each of those methods before they can pick the correct one.

We could probably resolve some of this with good API docs:

In the doc block for each method, indicate if it is intended for use with I/O managers. Maybe not in those words exactly, but give the user the ability to easily know if this method is relevant to them.

Manually create sections in the API docs page that groups the "for_asset" methods together, the "for_output" methods together, and the "for_input" methods together. I think this kind of manual ordering of methods within a class is possible in sphinx, but I'm not 100% sure

If we can do ^ type of manual ordering, we could present the methods in a way that goes from least complex to more complex. Right now it automatically sorts of the methods of the context alphabetically, so if you scroll through you are presented with the methods in a non-sensical order

alt docs organization would be to group by what the function does (partition_key, key range, time window, etc) and then have each option (for_asset, for_output, for_input) in the group

alangenfeld · 2023-09-06T20:49:49Z

python_modules/dagster/dagster/_core/execution/context/compute.py

-    def asset_partition_key_for_output(self, output_name: str = "result") -> str:
-        """Returns the asset partition key for the given output. Defaults to "result", which is the
-        name of the default output.
+    def partition_key_for_asset(


should these for_asset methods have asset be optional and work in the single asset case?

yeah that seems reasonable

jamiedemaria · 2023-09-06T21:32:13Z

python_modules/dagster/dagster/_core/execution/context/compute.py

-        """Returns the asset partition key for the given output. Defaults to "result", which is the
-        name of the default output.
+    def partition_key_for_asset(
+        self, asset: CoercibleToAssetKey, is_dependency: bool = False


the is_dependency bool param here is to handle the self-dependent partition case. Basically when you have a self-dependent partition, if you said context.partition_key_for_asset("my_asset") it's unclear if you want the partition key of my_asset as a dependency, or as is being currently materialized. I'm handling this (for now at least) by allowing users to specify is_dependency to indicate they want it loaded as a dependency.

In all other cases, the function should be able to figure out the correct partition key to load given the current asset

I'm not set on this solution, so if you have any concerns or other ideas let me know

schrockn

Could you break this up a bit? I think the is_dependency bit will require some discussion, but that shouldn't block the docs improvements

schrockn · 2023-09-07T11:17:58Z

python_modules/dagster/dagster/_core/execution/context/compute.py


    @public
-    def asset_partitions_time_window_for_output(self, output_name: str = "result") -> TimeWindow:
-        """The time window for the partitions of the output asset.
+    def partitions_time_window_for_asset(


Need to keep the old one around and deprecate it

schrockn · 2023-09-07T11:19:12Z

python_modules/dagster/dagster/_core/execution/context/compute.py

-    def asset_partitions_time_window_for_output(self, output_name: str = "result") -> TimeWindow:
-        """The time window for the partitions of the output asset.
+    def partitions_time_window_for_asset(
+        self, asset: CoercibleToAssetKey, is_dependency: bool = False


I don't think these should take CoercibleToAssetKey as an AssetsDefinition with multiple keys would have to be a hard error.

I think cases like this present a pretty good argument that CoercibleToAssetKey shouldn't be expanded to take AssetsDefinition (as discussed here) and that a new type alias should be created to include AssetsDefinition

schrockn · 2023-09-07T11:22:05Z

python_modules/dagster/dagster/_core/execution/context/compute.py

-    def asset_partitions_time_window_for_input(self, input_name: str = "result") -> TimeWindow:
-        """The time window for the partitions of the input asset.
+    def partition_keys_for_asset(
+        self, asset: CoercibleToAssetKey, is_dependency: bool = False


is_dependency seems like a rough name for this

Do we have a generalized way to describe this concept? Asset keys + partition mappings end up being a different sort of DAG, which you can visualize as a third dimension in the asset graph (imagine each partition being an entry in the z axis). How do we describe this elsewhere?

+1 that the name sucks.

I'm not aware of any generalized way to describe this kind of thing. Maybe @clairelin135 has an idea from when the partition mapping stuff was first written?

jamiedemaria · 2023-09-19T16:39:03Z

closing in favor of the PR for more complete simplification of AssetExecutionContext #16596

jamiedemaria mentioned this pull request Aug 31, 2023

AssetDep for using PartitionMappings with AssetSpecs #16207

Merged

jamiedemaria changed the title ~~[RFC] consolidate partition methods on the context~~ [exploration] consolidate partition methods on the context Aug 31, 2023

jamiedemaria force-pushed the jamie/asset-spec-partition-mapping branch from 33be378 to dc1e90c Compare September 5, 2023 20:19

jamiedemaria force-pushed the jamie/partition-context-methods branch 5 times, most recently from bcb05ee to 8c0def0 Compare September 6, 2023 15:55

jamiedemaria force-pushed the jamie/asset-spec-partition-mapping branch from a8aeca5 to 58d6691 Compare September 6, 2023 16:23

jamiedemaria force-pushed the jamie/partition-context-methods branch from 8c0def0 to a4bc5a0 Compare September 6, 2023 16:23

jamiedemaria mentioned this pull request Sep 6, 2023

use correct return types for some partition key methods on context #16339

Closed

jamiedemaria force-pushed the jamie/asset-spec-partition-mapping branch from 58d6691 to 735a7aa Compare September 6, 2023 16:59

jamiedemaria force-pushed the jamie/partition-context-methods branch from a4bc5a0 to e4d2d1c Compare September 6, 2023 17:00

jamiedemaria commented Sep 6, 2023

View reviewed changes

jamiedemaria marked this pull request as ready for review September 6, 2023 20:28

jamiedemaria requested a review from erinkcochran87 as a code owner September 6, 2023 20:28

jamiedemaria requested review from alangenfeld, schrockn and clairelin135 September 6, 2023 20:28

jamiedemaria changed the title ~~[exploration] consolidate partition methods on the context~~ [RFC] consolidate partition methods on the context Sep 6, 2023

alangenfeld reviewed Sep 6, 2023

View reviewed changes

jamiedemaria commented Sep 6, 2023

View reviewed changes

schrockn requested changes Sep 7, 2023

View reviewed changes

jamiedemaria force-pushed the jamie/partition-context-methods branch from da34a6a to 70330f4 Compare September 7, 2023 13:35

jamiedemaria changed the base branch from jamie/asset-spec-partition-mapping to jamie/partition-method-reorg September 7, 2023 15:50

jamiedemaria force-pushed the jamie/partition-context-methods branch from 70330f4 to 181ff5d Compare September 7, 2023 15:50

jamiedemaria mentioned this pull request Sep 7, 2023

[small] re-organize partition methods to make review of RFC easier #16365

Closed

jamiedemaria force-pushed the jamie/partition-method-reorg branch from 6f998c0 to 39bb474 Compare September 7, 2023 17:29

jamiedemaria force-pushed the jamie/partition-context-methods branch from 181ff5d to b2a5e9d Compare September 7, 2023 17:29

jamiedemaria requested a review from smackesey September 8, 2023 14:33

JamieDeMaria and others added 14 commits September 8, 2023 11:27

attempt to consolidate partition methods on context

abf9fc1

flip boolean

79d6196

comment on test failures

7502f46

some small fixes

f1bddb3

add is_dependency bool to handle self partitions?

cb3c286

change condition

cb66824

support source assets

0b17e18

trivial change because bk is failing to checkout previous commit?

44f4a05

wip

5d2c783

maybe fix the issue??

0738bb3

test fixes

df7d114

doc strings, some cleanup

c2ad4bb

some renaming and clean up

5b51ccc

small cleanup

ff4246f

jamiedemaria force-pushed the jamie/partition-method-reorg branch from 39bb474 to cf95455 Compare September 8, 2023 15:27

jamiedemaria force-pushed the jamie/partition-context-methods branch from b2a5e9d to ff4246f Compare September 8, 2023 15:27

jamiedemaria closed this Oct 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] consolidate partition methods on the context #16255

[RFC] consolidate partition methods on the context #16255

jamiedemaria commented Aug 31, 2023 •

edited

Loading

jamiedemaria commented Aug 31, 2023 •

edited

Loading

jamiedemaria Sep 6, 2023

schrockn Sep 7, 2023

jamiedemaria Sep 7, 2023

schrockn Sep 7, 2023

jamiedemaria Sep 7, 2023 •

edited

Loading

jamiedemaria Sep 7, 2023

github-actions bot commented Sep 6, 2023 •

edited

Loading

alangenfeld Sep 6, 2023

schrockn Sep 7, 2023

jamiedemaria Sep 7, 2023 •

edited

Loading

alangenfeld Sep 6, 2023

jamiedemaria Sep 7, 2023

jamiedemaria Sep 6, 2023

schrockn left a comment

schrockn Sep 7, 2023

schrockn Sep 7, 2023

jamiedemaria Sep 7, 2023

schrockn Sep 7, 2023

jamiedemaria Sep 7, 2023

jamiedemaria commented Sep 19, 2023

[RFC] consolidate partition methods on the context #16255

[RFC] consolidate partition methods on the context #16255

Conversation

jamiedemaria commented Aug 31, 2023 • edited Loading

Summary & Motivation

How I Tested These Changes

jamiedemaria commented Aug 31, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamiedemaria Sep 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamiedemaria Sep 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

schrockn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamiedemaria commented Sep 19, 2023

jamiedemaria commented Aug 31, 2023 •

edited

Loading

jamiedemaria commented Aug 31, 2023 •

edited

Loading

jamiedemaria Sep 7, 2023 •

edited

Loading

github-actions bot commented Sep 6, 2023 •

edited

Loading

jamiedemaria Sep 7, 2023 •

edited

Loading