interface for ExtContext and AssetExecutionContext #16480

jamiedemaria · 2023-09-13T17:10:49Z

Summary & Motivation

Starting this PR as a place to have discussion about a shared interface for the ExtContext and the simplified AssetExecutionContext. I used the methods/properties on ExtContext as a starting place.

See comments in the code for discussion topics. I've documented the methods in the existing OpExecutionContext that have different names or don't exist.

Notion doc to track open questions, remaining work, etc https://www.notion.so/dagster/AssetExecutionContext-decision-points-cfc48a1da5a24ef6823ab79bc0f30668

How I Tested These Changes

jamiedemaria · 2023-09-13T17:11:01Z

Current dependencies on/for this PR:

master
- PR interface for ExtContext and AssetExecutionContext #16480 👈
  - PR Make AssetExecutionContext a subclass of OpExecutionContext #16596
    - PR partition methods on AssetExecutionContext #16625
    - PR provide AssetExecutionContext class to context #16493
      - PR update build_op_context to build_asset_context where relevant #16673
        
        PR support direct invocation with AssetExecutionContext #16635
        
        PR test AssetExecutionContext subclass deprecations #16598
  - PR OpExecutionContext metaclass #16486
    - PR Simplified AssetExecutionContext #16487

This comment was auto-generated by Graphite.

python_modules/dagster-ext/dagster_ext/__init__.py

jamiedemaria · 2023-09-13T17:16:17Z

python_modules/dagster-ext/dagster_ext/__init__.py

@@ -543,6 +543,80 @@ def upload_messages_chunk(self, payload: IO, index: int) -> None:
 # ########################


+class IContext(ABC):


What should the context be named? IContext is a place holder for now. Some options:

AbstractExecutionContext - potentially too similar to AbstractComputeExecutionContext

IExecutionContext

...

i like IExecutionContext better. technically it's for the execution phase, as opposed to phases like Evaluation. but then i wonder whether we should rename ExtContext to ExtExecutionContext or keep it as is for simplicity?

Can't think of a good one but I'm pretty opposed to IExecutionContext because OpExecutionContext doesn't implement this, so it would be very confusing.

Maybe INewExecutionContext or something (this isn't user-facing right)? IMaterializationContext?

Might help to spell out some of the thinking behind having a common interface here? What is it that AssetExecutionContext and ExtContext share in common? That they both proxy to an OpExecutionContext (though the mechanism is different)?

I saw feedback from dogfooding that we should ditch the term "context" entirely which could be useful to avoid confusion with our orchestration process context APIs. We could call it an ExtRequest or ExtInvocation

the resulting class signature for AssetExecutionContext would be a bit weird then

class AssetExecutionContext(OpExecutionContext, ExtRequest) # and eventually class AssetExecutionContext(ExtRequest)

it wouldn't be super visible to users, so it's probably ok, but just looks a touch odd to me

@schrockn was that feedback concerning the orchestration-side ExtOrchestrationContext or ext-side ExtContext? The above conversation is about an interface implemented by ExtContext but this feedback seems more fitting for ExtOrchestrationContext.

@jamiedemaria I agree that class AssetExecutionContext(ExtRequest) is super goofy and we should not have that in the codebase.

I bias towards someting like a IHasUserFacingExecutionMethods marker inteface. I think this is better than IHasCoreAssetMethods because the ExtContext interface can be used right now for vanilla ops. Not exporting it so that we have the optional value to change it, making a "two-way door"/reversible decision, so changing our mind has minimal cost.

This is for our only bookkeeping only, and we should feel free to change the name as we get more clarity around what it represents. IHasUserFacingExecutionMethods is long and goofy, but it is very specific, and I think accurately communicates that this (our user-facing context API, speciflcally) isn't the most conceptually coherent part of the system at the moment, and are just stopping the bleeding from adding yet another similar API. The only purpose of the interface is to keep ourselves honest.

python_modules/dagster-ext/dagster_ext/__init__.py

jamiedemaria · 2023-09-13T17:21:30Z

python_modules/dagster-ext/dagster_ext/__init__.py

+
+    @property
+    @abstractmethod
+    def asset_keys(self) -> Sequence[str]:


OpExecutionContext additionally has a selected_asset_keys method that may be useful to add to the interface. Interested in thoughts on that since I'm not as knowledgeable about ext use cases

i'm not sure if ext supports asset selection and how, so i'd defer to the ext folks.

different topic: how does multi_assets work with ext? as in, would an external execution would always map to 1 single asset key? if so, maybe we don't even need asset_keys here.

ext currently uses OpExecutionContext.selected_asset_keys to determine what asset keys are valid on ext-side methods-- these are passed as ExtContextData.asset_keys. So there is no distinction between "all" asset keys and "selected' asset keys is collapsed in ext.

But I'm not sure there even is a valid distinction in general for these contexts? I don't see an asset_keys property on OpExecutionContext. Maybe all of OpExecutionContext, AssetExecutionContext, ExtContext should just have an asset_keys method (which would imply it's on the interface too)?

ah ok - i originally misunderstood what asset_keys was for the ext context.

Based on this i think we:

have an asset_keys property on IContext (so also on ExtContext and AssetExecutionContext). This will be the list of selected asset keys for the materialization

deprecate selected_asset_keys on AssetExecutionContext

jamiedemaria · 2023-09-13T17:22:11Z

python_modules/dagster-ext/dagster_ext/__init__.py

+
+    @property
+    @abstractmethod
+    def provenance(self) -> Optional[ExtDataProvenance]:


equivalent in OpExecutionContext is get_asset_provenance, but I think making it a property is nice. Open to pushback on that

property sounds good to me. but should it be data_provenance instead? do we envision a big difference between asset_provenance and ext_data_provenance, or is one a subset of the other?

i'm not sure, sounds like a question for @smackesey

jamiedemaria · 2023-09-13T17:22:57Z

python_modules/dagster-ext/dagster_ext/__init__.py

+
+    @property
+    @abstractmethod
+    def provenance_by_asset_key(self) -> Mapping[str, Optional[ExtDataProvenance]]:


This method does not exist in OpExecutionContext, so we would add it to AssetExecutionContext or remove it from the interface and just have ExtContext implement it

Another concern @schrockn raised is that provenance and code_version methods reach into the DB for AssetExecutionContext, so making them easy to get is dangerous and encourages users to repeatedly reach out to the db. #16596 (comment)

EDIT - this has been largely resolved for AssetExecutionContext by pre-fetching code version and provenance in more efficient ways (see the linked pr for details)

python_modules/dagster-ext/dagster_ext/__init__.py

jamiedemaria · 2023-09-13T17:24:43Z

python_modules/dagster-ext/dagster_ext/__init__.py

+
+    @property
+    @abstractmethod
+    def is_partition_step(self) -> bool:


the equivalent method in OpExecutionContext is has_partition_key. We should align on the name. Options:

is_partition_step / is_partitioned_step

has_partition_key

is_partitioned

is_partitioned_execution

...

I like is_partitioned_step or is_partitioned_execution with slight preference for is_partitioned_execution. Some thought process/reasoning:

is_partitioned_step adds the "concept" of a step. It's not hard to understand what a step is, but it's still one additional thing and raises the question "is there a difference between an asset and a step?"

is_partitioned_execution uses a more familiar term, but is there a possibility that an asset could not be partitioned but still be part of a partitioned run? thinking of an asset with an AllPartitionMapping. I will test this out.

could it just be is_partitioned for simplicity?

i like is_partitioned. @smackesey what are our guarantees about the stability of ExtContext? if i rename is_partitioned_step to is_partitioned do i need to add a deprecation warning to is_partitioned_step or can i just delete it?

Our guarantees are weak right now, I think it's fine to change anything on there in the next few days.

That said I don't really like is_partitioned, because we have a lot of concepts floating around and it's best to be clear about what applies to what. is_partitioned makes it sound like the context itself is partitioned.

I think is_partitioned_execution could be good if we use that terminology elsewhere. is_partitioned_step also, though I think it wasn't chosen in a very principled way.

Curious what @sryza and @schrockn think here-- I believe in our ontology, jobs and assets are two entities that can clearly be the subject of partitioning. How do we refer to a step corresponding to one or more partitions?

On the asset execution context we have has_partition_key and has_partition_key_range. Is it functionally different than those methods on the asset execution context? My instinct would be to be consistent, even if we don't think it's the optimal choice.

Agree with @smackesey that is_partitioned is not specific enough.

I concur with @sryza. Having has_partition_key and has_partition_key_range be two sep methods here is desirable because it accurately communicates that there are two different APIs for accessing partitioning information. If it is only is_partitioned_step, the user will sometimes need to make a further distinction between has_partition_key and has_partition_key_range anyways. I do prefer has_single_partition_key to has_partition_key as someone reasonable might think that has_partition_key should return True whenever has_partition_key_range is True.

I think the crux of this is this image:

What is the behavior and best naming in the "potentially confusing" quadrant of the state space above.

maybe i'm missing something but i can't find has_partition_key_range or equivalent on OpExecutionContext. I can see that PlanExecutionContext has has_partition_key_range, but no other classes.

I was under the impression that has_partition_key was True in any partitioned execution, but i see now that it's not, which changes my thoughts on this.

has_partition_key_range doesn't sit well with me since a single partition key also has a partition key range, it's just a partition key range with the same start and end date.

I wonder if instead we should have the boolean be about what type of execution it is rather than the key plurality. something along the lines of is_single_partition_execution.

I also wonder how often this method will need to be used once we move to BackfillPolicy (here). If the asset author has to declare what type of backfill will be run at definition time, then they should know what methods (context.partition_key vs context.partition_key_range) to use without needing a run time boolean. I think there's still utility in having the method, but that might also affect how to think about it and name it

jamiedemaria · 2023-09-13T17:25:57Z

python_modules/dagster-ext/dagster_ext/__init__.py

+    @abstractmethod
+    def partition_time_window(self) -> Optional["ExtTimeWindow"]:
+        """TODO."""
+


TODO - investigate if the above methods make working with StaticPartitions difficult/impossible since there is no "range" of keys. Perhaps just partition_key is enough, but need to verify

might just need to add the partition_keys method to the interface

partition methods punted until https://github.com/dagster-io/internal/discussions/6704 is resolved. I'll add the partition methods to the interface once we decide what they should be

python_modules/dagster-ext/dagster_ext/__init__.py

jamiedemaria · 2023-09-14T12:54:42Z

python_modules/dagster-ext/dagster_ext/__init__.py

+    @abstractmethod
+    def retry_number(self) -> int:
+        """TODO."""
+


Additional methods in ExtContext that we may want to include:

get_extra

extras

report_asset_metadata

report_asset_data_version

log

jamiedemaria · 2023-09-14T12:56:55Z

python_modules/dagster-ext/dagster_ext/__init__.py

+    def retry_number(self) -> int:
+        """TODO."""
+
+


Additional methods from OpExecutionContext/AssetExecutionContext that we may want to include:

log

log_event

selected_asset_keys

jamiedemaria · 2023-09-14T17:15:58Z

python_modules/dagster-ext/dagster_ext/__init__.py

+    def retry_number(self) -> int:
+        """TODO."""
+
+


We also need to consider the asset check related methods #16466

jamiedemaria · 2023-09-18T13:45:00Z

cc @alangenfeld @schrockn @smackesey I'd really like to get your feedback on this in the next couple days so that we can pin down the interface and make any code changes

github-actions · 2023-09-18T15:09:55Z

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-oa4jjzqha-elementl.vercel.app
https://jamie-context-interface.components-storybook.dagster-docs.io

Built with commit 51620ff.
This pull request is being automatically deployed with vercel-action

github-actions · 2023-09-18T15:10:20Z

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-1acqyufqm-elementl.vercel.app
https://jamie-context-interface.core-storybook.dagster-docs.io

Built with commit 51620ff.
This pull request is being automatically deployed with vercel-action

github-actions · 2023-09-18T15:12:57Z

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-n0p8963vs-elementl.vercel.app
https://jamie-context-interface.dagster.dagster-docs.io

Direct link to changed pages:

alangenfeld

I think this is at least stale and at most the plans have shifted dramatically

back to your queue to refresh or dismiss

jamiedemaria · 2023-10-13T19:13:45Z

now that pipes has been release, I'm planning to open a new PR to discuss the shared interface for pipes context and asset context

jamiedemaria commented Sep 13, 2023

View reviewed changes

python_modules/dagster-ext/dagster_ext/__init__.py Outdated Show resolved Hide resolved