Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interface for ExtContext and AssetExecutionContext #16480

Closed
wants to merge 5 commits into from

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Sep 13, 2023

Summary & Motivation

Starting this PR as a place to have discussion about a shared interface for the ExtContext and the simplified AssetExecutionContext. I used the methods/properties on ExtContext as a starting place.

See comments in the code for discussion topics. I've documented the methods in the existing OpExecutionContext that have different names or don't exist.

Notion doc to track open questions, remaining work, etc https://www.notion.so/dagster/AssetExecutionContext-decision-points-cfc48a1da5a24ef6823ab79bc0f30668

How I Tested These Changes

@@ -543,6 +543,80 @@ def upload_messages_chunk(self, payload: IO, index: int) -> None:
# ########################


class IContext(ABC):
Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the context be named? IContext is a place holder for now. Some options:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like IExecutionContext better. technically it's for the execution phase, as opposed to phases like Evaluation. but then i wonder whether we should rename ExtContext to ExtExecutionContext or keep it as is for simplicity?

Copy link
Collaborator

@smackesey smackesey Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't think of a good one but I'm pretty opposed to IExecutionContext because OpExecutionContext doesn't implement this, so it would be very confusing.

Maybe INewExecutionContext or something (this isn't user-facing right)? IMaterializationContext?

Might help to spell out some of the thinking behind having a common interface here? What is it that AssetExecutionContext and ExtContext share in common? That they both proxy to an OpExecutionContext (though the mechanism is different)?

Copy link
Member

@schrockn schrockn Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw feedback from dogfooding that we should ditch the term "context" entirely which could be useful to avoid confusion with our orchestration process context APIs. We could call it an ExtRequest or ExtInvocation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the resulting class signature for AssetExecutionContext would be a bit weird then

class AssetExecutionContext(OpExecutionContext, ExtRequest)

# and eventually 
class AssetExecutionContext(ExtRequest)

it wouldn't be super visible to users, so it's probably ok, but just looks a touch odd to me

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schrockn was that feedback concerning the orchestration-side ExtOrchestrationContext or ext-side ExtContext? The above conversation is about an interface implemented by ExtContext but this feedback seems more fitting for ExtOrchestrationContext.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamiedemaria I agree that class AssetExecutionContext(ExtRequest) is super goofy and we should not have that in the codebase.

I bias towards someting like a IHasUserFacingExecutionMethods marker inteface. I think this is better than IHasCoreAssetMethods because the ExtContext interface can be used right now for vanilla ops. Not exporting it so that we have the optional value to change it, making a "two-way door"/reversible decision, so changing our mind has minimal cost.

This is for our only bookkeeping only, and we should feel free to change the name as we get more clarity around what it represents. IHasUserFacingExecutionMethods is long and goofy, but it is very specific, and I think accurately communicates that this (our user-facing context API, speciflcally) isn't the most conceptually coherent part of the system at the moment, and are just stopping the bleeding from adding yet another similar API. The only purpose of the interface is to keep ourselves honest.


@property
@abstractmethod
def asset_keys(self) -> Sequence[str]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpExecutionContext additionally has a selected_asset_keys method that may be useful to add to the interface. Interested in thoughts on that since I'm not as knowledgeable about ext use cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure if ext supports asset selection and how, so i'd defer to the ext folks.

different topic: how does multi_assets work with ext? as in, would an external execution would always map to 1 single asset key? if so, maybe we don't even need asset_keys here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ext currently uses OpExecutionContext.selected_asset_keys to determine what asset keys are valid on ext-side methods-- these are passed as ExtContextData.asset_keys. So there is no distinction between "all" asset keys and "selected' asset keys is collapsed in ext.

But I'm not sure there even is a valid distinction in general for these contexts? I don't see an asset_keys property on OpExecutionContext. Maybe all of OpExecutionContext, AssetExecutionContext, ExtContext should just have an asset_keys method (which would imply it's on the interface too)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok - i originally misunderstood what asset_keys was for the ext context.

Based on this i think we:

  • have an asset_keys property on IContext (so also on ExtContext and AssetExecutionContext). This will be the list of selected asset keys for the materialization
  • deprecate selected_asset_keys on AssetExecutionContext


@property
@abstractmethod
def provenance(self) -> Optional[ExtDataProvenance]:
Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equivalent in OpExecutionContext is get_asset_provenance, but I think making it a property is nice. Open to pushback on that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

property sounds good to me. but should it be data_provenance instead? do we envision a big difference between asset_provenance and ext_data_provenance, or is one a subset of the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure, sounds like a question for @smackesey


@property
@abstractmethod
def provenance_by_asset_key(self) -> Mapping[str, Optional[ExtDataProvenance]]:
Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not exist in OpExecutionContext, so we would add it to AssetExecutionContext or remove it from the interface and just have ExtContext implement it

Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern @schrockn raised is that provenance and code_version methods reach into the DB for AssetExecutionContext, so making them easy to get is dangerous and encourages users to repeatedly reach out to the db. #16596 (comment)

EDIT - this has been largely resolved for AssetExecutionContext by pre-fetching code version and provenance in more efficient ways (see the linked pr for details)


@property
@abstractmethod
def is_partition_step(self) -> bool:
Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the equivalent method in OpExecutionContext is has_partition_key. We should align on the name. Options:

  • is_partition_step / is_partitioned_step
  • has_partition_key
  • is_partitioned
  • is_partitioned_execution
  • ...

Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like is_partitioned_step or is_partitioned_execution with slight preference for is_partitioned_execution. Some thought process/reasoning:

  • is_partitioned_step adds the "concept" of a step. It's not hard to understand what a step is, but it's still one additional thing and raises the question "is there a difference between an asset and a step?"
  • is_partitioned_execution uses a more familiar term, but is there a possibility that an asset could not be partitioned but still be part of a partitioned run? thinking of an asset with an AllPartitionMapping. I will test this out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it just be is_partitioned for simplicity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like is_partitioned. @smackesey what are our guarantees about the stability of ExtContext? if i rename is_partitioned_step to is_partitioned do i need to add a deprecation warning to is_partitioned_step or can i just delete it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our guarantees are weak right now, I think it's fine to change anything on there in the next few days.

That said I don't really like is_partitioned, because we have a lot of concepts floating around and it's best to be clear about what applies to what. is_partitioned makes it sound like the context itself is partitioned.

I think is_partitioned_execution could be good if we use that terminology elsewhere. is_partitioned_step also, though I think it wasn't chosen in a very principled way.

Curious what @sryza and @schrockn think here-- I believe in our ontology, jobs and assets are two entities that can clearly be the subject of partitioning. How do we refer to a step corresponding to one or more partitions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the asset execution context we have has_partition_key and has_partition_key_range. Is it functionally different than those methods on the asset execution context? My instinct would be to be consistent, even if we don't think it's the optimal choice.

Copy link
Member

@schrockn schrockn Sep 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @smackesey that is_partitioned is not specific enough.

I concur with @sryza. Having has_partition_key and has_partition_key_range be two sep methods here is desirable because it accurately communicates that there are two different APIs for accessing partitioning information. If it is only is_partitioned_step, the user will sometimes need to make a further distinction between has_partition_key and has_partition_key_range anyways. I do prefer has_single_partition_key to has_partition_key as someone reasonable might think that has_partition_key should return True whenever has_partition_key_range is True.

I think the crux of this is this image:

image

What is the behavior and best naming in the "potentially confusing" quadrant of the state space above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i'm missing something but i can't find has_partition_key_range or equivalent on OpExecutionContext. I can see that PlanExecutionContext has has_partition_key_range, but no other classes.

I was under the impression that has_partition_key was True in any partitioned execution, but i see now that it's not, which changes my thoughts on this.

has_partition_key_range doesn't sit well with me since a single partition key also has a partition key range, it's just a partition key range with the same start and end date.

I wonder if instead we should have the boolean be about what type of execution it is rather than the key plurality. something along the lines of is_single_partition_execution.

I also wonder how often this method will need to be used once we move to BackfillPolicy (here). If the asset author has to declare what type of backfill will be run at definition time, then they should know what methods (context.partition_key vs context.partition_key_range) to use without needing a run time boolean. I think there's still utility in having the method, but that might also affect how to think about it and name it

@abstractmethod
def partition_time_window(self) -> Optional["ExtTimeWindow"]:
"""TODO."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO - investigate if the above methods make working with StaticPartitions difficult/impossible since there is no "range" of keys. Perhaps just partition_key is enough, but need to verify

Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might just need to add the partition_keys method to the interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition methods punted until https://github.com/dagster-io/internal/discussions/6704 is resolved. I'll add the partition methods to the interface once we decide what they should be

@abstractmethod
def retry_number(self) -> int:
"""TODO."""

Copy link
Contributor Author

@jamiedemaria jamiedemaria Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional methods in ExtContext that we may want to include:

  • get_extra
  • extras
  • report_asset_metadata
  • report_asset_data_version
  • log

def retry_number(self) -> int:
"""TODO."""


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional methods from OpExecutionContext/AssetExecutionContext that we may want to include:

  • log
  • log_event
  • selected_asset_keys

def retry_number(self) -> int:
"""TODO."""


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to consider the asset check related methods #16466

@jamiedemaria
Copy link
Contributor Author

cc @alangenfeld @schrockn @smackesey I'd really like to get your feedback on this in the next couple days so that we can pin down the interface and make any code changes

@jamiedemaria jamiedemaria force-pushed the jamie/context-interface branch from 9508725 to 59fd5df Compare September 18, 2023 15:06
@github-actions
Copy link

github-actions bot commented Sep 18, 2023

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-oa4jjzqha-elementl.vercel.app
https://jamie-context-interface.components-storybook.dagster-docs.io

Built with commit 51620ff.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link

github-actions bot commented Sep 18, 2023

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-1acqyufqm-elementl.vercel.app
https://jamie-context-interface.core-storybook.dagster-docs.io

Built with commit 51620ff.
This pull request is being automatically deployed with vercel-action

@jamiedemaria jamiedemaria requested a review from yuhan September 18, 2023 20:53
@jamiedemaria jamiedemaria force-pushed the jamie/context-interface branch 2 times, most recently from 7891ea4 to 0ff760e Compare September 20, 2023 22:06
@jamiedemaria jamiedemaria changed the title [for discussion] interface for ExtContext and AssetExecutionContext interface for ExtContext and AssetExecutionContext Sep 21, 2023
@jamiedemaria jamiedemaria force-pushed the jamie/context-interface branch 4 times, most recently from 65b6c1e to 587dde2 Compare September 22, 2023 17:46
@jamiedemaria jamiedemaria force-pushed the jamie/context-interface branch from 587dde2 to 51620ff Compare September 26, 2023 14:06
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is at least stale and at most the plans have shifted dramatically

back to your queue to refresh or dismiss

@jamiedemaria
Copy link
Contributor Author

now that pipes has been release, I'm planning to open a new PR to discuss the shared interface for pipes context and asset context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants