-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interface for ExtContext and AssetExecutionContext #16480
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
@@ -543,6 +543,80 @@ def upload_messages_chunk(self, payload: IO, index: int) -> None: | |||
# ######################## | |||
|
|||
|
|||
class IContext(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should the context be named? IContext
is a place holder for now. Some options:
AbstractExecutionContext
- potentially too similar toAbstractComputeExecutionContext
- IExecutionContext
- ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like IExecutionContext
better. technically it's for the execution phase, as opposed to phases like Evaluation. but then i wonder whether we should rename ExtContext
to ExtExecutionContext
or keep it as is for simplicity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't think of a good one but I'm pretty opposed to IExecutionContext
because OpExecutionContext
doesn't implement this, so it would be very confusing.
Maybe INewExecutionContext
or something (this isn't user-facing right)? IMaterializationContext
?
Might help to spell out some of the thinking behind having a common interface here? What is it that AssetExecutionContext
and ExtContext
share in common? That they both proxy to an OpExecutionContext
(though the mechanism is different)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw feedback from dogfooding that we should ditch the term "context" entirely which could be useful to avoid confusion with our orchestration process context APIs. We could call it an ExtRequest
or ExtInvocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the resulting class signature for AssetExecutionContext
would be a bit weird then
class AssetExecutionContext(OpExecutionContext, ExtRequest)
# and eventually
class AssetExecutionContext(ExtRequest)
it wouldn't be super visible to users, so it's probably ok, but just looks a touch odd to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schrockn was that feedback concerning the orchestration-side ExtOrchestrationContext
or ext-side ExtContext
? The above conversation is about an interface implemented by ExtContext
but this feedback seems more fitting for ExtOrchestrationContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamiedemaria I agree that class AssetExecutionContext(ExtRequest)
is super goofy and we should not have that in the codebase.
I bias towards someting like a IHasUserFacingExecutionMethods
marker inteface. I think this is better than IHasCoreAssetMethods
because the ExtContext
interface can be used right now for vanilla ops. Not exporting it so that we have the optional value to change it, making a "two-way door"/reversible decision, so changing our mind has minimal cost.
This is for our only bookkeeping only, and we should feel free to change the name as we get more clarity around what it represents. IHasUserFacingExecutionMethods
is long and goofy, but it is very specific, and I think accurately communicates that this (our user-facing context API, speciflcally) isn't the most conceptually coherent part of the system at the moment, and are just stopping the bleeding from adding yet another similar API. The only purpose of the interface is to keep ourselves honest.
|
||
@property | ||
@abstractmethod | ||
def asset_keys(self) -> Sequence[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpExecutionContext
additionally has a selected_asset_keys
method that may be useful to add to the interface. Interested in thoughts on that since I'm not as knowledgeable about ext use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure if ext supports asset selection and how, so i'd defer to the ext folks.
different topic: how does multi_assets work with ext? as in, would an external execution would always map to 1 single asset key? if so, maybe we don't even need asset_keys
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext currently uses OpExecutionContext.selected_asset_keys
to determine what asset keys are valid on ext-side methods-- these are passed as ExtContextData.asset_keys
. So there is no distinction between "all" asset keys and "selected' asset keys is collapsed in ext.
But I'm not sure there even is a valid distinction in general for these contexts? I don't see an asset_keys
property on OpExecutionContext
. Maybe all of OpExecutionContext
, AssetExecutionContext
, ExtContext
should just have an asset_keys
method (which would imply it's on the interface too)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok - i originally misunderstood what asset_keys
was for the ext context.
Based on this i think we:
- have an
asset_keys
property onIContext
(so also onExtContext
andAssetExecutionContext
). This will be the list of selected asset keys for the materialization - deprecate
selected_asset_keys
onAssetExecutionContext
|
||
@property | ||
@abstractmethod | ||
def provenance(self) -> Optional[ExtDataProvenance]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equivalent in OpExecutionContext
is get_asset_provenance
, but I think making it a property is nice. Open to pushback on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
property sounds good to me. but should it be data_provenance
instead? do we envision a big difference between asset_provenance and ext_data_provenance, or is one a subset of the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure, sounds like a question for @smackesey
|
||
@property | ||
@abstractmethod | ||
def provenance_by_asset_key(self) -> Mapping[str, Optional[ExtDataProvenance]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does not exist in OpExecutionContext
, so we would add it to AssetExecutionContext
or remove it from the interface and just have ExtContext
implement it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern @schrockn raised is that provenance
and code_version
methods reach into the DB for AssetExecutionContext, so making them easy to get is dangerous and encourages users to repeatedly reach out to the db. #16596 (comment)
EDIT - this has been largely resolved for AssetExecutionContext by pre-fetching code version and provenance in more efficient ways (see the linked pr for details)
|
||
@property | ||
@abstractmethod | ||
def is_partition_step(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the equivalent method in OpExecutionContext
is has_partition_key
. We should align on the name. Options:
is_partition_step
/is_partitioned_step
has_partition_key
is_partitioned
is_partitioned_execution
- ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like is_partitioned_step
or is_partitioned_execution
with slight preference for is_partitioned_execution
. Some thought process/reasoning:
is_partitioned_step
adds the "concept" of a step. It's not hard to understand what a step is, but it's still one additional thing and raises the question "is there a difference between an asset and a step?"is_partitioned_execution
uses a more familiar term, but is there a possibility that an asset could not be partitioned but still be part of a partitioned run? thinking of an asset with anAllPartitionMapping
. I will test this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it just be is_partitioned
for simplicity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like is_partitioned
. @smackesey what are our guarantees about the stability of ExtContext? if i rename is_partitioned_step
to is_partitioned
do i need to add a deprecation warning to is_partitioned_step
or can i just delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our guarantees are weak right now, I think it's fine to change anything on there in the next few days.
That said I don't really like is_partitioned
, because we have a lot of concepts floating around and it's best to be clear about what applies to what. is_partitioned
makes it sound like the context itself is partitioned.
I think is_partitioned_execution
could be good if we use that terminology elsewhere. is_partitioned_step
also, though I think it wasn't chosen in a very principled way.
Curious what @sryza and @schrockn think here-- I believe in our ontology, jobs and assets are two entities that can clearly be the subject of partitioning. How do we refer to a step corresponding to one or more partitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the asset execution context we have has_partition_key
and has_partition_key_range
. Is it functionally different than those methods on the asset execution context? My instinct would be to be consistent, even if we don't think it's the optimal choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @smackesey that is_partitioned
is not specific enough.
I concur with @sryza. Having has_partition_key
and has_partition_key_range
be two sep methods here is desirable because it accurately communicates that there are two different APIs for accessing partitioning information. If it is only is_partitioned_step
, the user will sometimes need to make a further distinction between has_partition_key
and has_partition_key_range
anyways. I do prefer has_single_partition_key
to has_partition_key
as someone reasonable might think that has_partition_key
should return True whenever has_partition_key_range
is True.
I think the crux of this is this image:
What is the behavior and best naming in the "potentially confusing" quadrant of the state space above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe i'm missing something but i can't find has_partition_key_range
or equivalent on OpExecutionContext
. I can see that PlanExecutionContext
has has_partition_key_range
, but no other classes.
I was under the impression that has_partition_key
was True
in any partitioned execution, but i see now that it's not, which changes my thoughts on this.
has_partition_key_range
doesn't sit well with me since a single partition key also has a partition key range, it's just a partition key range with the same start and end date.
I wonder if instead we should have the boolean be about what type of execution it is rather than the key plurality. something along the lines of is_single_partition_execution
.
I also wonder how often this method will need to be used once we move to BackfillPolicy
(here). If the asset author has to declare what type of backfill will be run at definition time, then they should know what methods (context.partition_key vs context.partition_key_range) to use without needing a run time boolean. I think there's still utility in having the method, but that might also affect how to think about it and name it
@abstractmethod | ||
def partition_time_window(self) -> Optional["ExtTimeWindow"]: | ||
"""TODO.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO - investigate if the above methods make working with StaticPartitions difficult/impossible since there is no "range" of keys. Perhaps just partition_key
is enough, but need to verify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might just need to add the partition_keys
method to the interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition methods punted until https://github.com/dagster-io/internal/discussions/6704 is resolved. I'll add the partition methods to the interface once we decide what they should be
@abstractmethod | ||
def retry_number(self) -> int: | ||
"""TODO.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional methods in ExtContext
that we may want to include:
get_extra
extras
report_asset_metadata
report_asset_data_version
log
def retry_number(self) -> int: | ||
"""TODO.""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional methods from OpExecutionContext
/AssetExecutionContext
that we may want to include:
log
log_event
selected_asset_keys
def retry_number(self) -> int: | ||
"""TODO.""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to consider the asset check related methods #16466
cc @alangenfeld @schrockn @smackesey I'd really like to get your feedback on this in the next couple days so that we can pin down the interface and make any code changes |
9508725
to
59fd5df
Compare
Deploy preview for dagit-storybook ready! ✅ Preview Built with commit 51620ff. |
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit 51620ff. |
Deploy preview for dagster-docs ready! Preview available at https://dagster-docs-n0p8963vs-elementl.vercel.app Direct link to changed pages: |
7891ea4
to
0ff760e
Compare
65b6c1e
to
587dde2
Compare
587dde2
to
51620ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is at least stale and at most the plans have shifted dramatically
back to your queue to refresh or dismiss
now that pipes has been release, I'm planning to open a new PR to discuss the shared interface for pipes context and asset context |
Summary & Motivation
Starting this PR as a place to have discussion about a shared interface for the
ExtContext
and the simplifiedAssetExecutionContext
. I used the methods/properties onExtContext
as a starting place.See comments in the code for discussion topics. I've documented the methods in the existing
OpExecutionContext
that have different names or don't exist.Notion doc to track open questions, remaining work, etc https://www.notion.so/dagster/AssetExecutionContext-decision-points-cfc48a1da5a24ef6823ab79bc0f30668
How I Tested These Changes