-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make AssetExecutionContext a subclass of OpExecutionContext #16596
Make AssetExecutionContext a subclass of OpExecutionContext #16596
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
return self._op_execution_context.get_asset_provenance(asset_key) | ||
|
||
@public | ||
# TODO - method naming. this needs work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternate names:
get_code_versions_for_assets
Alternately, we could delete this method entirely, and update the impls of code_version
and code_version_by_asset_key
to
@property
def code_version(self) -> Optional[str]:
return self.code_version_by_asset_key[self.asset_key]
@property
def code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]:
return self.op_execution_context.instance.get_latest_materialization_code_versions(
self.asset_keys
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important that access each in-memory properties and expensive accesses to the instance feel very different, and that it is clear when the user is doing something expensive like a db query to get information, so something like code_version_by_asset_key
is not a good idea. Imagine accessing that in a tight loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense - the ExtContext
also has a code_version_by_asset_key
so whatever decisions we make here will also probably need to be applied there. Does making it a getter method seem like enough indication to the user that it's expensive?
def get_code_version(self) -> Optional[str]:
return self.code_version_by_asset_key[self.asset_key]
def get_code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]:
return self.op_execution_context.instance.get_latest_materialization_code_versions(
self.asset_keys
)
or something more dramatic like query_db_for_code_version
or expecting the user to get the instance
and call get_latest_materialization_code_versions
themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to be equivalent with the ext contract (code here:
def build_external_execution_context_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @smackesey this convo could be of interest to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code_version
, code_version_by_asset_key
, provenance
and provenace_by_asset_key
have all been updated so that the dictionaries are pre-fetched during context init. additionally code_versions
are now pulled from the AssetsDefinition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does making it a getter method seem like enough indication to the user that it's expensive?
Maybe tangential to the immediate concern, but I like "fetch" as an indicator that a method is loading from elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additionally code_versions are now pulled from the AssetsDefinition.
Great, yeah it would be incorrect to fetch these from the instance since everywhere else "the asset's code version" means what is set on the definition, not what was used for a materialization at some undetermined time in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great, i didn't know that subtlety (thinking about it now it makes total sense) so thanks for clearing that up
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of stuff. There are enough independent threads of discussion here that we might want to consider breaking up the PR if nothing else than to manage discussion and decisions.
return self._op_execution_context.get_asset_provenance(asset_key) | ||
|
||
@public | ||
# TODO - method naming. this needs work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important that access each in-memory properties and expensive accesses to the instance feel very different, and that it is clear when the user is doing something expensive like a db query to get information, so something like code_version_by_asset_key
is not a good idea. Imagine accessing that in a tight loop.
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
@property | ||
def asset_check_spec(self) -> AssetCheckSpec: | ||
return self._op_execution_context.asset_check_spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the semantics of this? Ops can have zero checks, or can have many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused on this myself - OpExecutionContext
has the method:
@property
def asset_check_spec(self) -> AssetCheckSpec:
asset_checks_def = check.not_none(
self.job_def.asset_layer.asset_checks_def_for_node(self.node_handle),
"This context does not correspond to an AssetChecksDefinition",
)
return asset_checks_def.spec
but AssetsDefinition has check_specs_by_output_name
and check_specs
which seem more useful.
I think a check_specs_by_asset_key
property would be the most flexible here, so i'll update the implementation with that as a starting point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @johannkm for opinions and insights on the check spec related context methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the existing one errors on anything except @asset_check, which has one spec. The current method looks fine- you could also remove check specs property for now. I don't have a clear usecase for it, it's only used in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok good to know!
def provenance(self) -> Optional[DataProvenance]: | ||
return self.get_asset_provenance(self.asset_key) | ||
|
||
@property | ||
def provenance_by_asset_key(self) -> Mapping[AssetKey, Optional[DataProvenance]]: | ||
provenance_map = {} | ||
for key in self.asset_keys: | ||
provenance_map[key] = self.get_asset_provenance(key) | ||
|
||
return provenance_map | ||
|
||
@property | ||
def code_version(self) -> Optional[str]: | ||
return self.get_assets_code_version([self.asset_key])[self.asset_key] | ||
|
||
@property | ||
def code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]: | ||
return self.get_assets_code_version(self.asset_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these all go to the db? Super dangerous to make that this easy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved via call - code version is now fetched from the asset definition, and provenance for the selected assets is pre-fetched during init
def provenance(self) -> Optional[DataProvenance]: | ||
return self.get_asset_provenance(self.asset_key) | ||
|
||
@property | ||
def provenance_by_asset_key(self) -> Mapping[AssetKey, Optional[DataProvenance]]: | ||
provenance_map = {} | ||
for key in self.asset_keys: | ||
provenance_map[key] = self.get_asset_provenance(key) | ||
|
||
return provenance_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I think it is defensible to have separate asset_key
and asset_keys
properties for convenience, for the remainder of these properties we should not have both variants.
My recommendation is to only allow lookup by key, and also to strongly consider a single object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fine w me - we'll also need to update ExtContext to match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I think it is defensible to have separate asset_key and asset_keys properties for convenience, for the remainder of these properties we should not have both variants.
My recommendation is to only allow lookup by key, and also to strongly consider a single object.
IMO this makes for annoying UX:
context.provenance_by_asset_key(context.asset_key)
# instead of
context.provenance
I think it should be:
def provenance(self, asset_key: Optional[AssetKey]):
Then if there is only a single asset you default to that, otherwise require the asset_key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exploring a bit more on the single object thread. if i understand correctly, the object could be some kind of AssetInfo
class that would hold the code_version
, provenance
, and maybe things like partition_keys
and partition_time_window
. Rather than having a set of code_version_by_asset_key
, partition_time_window_by_asset_key
etc methods, we'd just have a asset_info_by_asset_key
that would return these objects.
it feels similar to AssetSpec
but bound to a particular materialization, not a general definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels similar to
AssetSpec
but bound to a particular materialization, not a general definition
Great mental model.
How about a compromise between the two approaches, asI do find @smackesey's concern of divergence between single and multi asset case convincing:
def asset_info(self, asset_key: Optional[AssetKey] = None) -> AssetInfo
...
However I do not think we should name this AssetInfo
. @jamiedemaria what is the complete set of properties that will exist on this object? That might help with naming.
partition_keys
and partition_time_window
are also interesting. They only vary on a per-asset-key basis when partition mappings are present, so if someone doesn't know about mappings they may find a "by_key" variant of this APi confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not fully complete, but the properties I can think to include are:
code_version
provenance
partition_time_window
partition_key_range
partition_keys
upstream_result_value
- no good name for this right now, but I'm thinking if we allow this:
@asset
def my_asset():
return MaterializeResult(value="foo")
then this object would be a good way to give that value to the user downstream
@asset(
deps=[my_asset]
)
def another_asset(context):
context.asset_info_by_asset_key[my_asset].upstream_result_value == "foo"
- metadata
- potentially even things like the time the asset was materialized. There's a small collection of user questions in the "how to i get X info about the upstream asset?" category. we could look through those and see if there's anything worth including
@property | ||
def partition_key(self) -> str: | ||
return self.op_execution_context.partition_key | ||
|
||
@public | ||
@property | ||
def partition_key_range(self) -> PartitionKeyRange: | ||
return self._op_execution_context.asset_partition_key_range | ||
|
||
@property | ||
def partition_time_window(self) -> TimeWindow: | ||
return self.op_execution_context.partition_time_window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jamie internal discussion 6704 very relevant here.
python_modules/dagster/dagster/_core/execution/context/compute.py
Outdated
Show resolved
Hide resolved
@public | ||
@property | ||
def instance(self) -> DagsterInstance: | ||
"""DagsterInstance: The current Dagster instance.""" | ||
return self._op_execution_context.instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prha how are are you in thinking about subsetting the instance API? This would be the time to start to introduce an alternative interface.
59fd5df
to
7891ea4
Compare
9d2c497
to
92457f9
Compare
def selected_asset_keys(self) -> AbstractSet[AssetKey]: | ||
return self._selected_asset_keys | ||
|
||
# TODO - both get_asset_provenance and get_assets_code_version query the instance - do we want to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep get_asset_code_version
and get_asset_provenance
? they would allow users to get the code version/provenance for ay asset (not just those in the assets definition), but require instance queries.
I think i'm inclined to remove them and add back if we get requests, but not strongly tied to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sean suggested switching to a fetch
prefix instead of get
#16596 (comment) so that is another option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to investigate where get_asset_provenance
is used but we should definitely not have get_asset_code_version
that returns a value from the last materialization (the last materialization code version is on the provenance anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpExecutionContext.get_asset_provenance
is used to create the ext context here, used in the init of the AssetExecutionContext in this PR, in an asset in the toys project, and in a test for data versioning.
7891ea4
to
0ff760e
Compare
efa6880
to
0788e4e
Compare
e688144
to
27152b9
Compare
9ab63fd
to
65b6c1e
Compare
27152b9
to
d01d5d2
Compare
65b6c1e
to
587dde2
Compare
2e71fd1
to
49b0a60
Compare
587dde2
to
51620ff
Compare
1448509
to
5ba005b
Compare
closing in favor of #16761 - I'll open a new PR with method deprecations in stages as we align on what the new API should be |
Summary & Motivation
Makes the
AssetExecutionContext
a subclass ofOpExecutionContext
. Since the plan is to splitAssetExecutionContext
into it's own class, we need to fire deprecation warnings for all methods onOpExecutionContext
that we do not intend to keep onAssetExecutionContext
longterm. We also fire a deprecation warning onisinstance(context, OpExecutionContext)
when context is anAssetExecutionContext
.This class should implement the IContext from #16480, but I can't import from dagster-ext so need to figure that part out first...
How I Tested These Changes
stacked PR #16598