Make AssetExecutionContext a subclass of OpExecutionContext #16596

jamiedemaria · 2023-09-18T16:30:50Z

Summary & Motivation

Makes the AssetExecutionContext a subclass of OpExecutionContext. Since the plan is to split AssetExecutionContext into its own class, we need to fire deprecation warnings for all methods on OpExecutionContext that we do not intend to keep on AssetExecutionContext long-term. We also fire a deprecation warning on isinstance(context, OpExecutionContext) when context is an AssetExecutionContext.
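A minimal sketch of the two warning mechanisms described above. It uses simplified stand-in classes, not Dagster's real OpExecutionContext and AssetExecutionContext. The metaclass `_ContextMeta`, the helper `_deprecated_for_assets`, and the method `legacy_op_method` are hypothetical names for illustration.

```python
import functools
import warnings


class _ContextMeta(type):
    """Hypothetical metaclass: warns on isinstance(asset_ctx, OpExecutionContext)."""

    def __instancecheck__(cls, instance):
        result = super().__instancecheck__(instance)
        # Warn only for checks against the op context type, and only when the
        # object really is an asset context.
        if (
            result
            and cls is OpExecutionContext
            and issubclass(type(instance), AssetExecutionContext)
        ):
            warnings.warn(
                "isinstance(context, OpExecutionContext) is deprecated for asset contexts",
                DeprecationWarning,
                stacklevel=2,
            )
        return result


def _deprecated_for_assets(method):
    """Wrap an inherited method so that calling it on an asset context warns."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        warnings.warn(
            f"{method.__name__} will be removed from AssetExecutionContext",
            DeprecationWarning,
            stacklevel=2,
        )
        return method(self, *args, **kwargs)

    return wrapper


class OpExecutionContext(metaclass=_ContextMeta):
    def legacy_op_method(self):  # hypothetical op-only method
        return "op-result"


class AssetExecutionContext(OpExecutionContext):
    # Same behavior as the parent, plus a deprecation warning on every call.
    legacy_op_method = _deprecated_for_assets(OpExecutionContext.legacy_op_method)
```

A metaclass hook is one way to intercept isinstance; this is a sketch under those assumptions, not the code from this PR or the stacked metaclass PR.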

This class should implement the IContext from #16480, but I can't import from dagster-ext, so I need to figure that part out first.

How I Tested These Changes

stacked PR #16598

jamiedemaria · 2023-09-18T16:31:03Z

Current dependencies on/for this PR:

master
- PR interface for ExtContext and AssetExecutionContext #16480
  - PR Make AssetExecutionContext a subclass of OpExecutionContext #16596 👈
    - PR partition methods on AssetExecutionContext #16625
    - PR provide AssetExecutionContext class to context #16493
      - PR update build_op_context to build_asset_context where relevant #16673
        
        PR support direct invocation with AssetExecutionContext #16635
        
        PR test AssetExecutionContext subclass deprecations #16598
  - PR OpExecutionContext metaclass #16486
    - PR Simplified AssetExecutionContext #16487

This comment was auto-generated by Graphite.


jamiedemaria · 2023-09-18T18:29:12Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+        return self._op_execution_context.get_asset_provenance(asset_key)
+
+    @public
+    # TODO - method naming. this needs work


Alternate names:

get_code_versions_for_assets

Alternatively, we could delete this method entirely and update the implementations of code_version and code_version_by_asset_key to:

@property
def code_version(self) -> Optional[str]:
    return self.code_version_by_asset_key[self.asset_key]

@property
def code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]:
    return self.op_execution_context.instance.get_latest_materialization_code_versions(
        self.asset_keys
    )

It is important that accessing in-memory properties and making expensive accesses to the instance feel very different, and that it is clear when the user is doing something expensive, like a DB query, to get information. So something like code_version_by_asset_key is not a good idea. Imagine accessing that in a tight loop.

That makes sense. The ExtContext also has a code_version_by_asset_key, so whatever decisions we make here will probably need to be applied there too. Does making it a getter method seem like enough indication to the user that it's expensive?

def get_code_version(self) -> Optional[str]:
    return self.get_code_version_by_asset_key()[self.asset_key]

def get_code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]:
    return self.op_execution_context.instance.get_latest_materialization_code_versions(
        self.asset_keys
    )

or something more dramatic like query_db_for_code_version or expecting the user to get the instance and call get_latest_materialization_code_versions themselves?

So, to be equivalent with the ext contract (see build_external_execution_context_data in dagster/python_modules/dagster/dagster/_core/ext/context.py, line 122 in 34acd9c), only code versions on the asset_defs object would be accessible cheaply. Do we need the variant that fetches from the instance at all?

cc: @smackesey this convo could be of interest to you

code_version, code_version_by_asset_key, provenance, and provenance_by_asset_key have all been updated so that the dictionaries are pre-fetched during context init. Additionally, code versions are now pulled from the AssetsDefinition.
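A minimal sketch of the pre-fetching approach, assuming a hypothetical `fetch_provenance` callable that stands in for the instance query and a plain dict that stands in for the code versions on the AssetsDefinition. Keys are plain strings rather than AssetKey, and the class name is illustrative, not Dagster's API.

```python
from typing import Callable, Mapping, Optional, Sequence


class PrefetchingAssetContext:
    """Hypothetical context that does all storage access once, in __init__."""

    def __init__(
        self,
        asset_keys: Sequence[str],
        code_versions_on_definition: Mapping[str, Optional[str]],
        fetch_provenance: Callable[[str], Optional[dict]],
    ):
        # Code versions come from the definition, so no query is needed.
        self._code_version_by_key = {
            key: code_versions_on_definition.get(key) for key in asset_keys
        }
        # Provenance needs storage access: one fetch per selected key, up front.
        self._provenance_by_key = {key: fetch_provenance(key) for key in asset_keys}

    @property
    def code_version_by_asset_key(self) -> Mapping[str, Optional[str]]:
        return self._code_version_by_key

    @property
    def provenance_by_asset_key(self) -> Mapping[str, Optional[dict]]:
        return self._provenance_by_key
```

With this shape, reading the properties in a tight loop only touches in-memory dicts; the cost is paid once at construction.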

Does making it a getter method seem like enough indication to the user that it's expensive?

Maybe tangential to the immediate concern, but I like "fetch" as an indicator that a method is loading from elsewhere.
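One way the "fetch" convention could look, sketched with hypothetical names: cheap in-memory data stays on properties, while anything that reaches storage is a `fetch_*` method, so the cost is visible at the call site. `FakeStorage`, `ContextWithFetch`, and `fetch_asset_provenance` are stand-ins, not Dagster APIs.

```python
from typing import Mapping, Optional


class FakeStorage:
    """Stand-in for the instance; counts how often it is queried."""

    def __init__(self, provenance: Mapping[str, str]):
        self._provenance = dict(provenance)
        self.query_count = 0

    def latest_provenance(self, asset_key: str) -> Optional[str]:
        self.query_count += 1
        return self._provenance.get(asset_key)


class ContextWithFetch:
    def __init__(self, code_version: Optional[str], storage: FakeStorage):
        self._code_version = code_version
        self._storage = storage

    @property
    def code_version(self) -> Optional[str]:
        # In-memory: safe to read in a tight loop.
        return self._code_version

    def fetch_asset_provenance(self, asset_key: str) -> Optional[str]:
        # Goes to storage on every call; the "fetch_" prefix signals the cost.
        return self._storage.latest_provenance(asset_key)
```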

additionally code_versions are now pulled from the AssetsDefinition.

Great, yeah it would be incorrect to fetch these from the instance since everywhere else "the asset's code version" means what is set on the definition, not what was used for a materialization at some undetermined time in the past.

OK great, I didn't know that subtlety (thinking about it now, it makes total sense), so thanks for clearing that up.


schrockn

A bunch of stuff. There are enough independent threads of discussion here that we might want to consider breaking up the PR, if only to manage discussion and decisions.

schrockn · 2023-09-20T01:01:47Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+        return self._op_execution_context.get_asset_provenance(asset_key)
+
+    @public
+    # TODO - method naming. this needs work


It is important that accessing in-memory properties and making expensive accesses to the instance feel very different, and that it is clear when the user is doing something expensive, like a DB query, to get information. So something like code_version_by_asset_key is not a good idea. Imagine accessing that in a tight loop.


schrockn · 2023-09-20T01:04:05Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    @property
+    def asset_check_spec(self) -> AssetCheckSpec:
+        return self._op_execution_context.asset_check_spec


what are the semantics of this? Ops can have zero checks, or can have many.

I'm a bit confused on this myself - OpExecutionContext has the method:

@property
def asset_check_spec(self) -> AssetCheckSpec:
    asset_checks_def = check.not_none(
        self.job_def.asset_layer.asset_checks_def_for_node(self.node_handle),
        "This context does not correspond to an AssetChecksDefinition",
    )
    return asset_checks_def.spec

but AssetsDefinition has check_specs_by_output_name and check_specs which seem more useful.

I think a check_specs_by_asset_key property would be the most flexible here, so I'll update the implementation with that as a starting point.
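A minimal sketch of grouping check specs by the asset they target, so that a context with zero, one, or many checks has a well-defined answer. `CheckSpec` here is a hypothetical stand-in dataclass, not Dagster's AssetCheckSpec, and asset keys are plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CheckSpec:
    name: str
    asset_key: str


def check_specs_by_asset_key(specs: Iterable[CheckSpec]) -> Dict[str, List[CheckSpec]]:
    grouped: Dict[str, List[CheckSpec]] = defaultdict(list)
    for spec in specs:
        grouped[spec.asset_key].append(spec)
    # Return a plain dict so that looking up an unchecked asset raises KeyError
    # instead of silently inserting an empty list.
    return dict(grouped)
```

An op with no checks yields an empty dict rather than an error, which avoids the single-spec assumption baked into asset_check_spec.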

cc @johannkm for opinions and insights on the check spec related context methods

Yeah, the existing one errors on anything except @asset_check, which has one spec. The current method looks fine; you could also remove the check specs property for now. I don't have a clear use case for it, and it's only used in tests.

ok good to know!

schrockn · 2023-09-20T01:05:05Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    def provenance(self) -> Optional[DataProvenance]:
+        return self.get_asset_provenance(self.asset_key)
+
+    @property
+    def provenance_by_asset_key(self) -> Mapping[AssetKey, Optional[DataProvenance]]:
+        provenance_map = {}
+        for key in self.asset_keys:
+            provenance_map[key] = self.get_asset_provenance(key)
+
+        return provenance_map
+
+    @property
+    def code_version(self) -> Optional[str]:
+        return self.get_assets_code_version([self.asset_key])[self.asset_key]
+
+    @property
+    def code_version_by_asset_key(self) -> Mapping[AssetKey, Optional[str]]:
+        return self.get_assets_code_version(self.asset_keys)


these all go to the db? Super dangerous to make that this easy

resolved via call - code version is now fetched from the asset definition, and provenance for the selected assets is pre-fetched during init

schrockn · 2023-09-20T01:07:04Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    def provenance(self) -> Optional[DataProvenance]:
+        return self.get_asset_provenance(self.asset_key)
+
+    @property
+    def provenance_by_asset_key(self) -> Mapping[AssetKey, Optional[DataProvenance]]:
+        provenance_map = {}
+        for key in self.asset_keys:
+            provenance_map[key] = self.get_asset_provenance(key)
+
+        return provenance_map


while I think it is defensible to have separate asset_key and asset_keys properties for convenience, for the remainder of these properties we should not have both variants.

My recommendation is to only allow lookup by key, and also to strongly consider a single object.

That's fine with me. We'll also need to update ExtContext to match.

while I think it is defensible to have separate asset_key and asset_keys properties for convenience, for the remainder of these properties we should not have both variants.

My recommendation is to only allow lookup by key, and also to strongly consider a single object.

IMO this makes for annoying UX:

context.provenance_by_asset_key(context.asset_key) # instead of context.provenance

I think it should be:

def provenance(self, asset_key: Optional[AssetKey] = None):

Then, if there is only a single asset, you default to that; otherwise, require the asset_key.
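A minimal sketch of that defaulting rule: with a single asset the key can be omitted, and otherwise an omitted key is an error rather than a guess. The class name, the plain-string keys, and the choice of ValueError are assumptions for illustration.

```python
from typing import Mapping, Optional


class SingleOrMultiAssetContext:
    def __init__(self, provenance_by_key: Mapping[str, Optional[str]]):
        self._provenance_by_key = dict(provenance_by_key)

    def _resolve_key(self, asset_key: Optional[str]) -> str:
        if asset_key is not None:
            return asset_key
        if len(self._provenance_by_key) == 1:
            # Exactly one asset: default to it.
            return next(iter(self._provenance_by_key))
        raise ValueError("asset_key is required unless the context covers exactly one asset")

    def provenance(self, asset_key: Optional[str] = None) -> Optional[str]:
        return self._provenance_by_key[self._resolve_key(asset_key)]
```

The same method then serves both the single-asset and multi-asset cases, so there is no separate provenance_by_asset_key to keep in sync.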

Exploring the single-object thread a bit more: if I understand correctly, the object could be some kind of AssetInfo class that would hold the code_version, provenance, and maybe things like partition_keys and partition_time_window. Rather than having a set of methods like code_version_by_asset_key and partition_time_window_by_asset_key, we'd just have an asset_info_by_asset_key that would return these objects.

it feels similar to AssetSpec but bound to a particular materialization, not a general definition

it feels similar to AssetSpec but bound to a particular materialization, not a general definition

Great mental model.

How about a compromise between the two approaches, as I do find @smackesey's concern about divergence between the single- and multi-asset cases convincing:

def asset_info(self, asset_key: Optional[AssetKey] = None) -> AssetInfo ...

However I do not think we should name this AssetInfo. @jamiedemaria what is the complete set of properties that will exist on this object? That might help with naming.

partition_keys and partition_time_window are also interesting. They only vary on a per-asset-key basis when partition mappings are present, so someone who doesn't know about mappings may find a "by_key" variant of this API confusing.
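A small illustration of that point, using plain functions as hypothetical stand-ins for partition mappings (not Dagster's PartitionMapping API): without a mapping, every asset sees the run's partition keys, so a per-key lookup returns the same value for every key.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical stand-in for a partition mapping: run keys -> this asset's keys.
KeyMapping = Callable[[Sequence[str]], List[str]]


def partition_keys_by_asset_key(
    run_partition_keys: Sequence[str],
    asset_keys: Sequence[str],
    mappings: Mapping[str, KeyMapping],
) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    for key in asset_keys:
        mapping = mappings.get(key)
        # No mapping: the asset sees the run's partition keys unchanged.
        result[key] = mapping(run_partition_keys) if mapping else list(run_partition_keys)
    return result


def previous_day(keys: Sequence[str]) -> List[str]:
    """Example mapping: shift each daily partition key back by one day."""
    return [(date.fromisoformat(k) - timedelta(days=1)).isoformat() for k in keys]
```

Only when a mapping such as previous_day is attached does the "by_key" result differ between keys.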

Maybe not fully complete, but the properties I can think to include are:

code_version

provenance

partition_time_window

partition_key_range

partition_keys

upstream_result_value - no good name for this right now, but I'm thinking if we allow this:

@asset
def my_asset():
    return MaterializeResult(value="foo")

then this object would be a good way to give that value to the user downstream

@asset(deps=[my_asset])
def another_asset(context):
    context.asset_info_by_asset_key[my_asset].upstream_result_value == "foo"

metadata

potentially even things like the time the asset was materialized. There's a small collection of user questions in the "how do I get X info about the upstream asset?" category; we could look through those and see if there's anything worth including.
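A rough sketch of the single-object idea, using AssetInfo only as the thread's working name (the naming is still open) and a subset of the properties listed above. Field types are simplified assumptions (strings instead of Dagster's DataProvenance and similar types), keys are plain strings, and the `asset_info` lookup follows the `asset_key: Optional[AssetKey] = None` compromise.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class AssetInfo:
    # Per-materialization facts about one asset (working name from the thread).
    code_version: Optional[str] = None
    provenance: Optional[str] = None
    partition_keys: Sequence[str] = ()
    metadata: Mapping[str, Any] = field(default_factory=dict)


class ContextWithAssetInfo:
    def __init__(self, info_by_key: Mapping[str, AssetInfo]):
        self._info_by_key = dict(info_by_key)

    def asset_info(self, asset_key: Optional[str] = None) -> AssetInfo:
        if asset_key is None:
            if len(self._info_by_key) != 1:
                raise ValueError("asset_key is required unless exactly one asset is selected")
            return next(iter(self._info_by_key.values()))
        return self._info_by_key[asset_key]
```

Adding a new per-asset property then means adding a field, not another `*_by_asset_key` method.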

schrockn · 2023-09-20T01:08:19Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    @property
+    def partition_key(self) -> str:
+        return self.op_execution_context.partition_key
+
+    @public
+    @property
+    def partition_key_range(self) -> PartitionKeyRange:
+        return self._op_execution_context.asset_partition_key_range
+
+    @property
+    def partition_time_window(self) -> TimeWindow:
+        return self.op_execution_context.partition_time_window


Jamie, internal discussion 6704 is very relevant here.


schrockn · 2023-09-20T01:10:02Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    @public
+    @property
+    def instance(self) -> DagsterInstance:
+        """DagsterInstance: The current Dagster instance."""
+        return self._op_execution_context.instance


@prha how far along are you in thinking about subsetting the instance API? This would be the time to start introducing an alternative interface.
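One possible shape for such an alternative interface, sketched as a narrow facade that exposes only an allow-listed subset of a wider object's methods. `FullInstance`, `AssetInstanceView`, and their methods are hypothetical stand-ins, not the DagsterInstance API.

```python
class FullInstance:
    """Hypothetical stand-in for a wide instance API."""

    def get_latest_provenance(self, asset_key: str) -> str:
        return f"provenance-of-{asset_key}"

    def wipe_all_runs(self) -> None:
        # Something an asset body should never reach.
        raise RuntimeError("destructive operation")


class AssetInstanceView:
    """Exposes only the read-only calls an asset body is expected to need."""

    def __init__(self, instance: FullInstance):
        self._instance = instance

    def get_latest_provenance(self, asset_key: str) -> str:
        return self._instance.get_latest_provenance(asset_key)
```

Because the view forwards explicitly instead of via `__getattr__`, anything not listed is simply absent from the asset-facing surface.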

jamiedemaria · 2023-09-20T19:38:23Z

python_modules/dagster/dagster/_core/execution/context/compute.py

+    def selected_asset_keys(self) -> AbstractSet[AssetKey]:
+        return self._selected_asset_keys
+
+    # TODO - both get_asset_provenance and get_assets_code_version query the instance - do we want to


Do we want to keep get_asset_code_version and get_asset_provenance? They would allow users to get the code version/provenance for any asset (not just those in the assets definition), but they require instance queries.

I'm inclined to remove them and add them back if we get requests, but I'm not strongly tied to that.

Sean suggested switching to a fetch prefix instead of get (#16596 (comment)), so that is another option.

I need to investigate where get_asset_provenance is used but we should definitely not have get_asset_code_version that returns a value from the last materialization (the last materialization code version is on the provenance anyway).

OpExecutionContext.get_asset_provenance is used to create the ext context here, used in the init of the AssetExecutionContext in this PR, in an asset in the toys project, and in a test for data versioning.

jamiedemaria · 2023-10-13T19:15:41Z

Closing in favor of #16761. I'll open a new PR with method deprecations in stages as we align on what the new API should be.

This was referenced Sep 18, 2023

interface for ExtContext and AssetExecutionContext #16480

Closed

OpExecutionContext metaclass #16486

Closed


Simplified AssetExecutionContext #16487

Closed

provide AssetExecutionContext class to context #16493

Closed

[quick] use correct context type annotations in tests #16507

Merged

test AssetExecutionContext subclass deprecations #16598

Closed

jamiedemaria commented Sep 18, 2023


python_modules/dagster/dagster/_core/execution/context/compute.py (outdated)