Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support partitioned assets for fetching latest AssetMaterializations #19286

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Jan 18, 2024

Summary & Motivation

In #18971 we piggy-back on the versioning code path to provide AssetMaterializations for the direct upstream dependencies for a currently materializing asset on the context. However, due to scaling constraints, the versioning code path always fetches the latest AssetMaterialization without considering the partitions that are relevant to the current execution.

This means that for partitioned assets, the information provided via latest_materialization_for_upstream_asset could be incorrect. Consider the following scenario:

partitions_def = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=partitions_def)
def upstream(context):
    return MaterializeResult(metadata={"partition_key": context.partition_key})

@asset(partitions_def=partitions_def, deps=[upstream])
def downstream(context: AssetExecutionContext):
    mat = context.latest_materialization_for_upstream_asset("upstream")
    assert mat is not None
    assert mat.metadata["partition_key"].value == context.partition_key

If we materialized partition 2024-01-01 of upstream, then materialized partition 2024-01-02 of upstream. At this point the latest AssetMaterialization for upstream is the one for the 2024-01-02 partition. If we then materialized partition 2024-01-01 of downstream, the final assertion would fail. Reasonable behavior to expect would be that the AssetMaterialization returned by latest_materailization_for_upstream_asset is the one for the most recent materialization of the partition of upstream that the currently materializing partition of downstream depends on. But the latest AssetMaterialization for upstream is for a different partition, so we get that AssetMaterialization instead.

This PR proposes adding additional logic to latest_materialization_for_upstream_asset so that for partitioned assets, we get the latest AssetMaterialization for the correct partition. This unfortunately requires a call to the DB. Renaming the function fetch_latest_materialization_for_upstream_asset should help indicate this to the user. For non-partitioned assets, we still use the AssetMaterializations fetched during the versioning code path, so those calls should remain efficient.

How I Tested These Changes

@jamiedemaria
Copy link
Contributor Author

jamiedemaria commented Jan 18, 2024

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

f"Cannot fetch AssetMaterialization for asset {key}. {key} must be an upstream dependency"
"in order to call latest_materialization_for_upstream_asset."
)
event = self.instance.get_latest_data_version_record(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting a thread to discuss adding this DB access path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my major concern with not adding this access is that we could be providing inaccurate information to users who are working with partitioned assets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a concern with adding it is that a user could iterate through a big list of assets and fetch the latest materialization for each and break something

@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 0d98e03 to a349b63 Compare January 19, 2024 16:38
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from bfc9e4b to 033ee94 Compare January 19, 2024 16:38
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from a349b63 to b4cf23a Compare January 19, 2024 19:09
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 033ee94 to 60fbf64 Compare January 19, 2024 19:09
@jamiedemaria jamiedemaria marked this pull request as ready for review January 19, 2024 19:51
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from b4cf23a to 577ce16 Compare January 19, 2024 20:13
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 60fbf64 to d6e6846 Compare January 19, 2024 20:13
@schrockn
Copy link
Member

What if we prefetched this information for partitioned execution? Doesn't seem too crazy.

Also what about the behavior for parition-ranged backfills?

@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from be3e875 to 0de310c Compare January 26, 2024 20:05
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 7ce61a4 to 97e885c Compare January 26, 2024 20:06
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 0de310c to 16ec6bb Compare January 29, 2024 15:39
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 97e885c to 12e6484 Compare January 29, 2024 15:39
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 16ec6bb to 0be42c6 Compare January 29, 2024 16:30
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 12e6484 to ff8de44 Compare January 29, 2024 16:30
@jamiedemaria
Copy link
Contributor Author

@schrockn and @smackesey just want to bump this convo since it probably got lost during the offsite

@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 0be42c6 to 845d988 Compare January 29, 2024 17:26
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from ff8de44 to f00a840 Compare January 29, 2024 17:26
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 845d988 to ef9380b Compare January 29, 2024 18:16
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from f00a840 to 19f449d Compare January 29, 2024 18:16
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from ef9380b to 14211a3 Compare January 29, 2024 19:44
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 19f449d to 06673ac Compare January 29, 2024 19:45
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 14211a3 to b2a3b39 Compare January 29, 2024 20:03
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 06673ac to 0bba728 Compare January 29, 2024 20:03
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from b2a3b39 to 2f56390 Compare January 30, 2024 17:23
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from 0bba728 to c4e8780 Compare January 30, 2024 17:23
@erinkcochran87 erinkcochran87 removed their request for review January 30, 2024 17:48
@jamiedemaria jamiedemaria force-pushed the jamie/partition-methods-asset-key-based branch from 2f56390 to d2ab183 Compare January 30, 2024 20:19
@jamiedemaria jamiedemaria force-pushed the jamie/support-partitions-for-upstream-materialization-event branch from c4e8780 to 876c5ed Compare January 30, 2024 20:19
@jamiedemaria jamiedemaria marked this pull request as draft February 6, 2024 18:00
@jamiedemaria
Copy link
Contributor Author

converting back to draft to get this out of review queues until context work is re-prioritized

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants