Support partitioned assets for fetching latest AssetMaterializations #19286
    f"Cannot fetch AssetMaterialization for asset {key}. {key} must be an upstream dependency "
    "in order to call latest_materialization_for_upstream_asset."
)
event = self.instance.get_latest_data_version_record(
starting a thread to discuss adding this DB access path
my major concern with not adding this access is that we could be providing inaccurate information to users who are working with partitioned assets
and a concern with adding it is that a user could iterate through a big list of assets and fetch the latest materialization for each and break something
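Both concerns raised in this thread can be made concrete with a small in-memory sketch. It does not use Dagster: `Materialization`, `FakeEventLog`, `latest`, and `latest_for_partition` are hypothetical stand-ins rather than Dagster APIs, and `query_count` only counts lookups as a rough proxy for DB round trips.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Materialization:
    """Hypothetical stand-in for an AssetMaterialization event."""
    asset_key: str
    partition: Optional[str]
    order: int  # insertion order, used here in place of a timestamp


@dataclass
class FakeEventLog:
    """Hypothetical in-memory event log; not the Dagster instance API."""
    events: List[Materialization] = field(default_factory=list)
    query_count: int = 0  # proxy for the number of DB calls

    def record(self, asset_key: str, partition: Optional[str] = None) -> Materialization:
        event = Materialization(asset_key, partition, len(self.events))
        self.events.append(event)
        return event

    def latest(self, asset_key: str) -> Optional[Materialization]:
        """Partition-unaware lookup: newest event for the asset."""
        self.query_count += 1
        matches = [e for e in self.events if e.asset_key == asset_key]
        return matches[-1] if matches else None

    def latest_for_partition(self, asset_key: str, partition: str) -> Optional[Materialization]:
        """Partition-aware lookup: newest event for one partition of the asset."""
        self.query_count += 1
        matches = [
            e for e in self.events
            if e.asset_key == asset_key and e.partition == partition
        ]
        return matches[-1] if matches else None


log = FakeEventLog()
log.record("upstream", "2024-01-01")
log.record("upstream", "2024-01-02")

# Accuracy concern: downstream is materializing partition 2024-01-01.
current_partition = "2024-01-01"
global_latest = log.latest("upstream")
partition_latest = log.latest_for_partition("upstream", current_partition)
print(global_latest.partition)     # 2024-01-02, the wrong partition
print(partition_latest.partition)  # 2024-01-01, the partition downstream depends on

# Cost concern: one lookup per asset when a user loops over many assets.
before = log.query_count
for key in [f"asset_{i}" for i in range(100)]:
    log.latest_for_partition(key, current_partition)
extra_queries = log.query_count - before
print(extra_queries)  # 100
```

The sketch shows the trade-off only in outline: the partition-unaware lookup returns a record for the wrong partition, while the partition-aware lookup is correct but costs one query per asset requested.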
What if we prefetched this information for partitioned execution? Doesn't seem too crazy. Also what about the behavior for partition-ranged backfills?
@schrockn and @smackesey just want to bump this convo since it probably got lost during the offsite
converting back to draft to get this out of review queues until context work is re-prioritized
Summary & Motivation

In #18971 we piggy-back on the versioning code path to provide, on the context, `AssetMaterialization`s for the direct upstream dependencies of the currently materializing asset. However, due to scaling constraints, the versioning code path always fetches the latest `AssetMaterialization` without considering the partitions that are relevant to the current execution.

This means that for partitioned assets, the information provided via `latest_materialization_for_upstream_asset` could be incorrect. Consider the following scenario:

Suppose we materialize partition `2024-01-01` of `upstream`, then materialize partition `2024-01-02` of `upstream`. At this point the latest `AssetMaterialization` for `upstream` is the one for the `2024-01-02` partition. If we then materialize partition `2024-01-01` of `downstream`, the final assertion would fail. The reasonable behavior to expect is that the `AssetMaterialization` returned by `latest_materialization_for_upstream_asset` is the one for the most recent materialization of the partition of `upstream` that the currently materializing partition of `downstream` depends on. But the latest `AssetMaterialization` for `upstream` is for a different partition, so we get that `AssetMaterialization` instead.

This PR proposes adding logic to `latest_materialization_for_upstream_asset` so that for partitioned assets, we get the latest `AssetMaterialization` for the correct partition. This unfortunately requires a call to the DB. Renaming the function to `fetch_latest_materialization_for_upstream_asset` should help indicate this to the user. For non-partitioned assets, we still use the `AssetMaterialization`s fetched during the versioning code path, so those calls should remain efficient.

How I Tested These Changes