Add information for coordinating segments in python frontend. #3289

Merged
merged 5 commits into from
Oct 31, 2024

rdspring1
Collaborator

Overview

This PR adds information necessary for coordinating segments in the python frontend. Changes pulled from #3025.

PR Details

  • Track the fusion state ids for the inputs, outputs, and extents of a Fusion. Inputs and extents are used to gather tensor arguments and scalars to run a fusion segment, while the outputs are employed to store results between segments.
  • Add a map from a CPP value to its corresponding fusion state id, which is needed to map values from the original fusion to its segmented fusions.
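
The bookkeeping in the two bullets above can be pictured with a small Python model. This is only a sketch under assumptions: `Val`, `FusionStateSketch`, `inputs_fid`, `outputs_fid`, `map_value_to_fid`, and `fids_for` are illustrative names (loosely modeled on the `extents_fid_` and `map_value_to_fid_` members quoted in the review), not the actual nvFuser classes or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Val:
    """Hypothetical stand-in for a C++ Val*; hashable so it can key a dict."""
    name: str


@dataclass
class FusionStateSketch:
    """Toy model of the bookkeeping described above, not the nvFuser class."""
    inputs_fid: list = field(default_factory=list)
    outputs_fid: list = field(default_factory=list)
    map_value_to_fid: dict = field(default_factory=dict)

    def add_input(self, val: Val, fid: int) -> None:
        self.inputs_fid.append(fid)
        self.map_value_to_fid[val] = fid

    def add_output(self, val: Val, fid: int) -> None:
        self.outputs_fid.append(fid)
        self.map_value_to_fid[val] = fid

    def fids_for(self, values) -> list:
        """Translate values referenced by a segment into fusion state ids."""
        return [self.map_value_to_fid[v] for v in values]


state = FusionStateSketch()
t0, t1, t2 = Val("T0"), Val("T1"), Val("T2")
state.add_input(t0, 0)
state.add_input(t1, 1)
state.add_output(t2, 2)

# A hypothetical segment that reads T1 and produces T2.
segment_input_fids = state.fids_for([t1])
segment_output_fids = state.fids_for([t2])
```

The point of the map is the last two lines: a segment only knows CPP values, so the ids needed to fetch arguments and store results have to be looked up through it.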

Implementation Details

  • FusionState is a lightweight representation of a CPP Fusion.
  • When calling buildFusionIr, a CPP Fusion is created from the Python FusionDefinition. At this point, the FusionState creates a mapping from CPP Fusion to its State objects.
  • However, the FusionState is temporary and the CPP Fusion is cached in FusionCache. The information linking the CPP Fusion and Python FusionDefinition is stored in FusionCache.
  • When we create a new FusionState, we look for a cached CPP Fusion. If it exists, we restore the mapping from the data stored in FusionSchedules.

* Track inputs, outputs, and extents
@rdspring1 rdspring1 added the Python API Issues related to the Python API label Oct 27, 2024
@rdspring1
Collaborator Author

!build

Collaborator

@jjsjann123 jjsjann123 left a comment

sorry for the delayed review.

}
TensorView* tv = v->as<TensorView>();
std::vector<IterDomain*> logical_dom =
    TensorDomain::noReductions(tv->getLogicalDomain());
Collaborator

@wujingyue is trying to change how we bind IO buffers to kernels, i.e. we might rethink which domain we use here and how we use it.

Not proposing any change, just trying to raise awareness.

std::vector<Val*> extents = getExtents(fusion_);
for (Val* extent : extents) {
  int64_t num_extents = (int64_t)extents_fid_.size();
  int64_t extent_fid = -num_extents - 1;
Collaborator

is this a negative index?

Collaborator Author

All scalars, vectors, and tensors use positive indices. The extents do not exist in the FusionState, so I used the negative numbers exclusively for the extent scalars.

The extents are the sizes of the IterDomains in the CPP Fusion. We don't track those in the FusionDefinition, but they can become input arguments to a FusionDefinition after segmentation.
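
A minimal Python sketch of this id scheme, mirroring the `-num_extents - 1` arithmetic in the quoted diff. It assumes that each new id is appended to the extent list before the next iteration, which the snippet does not show; `assign_extent_fids` and the string extent names are hypothetical.

```python
def assign_extent_fids(extents, map_value_to_fid):
    """Give each extent the next negative id: -1, -2, -3, ..."""
    extents_fid = []
    for extent in extents:
        num_extents = len(extents_fid)
        extent_fid = -num_extents - 1
        extents_fid.append(extent_fid)  # assumed step, not visible in the diff
        # Overwrite any id the extent already had, as the diff's
        # map_value_to_fid_[extent] = extent_fid assignment does.
        map_value_to_fid[extent] = extent_fid
    return extents_fid


# "i1" already has a positive id, e.g. because it was defined as a scalar.
value_to_fid = {"T0": 0, "i1": 5}
fids = assign_extent_fids(["i0", "i1", "i2"], value_to_fid)
```

Because scalars, vectors, and tensors only use non-negative ids, the negative range cannot collide with them, and the position in the sequence preserves the order in which the extents were collected.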

Collaborator

So the negative number here is just an initialization? Does the number carry any meaning, or would a global -1 do just fine?
Sorry, I might have missed the part where extents_fid_ is being used.

Collaborator Author

There is an ordering component to the extent index.

It is used for the same purpose as collecting the extents in prepareRuntimeOrder.
https://github.com/NVIDIA/Fuser/blob/main/csrc/runtime/fusion_cache_utils.cpp#L199-L208

We're mapping the tensor sizes to the extents like so https://github.com/NVIDIA/Fuser/pull/3025/files#diff-e512bea3b02f75ab1e81b759562879c5867e6e863679d6e7696fa34087dc3dc9R98-R100.

Collaborator

Can we add a comment noting that negative indices are used to avoid conflicts with other indices?

Collaborator Author

Added comment.

Collaborator

Got'ya. It's hard to figure out the necessity without looking at the actual use. We can keep it as-is and revisit in follow up PRs.

// The extent can already exist in the fusion. However, since scalars cannot
// be passed between segments, always overwrite existing fids. The original
// fusion definition will provide scalar extents.
map_value_to_fid_[extent] = extent_fid;
Collaborator

I'm a bit lost here.

iiuc, the map_value_to_fid_ entries for other values map from the Val* to their index field in FusionState. Here it looks like we are trying to create the same thing for each TensorView's logical domain. Where are we creating the python container for that?

Collaborator Author

I'm not exposing the TensorView's logical domain to the python frontend, but I am tracking it in the FusionState. We may have to pass the scalar extents of the TensorView's logical domain as an input argument to a fusion segment.
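
As a rough illustration of how tracked extents could become segment arguments, here is a Python sketch. It rests on assumptions not confirmed in this thread: that extent values are read from the input tensor shapes in tensor order and then dimension order, and that every dimension gets its own extent. The helpers `collect_extent_values` and `gather_segment_args` are hypothetical names.

```python
def collect_extent_values(input_shapes):
    """Flatten input shapes into {extent_fid: size} using ids -1, -2, ..."""
    extent_values = {}
    for shape in input_shapes:
        for size in shape:
            extent_fid = -len(extent_values) - 1
            extent_values[extent_fid] = size
    return extent_values


def gather_segment_args(segment_input_fids, tensors, extent_values):
    """Non-negative ids pick tensors or scalars; negative ids pick extents."""
    return [
        tensors[fid] if fid >= 0 else extent_values[fid]
        for fid in segment_input_fids
    ]


# Two hypothetical inputs with shapes (2, 3) and (4,).
extent_values = collect_extent_values([(2, 3), (4,)])
tensors = {0: "T0", 1: "T1"}

# A segment that needs tensor 1 plus the second extent of tensor 0.
args = gather_segment_args([1, -2], tensors, extent_values)
```

Nothing here is exposed to the Python user; the extent ids only exist so that a segment can request a size as a scalar input.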

@Priya2698
Collaborator

Is it possible to add a test demonstrating what new information the FusionState stores and its link to the FusionCache?

@rdspring1 rdspring1 force-pushed the user_sched_segmentation_mapping branch from 2cbb9d4 to 8310ace Compare October 30, 2024 21:27
@Priya2698
Collaborator

Priya2698 commented Oct 30, 2024

LGTM overall.
Do you have a document describing the intended flow of information between FusionCache, FusionState, FusionSchedules, FusionDefinition that will be built for Issue #3025?

Collaborator

@jjsjann123 jjsjann123 left a comment

:shipit:


@rdspring1
Collaborator Author

!build

@rdspring1 rdspring1 merged commit 621e146 into main Oct 31, 2024
35 of 36 checks passed
@rdspring1 rdspring1 deleted the user_sched_segmentation_mapping branch October 31, 2024 15:50
@rdspring1
Collaborator Author

Summary

  • A FusionDefinition holds a series of RecordFunctors. When building the FusionDefinition, we traverse the Trie in the FusionCache. Upon reaching the EndRecord, create the CPP Fusion using the RecordFunctors. Also, generate the mappings from Scalar, Vector, and Tensor states to CPP Val.

  • Since the FusionDefinition is temporary, store any information in the FusionSchedules associated with this FusionDefinition. The FusionSchedules is associated with the EndRecord leaf in the FusionCache.

  • Now, say we have a new FusionDefinition python object with the same definition. When building the FusionDefinition, traverse the Trie in the FusionCache again. The CPP Fusion already exists, so we load the CPP Fusion to the FusionDefinition along with the stored information in FusionSchedules.
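
The three steps above can be sketched as a toy trie cache in Python. This is a simplified model, not the nvFuser implementation: `TrieNode`, `FusionCacheSketch`, `lookup_or_build`, the `"END"` marker, and the record strings are all hypothetical, and the stored dict merely stands in for FusionSchedules.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.schedules = None  # populated only on the EndRecord leaf


class FusionCacheSketch:
    def __init__(self):
        self.root = TrieNode()
        self.builds = 0  # counts how often a fusion was actually built

    def lookup_or_build(self, records, build):
        # Walk (and extend) the trie along the record sequence.
        node = self.root
        for record in list(records) + ["END"]:
            node = node.children.setdefault(record, TrieNode())
        # First visit: build the fusion and keep it, with its mappings,
        # on the leaf. Later visits reuse what is stored there.
        if node.schedules is None:
            self.builds += 1
            node.schedules = build(records)
        return node.schedules


def build(records):
    # Stand-in for creating the CPP Fusion plus the state-to-value mappings.
    return {"fusion": list(records), "map_value_to_fid": {"T0": 0, "T1": 1}}


cache = FusionCacheSketch()
definition = ["define_tensor", "relu", "add_output"]
first = cache.lookup_or_build(definition, build)
second = cache.lookup_or_build(definition, build)  # same definition again
```

The second lookup returns the object stored on the leaf, so the mappings survive even though the definition object that produced them is gone.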

Information Flow

  • FusionCache holds FusionSchedules in the leaves of the Trie.
  • FusionSchedules holds the CPP Fusion and other information.
  • FusionDefinition creates the CPP Fusion associated with FusionCache.
  • FusionDefinition contains the CPP Fusion and the mappings between the python frontend and the CPP Fusion.
  • FusionState is the parent class of FusionDefinition.

@Priya2698
