Allocation order refactor (#2168) · NVIDIA/Fuser@8c18701

Refactored the allocation order inference pass:
* Instead of a per-operation propagation rule, we now use the IdModel
mapping to map the allocation domain of the reference tensor to the
rfactor domain of the target tensor.
* Updated the inference API to accept explicit sources and destinations
for the propagation.
  ```
  void inferenceAllocationOrder(
      Fusion* fusion,
      const std::vector<TensorView*>& srcs,
      const std::vector<TensorView*>& dsts);
  ```

* The propagation tries to keep the memory format of `dsts` close to
that of `srcs`, both to simplify scheduling and to facilitate
vectorization. It works roughly as follows:
  * For each entry `dst`, among all its producers in `srcs`, we pick the
  one with the most loop iter domains in its allocation domain as the
  reference `ref`.
  * We try to map each iter domain in `dst`'s rfactor domain to `ref`'s
  allocation domain and push the mapped ones as the inner dimensions of
  `dst`'s new allocation domain, while pushing the unmapped iter domains
  as outer dimensions.
* I had to put in a workaround (WAR) in the mapping logic for now, since
the reduction scheduler struggles with permuted outputs. See issue #2202.
The WAR simply keeps each reduction iter domain at the same position in
the new allocation domain as it has in the rfactor domain. The WAR is
supposed to be removed once we fix the reduction scheduler. I kept both
code paths in the PR for easier future cleanup.
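
The two propagation steps above can be illustrated with a small standalone sketch. It does not use nvFuser types: `ToyId`, `ToyTensor`, `countLoopIds`, `pickReference`, and `inferAllocationGroups` are hypothetical names invented for this illustration. The sketch also makes several assumptions that the commit message does not confirm: an integer `group` stands in for an IdModel mapping (equal groups are treated as mapped), "loop iter domain" is read as an iter domain that is neither a reduction nor a broadcast, mapped iter domains are emitted in `ref`'s allocation order, and unmapped ones keep their rfactor order. The reduction WAR is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an iter domain. `group` plays the role of an IdModel
// mapping group: two ids with equal `group` are treated as mapped
// (an assumption of this sketch, not nvFuser's actual data structure).
struct ToyId {
  int group;
  bool is_reduction;
  bool is_broadcast;
};

// Toy stand-in for a tensor: logical (rfactor) order and memory
// (allocation) order, both listed outermost first.
struct ToyTensor {
  std::vector<ToyId> rfactor;
  std::vector<ToyId> allocation;
};

// Counts allocation-domain ids that are neither reduction nor broadcast.
std::size_t countLoopIds(const ToyTensor& tv) {
  return static_cast<std::size_t>(std::count_if(
      tv.allocation.begin(), tv.allocation.end(), [](const ToyId& id) {
        return !id.is_reduction && !id.is_broadcast;
      }));
}

// Step 1: pick the producer with the most loop ids as the reference.
// Ties go to the earliest producer; returns nullptr if there are none.
const ToyTensor* pickReference(
    const std::vector<const ToyTensor*>& producers) {
  const ToyTensor* ref = nullptr;
  std::size_t best = 0;
  for (const ToyTensor* p : producers) {
    const std::size_t n = countLoopIds(*p);
    if (ref == nullptr || n > best) {
      ref = p;
      best = n;
    }
  }
  return ref;
}

// Step 2: build dst's new allocation order (as a list of groups).
// Ids mapped to ref become the inner dimensions, in ref's allocation
// order; unmapped ids become the outer dimensions, in rfactor order.
std::vector<int> inferAllocationGroups(
    const ToyTensor& dst, const ToyTensor& ref) {
  std::vector<bool> used(dst.rfactor.size(), false);
  std::vector<int> inner;
  for (const ToyId& ref_id : ref.allocation) {
    for (std::size_t i = 0; i < dst.rfactor.size(); ++i) {
      if (!used[i] && dst.rfactor[i].group == ref_id.group) {
        used[i] = true;
        inner.push_back(dst.rfactor[i].group);
        break;
      }
    }
  }
  std::vector<int> result;
  for (std::size_t i = 0; i < dst.rfactor.size(); ++i) {
    if (!used[i]) {
      result.push_back(dst.rfactor[i].group);
    }
  }
  result.insert(result.end(), inner.begin(), inner.end());
  // The new allocation order is a permutation of the rfactor ids.
  assert(result.size() == dst.rfactor.size());
  return result;
}
```

For example, if `dst` has rfactor groups `{0, 1, 2, 3}` and `ref` has allocation groups `{0, 2, 1}`, this sketch yields `{3, 0, 2, 1}`: the unmapped group `3` becomes the outermost dimension and the mapped groups follow in `ref`'s memory order.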

---------

Co-authored-by: Naoya Maruyama <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>

3 people authored May 14, 2024

1 parent dfba77a commit 8c18701
