Commit
refactored allocation order inference pass:

* Instead of a per-operation propagation rule, we now use IdModel mapping to map the allocation domain of the reference tensor to the rfactor domain of the target tensor.
* Updated the inference API to allow specified sources and destinations for the propagation.
  ```
  void inferenceAllocationOrder(
      Fusion* fusion,
      const std::vector<TensorView*>& srcs,
      const std::vector<TensorView*>& dsts);
  ```
* The propagation tries to keep the memory format of `dsts` close to that of `srcs`, both to simplify scheduling and to facilitate vectorization. It works roughly as follows:
  * For each entry `dst`, among all its producers in `srcs`, we find the one with the most loop iter domains in its allocation domain and use it as the reference `ref`.
  * We try to map each iter domain in `dst`'s rfactor domain to `ref`'s allocation domain and push the mapped ones as the inner dimensions of `dst`'s new allocation domain, while pushing unmapped iter domains as outer dimensions.
* I have to put in a WAR for the mapping logic for now, since the reduction scheduler is struggling with permuted output. See issue #2202. The WAR simply keeps each reduction iter domain at the same position in the new allocation domain as it has in the rfactor domain. This WAR is supposed to be removed at a later point, once we have fixed the reduction scheduler. I kept both code paths in the PR for easier future cleanup.

---------

Co-authored-by: Naoya Maruyama <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>
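The reference selection and the inner/outer ordering described in the message above can be sketched in isolation. This is only a toy model, not the nvFuser code: `ToyTensor`, `pickReference`, and `inferAllocation` are hypothetical names, iter domains are plain strings, and string equality stands in for the IdModel mapping. The reduction WAR is not modeled, and the reference is chosen by counting every entry of the allocation domain, which is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TensorView. Iter domains are plain strings, and
// two iter domains "map" to each other when the strings are equal.
struct ToyTensor {
  std::vector<std::string> rfactor;     // logical dimension order
  std::vector<std::string> allocation;  // memory layout, outermost first
};

// Pick the producer with the most entries in its allocation domain.
// On a tie, std::max_element keeps the first such producer.
inline const ToyTensor& pickReference(const std::vector<ToyTensor>& producers) {
  assert(!producers.empty());
  return *std::max_element(
      producers.begin(), producers.end(),
      [](const ToyTensor& a, const ToyTensor& b) {
        return a.allocation.size() < b.allocation.size();
      });
}

// Build a new allocation domain for dst: unmapped iter domains go outermost
// (in rfactor order), mapped ones go innermost (in ref's allocation order).
inline std::vector<std::string> inferAllocation(
    const ToyTensor& dst, const ToyTensor& ref) {
  // Iter domains of ref's allocation domain that also appear in dst's rfactor.
  std::vector<std::string> mapped;
  for (const auto& id : ref.allocation) {
    if (std::find(dst.rfactor.begin(), dst.rfactor.end(), id) !=
        dst.rfactor.end()) {
      mapped.push_back(id);
    }
  }
  // Everything in dst's rfactor that did not map becomes an outer dimension.
  std::vector<std::string> result;
  for (const auto& id : dst.rfactor) {
    if (std::find(mapped.begin(), mapped.end(), id) == mapped.end()) {
      result.push_back(id);
    }
  }
  result.insert(result.end(), mapped.begin(), mapped.end());
  return result;
}
```

With a channels-last producer whose allocation domain is `N, H, W, C` and a `dst` whose rfactor domain is `N, C, H, W, K`, this sketch places the unmapped `K` outermost and keeps `N, H, W, C` as the inner dimensions, so `dst` ends up with the same inner layout as its reference.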