Merge OpenAI Triton commit `e9db186` #2793

whitneywhtsang · 2024-11-21T20:50:07Z

This PR changes the Triton base from 251ec88 to e9db186 (Nov 21).
Pass rate: 93.23%->93.24%

Please do not squash and merge this PR.


Follow up to #5202. It's currently failing with the error

```
du: /Users/runner/.triton/**: No such file or directory
Error: Process completed with exit code 1.
```

which happens because even though the `.triton` directory exists, it is empty. This instead uses du on `.triton` with a depth of 1.
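A minimal sketch of why the glob form fails while the depth-1 form does not, written in Python rather than the CI's shell. It assumes the shell handed the unmatched `**` pattern to `du` literally because the directory had no entries to match. `sizes_at_depth_one` is a hypothetical helper that mimics a depth-1 `du`; it is not part of the Triton CI.

```python
import os
import tempfile
from glob import glob


def sizes_at_depth_one(root):
    """Return {name: bytes} for each direct child of root, plus "." for the total.

    Hypothetical stand-in for running du on a directory with a depth of 1.
    It walks the directory itself, so an empty directory is not an error.
    """
    sizes = {}
    total = 0
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            # Sum every file below this child directory.
            size = sum(
                os.lstat(os.path.join(dirpath, name)).st_size
                for dirpath, _dirs, files in os.walk(entry.path)
                for name in files
            )
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes[entry.name] = size
        total += size
    sizes["."] = total
    return sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # An empty directory gives the pattern nothing to expand to.
        print(glob(os.path.join(root, "**")))
        # Measuring the directory itself still works.
        print(sizes_at_depth_one(root))
```

Pointing the tool at the directory instead of at a pattern inside it removes the dependency on the directory having contents.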

#### Commits in this PR

1. [CI] Fix cache not saving

   Re-using the output of the cache restore step was recommended by the `actions/cache` docs, but it doesn't work here because we actually start from a clean cache when we run save, so there is no output available to read. The annoyances of testing in the PR but main being a different environment.

2. Bump macOS timeout

We also exercise this in scale_dot, where we enable support for warps of arbitrary shape (before we just allowed `[num_warps, 1]`).

With this infra in place, it should be rather easy to move from the legacy layouts to using LLs to represent all of our layouts.

Something I'm concerned about is the amount of recomputation that happens when calling methods like `getSizePerThread`, where we keep recomputing the result. There might be an optimisation opportunity here where we cache the result of all these functions.

We choose the IR representation of an LL via its canonical form + a `repOrder` for several reasons:

- It's generally more compact.
- It's easier to CSE, so it's easier to see when two layouts are in fact the same.
- A technical reason: the `toLinearLayout` function returns a tensor with dimensions `dim0, ..., dim<rank-1>`; in other words, it "forgets" the repetition order. Without the repetition order, we cannot recover the tile size of the argument. In particular, we cannot recover `getSizePerThread`. There is an argument to be made about whether `getSizePerThread` is useful on its own, or whether `getElemsPerThread` is the real useful abstraction here, but for now, we keep both for BC.
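The caching idea raised above for queries such as `getSizePerThread` can be sketched as plain memoization. This is only an illustration in Python, not Triton's C++ code: `ToyLayout` and `cta_tile_shape` are hypothetical names, and the derived quantity is a made-up stand-in for whatever a real layout query computes.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class ToyLayout:
    """Hypothetical immutable layout description (not a Triton class).

    A frozen dataclass is hashable and compares by value, so two equal
    layouts share one cache entry.
    """
    size_per_thread: Tuple[int, ...]
    threads_per_warp: Tuple[int, ...]
    warps_per_cta: Tuple[int, ...]


@lru_cache(maxsize=None)
def cta_tile_shape(layout: ToyLayout) -> Tuple[int, ...]:
    """Derived query, computed once per distinct layout and then cached."""
    return tuple(
        s * t * w
        for s, t, w in zip(
            layout.size_per_thread,
            layout.threads_per_warp,
            layout.warps_per_cta,
        )
    )


if __name__ == "__main__":
    a = ToyLayout((1, 4), (8, 4), (4, 1))
    b = ToyLayout((1, 4), (8, 4), (4, 1))  # equal to a, separate object
    print(cta_tile_shape(a), cta_tile_shape(b))
    print(cta_tile_shape.cache_info())
```

Keying the cache on a value-comparable, immutable object is what makes this safe; the same property is what makes a canonical form easy to CSE.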


…ra-kernel perf tooling (#5119)

This PR introduces the `Proton Dialect` to enable intra-kernel profiling and tooling for Triton. As a third-party dialect, it serves as the building block for third-party perf tools (e.g., profilers, analysis, modeling) for Triton compiler developers in a compiler-centric way, such as an intra-kernel latency profiler to understand software pipelining, warp specialization, and CTA fine-grained orchestration (e.g., CUDA core, tensor core, TMA). Future developments would integrate this dialect with the existing Proton backend profiling infrastructure to make it a powerful and general perf tool utility.

As a first step, this PR adds some basic boilerplate code and mechanics, and the `proton.record` op for the `Proton Dialect`.

---------

Co-authored-by: Yuanwei Fang <[email protected]>
Co-authored-by: Keren Zhou <[email protected]>


peterbell10 and others added 9 commits November 20, 2024 23:16

[CI] Fix ccache cache restoration to improve build times (#5202)

6b6f4a2

This improves a warm-cache macOS build from ~25 mins to 2 mins.

[BACKEND][LAYOUT] Use LL for AMDMfma related layout conversions (#5210)

d5ba6ac

[BUILD] Add option to limit number of parallel link jobs (#5212)

cef2671

[CI] Run tests when CI is manually triggered (#5216)

ad28e6c

Currently you can manually call a workflow dispatch, but it won't actually run the tests because the variable enable_integration isn't set.

Merge commit 'd5ba6acb33bd5b382e946b1ddf0b3c45b73554ff'

cb53aed

whitneywhtsang requested a review from pbchekin November 21, 2024 20:50

whitneywhtsang self-assigned this Nov 21, 2024

Merge commit '66012fcb0e796511762c2de062b6a86bcddf8aac'

1b88a41

pbchekin approved these changes Nov 21, 2024


whitneywhtsang added 3 commits November 21, 2024 21:42

Merge commit 'de1f346aa6737fa2e3e6a8a64dae118fcfab9995'

741a71f

Revert "[LAYOUTS] Implement IR support for LinearLayouts (#5170)"

7b5daa4

This reverts commit de1f346.

Merge commit 'e9db1862b80633eaa4f8a61366fec16248eb2cb5'

229f7a1

whitneywhtsang marked this pull request as ready for review November 21, 2024 22:03

whitneywhtsang mentioned this pull request Nov 21, 2024

Reland upstream commit de1f346 #2794

Closed

whitneywhtsang merged commit 229f7a1 into main Nov 21, 2024
5 checks passed

whitneywhtsang deleted the whitneywhtsang/merge branch November 21, 2024 22:49

whitneywhtsang changed the title ~~Merge OpenAI Triton commit d5ba6ac~~ Merge OpenAI Triton commit e9db186 Nov 22, 2024

whitneywhtsang mentioned this pull request Nov 29, 2024

Merge OpenAI Triton till Nov 29th #2682

Closed
