
Merge OpenAI Triton commit e9db186 #2793

Merged
merged 13 commits into main from whitneywhtsang/merge
Nov 21, 2024

Conversation

whitneywhtsang
Contributor

@whitneywhtsang commented Nov 21, 2024

This PR changes the Triton base from 251ec88 to e9db186 (Nov 21).
Pass rate: 93.23%->93.24%

Please do not squash and merge this PR.

peterbell10 and others added 9 commits November 20, 2024 23:16
This improves a warm-cache macOS build from ~25 mins to 2 mins.
Follow up to #5202

It's currently failing with the error
```
du: /Users/runner/.triton/**: No such file or directory
Error: Process completed with exit code 1.
```
which happens because even though the `.triton` directory exists, it is
empty. This change instead runs `du` on `.triton` with a depth of 1.
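
To make the failure mode concrete, here is a rough Python analogy of the
situation (purely illustrative; the actual workflow step is a shell `du`
invocation, and the paths below are made up): a `**` pattern has nothing
to match inside an empty directory, whereas asking about the directory
itself at depth 1 always succeeds.

```python
from pathlib import Path
import tempfile

# Simulate a cache directory that exists but is empty.
cache = Path(tempfile.mkdtemp()) / ".triton"
cache.mkdir()

# A "**" pattern expands to nothing here, so a command built from it
# would be handed a non-existent path -- the source of the CI error.
print(list(cache.glob("**/*")))  # -> []

# Reporting on the directory itself (the depth-1 approach) is safe even
# when the cache is empty.
print(cache.exists(), sum(f.stat().st_size for f in cache.rglob("*")))  # -> True 0
```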
#### Commits in this PR
1. [CI] Fix cache not saving

    Re-using the output of the cache restore step was recommended by the
    `actions/cache` docs, but it doesn't work here because we actually
    start from a clean cache when we run save, so there is no output
    available to read.

    This is one of the annoyances of testing in the PR while main is a
    different environment.
2. Bump macOS timeout
We also exercise this in scale_dot, where we enable support for warps of
arbitrary shape (before we just allowed `[num_warps, 1]`).

With this infra in place, it should be rather easy to move from the
legacy layouts to using LLs to represent all of our layouts.

Something I'm concerned about is the amount of recomputation that
happens when calling methods like `getSizePerThread`, where we
recompute the result on every call. There might be an optimisation
opportunity here in caching the results of these functions.
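
As a sketch of that caching idea only (the real layout code lives in
Triton's C++ and is not shown here; the function name and key below are
hypothetical), memoizing a pure derivation keyed on the layout would
avoid the repeated recomputation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def size_per_thread(layout_key: tuple) -> tuple:
    """Hypothetical stand-in for a derived layout property.

    The derivation is pure and deterministic, so it can be computed
    once per distinct layout and reused instead of being recomputed
    on every call.
    """
    # ... stand-in for the expensive derivation from the layout ...
    return tuple(x * 2 for x in layout_key)

# Repeated queries for the same layout hit the cache.
print(size_per_thread((1, 8)))
print(size_per_thread((1, 8)))
print(size_per_thread.cache_info())  # hits=1, misses=1
```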

We choose the IR representation of an LL via its canonical form + a
`repOrder` for several reasons:
- It's generally more compact
- It's easier to CSE, so it's easier to see when two layouts are in fact
  the same.
- A technical reason: the `toLinearLayout` function returns a tensor
  with dimensions `dim0, ..., dim<rank-1>`, in other words, it "forgets"
  the repetition order. Without the repetition order, we cannot recover
  the tile size of the argument. In particular, we cannot recover
  `getSizePerThread`. There is an argument to be made about whether
  `getSizePerThread` is useful on its own, or whether
  `getElemsPerThread` is the real useful abstraction here, but for now,
  we keep both for backwards compatibility.
Currently you can manually call a workflow dispatch, but it won't
actually run the tests because the variable enable_integration isn't
set.
…ra-kernel perf tooling (#5119)

This PR introduces the `Proton Dialect` to enable intra-kernel
profiling and tooling for Triton. As a third-party dialect, it provides
the building blocks for creating third-party perf tools (e.g.,
profilers, analysis, modeling) for Triton compiler developers in a
compiler-centric way, such as an intra-kernel latency profiler for
understanding software pipelining, warp specialization, and
fine-grained CTA orchestration (e.g., cuda core, tensor core, TMA).
Future development will integrate this dialect with the existing Proton
backend profiling infrastructure to make it a powerful and general perf
tool utility. As a first step, this PR adds some basic boilerplate code
and mechanics, and the `proton.record` op for the `Proton Dialect`.

---------

Co-authored-by: Yuanwei Fang <[email protected]>
Co-authored-by: Keren Zhou <[email protected]>
@whitneywhtsang self-assigned this Nov 21, 2024
@whitneywhtsang marked this pull request as ready for review November 21, 2024 22:03
@whitneywhtsang merged commit 229f7a1 into main Nov 21, 2024
5 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch November 21, 2024 22:49
@whitneywhtsang changed the title from Merge OpenAI Triton commit d5ba6ac to Merge OpenAI Triton commit e9db186 on Nov 22, 2024