Commit
Very naive and stupid CGA support (#3557)
This PR adds very naive CGA support. It is by no means how we should design CGA, and it is not even an incremental step toward that design. But the PR is simple enough, and it gives us an additional parameter to tune.

Perf on H100:

```
 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     33.4           134047          1  134047.0  134047.0    134047    134047          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
     22.9            92031          1   92031.0   92031.0     92031     92031          0.0  nvjet_hsh_128x256_64x4_2x1_v_bz_coopA_NTN
```

nvFuser/cuBLAS: 68.7% (92031 ns / 134047 ns)
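For reference, here is a minimal sketch of how the 68.7% figure can be reproduced from the two kernel times in the profile above. It assumes that the `nvjet_hsh_...` kernel is the cuBLAS baseline and that the ratio is cuBLAS time divided by nvFuser time. The helper name `relative_perf` is hypothetical and not part of nvFuser.

```python
# Kernel times in nanoseconds, copied from the profile above.
nvfuser_ns = 134047  # nvfuser_none_f0_c0_r0_g0
cublas_ns = 92031    # nvjet_hsh_128x256_64x4_2x1_v_bz_coopA_NTN (assumed to be the cuBLAS kernel)


def relative_perf(baseline_ns: float, candidate_ns: float) -> float:
    """Return the candidate's speed as a fraction of the baseline's.

    Hypothetical helper: a shorter candidate time gives a larger value,
    and 1.0 means the two kernels take the same time.
    """
    return baseline_ns / candidate_ns


ratio = relative_perf(cublas_ns, nvfuser_ns)
print(f"nvFuser/cuBLAS: {ratio:.1%}")
```

A value below 100% means the nvFuser kernel is slower than the baseline. Both kernels were measured with a single instance, so the number carries no information about run-to-run variance.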