Add reduction_unroll_factor to autotuning script #3487

rdspring1 · 2024-11-27T01:48:52Z

This PR renames unroll_factor to iteration_unroll_factor and adds reduction_unroll_factor. reduction_unroll_factor applies an unroll factor on top of the vectorization factor for the inner reduction domain.
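As a rough illustration of how the two factors stack, the sketch below counts the reduction elements covered by one thread and by one CTA in a single pass. The `InnerReductionConfig` dataclass, its default values, and its helper methods are hypothetical and are not the script's actual classes; only the field names come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerReductionConfig:
    # Hypothetical container; field names follow the naming used in this PR.
    bdimx: int = 128
    vectorize_factor: int = 4
    reduction_unroll_factor: int = 1
    iteration_unroll_factor: int = 1

    def elements_per_thread(self) -> int:
        # The reduction unroll factor multiplies the vectorized width.
        return self.vectorize_factor * self.reduction_unroll_factor

    def elements_per_cta(self) -> int:
        # bdimx threads each cover elements_per_thread() reduction elements.
        return self.bdimx * self.elements_per_thread()
```

With `bdimx=128`, `vectorize_factor=4`, and `reduction_unroll_factor=2`, each thread covers 8 elements and the CTA covers 1024.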

Support Gelu-Bias, Silu-Mul, Bcast-Add, Mul Fusions

liqiangxl · 2024-12-02T14:14:21Z

doc/dev/python_scheduling/autotune_inner_reduction.py

            )

            # number of reduction elements not handled by a CTA
            remaining_reduction = ceil_div(
                num_reductions,
-                (scheduler_config.bdimx * scheduler_config.vectorize_factor),
+                (scheduler_config.bdimx * vectorize_factor * reduction_unroll_factor),


Should be ceil_div(ceil_div(num_reductions/vectorize_factor, bdimx), reduction_unroll_factor)
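A minimal sketch of the nested form, assuming `num_reductions/vectorize_factor` is also meant as a ceiling division and that all factors are positive integers. The function names other than `ceil_div` are hypothetical. For positive integers the nested and the single-product forms return the same value, so the nested version mainly makes the split order (vectorize, then threads, then unroll) explicit.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def remaining_reduction_nested(
    num_reductions: int,
    vectorize_factor: int,
    bdimx: int,
    reduction_unroll_factor: int,
) -> int:
    # Split in scheduling order: vectorize, then bdimx threads, then unroll.
    vectorized = ceil_div(num_reductions, vectorize_factor)
    per_thread = ceil_div(vectorized, bdimx)
    return ceil_div(per_thread, reduction_unroll_factor)


def remaining_reduction_product(
    num_reductions: int,
    vectorize_factor: int,
    bdimx: int,
    reduction_unroll_factor: int,
) -> int:
    # Single ceiling division by the product, as in the diff above.
    return ceil_div(num_reductions, bdimx * vectorize_factor * reduction_unroll_factor)
```

For example, with 10000 reduction elements, `vectorize_factor=4`, `bdimx=128`, and `reduction_unroll_factor=2`, both functions return 10.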

liqiangxl · 2024-12-02T14:15:24Z

doc/dev/python_scheduling/autotune_inner_reduction.py

            )

-            if unroll_factor == 1 and remaining_reduction > 1:
+            if iteration_unroll_factor == 1 and remaining_reduction > 1:


This looks strange to me. Why is grdim = remaining_reduction? We can do a serial reduction instead of a grid reduction.

nvFuser's default heuristic does:

    // When iteration dim is small, may have unused SMs, to increase SM usage
    // needs to shift from block reduction to grid reduction.
    int64_t grdim = 1;
    while (godim * grdim * 2 <= sm_count && getInnerRemainder() / grdim >= 2) {
      grdim *= 2;
    }
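For reference, the doubling loop quoted above could be ported to the Python autotuning script roughly as follows. This is only a sketch: `pick_grdim` is a hypothetical name, and `inner_remainder` is passed in as a fixed value, whereas `getInnerRemainder()` in nvFuser may depend on other heuristic state.

```python
def pick_grdim(godim: int, sm_count: int, inner_remainder: int) -> int:
    """Double grdim while SMs are idle and each block keeps at least 2 elements."""
    grdim = 1
    # Stop once the grid would exceed the SM count, or once splitting further
    # would leave fewer than 2 remaining reduction elements per block.
    while godim * grdim * 2 <= sm_count and inner_remainder // grdim >= 2:
        grdim *= 2
    return grdim
```

When `godim` already exceeds `sm_count`, the loop never runs and the schedule stays a block reduction (`grdim == 1`).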

From inner2dReductionHeuristic, I see this:

    // Cross grid reduction if we haven't hit our target blocks, and we have many
    // reduction elements.
    if ((godim < target_blocks && remainder_in_reduction >= 0) ||
        (remainder_in_reduction >= kEight)) {
      grdim = remainder_in_reduction;
    }
    // Try to do some cleanup of ragged waves on device
    { do_something }
    // Grid reductions do not support unrolling iteration dimension, revert if
    // set. Recalculate godim.
    { do_something }

Another approach is to add a search parameter, is_block_reduction: if it is true, we use only block reduction; if it is false, we do grid reduction.
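A sketch of what that could look like in the search space, under the assumption that grid reduction keeps `grdim = remaining_reduction` as in the current diff. `choose_grdim` and `search_space` are hypothetical helpers, not functions from the script.

```python
import itertools


def choose_grdim(is_block_reduction: bool, remaining_reduction: int) -> int:
    # Block reduction: one CTA owns the whole reduction, so leftover
    # elements are reduced serially by each thread (grdim == 1).
    if is_block_reduction:
        return 1
    # Grid reduction: spread the leftover elements across CTAs.
    return max(1, remaining_reduction)


def search_space(reduction_unroll_factors, is_block_reduction_options=(True, False)):
    # Every (reduction_unroll_factor, is_block_reduction) pair to benchmark.
    return list(itertools.product(reduction_unroll_factors, is_block_reduction_options))
```

Making the choice an explicit parameter doubles the number of configurations to benchmark, but it lets the autotuner compare serial and grid reduction directly for the same problem size.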

liqiangxl

LGTM.

rdspring1 · 2024-12-13T18:11:52Z

!build

rdspring1 added the Autotune label (Generate heuristics through machine learning models.) Nov 27, 2024

rdspring1 requested a review from liqiangxl November 27, 2024 01:48

rdspring1 mentioned this pull request Nov 27, 2024

Support 2D inner reduction scheduler with autotuning #3456

Merged

rdspring1 added 5 commits December 1, 2024 09:14

Create Autotuning utilities

ab34c7a

Support 2D break_point configurations

45b74f6

Support Gelu-Bias, Silu-Mul, Bcast-Add, Mul Fusions

Create 2d inner reduction autotuning script

6fb6832

refactor

a265617

comments

e7ffb29

rdspring1 force-pushed the autotune_inner_reduction_2d branch from 7817368 to e7ffb29 Compare December 1, 2024 17:30

add reduction_unroll_factor

fdcf6a5

rdspring1 force-pushed the autotune_inner_reduction_2d_update branch from 15bc05e to fdcf6a5 Compare December 1, 2024 17:31

liqiangxl reviewed Dec 2, 2024


Base automatically changed from autotune_inner_reduction_2d to main December 11, 2024 19:43

rdspring1 added 2 commits December 11, 2024 11:45

Merge branch 'main' into autotune_inner_reduction_2d_update

d58c362

update remaining_reduction calculation

fc8e57d

liqiangxl approved these changes Dec 13, 2024


update grdim

5ae857d

rdspring1 merged commit dc96e06 into main Dec 13, 2024
17 checks passed

rdspring1 deleted the autotune_inner_reduction_2d_update branch December 13, 2024 20:13
