add knobs control inner dim unroll and outer dim unroll in pointwise scheduler redo pr-3275 to check code changes #3325

Closed
wants to merge 26 commits into from

liqiangxl
Collaborator

redo #3275 to check code changes.

@liqiangxl changed the title from "Llu/ps unroll inner outer" to "add knobs control inner dim unroll and outer dim unroll in pointwise scheduler redo pr-3275 to check code changes" on Nov 1, 2024
@liqiangxl
Collaborator Author

!test --diff-bench --diff

@liqiangxl
Collaborator Author

(1) The code diffs in nvfuser-ci/jit_codegen_diff_bench_20_10/10 and nvfuser-ci/jit_codegen_diff_bench_20_9/10 occur because we now skip the split when the unroll factor is 1. Removing one loop leads to a different compute-at position and a different expression order. This change is not a blocker, since it only happens when the input can't be vectorized. I didn't check the performance, since this case is rare and a follow-up PR will revise the heuristics.
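To illustrate why skipping the split changes the generated code, here is a toy model of a loop nest as a list of (name, extent) pairs. This is only a sketch and not nvFuser's API: `split`, `schedule`, and the `skip_trivial_split` flag are hypothetical names made up for this example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def split(loops, axis, factor):
    """Split loops[axis] into an outer loop of ceil(extent / factor)
    iterations and an inner loop of `factor` iterations."""
    name, extent = loops[axis]
    outer = (name + ".outer", ceil_div(extent, factor))
    inner = (name + ".inner", factor)
    return loops[:axis] + [outer, inner] + loops[axis + 1:]


def schedule(extent, unroll_factor, skip_trivial_split):
    """Build a 1-D loop nest and apply the unroll split.

    skip_trivial_split is a hypothetical switch: True models skipping
    the split when the unroll factor is 1, False models always splitting.
    """
    loops = [("i", extent)]
    if unroll_factor > 1 or not skip_trivial_split:
        loops = split(loops, 0, unroll_factor)
    return loops


# An extent that cannot be vectorized, so the unroll factor falls back to 1.
old = schedule(1023, 1, skip_trivial_split=False)
new = schedule(1023, 1, skip_trivial_split=True)
print(old)  # [('i.outer', 1023), ('i.inner', 1)]
print(new)  # [('i', 1023)]
```

With a factor of 1, the always-split variant carries an extra loop of extent 1. Dropping that loop leaves one fewer loop level, which is consistent with the compute-at position and expression order shifting in the diffs.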

@liqiangxl
Collaborator Author

(2) The code diffs in nvfuser-ci/jit_codegen_diff_20_5/7, nvfuser-ci/jit_codegen_diff_20_6/7, and nvfuser-ci/jit_codegen_diff_20_7/7 are in RNGTest, again because the input size can't be vectorized and the split for unroll = 1 is skipped.

(3) The code diffs in ci/jit_codegen_diff_20_7/7 also include DynamicTransformIssue418_CUDA, where the output order is switched. I can't reproduce this in a local run.
