Allow silently ignore missing root-to-logical ops in BestEffortReplay #2901

Merged
naoyam merged 1 commit into main from relaxed_best_effort_replay on Sep 4, 2024

naoyam
Collaborator

@naoyam naoyam commented Sep 4, 2024

Extracted from and required for #2875

@naoyam
Collaborator Author

naoyam commented Sep 4, 2024

!build

@naoyam naoyam requested a review from zasdfgbnm September 4, 2024 19:08
@naoyam naoyam merged commit 7be78f8 into main Sep 4, 2024
18 of 19 checks passed
@naoyam naoyam deleted the relaxed_best_effort_replay branch September 4, 2024 20:26
naoyam added a commit that referenced this pull request Sep 5, 2024
…rallelization (#2875)

See #2850
Stacked on #2901 

The old code is still used by default. With `NVFUSER_ENABLE=id_model`,
the new analysis is used. It's also used for tensors with
non-conventional domains.

This is required for #2851. It also enables previously disabled
parallelization of the mismatching reshape test from #2684.

I validated the change by comparing the results between the existing and
new analyses with all the tests and benchmarks. The only mismatch was
with the mismatching reshape test, for which the existing analysis
declared a sync is required, whereas the new one correctly recognizes
there's no cross-thread dependency.