fix distributed tests with pt main #1452

Merged: 1 commit into main from tom/fix-distributed-tests-pt26, Nov 19, 2024

@t-vi (Collaborator) commented Nov 19, 2024

PyTorch modified the test class to destroy the process group by default, but we want to do this ourselves.

@t-vi t-vi enabled auto-merge (squash) November 19, 2024 09:45
@t-vi t-vi merged commit 8d1637f into main Nov 19, 2024
41 checks passed
@t-vi t-vi deleted the tom/fix-distributed-tests-pt26 branch November 19, 2024 10:14
Comment on lines +133 to +136
    if "destroy_process_group" in inspect.signature(self.run_test).parameters:
        run_test_kwargs = {"destroy_process_group": False}
    else:
        run_test_kwargs = {}
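
For readers following the diff, the check above is a feature-detection pattern: it asks `inspect.signature` whether `run_test` declares a `destroy_process_group` parameter and only passes the keyword when it does. A minimal sketch of the idea follows. The two classes are hypothetical stand-ins with simplified signatures, not the real PyTorch test classes, and `build_run_test_kwargs` is a helper name invented for illustration.

```python
import inspect


class OldStyleCase:
    # Hypothetical stand-in: run_test without the destroy_process_group keyword.
    def run_test(self, test_name, parent_pipe):
        return "old"


class NewStyleCase:
    # Hypothetical stand-in: run_test that accepts destroy_process_group.
    def run_test(self, test_name, parent_pipe, destroy_process_group=True):
        return destroy_process_group


def build_run_test_kwargs(case):
    # inspect.signature on a bound method omits `self`; .parameters maps names to Parameter objects.
    if "destroy_process_group" in inspect.signature(case.run_test).parameters:
        return {"destroy_process_group": False}
    return {}


old_kwargs = build_run_test_kwargs(OldStyleCase())
new_kwargs = build_run_test_kwargs(NewStyleCase())

# Both calls succeed because the keyword is passed only where it is accepted.
old_result = OldStyleCase().run_test("test_name", None, **old_kwargs)
new_result = NewStyleCase().run_test("test_name", None, **new_kwargs)
```

The same call site then works against both versions of the base class, at the cost of depending on a parameter name that can change upstream.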
Collaborator

Can you explain how this works?

I see failures of the TP tests with this PR. Another way to resolve the problem would be to redefine destroy_pg_upon_exit to return False instead of True.
https://github.com/pytorch/pytorch/blob/c418a9ac7501184d4a8cb4443f49e727d43f6be0/torch/testing/_internal/common_distributed.py#L580-L587
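
A rough sketch of that alternative, assuming `destroy_pg_upon_exit` is exposed as a property on the base test class. `BaseDistributedCase` and its simplified `run_test` are hypothetical stand-ins so the example runs without PyTorch; the real class is the one in the linked `common_distributed.py`.

```python
class BaseDistributedCase:
    # Hypothetical stand-in for the PyTorch base test class.
    @property
    def destroy_pg_upon_exit(self) -> bool:
        # Default: the base class destroys the process group after the test.
        return True

    def run_test(self):
        # Simplified: only reports who is responsible for the teardown.
        if self.destroy_pg_upon_exit:
            return "destroyed by base class"
        return "left for the test"


class ManualTeardownCase(BaseDistributedCase):
    @property
    def destroy_pg_upon_exit(self) -> bool:
        # The test suite destroys the process group itself.
        return False


base_outcome = BaseDistributedCase().run_test()
manual_outcome = ManualTeardownCase().run_test()
```

Overriding the property keeps the decision in the subclass and avoids inspecting the `run_test` signature at call time.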

Collaborator Author

Yeah, well, last week things worked differently.

IvanYashchuk added a commit that referenced this pull request Nov 26, 2024
3 participants