Attempt to extend nccl collective timeout (#858)
Summary:
Pull Request resolved: #858

We have two remaining tests that are still failing with the following error message:

```
[Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=BROADCAST, NumelIn=2, NumelOut=2, Timeout(ms)=60000) ran for 60033 milliseconds before timing out.
```

Let's attempt to increase the collective timeout for those tests. There's no guarantee this will work, but it's worth trying. Otherwise, we may consider deleting the failing tests to avoid flakiness.

Reviewed By: galrotem

Differential Revision: D59342738

fbshipit-source-id: 220f1f359eb0f98e5175e93badc7e998ae00db64
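The diff itself is not shown here, so the following is only a sketch of one common way to raise a collective timeout in PyTorch, not the change this commit actually made. It assumes the tests create their process group through `torch.distributed.init_process_group`, which accepts a `timeout` argument as a `datetime.timedelta`. The helper names `extended_timeout` and `init_nccl_with_timeout`, and the scaling factor of 5, are hypothetical; only the 60000 ms starting value comes from the log above.

```python
from datetime import timedelta

# The watchdog log in the commit message reports Timeout(ms)=60000.
FAILING_TIMEOUT = timedelta(milliseconds=60000)


def extended_timeout(base: timedelta = FAILING_TIMEOUT, factor: float = 5.0) -> timedelta:
    """Return a longer collective timeout (hypothetical helper, arbitrary factor)."""
    if factor <= 1.0:
        raise ValueError("factor must be > 1 to extend the timeout")
    return base * factor


def init_nccl_with_timeout(rank: int, world_size: int, timeout: timedelta) -> None:
    """Initialise the default process group with a custom collective timeout."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch.distributed as dist

    # Assumes MASTER_ADDR and MASTER_PORT are already set in the environment.
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        timeout=timeout,
    )
```

A test process would call something like `init_nccl_with_timeout(rank, world_size, extended_timeout())` before issuing its first collective. As the commit message notes, a longer timeout only helps if the broadcast is slow rather than genuinely hung.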
1 parent 5dad8d3, commit 58b6ea7
Showing 3 changed files with 17 additions and 5 deletions.