refine collective API #8

zhangxiaoli73 · 2024-10-10T06:57:14Z

Fixes #ISSUE_NUMBER

Chao1Han · 2024-10-10T08:25:53Z

torch/csrc/distributed/c10d/ProcessGroupXCCL.cpp

@@ -559,10 +550,9 @@ c10::intrusive_ptr<Work> ProcessGroupXCCL::collective(
  for (const auto& input : inputs) {
    c10::xpu::XPUCachingAllocator::recordStream(
        input.storage().data_ptr(), stream);
+    fn(inputs[i], outputs[i], attr, *comm, stream)


  for (const auto i : c10::irange(inputs.size())) {
    c10::xpu::XPUCachingAllocator::recordStream(
        inputs[i].storage().data_ptr(), stream);
    fn(inputs[i], outputs[i], attr, *comm, stream);
  }

I don't think we need `attr` either. Let me change it here.
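For reference, a minimal standalone sketch of the index-based loop discussed above, with `attr` already dropped from the callback. It makes no claim to be the actual `ProcessGroupXCCL` code: `Tensor`, `Stream`, `Comm`, `record_stream`, and `sum_after_doubling` are hypothetical stand-ins invented for illustration, so the snippet compiles without PyTorch. The point it shows is that iterating by index keeps `inputs[i]` and `outputs[i]` paired, which the range-based `for (const auto& input : inputs)` form cannot do because it has no `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real tensor, stream, and communicator types.
struct Tensor { int value; };
struct Stream { std::vector<const Tensor*> recorded; };
struct Comm {};

// Stand-in for the allocator's recordStream call: remembers which
// inputs were marked as in use on this stream.
void record_stream(const Tensor& t, Stream& stream) {
  stream.recorded.push_back(&t);
}

// Index-based loop: each input is recorded on the stream, then the
// per-tensor callback runs on the matching input/output pair.
template <typename Fn>
void collective(const std::vector<Tensor>& inputs,
                std::vector<Tensor>& outputs,
                Fn fn, Comm& comm, Stream& stream) {
  assert(inputs.size() == outputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    record_stream(inputs[i], stream);
    fn(inputs[i], outputs[i], comm, stream);
  }
}

// Usage example: a toy "collective" that doubles each input into the
// matching output, then returns the sum of the outputs.
int sum_after_doubling(const std::vector<int>& values) {
  std::vector<Tensor> inputs;
  for (int v : values) inputs.push_back(Tensor{v});
  std::vector<Tensor> outputs(inputs.size());
  Comm comm;
  Stream stream;
  collective(
      inputs, outputs,
      [](const Tensor& in, Tensor& out, Comm&, Stream&) {
        out.value = 2 * in.value;
      },
      comm, stream);
  assert(stream.recorded.size() == inputs.size());
  int sum = 0;
  for (const Tensor& t : outputs) sum += t.value;
  return sum;
}
```

In the real code the loop would presumably keep using `c10::irange` as in the suggestion; a plain `std::size_t` counter is used here only to avoid the PyTorch dependency.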

See pytorch#140725 (comment)

Running `torch.mps.synchronize()` after metal kernel resulted in infinite wait inside `[_MTLCommandBuffer waitUntilCompleted]`

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001aa919084 Metal`pthread_cond_wait + 12
    frame #1: 0x00000001aa78b1b4 Metal`-[_MTLCommandBuffer waitUntilCompleted] + 84
    frame #2: 0x00000001032bf358 libtorch_python.dylib`torch::mps::MPSModule_deviceSynchronize(_object*, _object*) + 40
    frame #3: 0x0000000100e94c20 Python`cfunction_vectorcall_NOARGS + 100
    frame #4: 0x0000000100e389b8 Python`PyObject_Vectorcall + 92
    frame #5: 0x0000000100f61e38 Python`_PyEval_EvalFrameDefault + 19040
    frame #6: 0x0000000100f5d180 Python`PyEval_EvalCode + 200
    frame #7: 0x0000000100fcd1a4 Python`run_eval_code_obj + 104
    frame #8: 0x0000000100fccbe4 Python`run_mod + 168
    frame #9: 0x0000000100fcb518 Python`pyrun_file + 164
    frame #10: 0x0000000100fca854 Python`_PyRun_SimpleFileObject + 256
    frame #11: 0x0000000100fca4e8 Python`_PyRun_AnyFileObject + 80
    frame #12: 0x0000000100ff2028 Python`pymain_run_file_obj + 164
    frame #13: 0x0000000100ff1ce4 Python`pymain_run_file + 72
    frame #14: 0x0000000100ff0f74 Python`Py_RunMain + 988
    frame #15: 0x0000000100ff1564 Python`pymain_main + 304
    frame #16: 0x0000000100ff1604 Python`Py_BytesMain + 40
    frame #17: 0x000000019f630274 dyld`start + 2840
```

Pull Request resolved: pytorch#141296
Approved by: https://github.com/huydhn

Chao1Han reviewed Oct 10, 2024


zhangxiaoli73 force-pushed the cherry/xccl branch 3 times, most recently from 3ae903e to a3b2b0a on October 11, 2024 05:59

zhangxiaoli73 added 4 commits October 11, 2024 14:21

refine collective API

748e547

remove unneeded

69c22f9

refine code

a3b2b0a

debug

fd6a11d
