Intel® oneCCL Bindings for PyTorch* v2.1.300+xpu release notes
Released by zhangxiaoli73 on 26 Apr.
Features include:

- Extend a prototype feature, enabled by `TORCH_LLM_ALLREDUCE=1`, to provide better scale-up performance by using optimized collective algorithms such as `allreduce`, `allgather`, and `reducescatter` in Intel® oneCCL. This feature requires XeLink to be enabled for cross-card communication.
- Enable a set of coalesced primitives in the CCL backend, including `allreduce_into_tensor_coalesced`, `allgather_into_tensor_coalesced`, `reduce_scatter_tensor_coalesced`, and `_broadcast_coalesced`.
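As a sketch of how the scale-up feature might be turned on for a distributed run: the `TORCH_LLM_ALLREDUCE=1` environment variable comes from this release, while the script name, launcher, and process count below are hypothetical placeholders. XeLink-connected Intel GPUs and the oneCCL bindings for PyTorch are assumed to be installed.

```shell
# Hypothetical launch fragment: enable the optimized oneCCL collectives
# (allreduce/allgather/reducescatter paths) for this run.
export TORCH_LLM_ALLREDUCE=1

# "train.py" and "-n 2" are placeholders; the training script would
# initialize torch.distributed with the "ccl" backend on XPU devices.
mpirun -n 2 python train.py
```

The variable only needs to be set in the environment of each launched rank; how the ranks are spawned (mpirun, torchrun, or another launcher) is otherwise unchanged.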