Intel® oneCCL Bindings for PyTorch* v2.1.300+xpu release notes
Released by zhangxiaoli73 on 26 Apr.
Features include:

- Extend a prototype feature, enabled by `TORCH_LLM_ALLREDUCE=1`, to provide better scale-up performance by using optimized collective algorithms such as `allreduce`, `allgather`, and `reducescatter` in Intel® oneCCL. This feature requires XeLink to be enabled for cross-card communication.
- Enable a set of coalesced primitives in the CCL backend, including `allreduce_into_tensor_coalesced`, `allgather_into_tensor_coalesced`, `reduce_scatter_tensor_coalesced`, and `_broadcast_coalesced`.
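As a sketch of how the scale-up feature might be turned on for a distributed run: the `TORCH_LLM_ALLREDUCE=1` environment variable comes from this release, while the script name, launcher, and process count below are hypothetical placeholders. XeLink-connected Intel GPUs and the oneCCL bindings for PyTorch are assumed to be installed.

```shell
# Hypothetical launch fragment: enable the optimized oneCCL collectives
# (allreduce/allgather/reducescatter paths) for this run.
export TORCH_LLM_ALLREDUCE=1

# "train.py" and "-n 2" are placeholders; the training script would
# initialize torch.distributed with the "ccl" backend on XPU devices.
mpirun -n 2 python train.py
```

The variable only needs to be set in the environment of each launched rank; how the ranks are spawned (mpirun, torchrun, or another launcher) is otherwise unchanged.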