Release v0.15.3 · microsoft/DeepSpeed

What's Changed

Update version.txt after 0.15.2 release by @loadams in #6615
Clean up prefetched parameters by @tohtana in #6557
AIO CPU Locked Tensor by @jomayeri in #6592
Reduce setting global variables to reduce torch compile graph breaks by @NirSonnenschein in #6541
Add API to get devices of offload states by @tohtana in #6586
Ignore reuse_dist_env by @tohtana in #6623
Add API for updating ZeRO gradients by @tjruwase in #6590
[compile] Show breakdown of graph break by @delock in #6601
Accept btl_tcp_if_include option through launcher_args by @diskkid in #6613
Add first Step in LR Schedulers by @jomayeri in #6597
Support safetensors export by @xu-song in #6579
Add option to disable logger while compiling to avoid graph breaks by @ShellyNR in #6496
Lock cache file of HF model list by @tohtana in #6628
Add README Pipeline Status for Huawei Ascend NPU by @xuedinge233 in #6588
Update torch version in workflows by @tohtana in #6631
Use file store for tests by @tohtana in #6632
Fix Memory Leak In AIO by @jomayeri in #6630
[XPU] Upgrade XPU Max1100 CI workflow to PyTorch 2.3 by @Liangliang-Ma in #6646
[XPU] host timer check version from Torch 2.5 to Torch 2.6 by @YizhouZ in #6633
[XPU] [DeepNVMe] use same cpu_op_desc_t with cuda by @Liangliang-Ma in #6645

Full Changelog: v0.15.2...v0.15.3