Releases: aws/sagemaker-training-toolkit

v4.2.6

18 Aug 15:17

Bug Fixes and Other Changes

  • Enable PT XLA distributed training on homogeneous clusters

v4.2.5

17 Aug 16:28

Bug Fixes and Other Changes

  • Relax exception type

v4.2.4

15 Aug 22:48
prepare release v4.2.4

v4.2.3

11 Aug 23:49

Bug Fixes and Other Changes

  • Update num_processes_per_host for smdataparallel runner

v4.2.2

10 Aug 20:30

Bug Fixes and Other Changes

  • Removed version hardcoding for sagemaker test dependency
  • Update distribution_instance_group for PyTorch DDP
  • Specify flake8 config explicitly

v4.2.1

29 Jul 04:54

Bug Fixes and Other Changes

  • Handle UTF-8 decoding exceptions while processing stdout and stderr streams

v4.2.0

08 Jul 07:06

Features

  • Heterogeneous cluster changes

v4.1.6

28 Jun 00:16

Bug Fixes and Other Changes

  • Update protobuf version to overlap with TF requirements

v4.1.5

22 Jun 16:29

Bug Fixes and Other Changes

  • Fix None exception class issue for MPI

v4.1.4

10 Jun 21:09

Bug Fixes and Other Changes

  • Use framework-provided error class and stack trace as error message