AWS ParallelCluster v3.8.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.8.0.
This is associated with AWS ParallelCluster v3.8.0.
ENHANCEMENTS
- Add support for EC2 Capacity Blocks for ML.
- Add support for Rocky Linux 8.
- Add support for the Scheduling/SlurmSettings/Database/DatabaseName parameter to render StorageLoc in the slurmdbd configuration generated by ParallelCluster.
- Add the option to use EFS storage instead of NFS exports from the head node root volume for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data.
- Allow for mounting /home as EFS or FSx external shared storage via the SharedStorage section of the config file.
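The DatabaseName enhancement can be pictured with a minimal Python sketch of how a configured value might become a StorageLoc line in slurmdbd.conf. The helper names are hypothetical and this is not the cookbook's actual template code; the only Slurm facts relied on are that StorageLoc names the database where slurmdbd writes accounting records and that it defaults to slurm_acct_db when absent.

```python
from typing import Optional

# Slurm's documented default for StorageLoc in slurmdbd.conf.
SLURM_DEFAULT_STORAGE_LOC = "slurm_acct_db"


def render_storage_loc(database_name: Optional[str]) -> Optional[str]:
    """Render the StorageLoc line for slurmdbd.conf from a DatabaseName value.

    Hypothetical helper, not the cookbook's template code. Returns None when
    no DatabaseName is configured, so the line is left out entirely.
    """
    if not database_name:
        return None
    return f"StorageLoc={database_name}"


def effective_database(database_name: Optional[str]) -> str:
    """Return the database slurmdbd would write accounting records to."""
    line = render_storage_loc(database_name)
    if line is None:
        # No StorageLoc line: slurmdbd falls back to its own default.
        return SLURM_DEFAULT_STORAGE_LOC
    return line.split("=", 1)[1]


if __name__ == "__main__":
    print(render_storage_loc("cluster_a_accounting"))  # StorageLoc=cluster_a_accounting
    print(effective_database(None))  # slurm_acct_db
```

In this sketch, omitting the line when DatabaseName is unset means clusters that do not use the new parameter keep Slurm's default database name.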
CHANGES
- Upgrade Slurm to 23.02.7 (from 23.02.6).
- Upgrade NVIDIA driver to version 535.129.03.
- Upgrade CUDA Toolkit to version 12.2.2.
- Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
- Do not wait for static nodes in maintenance to signal CFN that the head node initialization is complete.
- Upgrade EFA installer to 1.29.1.
  - Efa-driver: efa-2.6.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.19.0-1
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.6-1
- Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7, where version 2.3.1 is used.
- Upgrade aws-cfn-bootstrap to version 2.0-28.
- Upgrade Python to 3.9.17.
BUG FIXES
- Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
- Fix users' SSH key generation when switching users without root privileges in clusters integrated with an external LDAP server through cluster configuration files.
- Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
- Fix hard-coded path to the Slurm installation dir in the update_slurm_database_password.sh script for Slurm Accounting.
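To illustrate why a sentinel such as ScaledownIdletime = -1 needs care, here is a minimal Python sketch of converting it into a Slurm SuspendTime line. It assumes that ScaledownIdletime is expressed in minutes, that it maps onto Slurm's SuspendTime (expressed in seconds), and that SuspendTime=INFINITE disables power saving as described in the slurm.conf documentation for Slurm 23.02. The function and constant names are hypothetical, and this is not the actual fix shipped in 3.8.0.

```python
# Hypothetical sentinel: ScaledownIdletime = -1 asks for power saving to be disabled.
DISABLE_POWER_SAVE = -1


def render_suspend_time(scaledown_idletime: int) -> str:
    """Render a slurm.conf SuspendTime line from ScaledownIdletime (minutes).

    Hypothetical helper, not ParallelCluster's code. SuspendTime is in seconds,
    so regular values are multiplied by 60, but the -1 sentinel is mapped to
    INFINITE instead of being multiplied into -60.
    """
    if scaledown_idletime == DISABLE_POWER_SAVE:
        return "SuspendTime=INFINITE"
    if scaledown_idletime < 0:
        # Hypothetical validation: any other negative value is rejected.
        raise ValueError("ScaledownIdletime must be -1 or a non-negative number of minutes")
    return f"SuspendTime={scaledown_idletime * 60}"


if __name__ == "__main__":
    print(render_suspend_time(10))  # SuspendTime=600
    print(render_suspend_time(-1))  # SuspendTime=INFINITE
```

Checking the sentinel before the unit conversion keeps the "disabled" meaning from being lost when minutes are turned into seconds.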