Releases: aws/aws-parallelcluster-cookbook
AWS ParallelCluster v3.7.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.1
This is associated with AWS ParallelCluster v3.7.1
CHANGES
- Upgrade Slurm to 23.02.5 (from 23.02.4).
- Upgrade Pmix to 4.2.6 (from 3.2.3).
- Upgrade libjwt to 1.15.3 (from 1.12.0).
- Upgrade EFA installer to 1.26.1, fixing RDMA writedata issue in P5.
  - Efa-driver: efa-2.5.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.18.2-1
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.5-4
AWS ParallelCluster v3.7.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.0
This is associated with AWS ParallelCluster v3.7.0
ENHANCEMENTS
- Add support for Ubuntu 22. RSA keys are not supported by default. See this page.
- Add support for login nodes.
- Add support to mount existing Amazon File Cache as shared storage.
- Allow configuration of static and dynamic node priorities in Slurm compute resources via the ParallelCluster configuration YAML file.
- Add a queue-level parameter (JobExclusiveAllocation) to ensure nodes in the partition are exclusively allocated to a single job at any given time.
- Allow overriding the aws-parallelcluster-node package at cluster creation and update time (only on the head node during update). Useful for development purposes only.
- Allow memory-based scheduling when multiple instance types are specified for a Slurm Compute Resource.
- Avoid starting the NFS server on compute nodes.
- Forward SLURM_RESUME_FILE to ParallelCluster resume program.
CHANGES
- Deprecate Ubuntu 18.
- Upgrade Slurm to version 23.02.4.
- Update the default root volume size to 40 GB to account for limits on CentOS 7.
- Upgrade NVIDIA driver to version 535.54.03.
- Upgrade CUDA library to version 12.2.0.
- Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-535.
- Upgrade NICE DCV to version 2023.0-15487.
  - server: 2023.0.15487-1
  - xdcv: 2023.0.551-1
  - gl: 2023.0.1039-1
  - web_viewer: 2023.0.15487-1
- Upgrade EFA installer to 1.25.1.
  - Efa-driver: efa-2.5.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.18.1-1
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.5-4
- Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.
- Upgrade third-party cookbook dependencies:
  - apt-7.5.14 (from apt-7.4.0)
  - line-4.5.13 (from line-4.5.2)
  - openssh-2.11.3 (from openssh-2.10.3)
  - pyenv-4.2.3 (from pyenv-3.5.1)
  - selinux-6.1.12 (from selinux-6.0.5)
  - yum-7.4.13 (from yum-7.4.0)
  - yum-epel-5.0.2 (from yum-epel-4.5.0)
- Assign Slurm dynamic nodes a priority (weight) of 1000 by default. This allows Slurm to prioritize idle static nodes over idle dynamic ones.
- Change the default value of Imds/ImdsSupport from v1.0 to v2.0.
- Make aws-parallelcluster-node daemons handle only ParallelCluster-managed Slurm partitions.
- Restrict permission on file /tmp/wait_condition_handle.txt within the head node so that only root can read it.
- Create a Slurm partition-nodelist mapping JSON file to be used by the node package daemons to recognize PC-managed Slurm partitions and nodelists.
- Increase EFS-utils watchdog poll interval to 10 seconds. Note: This change is meaningful only if EncryptionInTransit is set to true, because watchdog does not run otherwise.
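The default dynamic node weight of 1000 can be illustrated with a small sketch. Slurm allocates the lowest-weight eligible nodes first, so an idle static node is chosen before an idle dynamic one. The static weight of 1, the node names, and the `pick_nodes` helper below are assumptions made for illustration only; this is not the scheduler's actual code.

```python
from dataclasses import dataclass

# The notes state 1000 for dynamic nodes. The static value of 1 is an
# assumption used here only to make the comparison concrete.
STATIC_WEIGHT = 1
DYNAMIC_WEIGHT = 1000


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical node name
    weight: int
    idle: bool = True


def pick_nodes(nodes, count):
    """Return up to `count` idle nodes, lowest weight first (hypothetical helper)."""
    idle_nodes = [n for n in nodes if n.idle]
    # sorted() is stable, so nodes with equal weight keep their input order.
    return sorted(idle_nodes, key=lambda n: n.weight)[:count]


nodes = [
    Node("queue1-dy-cr1-1", DYNAMIC_WEIGHT),
    Node("queue1-st-cr1-1", STATIC_WEIGHT),
    Node("queue1-dy-cr1-2", DYNAMIC_WEIGHT),
    Node("queue1-st-cr1-2", STATIC_WEIGHT, idle=False),  # busy, so skipped
]
chosen = pick_nodes(nodes, 2)
print([n.name for n in chosen])
```

In this sketch the idle static node is selected first, and a dynamic node is used only once no idle static node remains.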
BUG FIXES
- Add validation to ScaledownIdletime value, to prevent setting a value lower than -1.
- Fix issue causing dangling IAM policies to be created when creating ParallelCluster CloudFormation custom resource provider with CustomLambdaRole.
- Fix an issue that was causing misalignment of compute nodes DNS name on instances with multiple network interfaces, when SlurmSettings/Dns/UseEc2Hostnames is set to True.
- Fix cluster creation failure with Ubuntu Deep Learning AMI on GPU instances and DCV enabled.
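The ScaledownIdletime validation listed above amounts to a lower-bound check. A minimal sketch follows, assuming the value is an integer; the function name and error messages are hypothetical and do not reflect the actual ParallelCluster validator.

```python
def validate_scaledown_idletime(value: int) -> int:
    """Reject ScaledownIdletime values lower than -1 (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("ScaledownIdletime must be an integer")
    if value < -1:
        raise ValueError(f"ScaledownIdletime must be >= -1, got {value}")
    return value


for candidate in (-1, 10, -2):
    try:
        print(candidate, "accepted:", validate_scaledown_idletime(candidate))
    except ValueError as err:
        print(candidate, "rejected:", err)
```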
AWS ParallelCluster v3.6.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.6.1
This is associated with AWS ParallelCluster v3.6.1
CHANGES
- Remove security updates step executed on cluster nodes bootstrap in US isolated regions in order to reduce bootstrap time and avoid a potential point of failure.
- Replace nvidia-persistenced service with parallelcluster_nvidia service to avoid conflicts with DLAMI.
BUG FIXES
- Fix an issue that was preventing ptrace protection from being disabled on Ubuntu, which in turn prevented the use of Cross Memory Attach (CMA) in libfabric.
AWS ParallelCluster v3.6.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.6.0
This is associated with AWS ParallelCluster v3.6.0
ENHANCEMENTS
- Add support for RHEL8.
- Add support for customizing the cluster Slurm configuration via the ParallelCluster configuration YAML file.
- Build Slurm with support for LUA.
- Add health check manager and GPU health check, which can be activated through cluster configuration. Health check manager execution is triggered by a Slurm prolog script. GPU check verifies healthiness of a node by executing NVIDIA DCGM L2 diagnostic.
- Add log rotation support for ParallelCluster managed logs.
- Track head node memory and root volume disk utilization using the mem_used_percent and disk_used_percent metrics collected through the CloudWatch Agent.
- Enforce the DCV Authenticator Server to use at least TLS-1.2 protocol when creating the SSL Socket.
- Load kernel module nvidia-uvm by default to provide Unified Virtual Memory (UVM) functionality to the CUDA driver.
- Install NVIDIA Persistence Daemon as a system service.
- Install NVIDIA Data Center GPU Manager (DCGM) package on all supported OSes except for aarch64 centos7 and alinux2.
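Enforcing a minimum TLS version on a server socket, as described for the DCV Authenticator Server above, can be sketched with Python's standard ssl module. This is a generic illustration under the assumption of a Python-based server; the function name is hypothetical and the snippet is not taken from the cookbook.

```python
import ssl


def build_server_context() -> ssl.SSLContext:
    """Create a server-side TLS context that refuses anything below TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Handshakes offering only TLS 1.0 or 1.1 are rejected by this context.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_server_context()
print(context.minimum_version)
```

A real server would also load a certificate and key with `context.load_cert_chain(...)` before wrapping its listening socket.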
CHANGES
- Upgrade Slurm to version 23.02.2.
- Upgrade munge to version 0.5.15.
- Set Slurm default TreeWidth to 30.
- Set Slurm prolog and epilog configurations to target a directory, /opt/slurm/etc/scripts/prolog.d/ and /opt/slurm/etc/scripts/epilog.d/ respectively.
- Set Slurm BatchStartTimeout to 3 minutes to allow at most 3 minutes for Prolog execution during compute node registration.
- Upgrade EFA installer to 1.22.1
  - Dkms: 2.8.3-2
  - Efa-driver: efa-2.1.1g
  - Efa-config: efa-config-1.13-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.17.1-1
  - Rdma-core: rdma-core-43.0-1
  - Open MPI: openmpi40-aws-4.1.5-1
- Upgrade Lustre client version to 2.12 on Amazon Linux 2 (same version available on Ubuntu 20.04, 18.04 and CentOS >= 7.7).
- Upgrade Lustre client version to 2.10.8 on CentOS 7.6.
- Upgrade aws-cfn-bootstrap to version 2.0-24.
- Upgrade NVIDIA driver to version 470.182.03.
- Upgrade NVIDIA Fabric Manager to version 470.182.03.
- Upgrade NVIDIA CUDA Toolkit to version 11.8.0.
- Upgrade NVIDIA CUDA sample to version 11.8.0.
- Upgrade Intel MPI Library to 2021.9.0.43482.
- Upgrade NICE DCV to version 2023.0-15022.
  - server: 2023.0.15022-1
  - xdcv: 2023.0.547-1
  - gl: 2023.0.1027-1
  - web_viewer: 2023.0.15022-1
BUG FIXES
- Fix an issue that was causing misalignment of compute nodes IP on instances with multiple network interfaces.
- Fix replacement of StoragePass in slurm_parallelcluster_slurmdbd.conf when a queue parameter update is performed and the Slurm accounting configurations are not updated.
- Fix issue causing cfn-hup daemon to fail when it gets restarted.
- Fix issue causing NVIDIA GPU compute nodes not to resume correctly after executing an scontrol reboot command.
AWS ParallelCluster v3.5.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.5.1
This is associated with AWS ParallelCluster v3.5.1
ENHANCEMENTS
- Add support for US isolated region us-isob-east-1.
CHANGES
- Upgrade EFA installer to 1.22.0
  - Efa-driver: efa-2.1.1g
  - Efa-config: efa-config-1.13-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.17.0-1
  - Rdma-core: rdma-core-43.0-1
  - Open MPI: openmpi40-aws-4.1.5-1
- Upgrade NICE DCV to version 2022.2-14521.
  - server: 2022.2.14521-1
  - xdcv: 2022.2.519-1
  - gl: 2022.2.1012-1
  - web_viewer: 2022.2.14521-1
BUG FIXES
- Fix an issue where updating a cluster to remove shared EBS volumes could cause node launch failures if the MountDir matched the same pattern in /etc/exports.
AWS ParallelCluster v3.5.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.5.0
This is associated with AWS ParallelCluster v3.5.0
ENHANCEMENTS
- Fail cluster creation if cluster status changes to PROTECTED while provisioning static nodes.
CHANGES
- Upgrade Slurm to version 22.05.8.
- Upgrade EFA installer to 1.21.0
  - Efa-driver: efa-2.1.1-1
  - Efa-config: efa-config-1.12-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.1amzn3.0-1
  - Rdma-core: rdma-core-43.0-1
  - Open MPI: openmpi40-aws-4.1.4-3
- Make Slurm controller logs more verbose and enable additional logging for the Slurm power save plugin.
BUG FIXES
- Fix an issue where custom AMI creation failed in Ubuntu 20.04 on MySQL packages installation.
AWS ParallelCluster v3.4.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.1
This is associated with AWS ParallelCluster v3.4.1
BUG FIXES
- Fix an issue with the Slurm scheduler that might incorrectly apply updates to its internal registry of compute nodes. This might result in EC2 instances becoming inaccessible or being backed by an incorrect instance type.
AWS ParallelCluster v3.4.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.0
This is associated with AWS ParallelCluster v3.4.0
ENHANCEMENTS
- Add support for specifying multiple subnets for each queue to increase the EC2 capacity pool available for use.
CHANGES
- Upgrade EFA installer to 1.20.0
  - Efa-driver: efa-2.1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.1
  - Rdma-core: rdma-core-43.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
- Mount EFS file systems using amazon-efs-utils. EFS file systems can be mounted with in-transit encryption and an IAM-authorized user.
- Install stunnel 5.67 on CentOS 7 and Ubuntu to support EFS in-transit encryption.
- Add the ability to run a custom script on the head node during a cluster update.
- Upgrade Slurm to version 22.05.6.
- Upgrade Python to 3.9.16 and 3.7.16.
AWS ParallelCluster v2.11.9
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.9
This is associated with AWS ParallelCluster v2.11.9
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v3.3.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.1
This is associated with AWS ParallelCluster v3.3.1
CHANGES
- There were no changes for this version.