Releases: aws/aws-parallelcluster
AWS ParallelCluster v2.11.8
We're excited to announce the release of AWS ParallelCluster 2.11.8
Upgrade
How to upgrade?
sudo pip install aws-parallelcluster==2.11.8
CHANGES
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade EFA installer to 1.19.0
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
- Upgrade Python runtime used by Lambda functions in AWS Batch integration to python3.9.
BUG FIXES
- Prevent cluster tags from being changed during an update, because this is not supported.
AWS ParallelCluster v3.1.5
We're excited to announce the release of AWS ParallelCluster 3.1.5
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Upgrade EFA installer to 1.18.0
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
BUG FIXES
- Fix Slurm issue that prevents termination of idle nodes.
AWS ParallelCluster v3.3.0
We're excited to announce the release of AWS ParallelCluster 3.3.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add possibility to specify multiple EC2 instance types for the same compute resource.
- Add support for adding and removing shared storages at cluster update by updating the SharedStorage configuration.
- Add new configuration parameter DeletionPolicy for EFS and FSx for Lustre shared storage to support storage retention.
- Add new configuration section Scheduling/SlurmSettings/Database to enable accounting functionality in Slurm.
- Add support for On-Demand Capacity Reservations and Capacity Reservations Resource Groups.
- Add new configuration parameter in Imds/ImdsSettings to specify the IMDS version to support in a cluster or build image infrastructure.
- Add support for Networking/PlacementGroup in the SlurmQueues/ComputeResources section.
- Add support for instances with multiple network interfaces that allow only one ENI per device.
- Improve validation of networking for external EFS file systems by checking the CIDR block in the attached security group.
- Add validator to check if configured instance types support placement groups.
- Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
- Move NFS installation to build time to reduce configuration time.
- Enable server-side encryption for the EcrImageBuilder SNS topic created when deploying ParallelCluster API and used to notify on docker image build events.
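The NFS thread rule listed above is a clamp: four threads per core, never fewer than 8 and never more than 256. A minimal Python sketch of that arithmetic follows. The function name nfs_thread_count is illustrative and is not taken from the ParallelCluster code base.

```python
def nfs_thread_count(num_cores: int) -> int:
    """Apply the rule quoted in the release note: min(256, max(8, num_cores * 4))."""
    return min(256, max(8, num_cores * 4))


# Small instances are raised to the floor of 8, large ones are capped at 256.
for cores in (1, 2, 16, 64, 96):
    print(cores, nfs_thread_count(cores))
```

With this rule, the floor applies to head nodes with up to 2 cores and the cap applies from 64 cores upward.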
CHANGES
- Change behaviour of SlurmQueues/Networking/PlacementGroup/Enabled: now it creates a different managed placement group for each compute resource instead of a single managed placement group for all compute resources.
- Add support for PlacementGroup/Name as the preferred naming method.
- Move head node tags from Launch Template to instance definition to avoid head node replacement on tags updates.
- Disable Multithreading through script executed by cloud-init and not through CpuOptions set into Launch Template.
- Upgrade Python to version 3.9 and NodeJS to version 16 in API infrastructure, API Docker container and cluster Lambda resources.
- Remove support for Python 3.6 in aws-parallelcluster-batch-cli.
- Upgrade Slurm to version 22.05.5.
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
- Upgrade Python used in ParallelCluster virtualenvs from 3.7.13 to 3.9.15.
- Upgrade EFA installer to version 1.18.0.
- Upgrade NICE DCV to version 2022.1-13300.
- Allow for suppressing the SingleSubnetValidator for Queues.
BUG FIXES
- Fix validation of filters parameter in ListClusterLogStreams command to fail when incorrect filters are passed.
- Fix validation of parameter SharedStorage/EfsSettings: now validation fails when FileSystemId is specified along with other SharedStorage/EfsSettings parameters, whereas it was previously ignoring them.
- Fix cluster update when changing the order of SharedStorage together with other changes in the configuration.
- Fix UpdateParallelClusterLambdaRole in the ParallelCluster API to upload logs to CloudWatch.
- Fix Cinc not using the local CA certificates bundle when installing packages before any cookbooks are executed.
- Fix a hang in upgrading Ubuntu via pcluster build-image when Build:UpdateOsPackages:Enabled:true is set.
- Fix parsing of YAML cluster configuration by failing on duplicate keys.
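The stricter SharedStorage/EfsSettings check described in the bug fixes above can be pictured as follows. This is only a sketch of the rule as the release note states it, not ParallelCluster's actual validator. The function name, the plain-dict input, and the error text are assumptions made for illustration.

```python
def validate_efs_settings(efs_settings: dict) -> list:
    """Return a list of error messages; an empty list means the settings pass."""
    errors = []
    if "FileSystemId" in efs_settings:
        # Other keys next to FileSystemId are rejected instead of silently ignored.
        extra = sorted(key for key in efs_settings if key != "FileSystemId")
        if extra:
            errors.append("FileSystemId cannot be combined with: " + ", ".join(extra))
    return errors


# Hypothetical file system ID, used only to exercise the check.
print(validate_efs_settings({"FileSystemId": "fs-12345678", "Encrypted": True}))
```

Returning a list rather than raising on the first problem lets a caller report every failed rule in one pass.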
AWS ParallelCluster v3.2.1
We're excited to announce the release of AWS ParallelCluster 3.2.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.
CHANGES
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade Python from 3.7.10 to 3.7.13 in response to a security risk.
BUG FIXES
- Avoid failing on DescribeCluster when cluster configuration is not available.
AWS ParallelCluster v3.2.0
We're excited to announce the release of AWS ParallelCluster 3.2.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for memory-based job scheduling in Slurm
  - Configure compute nodes real memory in the Slurm cluster configuration.
  - Add new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to enable memory-based scheduling in Slurm.
  - Add new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override default value of the memory seen by the scheduler on compute nodes.
- Improve flexibility on cluster configuration updates to avoid the stop and start of the entire cluster whenever possible.
  - Add new configuration parameter Scheduling/SlurmSettings/QueueUpdateStrategy to set the preferred strategy to adopt for compute nodes needing a configuration update and replacement.
- Improve failover mechanism over available compute resources when hitting insufficient capacity issues with EC2 instances. Disable compute nodes for a configurable amount of time (default 10 min) when a node launch fails due to insufficient capacity.
- Add support to mount existing FSx for ONTAP and FSx for OpenZFS file systems.
- Add support to mount multiple instances of existing EFS, FSx for Lustre, FSx for ONTAP, and FSx for OpenZFS file systems.
- Add support for FSx for Lustre Persistent_2 deployment type when creating a new file system.
- Prompt user to enable EFA for supported instance types when using pcluster configure wizard.
- Add support for rebooting compute nodes via Slurm.
- Improve handling of Slurm power states to also account for manual powering down of nodes.
- Add NVIDIA GDRCopy 2.3 into the product AMIs to enable low-latency GPU memory copy.
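The insufficient-capacity failover listed above amounts to a cool-down period after a failed launch. The sketch below illustrates the idea only. The class, its method names, tracking by compute resource name, and the use of seconds are all assumptions; only the 10-minute default comes from the release note.

```python
class CapacityCooldown:
    """Track compute resources that are temporarily disabled after a failed launch."""

    def __init__(self, timeout_seconds: float = 600.0):  # 10 minutes, the stated default
        self.timeout_seconds = timeout_seconds
        self._disabled_until = {}

    def record_insufficient_capacity(self, compute_resource: str, now: float) -> None:
        """Disable a compute resource until now + timeout."""
        self._disabled_until[compute_resource] = now + self.timeout_seconds

    def is_available(self, compute_resource: str, now: float) -> bool:
        """A resource is usable again once its cool-down has expired."""
        return now >= self._disabled_until.get(compute_resource, 0.0)


cooldown = CapacityCooldown()
cooldown.record_insufficient_capacity("queue1-cr1", now=100.0)
print(cooldown.is_available("queue1-cr1", now=400.0))  # still cooling down
print(cooldown.is_available("queue1-cr2", now=400.0))  # never failed
```

Passing the current time in explicitly keeps the sketch deterministic; a real scheduler integration would presumably read a clock instead.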
CHANGES
- Upgrade EFA installer to version 1.17.2
  - EFA driver: efa-1.16.0-1
  - EFA configuration: efa-config-1.10-1
  - EFA profile: efa-profile-1.5-1
  - Libfabric: libfabric-aws-1.16.0~amzn2.0-1
  - RDMA core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Upgrade NICE DCV to version 2022.0-12760.
- Upgrade NVIDIA driver to version 470.129.06.
- Upgrade NVIDIA Fabric Manager to version 470.129.06.
- Change default EBS volume types from gp2 to gp3 for both the root and additional volumes.
- Changes to FSx for Lustre file systems created by ParallelCluster:
  - Change the default deployment type to Scratch_2.
  - Change the Lustre server version to 2.12.
- Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
- Add parallelcluster:cluster-name tag to all the resources created by ParallelCluster.
- Do not allow setting PlacementGroup/Id when PlacementGroup/Enabled is explicitly set to false.
- Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
- Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is true, as it is by default.
- With a custom AMI, use the AMI root volume size instead of the ParallelCluster default of 35 GiB. The value can be changed in the cluster configuration file.
- Automatically disable the compute fleet when the configuration parameter Scheduling/SlurmQueues/ComputeResources/SpotPrice is lower than the minimum required Spot request fulfillment price.
- Show requested_value and current_value values in the change set when adding or removing a section during an update.
- Disable aws-ubuntu-eni-helper service in DLAMI to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
- Remove support for Python 3.6.
- Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
- Remove the trailing dot when configuring the compute node FQDN.
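The two PlacementGroup changes listed above interact: an existing PlacementGroup/Id no longer needs PlacementGroup/Enabled set to true, but it is rejected when Enabled is explicitly false. A rough Python sketch of that combined rule follows. The function name, its arguments, and the error text are hypothetical and do not reflect ParallelCluster's own validator.

```python
from typing import Optional


def check_placement_group(enabled: Optional[bool], group_id: Optional[str]) -> Optional[str]:
    """Return an error message, or None when the combination is accepted.

    enabled is None when PlacementGroup/Enabled is omitted from the configuration.
    """
    if group_id is not None and enabled is False:
        return "PlacementGroup/Id cannot be set when PlacementGroup/Enabled is false"
    # An existing Id is accepted whether Enabled is true or simply omitted.
    return None


# Hypothetical placement group name, used only to exercise the rule.
print(check_placement_group(None, "my-placement-group"))
print(check_placement_group(False, "my-placement-group"))
```

Using None for an omitted Enabled value is what lets the sketch distinguish "not set" from "explicitly false", which is the distinction the release note draws.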
BUG FIXES
- Fix the default behavior to skip the ParallelCluster validation and test steps when building a custom AMI.
- Fix file handle leak in computemgtd.
- Fix race condition that was sporadically causing launched instances to be immediately terminated because they were not yet available in the EC2 DescribeInstances response.
- Fix support for DisableSimultaneousMultithreading parameter on instance types with Arm processors.
- Fix ParallelCluster API stack update failure when upgrading from a previous version. Add resource pattern used for the ListImagePipelineImages action in the EcrImageDeletionLambdaRole.
- Fix ParallelCluster API by adding missing permissions needed to import/export from S3 when creating an FSx for Lustre storage.
AWS ParallelCluster v2.11.7
We're excited to announce the release of AWS ParallelCluster 2.11.7
Upgrade
How to upgrade?
sudo pip install aws-parallelcluster==2.11.7
CHANGES
- Upgrade Slurm to version 20.11.9.
AWS ParallelCluster v3.1.4
We're excited to announce the release of AWS ParallelCluster 3.1.4
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add validation for DirectoryService/PasswordSecretArn to fail in case the secret does not exist.
CHANGES
- Upgrade Slurm to version 21.08.8-2.
- Build Slurm with JWT support.
- Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
- Add lambda:TagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster creation and image creation.
BUG FIXES
- Fix the ability to export cluster's logs when using export-cluster-logs command with the --filters option.
- Fix AWS Batch Docker entrypoint to use /home shared directory to coordinate Multi-node-Parallel job execution.
AWS ParallelCluster v2.11.6
We're excited to announce the release of AWS ParallelCluster 2.11.6
Upgrade
How to upgrade?
sudo pip install aws-parallelcluster==2.11.6
ENHANCEMENTS
- Improve exception management in case of missing networking.
CHANGES
- OS package updates and security fixes.
AWS ParallelCluster v3.1.3
We're excited to announce the release of AWS ParallelCluster 3.1.3
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Execute SSH key creation alongside the creation of the HOME directory, i.e. during SSH login, when switching to another user and when executing a command as another user.
- Add support for both FQDN and LDAP Distinguished Names in the configuration parameter DirectoryService/DomainName. The new validator now checks both syntaxes.
- New update_directory_service_password.sh script deployed on the head node supports the manual update of the Active Directory password in the SSSD configuration. The password is retrieved from AWS Secrets Manager, as specified in the cluster configuration.
- Add support to deploy API infrastructure in environments without a default VPC.
- Add validation for DirectoryService/AdditionalSssdConfigs to fail in case of invalid overrides.
CHANGES
- Disable deeper C-States in x86_64 official AMIs and AMIs created through build-image command, to guarantee high performance and low latency.
- OS package updates and security fixes.
- Change Amazon Linux 2 base images to use AMIs with Kernel 5.10.
BUG FIXES
- Fix build-image stack in DELETE_FAILED after the image is built successfully, due to new EC2ImageBuilder policies.
- Fix the configuration parameter DirectoryService/DomainAddr conversion to ldap_uri SSSD property when it contains multiple domain addresses.
AWS ParallelCluster v2.11.5
We're excited to announce the release of AWS ParallelCluster 2.11.5
Upgrade
How to upgrade?
sudo pip install aws-parallelcluster==2.11.5
ENHANCEMENTS
- Add support for NEW_CHANGED_DELETED as value of FSx for Lustre AutoImportPolicy option.
CHANGES
- Drop support for SGE and Torque schedulers.
- Disable log4j-cve-2021-44228-hotpatch service on Amazon Linux to avoid potential performance degradation.
- Upgrade Intel MPI Library to 2021.4.0.441.
- Upgrade NVIDIA driver to version 470.103.01.
- Upgrade CUDA library to version 11.4.4.
- Upgrade NVIDIA Fabric Manager to version 470.103.01.
- Extend head node creation timeout to 1h.
BUG FIXES
- Fix DCV connection through browsers.
- Fix YAML quoting to prevent custom Tags being parsed as numbers.
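The tag-quoting fix above addresses a general YAML pitfall: an unquoted scalar such as 0123 or 42 is read as a number rather than as text. The sketch below shows one way to emit tag lines that are always quoted, using only the Python standard library. It relies on JSON string syntax being accepted as a YAML double-quoted scalar for typical tag values, and render_tag is a hypothetical helper, not the code ParallelCluster uses.

```python
import json


def render_tag(key: str, value: str) -> str:
    """Render one 'key: value' line with both sides double-quoted."""
    return f"{json.dumps(str(key))}: {json.dumps(str(value))}"


# Both values would be parsed as integers if written without quotes.
print(render_tag("build", "0123"))
print(render_tag("cost-center", "42"))
```

Quoting unconditionally is simpler than trying to detect which strings a YAML parser would treat as numbers, at the cost of slightly noisier output.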