Releases: aws/aws-parallelcluster
AWS ParallelCluster v2.11.0
We're excited to announce the release of AWS ParallelCluster 2.11.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for Ubuntu 20.04.
- Add support for using FSx Lustre in subnet with no internet access.
- Add support for building custom CentOS 7 AMIs on ARM.
- Add support for FSx Lustre DataCompressionType feature.
- Add validation to prevent using a cluster_resource_bucket that is in a different region than the cluster.
- Install SSM agent on CentOS 7 and 8.
- Add support for security_group_id in packer custom builders. Customers can export AWS_SECURITY_GROUP_ID environment variable to specify security group for custom builders when building custom AMIs.
- SGE: always use shortname as hostname filter with qstat. This will make nodewatcher more robust when using custom DHCP option, where the full hostname seen by SGE might differ from the hostname returned from EC2 metadata (local-hostname).
- Transition from IMDSv1 to IMDSv2.
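The move to IMDSv2 in the last item means instance metadata is read with a session token instead of a plain GET. The Python sketch below shows that two-step exchange in general terms. It is not ParallelCluster's own code, and the helper names are hypothetical; only the endpoint and the two header names come from the public IMDSv2 interface.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"  # link-local metadata endpoint


def build_token_request(ttl_seconds=21600):
    """Step 1 of IMDSv2: a PUT that asks for a session token."""
    return urllib.request.Request(
        IMDS_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Step 2 of IMDSv2: a GET that presents the session token."""
    return urllib.request.Request(
        IMDS_BASE + "/meta-data/" + path,
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Run both steps. This only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, `fetch_metadata("local-hostname")` would return the hostname that the SGE item above compares against.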
CHANGES
- Removed support for Ubuntu 16.04 (ubuntu1604).
- Removed support for Amazon Linux (alinux). Amazon Linux 2 (alinux2) remains fully supported.
- Make key_name parameter optional to support cluster configurations without a key pair.
- Remove support for Python versions < 3.6.
- Remove dependency on future package and __future__ module.
- Root volume size increased from 25GB to 35GB on all AMIs. Minimum root volume size is now 35GB.
- Add sanity check to prevent cluster creation in an AWS region not officially supported by ParallelCluster.
- Restrict IAM permissions to only allow cluster IAM instance role to launch instances via run-instances in cluster compute subnet.
- Upgrade EFA installer to version 1.12.2
- EFA configuration: efa-config-1.8-1 (from efa-config-1.7)
- EFA profile: efa-profile-1.5-1 (from efa-profile-1.4)
- EFA kernel module: efa-1.12.3 (from efa-1.10.2)
- RDMA core: rdma-core-32.1amzn (from rdma-core-31.2amzn)
- Libfabric: libfabric-1.11.2amzon1.1-1 (from libfabric-1.11.1amzn1.0)
- Open MPI: openmpi40-aws-4.1.1-2 (from openmpi40-aws-4.1.0)
- Upgrade Slurm to version 20.11.7.
- Update slurmctld and slurmd systemd unit files to match the latest versions provided by Slurm.
- Add new SlurmctldParameters, power_save_min_interval=30, so power actions will be processed every 30 seconds.
- Add new SlurmctldParameters, cloud_reg_addrs, which will reset a node's NodeAddr automatically on power_down.
- Specify instance GPU model as GRES GPU Type in gres.conf, instead of the previous hardcoded value Type=tesla for all GPUs.
- Upgrade Arm Performance Libraries (APL) to version 21.0.0.
- Upgrade NICE DCV to version 2021.1-10557.
- Upgrade NVIDIA driver to version 460.73.01.
- Upgrade CUDA library to version 11.3.0.
- Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-460.
- Install ParallelCluster AWSBatch CLI in dedicated python3 virtual env.
- Upgrade Python version used in ParallelCluster virtualenvs from version 3.6.13 to version 3.7.10.
- Upgrade Cinc Client to version 16.13.16.
- Upgrade third-party cookbook dependencies:
- apt-7.4.0 (from apt-7.3.0)
- iptables-8.0.0 (from iptables-7.1.0)
- line-4.0.1 (from line-2.9.0)
- openssh-2.9.1 (from openssh-2.8.1)
- pyenv-3.4.2 (from pyenv-3.1.1)
- selinux-3.1.1 (from selinux-2.1.1)
- ulimit-1.1.1 (from ulimit-1.0.0)
- yum-6.1.1 (from yum-5.1.0)
- yum-epel-4.1.2 (from yum-epel-3.3.0)
- Drop lightdm package install from Ubuntu 18.04 DCV installation process.
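Since this release removes support for Python versions below 3.6, it can be worth checking the interpreter before running the upgrade command. The following is a small sketch of such a check; the function name is hypothetical and the check is not part of the pcluster CLI.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the 2.11.0 changes


def python_is_supported(version_info=None):
    """Return True when the interpreter meets the 2.11.0 minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Calling `python_is_supported()` with no argument checks the interpreter that is running the script.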
BUG FIXES
- Use ICP-compliant AL2 repo URLs when building Docker images in China
- Fix a bug that caused clustermgtd to not immediately replace instances with failed status check that are in replacement process.
AWS ParallelCluster v2.10.4
We're excited to announce the release of AWS ParallelCluster 2.10.4
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Upgrade Slurm to version 20.02.7.
AWS ParallelCluster v2.10.3
We're excited to announce the release of AWS ParallelCluster 2.10.3
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Enable support for ARM instances in China and GovCloud regions when using Ubuntu 18.04 or Amazon Linux 2.
- Add validation for cluster_type configuration parameter in cluster section
- Add validation for compute_type configuration parameter in queue section
CHANGES
- Upgrade EFA installer to version 1.11.2
- EFA configuration: efa-config-1.7 (no change)
- EFA profile: efa-profile-1.4 (from efa-profile-1.3)
- EFA kernel module: efa-1.10.2 (no change)
- RDMA core: rdma-core-31.2amzn (no change)
- Libfabric: libfabric-1.11.1amzn1.0 (no change)
- Open MPI: openmpi40-aws-4.1.0 (no change)
BUG FIXES
- Fix issue with awsbsub command when setting environment variables for the job submission
AWS ParallelCluster v2.10.2
We're excited to announce the release of AWS ParallelCluster 2.10.2
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Improve cluster config validation by using cluster target AMI when invoking RunInstances in dryrun mode.
- Improve configuration procedure for the Munge service.
CHANGES
- Update Python version used in ParallelCluster virtualenvs from version 3.6.9 to version 3.6.13.
BUG FIXES
- Fix sanity checks with ARM instance types by using cluster AMI when performing validation.
- Fix enable_efa parameter validation when using CentOS 8 and Slurm or ARM instances.
- Use non-interactive apt update command when building custom Ubuntu AMIs.
- Fix encrypted_ephemeral = true when using Alinux2 or CentOS 8.
AWS ParallelCluster v2.10.1
We're excited to announce the release of AWS ParallelCluster 2.10.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for me-south-1 region (Bahrain), af-south-1 region (Cape Town) and eu-south-1 region (Milan)
- At the time of this version launch:
- Amazon FSx for Lustre and ARM instance types are not supported in me-south-1, af-south-1 and eu-south-1
- AWS Batch is not supported in af-south-1
- EBS io2 is not supported in af-south-1 and eu-south-1
- Install Arm Performance Libraries (APL) 20.2.1 on ARM AMIs (CentOS8, Alinux2, Ubuntu1804).
- Install EFA kernel module on ARM instances with alinux2 and ubuntu1804. This enables support for c6gn instances.
- Add support for io2 and gp3 EBS volume types.
- Add iam_lambda_role parameter under cluster section to allow specifying an existing IAM role to be used by AWS Lambda functions in CloudFormation. When using sge, torque, or slurm as the scheduler, pcluster will not create any IAM role if both ec2_iam_role and iam_lambda_role are provided.
- Improve robustness of a Slurm cluster when clustermgtd is down.
- Configure NFS threads to be max(8, num_cores) for performance. This enhancement will not take effect on Ubuntu 16.04.
- Optimize calls to DescribeInstanceTypes EC2 API when validating cluster configuration.
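The NFS thread rule in the enhancements above, max(8, num_cores), can be written out as a one-line function. This is only an illustration of the arithmetic; the function name is hypothetical, and reading the core count with `os.cpu_count()` is an assumption about how the value might be obtained.

```python
import os


def nfs_thread_count(num_cores=None):
    """Thread count following the max(8, num_cores) rule from the notes."""
    if num_cores is None:
        num_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(8, num_cores)
```

Under this rule a 2-core head node still gets 8 threads, while a 36-core head node gets 36.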
CHANGES
- Upgrade EFA installer to version 1.11.1.
- EFA configuration: efa-config-1.7 (from efa-config-1.5)
- EFA profile: efa-profile-1.3 (from efa-profile-1.1)
- EFA kernel module: efa-1.10.2 (no change)
- RDMA core: rdma-core-31.2amzn (from rdma-core-31.amzn0)
- Libfabric: libfabric-1.11.1amzn1.0 (from libfabric-1.11.1amzn1.1)
- Open MPI: openmpi40-aws-4.1.0 (from openmpi40-aws-4.0.5)
- Upgrade Intel MPI to version U8.
- Upgrade NICE DCV to version 2020.2-9662.
- Set default systemd runlevel to multi-user.target on all OSes during ParallelCluster official AMI creation. The runlevel is set to graphical.target on head node only when DCV is enabled. This prevents the execution of graphical services, such as x/gdm, when they are not required.
- Download Intel MPI and HPC packages from S3 rather than Intel yum repos.
- Change the default of instance types from the hardcoded t2.micro to the free tier instance type (t2.micro or t3.micro dependent on region). In regions without free tier, the default is t3.micro.
- Enable support for p4d as head node instance type (p4d was already supported as compute node in 2.10.0).
- Pull Amazon Linux Docker images from public ECR when building docker image for awsbatch scheduler.
- Increase max retry attempts when registering Slurm nodes in Route53.
BUG FIXES
- Fix pcluster createami for Ubuntu 1804 by downloading SGE sources from Debian repository and not from the EOL Ubuntu 19.10.
- Remove CloudFormation DescribeStacks API call from AWS Batch Docker entrypoint. This removes the risk of job failures due to CloudFormation throttling.
- Mandate the presence of vpc_settings, vpc_id, master_subnet_id in the config file to avoid unhandled exceptions.
- Set the default EBS volume size to 500 GiB when volume type is st1 or sc1.
. - Fix installation of Intel PSXE package on CentOS 7 by using yum4.
- Fix routing issues with multiple Network Interfaces on Ubuntu 18.04.
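The EBS default-size fix above can be pictured as a lookup keyed on volume type. The sketch below is illustrative only: the function name is hypothetical, and the default for other volume types is passed in by the caller because these notes do not state it.

```python
THROUGHPUT_HDD_TYPES = {"st1", "sc1"}
THROUGHPUT_HDD_DEFAULT_GIB = 500  # value stated in the 2.10.1 bug fixes


def default_volume_size(volume_type, other_default_gib):
    """Return the default EBS size in GiB for a volume type.

    other_default_gib is the default for every type other than st1 and
    sc1; it is an input here because the notes do not give that value.
    """
    if volume_type in THROUGHPUT_HDD_TYPES:
        return THROUGHPUT_HDD_DEFAULT_GIB
    return other_default_gib
```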
AWS ParallelCluster v2.10.0
We're excited to announce the release of AWS ParallelCluster 2.10.0.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for CentOS 8 in all Commercial regions.
- Add support for P4d instance type as compute node.
- Add the possibility to enable NVIDIA GPUDirect RDMA support on EFA by using the new enable_efa_gdr configuration parameter.
- Enable support for NICE DCV in GovCloud regions.
- Enable support for AWS Batch scheduler in GovCloud regions.
- FSx Lustre:
- Add possibility to configure Auto Import policy through the new auto_import_policy parameter.
- Add support for HDD storage type and the new storage_type and drive_cache_type configuration parameters.
- Create a CloudWatch Dashboard for the cluster, named <clustername>-<region>, including head node EC2 metrics and cluster logs. It can be disabled by configuring the enable parameter in the dashboard section.
- Add -r/--region arg to pcluster configure command. If this arg is provided, configuration will skip region selection.
- Add -r/--region arg to ssh and dcv connect commands.
- Add cluster_resource_bucket parameter under cluster section to allow the user to specify an existing S3 bucket.
- createami:
- Add validation step to fail when using a base AMI created by a different version of ParallelCluster.
- Add validation step for AMI creation process to fail if the selected OS and the base AMI OS are not consistent.
- Add --post-install parameter to use a post installation script when building an AMI.
- Add the possibility to use a ParallelCluster base AMI.
- Add possibility to change tags when performing a pcluster update.
- Add new all_or_nothing_batch configuration parameter for slurm_resume script. When True, slurm_resume will succeed only if all the instances required by all the pending jobs in Slurm are available.
- Enable queue resizing on update without requiring the compute fleet to be stopped. Stopping the compute fleet is only necessary when existing instances risk being terminated.
- Add validator for EBS volume size, type and IOPS.
- Add validators for shared_dir parameter when used in both cluster and ebs sections.
- Add validator for cfn_scheduler_slots key in the extra_json parameter.
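The dashboard item above says the CloudWatch Dashboard can be turned off through the enable parameter in the dashboard section. The sketch below parses a small configuration with Python's `configparser` to show how that setting might be read. The section labels, the `dashboard_settings` key that links the cluster section to the dashboard section, and the helper name are assumptions for illustration; check them against the configuration reference for your version.

```python
import configparser

# Hypothetical configuration fragment; labels are made up for the example.
SAMPLE_CONFIG = """
[cluster default]
scheduler = slurm
dashboard_settings = no-dashboard

[dashboard no-dashboard]
enable = false
"""


def dashboard_enabled(config_text, cluster_label="default"):
    """Return False only when the linked dashboard section disables it."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = parser["cluster " + cluster_label]
    label = cluster.get("dashboard_settings")
    if label is None:
        # Assumption: the dashboard stays on unless explicitly disabled.
        return True
    return parser.getboolean("dashboard " + label, "enable", fallback=True)
```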
CHANGES
- CentOS 6 is no longer supported.
- Upgrade EFA installer to version 1.10.1
- EFA configuration: efa-config-1.5 (from efa-config-1.4)
- EFA profile: efa-profile-1.1 (from efa-profile-1.0.0)
- EFA kernel module: efa-1.10.2 (from efa-1.6.0)
- RDMA core: rdma-core-31.amzn0 (from rdma-core-28.amzn0)
- Libfabric: libfabric-1.11.1amzn1.1 (from libfabric-1.10.1amzn1.1)
- Open MPI: openmpi40-aws-4.0.5 (from openmpi40-aws-4.0.3)
- Unifies installer runtime options across x86 and aarch64
- Introduces -g/--enable-gdr switch to install packages with GPUDirect RDMA support
- Updates to OMPI collectives decision file packaging, migrated from efa-config to efa-profile
- Introduces CentOS 8 support
- Upgrade NVIDIA driver to version 450.80.02.
- Install NVIDIA Fabric manager to enable NVIDIA NVSwitch on supported platforms.
- Remove default region us-east-1. After the change, pcluster will adhere to the following lookup order for region:
  1. -r/--region arg.
  2. AWS_DEFAULT_REGION environment variable.
  3. aws_region_name in ParallelCluster configuration file.
  4. region in AWS CLI configuration file.
- Slurm: change SlurmctldPort to 6820-6829 to not overlap with default slurmdbd port (6819).
- Slurm: add compute_resource name and efa as node features.
- Remove validation on ec2_iam_role parameter.
- Improve retrieval of instance type info by using DescribeInstanceTypes API.
- Remove custom_awsbatch_template_url configuration parameter.
- Upgrade pip to latest version in virtual environments.
- Upgrade image used by CodeBuild environment when building container images for Batch clusters, from aws/codebuild/amazonlinux2-x86_64-standard:1.0 to aws/codebuild/amazonlinux2-x86_64-standard:3.0.
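The region lookup order listed in the changes above can be sketched as a first-match search over four sources. This is an illustration of the stated order, not the CLI's implementation; the function and parameter names are hypothetical, and the two configuration-file values are assumed to have been read by the caller.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    cli_region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    pcluster_config_region: Optional[str] = None,
    awscli_config_region: Optional[str] = None,
) -> Optional[str]:
    """Return the first region found, following the documented order."""
    if env is None:
        env = os.environ
    candidates = (
        cli_region,                     # 1. -r/--region argument
        env.get("AWS_DEFAULT_REGION"),  # 2. environment variable
        pcluster_config_region,         # 3. aws_region_name in the pcluster config
        awscli_config_region,           # 4. region in the AWS CLI config
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None  # there is no implicit us-east-1 default any more
```

Returning `None` when nothing is set mirrors the removal of the us-east-1 default; how the CLI reports that case is not covered here.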
BUG FIXES
- Retrieve the right number of compute instance slots when instance type is updated.
- Include user tags in compute nodes and EBS volumes.
- Fix pcluster status output when head node is stopped.
- pcluster update:
- Fix issue when tags are specified but not changed.
- Fix issue when the cluster section label changed.
- Fix issue when shared_dir and ebs_settings are both configured in the cluster section.
- Fix cluster and cfncluster compatibility in extra_json parameter.
- Fix pcluster configure to avoid using default/initial values for internal parameter initialization.
- Fix pre/post install script arguments management when using double quotes.
- Fix a bug that was causing clustermgtd and computemgtd sleep interval to be incorrectly computed when system timezone is not set to UTC.
- Fix queue name validator to properly check for capital letters.
- Fix enable_efa parameter validation for queue section.
- Fix CloudWatch Log Group creation for AWS Lambda functions handling CloudFormation Custom Resources.
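The clustermgtd and computemgtd fix above concerns a sleep interval derived from wall-clock timestamps. One general way to keep that arithmetic independent of the system timezone is to use timezone-aware datetimes throughout. The sketch below shows the idea; it is not the daemons' actual code, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(tz=timezone.utc)


def seconds_to_sleep(loop_start, now, loop_interval):
    """Seconds left before the next loop iteration should start.

    loop_start and now must be timezone-aware, so their difference is the
    real elapsed time whatever UTC offsets they carry. loop_interval is a
    timedelta.
    """
    if loop_start.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    remaining = loop_interval - (now - loop_start)
    return max(remaining, timedelta(0)).total_seconds()
```

A loop would record `utc_now()` at the start of each iteration and pass a fresh `utc_now()` as `now` when it finishes its work.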
AWS ParallelCluster v2.9.1
We're excited to announce the release of AWS ParallelCluster 2.9.1.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
Bugfixes
- Fix cluster creation with the head node in a private subnet when it doesn't get a public IP.
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192
AWS ParallelCluster v2.9.0
We're excited to announce the release of AWS ParallelCluster 2.9.0.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for multiple queues and multiple instance types feature with the Slurm scheduler.
- Extend NICE DCV support to ARM instances.
- Extend support to disable hyperthreading on instances (like *.metal) that don't support CpuOptions in LaunchTemplate.
- Enable support for NFS 4 for the filesystems shared from the head node.
- Add CLI utility to convert configuration files with Slurm scheduler to new format to support multiple queues configuration.
- Add script wrapper to support Torque-like commands with the Slurm scheduler.
- Remove dependency on cfn-init in compute nodes bootstrap in order to avoid throttling and delays caused by CloudFormation when a large number of compute nodes join the cluster.
CHANGES
- Introduce new configuration sections and parameters to support multiple queues and multiple instance types.
- Optimize scaling logic with Slurm scheduler, no longer based on Auto Scaling groups.
- A Route53 private hosted zone is now created together with the cluster and used in DNS resolution inside cluster nodes when using Slurm scheduler.
- Upgrade EFA installer to version 1.9.5:
- EFA configuration: efa-config-1.4 (from efa-config-1.3)
- EFA profile: efa-profile-1.0.0
- EFA kernel module: efa-1.6.0 (no change)
- RDMA core: rdma-core-28.amzn0 (no change)
- Libfabric: libfabric-1.10.1amazon1.1 (no change)
- Open MPI: openmpi40-aws-4.0.3 (no change)
- Upgrade Slurm to version 20.02.4.
- Apply the following changes to Slurm configuration:
- Assign a range of 10 ports to Slurmctld in order to better perform with large cluster settings
- Configure cloud scheduling logic
- Set ReconfigFlags=KeepPartState
- Set MessageTimeout=60
- Set TaskPlugin=task/affinity,task/cgroup together with TaskAffinity=no and ConstrainCores=yes in cgroup.conf
- Upgrade NICE DCV to version 2020.1-9012.
- Use private IP instead of master node hostname when mounting shared NFS drives.
- Add new log streams to CloudWatch: chef-client, clustermgtd, computemgtd, slurm_resume, slurm_suspend.
- Add support for queue names in pre/post install scripts.
- Use PAY_PER_REQUEST billing mode for DynamoDB table in GovCloud regions.
BUG FIXES
- Solve dpkg lock issue with Ubuntu that prevented custom AMI creation in some cases.
- Add/improve sanity checks for some configuration parameters.
- Prevent ignored changes from being reported in pcluster update output.
- Fix incompatibility issues with python 2.7 for pcluster update.
- Fix SNS Topic Subscriptions not being deleted with cluster's CloudFormation stack.
AWS ParallelCluster v2.8.1
We're excited to announce the release of AWS ParallelCluster 2.8.1.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
Changes
- Disable screen lock for DCV desktop sessions to prevent users from being locked out.
Bugfixes
- Fix pcluster configure command to avoid writing unexpected configuration parameters.
AWS ParallelCluster v2.8.0
We're excited to announce the release of AWS ParallelCluster 2.8.0.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Enable support for ARM instances on Ubuntu 18.04 and Amazon Linux 2.
- Add support for the automatic backup features of FSx file systems.
- Renewed user experience and robustness of cluster update functionality.
- Support DCV and EFS in China regions.
- Use DescribeInstanceTypes API to validate whether an instance type is EFA-enabled so that new EFA instances can be used without requiring an update to the ParallelCluster configuration files.
- Enable Slurm to directly launch tasks and initialize communications through PMIx v3.1.5 on all supported operating systems except for CentOS 6.
- Print a warning when using NICE DCV on micro or nano instances.
CHANGES
- Remove the client requirement to have Berkshelf to build a custom AMI.
- Upgrade EFA installer to version 1.9.4:
- Kernel module: efa-1.6.0 (from efa-1.5.1)
- RDMA core: rdma-core-28.amzn0 (from rdma-core-25.0)
- Libfabric: libfabric-1.10.1amazon1.1 (updated from libfabric-aws-1.9.0amzn1.1)
- Open MPI: openmpi40-aws-4.0.3 (no change)
- Avoid unnecessary validation of IAM policies.
- Removed unused dependency on supervisor from the Batch Dockerfile.
- Move all LogGroup definitions in the CloudFormation templates into the CloudWatch substack.
- Disable libvirtd service on CentOS 7. Virtual bridge interfaces are incorrectly detected by Open MPI and cause MPI applications to hang, see https://www.open-mpi.org/faq/?category=tcp#tcp-selection for details
- Use CINC instead of Chef for provisioning instances. See https://cinc.sh/about/ for details.
- Retry when mounting an NFS mount fails.
- Install the pyenv virtual environments used by ParallelCluster cookbook and node daemon code under /opt/parallelcluster instead of under /usr/local.
- Use the new official CentOS 7 AMI as the base images for ParallelCluster AMI.
- Upgrade NVIDIA driver to Tesla version 440.95.01 on CentOS 6 and version 450.51.05 on all other distros.
- Upgrade CUDA library to version 11.0 on all distros besides CentOS 6.
- Install third-party cookbook dependencies via local source, rather than using the Chef supermarket.
- Use https wherever possible in download URLs.
- Install glibc-static, which is required to support certain options for the Intel MPI compiler.
- Require an initial cluster size greater than zero when the option to maintain the initial cluster size is used.
BUG FIXES
- Fix validator for CIDR-formatted IP range parameters.
- Fix issue that was preventing concurrent use of custom node and pcluster CLI packages.
- Use the correct domain name when contacting AWS services from the China partition.
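The first bug fix above concerns validation of CIDR-formatted IP ranges. As a rough illustration of what such a check involves, the sketch below validates an IPv4 CIDR string with Python's standard `ipaddress` module. It is not the validator shipped with ParallelCluster; the function name is hypothetical, and accepting ranges with host bits set (`strict=False`) is a choice made for this example.

```python
import ipaddress


def is_valid_cidr(value):
    """Return True if value is an IPv4 range in CIDR form, e.g. 10.0.0.0/16."""
    if "/" not in value:
        return False  # a bare address is not a CIDR range
    try:
        ipaddress.IPv4Network(value, strict=False)
    except ValueError:
        # Raised for malformed addresses and out-of-range prefix lengths.
        return False
    return True
```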