Releases: aws/aws-parallelcluster

AWS ParallelCluster v2.11.8

15 Nov 01:36

We're excited to announce the release of AWS ParallelCluster 2.11.8

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.8

CHANGES

  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade EFA installer to 1.19.0
    • Efa-driver: efa-1.16.0-1
    • Efa-config: efa-config-1.11-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.16.0-1
    • Rdma-core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-3
  • Upgrade Python runtime used by Lambda functions in AWS Batch integration to python3.9.

BUG FIXES

  • Prevent cluster tags from being changed during an update, because this is not supported.

AWS ParallelCluster v3.1.5

16 Nov 13:54

We're excited to announce the release of AWS ParallelCluster 3.1.5

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

  • Upgrade EFA installer to 1.18.0
    • Efa-driver: efa-1.16.0-1
    • Efa-config: efa-config-1.11-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
    • Rdma-core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-2
  • Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.

BUG FIXES

  • Fix Slurm issue that prevents idle nodes termination.

AWS ParallelCluster v3.3.0

02 Nov 15:06

We're excited to announce the release of AWS ParallelCluster 3.3.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add possibility to specify multiple EC2 instance types for the same compute resource.
  • Add support for adding and removing shared storage at cluster update by updating the SharedStorage configuration.
  • Add new configuration parameter DeletionPolicy for EFS and FSx for Lustre shared storage to support storage retention.
  • Add new configuration section Scheduling/SlurmSettings/Database to enable accounting functionality in Slurm.
  • Add support for On-Demand Capacity Reservations and Capacity Reservations Resource Groups.
  • Add new configuration parameter in Imds/ImdsSettings to specify the IMDS version to support in a cluster or build image infrastructure.
  • Add support for Networking/PlacementGroup in the SlurmQueues/ComputeResources section.
  • Add support for instances with multiple network interfaces that allow only one ENI per device.
  • Improve validation of networking for external EFS file systems by checking the CIDR block in the attached security group.
  • Add validator to check if configured instance types support placement groups.
  • Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
  • Move NFS installation at build time to reduce configuration time.
  • Enable server-side encryption for the EcrImageBuilder SNS topic created when deploying ParallelCluster API and used to notify on docker image build events.
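
The NFS thread formula listed above, min(256, max(8, num_cores * 4)), can be checked with a short worked example. This is a minimal sketch in Python: the function name `nfs_thread_count` is hypothetical and is not part of ParallelCluster; only the formula itself comes from these release notes.

```python
def nfs_thread_count(num_cores: int) -> int:
    """Evaluate min(256, max(8, num_cores * 4)), the formula quoted in the notes.

    The function name is illustrative only; it is not a ParallelCluster API.
    """
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    # Core counts chosen for illustration; they are not tied to instance types.
    for cores in (1, 2, 16, 64, 96):
        print(f"{cores} cores -> {nfs_thread_count(cores)} NFS threads")
```

Under this formula the floor of 8 threads applies at 2 cores or fewer, and the cap of 256 threads is reached at 64 cores and above.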

CHANGES

  • Change behaviour of SlurmQueues/Networking/PlacementGroup/Enabled: now it creates a different managed placement
    group for each compute resource instead of a single managed placement group for all compute resources.
  • Add support for PlacementGroup/Name as the preferred naming method.
  • Move head node tags from Launch Template to instance definition to avoid head node replacement on tags updates.
  • Disable Multithreading through script executed by cloud-init and not through CpuOptions set into Launch Template.
  • Upgrade Python to version 3.9 and NodeJS to version 16 in API infrastructure, API Docker container and cluster Lambda resources.
  • Remove support for Python 3.6 in aws-parallelcluster-batch-cli.
  • Upgrade Slurm to version 22.05.5.
  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.
  • Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
  • Upgrade Python used in ParallelCluster virtualenvs from 3.7.13 to 3.9.15.
  • Upgrade EFA installer to version 1.18.0.
  • Upgrade NICE DCV to version 2022.1-13300.
  • Allow for suppressing the SingleSubnetValidator for Queues.

BUG FIXES

  • Fix validation of filters parameter in ListClusterLogStreams command to fail when incorrect filters are passed.
  • Fix validation of parameter SharedStorage/EfsSettings: now validation fails when FileSystemId is specified
    along with other SharedStorage/EfsSettings parameters, whereas it was previously ignoring them.
  • Fix cluster update when changing the order of SharedStorage together with other changes in the configuration.
  • Fix UpdateParallelClusterLambdaRole in the ParallelCluster API to upload logs to CloudWatch.
  • Fix Cinc not using the local CA certificates bundle when installing packages before any cookbooks are executed.
  • Fix a hang when upgrading Ubuntu via pcluster build-image when Build:UpdateOsPackages:Enabled:true is set.
  • Fix parsing of YAML cluster configuration by failing on duplicate keys.

AWS ParallelCluster v3.2.1

03 Oct 08:59

We're excited to announce the release of AWS ParallelCluster 3.2.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.

CHANGES

  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.
  • Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade Python from 3.7.10 to 3.7.13 in response to this security risk.

BUG FIXES

  • Avoid failing on DescribeCluster when cluster configuration is not available.

AWS ParallelCluster v3.2.0

27 Jul 17:48
fdc0dfd

We're excited to announce the release of AWS ParallelCluster 3.2.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for memory-based job scheduling in Slurm
    • Configure compute nodes real memory in the Slurm cluster configuration.
    • Add new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to enable memory-based scheduling in Slurm.
    • Add new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override default value of the memory seen by the scheduler on compute nodes.
  • Improve flexibility on cluster configuration updates to avoid the stop and start of the entire cluster whenever possible.
    • Add new configuration parameter Scheduling/SlurmSettings/QueueUpdateStrategy to set the preferred strategy to adopt for compute nodes needing a configuration update and replacement.
  • Improve failover mechanism over available compute resources when hitting insufficient capacity issues with EC2 instances. Disable compute nodes for a configurable amount of time (default 10 min) when a node launch fails due to insufficient capacity.
  • Add support to mount existing FSx for ONTAP and FSx for OpenZFS file systems.
  • Add support to mount multiple instances of existing EFS, FSx for Lustre, FSx for ONTAP, and FSx for OpenZFS file systems.
  • Add support for FSx for Lustre Persistent_2 deployment type when creating a new file system.
  • Prompt user to enable EFA for supported instance types when using pcluster configure wizard.
  • Add support for rebooting compute nodes via Slurm.
  • Improve handling of Slurm power states to also account for manual powering down of nodes.
  • Add NVIDIA GDRCopy 2.3 into the product AMIs to enable low-latency GPU memory copy.
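
The insufficient-capacity failover described above (disable a compute resource for a configurable time, 10 minutes by default, then fall back to others) can be illustrated with a small sketch. This is not ParallelCluster's implementation: the class `CapacityCooldown`, its method names, and the resource names in the usage lines are all hypothetical, and only the 10-minute default comes from these release notes.

```python
import time
from typing import Dict, Iterable, Optional

# The release notes state a default of 10 minutes.
DEFAULT_COOLDOWN_SECONDS = 10 * 60


class CapacityCooldown:
    """Hypothetical tracker for compute resources disabled after a failed launch."""

    def __init__(self, cooldown_seconds: float = DEFAULT_COOLDOWN_SECONDS) -> None:
        self.cooldown_seconds = cooldown_seconds
        # Maps a resource name to the time at which it becomes usable again.
        self._disabled_until: Dict[str, float] = {}

    def mark_insufficient_capacity(self, resource: str, now: Optional[float] = None) -> None:
        """Disable a resource for the cooldown period, starting at `now`."""
        now = time.monotonic() if now is None else now
        self._disabled_until[resource] = now + self.cooldown_seconds

    def is_available(self, resource: str, now: Optional[float] = None) -> bool:
        """Return True if the resource was never disabled or its cooldown expired."""
        now = time.monotonic() if now is None else now
        return now >= self._disabled_until.get(resource, float("-inf"))

    def first_available(self, resources: Iterable[str], now: Optional[float] = None) -> Optional[str]:
        """Return the first resource not in cooldown, or None if all are disabled."""
        for resource in resources:
            if self.is_available(resource, now):
                return resource
        return None


if __name__ == "__main__":
    cooldown = CapacityCooldown()
    cooldown.mark_insufficient_capacity("compute-resource-a", now=0.0)
    # While "compute-resource-a" is cooling down, the next candidate is chosen.
    print(cooldown.first_available(["compute-resource-a", "compute-resource-b"], now=1.0))
```

Passing `now` explicitly keeps the sketch deterministic; when it is omitted, the monotonic clock is used so that wall-clock adjustments do not shorten or extend the cooldown.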

CHANGES

  • Upgrade EFA installer to version 1.17.2
    • EFA driver: efa-1.16.0-1
    • EFA configuration: efa-config-1.10-1
    • EFA profile: efa-profile-1.5-1
    • Libfabric: libfabric-aws-1.16.0~amzn2.0-1
    • RDMA core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-2
  • Upgrade NICE DCV to version 2022.0-12760.
  • Upgrade NVIDIA driver to version 470.129.06.
  • Upgrade NVIDIA Fabric Manager to version 470.129.06.
  • Change default EBS volume types from gp2 to gp3 for both the root and additional volumes.
  • Changes to FSx for Lustre file systems created by ParallelCluster:
    • Change the default deployment type to Scratch_2.
    • Change the Lustre server version to 2.12.
  • Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
  • Add parallelcluster:cluster-name tag to all the resources created by ParallelCluster.
  • Do not allow setting PlacementGroup/Id when PlacementGroup/Enabled is explicitly set to false.
  • Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
  • Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is true (the default).
  • With a custom AMI, use the AMI root volume size instead of the ParallelCluster default of 35 GiB. The value can be changed in the cluster configuration file.
  • Automatically disable the compute fleet when the configuration parameter Scheduling/SlurmQueues/ComputeResources/SpotPrice
    is lower than the minimum required Spot request fulfillment price.
  • Show requested_value and current_value values in the change set when adding or removing a section during an update.
  • Disable aws-ubuntu-eni-helper service in DLAMI to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
  • Remove support for Python 3.6.
  • Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
  • Remove the trailing dot when configuring the compute node FQDN.

BUG FIXES

  • Fix the default behavior to skip the ParallelCluster validation and test steps when building a custom AMI.
  • Fix file handle leak in computemgtd.
  • Fix race condition that was sporadically causing launched instances to be immediately terminated because they were not yet available in the EC2 DescribeInstances response.
  • Fix support for DisableSimultaneousMultithreading parameter on instance types with Arm processors.
  • Fix ParallelCluster API stack update failure when upgrading from a previous version. Add resource pattern used for the ListImagePipelineImages action in the EcrImageDeletionLambdaRole.
  • Fix ParallelCluster API adding missing permissions needed to import/export from S3 when creating an FSx for Lustre storage.

AWS ParallelCluster v2.11.7

13 May 16:46

We're excited to announce the release of AWS ParallelCluster 2.11.7

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.7

CHANGES

  • Upgrade Slurm to version 20.11.9.

AWS ParallelCluster v3.1.4

16 May 19:57

We're excited to announce the release of AWS ParallelCluster 3.1.4

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add validation for DirectoryService/PasswordSecretArn to fail in case the secret does not exist.

CHANGES

  • Upgrade Slurm to version 21.08.8-2.
  • Build Slurm with JWT support.
  • Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
  • Add lambda:TagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster creation and image creation.

BUG FIXES

  • Fix the ability to export cluster's logs when using export-cluster-logs command with the --filters option.
  • Fix AWS Batch Docker entrypoint to use /home shared directory to coordinate Multi-node-Parallel job execution.

AWS ParallelCluster v2.11.6

19 Apr 13:27

We're excited to announce the release of AWS ParallelCluster 2.11.6

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.6

ENHANCEMENTS

  • Improve exception management in case of missing networking.

CHANGES

  • OS package updates and security fixes.

AWS ParallelCluster v3.1.3

20 Apr 15:38

We're excited to announce the release of AWS ParallelCluster 3.1.3

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Execute SSH key creation alongside the creation of the HOME directory, i.e.
    during SSH login, when switching to another user and when executing a command as another user.
  • Add support for both FQDN and LDAP Distinguished Names in the configuration parameter DirectoryService/DomainName. The new validator now checks both syntaxes.
  • Add new update_directory_service_password.sh script, deployed on the head node, to support the manual update of the Active Directory password in the SSSD configuration.
    The password is retrieved from AWS Secrets Manager, as specified in the cluster configuration.
  • Add support to deploy API infrastructure in environments without a default VPC.
  • Add validation for DirectoryService/AdditionalSssdConfigs to fail in case of invalid overrides.

CHANGES

  • Disable deeper C-States in x86_64 official AMIs and AMIs created through build-image command, to guarantee high performance and low latency.
  • OS package updates and security fixes.
  • Change Amazon Linux 2 base images to use AMIs with Kernel 5.10.

BUG FIXES

  • Fix build-image stack ending in DELETE_FAILED after the image is built successfully, due to new EC2ImageBuilder policies.
  • Fix the configuration parameter DirectoryService/DomainAddr conversion to ldap_uri SSSD property when it contains multiple domain addresses.

AWS ParallelCluster v2.11.5

01 Mar 18:29

We're excited to announce the release of AWS ParallelCluster 2.11.5

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.5

ENHANCEMENTS

  • Add support for NEW_CHANGED_DELETED as value of FSx for Lustre AutoImportPolicy option.

CHANGES

  • Drop support for SGE and Torque schedulers.
  • Disable log4j-cve-2021-44228-hotpatch service on Amazon Linux to avoid potential performance degradation.
  • Upgrade Intel MPI Library to 2021.4.0.441.
  • Upgrade NVIDIA driver to version 470.103.01.
  • Upgrade CUDA library to version 11.4.4.
  • Upgrade NVIDIA Fabric manager to version 470.103.01.
  • Extend head node creation timeout to 1h.

BUG FIXES

  • Fix DCV connection through browsers.
  • Fix YAML quoting to prevent custom Tags being parsed as numbers.