Releases: aws/aws-parallelcluster-cookbook

AWS ParallelCluster v3.1.5

16 Nov 13:57
b96372a

We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.5

This is associated with AWS ParallelCluster v3.1.5

CHANGES

  • Upgrade EFA installer to 1.18.0
    • Efa-driver: efa-1.16.0-1
    • Efa-config: efa-config-1.11-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
    • Rdma-core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-2
  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.

BUG FIXES

  • Fix a Slurm issue that prevented the termination of idle nodes.

AWS ParallelCluster v2.11.8

15 Nov 01:36

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.8

This is associated with AWS ParallelCluster v2.11.8

CHANGES

  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade EFA installer to 1.19.0
    • Efa-driver: efa-1.16.0-1
    • Efa-config: efa-config-1.11-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.16.0-1
    • Rdma-core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-3

AWS ParallelCluster v3.3.0

02 Nov 15:06

We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.0

This is associated with AWS ParallelCluster v3.3.0

ENHANCEMENTS

  • Add support for Slurm Accounting.
  • Add support for adding and removing shared storage at cluster update.
  • Allow specifying multiple instance types for the same compute resource.
  • Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
  • Move NFS installation to build time to reduce configuration time.
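
The NFS thread formula above can be illustrated with a short Python sketch. The function name `nfs_thread_count` is hypothetical and is not taken from the cookbook; only the expression `min(256, max(8, num_cores * 4))` comes from the release note.

```python
def nfs_thread_count(num_cores: int) -> int:
    """Clamp num_cores * 4 to the range [8, 256], as stated in the release note.

    The function name is illustrative only; the cookbook may compute this elsewhere.
    """
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    # Small instances hit the lower bound, large ones hit the upper bound.
    for cores in (1, 2, 16, 96):
        print(f"{cores} cores -> {nfs_thread_count(cores)} NFS threads")
```

Under this formula, instances with one or two cores get the floor of 8 threads, and anything with 64 or more cores is capped at 256.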

CHANGES

  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.
  • Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
  • Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
  • Reduce the timeout from 50 to a maximum of 5 minutes in case of DynamoDB connection issues at compute node bootstrap.
  • Change the logic to number the routing tables when an instance has multiple NICs.
  • Upgrade Python from 3.7.13 to 3.9.15.
  • Upgrade Slurm to version 22.05.5.
  • Upgrade EFA installer to 1.18.0.
    • Efa-driver: efa-1.16.0-1
    • Efa-config: efa-config-1.11-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
    • Rdma-core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-2
  • Upgrade NICE DCV to version 2022.1-13300.
    • server: 2022.1.13300-1
    • xdcv: 2022.1.433-1
    • gl: 2022.1.973-1
    • web_viewer: 2022.1.13300-1
  • Upgrade third-party cookbook dependencies:
    • selinux-6.0.5 (from selinux-6.0.4)
    • nfs-5.0.0 (from nfs-2.6.4)

AWS ParallelCluster v3.2.1

03 Oct 09:00

We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.1

This is associated with AWS ParallelCluster v3.2.1

ENHANCEMENTS

  • Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.

CHANGES

  • Upgrade NVIDIA driver to version 470.141.03.
  • Upgrade NVIDIA Fabric Manager to version 470.141.03.
  • Pin cfn-bootstrap helper package version to 2.0-10
  • Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
  • Upgrade Intel MPI Library to 2021.6.0.602.
  • Upgrade Python from 3.7.10 to 3.7.13 in response to a security risk.

AWS ParallelCluster v3.2.0

27 Jul 17:49

We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.0

This is associated with AWS ParallelCluster v3.2.0

ENHANCEMENTS

  • Add support for multiple Elastic File Systems.
  • Add support for multiple FSx file systems.
  • Add support for attaching existing FSx for ONTAP and FSx for OpenZFS file systems.
  • Install NVIDIA GDRCopy 2.3 to enable low-latency GPU memory copy on supported instance types.
  • During cluster update, set the state of Slurm nodes according to the strategy set through the configuration parameter Scheduling/SchedulerSettings/QueueUpdateStrategy.
  • Add support for memory-based scheduling in Slurm.
    • Configure RealMemory on compute nodes by default as 95% of the EC2 memory.
    • Move SelectTypeParameters to slurm_parallelcluster.conf include file.
    • Move ConstrainRAMSpace to slurm_parallelcluster_cgroup.conf include file.
    • Add support for new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to configure memory-based scheduling in Slurm.
    • Add support for new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override default value of the memory seen by the scheduler on compute nodes.
  • Add support for rebooting compute nodes via Slurm.
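
As a rough illustration of the memory-based scheduling defaults described above, the following Python sketch computes the memory value a scheduler would see for a compute node. The function name, the MiB unit, and the round-down behavior are assumptions made for this example; the release note only states that RealMemory defaults to 95% of the EC2 memory and that SchedulableMemory overrides that default.

```python
from typing import Optional


def real_memory_mib(ec2_memory_mib: int,
                    schedulable_memory_mib: Optional[int] = None) -> int:
    """Return the memory (in MiB) exposed to the scheduler for one node.

    Hypothetical helper: an explicit SchedulableMemory value wins; otherwise
    fall back to 95% of the instance memory. Rounding down is an assumption.
    """
    if schedulable_memory_mib is not None:
        return schedulable_memory_mib
    # Integer arithmetic avoids floating-point rounding surprises.
    return ec2_memory_mib * 95 // 100


if __name__ == "__main__":
    print(real_memory_mib(16384))                                 # default: 95% of 16 GiB
    print(real_memory_mib(16384, schedulable_memory_mib=12000))   # explicit override
```

The 5% headroom in the default leaves memory for the operating system and daemons on the node; an override is useful when that headroom is too small or too large for a given workload.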

CHANGES

  • Restart clustermgtd and slurmctld daemons at cluster update time only when Scheduling parameters are updated in the cluster configuration.
  • Update slurmctld and slurmd systemd service files.
  • Upgrade NICE DCV to version 2022.0-12760.
  • Upgrade NVIDIA driver to version 470.129.06.
  • Upgrade NVIDIA Fabric Manager to version 470.129.06.
  • Upgrade EFA installer to version 1.17.2.
    • EFA driver: efa-1.16.0-1
    • EFA configuration: efa-config-1.10-1
    • EFA profile: efa-profile-1.5-1
    • Libfabric: libfabric-aws-1.16.0~amzn2.0-1
    • RDMA core: rdma-core-41.0-2
    • Open MPI: openmpi40-aws-4.1.4-2
  • Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is enabled.
  • Set Slurm configuration AuthInfo=cred_expire=70 to reduce the time requeued jobs must wait before starting again when nodes are not available.
  • Move SelectTypeParameters and ConstrainRAMSpace to the slurm_parallelcluster*.conf include files.
  • Upgrade third-party cookbook dependencies:
    • apt-7.4.2 (from apt-7.4.0)
    • line-4.5.2 (from line-4.0.1)
    • openssh-2.10.3 (from openssh-2.9.1)
    • pyenv-3.5.1 (from pyenv-3.4.2)
    • selinux-6.0.4 (from selinux-3.1.1)
    • yum-7.4.0 (from yum-6.1.1)
    • yum-epel-4.5.0 (from yum-epel-4.1.2)
  • Disable aws-ubuntu-eni-helper service, available in Deep Learning AMIs, to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
  • Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
  • Remove the trailing dot when configuring the compute node FQDN.

AWS ParallelCluster v3.1.4

16 May 19:57
73debc1

We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.4

This is associated with AWS ParallelCluster v3.1.4

CHANGES

  • Upgrade Slurm to version 21.08.8-2.

ENHANCEMENTS

  • Add support for enabling JWT authentication in Slurm.

AWS ParallelCluster v2.11.7

13 May 16:46

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.7

This is associated with AWS ParallelCluster v2.11.7

CHANGES

  • Upgrade Slurm to version 20.11.9.

AWS ParallelCluster v3.1.3

20 Apr 15:38

We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.3

This is associated with AWS ParallelCluster v3.1.3

ENHANCEMENTS

  • Execute SSH key creation alongside the creation of the HOME directory, i.e.
    during SSH login, when switching to another user, and when executing a command as another user.
  • Add support for both FQDN and LDAP Distinguished Names in the configuration parameter DirectoryService/DomainName. The new validator now checks both syntaxes.
  • New update_directory_service_password.sh script deployed on the head node supports the manual update of the Active Directory password in the SSSD configuration.
    The password is retrieved from AWS Secrets Manager, as specified in the cluster configuration.
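
To make the two accepted DirectoryService/DomainName syntaxes concrete, here is a minimal Python sketch of a check that accepts either an FQDN or an LDAP Distinguished Name. This is not the validator shipped with ParallelCluster: the function name, the regular expressions, and the restriction to DC-only distinguished names are all simplifying assumptions for illustration.

```python
import re
from typing import Optional

# One DNS label: alphanumeric at both ends, hyphens allowed inside (assumed rule).
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# FQDN: at least two dot-separated labels, e.g. corp.example.com
_FQDN_RE = re.compile(rf"^{_LABEL}(?:\.{_LABEL})+$")
# Simplified DN: DC components only, e.g. DC=corp,DC=example,DC=com
_DN_RE = re.compile(r"^DC=[A-Za-z0-9-]+(?:,\s*DC=[A-Za-z0-9-]+)*$", re.IGNORECASE)


def classify_domain_name(value: str) -> Optional[str]:
    """Return "fqdn", "dn", or None if the value matches neither syntax."""
    if _FQDN_RE.match(value):
        return "fqdn"
    if _DN_RE.match(value):
        return "dn"
    return None


if __name__ == "__main__":
    for candidate in ("corp.example.com", "DC=corp,DC=example,DC=com", "not a domain"):
        print(candidate, "->", classify_domain_name(candidate))
```

A real validator would likely accept more attribute types than DC and handle escaping rules for distinguished names; the sketch only shows how one parameter can be checked against two syntaxes in turn.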

CHANGES

  • Disable deeper C-States in x86_64 official AMIs and AMIs created through build-image command, to guarantee high performance and low latency.

BUG FIXES

  • Fix the conversion of the configuration parameter DirectoryService/DomainAddr to the ldap_uri SSSD property when it contains multiple domain addresses.

AWS ParallelCluster v2.11.6

19 Apr 13:27

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.6

This is associated with AWS ParallelCluster v2.11.6

CHANGES

  • OS package updates and security fixes.

AWS ParallelCluster v3.1.2

02 Mar 14:41

We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.2

This is associated with AWS ParallelCluster v3.1.2

BUG FIXES

  • Fix the update of the /etc/hosts file on compute nodes when a cluster is deployed in subnets without internet access.
  • Fix compute node bootstrap by waiting for ephemeral drive initialization before joining the cluster.

CHANGES

  • Upgrade Slurm to version 21.08.6.