Skip to content

AWS ParallelCluster v2.6.1

Compare
Choose a tag to compare
@tilne tilne released this 09 Apr 22:58
· 3 commits to release-2.6 since this release

We're excited to announce the release of AWS ParallelCluster 2.6.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Improved management of S3 bucket that gets created when awsbatch scheduler is selected.
  • Add validation for supported OSes when using FSx Lustre.
  • Change ProctrackType from proctrack/gpid to proctrack/cgroup in Slurm in order to better handle termination of stray processes when running MPI applications. This also includes the creation of a cgroup Slurm configuration in in order to enable the cgroup plugin.
  • Skip execution, at node bootstrap time, of all those install recipes that are already applied at AMI creation time.
  • Start CloudWatch agent earlier in the node bootstrapping phase so that cookbook execution failures are correctly uploaded and are available for troubleshooting.
  • Improved the management of SQS messages and retries to speed-up recovery times when failures occur.

CHANGES

  • FSx Lustre: remove x-systemd.requires=lnet.service from mount options in order to rely on default lnet setup provided by Lustre.
  • Enforce Packer version to be >= 1.4.0 when building an AMI. This is also required for customers using pcluster createami command.
  • Do not launch a replacement for an unhealthy or unresponsive node until this is terminated. This makes cluster slower at provisioning new nodes when failures occur but prevents any temporary over-scaling with respect to the expected capacity.
  • Increase parallelism when starting slurmd on compute nodes that join the cluster from 10 to 30.
  • Reduce the verbosity of messages logged by the node daemons.
  • Do not dump logs to /home/logs when nodewatcher encounters a failure and terminates the node. CloudWatch can be used to debug such failures.
  • Reduce the number of retries for failed REMOVE events in sqswatcher.
  • Omit cfn-init-cmd and cfn-wire from the files stored in CloudWatch logs.

BUG FIXES

  • Configure proxy during cloud-init boothook in order for the proxy to be configured for all bootstrap actions.
  • Fix installation of Intel Parallel Studio XE Runtime that requires yum4 since version 2019.5.
  • Fix compilation of Torque scheduler on Ubuntu 18.04.
  • Fixed a bug in the ordering and retrying of SQS messages that was causing, under certain circumstances of heavy load, the scheduler configuration to be left in an inconsistent state.
  • Delete from queue the REMOVE events that are discarded due to hostname collision with another event fetched as part of the same sqswatcher iteration.

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192