Merge branch 'develop' into fix/cli-image-builder-document
dreambeyondorange authored Dec 2, 2024
2 parents cf147f0 + 1360e74 commit f29d2e0
Showing 103 changed files with 1,485 additions and 1,056 deletions.
3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report_2.md
@@ -10,7 +10,8 @@ assignees: ''
If you have an active AWS support contract, please open a case with AWS Premium Support team using the below documentation to report the issue:
https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

Before submitting a new issue, please search through open [GitHub Issues](https://github.com/aws/aws-parallelcluster/issues) and check out the [troubleshooting documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting.html).
Before submitting a new issue, please search through [GitHub Issues](https://github.com/aws/aws-parallelcluster/issues?q=is%3Aissue),
[GitHub Wiki](https://github.com/aws/aws-parallelcluster/wiki) and check out the [troubleshooting documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting.html).

Please make sure to add the following data in order to facilitate the root cause detection.

3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report_3.md
@@ -10,7 +10,8 @@ assignees: ''
If you have an active AWS support contract, please open a case with AWS Premium Support team using the below documentation to report the issue:
https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

Before submitting a new issue, please search through open [GitHub Issues](https://github.com/aws/aws-parallelcluster/issues) and check out the [troubleshooting documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting-v3.html).
Before submitting a new issue, please search through [GitHub Issues](https://github.com/aws/aws-parallelcluster/issues?q=is%3Aissue),
[GitHub Wiki](https://github.com/aws/aws-parallelcluster/wiki) and check out the [troubleshooting documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting.html).

Please make sure to add the following data in order to facilitate the root cause detection.

20 changes: 0 additions & 20 deletions .github/workflows/ci.yml
@@ -38,8 +38,6 @@ jobs:
matrix:
os: [ubuntu-latest]
name:
- Python 3.7 Tests
- Python 3.8 Tests
- Python 3.9 Tests
- Python 3.10 Tests
- Python 3.11 Tests
@@ -50,14 +48,6 @@
- API CloudFormation Templates Checks
- Integration Tests Config Checks
include:
- name: Python 3.7 Tests
python: 3.7
toxdir: cli
toxenv: py37-nocov
- name: Python 3.8 Tests
python: 3.8
toxdir: cli
toxenv: py38-nocov
- name: Python 3.9 Tests
python: 3.9
toxdir: cli
@@ -119,23 +109,13 @@ jobs:
matrix:
os: [ubuntu-latest]
name:
- Python 3.7 AWS Batch CLI Tests
- Python 3.8 AWS Batch CLI Tests
- Python 3.9 AWS Batch CLI Tests
- Python 3.10 AWS Batch CLI Tests
- Python 3.11 AWS Batch CLI Tests
- Python 3.12 AWS Batch CLI Tests
- Python 3.10 AWS Batch CLI Tests Coverage
- Code Checks AWS Batch CLI
include:
- name: Python 3.7 AWS Batch CLI Tests
python: 3.7
toxdir: awsbatch-cli
toxenv: py37-nocov
- name: Python 3.8 AWS Batch CLI Tests
python: 3.8
toxdir: awsbatch-cli
toxenv: py38-nocov
- name: Python 3.9 AWS Batch CLI Tests
python: 3.9
toxdir: awsbatch-cli
15 changes: 15 additions & 0 deletions .github/workflows/unsafe_patterns_checker.yml
@@ -0,0 +1,15 @@
name: Unsafe Patterns Checker
on:
pull_request:
types: [opened, synchronize, reopened, ready_for_review, labeled, unlabeled]

jobs:
# Prevent bad URL suffix
bad-url-suffix-check:
runs-on: ubuntu-latest
steps:
- name: Check PR for Disallowed URL Suffixes
uses: francesco-giordano/[email protected]
with:
diffDoesNotContainRegex: "amazonaws\\.com|amazonaws\\.com\\.cn|c2s\\.ic\\.gov|sc2s\\.sgov\\.gov"
skipLabels: skip-bad-url-suffix-check
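The workflow above fails a pull request whose diff introduces a partition-specific endpoint suffix. A minimal sketch of the same match in Python (the pattern is copied from `diffDoesNotContainRegex` with the YAML escaping removed; the helper name is illustrative, not part of the action):

```python
import re

# Same pattern as the workflow's diffDoesNotContainRegex, after YAML unescaping.
DISALLOWED_URL_SUFFIXES = re.compile(
    r"amazonaws\.com|amazonaws\.com\.cn|c2s\.ic\.gov|sc2s\.sgov\.gov"
)

def diff_is_clean(diff_text: str) -> bool:
    """Return True when the diff text contains none of the disallowed URL suffixes."""
    return DISALLOWED_URL_SUFFIXES.search(diff_text) is None
```

Hardcoded suffixes like these break in other AWS partitions, which is why the check rejects them; per the workflow, adding the `skip-bad-url-suffix-check` label bypasses it.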
31 changes: 23 additions & 8 deletions CHANGELOG.md
@@ -3,13 +3,23 @@ CHANGELOG
3.12.0
------

**BUG FIXES**
- When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003

3.12.0
------
**ENHANCEMENTS**
- Add new build image configuration section `Build/Installation` to turn on/off the installation of Nvidia software and the Lustre client. By default, `build-image` does not install Nvidia software (although it is included in official ParallelCluster AMIs), while the Lustre client is installed.

**CHANGES**
- The CLI commands `export-cluster-logs` and `export-image-logs` now export logs by default to the default ParallelCluster bucket, or to the `CustomS3Bucket` if one is specified in the config.
- Upgrade Amazon DCV to version `2024.0-18131`.
- server: `2024.0-18131-1`
- xdcv: `2024.0.631-1`
- gl: `2024.0.1078-1`
- web_viewer: `2024.0-18131-1`
- Upgrade mysql-community-client to version 8.0.39.
- Remove support for Python 3.7 and 3.8, which have reached end of life.

**BUG FIXES**
- When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003.
- Fix an issue where changes in sequence of custom actions scripts were not detected during cluster updates.
- Add missing permissions for the ParallelCluster API to create the service-linked roles for Elastic Load Balancing and Auto Scaling that are required to deploy login nodes.

3.11.1
------
@@ -23,7 +33,9 @@ CHANGELOG
**BUG FIXES**
- Fix an issue in the way we configure the Pyxis Slurm plugin in ParallelCluster that can lead to job submission failures.
https://github.com/aws/aws-parallelcluster/issues/6459
- Add missing permissions required by login nodes to the public template of policies.
- Fix an issue that caused deployments with login nodes to fail,
by adding the missing permissions required by login nodes to the public template of policies.
https://github.com/aws/aws-parallelcluster/issues/6483

3.11.0
------
@@ -39,6 +51,9 @@ CHANGELOG
- Install enroot and pyxis in official pcluster AMIs

**CHANGES**
- *[BREAKING]* The `loginNodes` field returned by the API `DescribeCluster` and the CLI command `describe-cluster`
has been changed from a dictionary to an array to support multiple pools of login nodes.
This change breaks backward compatibility, making these operations incompatible with clusters deployed with older versions.
- Upgrade Slurm to 23.11.10 (from 23.11.7).
- Upgrade Pmix to 5.0.3 (from 5.0.2).
- Upgrade EFA installer to `1.34.0`.
@@ -111,7 +126,7 @@ CHANGELOG
`IMPORT_*`, `REVIEW_IN_PROGRESS` and `UPDATE_FAILED`.
- Fix an issue that prevented cluster updates from including EFS filesystems with encryption in transit.
- Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when
EFS is used for shared internal data.
EFS is used for shared internal data.
- On Ubuntu systems, remove default logrotate configuration for cloud-init log files that clashed with the
configuration coming from Parallelcluster.
- Fix image build failure with RHEL 8.10 or newer.
@@ -123,7 +138,7 @@ CHANGELOG
- Add support for FSx Lustre as a shared storage type in us-iso-east-1.

**BUG FIXES**
- Remove `cloud_dns` from the `SlurmctldParameters` in the Slurm config to avoid Slurm fanout issues.
- Remove `cloud_dns` from the `SlurmctldParameters` in the Slurm config to avoid Slurm fanout issues.
This is also not required since we set the IP addresses on instance launch.

3.9.2
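The breaking `loginNodes` change noted under the 3.11.0 CHANGES above turns a single dictionary into an array, one entry per login-node pool. A hedged sketch of a client-side shim that tolerates both shapes (the payload contents here are invented for illustration, not taken from the API spec):

```python
def login_node_pools(describe_cluster_response: dict) -> list:
    """Normalize `loginNodes` so callers handle both the old (dict) and new (array) shapes."""
    nodes = describe_cluster_response.get("loginNodes")
    if nodes is None:
        return []
    # Pre-change responses returned a single dict; wrap it in a one-element list.
    if isinstance(nodes, dict):
        return [nodes]
    return nodes

# Hypothetical payloads, before and after the change.
old_style = {"loginNodes": {"status": "active"}}
new_style = {"loginNodes": [{"status": "active"}, {"status": "pending"}]}
```

A shim like this is only needed by clients that must talk to clusters deployed with both older and newer ParallelCluster versions.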
12 changes: 12 additions & 0 deletions api/README.md
@@ -99,3 +99,15 @@ through unit tests and integration tests that exercise the operations.
To test the API specifically, there are integration tests that deploy the API and test the functionality using
the generated client.

### Invoking the API

Install requirements for the example:
```
pip install -r client/requirements.txt
```

Invoke a deployed ParallelCluster API:
```
python client/example.py --region [REGION] --stack-name [PCAPI_STACK_NAME]
```

37 changes: 28 additions & 9 deletions api/client/example.py
@@ -13,30 +13,49 @@
# language governing permissions and limitations under the License.

import boto3
import click
from pcluster_client.api import cluster_operations_api
from pcluster_client import Configuration, ApiClient, ApiException

apigateway = boto3.client("apigateway")


def request():
@click.command()
@click.option("--stack-name", help="ParallelCluster API stack name")
@click.option("--region", help="AWS region")
def request(stack_name: str, region: str):
"""Makes a simple request to the API Gateway"""
apis = apigateway.get_rest_apis()["items"]
api_id = next(api["id"] for api in apis if api["name"] == "ParallelCluster")
region = boto3.session.Session().region_name
host = f"{api_id}.execute-api.{region}.amazonaws.com"
configuration = Configuration(host=f"https://{host}/prod")
invoke_url = describe_stack_output(region, stack_name, "ParallelClusterApiInvokeUrl")
configuration = Configuration(host=invoke_url)

with ApiClient(configuration) as api_client:
client = cluster_operations_api.ClusterOperationsApi(api_client)
region_filter = region

try:
response = client.list_clusters(region=region_filter)
print("clusters: ", [c["cluster_name"] for c in response["clusters"]])
print("Response: ", response)
except ApiException as ex:
print("Exception when calling ClusterOperationsApi->list_clusters: %s\n" % ex)


def describe_stack_output(region: str, stack_name: str, output_name: str):
try:
# Describe stack
cloudformation = boto3.client("cloudformation", region_name=region)
response = cloudformation.describe_stacks(StackName=stack_name)

# Get the stack details
stacks = response.get("Stacks", [])
if not stacks:
print(f"No stacks found with the name: {stack_name}")
return None

# Extract the requested output value
outputs = stacks[0].get("Outputs", [])
return next(o["OutputValue"] for o in outputs if o["OutputKey"] == output_name)

except Exception as e:
print(f"Cannot describe output '{output_name}' for stack '{stack_name}': {e}")
return None

if __name__ == "__main__":
request()
2 changes: 2 additions & 0 deletions api/client/requirements.txt
@@ -0,0 +1,2 @@
boto3>=1.16.14
click~=8.1.7
17 changes: 16 additions & 1 deletion api/infrastructure/deploy-api.sh
@@ -7,14 +7,15 @@
# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
# limitations under the License.

usage="$(basename "$0") [-h] --s3-bucket bucket-name --region aws-region [--stack-name name] [--enable-iam-admin true|false] [--create-api-user true|false])"
usage="$(basename "$0") [-h] --s3-bucket bucket-name --region aws-region [--stack-name name] [--enable-iam-admin true|false] [--create-api-user true|false] [--lambda-layer abs_path]"

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

S3_BUCKET=
STACK_NAME="ParallelClusterApi"
ENABLE_IAM_ADMIN="true"
CREATE_API_USER="false"
LAMBDA_LAYER=
while [[ $# -gt 0 ]]
do
key="$1"
@@ -59,6 +60,11 @@ case $key in
shift # past argument
shift # past value
;;
--lambda-layer)
export LAMBDA_LAYER=$2
shift # past argument
shift # past value
;;
*) # unknown option
echo "$usage" >&2
exit 1
@@ -71,6 +77,8 @@ if [ -z "${S3_BUCKET}" ] || [ -z "${AWS_DEFAULT_REGION}" ] ; then
exit 1
fi

PC_VERSION=$(yq ".Mappings.ParallelCluster.Constants.Version" "${SCRIPT_DIR}/parallelcluster-api.yaml")

S3_UPLOAD_URI="s3://${S3_BUCKET}/api/ParallelCluster.openapi.yaml"
POLICIES_S3_URI="s3://${S3_BUCKET}/stacks/parallelcluster-policies.yaml"
POLICIES_TEMPLATE_URI="http://${S3_BUCKET}.s3.${AWS_DEFAULT_REGION}.amazonaws.com/stacks/parallelcluster-policies.yaml"
@@ -81,6 +89,12 @@ aws s3 cp "${SCRIPT_DIR}/../spec/openapi/ParallelCluster.openapi.yaml" "${S3_UPL
echo "Publishing policies CloudFormation stack to S3"
aws s3 cp "${SCRIPT_DIR}/../../cloudformation/policies/parallelcluster-policies.yaml" "${POLICIES_S3_URI}"

if [ -n "${LAMBDA_LAYER}" ]; then
LAMBDA_LAYER_S3_URI="s3://${S3_BUCKET}/parallelcluster/${PC_VERSION}/layers/aws-parallelcluster/lambda-layer.zip"
echo "Publishing Lambda Layer for version ${PC_VERSION} to S3"
aws s3 cp "${LAMBDA_LAYER}" "${LAMBDA_LAYER_S3_URI}"
fi

echo "Deploying API template"
aws cloudformation deploy \
--stack-name "${STACK_NAME}" \
@@ -90,4 +104,5 @@ aws cloudformation deploy \
--parameter-overrides ApiDefinitionS3Uri="${S3_UPLOAD_URI}" \
PoliciesTemplateUri="${POLICIES_TEMPLATE_URI}" \
EnableIamAdminAccess="${ENABLE_IAM_ADMIN}" CreateApiUserRole="${CREATE_API_USER}" \
"$([[ -n "${LAMBDA_LAYER}" ]] && echo "CustomBucket=${S3_BUCKET}" || echo " ")" \
--capabilities CAPABILITY_NAMED_IAM
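The new `--lambda-layer` flag above plugs into the script's existing `while`/`case` argument loop. A self-contained sketch of that parsing pattern (simplified: only the new flag is modeled, and the function name is illustrative):

```shell
#!/usr/bin/env bash
# Minimal version of deploy-api.sh's flag parsing, showing the new --lambda-layer flag.
parse_lambda_layer() {
  local lambda_layer=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --lambda-layer)
        lambda_layer="$2"
        shift 2   # past argument and value
        ;;
      *)
        shift     # skip flags not modeled in this sketch
        ;;
    esac
  done
  echo "${lambda_layer}"
}

parse_lambda_layer --s3-bucket my-bucket --lambda-layer /tmp/lambda-layer.zip
```

In the real script the value is exported and, when non-empty, the layer zip is uploaded to `s3://<bucket>/parallelcluster/<version>/layers/aws-parallelcluster/lambda-layer.zip` before the CloudFormation deploy.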
2 changes: 1 addition & 1 deletion awsbatch-cli/tox.ini
@@ -133,7 +133,7 @@ commands =
[testenv:pylint]
basepython = python3
deps =
setuptools<70.0.0
setuptools
pyflakes
pylint
commands =
4 changes: 1 addition & 3 deletions cli/setup.py
@@ -67,7 +67,7 @@ def readme():
license="Apache License 2.0",
package_dir={"": "src"},
packages=find_namespace_packages("src"),
python_requires=">=3.7",
python_requires=">=3.9",
install_requires=REQUIRES,
extras_require={
"awslambda": LAMBDA_REQUIRES,
@@ -86,8 +86,6 @@ def readme():
"Environment :: Console",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
23 changes: 19 additions & 4 deletions cli/src/pcluster/cli/commands/cluster_logs.py
@@ -15,6 +15,7 @@

from pcluster import utils
from pcluster.cli.commands.common import CliCommand, ExportLogsCommand
from pcluster.constants import PCLUSTER_BUCKET_PROTECTED_PREFIX
from pcluster.models.cluster import Cluster

LOGGER = logging.getLogger(__name__)
@@ -38,14 +39,18 @@ def register_command_args(self, parser: ArgumentParser) -> None:  # noqa: D102
# Export options
parser.add_argument(
"--bucket",
required=True,
help="S3 bucket to export cluster logs data to. It must be in the same region of the cluster",
help=(
"S3 bucket to export cluster logs data to. It must be in the same region as the cluster. "
"If not specified, the ParallelCluster default bucket or "
"the CustomS3Bucket (if specified in the config) will be used."
),
)
# Export options
parser.add_argument(
"--bucket-prefix",
help="Keypath under which exported logs data will be stored in s3 bucket. Defaults to "
"<cluster_name>-logs-<current time in the format of yyyyMMddHHmm>",
"<cluster_name>-logs-<current time in the format of yyyyMMddHHmm>. If the `--bucket` option is not specified, "
f"logs cannot be exported to {PCLUSTER_BUCKET_PROTECTED_PREFIX} as it is a protected folder.",
)
super()._register_common_command_args(parser)
# Filters
@@ -65,6 +70,15 @@ def execute(self, args: Namespace, extra_args: List[str]) -> None:  # noqa: D102
try:
if args.output_file:
self._validate_output_file_path(args.output_file)

is_pcluster_bucket = not args.bucket

if is_pcluster_bucket:
# Validate the bucket prefix for both the default pcluster bucket
# and CustomS3Bucket if specified in the cluster configuration.
# Skip validation if a bucket is specified via the --bucket CLI argument.
self._validate_bucket_prefix(args.bucket_prefix)

return self._export_cluster_logs(args, args.output_file)
except Exception as e:
utils.error(f"Unable to export cluster's logs.\n{e}")
Expand All @@ -76,7 +90,8 @@ def _export_cluster_logs(args: Namespace, output_file: str = None):
LOGGER.debug("Beginning export of logs for the cluster: %s", args.cluster_name)
cluster = Cluster(args.cluster_name)
url = cluster.export_logs(
bucket=args.bucket,
# cluster.bucket will handle the bucket init including policy update
bucket=args.bucket if args.bucket else cluster.bucket.name,
bucket_prefix=args.bucket_prefix,
keep_s3_objects=args.keep_s3_objects,
start_time=args.start_time,
13 changes: 13 additions & 0 deletions cli/src/pcluster/cli/commands/common.py
@@ -15,6 +15,7 @@

from pcluster import utils
from pcluster.cli.exceptions import ParameterException
from pcluster.constants import PCLUSTER_BUCKET_PROTECTED_FOLDER, PCLUSTER_BUCKET_PROTECTED_PREFIX
from pcluster.utils import to_utc_datetime

LOGGER = logging.getLogger(__name__)
@@ -140,3 +141,15 @@ def _validate_output_file_path(file_path: str):
utils.error(f"Failed to create parent directory {file_dir} for file {file_path}. Reason: {e}")
if not os.access(file_dir, os.W_OK):
utils.error(f"Cannot write file: {file_path}. {file_dir} is not writeable.")

@staticmethod
def _validate_bucket_prefix(bucket_prefix: str) -> None:
if bucket_prefix:
if (
bucket_prefix.startswith(PCLUSTER_BUCKET_PROTECTED_PREFIX)
or bucket_prefix == PCLUSTER_BUCKET_PROTECTED_FOLDER
):
raise ValueError(
f"Cannot export logs to {bucket_prefix} as it is within the protected folder "
f"{PCLUSTER_BUCKET_PROTECTED_PREFIX}. Please use another folder."
)
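The `_validate_bucket_prefix` check added above can be exercised in isolation. A small sketch with assumed constant values (the real values live in `pcluster.constants` and are not shown in this diff):

```python
# Assumed values for illustration; the diff imports these from pcluster.constants.
PCLUSTER_BUCKET_PROTECTED_FOLDER = "parallelcluster"
PCLUSTER_BUCKET_PROTECTED_PREFIX = "parallelcluster/"

def is_protected_prefix(bucket_prefix: str) -> bool:
    """True when exporting to bucket_prefix would write inside the protected folder."""
    if not bucket_prefix:
        return False
    return (
        bucket_prefix.startswith(PCLUSTER_BUCKET_PROTECTED_PREFIX)
        or bucket_prefix == PCLUSTER_BUCKET_PROTECTED_FOLDER
    )
```

Per the `cluster_logs.py` change above, this validation only runs when no `--bucket` is given, i.e. when logs would land in the ParallelCluster-managed bucket.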