diff --git a/.gitmodules b/.gitmodules deleted file mode 100644 index 387c08bc..00000000 --- a/.gitmodules +++ /dev/null @@ -1,3 +0,0 @@ -[submodule "playbooks/k8s/kubespray"] - path = playbooks/k8s/kubespray - url = https://github.com/kubernetes-sigs/kubespray.git diff --git a/Makefile b/Makefile index 03bd50c1..e8a03720 100644 --- a/Makefile +++ b/Makefile @@ -3,7 +3,7 @@ ARCH ?= 'icx' NIC ?= 'cvl' MIRRORS ?= false PLAYBOOKS_DIRS = playbooks playbooks/infra playbooks/intel -PLAYBOOK_NAMES = access basic full_nfv on_prem regional_dc remote_fp storage build_your_own +PLAYBOOK_NAMES = access basic full_nfv on_prem on_prem_vss on_prem_sw_defined_factory regional_dc remote_fp build_your_own # set default target available with simple 'make' command .DEFAULT_GOAL := examples diff --git a/Pipfile b/Pipfile index 083822a4..43b67861 100644 --- a/Pipfile +++ b/Pipfile @@ -6,7 +6,7 @@ name = "pypi" [packages] ansible = "~=5.7.1" "ansible-core" = "~=2.12" -cryptography = "~=39.0" +cryptography = "~=41.0" jinja2 = "~=3.1" netaddr = "~=0.7.19" pbr = "~=5.4" diff --git a/README.md b/README.md index b84127f2..e3ee81f4 100644 --- a/README.md +++ b/README.md @@ -8,13 +8,7 @@ The software provided here is for reference only and not intended for production **_NOTE:_** Instruction provided bellow are prepared for deployment done under root user by default. If you want to do deployment under non-root user then read [this](docs/rootless_deployment.md) file first and then continue with following steps under that non-root user. -1. Initialize git submodules to download Kubespray code. - - ```bash - git submodule update --init - ``` - -2. Decide which configuration profile you want to use and export environmental variable. +1. Decide which configuration profile you want to use and export environmental variable. > **_NOTE:_** It will be used only to ease execution of the steps listed below. 
- For **Kubernetes Basic Infrastructure** deployment: @@ -46,13 +40,25 @@ The software provided here is for reference only and not intended for production export PROFILE=on_prem ``` + - For **Kubernetes Infrastructure On Customer Premises for VSS** deployment: + + ```bash + export PROFILE=on_prem_vss + ``` + + - For **Kubernetes Infrastructure On Customer Premises for SW-Defined Factory** deployment: + + ```bash + export PROFILE=on_prem_sw_defined_factory + ``` + - For **Kubernetes Build-Your-Own Infrastructure** deployment: ```bash export PROFILE=build_your_own ``` -3. Install dependencies using one of the following methods +2. Install Python dependencies using one of the following methods a) Non-invasive virtual environment using pipenv @@ -79,12 +85,18 @@ The software provided here is for reference only and not intended for production pip3 install -r requirements.txt ``` +3. Install Ansible collection dependencies with the following command: + + ```bash + ansible-galaxy install -r collections/requirements.yml + ``` + 4. Generate example host_vars, group_vars and inventory files for Intel Container Experience Kits profiles. > **_NOTE:_** It is **highly recommended** to read [this](docs/generate_profiles.md) file before profiles generation. ```bash - make examples ARCH= NIC= + make examples ARCH= NIC= ``` 5. Copy example inventory file to the project root dir. @@ -145,7 +157,7 @@ The software provided here is for reference only and not intended for production Needed details are at least dataplane_interfaces For more details see [VM case configuration guide](docs/vm_config_guide.md) -9. **Required:** Apply bug fix patch for Kubespray submodule (for RHEL 8+). +9. **Mandatory:** Apply the patch for the Kubespray collection. ```bash ansible-playbook -i inventory.ini playbooks/k8s/patch_kubespray.yml @@ -154,7 +166,9 @@ The software provided here is for reference only and not intended for production 10. Execute `ansible-playbook`. 
> **_NOTE:_** For Cloud case this step is not used. See the [cloud/](cloud/) directory for more details - + + > **_NOTE:_** It is recommended to pass "--flush-cache" (e.g. "ansible-playbook -i inventory.ini --flush-cache playbooks/remote_fp.yml") when executing ansible-playbook, to clear cached facts and avoid issues such as skipped tasks or roles, or inventory details from a previous run not being updated. + ```bash ansible-playbook -i inventory.ini playbooks/${PROFILE}.yml ``` @@ -178,6 +192,7 @@ Refer to the documentation linked below to see configuration details for selecte - [VM multinode setup guide](docs/vm_multinode_setup_guide.md) - [VM cluster expansion guide](docs/vm_cluster_expansion_guide.md) - [Non-root deployment guide](docs/rootless_deployment.md) + ## Prerequisites and Requirements - Required packages on the target servers: **Python3**. diff --git a/action_plugins/cpupin.py b/action_plugins/cpupin.py index 36add07b..b9712aec 100644 --- a/action_plugins/cpupin.py +++ b/action_plugins/cpupin.py @@ -17,6 +17,7 @@ # Make coding more python3-ish, this is required for contributions to Ansible from __future__ import (absolute_import, division, print_function) + __metaclass__ = type import re @@ -37,6 +38,7 @@ class ActionModule(ActionBase): """cpupin action plugin implementation""" + def __init__(self, task, connection, play_context, loader, templar, shared_loader_obj): super().__init__(task, connection, play_context, loader, templar, shared_loader_obj) # CPUs allocated for host OS @@ -165,7 +167,7 @@ def run(self, tmp=None, task_vars=None): if not self.pinning and self.alloc_all and int(self.number) != 0: msg = "You have to set parameter 'cpu_total:' to '0' when 'alloc_all: true' is used" - if not self.pinning and self.alloc_all and ( self.cpus or self.numa ): + if not self.pinning and self.alloc_all and (self.cpus or self.numa): msg = "'cpus' and 'numa' can't be used with 'alloc_all: true'" if self.pinning and not self.alloc_all and (not self.cpus or not self.numa): @@ 
-179,8 +181,8 @@ def run(self, tmp=None, task_vars=None): if self.pinning and self.alloc_all and (not self.cpus or self.numa): msg = ("When using parameters pinning=true and alloc_all=true, 'numa' parameter is None" - ", 'cpus' parameter have to be prepared in advance e.g.: via running module with " - "pinning=false") + ", 'cpus' parameter have to be prepared in advance e.g.: via running module with " + "pinning=false") if msg: raise AnsibleActionFail(msg) @@ -475,7 +477,6 @@ def _allocate_all_cpus(self, task_vars): self.cpu_list.sort() return task_vars - def _allocate_cpus(self, task_vars): """ Allocate required number of CPUs @@ -485,7 +486,7 @@ def _allocate_cpus(self, task_vars): # Select random NUMA if not self.numa: - self.numa = random.choice(self.numa_nodes) # nosec B311 # pseudo random is not used for security purposes + self.numa = random.choice(self.numa_nodes) # nosec B311 # pseudo random is not used for security purposes if not self.cpus: self.cpu_list = self._select_cpus(task_vars['numa_nodes_cpus'], self.number, self.numa) @@ -622,7 +623,7 @@ def _pin_cpus(self): f"{to_native(emupin_result['stderr'].strip())}'") if not self.alloc_all: - # Update VM NUMA alignment + # Update VM NUMA alignment cmd_numa = f"virsh numatune {self.name} --nodeset {self.numa} --live --config" numa_result = self._low_level_execute_command(cmd_numa) if numa_result['rc'] != 0: diff --git a/ansible.cfg b/ansible.cfg index 9bdaa235..3334e39d 100644 --- a/ansible.cfg +++ b/ansible.cfg @@ -16,6 +16,8 @@ fact_caching_timeout = 7200 action_plugins = ./action_plugins:~/.ansible/plugins/action:/usr/share/ansible/plugins/action library = ./library +roles_path = roles +collections_path = ./collections log_path = ./.ansible_last_run.log display_args_to_stdout = False diff --git a/cloud/README.md b/cloud/README.md index 284c0fc0..e40d1ffc 100644 --- a/cloud/README.md +++ b/cloud/README.md @@ -14,13 +14,11 @@ Cloud RA allows for deploying Intel Container Experience Kits on managed Kuberne 
- Python 3.8+ -- Azure CLI 2.46.0+ ([Install Guide](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt)) +- Azure CLI 2.50.0+ ([Install Guide](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt)) -- Azure CLI "aks-preview" extesnion ([Install Guide](https://learn.microsoft.com/en-us/cli/azure/azure-cli-extensions-overview)). Needed for enabling SGX. +- AWS CLI 2.12.7+ ([Install Guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)) -- AWS CLI 2.11.0+ ([Install Guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)) - -- Terraform 1.3.9+ +- Terraform 1.5.2+ - Docker 20.10.17+ @@ -83,8 +81,8 @@ azureConfig: sg_whitelist_cidr_blocks: [] enable_proximity_placement: true aks: - kubernetes_version: "1.25" - cni: "kubenet" # Possible values are: kubenet, cilium, cilium-ebpf + kubernetes_version: "1.26" + cni: "kubenet" # Possible values are: kubenet, cilium enable_sgx: false # Requires DCsv series instances in one of node pools default_node_pool: subnet_name: "subnet_a" @@ -122,7 +120,7 @@ awsConfig: sg_whitelist_cidr_blocks: [] ecr_repositories: [] eks: - kubernetes_version: "1.24" + kubernetes_version: "1.26" subnets: ["subnet_a", "subnet_b"] custom_ami: "ubuntu" # Comment out this line to use Amazon Linux 2 OS node_groups: diff --git a/cloud/cwdf.py b/cloud/cwdf.py index a767b5b0..2cc389c8 100644 --- a/cloud/cwdf.py +++ b/cloud/cwdf.py @@ -1,8 +1,10 @@ -import click -from cwdf import compose_terraform, compose_cloudcli import os + +import click import paramiko +from cwdf_util import compose_terraform, compose_cloudcli + @click.group() def cli(): @@ -45,6 +47,7 @@ def generate_terraform(cwdf_config, ssh_public_key, generate_keys, job_id, creat tf_manifest = compose_terraform(cwdf_config, job_id, ssh_public_key, create_ansible_host, create_container_registry) click.echo(tf_manifest) + @click.command() @click.option('--deployment_dir', help='Path to 
deployment directory', required=True) @click.option('--cwdf_config', help='Path to CWDF yaml config file', required=True) diff --git a/cloud/cwdf/templates/terraform/aws/common.tf.jinja b/cloud/cwdf/templates/terraform/aws/common.tf.jinja deleted file mode 100644 index 3991b63f..00000000 --- a/cloud/cwdf/templates/terraform/aws/common.tf.jinja +++ /dev/null @@ -1,122 +0,0 @@ -resource "aws_vpc" "default" { - cidr_block = "{{ vpc_cidr_block }}" - enable_dns_hostnames = true - enable_dns_support = true - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-default-vpc" - JobId = "{{ job_id }}" - } - ) -} - -resource "aws_internet_gateway" "default" { - vpc_id = aws_vpc.default.id - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-default-igw" - JobId = "{{ job_id }}" - } - ) -} - -resource "aws_route_table" "default" { - vpc_id = aws_vpc.default.id - - route { - cidr_block = "0.0.0.0/0" - gateway_id = aws_internet_gateway.default.id - } - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-default-rt" - JobId = "{{ job_id }}" - } - ) -} - -{% for subnet in subnets %} -resource "aws_subnet" "{{ subnet.name }}" { - vpc_id = aws_vpc.default.id - map_public_ip_on_launch = true - - cidr_block = "{{ subnet.cidr_block }}" - availability_zone = "{{ subnet.az }}" - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-subnet-{{ subnet.name }}" - JobId = "{{ job_id }}" - } - ) -} - -resource "aws_route_table_association" "{{ subnet.name }}" { - subnet_id = aws_subnet.{{ subnet.name }}.id - route_table_id = aws_route_table.default.id -} - -{% endfor %} - -resource "aws_security_group" "sg-{{ job_id }}" { - name = "cwdf-infra-{{ job_id }}-sg" - vpc_id = aws_vpc.default.id - - ingress { - description = "SSH" - from_port = 22 - to_port = 22 - protocol = "tcp" - cidr_blocks = [{% for cidr_block in 
sg_whitelist_cidr_blocks %}"{{cidr_block}}",{% endfor %}] - self = true - } - - ingress { - description = "PING" - from_port = 8 - to_port = 0 - protocol = "icmp" - cidr_blocks = [{% for cidr_block in sg_whitelist_cidr_blocks %}"{{cidr_block}}",{% endfor %}] - self = true - } - - egress { - from_port = 0 - to_port = 0 - protocol = "-1" - cidr_blocks = ["0.0.0.0/0"] - ipv6_cidr_blocks = ["::/0"] - } - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-default-sg" - JobId = "{{ job_id }}" - } - ) -} - -resource "aws_key_pair" "default" { - key_name = "cwdf-infra-{{ job_id }}-default-keypair" - public_key = "{{ ssh_pub_key }}" - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-default-keypair" - JobId = "{{ job_id }}" - } - ) -} - -output "cloud_provider" { - value = "aws" -} \ No newline at end of file diff --git a/cloud/cwdf/templates/terraform/aws/cr.tf.jinja b/cloud/cwdf/templates/terraform/aws/cr.tf.jinja deleted file mode 100644 index db409c00..00000000 --- a/cloud/cwdf/templates/terraform/aws/cr.tf.jinja +++ /dev/null @@ -1,39 +0,0 @@ -resource "aws_ecr_repository" "default" { - name = "cwdf-infra-{{ job_id }}-ecr-repository" - image_tag_mutability = "MUTABLE" - force_delete = true - image_scanning_configuration { - scan_on_push = true - } - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "cwdf-infra-{{ job_id }}-ecr-repository" - JobId = "{{ job_id }}" - } - ) -} - -{% for name in ecr_repositories %} -resource "aws_ecr_repository" "{{ name }}" { - name = "{{ name }}" - image_tag_mutability = "MUTABLE" - force_delete = true - image_scanning_configuration { - scan_on_push = true - } - - tags = merge( - jsondecode("{{ extra_tags_json }}"), - { - Name = "{{ name }}" - JobId = "{{ job_id }}" - } - ) -} -{% endfor %} - -output "cr_url" { - value = aws_ecr_repository.default.repository_url -} diff --git a/cloud/cwdf_example_aws.yaml b/cloud/cwdf_example_aws.yaml 
index 0ce2ef8d..8bbe089e 100644 --- a/cloud/cwdf_example_aws.yaml +++ b/cloud/cwdf_example_aws.yaml @@ -15,7 +15,7 @@ awsConfig: sg_whitelist_cidr_blocks: [] ecr_repositories: [] eks: - kubernetes_version: "1.24" + kubernetes_version: "1.26" subnets: ["subnet_a", "subnet_b"] custom_ami: "ubuntu" # Comment out this line to use Amazon Linux 2 OS node_groups: diff --git a/cloud/cwdf_example_azure.yaml b/cloud/cwdf_example_azure.yaml index 88364447..01fd81b2 100644 --- a/cloud/cwdf_example_azure.yaml +++ b/cloud/cwdf_example_azure.yaml @@ -14,7 +14,7 @@ azureConfig: sg_whitelist_cidr_blocks: [] enable_proximity_placement: true aks: - kubernetes_version: "1.25" + kubernetes_version: "1.26" cni: "kubenet" # Possible values are: kubenet, cilium, cilium-ebpf enable_sgx: false # Requires DCsv series instances in one of node pools default_node_pool: diff --git a/cloud/cwdf/__init__.py b/cloud/cwdf_util/__init__.py similarity index 100% rename from cloud/cwdf/__init__.py rename to cloud/cwdf_util/__init__.py diff --git a/cloud/cwdf/config.py b/cloud/cwdf_util/config.py similarity index 91% rename from cloud/cwdf/config.py rename to cloud/cwdf_util/config.py index 4fe79d66..d9254534 100644 --- a/cloud/cwdf/config.py +++ b/cloud/cwdf_util/config.py @@ -1,6 +1,5 @@ from schema import Schema, Or, Optional - config_schema = Schema({ "cloudProvider": Or("aws", "azure"), Optional("awsConfig"): { @@ -25,7 +24,7 @@ Optional("root_volume_type", default='gp2'): str }], Optional("eks"): { - Optional("kubernetes_version", default='1.24'): Or("1.22", "1.23", "1.24"), + Optional("kubernetes_version", default='1.26'): Or("1.24", "1.25", "1.26"), "subnets": [str], Optional("install_ebs_csi_driver", default=True): bool, Optional("custom_ami", default=None): str, @@ -49,8 +48,8 @@ Optional("enable_proximity_placement", default=False): bool, Optional("ansible_instance_size", default="Standard_B2s"): str, Optional("aks"): { - Optional("kubernetes_version", default='1.25'): Or("1.23", "1.24", 
"1.25"), - Optional("cni", default="cilium"): Or("cilium", "cilium-ebpf", "kubenet"), + Optional("kubernetes_version", default='1.26'): Or("1.25", "1.26"), + Optional("cni", default="cilium"): Or("cilium", "kubenet"), Optional("enable_sgx", default=False): bool, "default_node_pool": { "subnet_name": str, diff --git a/cloud/cwdf/main.py b/cloud/cwdf_util/main.py similarity index 72% rename from cloud/cwdf/main.py rename to cloud/cwdf_util/main.py index 9c7a6b3b..bc4635e7 100644 --- a/cloud/cwdf/main.py +++ b/cloud/cwdf_util/main.py @@ -1,51 +1,49 @@ -from .config import config_schema -from schema import SchemaError -from jinja2 import Template, Environment, FileSystemLoader +import ipaddress +import json import os -import yaml import shutil import stat -import json -import requests -import ipaddress + import click +import requests +import yaml + +from jinja2 import Template, Environment, FileSystemLoader + +from .config import config_schema + + +def click_secho_error(message, bold=True): + click.secho(message, err=True, bold=bold, fg='red') def verify_cwdf_config(config): # Verify config file has correct schema configuration = yaml.safe_load(config) - try: - pop = config_schema.validate(configuration) - cloud_provider = pop["cloudProvider"] - - # Check if user has whitelisted current external IP - req = requests.get('https://ident.me') - if req.status_code == 200: - external_ip = req.text - else: - click.secho( - "The server https://ident.me is not responding properly.", - err=True, - bold=True, - fg="red") - click.secho( - "Unable to find your IP address.", - err=True, - fg="red") - ip_addr = ipaddress.ip_address(external_ip) - cidr_blocks = pop[f'{cloud_provider}Config']['sg_whitelist_cidr_blocks'] - whitelisted = False - for block in cidr_blocks: - ip_net = ipaddress.ip_network(block) - if ip_addr in ip_net: - whitelisted = True - break - if not whitelisted: - pop[f'{cloud_provider}Config']['sg_whitelist_cidr_blocks'].append(external_ip + "/32") - 
click.echo(f'Whitelisted your current external IP address {external_ip}.') - return pop - except SchemaError as se: - raise se + pop = config_schema.validate(configuration) + cloud_provider = pop["cloudProvider"] + + # Check if user has whitelisted current external IP + req = requests.get('https://ident.me', timeout=600) + if req.status_code != 200: + click_secho_error("The server https://ident.me is not responding properly.") + click_secho_error("Unable to find your IP address.", bold=False) + return + + external_ip = req.text + ip_addr = ipaddress.ip_address(external_ip) + cidr_blocks = pop[f'{cloud_provider}Config']['sg_whitelist_cidr_blocks'] + whitelisted = False + for block in cidr_blocks: + ip_net = ipaddress.ip_network(block) + if ip_addr in ip_net: + whitelisted = True + break + if not whitelisted: + pop[f'{cloud_provider}Config']['sg_whitelist_cidr_blocks'].append(external_ip + "/32") + click.echo(f'Whitelisted your current external IP address {external_ip}.') + return pop + def compose_cloudcli( deployment_dir, config, job_id, public_key_path, @@ -62,31 +60,26 @@ def compose_cloudcli( else: return - cloud_config['job_id'] = job_id - cloud_config['ssh_pub_key_path'] = public_key_path - - cloud_config["will_create_ansible_instance"] = create_ansible_instance - cloud_config["will_create_container_registry"] = create_container_registry + cloud_config.update({ + 'job_id': job_id, + 'ssh_pub_key_path': public_key_path, + "will_create_ansible_instance": create_ansible_instance, + "will_create_container_registry": create_container_registry, + }) with open(public_key_path, 'r') as f: cloud_config["ssh_public_key"] = f.read() - script_template = None - provider_template_path = os.path.join( os.path.dirname(__file__), f'templates/cloudcli/{cloud_provider}/') if cloud_provider == "aws": - version = cloud_config["eks"]["kubernetes_version"] - region = cloud_config["region"] + # version = cloud_config["eks"]["kubernetes_version"] + # region = cloud_config["region"] 
file_loader = FileSystemLoader(provider_template_path) env = Environment(loader=file_loader) - shutil.copy2( - os.path.join( - provider_template_path, - 'aws_cloudcli_cleanup.sh'), - deployment_dir) + shutil.copy2(os.path.join(provider_template_path, 'aws_cloudcli_cleanup.sh'), deployment_dir) cleanup_file = os.path.join(deployment_dir, 'aws_cloudcli_cleanup.sh') print(f"Cleanup file path: {cleanup_file}") st = os.stat(cleanup_file) @@ -95,33 +88,26 @@ def compose_cloudcli( elif cloud_provider == "azure": file_loader = FileSystemLoader(provider_template_path) env = Environment(loader=file_loader) - shutil.copy2(os.path.join( - provider_template_path, - 'azure_cloudcli_cleanup.sh'), - deployment_dir) + shutil.copy2(os.path.join(provider_template_path, 'azure_cloudcli_cleanup.sh'), deployment_dir) cleanup_file = os.path.join(deployment_dir, 'azure_cloudcli_cleanup.sh') st = os.stat(cleanup_file) os.chmod(cleanup_file, st.st_mode | stat.S_IEXEC) script_template = env.get_template("azure_cloudcli_deploy.sh.j2") else: - click.secho(f"Unknown cloud provider {cloud_provider}.", err=True, bold=True, fg="red") - click.secho( - "Currently supported cloud providers are: aws, azure, gpc.", - err=True, - bold=True, - fg="red") - click.secho("Nothing to do.", err=True, bold=True, fg="red") + click_secho_error(f"Unknown cloud provider {cloud_provider}.") + click_secho_error("Currently supported cloud providers are: aws, azure, gpc.") + click_secho_error("Nothing to do.") + return generated_script = script_template.render(cloud_config=cloud_config) script_file = os.path.join(deployment_dir, f"{cloud_provider}_cloudcli_provisioning.sh") with open(file=script_file, mode='wt', encoding='UTF-8') as sf: sf.write(generated_script) - sf.close() - os.chmod(script_file, st.st_mode | stat.S_IEXEC) return script_file + def compose_terraform( config, job_id, ssh_public_key, create_ansible_instance=True, diff --git a/cloud/cwdf/templates/cloudcli/aws/aws_cloudcli_cleanup.sh 
b/cloud/cwdf_util/templates/cloudcli/aws/aws_cloudcli_cleanup.sh similarity index 100% rename from cloud/cwdf/templates/cloudcli/aws/aws_cloudcli_cleanup.sh rename to cloud/cwdf_util/templates/cloudcli/aws/aws_cloudcli_cleanup.sh diff --git a/cloud/cwdf/templates/cloudcli/aws/aws_cloudcli_deploy.sh.j2 b/cloud/cwdf_util/templates/cloudcli/aws/aws_cloudcli_deploy.sh.j2 similarity index 100% rename from cloud/cwdf/templates/cloudcli/aws/aws_cloudcli_deploy.sh.j2 rename to cloud/cwdf_util/templates/cloudcli/aws/aws_cloudcli_deploy.sh.j2 diff --git a/cloud/cwdf/templates/cloudcli/azure/azure_cloudcli_cleanup.sh b/cloud/cwdf_util/templates/cloudcli/azure/azure_cloudcli_cleanup.sh similarity index 100% rename from cloud/cwdf/templates/cloudcli/azure/azure_cloudcli_cleanup.sh rename to cloud/cwdf_util/templates/cloudcli/azure/azure_cloudcli_cleanup.sh diff --git a/cloud/cwdf/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 b/cloud/cwdf_util/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 similarity index 93% rename from cloud/cwdf/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 rename to cloud/cwdf_util/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 index 4e565c45..3d732746 100644 --- a/cloud/cwdf/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 +++ b/cloud/cwdf_util/templates/cloudcli/azure/azure_cloudcli_deploy.sh.j2 @@ -33,12 +33,21 @@ ANSIBLE_INSTANCE_IMAGE="Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:la # Ansible instance entrypoint script ANSIBLE_INSTANCE_ENTRYPOINT="$(cat <<- "EOM" #!/usr/bin/env bash +mkdir -p /etc/apt/keyrings +curl -sLS https://packages.microsoft.com/keys/microsoft.asc | + gpg --dearmor | + tee /etc/apt/keyrings/microsoft.gpg > /dev/null +chmod go+r /etc/apt/keyrings/microsoft.gpg +AZ_REPO=$(lsb_release -cs) +echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/keyrings/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | + tee /etc/apt/sources.list.d/azure-cli.list apt-get 
-qq -y update -apt-get -qq -y upgrade -apt-get -qq -y install python3-pip python3-venv -apt-get -qq -y install zip unzip net-tools apache2-utils -apt-get -qq -y install podman -curl -sL https://aka.ms/InstallAzureCLIDeb | bash +apt-get -qq -y -o DPkg::Lock::Timeout=60 upgrade +apt-get -qq -y -o DPkg::Lock::Timeout=60 install python3-pip python3-venv +apt-get -qq -y -o DPkg::Lock::Timeout=60 install zip unzip net-tools apache2-utils +apt-get -qq -y -o DPkg::Lock::Timeout=60 install ca-certificates curl apt-transport-https lsb-release gnupg +apt-get -qq -y -o DPkg::Lock::Timeout=60 install podman +apt-get -qq -y -o DPkg::Lock::Timeout=60 install azure-cli az aks install-cli sudo -H -u ubuntu bash -c 'az login --identity' sudo -H -u ubuntu bash -c 'az aks get-credentials --resource-group cwdf-infra-{{ cloud_config.job_id }}-rg --name cwdf-infra-{{ cloud_config.job_id }}-aks' @@ -194,12 +203,15 @@ az aks create \ --vnet-subnet-id $SUBNET_ID_{{ cloud_config.aks.default_node_pool.subnet_name }} \ --os-sku ubuntu \ --output tsv \ + --enable-addons monitoring {%- if cloud_config.aks.enable_sgx == true %},confcom{%- endif %} \ {%- if cloud_config.enable_proximity_placement == true %} --ppg $PPG_RESOURCE_ID \ {%- endif %} --tags {% for tag in cloud_config.extra_tags %}{{ tag }}={{ cloud_config.extra_tags[tag] }}{{ " " if not loop.last else "" }}{% endfor %} \ -{%- if cloud_config.aks.cni in ['cilium', 'cilium-ebpf'] %} - --network-plugin none +{%- if cloud_config.aks.cni == 'cilium' %} + --network-plugin azure \ + --network-dataplane cilium \ + --network-plugin-mode overlay {%- else %} --network-plugin {{ cloud_config.aks.cni }} \ {%- endif -%} @@ -451,45 +463,6 @@ ANSIBLE_INSTANCE_PRIVATE_IP=$(az vm list-ip-addresses \ --name $ANSIBLE_INSTANCE_NAME \ --query "[].virtualMachine.network.privateIpAddresses[0]" --output tsv) -{%- if cloud_config.aks.cni in ['cilium', 'cilium-ebpf'] %} -echo "--------------------------------------------------" -echo "Cilium install" -# Wait for 
the cloud-init to complete -az vm run-command invoke \ - --resource-group $AZ_GROUP_NAME \ - --name $ANSIBLE_INSTANCE_NAME \ - --command-id RunShellScript \ - --scripts "cloud-init status --wait" \ - --output tsv - -AKS_FQDN=$(az aks show --name ${AKS_NAME} --resource-group ${AZ_GROUP_NAME} --query "fqdn" --output tsv) -CILIUM_INSTALL=$(cat <<- EOM -sudo -H -u ubuntu bash -c 'helm repo add cilium https://helm.cilium.io/ && -helm install cilium cilium/cilium --version 1.12.7 \ - --namespace kube-system \ - --set aksbyocni.enabled=true \ - --set nodeinit.enabled=true \ - --set hubble.relay.enabled=true \ -{%- if cloud_config.aks.cni == 'cilium-ebpf' %} - --set hubble.ui.enabled=true \ - --set kubeProxyReplacement=strict \ - --set k8sServiceHost=${AKS_FQDN} \ - --set k8sServicePort=443' -{%- else %} - --set hubble.ui.enabled=true' -{%- endif %} -EOM -) - -# Run install Cilium command -az vm run-command invoke \ - --resource-group $AZ_GROUP_NAME \ - --name $ANSIBLE_INSTANCE_NAME \ - --command-id RunShellScript \ - --no-wait \ - --scripts "$CILIUM_INSTALL" -{%- endif %} - # Get subscriber ID SUBSCRIPTION_ID=$(az account show --query id --output tsv) diff --git a/cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja b/cloud/cwdf_util/templates/terraform/aws/ansible_host.tf.jinja similarity index 95% rename from cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja rename to cloud/cwdf_util/templates/terraform/aws/ansible_host.tf.jinja index 121dba71..a7303d7c 100644 --- a/cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja +++ b/cloud/cwdf_util/templates/terraform/aws/ansible_host.tf.jinja @@ -138,6 +138,11 @@ data "aws_ami" "ubuntu2204" { owners = ["099720109477"] # Canonical } +resource "aws_eip" "ansible" { + instance = aws_instance.ansible.id + domain = "vpc" +} + resource "aws_instance" "ansible" { ami = data.aws_ami.ubuntu2204.id instance_type = "{{ ansible_instance_type }}" @@ -147,9 +152,15 @@ resource "aws_instance" "ansible" { key_name = 
aws_key_pair.default.key_name iam_instance_profile = aws_iam_instance_profile.ansible-instance-profile.name + metadata_options { + http_endpoint = "enabled" + http_tokens = "required" + } + root_block_device { volume_size = 64 volume_type = "gp3" + encrypted = true } user_data = </cluster name = "/aws/eks/cwdf-infra-{{ job_id }}-eks-cluster/cluster" retention_in_days = 7 skip_destroy = false + kms_key_id = aws_kms_key.cluster_logs.arn tags = merge( jsondecode("{{ extra_tags_json }}"), @@ -88,6 +199,7 @@ resource "aws_cloudwatch_log_group" "performance" { name = "/aws/containerinsights/cwdf-infra-{{ job_id }}-eks-cluster/performance" retention_in_days = 7 skip_destroy = false + kms_key_id = aws_kms_key.performance_logs.arn tags = merge( jsondecode("{{ extra_tags_json }}"), @@ -98,13 +210,34 @@ resource "aws_cloudwatch_log_group" "performance" { ) } +resource "aws_kms_key" "eks_secret" { + description = "EKS Cluster {{ job_id }} Secrets Key" + deletion_window_in_days = 7 + enable_key_rotation = true + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-cluster-secrets-key" + JobId = "{{ job_id }}" + } + ) +} + resource "aws_eks_cluster" "default" { role_arn = aws_iam_role.eks-cluster-role.arn name = "cwdf-infra-{{ job_id }}-eks-cluster" version = "{{ eks.kubernetes_version }}" - enabled_cluster_log_types = ["api", "audit", "authenticator"] + enabled_cluster_log_types = ["api", "authenticator", "audit", "scheduler", "controllerManager"] + + encryption_config { + resources = [ "secrets" ] + provider { + key_arn = aws_kms_key.eks_secret.arn + } + } vpc_config { subnet_ids = [{% for subnet in eks.subnets %}aws_subnet.{{ subnet }}.id,{% endfor %}] @@ -165,6 +298,11 @@ resource "aws_launch_template" "{{ node_group.name }}" { aws_eks_cluster.default.vpc_config[0].cluster_security_group_id ] + metadata_options { + http_endpoint = "enabled" + http_tokens = "required" + } + user_data = base64encode(< /dev/null +chmod go+r 
/etc/apt/keyrings/microsoft.gpg +AZ_REPO=$(lsb_release -cs) +echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/keyrings/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | + tee /etc/apt/sources.list.d/azure-cli.list apt-get -qq -y update -apt-get -qq -y upgrade -apt-get -qq -y install python3-pip python3-venv -apt-get -qq -y install zip unzip net-tools apache2-utils -apt-get -qq -y install podman -curl -sL https://aka.ms/InstallAzureCLIDeb | bash +apt-get -qq -y -o DPkg::Lock::Timeout=60 upgrade +apt-get -qq -y -o DPkg::Lock::Timeout=60 install python3-pip python3-venv +apt-get -qq -y -o DPkg::Lock::Timeout=60 install zip unzip net-tools apache2-utils +apt-get -qq -y -o DPkg::Lock::Timeout=60 install ca-certificates curl apt-transport-https lsb-release gnupg +apt-get -qq -y -o DPkg::Lock::Timeout=60 install podman +apt-get -qq -y -o DPkg::Lock::Timeout=60 install azure-cli az aks install-cli sudo -H -u ubuntu bash -c 'az login --identity' sudo -H -u ubuntu bash -c 'az aks get-credentials --resource-group ${azurerm_resource_group.default.name} --name ${azurerm_kubernetes_cluster.default.name}' diff --git a/cloud/cwdf/templates/terraform/azure/common.tf.jinja b/cloud/cwdf_util/templates/terraform/azure/common.tf.jinja similarity index 100% rename from cloud/cwdf/templates/terraform/azure/common.tf.jinja rename to cloud/cwdf_util/templates/terraform/azure/common.tf.jinja diff --git a/cloud/cwdf/templates/terraform/azure/cr.tf.jinja b/cloud/cwdf_util/templates/terraform/azure/cr.tf.jinja similarity index 100% rename from cloud/cwdf/templates/terraform/azure/cr.tf.jinja rename to cloud/cwdf_util/templates/terraform/azure/cr.tf.jinja diff --git a/cloud/cwdf/templates/terraform/azure/provider.tf.jinja b/cloud/cwdf_util/templates/terraform/azure/provider.tf.jinja similarity index 93% rename from cloud/cwdf/templates/terraform/azure/provider.tf.jinja rename to cloud/cwdf_util/templates/terraform/azure/provider.tf.jinja index 
4049f6f9..f0e75bef 100644 --- a/cloud/cwdf/templates/terraform/azure/provider.tf.jinja +++ b/cloud/cwdf_util/templates/terraform/azure/provider.tf.jinja @@ -2,11 +2,11 @@ terraform { required_providers { azurerm = { source = "hashicorp/azurerm" - version = "3.41.0" + version = "3.64.0" } helm = { source = "hashicorp/helm" - version = "2.8.0" + version = "2.10.1" } } } diff --git a/cloud/deployer.py b/cloud/deployer.py index 315ee3c6..f874dbad 100644 --- a/cloud/deployer.py +++ b/cloud/deployer.py @@ -1,20 +1,27 @@ -import os -import paramiko -import click import json -import subprocess # nosec B404 # subrocess is set to shell=False -import string +import os import random -from time import sleep -import cwdf -import yaml import shutil -import sw_deployment.sw_deployment_tool as sw_deployment -from ssh_connector import SSHConnector +import string +import subprocess # nosec B404 # subprocess is set to shell=False + +import click +import paramiko +import yaml + from azure.identity import AzureCliCredential from azure.mgmt.network import NetworkManagementClient from azure.mgmt.compute import ComputeManagementClient +import cwdf_util +import sw_deployment.sw_deployment_tool as sw_deployment + +from ssh_connector import SSHConnector + + +def subprocess_run(*args, **kwargs): + return subprocess.run(*args, **kwargs) # pylint: disable=W1510 + def generate_ssh_keys(ssh_dir, public_key_path, private_key_path): key_length = 2048 @@ -28,52 +35,18 @@ def generate_ssh_keys(ssh_dir, public_key_path, private_key_path): def authenticate_aks(deployment_dir): - proc = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) + proc = subprocess_run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) terraform_output = json.loads(proc.stdout) resource_group_name = terraform_output.get("resource_group_name", {}).get("value") aks_cluster_name = terraform_output.get("aks_cluster_name", {}).get("value") if not resource_group_name or not 
aks_cluster_name: return click.echo("Running az aks get-credentials...") - proc = subprocess.run(["az", "aks", "get-credentials", f"-n={aks_cluster_name}", f"-g={resource_group_name}"], universal_newlines=True) + proc = subprocess_run(["az", "aks", "get-credentials", f"-n={aks_cluster_name}", f"-g={resource_group_name}"], universal_newlines=True) if proc.returncode != 0: click.secho("Unable to setup kubectl. Ignoring...", err=True, bold=True, fg="red") -def install_aks_confcom_addon(deployment_dir, provisioner_tool): - aks_cluster_name = "" - resource_group_name = "" - if provisioner_tool == "terraform": - proc = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) - terraform_output = json.loads(proc.stdout) - resource_group_name = terraform_output.get("resource_group_name", {}).get("value") - aks_cluster_name = terraform_output.get("aks_cluster_name", {}).get("value") - if not resource_group_name or not aks_cluster_name: - return - - if provisioner_tool == "cloudcli": - output_file = os.path.join(deployment_dir, "provision_output.json") - with open(file=output_file, mode='r', encoding='utf-8') as json_file: - cloudcli_output = json.load(json_file) - resource_group_name = cloudcli_output["resource_group_name"]["value"] - aks_cluster_name = cloudcli_output["aks_cluster_name"]["value"] - if not resource_group_name or not aks_cluster_name: - return - - proc = subprocess.run(["az", "aks", "addon", "list", f"-n={aks_cluster_name}", f"-g={resource_group_name}"], capture_output=True) - json_output = proc.stdout - enabled_addons = json.loads(json_output) - sgx_addon = list(filter(lambda addon: addon["api_key"] == "ACCSGXDevicePlugin", enabled_addons))[0] - if sgx_addon["enabled"] == True: - click.echo("SGX is already enabled in cluster.") - return - - click.echo("Enabling SGX...") - proc = subprocess.run(["az", "aks", "enable-addons", "-a=confcom", f"-n={aks_cluster_name}", f"-g={resource_group_name}", "--enable-sgxquotehelper"], 
universal_newlines=True) - if proc.returncode != 0: - click.secho("Unable to enable SGX. Ignoring...", err=True, bold=True, fg="red") - - @click.group() def cli(): pass @@ -107,7 +80,7 @@ def deploy(deployment_dir, provisioner_tool): with open(public_key_path, 'r') as f: ssh_public_key = f.read() with open(private_key_path, 'r') as f: - ssh_private_key = f.read() + f.read() # Create lock file if not exists # Lock file is intended to contain info not to break previous deployment in future @@ -118,19 +91,20 @@ def deploy(deployment_dir, provisioner_tool): lock_str = f.read() try: lock = json.loads(lock_str) - except ValueError as e: + except ValueError: lock = None if lock is not None and "job_id" in lock: job_id = lock["job_id"] else: # Random 8 digit identifier - job_id = ''.join(random.choices(string.digits, k=8)) # nosec B311 # pseudo-random generator is not used for security purposes + job_id = ''.join(random.choices(string.digits, k=8)) # nosec B311 # pseudo-random generator is not used for security purposes lock = json.dumps({"job_id": job_id}) f.write(lock) click.echo("Job ID: " + job_id) + provisioning_output = None if provisioner_tool == "terraform": provisioning_output = terrafrom_provisioning(deployment_dir, cwdf_user_config, job_id, ssh_public_key) if provisioner_tool == "cloudcli": @@ -149,7 +123,7 @@ def deploy(deployment_dir, provisioner_tool): if cloud_provider == "azure": subscription_id = provisioning_output["subscription_id"]["value"] aks_scale_sets_rg = provisioning_output["aks_scale_sets_rg"]["value"] - + credential = AzureCliCredential() network_client = NetworkManagementClient(credential, subscription_id) compute_client = ComputeManagementClient(credential, subscription_id) @@ -169,7 +143,7 @@ def deploy(deployment_dir, provisioner_tool): }) elif cloud_provider == "aws": workers = provisioning_output["k8s_worker_instances"]["value"] - + click.echo("Worker nodes:") click.echo("-------------------") workers_ip = [] @@ -182,12 +156,12 @@ def 
deploy(deployment_dir, provisioner_tool): ssh_username = provisioning_output["k8s_worker_username"]["value"] click.echo("Opening SSH connection to Ansible host...") ssh = SSHConnector(ip_address=ansible_host_ip, - username='ubuntu', - priv_key=private_key_path, - try_loop=True) + username='ubuntu', + priv_key=private_key_path, + try_loop=True) click.echo("Opened SSH connection.") click.echo("Waiting for cloud init to complete on Ansible host...") - stdin, stdout, stderr = ssh.exec_command("cloud-init status --wait") + _, stdout, stderr = ssh.exec_command("cloud-init status --wait") stdout.channel.recv_exit_status() click.echo("Cloud init done.") @@ -199,10 +173,10 @@ def deploy(deployment_dir, provisioner_tool): cwdf_output_filename = os.path.join(deployment_dir, "cwdf_output.yaml") with open(cwdf_output_filename, 'w') as f: yaml.dump(cwdf_output, f) - ssh.copy_file(cwdf_output_filename, destination_path="/home/ubuntu/cwdf_deployment/") + ssh.copy_file(cwdf_output_filename, destination_path="/home/ubuntu/cwdf_deployment/") click.echo("Transferring SSH keys to Ansible machine...") - ssh.copy_file(private_key_path, destination_path='/tmp/id_rsa',) + ssh.copy_file(private_key_path, destination_path='/tmp/id_rsa') ssh.exec_command("sudo mv /tmp/id_rsa /home/ubuntu/cwdf_deployment/ssh/id_rsa") ssh.exec_command("sudo chmod 600 /home/ubuntu/cwdf_deployment/ssh/id_rsa") click.echo("Successfully transferred SSH key to ~/cwdf_deployment/ssh/id_rsa") @@ -215,7 +189,7 @@ def deploy(deployment_dir, provisioner_tool): if not os.path.exists(discovery_results_path): os.makedirs(discovery_results_path) for worker in workers: - stdin, stdout, stderr = ssh.exec_command( + _, stdout, stderr = ssh.exec_command( f"python3 /home/ubuntu/cwdf_deployment/discovery/discover.py {worker['private_ip']} ec2-user /home/ubuntu/cwdf_deployment/ssh/id_rsa" ) if stdout.channel.recv_exit_status() != 0: @@ -231,18 +205,6 @@ def deploy(deployment_dir, provisioner_tool): click.echo("Copied discovery 
results to Ansible host.") configuration = yaml.safe_load(cwdf_user_config) - if cloud_provider == 'azure': - if configuration["azureConfig"]["aks"]["cni"] == "cilium-ebpf": - for worker in workers: - click.echo(f"Preparing {worker['private_ip']} for Cilium eBPF") - stdin, stdout, stderr = ssh.exec_command( - f"ssh -o StrictHostKeyChecking=no {ssh_username}@{worker['private_ip']} -i ~/cwdf_deployment/ssh/id_rsa \"sudo iptables-save | sudo grep -v KUBE | sudo iptables-restore\"" - ) - if stdout.channel.recv_exit_status() != 0: - click.echo(f"Error while preparing {worker['private_ip']} for cilium eBPF:", err=True) - click.echo(stderr.read(), err=True) - if configuration["azureConfig"]["aks"]["enable_sgx"] == True: - install_aks_confcom_addon(deployment_dir, provisioner_tool) ssh.close_connection() click.echo("-------------------") @@ -265,14 +227,15 @@ def deploy(deployment_dir, provisioner_tool): sw_deployment.start_deploy(config=sw_config_path) + def terrafrom_provisioning(deployment_dir, cwdf_user_config, job_id, ssh_public_key): - tf_manifest, cwdf_config = cwdf.compose_terraform(cwdf_user_config, job_id, ssh_public_key) + tf_manifest, _ = cwdf_util.compose_terraform(cwdf_user_config, job_id, ssh_public_key) manifest_path = os.path.join(deployment_dir, 'deploy.tf') with open(manifest_path, 'w') as f: f.write(tf_manifest) click.echo("Initializing Terraform...") - proc = subprocess.run(["terraform", "init", "-upgrade"], cwd=deployment_dir, universal_newlines=True) + proc = subprocess_run(["terraform", "init", "-upgrade"], cwd=deployment_dir, universal_newlines=True) if proc.returncode != 0: click.secho("Error while initializing Terraform", err=True, bold=True, fg="red") return @@ -281,8 +244,8 @@ def terrafrom_provisioning(deployment_dir, cwdf_user_config, job_id, ssh_public_ authenticate_aks(deployment_dir) click.echo("Building Terraform plan...") - proc = subprocess.run([ - "terraform", "plan", "-out=outfile", "-detailed-exitcode"], + proc = subprocess_run( + 
["terraform", "plan", "-out=outfile", "-detailed-exitcode"], cwd=deployment_dir, universal_newlines=True ) if proc.returncode == 1: @@ -292,7 +255,7 @@ def terrafrom_provisioning(deployment_dir, cwdf_user_config, job_id, ssh_public_ click.echo("No infrastructure changes needed.") elif proc.returncode == 2: if click.confirm("Continue with above modifications?"): - proc = subprocess.run(["terraform", "apply", "outfile"], cwd=deployment_dir, universal_newlines=True) + proc = subprocess_run(["terraform", "apply", "outfile"], cwd=deployment_dir, universal_newlines=True) if proc.returncode == 1: click.echo("Error while running deployment", err=True) return @@ -303,16 +266,17 @@ def terrafrom_provisioning(deployment_dir, cwdf_user_config, job_id, ssh_public_ else: return - proc = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) + proc = subprocess_run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) json_output = proc.stdout provisioning_output = json.loads(json_output) return provisioning_output + def cloudcli_provisioning(deployment_dir, cwdf_user_config, job_id, public_key_path): - script_path = cwdf.compose_cloudcli(deployment_dir, cwdf_user_config, job_id, public_key_path) + script_path = cwdf_util.compose_cloudcli(deployment_dir, cwdf_user_config, job_id, public_key_path) click.echo("Running CloudCLI provisioning script...") - proc = subprocess.run([script_path], universal_newlines=True) + proc = subprocess_run([script_path], universal_newlines=True) if proc.returncode != 0: click.secho("Error while initializing deploy script", err=True, bold=True, fg="red") return @@ -323,16 +287,19 @@ def cloudcli_provisioning(deployment_dir, cwdf_user_config, job_id, public_key_p return cloudcli_output + def remove_all_k8s_services(ansible_host_ip, private_key_path): ssh = SSHConnector(ip_address=ansible_host_ip, - username='ubuntu', - priv_key=private_key_path, - try_loop=True) + username='ubuntu', + 
priv_key=private_key_path, + try_loop=True) click.echo("Removing all k8s services...") - stdin, stdout, stderr = ssh.exec_command('for each in $(kubectl get ns -o jsonpath="{.items[*].metadata.name}" | sed s/"kube-system"//); do kubectl delete service --all -n $each; done') + cmd = 'for each in $(kubectl get ns -o jsonpath="{.items[*].metadata.name}" | sed s/"kube-system"//); do kubectl delete service --all -n $each; done' + stdout = ssh.exec_command(cmd)[1] stdout.channel.recv_exit_status() ssh.close_connection() + def temp_files_remove(deployment_dir, provisioner_tool, cloud_provider): click.echo("Removing temporary files...") @@ -367,7 +334,7 @@ def temp_files_remove(deployment_dir, provisioner_tool, cloud_provider): def terraform_cleanup(deployment_dir, skip_service_cleanup): if not skip_service_cleanup: - proc = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) + proc = subprocess_run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) json_output = proc.stdout terraform_output = json.loads(json_output) if "ansible_host_public_ip" in terraform_output: @@ -378,7 +345,7 @@ def terraform_cleanup(deployment_dir, skip_service_cleanup): # Setup ~/.kube/config if in AKS environment to prevent helm provider errors authenticate_aks(deployment_dir) - proc = subprocess.run( + proc = subprocess_run( ["terraform", "plan", "-destroy", "-out=destroyplan", "-detailed-exitcode"], cwd=deployment_dir, universal_newlines=True @@ -388,22 +355,23 @@ def terraform_cleanup(deployment_dir, skip_service_cleanup): return elif proc.returncode == 0: click.echo("No infrastructure changes needed.") - elif proc.returncode == 2: - if click.confirm("Continue with above modifications?"): - proc = subprocess.run(["terraform", "apply", "destroyplan"], cwd=deployment_dir, universal_newlines=True) - if proc.returncode == 1: - click.echo("Error while running cleanup", err=True) - return - else: - click.echo("Cleanup finished.") - else: - 
return - else: + elif proc.returncode != 2: + return + elif not click.confirm("Continue with above modifications?"): return + else: + proc = subprocess_run(["terraform", "apply", "destroyplan"], cwd=deployment_dir, universal_newlines=True) + if proc.returncode == 1: + click.echo("Error while running cleanup", err=True) + return + + click.echo("Cleanup finished.") + temp_files_remove(deployment_dir, "terraform", None) + def cloudcli_cleanup(deployment_dir): - config_path = os.path.join(deployment_dir, "cwdf.yaml") + config_path = os.path.join(deployment_dir, "cwdf_util.yaml") # Verify config file exists if not os.path.exists(config_path): @@ -417,12 +385,13 @@ def cloudcli_cleanup(deployment_dir): script_path = os.path.join(deployment_dir, cleaning_script) click.echo("Running CloudCLI cleaning script...") - proc = subprocess.run([script_path], universal_newlines=True) + proc = subprocess_run([script_path], universal_newlines=True) if proc.returncode != 0: click.secho("Error while initializing cleaning script", err=True, bold=True, fg="red") return temp_files_remove(deployment_dir, "cloudcli", configuration['cloudProvider']) + @click.command() @click.option('--deployment_dir', help='Path to deployment directory', required=True) @click.option('--skip_service_cleanup', help='Skip deleting k8s service resources', default=False) @@ -440,6 +409,5 @@ def cleanup(deployment_dir, skip_service_cleanup, provisioner_tool): cli.add_command(deploy) cli.add_command(cleanup) - if __name__ == "__main__": cli() diff --git a/cloud/discovery/discover.py b/cloud/discovery/discover.py index 82ea143e..7a60cbe1 100644 --- a/cloud/discovery/discover.py +++ b/cloud/discovery/discover.py @@ -1,20 +1,29 @@ -from asyncio.subprocess import DEVNULL +import fnmatch import json import os -import subprocess # nosec B404 # subrocess is set to shell=False -import paramiko -import sys -import fnmatch import pathlib +import subprocess # nosec B404 # subprocess is set to shell=False +import sys + +from 
asyncio.subprocess import DEVNULL + +import paramiko import yaml + sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir))) -from ssh_connector import SSHConnector +from ssh_connector import SSHConnector # pylint:disable=C0413 ARCH_FILE = os.path.join(pathlib.Path(__file__).absolute().parent.resolve(), "cpu_arch.yml") qat_pf_ids = ['0435', '37c8', '19e2', '18ee', '6f54', '18a0', '4940', '4942'] qat_vf_ids = ['0443', '37c9', '19e3', '18ef', '6f55', '18a1', '4941', '4943'] feature_flag_summary = ["sgx", "avx"] +remote = None +nic_sriov = None +nic_ddp = None +nic_types = [] +qat_sriov = None + class Remote: def __init__(self, ip_addr, username, key_filename): @@ -31,10 +40,10 @@ def connect(self): def exec(self, cmd, split=False): try: - _stdin, output, stderr = self.session.exec_command(cmd) + _, output, stderr = self.session.exec_command(cmd) parse_out = output.read().decode("UTF-8").rstrip('\n') except paramiko.SSHException as e: - print("Command exec failed: ",e) + print("Command exec failed: ", e) return None if stderr.read(1): return None @@ -44,26 +53,29 @@ def exec(self, cmd, split=False): return parse_out def close(self): - self.close + pass + def check_output(cmd, split=False): if remote is not None: - output = remote.exec(cmd, split=split) + exec_func = getattr(remote, 'exec') + output = exec_func(cmd, split=split) return output try: if split: output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8").splitlines() else: output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8") - except subprocess.CalledProcessError as e: + except subprocess.CalledProcessError: return None return output + def socket_update(orig: dict, update: dict): for i in set(update["Socket"].keys()): if i not in orig["Socket"]: orig["Socket"].update({i: {}}) - if "Device" in list(set(orig["Socket"][i].keys())&set(update["Socket"][i].keys())): + if "Device" in list(set(orig["Socket"][i].keys()) & 
set(update["Socket"][i].keys())): orig["Socket"][i]["Device"].update(update["Socket"][i]["Device"]) else: orig["Socket"][i].update(update["Socket"][i]) @@ -71,34 +83,34 @@ def socket_update(orig: dict, update: dict): def get_pci_net(): + global nic_sriov # pylint: disable=W0603 + global nic_ddp # pylint: disable=W0603 + socket_out = {"Socket": {}} - global nic_sriov - global nic_ddp - global nic_types try: - with open(os.path.join(sys.path[0],"ddp_devs"), 'r') as file: + with open(os.path.join(sys.path[0], "ddp_devs"), 'r') as file: for line in file: if not line.strip().startswith("#"): ddp_list = line.strip() break - except IOError as e: + except IOError: print("Error loading ddp_devs - Exiting") sys.exit() net_devices = check_output("ls -1 /sys/class/net/*/device/numa_node", split=True) - if net_devices is None: return None + if net_devices is None: + return None net_numa = check_output("cat /sys/class/net/*/device/numa_node", split=True) - for (i, h) in zip(net_devices, net_numa): + for i, h in zip(net_devices, net_numa): h = int(h) dev_name = i.split("/")[4] - device = {dev_name: {}} + device_data = {} dev_path = os.path.split(i) - uevent_dump = check_output("cat %s/uevent" % dev_path[0]) + uevent_dump = check_output("cat %s/uevent" % dev_path[0]) for line in uevent_dump.splitlines(): linevals = list(map(str.strip, line.split('=', 1))) - device[dev_name].update({linevals[0].title(): linevals[1]}) - pci_slot = device[dev_name]["Pci_Slot_Name"].split(':', 1)[1] - del device[dev_name]["Pci_Slot_Name"] - device[dev_name].update({"Interface": dev_name}) + device_data.update({linevals[0].title(): linevals[1]}) + pci_slot = device_data.pop("Pci_Slot_Name").split(':', 1)[1] + device_data["Interface"] = dev_name pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pci_slot) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -111,9 +123,8 @@ def get_pci_net(): pci_subsystem = "Unknown" except AttributeError: pci_subsystem = "Unknown" - 
device[dev_name].update({"Device": pci_subsystem}) - device[pci_slot] = device[dev_name] - del device[dev_name] + device_data.update({"Device": pci_subsystem}) + device = {pci_slot: device_data} if "Pci_Id" in device[pci_slot].keys(): if device[pci_slot]["Pci_Id"] in ddp_list: device[pci_slot].update({"Ddp_Support": True}) @@ -127,9 +138,9 @@ def get_pci_net(): if "fvl" not in nic_types: nic_types.append("fvl") - ## Get information about PF/VF and SR-IOV - ## Check for SR-IOV Capabilities - ########## CONTINUE WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO + # Get information about PF/VF and SR-IOV + # Check for SR-IOV Capabilities + # CONTINUE WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO totalvfs = check_output("cat %s/sriov_totalvfs" % dev_path[0]) if totalvfs is not None and int(totalvfs) > 0: # PF with SR-IOV enabled @@ -163,11 +174,13 @@ def get_pci_net(): socket_out["Socket"][h]["Device"]["Nic"].update(device) return socket_out + def get_pci_qat(): + global qat_sriov # pylint: disable=W0603 + pf_ids = [] vf_ids = [] socket_out = {"Socket": {}} - global qat_sriov dev_path = "/sys/bus/pci/devices/0000:" pci_devices = check_output("lspci -nmm", split=True) if not pci_devices: @@ -179,11 +192,12 @@ def get_pci_qat(): for vf_id in qat_vf_ids: if vf_id in device: vf_ids.append(device.split()[0]) - if len(pf_ids) == 0 and len(vf_ids) == 0: return None + if len(pf_ids) == 0 and len(vf_ids) == 0: + return None for pf_id in pf_ids: device = {pf_id: {}} qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, pf_id))) - uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pf_id) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -225,7 +239,7 @@ def get_pci_qat(): for vf_id in vf_ids: device = {vf_id: {}} qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, 
vf_id))) - uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % vf_id) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -253,19 +267,23 @@ def get_pci_qat(): socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) return socket_out + def get_lscpu(): lscpu_out = {} cpu_info_json = check_output("lscpu -J") - if cpu_info_json is None: return None + if cpu_info_json is None: + return None json_object = json.loads(cpu_info_json) for i in json_object['lscpu']: - lscpu_out[i['field'].replace(":","")] = i['data'] + lscpu_out[i['field'].replace(":", "")] = i['data'] return {"lscpu": lscpu_out} + def get_core_info(): socket_out = {"Socket": {}} core_info_csv = check_output("lscpu -p=cpu,core,socket,node,cache") - if core_info_csv is None: return None + if core_info_csv is None: + return None for i in core_info_csv.splitlines(): # CPU, Core, Socket, Node, Cache if i and not i.startswith("#"): @@ -283,7 +301,7 @@ def get_core_info(): socket_out["Socket"].update({socket_id: {"Cores": {}}}) if core_id not in socket_out["Socket"][socket_id]["Cores"]: socket_out["Socket"][socket_id]["Cores"].update({core_id: {"Cpus": []}}) - #print(socket_out["Socket"][socket_id]["Core"][core_id]) + # print(socket_out["Socket"][socket_id]["Core"][core_id]) socket_out["Socket"][socket_id]["Cores"][core_id]["Cpus"].append(cpu_id) if node_id is not None and "Node" not in socket_out["Socket"][socket_id]["Cores"][core_id].keys(): socket_out["Socket"][socket_id]["Cores"][core_id].update({"Node": node_id}) @@ -291,10 +309,12 @@ def get_core_info(): socket_out["Socket"][socket_id]["Cores"][core_id].update({"Cache": cache}) return socket_out + def get_socket_mem_info(): socket_out = {"Socket": {}} mem_nodes = check_output("ls -1 /sys/devices/system/node/node*/meminfo", split=True) - if mem_nodes is None: return None + if mem_nodes 
is None: + return None for i in mem_nodes: socket = int(i.split("/")[5].lstrip('node')) socket_out["Socket"].update({socket: {"Memory": {}}}) @@ -304,37 +324,39 @@ def get_socket_mem_info(): socket_out["Socket"][socket]["Memory"].update({valpair[0].lstrip(':'): valpair[1]}) return socket_out + def get_mem_info(): # Add to full output meminfo_out = {"Memory": {}} mem_info = check_output("cat /proc/meminfo", split=True) - if mem_info is None: return None + if mem_info is None: + return None for i in mem_info: valpair = i.split()[0:2] meminfo_out["Memory"].update({valpair[0].rstrip(':'): valpair[1]}) return meminfo_out + def get_host_info(): - hostinfo_out = {"Host": {}} + host_data = {} # consider changing to /etc/os-release if hostnamectl is not common host_info = check_output("hostnamectl", split=True) if host_info: for i in host_info: value = i.split(':', 1)[1].strip() if "Static hostname" in i: - hostinfo_out["Host"].update({"Hostname": value}) + host_data.update({"Hostname": value}) elif "Operating System" in i: - hostinfo_out["Host"].update({"OS": value}) + host_data.update({"OS": value}) elif "Kernel" in i: - hostinfo_out["Host"].update({"Kernel": value}) + host_data.update({"Kernel": value}) elif "Architecture" in i: - hostinfo_out["Host"].update({"Arch": value}) + host_data.update({"Arch": value}) codename = check_output("cat /sys/devices/cpu/caps/pmu_name") if codename: - codename = codename.strip() - hostinfo_out["Host"].update({"Codename": codename.title()}) - if not hostinfo_out["Host"].keys(): return None - return hostinfo_out + host_data.update({"Codename": codename.strip().title()}) + return {"Host": host_data} if host_data else None + def get_cpu_arch_codename(cpu_model): cpu_codename_arch = '' @@ -345,12 +367,13 @@ def get_cpu_arch_codename(cpu_model): except yaml.YAMLError as exc: print(exc) if cpu_models is not None: - for arch, obj in cpu_models['architectures'].items(): - for model in cpu_models['architectures'][arch]['models']: + for arch_name, 
arch_data in cpu_models['architectures'].items(): + for model in arch_data['models']: if model in cpu_model: - cpu_codename_arch = arch + cpu_codename_arch = arch_name return cpu_codename_arch + def get_summary(info: dict): summary = {} # summarize existing object @@ -364,7 +387,7 @@ def get_summary(info: dict): elif info["Memory"]["Hugepagesize"] == "2048": summary["Hugepage_Size"] = "2M" else: - summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"]+"K" + summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"] + "K" if "lscpu" in info.keys(): if "Model name" in info["lscpu"]: summary["Cpu_Model"] = info["lscpu"]["Model name"] @@ -381,11 +404,11 @@ def get_summary(info: dict): summary["Numa_Nodes"] = info["lscpu"]["NUMA node(s)"] if int(summary["Numa_Nodes"]) != 0: for i in range(int(summary["Numa_Nodes"])): - summary["Numa_Node"+str(i)+"_Cpus"] = info["lscpu"]["NUMA node"+str(i)+" CPU(s)"] + summary["Numa_Node" + str(i) + "_Cpus"] = info["lscpu"]["NUMA node" + str(i) + " CPU(s)"] if "Flags" in info["lscpu"]: flags = info["lscpu"]["Flags"].split() for i in feature_flag_summary: - matches = fnmatch.filter(flags,i+"*") + matches = fnmatch.filter(flags, i + "*") if matches: summary[i.title()] = matches if "Virtualization" in info["lscpu"]: @@ -406,18 +429,20 @@ def get_summary(info: dict): summary_out = {"Summary": summary} return summary_out + def main(ip_addr, username, key_filename): - global remote + global remote # pylint: disable=W0603 + global nic_sriov # pylint: disable=W0603 + global nic_types # pylint: disable=W0603 + global qat_sriov # pylint: disable=W0603 + global nic_ddp # pylint: disable=W0603 + remote = Remote(ip_addr, username, key_filename) remote.connect() output = {"Socket": {}} - global nic_sriov nic_sriov = False - global nic_types nic_types = [] - global qat_sriov qat_sriov = False - global nic_ddp nic_ddp = False pci_net = get_pci_net() pci_qat = get_pci_qat() diff --git a/cloud/discovery/discover_local.py 
b/cloud/discovery/discover_local.py index 9b91c96f..6f793e10 100644 --- a/cloud/discovery/discover_local.py +++ b/cloud/discovery/discover_local.py @@ -2,7 +2,7 @@ import json import os import pprint -import subprocess # nosec B404 # subrocess is set to shell=False +import subprocess # nosec B404 # subprocess is set to shell=False import sys import fnmatch import yaml @@ -10,6 +10,11 @@ qat_pf_ids = ['0435', '37c8', '19e2', '18ee', '6f54', '18a0', '4940', '4942'] qat_vf_ids = ['0443', '37c9', '19e3', '18ef', '6f55', '18a1', '4941', '4943'] feature_flag_summary = ["sgx", "avx"] +nic_sriov = None +nic_ddp = None +nic_types = [] +qat_sriov = None + def check_output(cmd, split=False): try: @@ -17,15 +22,16 @@ def check_output(cmd, split=False): output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8").splitlines() else: output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8") - except subprocess.CalledProcessError as e: + except subprocess.CalledProcessError: return None return output + def socket_update(orig: dict, update: dict): for i in set(update["Socket"].keys()): if i not in orig["Socket"]: orig["Socket"].update({i: {}}) - if "Device" in list(set(orig["Socket"][i].keys())&set(update["Socket"][i].keys())): + if "Device" in list(set(orig["Socket"][i].keys()) & set(update["Socket"][i].keys())): orig["Socket"][i]["Device"].update(update["Socket"][i]["Device"]) else: orig["Socket"][i].update(update["Socket"][i]) @@ -33,34 +39,34 @@ def socket_update(orig: dict, update: dict): def get_pci_net(): + global nic_sriov # pylint: disable=W0603 + global nic_ddp # pylint: disable=W0603 + socket_out = {"Socket": {}} - global nic_sriov - global nic_ddp - global nic_types try: - with open(os.path.join(sys.path[0],"ddp_devs"), 'r') as file: + with open(os.path.join(sys.path[0], "ddp_devs"), 'r') as file: for line in file: if not line.strip().startswith("#"): ddp_list = line.strip() break - except IOError as e: + except IOError: 
print("Error loading ddp_devs - Exiting") sys.exit() net_devices = check_output("ls -1 /sys/class/net/*/device/numa_node", split=True) - if net_devices is None: return None + if net_devices is None: + return None net_numa = check_output("cat /sys/class/net/*/device/numa_node", split=True) - for (i, h) in zip(net_devices, net_numa): + for i, h in zip(net_devices, net_numa): h = int(h) dev_name = i.split("/")[4] - device = {dev_name: {}} + device_data = {} dev_path = os.path.split(i) - uevent_dump = check_output("cat %s/uevent" % dev_path[0]) + uevent_dump = check_output("cat %s/uevent" % dev_path[0]) for line in uevent_dump.splitlines(): linevals = list(map(str.strip, line.split('=', 1))) - device[dev_name].update({linevals[0].title(): linevals[1]}) - pci_slot = device[dev_name]["Pci_Slot_Name"].split(':', 1)[1] - del device[dev_name]["Pci_Slot_Name"] - device[dev_name].update({"Interface": dev_name}) + device_data.update({linevals[0].title(): linevals[1]}) + pci_slot = device_data.pop("Pci_Slot_Name").split(':', 1)[1] + device_data.update({"Interface": dev_name}) pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pci_slot) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -73,9 +79,8 @@ def get_pci_net(): pci_subsystem = "Unknown" except AttributeError: pci_subsystem = "Unknown" - device[dev_name].update({"Device": pci_subsystem}) - device[pci_slot] = device[dev_name] - del device[dev_name] + device_data["Device"] = pci_subsystem + device = {pci_slot: device_data} if "Pci_Id" in device[pci_slot].keys(): if device[pci_slot]["Pci_Id"] in ddp_list: device[pci_slot].update({"Ddp_Support": True}) @@ -89,9 +94,9 @@ def get_pci_net(): if "fvl" not in nic_types: nic_types.append("fvl") - ## Get information about PF/VF and SR-IOV - ## Check for SR-IOV Capabilities - ########## CONTINUE WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO + # Get information about PF/VF and SR-IOV + # Check for SR-IOV Capabilities + # CONTINUE 
WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO totalvfs = check_output("cat %s/sriov_totalvfs" % dev_path[0]) if totalvfs is not None and int(totalvfs) > 0: # PF with SR-IOV enabled @@ -125,11 +130,13 @@ def get_pci_net(): socket_out["Socket"][h]["Device"]["Nic"].update(device) return socket_out + def get_pci_qat(): + global qat_sriov # pylint: disable=W0603 + pf_ids = [] vf_ids = [] socket_out = {"Socket": {}} - global qat_sriov dev_path = "/sys/bus/pci/devices/0000:" pci_devices = check_output("lspci -nmm", split=True) if not pci_devices: @@ -141,11 +148,12 @@ def get_pci_qat(): for vf_id in qat_vf_ids: if vf_id in device: vf_ids.append(device.split()[0]) - if len(pf_ids) == 0 and len(vf_ids) == 0: return None + if len(pf_ids) == 0 and len(vf_ids) == 0: + return None for pf_id in pf_ids: device = {pf_id: {}} qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, pf_id))) - uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pf_id) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -187,7 +195,7 @@ def get_pci_qat(): for vf_id in vf_ids: device = {vf_id: {}} qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, vf_id))) - uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % vf_id) if pci_subsystem: pci_subsystem = pci_subsystem.split(':')[1].strip() @@ -215,19 +223,23 @@ def get_pci_qat(): socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) return socket_out + def get_lscpu(): lscpu_out = {} cpu_info_json = check_output("lscpu -J") - if cpu_info_json is None: return None + if cpu_info_json is None: + return None json_object = json.loads(cpu_info_json) for i in json_object['lscpu']: - 
lscpu_out[i['field'].replace(":","")] = i['data'] + lscpu_out[i['field'].replace(":", "")] = i['data'] return {"lscpu": lscpu_out} + def get_core_info(): socket_out = {"Socket": {}} core_info_csv = check_output("lscpu -p=cpu,core,socket,node,cache") - if core_info_csv is None: return None + if core_info_csv is None: + return None for i in core_info_csv.splitlines(): # CPU, Core, Socket, Node, Cache if i and not i.startswith("#"): @@ -245,7 +257,7 @@ def get_core_info(): socket_out["Socket"].update({socket_id: {"Cores": {}}}) if core_id not in socket_out["Socket"][socket_id]["Cores"]: socket_out["Socket"][socket_id]["Cores"].update({core_id: {"Cpus": []}}) - #print(socket_out["Socket"][socket_id]["Core"][core_id]) + # print(socket_out["Socket"][socket_id]["Core"][core_id]) socket_out["Socket"][socket_id]["Cores"][core_id]["Cpus"].append(cpu_id) if node_id is not None and "Node" not in socket_out["Socket"][socket_id]["Cores"][core_id].keys(): socket_out["Socket"][socket_id]["Cores"][core_id].update({"Node": node_id}) @@ -253,10 +265,12 @@ def get_core_info(): socket_out["Socket"][socket_id]["Cores"][core_id].update({"Cache": cache}) return socket_out + def get_socket_mem_info(): socket_out = {"Socket": {}} mem_nodes = check_output("ls -1 /sys/devices/system/node/node*/meminfo", split=True) - if mem_nodes is None: return None + if mem_nodes is None: + return None for i in mem_nodes: socket = int(i.split("/")[5].lstrip('node')) socket_out["Socket"].update({socket: {"Memory": {}}}) @@ -266,37 +280,40 @@ def get_socket_mem_info(): socket_out["Socket"][socket]["Memory"].update({valpair[0].lstrip(':'): valpair[1]}) return socket_out + def get_mem_info(): # Add to full output meminfo_out = {"Memory": {}} mem_info = check_output("cat /proc/meminfo", split=True) - if mem_info is None: return None + if mem_info is None: + return None for i in mem_info: valpair = i.split()[0:2] meminfo_out["Memory"].update({valpair[0].rstrip(':'): valpair[1]}) return meminfo_out + def 
get_host_info(): - hostinfo_out = {"Host": {}} + host_data = {} # consider changing to /etc/os-release if hostnamectl is not common host_info = check_output("hostnamectl", split=True) if host_info: for i in host_info: value = i.split(':', 1)[1].strip() if "Static hostname" in i: - hostinfo_out["Host"].update({"Hostname": value}) + host_data.update({"Hostname": value}) elif "Operating System" in i: - hostinfo_out["Host"].update({"OS": value}) + host_data.update({"OS": value}) elif "Kernel" in i: - hostinfo_out["Host"].update({"Kernel": value}) + host_data.update({"Kernel": value}) elif "Architecture" in i: - hostinfo_out["Host"].update({"Arch": value}) + host_data.update({"Arch": value}) + codename = check_output("cat /sys/devices/cpu/caps/pmu_name") if codename: - codename = codename.strip() - hostinfo_out["Host"].update({"Codename": codename.title()}) - if not hostinfo_out["Host"].keys(): return None - return hostinfo_out + host_data.update({"Codename": codename.strip().title()}) + return {"Host": host_data} if host_data else None + def get_cpu_arch_codename(cpu_model): cpu_codename_arch = '' @@ -307,12 +324,13 @@ def get_cpu_arch_codename(cpu_model): except yaml.YAMLError as exc: print(exc) if cpu_models is not None: - for arch, obj in cpu_models['architectures'].items(): - for model in cpu_models['architectures'][arch]['models']: + for arch_name, arch_data in cpu_models['architectures'].items(): + for model in arch_data['models']: if model in cpu_model: - cpu_codename_arch = arch + cpu_codename_arch = arch_name return cpu_codename_arch + def get_summary(info: dict): summary = {} # summarize existing object @@ -326,7 +344,7 @@ def get_summary(info: dict): elif info["Memory"]["Hugepagesize"] == "2048": summary["Hugepage_Size"] = "2M" else: - summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"]+"K" + summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"] + "K" if "lscpu" in info.keys(): if "Model name" in info["lscpu"]: summary["Cpu_Model"] = 
info["lscpu"]["Model name"] @@ -343,11 +361,11 @@ def get_summary(info: dict): summary["Numa_Nodes"] = info["lscpu"]["NUMA node(s)"] if int(summary["Numa_Nodes"]) != 0: for i in range(int(summary["Numa_Nodes"])): - summary["Numa_Node"+str(i)+"_Cpus"] = info["lscpu"]["NUMA node"+str(i)+" CPU(s)"] + summary["Numa_Node" + str(i) + "_Cpus"] = info["lscpu"]["NUMA node" + str(i) + " CPU(s)"] if "Flags" in info["lscpu"]: flags = info["lscpu"]["Flags"].split() for i in feature_flag_summary: - matches = fnmatch.filter(flags,i+"*") + matches = fnmatch.filter(flags, i + "*") if matches: summary[i.title()] = matches if "Virtualization" in info["lscpu"]: @@ -368,15 +386,17 @@ def get_summary(info: dict): summary_out = {"Summary": summary} return summary_out + def main(): + global nic_sriov # pylint: disable=W0603 + global nic_types # pylint: disable=W0603 + global qat_sriov # pylint: disable=W0603 + global nic_ddp # pylint: disable=W0603 + output = {"Socket": {}} - global nic_sriov nic_sriov = False - global nic_types nic_types = [] - global qat_sriov qat_sriov = False - global nic_ddp nic_ddp = False pci_net = get_pci_net() pci_qat = get_pci_qat() @@ -404,5 +424,6 @@ def main(): pprint.pprint(output) return output + if __name__ == "__main__": main() diff --git a/cloud/discovery/profiler.py b/cloud/discovery/profiler.py index 9e0c389a..ff0deaa0 100644 --- a/cloud/discovery/profiler.py +++ b/cloud/discovery/profiler.py @@ -1,14 +1,17 @@ -import discover_local -import yaml -import pprint import os +import pprint import sys +import yaml + +import discover_local # pylint:disable=E0401 + dists = ["RedHat", "Rocky", "Ubuntu"] dist_vers = ['8.5', '20.04', '21.10', '22.04'] # Verify pmu_name for SPR below archs = ["skylake", "cascadelake", "icelake", "sapphirerapids"] + class Features: def __init__(self, plat: dict): self.plat = plat @@ -22,14 +25,14 @@ def __init__(self, plat: dict): def _load_yaml(self, featfile: str): try: - with open(os.path.join(sys.path[0],featfile), 'r') as 
file: + with open(os.path.join(sys.path[0], featfile), 'r') as file: try: output = yaml.safe_load(file) - except yaml.YAMLError as exc: + except yaml.YAMLError: print("Error parsing %s - Exiting" % featfile) sys.exit() return output - except IOError as e: + except IOError: print("Error loading %s - Exiting" % featfile) sys.exit() @@ -41,9 +44,7 @@ def _get_codename(self): print("No Codename information available") return None codename = self.plat["Host"]["Codename"] - if not codename: return None - if codename.lower() not in archs: return None - return codename.lower() + return codename.lower() if codename and codename.lower() in archs else None def _get_nic_types(self): if "Summary" not in self.plat.keys(): @@ -52,8 +53,7 @@ def _get_nic_types(self): if "Nic_Types" not in self.plat["Summary"].keys(): return None nics = self.plat["Summary"]["Nic_Types"] - if not nics: return None - return nics + return nics if nics else None def _check_distro(self): match = False @@ -69,10 +69,10 @@ def _check_distro(self): if dv in self.plat["Host"]["OS"]: match = True break - if match: break - if not match: - return None - return match + if match: + break + return match if match else None + def check_feat_support(key, feats): reqs = feats.feat_reqs @@ -86,6 +86,7 @@ def check_feat_support(key, feats): return False return True + def check_sub_feat_support(key, byo_sub_dict, feats): output_dict = {} reqs = feats.sub_feat_reqs @@ -105,17 +106,18 @@ def check_sub_feat_support(key, byo_sub_dict, feats): output_dict.update({subfeat: True}) else: for subfeat in byo_sub_dict.keys(): - output_dict.update({subfeat: True}) + output_dict.update({subfeat: True}) return output_dict -def byo_check(feats: object): + +def byo_check(feats): output = {} if "build_your_own" not in feats.profiles.keys(): return None byo_list = feats.profiles["build_your_own"].keys() byo_dict = feats.profiles["build_your_own"] for key in byo_list: - if type(byo_dict[key]) == dict: + if isinstance(byo_dict[key], dict): 
feat_support = check_feat_support(key, feats) if feat_support is False: support = "Unsupported" @@ -131,15 +133,17 @@ def byo_check(feats: object): output.update({key: support}) return output + def set_sub_static(subfeats, state): feat_dict = {} for feat in subfeats.keys(): feat_dict.update({feat: state}) return feat_dict + def check_feat(key, feats): - features = ["sriov_operator", "sriov_network_dp", "qat", "qat_dp", "ddp"] # sgx features covered in arch_features - unchecked = ["gpu", "gpu_dp", "name", "on_vms", "vm_mode"] # Consider minio (when not test-mode) and physical storage + features = ["sriov_operator", "sriov_network_dp", "qat", "qat_dp", "ddp"] # sgx features covered in arch_features + unchecked = ["gpu", "gpu_dp", "name", "on_vms", "vm_mode"] # Consider minio (when not test-mode) and physical storage if key in unchecked: return None elif key not in features: @@ -163,7 +167,8 @@ def check_feat(key, feats): return True return False -def check_profiles(profiles: object, byo_feats: dict): + +def check_profiles(profiles: dict, byo_feats: dict): summary = {} for prof in profiles.keys(): if prof == "build_your_own": @@ -182,7 +187,7 @@ def check_profiles(profiles: object, byo_feats: dict): summary[prof]["Features"].update({feat: "Unsupported (CPU/NIC)"}) elif byo_feats[feat] is None: summary[prof]["Features"].update({feat: "Unchecked (TODO)"}) - elif type(profiles[prof][feat]) is dict: + elif isinstance(profiles[prof][feat], dict): subfeat_set = {} if byo_feats[feat] == "Unsupported": summary[prof]["Features"].update({feat: "Unsupported"}) @@ -203,12 +208,11 @@ def check_profiles(profiles: object, byo_feats: dict): if subfeat_set: summary[prof]["Features"].update({feat: subfeat_set}) except KeyError: - print("KeyError (expected): ",feat) + print("KeyError (expected): ", feat) summary[prof]["Features"].update({feat: "Special feature (not in BYO)"}) - summary[prof].update({"Supported": prof_support}) - if not summary: - return None - return summary + 
summary[prof]["Supported"] = prof_support + return summary if summary else None + def main(): platform_info = discover_local.main() @@ -222,10 +226,14 @@ def main(): byo_feats = byo_check(feats) pprint.pprint(byo_feats) full_summary = check_profiles(feats.profiles, byo_feats) + if not full_summary: + print("No support summary") + return pprint.pprint(full_summary) print("Printing support summary:") - for profile in full_summary.keys(): - print(" Profile: %s, Supported: %s" % (profile, full_summary[profile]["Supported"])) + for profile_name, profile_data in full_summary.items(): + print(" Profile: %s, Supported: %s" % (profile_name, profile_data["Supported"])) + if __name__ == "__main__": main() diff --git a/cloud/ssh_connector.py b/cloud/ssh_connector.py index 945f94cb..7bb96bb7 100644 --- a/cloud/ssh_connector.py +++ b/cloud/ssh_connector.py @@ -1,9 +1,11 @@ """Class for SSH connection""" -import sys -import os import io +import os +import sys +import time + import click -from time import sleep + from paramiko import SSHClient, SSHConfig, ProxyCommand, AutoAddPolicy, SSHException, AuthenticationException from scp import SCPClient, SCPException @@ -22,14 +24,14 @@ def __init__(self, ip_address, username, port=22, priv_key=None, gateway=None, t Parameters: ip_address (string): IP address of the remote instance - username (string): User name for autentication in remote instance + username (string): User name for authentication in remote instance port (int): SSH port - priv_key (string): Path to private RSA key for autentication in remote instance + priv_key (string): Path to private RSA key for authentication in remote instance gateway (SSHConnector obj): [optional] SSHConnector object with active SSH connection to gateway for create proxy jump - try_lopp (bool): When connection fails + try_loop (bool): When connection fails - Rerurn: + Return: None """ @@ -76,18 +78,18 @@ def __init__(self, ip_address, username, port=22, priv_key=None, gateway=None, t try: 
self.client.connect(**cfg) ssh_connected = True - except SSHException as e: + except SSHException: click.echo("SSH not available yet. Retrying in 10 seconds.") - sleep(10) + time.sleep(10) else: try: self.client.connect(**cfg) - except AuthenticationException as e: - click.echo("Auth failed: ",e) + except AuthenticationException as exc: + click.echo(f"Auth failed: {exc}", err=True) sys.exit() - except SSHException as ssh_excep: + except SSHException as ssh_exc: click.echo("Cannot connect to instance via SSH", err=True) - click.echo(f"Error message: {ssh_excep}", err=True) + click.echo(f"Error message: {ssh_exc}", err=True) sys.exit() def exec_command(self, command, print_output=False, return_parsed_output=False): @@ -109,7 +111,7 @@ def exec_command(self, command, print_output=False, return_parsed_output=False): stdin, stdout, stderr = self.client.exec_command(command) except SSHException: click.echo(f"During command: {stdin}") - click.echo(f"Error ocured: {stderr}") + click.echo(f"Error occurred: {stderr}") if print_output: for line in iter(lambda: stdout.readline(2048), ""): click.echo(line, nl=False) @@ -133,8 +135,8 @@ def progress(self, filename, size, sent): """ with click.progressbar(length=100, - label=f"Uploading {filename} progress") as prog_bar: - prog_bar.update(float(sent)/float(size)*100) + label=f"Uploading {filename} progress") as prog_bar: + prog_bar.update(float(sent) / float(size) * 100) def copy_file(self, file_path, destination_path, recursive=False): """ @@ -151,8 +153,8 @@ def copy_file(self, file_path, destination_path, recursive=False): scp = SCPClient(self.client.get_transport(), progress=self.progress) try: scp.put(file_path, destination_path, recursive) - except SCPException as error: - click.echo(f"Error during uploading host_var file: {error}", err=True) + except SCPException as exc: + click.echo(f"Error during uploading host_var file: {exc}", err=True) scp.close() def close_connection(self): diff --git a/cloud/sw_deployment/docker_management.py 
b/cloud/sw_deployment/docker_management.py index 2baf866f..6e52c5c8 100644 --- a/cloud/sw_deployment/docker_management.py +++ b/cloud/sw_deployment/docker_management.py @@ -1,44 +1,36 @@ """Class for Docker images management""" +import base64 +import configparser +import json import os +import subprocess # nosec B404 # subprocess is set to shell=False + from pathlib import Path -import configparser -import base64 -import validators + +import boto3 import click import docker -import boto3 -import subprocess # nosec B404 # subrocess is set to shell=False -import json +import validators + + +def subprocess_run(*args, **kwargs): + return subprocess.run(*args, **kwargs) # pylint: disable=W1510 class DockerManagement: """ Class contains methods for copy docker images between registries. """ - docker_client = None - CLOUD = None - to_registry = None - from_registry = None - show_log = False - images_to_replicate = None - tagged_images = [] - - AWS_ACCESS_KEY_ID = None - AWS_ACCESS_SECRET_KEY = None - AWS_REGION = None - CR_PASSWORD = None - CR_USERNAME = None - CR_URL = None def __init__(self, from_registry, to_registry, images_to_replicate, region, cloud=None, show_log=False): """ Init method for class. Parameters: - from_registry (string): URL adress of source registry + from_registry (string): URL address of source registry to_registry (string): URL address of target registry images_to_duplicate (list): List of images to copy between registries - cloud (string): [Not required] Type of cloud with targer registry. Currently supported: ['aws'] + cloud (string): [Not required] Type of cloud with target registry. 
Currently supported: ['aws'] show_log (bool): [Not required] Show log of push image Return: @@ -46,24 +38,29 @@ def __init__(self, from_registry, to_registry, images_to_replicate, region, clou """ self.docker_client = docker.from_env() - self.CLOUD = cloud - self.AWS_REGION = region + self.cloud = cloud + self.aws_region = region + self.aws_access_key_id = None + self.aws_access_secret_key = None + self.cr_password = None + self.cr_username = None + self.cr_url = None self.show_log = show_log self.to_registry = to_registry self.images_to_replicate = images_to_replicate + self.tagged_images = [] if not validators.url(from_registry): click.secho('The source registry does not have a valid URL!', fg='red') return - else: - self.from_registry = from_registry.replace('https://', '') + self.from_registry = from_registry.replace('https://', '') self.images_to_replicate = images_to_replicate click.echo(f"Images to replicate: {self.images_to_replicate}") - if self.CLOUD == "aws": + if self.cloud == "aws": self.initialize_ecr() - elif self.CLOUD == "azure": + elif self.cloud == "azure": self.initialize_acr() def copy_images(self): @@ -80,21 +77,21 @@ def copy_images(self): """ for image in self.images_to_replicate: self.pull_image(registry_url=self.from_registry, - image_name=image) + image_name=image) new_image = self.tag_image(image_name=image, - registry_old= self.from_registry, + registry_old=self.from_registry, registry_new=self.to_registry) self.tagged_images.append(new_image) self.push_image(image=new_image['repository'], tag=new_image['tag'], - registry=self.CR_URL, - username=self.CR_USERNAME, - password=self.CR_PASSWORD) + registry=self.cr_url, + username=self.cr_username, + password=self.cr_password) def initialize_ecr(self): """ - Initializing ECR and getting AWS credentials for autentification in ECR. - Method using local AWS credentials and config files for autentification. + Initializing ECR and getting AWS credentials for authentication in ECR. 
+ Method using local AWS credentials and config files for authentication. Method set the global variables used in previous method. Parameters: @@ -109,37 +106,37 @@ def initialize_ecr(self): try: config.read(aws_credentials) credentials = config['default'] - self.AWS_ACCESS_KEY_ID = credentials['aws_access_key_id'] - self.AWS_ACCESS_SECRET_KEY = credentials['aws_secret_access_key'] + self.aws_access_key_id = credentials['aws_access_key_id'] + self.aws_access_secret_key = credentials['aws_secret_access_key'] except configparser.ParsingError as parser_error: click.secho(parser_error, fg='red') - aws_session = boto3.Session(region_name=self.AWS_REGION) - ecr_client = aws_session.client('ecr', aws_access_key_id=self.AWS_ACCESS_KEY_ID, - aws_secret_access_key=self.AWS_ACCESS_SECRET_KEY, - region_name=self.AWS_REGION) + aws_session = boto3.Session(region_name=self.aws_region) + ecr_client = aws_session.client('ecr', aws_access_key_id=self.aws_access_key_id, + aws_secret_access_key=self.aws_access_secret_key, + region_name=self.aws_region) ecr_credentials = (ecr_client.get_authorization_token()['authorizationData'][0]) - self.CR_USERNAME = "AWS" - self.CR_PASSWORD = (base64.b64decode(ecr_credentials['authorizationToken']) + self.cr_username = "AWS" + self.cr_password = (base64.b64decode(ecr_credentials['authorizationToken']) .replace(b'AWS:', b'').decode('utf-8')) - self.CR_URL = self.to_registry + self.cr_url = self.to_registry def initialize_acr(self): acr_name = self.to_registry.split(".")[0] command = f'az acr login --name {acr_name} --expose-token' - result = subprocess.run(command.split(' '), stdout=subprocess.PIPE) + result = subprocess_run(command.split(' '), stdout=subprocess.PIPE) access_token = json.loads(result.stdout)["accessToken"] - self.CR_PASSWORD = access_token - self.CR_USERNAME = "00000000-0000-0000-0000-000000000000" - self.CR_URL = self.to_registry + self.cr_password = access_token + self.cr_username = "00000000-0000-0000-0000-000000000000" + 
self.cr_url = self.to_registry def pull_image(self, registry_url, image_name, username=None, password=None): """ Downloading image from remote to local registry. - Parametes: - registry_url (string): URL adress of source registry + Parameters: + registry_url (string): URL address of source registry image_name (string): Name of downloaded image username (string): User name for source registry password (string): Password for source registry @@ -172,7 +169,7 @@ def tag_image(self, image_name, registry_old, registry_new): """ image = self.docker_client.images.get(f"{registry_old}/{image_name}") - if self.CLOUD == 'aws': + if self.cloud == 'aws': target_image = registry_new tag = image_name.replace('/', '-').replace(':', '-') else: @@ -196,13 +193,14 @@ def push_image(self, image, tag, registry=None, username=None, password=None): None """ - auth_config = None click.echo("Pushing image:") + auth_config = None if registry is not None and username is not None and password is not None: self.docker_client.login(username=username, password=password, registry=registry) auth_config = {'username': username, 'password': password} + if auth_config is not None: push_log = self.docker_client.images.push(image, tag=tag, auth_config=auth_config) if not self.show_log: diff --git a/cloud/sw_deployment/sw_deployment_tool.py b/cloud/sw_deployment/sw_deployment_tool.py index 3178b491..27783067 100644 --- a/cloud/sw_deployment/sw_deployment_tool.py +++ b/cloud/sw_deployment/sw_deployment_tool.py @@ -1,14 +1,16 @@ """Script for deploying Reference Architecture (RA) on Cloud solutions""" import os +import pathlib +import sys import tarfile + import click -import yaml import jinja2 -import pathlib -import sys +import yaml + sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir))) -from ssh_connector import SSHConnector -from docker_management import DockerManagement +from ssh_connector import SSHConnector # pylint:disable=C0413,E0401 +from docker_management import 
DockerManagement # pylint:disable=C0413,E0401 configuration = { 'cloud_settings': { @@ -45,6 +47,7 @@ nodes_list = [] + @click.command() @click.option('-c', '--config', type=click.Path(dir_okay=False), @@ -76,19 +79,21 @@ def start_deploy(config): None """ - configuration = None + found_config = {} if os.path.exists(config): - configuration = _parse_configuration_file(config) + found_config = _parse_configuration_file(config) - configuration['ra_profile'] = _validate_ra_profile() + if not found_config: + return - _tar_repository(output_filename=TAR_PATH, source_dir=RA_DIR) + found_config['ra_profile'] = _validate_ra_profile() - _deploy(provider=configuration['cloud_settings']['provider'], - ansible_host_ip=configuration['ansible_host_ip'], - ssh_key=configuration['ssh_key'], - ssh_user=configuration['ssh_user'], - custom_ami=configuration['custom_ami']) + _tar_repository(output_filename=TAR_PATH, source_dir=RA_DIR) + _deploy(provider=found_config['cloud_settings']['provider'], + ansible_host_ip=found_config['ansible_host_ip'], + ssh_key=found_config['ssh_key'], + ssh_user=found_config['ssh_user'], + custom_ami=found_config['custom_ami']) def _validate_ra_profile(): @@ -106,8 +111,15 @@ def _validate_ra_profile(): exit() return group_vars.get('profile_name') +def _tar_archive_filter(tarinfo): + exclude = ['venv', '.terraform'] + if tarinfo.isdir() and any(substring in tarinfo.name for substring in exclude): + return None + else: + return tarinfo + def _tar_repository(output_filename, source_dir): - ''' + """ Making tar.gz file that contains the RA repository. Creating a tar file is more convenient for submitting the repo to a cloud instance. 
@@ -119,14 +131,15 @@ def _tar_repository(output_filename, source_dir): Return: None - ''' + """ if os.path.exists(output_filename): os.remove(output_filename) with tarfile.open(output_filename, "w:gz") as tar: - tar.add(source_dir, arcname=os.path.basename(source_dir)) + tar.add(source_dir, arcname=os.path.basename(source_dir), filter=_tar_archive_filter) + def _parse_configuration_file(config): - ''' + """ Get configuration from configuration.yaml If some of the parameters are set through cli, this settings have higher priority. @@ -136,9 +149,7 @@ def _parse_configuration_file(config): Return: dict:Configuration dictionary - - ''' - file_configuration = None + """ if not os.path.exists(config): return None with open(config, 'r', encoding="UTF-8") as stream: @@ -152,8 +163,9 @@ def _parse_configuration_file(config): configuration[item] = file_configuration[item] return configuration + def _remove_ssh_banner(ssh_client, node_ips_array, user): - ''' + """ Remove SSH for enabling root login via SSH. Using root is necessary for Ansible playbooks. @@ -165,7 +177,7 @@ def _remove_ssh_banner(ssh_client, node_ips_array, user): Return: None - ''' + """ for node_ip in node_ips_array: ssh_client.exec_command(f"ssh-keyscan -H {node_ip} >> /home/ubuntu/.ssh/known_hosts") if node_ip != "127.0.0.1": @@ -184,7 +196,7 @@ def _remove_ssh_banner(ssh_client, node_ips_array, user): def _install_dependencies_on_nodes(ssh_client, node_ips_array): - ''' + """ Installing lspci and golang as RA dependencies. 
Parameters: @@ -194,7 +206,7 @@ def _install_dependencies_on_nodes(ssh_client, node_ips_array): Return: None - ''' + """ for node_ip in node_ips_array: if node_ip != "127.0.0.1": ssh_node = SSHConnector(ip_address=node_ip, @@ -209,8 +221,9 @@ def _install_dependencies_on_nodes(ssh_client, node_ips_array): ssh_client.exec_command(command='yum makecache && yum -y install pciutils.x86_64 golang', print_output=True) + def _discovery_nodes(ssh_client, root_user, node_ips, node_type): - ''' + """ Creating array with information of Ansible nodes. Parameters: @@ -220,7 +233,7 @@ def _discovery_nodes(ssh_client, root_user, node_ips, node_type): Return: None - ''' + """ for node_ip in node_ips: ssh_client.exec_command(f"ssh-keyscan -H {node_ip} >> /home/ubuntu/.ssh/known_hosts") if node_ip != "127.0.0.1": @@ -240,6 +253,7 @@ def _discovery_nodes(ssh_client, root_user, node_ips, node_type): nodes_list.append(node) + def _create_inventory_file(ssh_client, nodes): """ Creating inventory file for RA Ansible with information @@ -279,6 +293,7 @@ def _create_host_var_files(ssh_client, hosts): ssh_client.copy_file(file_path=os.path.join(RA_DIR, "host_vars", "node1.yml"), destination_path=f"{RA_REMOTE_PATH}/host_vars/{host['host_name']}.yml") + def _docker_login(node_ips, ssh_client, user, registry, registry_username, password): """ Login to private AWS ECR. @@ -300,6 +315,7 @@ def _docker_login(node_ips, ssh_client, user, registry, registry_username, passw ssh_node.exec_command(command=f"docker login {registry} --username {registry_username} --password {password}", print_output=True) ssh_node.close_connection() + def cleanup(config): """ Cleanup function. 
@@ -321,13 +337,14 @@ def cleanup(config): client = SSHConnector(ip_address=configuration['ansible_host_ip'], username='ubuntu', priv_key=configuration['ssh_key']) for image in configuration['exec_containers']: - image_name = image.replace('/','-') + image_name = image.replace('/', '-') click.echo(f"Deleting pod: {image_name}") client.exec_command(command=f"kubectl delete {image_name}", print_output=True) client.exec_command(f"cd {RA_REMOTE_PATH} && ansible-playbook -i inventory.ini ./playbooks/redeploy_cleanup.yml") client.exec_command(f"rm {RA_REMOTE_PATH} -rf") + def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): """ Function for deploy process of RA. @@ -348,9 +365,9 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): click.echo("-------------------") click.secho("Copy private SSH key to Ansible instance", fg="yellow") - client.copy_file(file_path=ssh_key, destination_path=f"/home/ubuntu/cwdf_deployment/ssh/id_rsa") + client.copy_file(file_path=ssh_key, destination_path="/home/ubuntu/cwdf_deployment/ssh/id_rsa") - client.exec_command(f"sudo chmod 600 /home/ubuntu/cwdf_deployment/ssh/id_rsa") + client.exec_command("sudo chmod 600 /home/ubuntu/cwdf_deployment/ssh/id_rsa") click.echo("-------------------") click.secho("Copy RA repo as tar.gz file to Ansible instance", fg="yellow") @@ -376,21 +393,22 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): click.echo("-------------------") click.secho("Install cert-manager in EKS cluster", fg="yellow") - commands = f"""helm repo add jetstack https://charts.jetstack.io && \ - helm repo update && \ - helm install cert-manager jetstack/cert-manager \ - --namespace cert-manager \ - --create-namespace \ - --version v1.10.0 \ - --set installCRDs=true - """ + commands = ( + "helm repo add jetstack https://charts.jetstack.io && " + "helm repo update && " + "helm install cert-manager jetstack/cert-manager " + "--namespace cert-manager " + "--create-namespace " + 
"--version v1.10.0 " + "--set installCRDs=true" + ) client.exec_command(commands, print_output=True) click.echo("-------------------") click.secho("Install Multus in EKS cluster", fg="yellow") - commands= f"""kubectl apply -f \ - https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v3.9.2/deployments/multus-daemonset.yml + commands = """kubectl apply -f \ + https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset-thick.yml """ client.exec_command(commands, print_output=True) @@ -398,7 +416,7 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): if provider == 'aws': click.echo("-------------------") click.secho("Install Kubernetes Metrics Server", fg="yellow") - commands= f"""kubectl apply -f \ + commands = """kubectl apply -f \ https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml """ @@ -410,36 +428,36 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): client.copy_file(file_path=EKS_PATCH_PATH, destination_path=f"/tmp/{EKS_PATCH_NAME}") client.exec_command(f"kubectl patch ds aws-node -n kube-system --patch-file /tmp/{EKS_PATCH_NAME}") - registry_local_address = "" if provider == 'aws': - registry_local_address = "/".join(configuration['replicate_to_container_registry'].split("/")[:-1]) + registry_local_address = str(configuration['replicate_to_container_registry']).rsplit("/", maxsplit=1)[0] + commands = ( + f'aws ecr get-login-password --region {configuration["cloud_settings"]["region"]} | ' + 'REGISTRY_AUTH_FILE="/home/ubuntu/.crauth" ' + f'podman login -u AWS --password-stdin {registry_local_address}' + ) else: - registry_local_address = configuration['replicate_to_container_registry'] + registry_local_address = str(configuration['replicate_to_container_registry']) + commands = ( + f'az acr login --name {registry_local_address.split(".", maxsplit=1)[0]} --expose-token --output tsv --query accessToken | ' + 
'REGISTRY_AUTH_FILE="/home/ubuntu/.crauth" ' + f'podman login -u 00000000-0000-0000-0000-000000000000 --password-stdin {registry_local_address}' + ) click.echo("-------------------") click.secho("Update container registry credentials", fg="yellow") - if provider == 'aws': - region = configuration['cloud_settings']['region'] - commands = f"""aws ecr get-login-password --region {region} | \ - REGISTRY_AUTH_FILE="/home/ubuntu/.crauth" podman login -u AWS --password-stdin {registry_local_address} - """ - else: - registry_name = registry_local_address.split(".")[0] - commands = f"""az acr login --name {registry_name} --expose-token --output tsv --query accessToken | \ - REGISTRY_AUTH_FILE="/home/ubuntu/.crauth" podman login -u 00000000-0000-0000-0000-000000000000 --password-stdin {registry_local_address} - """ + client.exec_command(command=commands, print_output=True) click.echo("-------------------") - click.secho("Creating invenotry file", fg="yellow") + click.secho("Creating inventory file", fg="yellow") _create_inventory_file(client, nodes_list) click.secho("\nInitializing RA repository", fg="yellow") commands = f"""cd {RA_REMOTE_PATH} && \ - git submodule update --init && \ python3 -m venv --copies --clear venv && \ - venv/bin/pip install -r requirements.txt + venv/bin/pip install -r requirements.txt && \ + venv/bin/ansible-galaxy install -r collections/requirements.yml """ - + client.exec_command(command=commands, print_output=True) click.secho("\nCreating host_var files", fg="yellow") @@ -456,20 +474,21 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): click.secho("Selected profile:", fg="yellow") click.secho(configuration['ra_profile'], fg="green") - ansible_playbook_commnads = f"""cd {RA_REMOTE_PATH} && \ + ansible_playbook_commands = f"""cd {RA_REMOTE_PATH} && \ + venv/bin/ansible-playbook -i inventory.ini playbooks/k8s/patch_kubespray.yml + venv/bin/ansible-playbook -i inventory.ini -e registry_local_address={registry_local_address} 
playbooks/{configuration['ra_profile']}.yml """ - client.exec_command(command=ansible_playbook_commnads, print_output=True) + client.exec_command(command=ansible_playbook_commands, print_output=True) click.echo("-------------------") click.secho("Remove private SSH key from Ansible instance", fg="yellow") - client.exec_command(f"sudo rm /home/ubuntu/cwdf_deployment/ssh/id_rsa") + client.exec_command("sudo rm /home/ubuntu/cwdf_deployment/ssh/id_rsa") client.close_connection() if (configuration['replicate_from_container_registry'] is not None and - configuration['replicate_to_container_registry'] is not None and - configuration['exec_containers']): + configuration['replicate_to_container_registry'] is not None and + configuration['exec_containers']): click.echo("-------------------") click.secho("Copy Docker images to cloud registry") ssh_client = SSHConnector(ip_address=ansible_host_ip, username='ubuntu', priv_key=ssh_key) @@ -487,8 +506,8 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): ssh_client=ssh_client, user='root', registry=configuration['replicate_to_container_registry'], - registry_username=docker_mgmt.CR_USERNAME, - password=docker_mgmt.CR_PASSWORD) + registry_username=docker_mgmt.cr_username, + password=docker_mgmt.cr_password) for image in configuration['exec_containers']: image_name = docker_mgmt.tagged_images[configuration['exec_containers'].index(image)]['repository'] @@ -497,5 +516,7 @@ def _deploy(provider, ansible_host_ip, ssh_key, ssh_user, custom_ami): ssh_client.exec_command(command=f"kubectl run {pod_name} --image={image_name} -n default", print_output=True) ssh_client.close_connection() + if __name__ == '__main__': - main() + # TODO get the config from... where? 
+ main(config=None) diff --git a/collections/requirements.yml b/collections/requirements.yml new file mode 100644 index 00000000..3db125cb --- /dev/null +++ b/collections/requirements.yml @@ -0,0 +1,4 @@ +collections: + - name: https://github.com/kubernetes-sigs/kubespray + type: git + version: f9f5143c93f583541ccb6650eb008f7ef3d1bc3c diff --git a/docs/emr.md b/docs/emr.md new file mode 100644 index 00000000..148be4c4 --- /dev/null +++ b/docs/emr.md @@ -0,0 +1,17 @@ +# EMR platform configuration guide + +This guide introduces how to enable RA on the Intel EMR platforms. + +## BMRA configuration +### QAT Driver +Download the EMR QAT driver package and put it in the ``/tmp/emr_qat/`` folder on the Ansible host machine. Then configure the QAT related operations in the files in the ``group_vars`` and ``host_vars`` referring to the security session in the below url + + +### DPDK driver +To align with the EMR BKC ingredient version, the EMR platform uses the ```DPDK 22.11.1``` LTS version. + +## VMRA configuration +Not supported yet, to be done. + +## Cloud RA configuration +Not supported yet, to be done. 
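The `collections/requirements.yml` hunk above pins the Kubespray collection to a full commit hash rather than a branch or tag. As a rough illustration only (a hypothetical helper, not part of this change set), a check like the following could flag any entry whose `version` is not a 40-character SHA. The function name and the regex-based parsing are assumptions made to keep the sketch dependency-free; a real check would more likely load the file with a YAML parser.

```python
import re

# Content of collections/requirements.yml as added in this change set.
REQUIREMENTS = """\
collections:
  - name: https://github.com/kubernetes-sigs/kubespray
    type: git
    version: f9f5143c93f583541ccb6650eb008f7ef3d1bc3c
"""

# A full git commit hash: exactly 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_versions(text):
    """Return every `version:` value that is not a full commit SHA.

    Hypothetical helper: it scans the text line by line with a regex
    instead of parsing YAML, which is enough for this simple layout.
    """
    versions = re.findall(r"^\s*version:\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return [v for v in versions if not FULL_SHA.fullmatch(v)]


if __name__ == "__main__":
    # An empty list means every collection is pinned to a commit.
    print(unpinned_versions(REQUIREMENTS))
```

Pinning to a commit keeps `ansible-galaxy install -r collections/requirements.yml` reproducible, which matters here because a patch playbook is applied on top of the installed Kubespray collection.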
diff --git a/docs/generate_profiles.md b/docs/generate_profiles.md index 05a81ff4..4c5157a3 100644 --- a/docs/generate_profiles.md +++ b/docs/generate_profiles.md @@ -15,42 +15,42 @@ a) Non-invasive virtual environment using pipenv - ```bash - pip3 install pipenv - pipenv install - # Then to run and use the environment - pipenv shell - ``` +```bash +pip3 install pipenv +pipenv install +# Then to run and use the environment +pipenv shell +``` b) Non-invasive virtual environment using venv - ```bash - python3 -m venv venv - # Then to activate new virtual environment - source venv/bin/activate - # Install dependencies in venv - pip3 install -r requirements.txt - ``` +```bash +python3 -m venv venv +# Then to activate new virtual environment +source venv/bin/activate +# Install dependencies in venv +pip3 install -r requirements.txt +``` c) System wide environment (not recommended) - ```bash - pip3 install -r requirements.txt - ``` +```bash +pip3 install -r requirements.txt +``` ## Creating Sample Profiles To create sample profiles one of the following commands must be executed: - ```bash - make - ``` +```bash +make +``` or - ```bash - make examples - ``` +```bash +make examples +``` After successful profiles generation, the results might be investigated in the `examples` directory. 
The three directories should be visible: @@ -87,7 +87,6 @@ At the moment, Container Experience Kits supports the following profiles: * access * basic -* full_nfv * on_prem * regional_dc * remote_fp @@ -105,47 +104,47 @@ At the moment, Container Experience Kits supports the following optional configu ## Example Commands -To generate files needed for deployment of `full_nfv` profile, for `Sapphire Rapids` machines, in `k8s` mode, with `cvl` Ethernet Network Adapter the following command must be executed: +To generate files needed for deployment of `remote_fp` profile, for `Sapphire Rapids` machines, in `k8s` mode, with `cvl` Ethernet Network Adapter the following command must be executed: - ```bash - make k8s-profile PROFILE=full_nfv ARCH=spr NIC=cvl - ``` +```bash +make k8s-profile PROFILE=remote_fp ARCH=spr NIC=cvl +``` To generate the same profile as above, but for `vm` mode, run: - ```bash - make vm-profile PROFILE=full_nfv ARCH=spr NIC=cvl - ``` +```bash +make vm-profile PROFILE=remote_fp ARCH=spr NIC=cvl +``` For Cloud RA, the architecture must be set according to the target machine types. In most cases, machines will be either `SkyLake` or `CascadeLake`. If different machine types are being used, the earliest architecture must be selected. At this time, changing the value of the Ethernet Network Adapter does not have an impact for Cloud RA. -To generate the `full_nfv` profile for `cloud` mode, targeting a mix of `SkyLake` and `CascadeLake` machines, run: +To generate the `remote_fp` profile for `cloud` mode, targeting a mix of `SkyLake` and `CascadeLake` machines, run: - ```bash - make cloud-profile PROFILE=full_nfv ARCH=skl - ``` +```bash +make cloud-profile PROFILE=remote_fp ARCH=skl +``` The values of `PROFILE`, `ARCH` and `NIC` parameters are up to you. Please update accordingly. 
If you run multiple of the above commands, you should see backups folder in your project root directory: - ```bash - ls backups/ - ``` -> **_NOTE:_** Above command will result in an output similar to this: "container-experience-kits$ backups/full_nfv_20221114_141523/" and within the mentioned folder location "group_vars, host_vars, inventory.ini" files can be found which will be backups of earlier prepared deployments. +```bash +ls backups/ +``` +> **_NOTE:_** Above command will result in an output similar to this: "container-experience-kits$ backups/remote_fp_20221114_141523/" and within the mentioned folder location "group_vars, host_vars, inventory.ini" files can be found which will be backups of earlier prepared deployments. Backups folder is created so that earlier prepared deployments are not lost. That way, you can easily switch between profiles deployment. Each backup contains a unique timestamp. To clean files and directories that were created by make commands, please run: - ```bash - make clean - ``` +```bash +make clean +``` This command will not remove backup directories. 
If you would like to remove all generated files and directories, please run: - ```bash - make clean-all - ``` +```bash +make clean-all +``` ## Playbook Generation diff --git a/docs/ipu_inventory_example.ini b/docs/ipu_inventory_example.ini new file mode 100644 index 00000000..46dca2ff --- /dev/null +++ b/docs/ipu_inventory_example.ini @@ -0,0 +1,27 @@ +[all] +localhost ansible_connection=local ansible_python_interpreter=/usr/bin/python3 +ipu_host_machine ansible_host= ip= ansible_user=root ansible_password= +ipu_link_partner ansible_host= ip= ansible_user=root ansible_password= ipmi_ip= ipmi_user=bmcuser ipma_password='' + +[ipu_host] +ipu_host_machine + +[ipu_linkp] +ipu_link_partner + +[vm_host] + +[kube_control_plane] + +[etcd] + +[kube_node] + +[k8s_cluster:children] +kube_control_plane +kube_node + +[all:vars] +ansible_python_interpreter=/usr/bin/python3 +ipu_1gbe_connected_to_linkp=true +ipu_1gbe_link_interface=eno2 diff --git a/docs/ipu_setup.md b/docs/ipu_setup.md new file mode 100644 index 00000000..eb06a79f --- /dev/null +++ b/docs/ipu_setup.md @@ -0,0 +1,113 @@ +# IPU setup user guide + +This guide introduces how to enable Intel(R) Infrastructure Processing Unit (Intel(R) IPU) inside RA environment. 
+IPU consists of two main blocks: Integrated Management Complex (IMC) and Acceleration Compute Complex (ACC) + + +## IPU physical setup - Config L + +Two physical machines are needed for this setup: 'IPU host' and 'IPU link partner' +Both machines are connected to the management network +IPU board has five different connection points: + - IPU board is inserted into PCIe slot on Xeon IPU host + - IPU board is connected with IPU host via power connection + - IPU board is connected via 1GbE connection to IPU link partner host + - IPU board is connected via serial Mini USB connection to IPU link partner host + - IPU board is connected via high speed connection to CVL NIC on IPU link partner host + + +## IPU pre-built images + +Please ask Intel Support to get tarballs with 'imc' and 'mev-rl' images. +Insert those tarballs into directory /tmp/ipu on your Ansible host + + +## IPU inventory preparation + +Copy docs/ipu_inventory_example.ini to ipu_inventory.ini and update all relevant fields for your environment + +ipu_host_machine ansible_host= ip= ansible_user=root ansible_password= +ipu_link_partner ansible_host= ip= ansible_user=root ansible_password= ipmi_ip= ipmi_user=bmcuser ipma_password='' + +NOTE: If you need to connect the 1GbE connection from the IPU board to the IPU host machine instead of the IPU link partner (for any reason) + then you need to change variable ipu_1gbe_connected_to_linkp inside inventory from true to false + + +## IPU group_vars + +IPU deployment uses group_vars/all.yml from standard RA deployment. +So, if deployment is executed behind a proxy then the corresponding proxy configuration has to be done. 
+ + +## IPU deployment + +To start IPU deployment run the following command: + +``` +ansible-playbook -i ipu_inventory.ini playbooks/infra/prepare_ipu.yml --flush-cache -vv 2>&1 | tee ipu_deployment.log +``` + +Here is a summary of a successful deployment: + +PLAY RECAP *********************************************************************************************************** +ipu_link_partner : ok=49 changed=8 unreachable=1 failed=0 skipped=5 rescued=0 ignored=0 +ipu_host_machine : ok=13 changed=0 unreachable=0 failed=0 skipped=11 rescued=0 ignored=0 +mev-acc-rl : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 +mev-imc : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 +********************************************************************************************************************* + +NOTE: Unreachable=1 for ipu_link_partner is an expected state. It is caused by an Ansible error and has no impact on deployment + + +### IPU deployment progress monitoring + +IPU deployment progress can be monitored via remote console for IPU host and via minicom connections to IMC and ACC from the IPU link partner machine +The minicom tool is installed in the initial phase of deployment, so it is available after approximately one minute. + +``` +minicom IMC +then press enter +``` + +``` +minicom ACC +then press enter +``` + +NOTE: To escape from minicom use 'Ctrl-a q' and then enter + To get minicom help use 'Ctrl-a z' + + +Finished deployment on ACC: + +MEV ACC mev-hw-b0-ci-ts.release.4988 mev-acc-rl - +mev-acc-rl login: + + +Finished deployment on IMC: + +INFO: ######## System booted successfully! ######## +MEV IMC MEV-HW-B0-CI-ts.release.4988 mev-imc /dev/ttyS0 +mev-imc login: + + +### IPU post deployment accessibility + +During IPU deployment, an updated version of the inventory is generated: ipu_inventory_mev.ini +It contains records for IMC and ACC and corresponding host groups. +That inventory can be used to install/configure additional software on IMC and ACC. 
+ +The SSH config on the Ansible host is updated with records for IMC and ACC as well. +This allows access to IMC and ACC via SSH. + +For IMC: + +``` +ssh mev-imc +``` + +For ACC: + +``` +ssh mev-acc-rl +``` diff --git a/docs/sigstore_policy_controller.md b/docs/sigstore_policy_controller.md new file mode 100644 index 00000000..571fbc56 --- /dev/null +++ b/docs/sigstore_policy_controller.md @@ -0,0 +1,174 @@ +# Policy Controller + +The `policy-controller` admission controller can be used to enforce policy on a Kubernetes cluster based on verifiable supply-chain metadata from `cosign`. + +`policy-controller` also resolves the image tags to ensure the image being run is not different from when it was admitted. + +See the [installation instructions](https://docs.sigstore.dev/policy-controller/installation) for more information. + +Today, `policy-controller` can automatically validate signatures and +attestations on container images. +Enforcement is configured on a per-namespace basis, and multiple keys are supported. + +We're actively working on more features here. + +For more information about the `policy-controller`, have a look at our documentation website [here](https://docs.sigstore.dev/policy-controller/overview). + +## Examples + +Please see the [examples/](./examples/) directory for example policies etc. + +## Policy Testing + +This repo includes a `policy-tester` tool which enables checking a policy against +various images. 
+ +In the root of this repo, run the following to build: +``` +make policy-tester +``` + +Then run it pointing to a YAML file containing a ClusterImagePolicy, and an image to evaluate the policy against: +``` +(set -o pipefail && \ + ./policy-tester \ + --policy=test/testdata/policy-controller/tester/cip-public-keyless.yaml \ + --image=ghcr.io/sigstore/cosign/cosign:v1.9.0 | jq) +``` + +## Support Policy + +This policy-controller's versions are able to run in the following versions of Kubernetes: + +| | policy-controller `> 0.2.x` | +|---|:---:| +| Kubernetes 1.22 | ✓ | +| Kubernetes 1.23 | ✓ | +| Kubernetes 1.24 | ✓ | +| Kubernetes 1.25 | ✓ | +| Kubernetes 1.26 | ✓ | + +note: not fully tested yet, but can be installed + +## Release Cadence + +We are intending to move to a monthly cadence for minor releases. +Minor releases will be published around the beginning of the month. +We may cut a patch release instead, if the changes are small enough not to warrant a minor release. +We will also cut patch releases periodically as needed to address bugs. 
+ +## Security + +Should you discover any security issues, please refer to sigstore's [security +process](https://github.com/sigstore/community/blob/main/SECURITY.md) + + +########################################################################################################### + + +## Additional info added by RA integration + +## Enable policy-controller feature before deploying RA + +Edit group_vars/all.yml to make sure sigstore_policy_controller_install: true + +After BMRA is deployed, run the container image signing tests below: + +## Private/Public Key based policy + +Below is an example to sign an image: +``` +docker pull nginx:latest +docker tag nginx:latest :30500/key/nginx-signed:latest +docker push :30500/key/nginx-signed:latest +cosign sign --key k8s://my-cosign-namespace/cosign-key :30500/key/nginx-signed:latest +``` +Let's push an unsigned image to compare: +``` +docker tag nginx:latest :30500/key/nginx-unsigned:latest +docker push :30500/key/nginx-unsigned:latest +``` + +## Successful case when image was signed correctly + +Deploy the signed image: +``` +kubectl apply -f - << EOF +apiVersion: v1 +kind: Pod +metadata: + name: nginx-pubkey + namespace: my-cosign-namespace +spec: + containers: + - name: nginx + image: :30500/key/nginx-signed:latest + imagePullPolicy: Always + imagePullSecrets: + - name: container-registry-secret +EOF +``` +It will say: +pod/nginx-pubkey created + +## Failed case when image is not signed + +Deploy the unsigned image: +``` +kubectl apply -f - << EOF +apiVersion: v1 +kind: Pod +metadata: + name: nginx-unsigned + namespace: my-cosign-namespace +spec: + containers: + - name: nginx + image: :30500/key/nginx-unsigned:latest + imagePullPolicy: Always + imagePullSecrets: + - name: container-registry-secret +EOF +``` +It will say: +Error from server (BadRequest): admission webhook "policy.sigstore.dev" denied the request: validation failed: failed policy: ra-test-image-policy-pub-key: spec.containers[0].image 
+ra-test:30500/key/nginx-unsigned@sha256:942ae2dfd73088b54d7151a3c3fd5af038a51c50029bfcfd21f1e650d9579967 signature +keyless validation failed for authority authority-0 for ra-test:30500/key/nginx-unsigned@sha256:942ae2dfd73088b54d7151a3c3fd5af038a51c50029bfcfd21f1e650d9579967: no matching signatures: + +## Keyless based policy +We use an external container registry for example, but you can also use k8s local registry. + +``` +kubectl apply -f - << EOF +apiVersion: policy.sigstore.dev/v1alpha1 +kind: ClusterImagePolicy +metadata: + name: ghcr-io-image-policy-keyless +spec: + images: + - glob: ghcr.io/alekdu/** + authorities: + - keyless: + url: https://fulcio.sigstore.dev + identities: + - issuerRegExp: https://github.com/login/oauth + subjectRegExp: alek.du@intel.com +EOF +``` +## Successful case when image was signed correctly +``` +kubectl run signed-nginx --image=ghcr.io/alekdu/nginx-keyless-signed:latest --namespace my-cosign-namespace +``` + +It will say: +pod/signed-nginx created + +## Failed case when image is not signed +``` +kubectl run signed-nginx --image=ghcr.io/alekdu/nginx-keyless-unsigned:latest --namespace my-cosign-namespace +``` + +It will say: +Error from server (BadRequest): admission webhook "policy.sigstore.dev" denied the request: validation failed: failed policy: ghcr-io-image-policy-keyless: spec.containers[0].image +ghcr.io/alekdu/nginx-keyless-unsigned@sha256:942ae2dfd73088b54d7151a3c3fd5af038a51c50029bfcfd21f1e650d9579967 signature +keyless validation failed for authority authority-0 for ghcr.io/alekdu/nginx-keyless-unsigned@sha256:942ae2dfd73088b54d7151a3c3fd5af038a51c50029bfcfd21f1e650d9579967: no matching signatures: diff --git a/generate/playbook_templates/infra_playbook.j2 b/generate/playbook_templates/infra_playbook.j2 index 0b2a5821..9d31fe39 100644 --- a/generate/playbook_templates/infra_playbook.j2 +++ b/generate/playbook_templates/infra_playbook.j2 @@ -1,16 +1,15 @@ --- # apply common cluster node configuration - hosts: 
k8s_cluster,vm_host - tasks: [] + handlers: + - name: reboot server + reboot: { reboot_timeout: 1200 } pre_tasks: - name: End play for VM host meta: end_host when: - "'vm_host' in group_names" - on_vms | default(false) | bool - handlers: - - name: reboot server - reboot: { reboot_timeout: 1200 } roles: - role: cluster_defaults tags: always @@ -39,24 +38,39 @@ - role: bootstrap/configure_dns when: - ansible_distribution == "Ubuntu" and dns_disable_stub_listener | default(true) | bool - - not vm_enabled or on_vms | default(false) | bool - role: bootstrap/golang_install tags: golang-install + - role: bootstrap/configure_docker_daemon + tags: docker + post_tasks: + - name: Execute handlers + ansible.builtin.meta: flush_handlers + - name: Check for failure in handlers execution + ansible.builtin.assert: + that: >- + ( + on_vms | default(false) and + ansible_play_hosts_all | difference(groups['vm_host']) | difference(ansible_play_hosts) | length() == 0 + ) or + ( + not on_vms | default(false) and + ansible_play_hosts_all | difference(ansible_play_hosts) | length() == 0 + ) + msg: Failure detected in handlers execution. Please look for failures in previous tasks. 
environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true # apply worker node kernel configuration - hosts: kube_node,vm_host - tasks: [] + handlers: + - name: reboot server + reboot: { reboot_timeout: 1200 } pre_tasks: - name: End play for VM host meta: end_host when: - "'vm_host' in group_names" - on_vms | default(false) | bool - handlers: - - name: reboot server - reboot: { reboot_timeout: 1200 } roles: - role: cluster_defaults tags: defaults @@ -64,6 +78,28 @@ when: - vm_enabled | default(false) | bool - not on_vms | default(false) | bool + - name: check_machine_type + tags: + - sgx + - intel-platform-sgx-setup + - kmra + - istio-service-mesh + - sst + - apply-intel-pstate + when: + - configure_sgx | default(false) | bool or + kmra.oran.enabled | default(false) | bool or + kmra.pccs.enabled | default(false) | bool or + kmra.apphsm.enabled | default(false) | bool or + kmra.ctk_loadkey_demo.enabled | default(false) | bool or + istio_service_mesh.enabled | default(true) | bool or + sst_bf_configuration_enabled | default(false) | bool or + sst_cp_configuration_enabled | default(false) | bool or + sst_tf_configuration_enabled | default(false) | bool or + sst_pp_configuration_enabled | default(false) | bool or + intel_pstate_enabled | default(true) | bool + - role: bootstrap/install_realtime_kernel + when: rt_kernel_enabled | default(false) | bool - role: bootstrap/configure_hugepages tags: - hugepages @@ -75,23 +111,25 @@ when: cpusets_enabled | default(false) | bool - role: bootstrap/configure_intel_pstate when: intel_pstate_enabled | default(true) | bool - - role: bootstrap/configure_cstates - when: cstate_enabled | default(false) | bool - - role: bootstrap/configure_ufs - when: ufs_enabled | default(false) | bool + - role: bootstrap/reset_qat_option + tags: + - reset_qat_option + - intel-platform-qat-setup + when: + - update_qat_drivers | default(false) | bool - role: bootstrap/auto_detect_qat_devices tags: - auto-detect-qat-device - 
intel-platform-qat-setup when: - - update_qat_drivers | default(false) | bool + - configure_qat | default(false) | bool - qat_devices | default([]) | length == 0 - role: bootstrap/set_sriov_kernel_flags tags: - setup-sriov - intel-platform-qat-setup when: - - iommu_enabled | default(true) | bool or on_vms | default(false) | bool + - iommu_enabled | default(true) | bool - not ((configure_dlb_devices is defined and configure_dlb_devices) or (configure_dsa_devices is defined and configure_dsa_devices)) - role: bootstrap/set_siov_kernel_flags @@ -109,7 +147,7 @@ when: telegraf_enabled | default(true) | bool - role: bootstrap/set_intel_flexran_kernel_flags when: intel_flexran_enabled | default(false) | bool -{%- if playbook_name in ['full_nfv', 'remote_fp', 'on_prem', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'remote_fp', 'on_prem', 'on_prem_vss', 'build_your_own'] %} - role: bootstrap/configure_sst tags: sst when: @@ -120,37 +158,60 @@ sst_tf_configuration_enabled | default(false) | bool or sst_pp_configuration_enabled | default(false) | bool - not vm_enabled or on_vms | default(false) | bool -{%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'build_your_own'] %} + - role: bootstrap/set_pcie_kernel_flags + when: + - configure_fpga | default(false) | bool + - not vm_enabled or on_vms | default(false) | bool - role: bootstrap/install_gpu_driver when: - configure_gpu | default(false) | bool - not vm_enabled or on_vms | default(false) | bool -{%- endif %} + - role: bootstrap/configure_fpga + when: + - configure_fpga | default(false) | bool + - not vm_enabled or on_vms | default(false) | bool +{% endif %} - role: bootstrap/update_grub tags: - grub-update - intel-platform-qat-setup + post_tasks: + - name: Execute handlers + ansible.builtin.meta: flush_handlers + - name: Check for failure in handlers execution + 
ansible.builtin.assert: + that: >- + ( + on_vms | default(false) and + ansible_play_hosts_all | difference(groups['vm_host']) | difference(ansible_play_hosts) | length() == 0 + ) or + ( + not on_vms | default(false) and + ansible_play_hosts_all | difference(ansible_play_hosts) | length() == 0 + ) + msg: Failure detected in handlers execution. Please look for failures in previous tasks. environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true -{% if playbook_name in ['full_nfv', 'access', 'on_prem', 'remote_fp', 'build_your_own'] -%} +{% if playbook_name in ['full_nfv', 'access', 'on_prem', 'on_prem_vss', 'remote_fp', 'build_your_own'] %} # install worker node qat software - hosts: kube_node,vm_host - tasks: [] + handlers: + - name: reboot server + reboot: { reboot_timeout: 1200 } pre_tasks: - name: End play for VM host meta: end_host when: - "'vm_host' in group_names" - on_vms | default(false) | bool - handlers: - - name: reboot server - reboot: { reboot_timeout: 1200 } roles: - role: cluster_defaults tags: defaults - role: bootstrap/apply_intel_pstate + tags: apply-intel-pstate when: intel_pstate_enabled | default(true) | bool - role: bootstrap/install_qat_drivers_services tags: @@ -169,13 +230,27 @@ - role: bootstrap/configure_dsa tags: dsa-dp when: configure_dsa_devices | default(false) + post_tasks: + - name: Execute handlers + ansible.builtin.meta: flush_handlers + - name: Check for failure in handlers execution + ansible.builtin.assert: + that: >- + ( + on_vms | default(false) and + ansible_play_hosts_all | difference(groups['vm_host']) | difference(ansible_play_hosts) | length() == 0 + ) or + ( + not on_vms | default(false) and + ansible_play_hosts_all | difference(ansible_play_hosts) | length() == 0 + ) + msg: Failure detected in handlers execution. Please look for failures in previous tasks. 
environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true -{%- endif %} +{% endif %} # install worker node network software - hosts: kube_node,vm_host - tasks: [] pre_tasks: - name: End play for VM host meta: end_host @@ -228,21 +303,29 @@ - intel-platform-qat-setup when: - ovs_dpdk_enabled | default(false) | bool or install_dpdk | default(true) | bool -{%- if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} - role: install_ddp_pkgs when: install_ddp_packages | default(true) | bool -{%- endif %} +{% endif %} - role: sriov_nic_init tags: setup-sriov-nic when: - install_dpdk | default(true) | bool - iommu_enabled | default(true) | bool + - not on_vms | default(false) | bool - (kubernetes | default(true) | bool and not container_runtime_only_deployment | default(false) | bool and not sriov_network_operator_enabled | default(false) | bool or (not kubernetes | default(true) | bool and container_runtime_only_deployment | default(false) | bool)) -{%- if playbook_name in ['full_nfv', 'access', 'on_prem', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'access', 'on_prem', 'on_prem_vss', 'remote_fp', 'build_your_own'] %} + - role: bootstrap/install-qatlibs + tags: qatlibs + when: + - qat_devices | default([]) | length > 0 + - iommu_enabled | default(true) | bool + - configure_qat | default(false) | bool + - not update_qat_drivers | default(false) | bool - role: bootstrap/configure_qat tags: - setup-sriov-qat @@ -250,23 +333,23 @@ when: - qat_devices | default([]) | length > 0 - iommu_enabled | default(true) | bool - or on_vms | default(false) | bool - - update_qat_drivers | default(false) | bool + - not on_vms | default(false) | bool + - configure_qat | default(false) | bool - role: bootstrap/configure_openssl tags: - configure-openssl - intel-platform-qat-setup when: - openssl_install | default(false) | bool -{%- endif %} -{%- if playbook_name in 
['full_nfv', 'on_prem', 'remote_fp', 'regional_dc', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['full_nfv', 'access', 'on_prem', 'on_prem_vss', 'remote_fp', 'regional_dc', 'build_your_own'] %} - role: bootstrap/configure_sgx tags: - sgx - intel-platform-sgx-setup when: - configure_sgx | default(false) | bool -{%- endif %} +{% endif %} environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true diff --git a/generate/playbook_templates/intel_playbook.j2 b/generate/playbook_templates/intel_playbook.j2 index c6c90571..23753b7b 100644 --- a/generate/playbook_templates/intel_playbook.j2 +++ b/generate/playbook_templates/intel_playbook.j2 @@ -12,22 +12,32 @@ tags: remove-kubespray-host-dns-settings when: - remove_kubespray_host_dns_settings | default(false) | bool +# install sigstore policy controller ahead of others to allow namespace signing enforcement + - role: sigstore_policy_controller + tags: sigstore + when: + - sigstore_policy_controller_install | default(false) | bool + - kubernetes | default(false) | bool - role: check_cert_manager tags: check-cert-manager when: - cert_manager_enabled | d(false) | bool + - role: intel_oneapi_install + tags: intel-oneapi + when: + - intel_oneapi_enabled | default(false) | bool - role: adq_dp_install tags: adq-dp when: adq_dp.enabled | default(false) | bool - role: nfd_install - tags: + tags: - nfd - intel-platform-qat-setup - intel-platform-sgx-setup when: nfd_enabled | default(true) | bool -{%- if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} - role: intel_cpu_controlplane - tags: cpu_ctlplane + tags: cpu-ctlplane when: intel_cpu_controlplane.enabled | default(false) | bool {% endif %} - role: operator_framework @@ -55,9 +65,9 @@ - sriov_network_operator_enabled | default(false) | bool - not sriov_net_dp_enabled | default(false) | bool - not sriov_cni_enabled | default(false) | bool -{%- if playbook_name in 
['access', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: intel_dp_operator - tags: + tags: - dp-operator - intel-platform-qat-setup - intel-platform-sgx-setup @@ -66,27 +76,26 @@ qat_dp_enabled | default(false) or dsa_dp_enabled | default(false) or dlb_dp_enabled | default(false) -{%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'build_your_own'] %} + - role: sgx_dp_install + tags: + - sgx-dp + - intel-platform-sgx-setup + when: + - sgx_dp_enabled | default(false) +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'build_your_own'] %} - role: gpu_dp_install tags: gpu-dp when: gpu_dp_enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'remote_fp', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'build_your_own'] %} - role: qat_dp_install - tags: + tags: - qat-dp - intel-platform-qat-setup when: qat_dp_enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'regional_dc', 'build_your_own'] %} - - role: sgx_dp_install - tags: - - sgx-dp - - intel-platform-sgx-setup - when: - - sgx_dp_enabled | default(false) - - ansible_os_family == "Debian" or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.3') +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'regional_dc', 'build_your_own'] %} - role: dlb_dp_install tags: dlb-dp when: @@ -97,10 +106,13 @@ - role: dsa_dp_install tags: dsa-dp when: dsa_dp_enabled is defined and dsa_dp_enabled | default(false) | bool +{% endif %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'regional_dc', 'build_your_own'] %} - role: kmra_install tags: kmra 
when: - - kmra.pccs.enabled | default(false) | bool or + - kmra.oran.enabled | default(false) | bool or + kmra.pccs.enabled | default(false) | bool or kmra.apphsm.enabled | default(false) | bool or kmra.ctk_loadkey_demo.enabled | default(false) | bool - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '20.04') @@ -113,24 +125,24 @@ tags: tac when: - tac.enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'remote_fp', 'regional_dc', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'regional_dc', 'build_your_own'] %} - role: intel_power_manager tags: power-manager when: intel_power_manager is defined and intel_power_manager.enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'remote_fp', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'build_your_own'] %} - role: openssl_engine_install tags: - openssl-engine - intel-platform-qat-setup when: openssl_engine_enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: platform_aware_scheduling_install tags: platform-aware-scheduling when: tas_enabled | default(true) | bool or gas_enabled | default(true) | bool -{%- endif %} +{% endif %} - role: kube_prometheus tags: kube-prometheus when: @@ -140,73 +152,68 @@ when: - collectd_enabled | default(false) | bool - not (telegraf_enabled | default(true) | bool) - vars: - collectd_profile: {{ playbook_name }} - role: telegraf_install when: - telegraf_enabled | default(true) | bool - not (collectd_enabled | default(false) | bool) tags: monitoring - vars: - telegraf_profile: {{ playbook_name }} -{%- if 
playbook_name in ['access', 'full_nfv', 'build_your_own'] %} +{% if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} - role: intel_sriov_fec_operator tags: intel-sriov-fec-operator when: - intel_sriov_fec_operator_enabled | default(false) | bool - not (intel_flexran_enabled | default(false) | bool and intel_flexran_type == "pod") -{%- endif %} -{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: istio_service_mesh tags: istio-service-mesh when: - istio_service_mesh.enabled | default(true) | bool -{%- endif %} -{%- if playbook_name in ['basic', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} - - role: cndp_install - tags: cndp - when: - - cndp_enabled | default(false) | bool - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "20.04") or - (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.5") - - role: cndp_dp_install - tags: cndp-dp - when: - - cndp_dp_enabled | default(false) | bool - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "20.04") or - (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.5") {% endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'build_your_own'] %} - role: minio_install tags: minio when: - minio_enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'build_your_own'] %} - role: rook_install tags: rook-ceph when: - rook_ceph.enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['on_prem', 'build_your_own'] %} - - role: intel_ai 
- tags: intel-ai - when: - - intel_ai_enabled | default(false) | bool -{%- endif %} -{%- if playbook_name in ['full_nfv', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['on_prem', 'on_prem_vss', 'build_your_own'] %} + - role: intel_media_analytics + tags: intel-media-analytics + when: + - intel_media_analytics_enabled | default(false) | bool +{% endif %} +{% if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'build_your_own'] %} + - role: ffmpeg_install + tags: intel-ffmpeg + when: + - ffmpeg_install_enabled | default(false) | bool +{% endif %} +{% if playbook_name in ['full_nfv', 'build_your_own'] %} - role: tadk_install tags: tadk when: - tadk_install | default(false) | bool -{%- endif %} +{% endif %} - role: cadvisor_install tags: cadvisor when: - cadvisor_enabled | default(false) | bool +{% if playbook_name in ['on_prem_sw_defined_factory', 'build_your_own'] %} + - role: intel_eci + tags: intel-eci + when: + - intel_eci_enabled | default(false) | bool +{% endif %} environment: - "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" - REGISTRY_AUTH_FILE: {% raw %}"{% if on_cloud | default(false) %}/home/ubuntu/.crauth{% else %}{{ registry_containerd }}{% endif %}"{% endraw %} + any_errors_fatal: true - hosts: kube_node @@ -219,16 +226,16 @@ - sriov_cni_enabled | default(false) | bool - not sriov_network_operator_enabled | default(false) | bool tags: sriov-cni -{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['full_nfv', 'on_prem', 'on_prem_vss', 'remote_fp', 'build_your_own'] %} - role: bond_cni_install when: bond_cni_enabled | default(true) | bool tags: bond-cni -{%- endif %} -{%- if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} +{% endif %} +{% if playbook_name in ['full_nfv', 'remote_fp', 'build_your_own'] %} - role: userspace_cni_install tags: userspace-cni when: userspace_cni_enabled | default(true) | bool -{%- endif %} +{% endif %} environment: "{{ '{{' }} proxy_env | d({}) 
{{ '}}' }}" any_errors_fatal: true @@ -255,13 +262,15 @@ when: - kibana_enabled | default(false) | bool tags: kibana -{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} +{% if playbook_name in ['access', 'full_nfv', 'on_prem', 'on_prem_vss', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: linkerd_service_mesh tags: linkerd-service-mesh when: - linkerd_service_mesh.enabled | default(false) | bool -{%- endif %} +{% endif %} - role: wait_for_kubernetes_ready + vars: + force_check: true tags: k8s-ready-final ignore_errors: yes when: @@ -272,13 +281,13 @@ - hosts: oru, kube_node[0] tasks: [] roles: -{%- if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} +{% if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} - role: cluster_defaults tags: defaults - role: intel_flexran tags: intel-flexran when: - intel_flexran_enabled | default(false) | bool -{%- endif %} +{% endif %} environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true diff --git a/generate/playbook_templates/main_playbook.j2 b/generate/playbook_templates/main_playbook.j2 index fae4e06f..65e78188 100644 --- a/generate/playbook_templates/main_playbook.j2 +++ b/generate/playbook_templates/main_playbook.j2 @@ -14,6 +14,13 @@ when: - kubernetes | default(true) - not on_cloud | default(false) + - kube_provisioner == "kubespray" +- name: provision Kubernetes cluster using rke2 + import_playbook: k8s/rke2.yml + when: + - kubernetes | default(true) + - not on_cloud | default(false) + - kube_provisioner == "rke2" - name: install Intel Container Experience Kit features import_playbook: intel/{{ playbook_name }}.yml when: kubernetes | default(true) diff --git a/generate/profiles_templates/cloud/profiles.yml b/generate/profiles_templates/cloud/profiles.yml index 1b1e140a..11eef93d 100644 --- a/generate/profiles_templates/cloud/profiles.yml +++ b/generate/profiles_templates/cloud/profiles.yml @@ -23,6 +23,8 @@ # - sgx 
# - sgx_dp # - kmra: +# sbx +# oran # pccs # apphsm # ctk_demo @@ -39,11 +41,12 @@ # - network_userspace # - dpdk # - ovs_dpdk -# - pstate -# - cstate -# - ufs - uncore frequency scaling # - sst -# - power_manager +# - power: +# manager +# pstate +# cstate +# uncore_frequency # - telemetry: # prometheus # collectd @@ -72,10 +75,12 @@ # ddp # fw_update # - intel_sriov_fec_operator +# - intel_oneapi +# base +# ai # - intel_flexran # - tadk -# - remove_kubespray_host_dns_settings -# - enable_dhclient_systemd_service +# - cadvisor --- access: name: access @@ -104,6 +109,8 @@ access: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -115,11 +122,12 @@ access: network_userspace: off dpdk: on ovs_dpdk: off - pstate: off - cstate: off - ufs: off sst: off - power_manager: off + power: + manager: off + pstate: off + cstate: off + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -149,10 +157,12 @@ access: ddp: off fw_update: off intel_sriov_fec_operator: off + intel_oneapi: + base: off + ai: off intel_flexran: off adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + cadvisor: on basic: name: basic @@ -168,8 +178,11 @@ basic: sriov_network_dp: off nic_drivers: off dpdk: optional - cstate: optional - ufs: off + power: + manager: off + pstate: optional + cstate: optional + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -189,8 +202,10 @@ basic: flow_config: off fw_update: off adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + intel_oneapi: + base: optional + ai: optional + cadvisor: on full_nfv: name: full_nfv @@ -220,6 +235,8 @@ full_nfv: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -231,11 +248,12 @@ full_nfv: network_userspace: on dpdk: on ovs_dpdk: on - pstate: optional - cstate: optional - ufs: off sst: off - power_manager: off + power: + manager: off + pstate: optional + 
cstate: optional + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -265,11 +283,13 @@ full_nfv: ddp: off fw_update: off intel_sriov_fec_operator: off + intel_oneapi: + base: optional + ai: optional intel_flexran: off tadk: on adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + cadvisor: on on_prem: name: on_prem @@ -288,6 +308,8 @@ on_prem: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -303,11 +325,12 @@ on_prem: tas: off dpdk: on bond_cni: off - pstate: off - cstate: optional - ufs: off sst: off - power_manager: off + power: + manager: off + pstate: off + cstate: optional + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -335,8 +358,10 @@ on_prem: flow_config: off fw_update: off adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + intel_oneapi: + base: optional + ai: optional + cadvisor: on regional_dc: name: regional_dc @@ -357,6 +382,8 @@ regional_dc: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -365,8 +392,11 @@ regional_dc: tas: off gas: off dpdk: optional - cstate: optional - ufs: off + power: + manager: off + pstate: optional + cstate: optional + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -394,8 +424,10 @@ regional_dc: flow_config: off fw_update: off adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + intel_oneapi: + base: optional + ai: optional + cadvisor: on remote_fp: name: remote_fp @@ -415,6 +447,8 @@ remote_fp: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -432,11 +466,12 @@ remote_fp: bond_cni: off network_userspace: optional dpdk: on - pstate: off - cstate: optional - ufs: off sst: off - power_manager: off + power: + manager: off + pstate: off + cstate: optional + uncore_frequency: off telemetry: prometheus: on collectd: off @@ -465,8 
+500,10 @@ remote_fp: ddp: off fw_update: off adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + intel_oneapi: + base: optional + ai: optional + cadvisor: optional build_your_own: name: build_your_own @@ -494,6 +531,8 @@ build_your_own: sgx: off sgx_dp: off kmra: + sbx: off + oran: off pccs: off apphsm: off ctk_demo: off @@ -505,11 +544,12 @@ build_your_own: network_userspace: optional dpdk: optional ovs_dpdk: optional - pstate: off - cstate: optional - ufs: off sst: off - power_manager: off + power: + manager: off + pstate: off + cstate: optional + uncore_frequency: off telemetry: prometheus: optional collectd: off @@ -539,8 +579,10 @@ build_your_own: ddp: off fw_update: off intel_sriov_fec_operator: off + intel_oneapi: + base: optional + ai: optional intel_flexran: off tadk: optional adq_dp: off - remove_kubespray_host_dns_settings: off - enable_dhclient_systemd_service: off + cadvisor: optional diff --git a/generate/profiles_templates/common/group_vars.j2 b/generate/profiles_templates/common/group_vars.j2 index f03cdd6e..52923192 100644 --- a/generate/profiles_templates/common/group_vars.j2 +++ b/generate/profiles_templates/common/group_vars.j2 @@ -1,86 +1,268 @@ --- -## Container Experience Kits (CEK) primary playbook variables ## -# Do not change profile_name and configured_arch here! -# To generate vars for different profile/architecture use make command -# At present, the profile and arch generated are as follows +########################### +## Profile Configuration ## +########################### +## Do not modify values listed here +## Re-run the "make" command to change profile configuration + profile_name: {{ name }} configured_arch: {{ arch }} -# Extends the list of CPUs, that can be used for installation. -# You can get the model of your CPU using command `lscpu`. 
-# The CPU models in unconfirmed_cpu_models list can be used for the CEK installation, -# nevertheless they haven't been tested, so installation process may fail -# or some features may not work properly. -{%- if cloud_mode == 'on' %} -unconfirmed_cpu_models: ['8259C', '8175M', '8259CL', '8171M', '2673', '8370C'] # update list if required such as, unconfirmed_cpu_models: ['$0000%@'] or unconfirmed_cpu_models: ['$0000%@', '8490H'] -{%- else %} -unconfirmed_cpu_models: [] # update list if required such as, unconfirmed_cpu_models: ['$0000%@'] or unconfirmed_cpu_models: ['$0000%@', '8490H'] -{%- endif %} +{% if vm_mode in ['optional'] %} +# vm_enabled can't be enabled manually here +# To enable it, vm specific configuration from examples/vm need to be taken +{% endif %} +vm_enabled: {% if vm_mode == 'on' %}true{% else %}false{% endif %} + + +{% if cloud_mode == 'on' %} +# on_cloud is used when deploying Cloud RA +on_cloud: true + +{% endif %} +######################### +## Misc. Configuration ## +######################### + +# Preflight will check vars configuration +# It is NOT recommended to disable preflight, unless it is a conscious decision +preflight_enabled: true + +unconfirmed_cpu_models: {% if cloud_mode == 'on' %}['8259C', '8175M', '8259CL', '8171M', '2673', '8370C']{% else %}[]{% endif %} # Update list if required, e.g. 
['$0000%@', '8490H'] # CEK project directory on all nodes project_root_dir: /opt/cek/ -vm_enabled: {% if vm_mode == 'on' %}true{% else %}false{% endif %} -{%- if vm_mode in ['optional'] %} -# vm_mode can't be enabled manually here -# To enable it, vm specific configuration from examples/vm need to be taken -{%- endif %} -{%- if vm_mode == 'on' %} +{% if vm_mode == 'on' %} # When vm_recreate_existing is false, existing VMs are not touch during cluster update/scaling # When vm_recreate_existing is true, existing VMs are destroyed and created again during cluster update/scaling vm_recreate_existing: false -{%- endif %} - -{%- if cloud_mode == 'on' %} -# on_cloud is used when deploying Cloud RA -on_cloud: true -{%- endif %} +{% endif %} +# Improve deployment stability of Kubespray by increasing wait between retries of failed ops like pushing/downloading +retry_stagger: 20 -#POST DEPLOYMENT HOOKS: hooks_local dir will run .py, .sh and .yaml files (it will find inside this dir) on ansible host -#hooks_remote dir will run .py and .sh scripts (it will find inside this dir) on kube_control_plane; +# Post deployment hooks: .py, .sh and .yaml files found in the hooks_local dir are run on the ansible host +# .py and .sh scripts found in the hooks_remote dir are run on kube_control_plane post_deployment_hook_enabled: false hooks_local: /root/hooks/local hooks_remote: /root/hooks/remote -# Kubernetes version -kubernetes: true -kube_version: v1.26.1 # test placeholder: n version -#kube_version: v1.25.6 # test placeholder: n-1 version -#kube_version: v1.24.10 # test placeholder: n-2 version # To deploy only container runtime set this variable as "true", and kubernetes as "false" # Set both variables as "false" to perform only host configuration container_runtime_only_deployment: false + +########################## +## System Configuration ## +########################## + +# Run system-wide package update (apt dist-upgrade, yum update, ...)
+# Note: enabling this may lead to unexpected results +# Tip: you can set this per host using host_vars +update_all_packages: false +update_kernel: false + +# Add arbitrary parameters to GRUB +additional_grub_parameters_enabled: false +additional_grub_parameters: "" + +# SELinux configuration state: current, enabled, disabled +selinux_state: current + +{% if firewall in ['on', 'optional'] %} +firewall_enabled: {% if firewall == "on" %}true{% else %}false{% endif %} + + +{% endif %} +## Proxy configuration ## +#http_proxy: "http://proxy.example.com:1080" +#https_proxy: "http://proxy.example.com:1080" +#additional_no_proxy: ".example.com,mirror_ip" # No need to include the following (will be added automatically): localhost, 127.0.0.1, controllerIPs, nodesIPs + +# (Ubuntu only) Disables DNS stub listener which may cause issues on Ubuntu +dns_disable_stub_listener: true + +# Remove the block between ansible markers set by kubespray in dhclient & hosts files to avoid DNS & LDAP issues (connection loss) after K8s setup after reboot +# It is needed only in some lab environments. Default setting is false. +# TODO: JP: This workaround should not be needed any longer. 
To be removed completely once it passes a complete validation cycle +remove_kubespray_host_dns_settings: false + +############################## +## Kubernetes Configuration ## +############################## + +kubernetes: true +# Kubernetes provisioner. Supported: rke2 (works only with Ubuntu 22.04 and containerd as container_runtime), kubespray (default option) +kube_provisioner: kubespray +kube_version: v1.26.3 # test placeholder: n version +#kube_version: v1.25.8 # test placeholder: n-1 version +#kube_version: v1.24.12 # test placeholder: n-2 version +rke2_version: v1.26.2+rke2r1 # test placeholder: n version + +{% if kube_dashboard in ['on', 'optional'] %} +# Kubernetes Dashboard +kube_dashboard_enabled: {% if kube_dashboard == 'on' %}true{% else %}false{% endif %} + + +{% endif %} +# Kubernetes cluster name, also used as the DNS domain +cluster_name: cluster.local + +{% if cert_manager in ['on', 'optional']%} +# Cert manager deployment +cert_manager_enabled: {% if cert_manager == "on"%}true{% else %}false{% endif%} + + +{% endif %} # Kubenetes Audit policy custom rules # https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes/control-plane/templates/apiserver-audit-policy.yaml.j2 audit_policy_custom_rules: "" # Kubernetes container runtime: docker, containerd, crio # When "crio" is set, please enable "crio_registries" section -{%- if cloud_mode == 'on' %} +{% if cloud_mode == 'on' %} container_runtime: containerd {% else %} container_runtime: docker -{%- endif %} +{% endif %} + +{% if rancher_manager in ['on', 'optional'] %} +# Rancher Manager (currently supported on rke2) +rancher_manager_enabled: {% if rancher_manager == 'on' %}true{% else %}false{% endif %} +{% endif %} + +######################## +## Kubernetes Network ## +######################## + +kube_controller_manager_bind_address: 127.0.0.1 +kube_proxy_metrics_bind_address: 127.0.0.1 + +# Comment this line out if you want to expose k8s services of type nodePort externally.
+kube_proxy_nodeport_addresses_cidr: 127.0.0.0/8 + +kube_pods_subnet: 10.244.0.0/16 +{% if name in ['regional_dc', 'full_nfv', 'access', 'build_your_own'] %} +{% set mask = 18 %} +{% elif name == 'remote_fp' %} +{% set mask = 19 %} +{% elif name in ['on_prem', 'on_prem_vss', 'on_prem_sw_defined_factory'] %} +{% set mask = 21 %} +{% elif name == 'basic' %} +{% set mask = 22 %} +{% endif %} +kube_service_addresses: 10.233.0.0/{{ mask }} + +# Supported plugins: calico, flannel, cilium for kubespray; canal, calico, cilium for rke2 +kube_network_plugin: calico + +# Calico settings +{% if vm_mode in ['on'] %} +# For VM mode calico_backend has to be vxlan, otherwise deployment will fail +calico_network_backend: vxlan +{% else %} +calico_network_backend: vxlan # Supported backends: [vxlan, bird(kubespray only)] +{% endif %} + +kube_network_plugin_multus: {% if multus == 'on' %}true{% else %}false{% endif %} + + +# Set on true if you want to enable the eBPF dataplane support +calico_bpf_enabled: false + +{% if sriov_network_dp in ["on", "optional"] or network_userspace in ["on", "optional"] %} +# Create reference net-attach-def objects +example_net_attach_defs: + # Values below should match host_vars CNI configuration +{% if sriov_network_dp in ["on", "optional"] %} + sriov_net_dp: {% if sriov_network_dp == "on" %}true{% else %}false{% endif %} + +{% endif %} +{% if network_userspace in ["on", "optional"] %} + userspace_ovs_dpdk: {% if network_userspace == "on" %}true{% else %}false{% endif %} + + userspace_vpp: false +{% endif %} + +{% endif %} +############################## +## Kubernetes Node Features ## +############################## + +{% if nfd in ['on', 'optional'] %} +# Node Feature Discovery +nfd_enabled: {% if nfd == 'on' %}true{% else %}false{% endif %} + +nfd_namespace: kube-system +nfd_sleep_interval: 60s +{% endif %} + +{% if tas in ['on', 'optional'] or gas in ['on', 'optional'] %} +# Intel Platform Aware Scheduling (PAS) +pas_namespace: kube-system + +{% 
if tas in ['on', 'optional'] %} +# Intel Platform Aware Scheduling - Telemetry Aware Scheduling (TAS) +tas_enabled: {% if tas == 'on' %}true{% else %}false{% endif %} + +tas_build_image_locally: false +# Create and enable TAS demonstration policy: [true, false] +tas_enable_demo_policy: false +{% endif %} + +{% if gas in ['on', 'optional'] %} +# Intel Platform Aware Scheduling - GPU Aware Scheduling (GAS) +gas_enabled: {% if gas == 'on' %}true{% else %}false{% endif %} + +gas_build_image_locally: false +{% endif %} +{% endif %} + +################## +## CPU Features ## +################## {% if intel_cpu_controlplane in ['on', 'optional'] %} -# Resource management control plane component provides CPU Management funtionality within k8s, -# enables fine-granular control of pinning, NUMA, ... +# CPU Control Plane Plugin for Kubernetes +# https://github.com/intel/cpu-control-plane-plugin-for-kubernetes intel_cpu_controlplane: - enabled: false - allocator: default # ['default', 'numa', 'numa-namespace', 'numa-namespace-exclusive'] - agent_namespace_prefix: test- # control plane agent namespace + enabled: false # Enable Intel CPU Control Plane + allocator: default # Intel CPU Control plane allocation policy ['default', 'numa', 'numa-namespace', 'numa-namespace-exclusive'] + agent_namespace_prefix: test- # Intel CPU Control plane agent namespace + enable_memory_pinning: true # Enable Intel CPU Control Plane memory pinning + {% endif %} +{% if native_cpu_manager in ['on', 'optional'] %} +# Native CPU Manager (Kubernetes built-in) +# Setting this option as "true" enables the "static" policy, otherwise the default "none" policy is used. 
+# The reserved CPU cores settings are individual per each worker node, and therefore are available to configure in the host_vars file +native_cpu_manager_enabled: {% if native_cpu_manager == 'on' %}true{% else %}false{% endif %} + + +{% endif %} +{% if topology_manager in ['on', 'optional'] %} +# Enable Kubernetes built-in Topology Manager +topology_manager_enabled: {% if topology_manager == 'on' %}true{% else %}false{% endif %} + +# There are four supported policies: none, best-effort, restricted, single-numa-node. +topology_manager_policy: "best-effort" +{% endif %} + +###################### +## Storage Features ## +###################### {% if lpvsp in ['on', 'optional'] %} # The local persistence volume static provisioner # https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner local_volume_provisioner_enabled: {% if lpvsp == 'on' %}true{% else %}false{% endif %} -{% endif %} + +{% endif %} {% if rook_ceph in ['on', 'optional'] %} rook_ceph: enabled: {% if rook_ceph == 'on' %}true{% else %}false{% endif %} + log_level: "DEBUG" # The logging level for the operator: ["ERROR", "WARNING", "INFO", "DEBUG"] allow_loop_devices: true # Allow using loop devices for osds in test clusters enable_nfs: true # Enable the CSI NFS drivers @@ -96,80 +278,42 @@ rook_ceph: # In that case, one mgr will be active and one in standby. When Ceph updates which # mgr is active, Rook will update the mgr services to match the active mgr. allow_multiple_mgr_per_node: true -{% endif %} - -# cAdvisor -cadvisor_enabled: false -cadvisor_custom_events_config_on: false - -# Preflight will check vars configuration -# It is NOT recommended to disable preflight, unless it is a conscious decision -preflight_enabled: true - -# Run system-wide package update (apt dist-upgrade, yum update, ...) 
-# Note: enabling this may lead to unexpected results -# Tip: you can set this per host using host_vars -update_all_packages: false -update_kernel: false -# Add arbitrary parameters to GRUB -additional_grub_parameters_enabled: false -additional_grub_parameters: "" - -# SELinux configuration state: current, enabled, disabled -selinux_state: current -{% if nfd in ['on', 'optional'] %} -# Node Feature Discovery -nfd_enabled: {% if nfd == 'on' %}true{% else %}false{% endif %} -nfd_namespace: kube-system -nfd_sleep_interval: 60s -{% endif %} - -{%- if kube_dashboard in ['on', 'optional'] %} -# Kubernetes Dashboard -kube_dashboard_enabled: {% if kube_dashboard == 'on' %}true{% else %}false{% endif %} {% endif %} +{% if minio in ['on', 'optional'] %} +## MinIO variables ## +# Enable Minio Storage service. +minio_enabled: {% if minio == 'on' %}true{% else %}false{% endif %} -{%- if native_cpu_manager in ['on', 'optional'] %} -# Native CPU Manager (Kubernetes built-in) -# Setting this option as "true" enables the "static" policy, otherwise the default "none" policy is used. -# The reserved CPU cores settings are individual per each worker node, and therefore are available to configure in the host_vars file -native_cpu_manager_enabled: {% if native_cpu_manager == 'on' %}true{% else %}false{% endif %} -{% endif %} -{% if topology_manager in ['on', 'optional'] -%} -# Enable Kubernetes built-in Topology Manager -topology_manager_enabled: {% if topology_manager == 'on' %}true{% else %}false{% endif %} -# There are four supported policies: none, best-effort, restricted, single-numa-node. 
-topology_manager_policy: "best-effort" -{% endif %} +minio_tenant_enabled: true # Specifies whether to install MinIO Sample Tenant +minio_tenant_servers: 4 # The number of MinIO Tenant nodes +minio_tenant_volumes_per_server: 4 # The number of volumes per servers +minio_tenant_volume_size: 5 + # The size of each volume (unit: GiB) +minio_deploy_test_mode: true # When true, use a file as loop device when creating storage + # called "virtual block device" which is useful for test or automation purpose. + # When false, use an actual NVME or SSD device when creating storage +minio_build_image_locally: true # Build custom MinIO image locally +minio_awsclient_pods_enabled: true # Run AWS client pods for MinIO Tenant service +minio_ingress_enabled: false # Enable MinIO tenant ingress -{%- if cloud_mode == 'on' %} -# OpenShift SRIOV Network Operator -# Do not change the value of sriov_network_operator_enabled, as it will cause Cloud RA to fail -sriov_network_operator_enabled: false -{% else %} -{%- if sriov_operator in ['on', 'optional'] %} -# OpenShift SRIOV Network Operator -sriov_network_operator_enabled: {% if sriov_operator == 'on' %}true{% else %}false{% endif %} -{%- if vm_mode in ['on', 'optional'] %} -# For VM mode sriov_network_operator_enabled has to be false, otherwise VFs -# are not created before VM creation -{%- endif %} -sriov_network_operator_namespace: "sriov-network-operator" -{%- endif %} {% endif %} +#################### +## Device Plugins ## +#################### -{%- if sriov_network_dp in ['on', 'optional'] %} +{% if sriov_network_dp in ['on', 'optional'] %} # Intel SRIOV Network Device Plugin sriov_net_dp_enabled: {% if sriov_network_dp == 'on' %}true{% else %}false{% endif %} + sriov_net_dp_namespace: kube-system -# whether to build and store image locally or use one from public external registry +# Whether to build and store image locally or use one from public external registry sriov_net_dp_build_image_locally: false # SR-IOV network device 
plugin configuration. # For more information on supported configuration refer to: https://github.com/intel/sriov-network-device-plugin#configurations -{%- if intel_flexran == 'on' %} +{% if intel_flexran == 'on' %} # sriovdp_config_data for Intel FlexRAN is defined in the helm_values for the sriov_dp_install role -{%- else %} +{% else %} sriovdp_config_data: | { "resourceList": [{ @@ -195,7 +339,7 @@ sriovdp_config_data: | "devices": ["1889"], "drivers": ["vfio-pci"] } - {% if name in ['full_nfv', 'access', 'regional_dc', 'build_your_own'] -%} +{% if name in ['full_nfv', 'access', 'regional_dc', 'build_your_own'] %} }, { "resourceName": "intel_fpga", @@ -205,75 +349,42 @@ sriovdp_config_data: | "devices": ["0d90"] } } - {%- else -%} +{% else %} } - {%- endif %} +{% endif %} ] } {% endif %} -{% endif %} -{%- if power_manager in ['on', 'optional'] and arch in ['icx', 'clx', 'spr'] %} -# Intel Kubernetes Power Manager -intel_power_manager: - enabled: {% if power_manager == 'on' %}true{% else %}false{% endif %} # enable intel_power_manager - # The performance profile is available for nodes that has CPU max MHz > 3500.0000 - use 'lscpu' command to see your node details - power_profiles: [performance, balance-performance, balance-power] # the list of PowerProfiles that will be available on the nodes - # possible PowerProfiles are: performance, balance_performance, balance_power - power_nodes: [] # list of nodes that should be considered during Operator work and profiles deployment - # - node1 - # - node2 - build_image_locally: false # build Intel Power Manager image locally - deploy_example_pods: false # deploy example Pods that will utilize special resources - global_shared_profile_enabled: false # deploy custom Power Profile with user defined frequencies that can be applied to all power nodes - # to make use of Shared Profile fill Shared Workload settings in host vars - max_shared_frequency: 1500 # max frequency that will be applied for cores by Shared Workload - 
min_shared_frequency: 1000 # min frequency that will be applied for cores by Shared Workload {% endif %} - -{%- if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr'] or +{% if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr', 'emr'] or gpu_dp in ['on', 'optional'] or qat_dp in ['on', 'optional'] or - dsa_dp in ['on', 'optional'] and arch in ['spr'] or - dlb_dp in ['on', 'optional'] and arch in ['spr'] %} + dsa_dp in ['on', 'optional'] and arch in ['spr', 'emr'] or + dlb_dp in ['on', 'optional'] and arch in ['spr', 'emr'] %} # Intel Device Plugin Operator -intel_dp_namespace: kube-system # namespace will be applied for SGX DP, GPU DP and QAT DP -{% endif %} +intel_dp_namespace: kube-system # Namespace will be applied for SGX DP, GPU DP and QAT DP -{%- if intel_ai in ['on', 'optional'] %} -intel_ai_enabled: {% if intel_ai == 'on' %}true{% else %}false{% endif %} {% endif %} - -{%- if dlb_dp in ['on', 'optional'] and arch in ['spr'] %} +{% if dlb_dp in ['on', 'optional'] and arch in ['spr', 'emr'] %} # Intel Dynamic Load Balancing Device Plugin (Intel DLB DP) for Kubernetes -dlb_dp_enabled: {% if dlb_dp == 'on' %}true{% else %}false{% endif %} # if true set configure_dlb_devices to true in host vars +dlb_dp_enabled: {% if dlb_dp == 'on' %}true{% else %}false{% endif %} # If true set configure_dlb_devices to true in host vars dlb_dp_build_image_locally: false dlb_dp_verbosity: 4 -{% endif %} -{%- if dsa_dp in ['on', 'optional'] and arch in ['spr'] %} +{% endif %} +{% if dsa_dp in ['on', 'optional'] and arch in ['spr', 'emr'] %} # Intel Data Streaming Accelerator Device Plugin (Intel DSA DP) for Kubernetes -dsa_dp_enabled: {% if dsa_dp == 'on' %}true{% else %}false{% endif %} # if true set configure_dsa_devices to true in host vars +dsa_dp_enabled: {% if dsa_dp == 'on' %}true{% else %}false{% endif %} # If true set configure_dsa_devices to true in host vars dsa_dp_build_image_locally: false dsa_dp_verbosity: 4 -dsa_shared_devices: 10 # number of containers 
that can share the same DSA device. -{% endif %} - -{%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} -# Intel Ethernet Operator for Intel E810 Series network interface cards -intel_ethernet_operator_enabled: {% if intel_ethernet_operator.enabled == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} -# Use together with flow_configuration set in hostvars -intel_ethernet_operator_flow_config_enabled: {% if intel_ethernet_operator.flow_config == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} -{% endif %} +dsa_shared_devices: 10 # Number of containers that can share the same DSA device. -{%- if intel_sriov_fec_operator in ['on', 'optional'] %} -# Intel Operator for SR-IOV Wireless Forward Error Correction (FEC) Accelerators -intel_sriov_fec_operator_enabled: {% if intel_sriov_fec_operator == 'on' %}true{% else %}false{% endif %} # if true, deploy FEC Operator {% endif %} - -{%- if qat_dp in ['on', 'optional'] %} +{% if qat_dp in ['on', 'optional'] %} # Intel QAT Device Plugin for Kubernetes qat_dp_enabled: {% if qat_dp == 'on' %}true{% else %}false{% endif %} + qat_dp_verbosity: 4 # Maximum number of QAT devices (minimum is 1) to be provided to the QAT Device Plugin. # To use all available QAT devices on each node, qat_dp_max_devices must be equal to the highest number of QAT Devices from all nodes @@ -283,8 +394,8 @@ qat_dp_max_num_devices: 32 qat_dp_build_image_locally: false # Allocation policy - 2 possible values: balanced and packed. # Balanced mode spreads allocated QAT VF resources balanced among QAT PF devices, and packed mode packs one QAT PF device -# full of QAT VF resources before allocating resources from the next QAT PF.(There is no default.) -# allocation_policy: balanced +# full of QAT VF resources before allocating resources from the next QAT PF. There is no default value. 
+#allocation_policy: balanced qat_supported_pf_dev_ids: - "435" - "37c8" @@ -304,30 +415,27 @@ qat_supported_vf_dev_ids: - "18a1" - "4941" - "4943" -{% endif %} -{%- if openssl in ['on', 'optional'] %} -# This feature will enable OpenSSL*Engine -openssl_engine_enabled: {% if openssl == 'on' %}true{% else %}false{% endif %} # to activate OpenSSL*Engine set 'install_openssl' to 'true' in host_vars {% endif %} - -{%- if gpu_dp in ['on', 'optional'] %} +{% if gpu_dp in ['on', 'optional'] %} # Intel GPU Device Plugin for Kubernetes gpu_dp_enabled: {% if gpu == 'on' %}true{% else %}false{% endif %} + gpu_dp_verbosity: 4 gpu_dp_build_image_locally: false # Configuration-options # To fully discover the below settings usage, please refer to: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/v0.24.0/cmd/gpu_plugin -gpu_dp_shared_devices: 10 # number of containers (min. 1) that can share the same GPU device -gpu_dp_monitor_resources: false # enable monitoring all GPU resources on the node -gpu_dp_fractional_manager: false # enable handling of fractional resources for multi-GPU nodes -gpu_dp_prefered_allocation: 'none' # available policies are: ['balanced', 'packed', 'none'] -{% endif %} +gpu_dp_shared_devices: 10 # Number of containers (min. 
1) that can share the same GPU device +gpu_dp_monitor_resources: false # Enable monitoring all GPU resources on the node +gpu_dp_fractional_manager: false # Enable handling of fractional resources for multi-GPU nodes +gpu_dp_prefered_allocation: 'none' # Available policies are: ['balanced', 'packed', 'none'] -{%- if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr'] %} +{% endif %} +{% if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr'] %} # Intel SGX Device Plugin for Kubernetes sgx_dp_enabled: {% if sgx_dp == 'on' %}true{% else %}false{% endif %} + sgx_dp_verbosity: 4 sgx_dp_build_image_locally: false sgx_aesmd_namespace: intel-sgx-aesmd @@ -337,228 +445,296 @@ sgx_aesmd_demo_enable: false sgx_dp_provision_limit: 20 # EnclaveLimit is a number of containers that can share the same SGX enclave device. sgx_dp_enclave_limit: 20 -{%- if vm_mode == 'on' %} +{% if vm_mode == 'on' %} # Memory size for SGX enclave in MB sgx_memory_size: 16 -{%- endif %} {% endif %} -{%- if (kmra and (kmra.pccs in ['on', 'optional'] or - kmra.apphsm in ['on', 'optional'] or - kmra.ctk_demo in ['on', 'optional'])) and - arch in ['icx'] %} -# KMRA (Key Management Reference Application) -# Please, refer to the roles/kmra_install/defaults/main.yml for the full list of configuration options available. -kmra: - {%- if kmra.pccs in ['on', 'optional'] %} - pccs: - enabled: {% if kmra.pccs == 'on' %}true{% else %}false{% endif %} # enable PCCS application - # The PCCS uses this API key to request collaterals from Intel's Provisioning Certificate Service. User needs to subscribe first to obtain an API key. - # For how to subscribe to Intel Provisioning Certificate Service and receive an API key, go to https://api.portal.trustedservices.intel.com/provisioning-certification, - # and get an API key by clicking 'Subscribe'. 
- api_key: "ffffffffffffffffffffffffffffffff" - {%- endif %} - {%- if kmra.apphsm in ['on', 'optional'] %} - apphsm: - enabled: {% if kmra.apphsm == 'on' %}true{% else %}false{% endif %} # enable AppHSM application - {%- endif %} - {%- if kmra.ctk_demo in ['on', 'optional'] %} - ctk_loadkey_demo: - enabled: {% if kmra.ctk_demo == 'on' %}true{% else %}false{% endif %} # enable CTK demo application - {%- endif %} {% endif %} -{%- if istio_service_mesh and istio_service_mesh.enabled in ['on', 'optional'] %} +############### +## Operators ## +############### + +{% if cloud_mode == 'on' %} +# OpenShift SRIOV Network Operator +# Do not change the value of sriov_network_operator_enabled, as it will cause Cloud RA to fail +sriov_network_operator_enabled: false +{% else %} +{% if sriov_operator in ['on', 'optional'] %} +# OpenShift SRIOV Network Operator +sriov_network_operator_enabled: {% if sriov_operator == 'on' %}true{% else %}false{% endif %} + +{% if vm_mode in ['on', 'optional'] %} +# For VM mode sriov_network_operator_enabled has to be false, otherwise VFs +# are not created before VM creation +{% endif %} +sriov_network_operator_namespace: "sriov-network-operator" +{% endif %} + +{% endif %} +{% if intel_ethernet_operator.enabled in ['on', 'optional'] %} +# Intel Ethernet Operator for Intel E810 Series network interface cards +intel_ethernet_operator_enabled: {% if intel_ethernet_operator.enabled == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} + +# Use together with flow_configuration set in hostvars +intel_ethernet_operator_flow_config_enabled: {% if intel_ethernet_operator.flow_config == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} + + +{% endif %} +{% if intel_sriov_fec_operator in ['on', 'optional'] %} +# Intel Operator for SR-IOV Wireless Forward Error Correction (FEC) Accelerators +intel_sriov_fec_operator_enabled: {% if intel_sriov_fec_operator == 'on' %}true{% else %}false{% endif %} # Enable FEC Operator +# When intel_sriov_fec_operator 
== true and container_runtime == containerd, +# Red Hat Account is needed, refer to https://access.redhat.com/RegistryAuthentication to apply, +# and uncomment below two lines with correct values. +# redhat_user: ffffffffffffffffffffffffffffff +# redhat_password: ffffffffffffffffffffffffffffff + +{% endif %} +################## +## Service Mesh ## +################## + +{% if istio_service_mesh and istio_service_mesh.enabled in ['on', 'optional'] %} # Service mesh deployment # https://istio.io/latest/docs/setup/install/istioctl/ # Intel Istio # https://github.com/intel/istio -# for all available options, please, refer to the 'roles/istio_service_mesh/vars/main.yml; -# for the options dependencies and compatibility, please, refer to the official CEK documentation; +# For all available options, please, refer to 'roles/istio_service_mesh/vars/main.yml'. +# For the options dependencies and compatibility, please, refer to the official CEK documentation. istio_service_mesh: - enabled: {% if istio_service_mesh.enabled == 'on' %}true{% else %}false{% endif %} # enable Istio Service Mesh - # available profiles are: 'default', 'demo', 'minimal', 'external', 'empty', 'preview', + enabled: {% if istio_service_mesh.enabled == 'on' %}true{% else %}false{% endif %} # Enable Istio Service Mesh + # Available profiles are: 'default', 'demo', 'minimal', 'external', 'empty', 'preview', # 'sgx-mtls', 'intel-qat-hw', 'intel-qat-sw', 'intel-cryptomb' - # if custom profile needs to be deployed, please, place the file named '.yaml' - # into the directory 'roles/istio_service_mesh/files/profiles/' - # 'custom-ca' profile name is reserved for usage by sgx_signer if sgx_signer option is enabled - # any profile name provided will be overwritten in this case + # If custom profile needs to be deployed, please, place the file named '.yaml' + # into the directory 'roles/istio_service_mesh/files/profiles/'. 
+ # 'custom-ca' profile name is reserved for usage by sgx_signer if sgx_signer option is enabled. + # Any profile name provided will be overwritten in this case profile: {% if istio_service_mesh.sgx_signer == 'on' and arch in ['icx'] %}custom-ca{% else %}default{% endif %} # Istio profile intel_preview: - enabled: {% if istio_service_mesh.intel_preview == 'on' %}true{% else %}false{% endif %} # enable intel istio preview - {%- if istio_service_mesh.tcpip_bypass_ebpf in ['on', 'optional'] %} + enabled: {% if istio_service_mesh.intel_preview == 'on' %}true{% else %}false{% endif %} # Enable intel istio preview +{% if istio_service_mesh.tcpip_bypass_ebpf in ['on', 'optional'] %} tcpip_bypass_ebpf: - enabled: {% if istio_service_mesh.tcpip_bypass_ebpf == 'on' %}true{% else %}false{% endif %} # enable tcp/ip ebpf bypass demo - {%- endif %} - {%- if istio_service_mesh.tls_splicing in ['on', 'optional'] %} + enabled: {% if istio_service_mesh.tcpip_bypass_ebpf == 'on' %}true{% else %}false{% endif %} # Enable tcp/ip ebpf bypass demo +{% endif %} +{% if istio_service_mesh.tls_splicing in ['on', 'optional'] %} tls_splicing: - enabled: {% if istio_service_mesh.tls_splicing == 'on' %}true{% else %}false{% endif %} # enable TLS splicing demo - {%- endif %} - {%- if istio_service_mesh.sgx_signer in ['on', 'optional'] and arch in ['icx'] %} + enabled: {% if istio_service_mesh.tls_splicing == 'on' %}true{% else %}false{% endif %} # Enable TLS splicing demo +{% endif %} +{% if istio_service_mesh.sgx_signer in ['on', 'optional'] and arch in ['icx'] %} sgx_signer: - enabled: {% if istio_service_mesh.sgx_signer == 'on' %}true{% else %}false{% endif %} # enable automated key management integration + enabled: {% if istio_service_mesh.sgx_signer == 'on' %}true{% else %}false{% endif %} # Enable automated key management integration name: sgx-signer - {%- endif %} - {%- if istio_service_mesh.intel_preview in ['on', 'optional'] and arch not in ['spr']%} +{% endif %} +{% if 
istio_service_mesh.intel_preview in ['on', 'optional'] and arch not in ['spr', 'emr']%} # uncomment following section and enable intel_preview if sgx-mtls profile is selected - {% if istio_service_mesh.intel_preview == 'optional' %}#{% endif %}set: # istio intel preview with sgx-mtls - {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.enabled=true # istio intel preview with sgx-mtls - {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.certExtensionValidationEnabled=true # istio intel preview with sgx-mtls - {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.enabled=true # istio intel preview with sgx-mtls - {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.certExtensionValidationEnabled=true # istio intel preview with sgx-mtls - {%- endif %} + {% if istio_service_mesh.intel_preview == 'optional' %}#{% endif %}set: # Istio intel preview with sgx-mtls + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.enabled=true # Istio intel preview with sgx-mtls + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.certExtensionValidationEnabled=true # Istio intel preview with sgx-mtls + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.enabled=true # Istio intel preview with sgx-mtls + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.certExtensionValidationEnabled=true # Istio intel preview with sgx-mtls {% endif %} -{%- if linkerd_service_mesh and linkerd_service_mesh.enabled in ['on', 'optional'] %} +{% endif %} +{% if linkerd_service_mesh and linkerd_service_mesh.enabled in ['on', 'optional'] %} # LinkerD service mesh # https://linkerd.io/ -# linkerd_service_mesh: - enabled: {% if linkerd_service_mesh.enabled == 'on' %}true{% else %}false{% endif 
%} # enable LinkerD Service Mesh -{% endif %} - -{%- if tcs in ['on', 'optional'] and - arch in ['icx'] %} -# Trusted Certificate Service deployment -# https://github.com/intel/trusted-certificate-issuer -tcs: - enabled: {% if tcs == 'on' %}true{% else %}false{% endif %} # enable Trusted Certificate Issuer - build_image_locally: false # build Trusted Certificate Issuer image locally -{% endif %} + enabled: {% if linkerd_service_mesh.enabled == 'on' %}true{% else %}false{% endif %} # Enable LinkerD Service Mesh -{%- if tac in ['on', 'optional'] and - arch in ['icx'] %} -# Trusted Attestation Controller deployment -# https://github.com/intel/trusted-attestation-controller -tac: - enabled: {% if tac == 'on' %}true{% else %}false{% endif %} # enable Trusted Attestation Controller - build_image_locally: false # build Trusted Attestation Controller image locally -{% endif %} - -{%- if tas in ['on', 'optional'] or gas in ['on', 'optional'] %} -# Intel Platform Aware Scheduling (PAS) -pas_namespace: kube-system - -{%- if tas in ['on', 'optional'] %} -# Intel Platform Aware Scheduling - Telemetry Aware Scheduling (TAS) -tas_enabled: {% if tas == 'on' %}true{% else %}false{% endif %} -tas_build_image_locally: false -# create and enable TAS demonstration policy: [true, false] -tas_enable_demo_policy: false {% endif %} - -{%- if gas in ['on', 'optional'] %} -# Intel Platform Aware Scheduling - GPU Aware Scheduling (GAS) -gas_enabled: {% if gas == 'on' %}true{% else %}false{% endif %} -gas_build_image_locally: false -{%- endif %} -{%- endif %} +############################### +## Telemetry & Observability ## +############################### # Telemetry configuration. There are two options, Telegraf and Collectd, which are mutually exclusive. # Default option is Telegraf. # If Telegraf is enabled then the following parts of the stack need to be enabled as well: elasticsearch, # jaeger, opentelemetry, kibana. Collectd has to be disabled in that case. 
# If Collectd is enabled then all Telegraf stack components need to be disabled. -{%- if telemetry.prometheus in ['on', 'optional'] %} +{% if telemetry.prometheus in ['on', 'optional'] %} prometheus_operator: {% if telemetry.prometheus == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.collectd in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.collectd in ['on', 'optional'] %} collectd_enabled: {% if telemetry.collectd == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.telegraf in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.telegraf in ['on', 'optional'] %} telegraf_enabled: {% if telemetry.telegraf == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.jaeger in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.jaeger in ['on', 'optional'] %} jaeger_operator: {% if telemetry.jaeger == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.opentelemetry in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.opentelemetry in ['on', 'optional'] %} opentelemetry_enabled: {% if telemetry.opentelemetry == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.elasticsearch in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.elasticsearch in ['on', 'optional'] %} elasticsearch_enabled: {% if telemetry.elasticsearch == 'on'%}true{% else %}false{% endif %} -{%- endif %} -{%- if telemetry.kibana in ['on', 'optional'] %} + +{% endif %} +{% if telemetry.kibana in ['on', 'optional'] %} kibana_enabled: {% if telemetry.kibana == 'on'%}true{% else %}false{% endif %} -{%- endif %} + +{% endif %} collectd_scrap_interval: 30 telegraf_scrap_interval: 30 -{% if sriov_network_dp in ["on", "optional"] or network_userspace in ["on", "optional"] -%} -# Create reference net-attach-def objects -example_net_attach_defs: -{%- if sriov_network_dp in ["on", "optional"] %} - sriov_net_dp: {% if sriov_network_dp == "on" %}true{% else %}false{% endif %} # update to match host_vars CNI 
configuration -{%- endif -%} -{%- if network_userspace in ["on", "optional"] %} - userspace_ovs_dpdk: {% if network_userspace == "on" %}true{% else %}false{% endif %} # update to match host_vars CNI configuration - userspace_vpp: false # update to match host_vars CNI configuration -{%- endif %} -{%- endif %} -{% if firewall in ['on', 'optional'] %} -firewall_enabled: {% if firewall == "on" %}true{% else %}false{% endif %} -{%- endif %} +{% if cadvisor in ['on', 'optional'] %} +# cAdvisor +cadvisor_enabled: {% if cadvisor == 'on' %}true{% else %}false{% endif %} -## Proxy configuration ## -#http_proxy: "http://proxy.example.com:1080" -#https_proxy: "http://proxy.example.com:1080" -#additional_no_proxy: ".example.com,mirror_ip" # no need to include the following (will be added automatically): localhost, 127.0.0.1, controllerIPs, nodesIPs +cadvisor_sample_perf_events_enabled: false +cadvisor_pik_perf_events_enabled: false -# (Ubuntu only) disables DNS stub listener which may cause issues on Ubuntu -dns_disable_stub_listener: true +{% endif %} -# Remove the block between ansible markers set by kubespray in dhclient & hosts files to avoid DNS & LDAP issues (connection loss) after K8s setup after reboot -remove_kubespray_host_dns_settings: {% if remove_kubespray_host_dns_settings == "on" %}true{% else %}false{% endif %} +###################### +## Power Management ## +###################### +{% if power.manager in ['on', 'optional'] and arch in ['icx', 'clx', 'spr', 'emr'] %} +# Intel Kubernetes Power Manager +intel_power_manager: + enabled: {% if power.manager == 'on' %}true{% else %}false{% endif %} # Enable/Disable power manager -# Kubernetes cluster name, also will be used as DNS domain -cluster_name: cluster.local + power_nodes: [] # List of power_nodes that should be considered during Operator work and profiles deployment + # - node1 + # - node2 -## Kubespray variables ## + build_image_locally: false # Build Intel Power Manager image locally + deploy_example_pods: 
true # Deploy example Pods that will utilize special resources + global_shared_profile_enabled: true # Deploy custom Power Profile with user defined frequencies that can be applied to all power nodes + # to make use of Shared Profile fill Shared Workload settings in host vars + global_max_frequency: 1500 # Max frequency that will be applied for cores by Shared Workload + global_min_frequency: 1000 # Min frequency that will be applied for cores by Shared Workload + +{% if power.pstate in ['on', 'optional'] and arch in ['icx', 'clx', 'spr', 'emr'] %} + # P-State governor decides what frequency within the CPUfreq policy should be used + # "powersave" - Lowest frequency within the borders of min_frequency and max_frequency. + # "performance" - Highest frequency within the borders of min_frequency and max_frequency. + global_pstate_governor: "powersave" +{% endif %} -# Improve deployment stability by increased wait between retries of failed ops like pushing/downloading -retry_stagger: 20 +{% endif %} +############################## +## Security & Certification ## +############################## -{% if cert_manager in ['on', 'optional']%} -# Cert manager deployment -cert_manager_enabled: {% if cert_manager == "on"%}true{% else %}false{% endif%} -{%- endif %} -# Supported network plugins(calico, flannel, cilium) and kube-proxy configuration -kube_controller_manager_bind_address: 127.0.0.1 -kube_proxy_metrics_bind_address: 127.0.0.1 -kube_network_plugin: calico -# Supported calico backend: [vxlan, bird] -{%- if vm_mode in ['on'] %} -calico_network_backend: vxlan -# For VM mode calico_backend has to be vxlan, otherwise deployment will fail -{%- else %} -calico_network_backend: vxlan -{%- endif %} -# Advanced calico options -# https://github.com/kubernetes-sigs/kubespray/blob/master/docs/calico.md -# if set to 'true', variables defined by user will be used and default CEK configuration will be ignored -calico_advanced_options: false -wireguard_enabled: {% if wireguard == 
'on' %}true{% else %}false{% endif %} -kube_network_plugin_multus: {% if multus == 'on' %}true{% else %}false{% endif %} -kube_pods_subnet: 10.244.0.0/16 -{%- if name in ['regional_dc', 'full_nfv', 'access', 'build_your_own'] -%} -{% set mask = 18 %} -{%- elif name == 'remote_fp' -%} -{% set mask = 19 %} -{%- elif name == 'on_prem' -%} -{% set mask = 21 %} -{%- elif name == 'basic' -%} -{% set mask = 22 %} -{%- endif %} -kube_service_addresses: 10.233.0.0/{{ mask }} -kube_proxy_mode: iptables +{% if openssl in ['on', 'optional'] %} +# This feature will enable OpenSSL*Engine +openssl_engine_enabled: {% if openssl == 'on' %}true{% else %}false{% endif %} # to activate OpenSSL*Engine set 'install_openssl' to 'true' in host_vars -# Set on true if you want to enable the eBPF dataplane support -calico_bpf_enabled: false +{% endif %} +{% if (kmra and (kmra.pccs in ['on', 'optional'] or + kmra.apphsm in ['on', 'optional'] or + kmra.ctk_demo in ['on', 'optional'] or + kmra.oran in ['on', 'optional'])) and + arch in ['icx', 'spr'] %} +# KMRA (Key Management Reference Application) +# Please, refer to the roles/kmra_install/defaults/main.yml for the full list of configuration options available. 
+kmra: +{% if kmra.sbx in ['on', 'optional'] %} + sbx: {% if kmra.sbx == 'on' %}true{% else %}false{% endif %} # Enable pre-PRQ SGX platform +{% endif %} +{% if kmra.oran in ['on', 'optional'] %} + oran: + enabled: {% if kmra.oran == 'on' %}true{% else %}false{% endif %} # Put KMRA into ORAN mode + local_build: {% if kmra.oran == 'on' %}true{% else %}false{% endif %} # Build oran container by default unless you have pre-built oran container + oran_netopeer2_server: + enabled: {% if kmra.oran == 'on' %}true{% else %}false{% endif %} # Enable netopeer2 server + oran_netopeer2_client: + enabled: {% if kmra.oran == 'on' %}true{% else %}false{% endif %} # Enable netopeer2 client +{% endif %} +{% if kmra.pccs in ['on', 'optional'] %} + pccs: + enabled: {% if kmra.pccs == 'on' %}true{% else %}false{% endif %} # Enable PCCS application + # The PCCS uses this API key to request collaterals from Intel's Provisioning Certificate Service. User needs to subscribe first to obtain an API key. + # For how to subscribe to Intel Provisioning Certificate Service and receive an API key, go to https://api.portal.trustedservices.intel.com/provisioning-certification, + # and get an API key by clicking 'Subscribe'. + api_key: "ffffffffffffffffffffffffffffffff" +{% endif %} +{% if kmra.apphsm in ['on', 'optional'] %} + apphsm: + enabled: {% if kmra.apphsm == 'on' %}true{% else %}false{% endif %} # Enable AppHSM application +{% endif %} +{% if kmra.ctk_demo in ['on', 'optional'] %} + ctk_loadkey_demo: + enabled: {% if kmra.ctk_demo == 'on' %}true{% else %}false{% endif %} # Enable CTK demo application +{% endif %} -# Comment this line out if you want to expose k8s services of type nodePort externally. 
-kube_proxy_nodeport_addresses_cidr: 127.0.0.0/8 +{% endif %} +{% if tcs in ['on', 'optional'] and + arch in ['icx'] %} +# Trusted Certificate Service deployment +# https://github.com/intel/trusted-certificate-issuer +tcs: + enabled: {% if tcs == 'on' %}true{% else %}false{% endif %} # Enable Trusted Certificate Issuer + build_image_locally: false # Build Trusted Certificate Issuer image locally + +{% endif %} +{% if tac in ['on', 'optional'] and + arch in ['icx'] %} +# Trusted Attestation Controller deployment +# https://github.com/intel/trusted-attestation-controller +tac: + enabled: {% if tac == 'on' %}true{% else %}false{% endif %} # Enable Trusted Attestation Controller + build_image_locally: false # Build Trusted Attestation Controller image locally + +{% endif %} +{% if sigstore_policy_controller in ['on', 'optional'] %} +# Install sigstore policy controller to enforce cosign container image security +sigstore_policy_controller_install: {% if sigstore_policy_controller == 'on' %}true{% else %}false{% endif %} + + +{% endif %} +{% if intel_media_analytics in ['on', 'optional'] %} +#################### +## Video AI / VSS ## +#################### + +intel_media_analytics_enabled: {% if intel_media_analytics == 'on' %}true{% else %}false{% endif %} + + +{% endif %} +{% if intel_oneapi and (intel_oneapi.values() | reject('eq', 'off') | list | length() != 0) %} +########################### +## Intel OneAPI Toolkits ## +########################### +intel_oneapi_enabled: {% if (intel_oneapi.values() | select('eq', 'on')) | list | length() > 0 %}true{% else %}false{% endif %} # enable Intel oneAPI toolkits deployment +intel_oneapi: +{%- if intel_oneapi.base in ['on', 'optional'] +%} + # Set to true to deploy Intel oneAPI Base Kit + basekit: {% if intel_oneapi.base == 'on' %}true{% else %}false{% endif %} +{% endif %} +{%- if intel_oneapi.ai in ['on', 'optional'] +%} + # Set to true to deploy Intel oneAPI AI Analytics Kit + ai_analytics: {% if intel_oneapi.ai == 'on' 
%}true{% else %}false{% endif %} +{% endif %} + + +{% endif %} + +########################## +## Container Registries ## +########################## + +{% if registry in ['on', 'optional'] %} +# Docker registry running on the cluster allows us to store images not available on Docker Hub +registry_enable: {% if registry == 'on' %}true{% else %}false{% endif %} + +registry_nodeport: "30500" # The range of valid ports is 30000-32767 +registry_local_address: "localhost:{{ '{{' }} registry_nodeport {{ '}}' }}" -# Local Docker Hub mirror, if it exists +{% endif %} +# Set image pull policy to Always. Pull images prior to starting containers. Valid credentials must be configured. +always_pull_enabled: false + +# Registry mirrors can be configured using the following options #docker_registry_mirrors: # - http://mirror_ip:mirror_port #docker_insecure_registries: @@ -579,75 +755,37 @@ kube_proxy_nodeport_addresses_cidr: 127.0.0.0/8 #crio_insecure_registries: # - http://crio_insecure_registry_ip -{%- if registry in ['on', 'optional'] %} +####################### +## Workloads & Demos ## +####################### -# Docker registry running on the cluster allows us to store images not available on Docker Hub -# The range of valid ports is 30000-32767 -registry_enable: {% if registry == 'on' %}true{% else %}false{% endif %} -registry_nodeport: "30500" -registry_local_address: "localhost:{{ '{{' }} registry_nodeport {{ '}}' }}" -{%- endif %} +{% if rt_kernel in ['on', 'optional'] %} +# Realtime kernel +rt_kernel_enabled: {% if rt_kernel == 'on' %}true{% else %}false{% endif %} # if true, install realtime kernel +ubuntu_pro_token: "ffffffffffffffffffffffffffffff" # need to attach Ubuntu Pro free token to download RT kernel(please apply it firstly) +{% endif %} -# Set image pull policy to Always. Pull images prior to starting containers. Valid credentials must be configured. 
-always_pull_enabled: false +{% if intel_flexran in ['on', 'optional'] %} +# Intel FlexRAN +intel_flexran_enabled: {% if intel_flexran == 'on' %}true{% else %}false{% endif %} # Enable deployment of FlexRAN +intel_flexran_type: "host" # Supported values are "host" and "pod" +intel_flexran_mode: "timer" # Supported values are "timer" and "xran" +# The below 4 values must be strings in extended Bus:Device.Function (BDF) notation +intel_flexran_bbu_front_haul: "0000:43:00.0" +intel_flexran_bbu_ptp_sync: "0000:43:00.1" +intel_flexran_oru_front_haul: "0000:4b:00.0" +intel_flexran_oru_ptp_sync: "0000:4b:00.1" -{%- if minio in ['on', 'optional'] %} +{% endif %} +{% if tadk in ['on', 'optional'] %} +# Traffic Analytics Development Kit (TADK) +tadk_install: {% if tadk == 'on' %}true{% else %}false{% endif %} # Install Web Application Firewall (WAF) using TADK -## MinIO variables ## -# Enable Minio Storage service. -minio_enabled: {% if minio == 'on' %}true{% else %}false{% endif %} -minio_tenant_enabled: true # Specifies whether to install MinIO Sample Tenant -minio_tenant_servers: 4 # The number of MinIO Tenant nodes -minio_tenant_volumes_per_server: 4 # The number of volumes per servers -minio_tenant_volume_size: 5 - # The size of each volume (unit: GiB) -minio_deploy_test_mode: true # true (Test Mode) - use a file as loop device when creating storage - # called "virtual block device" which is useful for test or automation purpose - # false (Performance Mode) - use an actual NVME or SSD device when creating storage -minio_build_image_locally: true # build custom MinIO image locally -minio_awsclient_pods_enabled: true # run AWS client pods for MinIO Tenant service -minio_ingress_enabled: false # enable MinIO tenant ingress -{%- endif %} -{%- if cndp in ['on', 'optional'] or cndp_dp in ['on', 'optional'] %} - -# Intel Cloud Native Data Plane. 
-{%- if cndp_dp in ['on', 'optional'] %} -cndp_dp_enabled: {% if cndp_dp == 'on' %}true{% else %}false{% endif %} -{%- if cndp_dp == 'on' %} -cndp_net_attach_def_enabled: true # Whether or not to create NetworkAttachmentDefinition resource. -cndp_net_attach_def_conf: - name: afxdp-network # (Optional) Name of NetworkAttachmentDefinition resource. - ipam: # (Optional) ipam configuration section NetworkAttachmentDefinition resource. - subnet: "192.168.1.0/24" # (Optional) Default is "192.168.1.0/24". - rangeStart: "192.168.1.200" # (Optional) Default is "192.168.1.200". - rangeEnd: "192.168.1.220" # (Optional) Default is "192.168.1.220". - gateway: "192.168.1.1" # (Optional) Default is "192.168.1.1". -{% else %} -cndp_net_attach_def_enabled: false -{%- endif %} -{%- endif %} -{%- endif %} -{%- if tadk in ['on', 'optional'] %} - -## Traffic Analytics Development Kit (TADK) ## -# Install Web Application Firewall (WAF) using TADK -tadk_install: {% if tadk == 'on' %}true{% else %}false{% endif %} -{%- endif %} - -{%- if intel_flexran in ['on', 'optional'] %} -# Intel FlexRAN -intel_flexran_enabled: {% if intel_flexran == 'on' %}true{% else %}false{% endif %} # if true, deploy FlexRAN -intel_flexran_type: "pod" # supported values are "host" and "pod" -intel_flexran_mode: "timer" # supported values are "timer" and "xran" -intel_flexran_bbu_front_haul: "0000:43:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format (for 'host' type only) -intel_flexran_bbu_ptp_sync: "0000:43:00.1" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format (for 'host' type only) -intel_flexran_oru_front_haul: "0000:4b:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format (for 'host' type only) -intel_flexran_oru_ptp_sync: "0000:4b:00.1" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format (for 'host' type only) -{% endif %} - -{%- if adq_dp in ['on', 'optional'] %} -# Note: ADQ is 
experimental feature and enabling it may lead to unexpected results. -# ADQ requires back-to-back connection between control plane and worker node on CVL interfaces. +{% endif %} +{% if adq_dp in ['on', 'optional'] %} +# Application Device Queues (ADQ) +# ADQ is experimental feature and enabling it may lead to unexpected results. +# ADQ requires back-to-back connection between control and worker nodes on CVL interfaces. # Name of CVL interfaces must be the same on both nodes, IP address must be present. # In inventory.ini set "ip=" to IP address of CVL interface. # Additional requirements and details can be found in docs/adq.md @@ -658,7 +796,17 @@ adq_dp: interface_name: "ens107" {% endif %} +{% if intel_eci and (intel_eci.values() | reject('eq', 'off')) | list | length() > 0 %} +# Please contact eci-support@intel.com on how to access this repo. +# Also refer to ESH (https://www.intel.com/content/www/us/en/edge-computing/edge-software-hub.html) +intel_eci_repo: +{% endif %} + {% if mirrors == 'true' %} +######################## +## Repository Mirrors ## +######################## + ## Kubespray image repositories mirrors gcr_image_repo: "gcr.m.daocloud.io" kube_image_repo: "k8s.m.daocloud.io" @@ -678,9 +826,37 @@ containerd_ubuntu_repo_repokey: 'YOURREPOKEY' ## Kubespray url mirrros mirror_urls: - original: "https://storage.googleapis.com" - mirror: "https://googleapis.daocloud.io" # please set mirror url + mirror: "https://googleapis.daocloud.io" # Please set mirror url - original: "https://github.com" - mirror: "https://github.daocloud.io" # please set mirror url + mirror: "https://github.daocloud.io" # Please set mirror url - original: "https://get.helm.sh" - mirror: "https://helm.daocloud.io" # please set mirror url + mirror: "https://helm.daocloud.io" # Please set mirror url + +{% endif %} + +{% if intel_ffmpeg in ['on', 'optional'] %} +## Install FFmpeg with custom patches +# ffmpeg_patches: # List of patch resources to apply +# - url: # URL of archive or git 
repository +# type: # Type of source ["tar.gz", "zip", "git"] +# sha256: # SHA needed for successful deployment, generate from file +# subdirectory: # Where to search for patches inside repository (for root directory use '/') +# git_tag: # Use when type is set to git +# patchset_enabled: true # [true/false] If true, this set of patches will be applied in the FFmpeg code, if false patches are skipped +# apply_all_patches: # If true script will apply all patches from defined subdirectory, +# if false script will apply patches from list bellow +# patches_to_apply: # Used only when "apply_all_patches" is 'false' +# - patch_1 +# - patch_2 + +ffmpeg_install_enabled: {% if intel_ffmpeg == 'on' %}true{% else %}false{% endif %} + +ffmpeg_patches: + - url: "https://github.com/intel/cartwheel-ffmpeg/archive/refs/tags/2023q1.tar.gz" + type: "tar.gz" + sha256: "5f4c34edbc32298cd3f8fb8ebf789a2d2dbceb5864ec20e5cd501b3cba130871" + subdirectory: "patches/" + patchset_enabled: true + apply_all_patches: true + {% endif %} diff --git a/generate/profiles_templates/common/host_vars.j2 b/generate/profiles_templates/common/host_vars.j2 index ecec483e..7501f98c 100644 --- a/generate/profiles_templates/common/host_vars.j2 +++ b/generate/profiles_templates/common/host_vars.j2 @@ -1,389 +1,599 @@ --- -# Kubernetes node configuration -# Do not change profile_name, configured_nic and configured_arch here !!! 
-# To generate vars for different profile/architecture use make command -# generated for profile and arch: +########################### +## Profile Configuration ## +########################### +## Do not modify values listed here +## Re-run the "make" command to change profile configuration + profile_name: {{ name }} configured_arch: {{ arch }} configured_nic: {{ nic }} -{% if sriov_operator in ['on', 'optional'] or sriov_network_dp in ['on', 'optional'] or qat in ['on', 'optional'] or dsa in ['on', 'optional'] -%} +######################## +## Host Configuration ## +######################## + +{% if sriov_operator in ['on', 'optional'] or sriov_network_dp in ['on', 'optional'] or qat in ['on', 'optional'] or dsa in ['on', 'optional'] or fpga in ['on', 'optional'] %} # Enable IOMMU (required for SR-IOV networking and QAT) -iommu_enabled: {% if (sriov_operator == 'on' or sriov_network_dp == 'on' or qat == 'on' or dsa == 'on' or dlb == 'on') and on_vms != 'on' %}true{% else %}false{% endif %} +iommu_enabled: {% if (sriov_operator == 'on' or sriov_network_dp == 'on' or qat == 'on' or dsa == 'on' or dlb == 'on' or fpga == 'on') %}true{% else %}false{% endif %} + + +{% endif %} +{% if hugepages in ['on', 'optional'] %} +# Enables hugepages support +hugepages_enabled: {% if hugepages == 'on' %}true{% else %}false{% endif %} + +# Hugepage sizes available: 2M, 1G +default_hugepage_size: {% if vpp == 'on' %}2M{% else %}1G{% endif %} + +# Sets how many hugepages should be created +{% if cloud_mode == 'on' %} +number_of_hugepages_1G: 2 +number_of_hugepages_2M: 512 +{% else %} +number_of_hugepages_1G: 4 +number_of_hugepages_2M: 1024 +{% endif %} +{% endif %} + +{% if isolcpu in ["on", "optional"] %} +# CPU isolation from Linux scheduler +isolcpus_enabled: {% if isolcpu == 'on' %}true{% else %}false{% endif %} + +{% if on_vms == 'on' %} +isolcpus: "4-15" +{% else %} +{% if vm_mode == 'on' %} +# isolcpus variable can't be enabled in case of VMRA deployment. 
+# Its content is generated automatically. +# isolcpus: "" +{% else %} +isolcpus: "4-11" +{% endif %} +{% endif %} +{% endif %} + +{% if cpusets in ["on", "optional"] %} +# CPU shielding +cpusets_enabled: {% if cpusets == 'on' %}true{% else %}false{% endif %} + +{% if on_vms == 'on' %} +cpusets: "4-15" +{% else %} +cpusets: "4-11" +{% endif %} +{% endif %} + +{% if dpdk in ['on', 'optional'] %} +# Install DPDK (required for SR-IOV networking) +install_dpdk: {% if dpdk == 'on' %}true{% else %}false{% endif %} + +# DPDK version (will be in action if install_dpdk: true) +dpdk_version: {% if intel_flexran == 'on' %}"22.11.1"{% elif arch == "emr" %}"22.11.1"{% else %}"23.03"{% endif %} # Note: dpdk_version is also dependent on ovs_dpdk when enabled (see preflight) +# Custom DPDK patches local path +{% if intel_flexran == 'on' %}dpdk_local_patches_dir: "/tmp/flexran"{% else %}#dpdk_local_patches_dir: "/tmp/patches/dpdk"{% endif %} + +# It might be necessary to adjust the patch strip parameter, update as required. +{% if intel_flexran == 'on' %}dpdk_local_patches_strip: 1{% else %}#dpdk_local_patches_strip: 0{% endif %} + {% endif %} + +{% if openssl in ['on', 'optional'] %} +# Install and configure OpenSSL cryptography +openssl_install: {% if openssl == 'on' %}true{% else %}false{% endif %} + +{% endif %} +{% if not cloud_mode == 'on' %} +# Useful if system loses IP after reboot. Note: make sure IP is stable / system gets same IP after reboot else will cause failures. +# It is needed only in some lab environments. Default setting is false. +# TODO: JP: This workaround should not be needed any longer. 
To be removed completely once it pass complete validation cycle +enable_dhclient_systemd_service: false + +{% endif %} +################################## +## Network Device Configuration ## +################################## + # dataplane interface configuration list dataplane_interfaces: [] -{%- if on_vms == 'on' %} -# - bus_info: "06:00.0" # pci bus info -# pf_driver: iavf # driver inside VM +#dataplane_interfaces: +{% if on_vms == 'on' %} +# - bus_info: "06:00.0" # PCI bus info +# pf_driver: iavf # Driver inside VM # sriov_numvfs: 0 # default_vf_driver: "igb_uio" -# - bus_info: "07:00.0" # pci bus info -# pf_driver: iavf # driver inside VM +# - bus_info: "07:00.0" +# pf_driver: iavf # sriov_numvfs: 0 # default_vf_driver: "iavf" -# - bus_info: "08:00.0" # pci bus info -# pf_driver: iavf # driver inside VM +# - bus_info: "08:00.0" +# pf_driver: iavf # sriov_numvfs: 0 # default_vf_driver: "iavf" -# - bus_info: "09:00.0" # pci bus info -# pf_driver: iavf # driver inside VM +# - bus_info: "09:00.0" +# pf_driver: iavf # sriov_numvfs: 0 # default_vf_driver: "igb_uio" -{%- else %} -# - bus_info: "18:00.0" # pci bus info +{% else %} +# - bus_info: "18:00.0" # PCI bus info # pf_driver: {% if nic == 'cvl' %}ice{% else %}i40e{% endif %} # PF driver, "i40e", "ice" -{%- if ddp in ['on', 'optional'] %} -# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.37.0.pkg"{% else %}gtp.pkgo{% endif %} # DDP package name to be loaded into the NIC +{% if ddp in ['on', 'optional'] %} +# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.40.0.pkg"{% else %}gtp.pkgo{% endif %} # DDP package name to be loaded into the NIC # For i40e(XV710-*) allowable ddp values are: "ecpri.pkg", "esp-ah.pkg", "ppp-oe-ol2tpv2.pkgo", "mplsogreudp.pkg" and "gtp.pkgo", replace as required - # For ice(E810-*) allowable ddp values are: ice_comms-1.3.[17,20,22,24,28,30,31,35].0.pkg such as "ice_comms-1.3.37.0.pkg", replace as required + # For ice(E810-*) allowable ddp values are: 
ice_comms-1.3.[17,20,22,24,28,30,31,35,37,40].0.pkg such as "ice_comms-1.3.40.0.pkg", replace as required # ddp_profile must be defined for first port of each network device. bifurcated cards will appear as unique devices. {% endif %} -{%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} -# flow_configuration: {% if intel_ethernet_operator.flow_config == 'on' and nic == "cvl" %}true{% else %}false{% endif %} # Flow Configuration # NOTE: this option is for Intel E810 Series NICs and requires Intel Ethernet Operator and Flow Config to be enabled in group vars. - # with Flow Configuration enabled the first VF (VF0) will be reserved for Flow Configuration and the rest of VFs will be indexed starting from 1. +{% if intel_ethernet_operator.enabled in ['on', 'optional'] %} +# flow_configuration: {% if intel_ethernet_operator.flow_config == 'on' and nic == "cvl" %}true{% else %}false{% endif %} # This option is for Intel E810 Series NICs and requires Intel Ethernet Operator and Flow Config to be enabled in group vars. + # With Flow Configuration enabled the first VF (VF0) will be reserved for Flow Configuration and the rest of VFs will be indexed starting from 1. {% endif %} -{%- if intel_flexran in ['on', 'optional'] %} +{% if intel_flexran in ['on', 'optional'] %} # default_vf_driver: "vfio-pci" # FlexRAN in POD requires SRIOV VFs with vfio-pci -# sriov_numvfs: 4 # total number of VFs for FlexRAN in POD must be 4 -{%- else %} -# default_vf_driver: "iavf" # default driver to be used with VFs if specific driver is not defined in the "sriov_vfs" section -# sriov_numvfs: 8 # total number of VFs to create including VFs listed in the "sriov_vfs" section. +# sriov_numvfs: 4 # Total number of VFs for FlexRAN in POD must be 4 +{% else %} +# default_vf_driver: "iavf" # Default driver to be used with VFs if specific driver is not defined in the "sriov_vfs" section +# sriov_numvfs: 8 # Total number of VFs to create including VFs listed in the "sriov_vfs" section. 
# If total number of VFs listed in the "sriov_vfs" section is greater than "sriov_numvfs" then excessive entities will be ignored. # VF's name should follow scheme: _ # If index in the VF's name is greater than "sriov_numfs - 1" such VF will be ignored. {% endif %} -{%- if minio in ['on', 'optional'] %} +{% if minio in ['on', 'optional'] %} # minio_vf: true {% endif %} -{%- if intel_flexran not in ['on', 'optional'] %} -# sriov_vfs: # list of VFs to create on this PF with specific driver +{% if intel_flexran not in ['on', 'optional'] %} +# sriov_vfs: # List of VFs to create on this PF with specific driver # vf_00: "vfio-pci" # VF driver to be attached to this VF under this PF. Options: "iavf", "vfio-pci", "igb_uio" # vf_05: "vfio-pci" {% endif %} # - bus_info: "18:00.1" # pf_driver: {% if nic == 'cvl' %}ice{% else %}i40e{% endif %} -{%- if ddp in ['on', 'optional'] %} -# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.37.0.pkg"{% else %}gtp.pkgo{% endif %} -{%- endif %} -{%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} + +{% if ddp in ['on', 'optional'] %} +# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.40.0.pkg"{% else %}gtp.pkgo{% endif %} + +{% endif %} +{% if intel_ethernet_operator.enabled in ['on', 'optional'] %} # flow_configuration: {% if intel_ethernet_operator.flow_config == 'on' and nic == "cvl" %}true{% else %}false{% endif %} + {% endif %} # default_vf_driver: "vfio-pci" # sriov_numvfs: 4 -{%- if minio in ['on', 'optional'] %} +{% if minio in ['on', 'optional'] %} # minio_vf: true {% endif %} -{%- if intel_flexran not in ['on', 'optional'] %} -# sriov_vfs: {} # no VFs with specific driver on this PF or "sriov_vfs" can be omitted for convenience +{% if intel_flexran not in ['on', 'optional'] %} +# sriov_vfs: {} # No VFs with specific driver on this PF or "sriov_vfs" can be omitted for convenience {% endif %} {% endif %} -{%- if nic_drivers in ['on', 'optional'] %} -# Set to 'true' to update i40e, ice and iavf kernel modules + +{% if 
nic_drivers in ['on', 'optional'] %} +# Set 'true' to update / downgrade i40e, ice and iavf kernel modules update_nic_drivers: {% if nic_drivers == 'on' %}true{% else %}false{% endif %} -#i40e_driver_version: "2.22.8" # Downgrading i40e drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. -#i40e_driver_checksum: "sha1:9ae9a51b8d16f5d6ea9a817de5d3f37eb96101a1" # update checksum per required i40e drivers version -#ice_driver_version: "1.10.1.2.2" # Downgrading ice drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. -#ice_driver_checksum: "sha1:a71d0497307b462059b5819cf8686b2f9361a930" # update checksum per required ice drivers version -#iavf_driver_version: "4.6.1" # Downgrading iavf drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. -#iavf_driver_checksum: "sha1:7102e6fcb6271f6cb14bcd9e64eccc58fcafd788" # update checksum per required iavf drivers version -{% endif %} -# Set 'true' to upgrade / downgrade NIC firmware. FW upgrade / downgrade will be executed on all NICs listed in "dataplane_interfaces[*].bus_info". -update_nic_firmware: false # Note: downgrading FW is not recommended, users should proceed at their own risk. 
-{%- if nic == 'fvl' %} -#nvmupdate: [] # remove '[]' in case of downgrading FW such as 'nvmupdate:' -# i40e: [] # remove '[]' in case of downgrading FW to get required version of NVM 'i40e' 700 Series such as 'i40e:' -# nvmupdate_pkg_url: "https://downloadmirror.intel.com/769287/700Series_NVMUpdatePackage_v9_20_Linux.tar.gz" -# nvmupdate_pkg_checksum: "sha1:87F0BDA58BAAEE0ADF1FADBBCC485AF0A2F0777F" -# required_fw_version: "9.20" -# # min fw version for ddp was taken from: -# # https://www.intel.com/content/www/us/en/developer/articles/technical/dynamic-device-personalization-for-intel-ethernet-700-series.html -# min_ddp_loadable_fw_version: "6.01" -# min_updatable_fw_version: "5.02" -# # when downgrading only, the recommended below version is required to download the supported NVMupdate64E tool. Users should replace the tool at their own risk. -# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" -# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" -# supported_nvmupdate_tool_fw_version: "4.0" -{%- endif %} -{%- if nic == 'cvl' %} -#nvmupdate: [] # remove '[]' in case of downgrading FW such as 'nvmupdate:' -# ice: [] # remove '[]' in case of downgrading FW to get required version of NVM 'ICE' 800 Series such as 'ice:' -# nvmupdate_pkg_url: "https://downloadmirror.intel.com/769278/E810_NVMUpdatePackage_v4_20_Linux.tar.gz" -# nvmupdate_pkg_checksum: "sha1:36CE159E53E6060F2AC4E3419DB8A21E3D982A85" -# required_fw_version: "4.20" -# # https://builders.intel.com/docs/networkbuilders/intel-ethernet-controller-800-series-device-personalization-ddp-for-telecommunications-workloads-technology-guide.pdf -# # document above does not specify any min fw version needed for ddp feature. 
So, min_ddp_loadable_fw is the same as min_updatable_fw -# min_ddp_loadable_fw_version: "0.70" -# min_updatable_fw_version: "0.70" - # when downgrading only, the recommended below version is required to download the supported NVMupdate64E tool. Users should replace the tool at their own risk. -# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" -# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" -# supported_nvmupdate_tool_fw_version: "4.0" -{%- endif %} -{%- if cloud_mode == 'on' %} +# The below options can be used to downgrade drivers. This is not recommended and users should proceed at their own risk. +#i40e_driver_version: "2.22.18" +#i40e_driver_checksum: "sha1:0c94bd91014a0d81bd6b99fb41d0e4f1c12b09ff" +#ice_driver_version: "1.11.14" +#ice_driver_checksum: "sha1:730cd04fcfd0ba1b33ba21aaf671d0e1654c999a" +#iavf_driver_version: "4.8.2" +#iavf_driver_checksum: "sha1:fcc997aebeee3744e621e0fd3290205bd18f6a45" + +{% endif %} +# Set 'true' to upgrade / downgrade NIC firmware. This will be executed on all NICs listed in "dataplane_interfaces[*].bus_info". +# Downgrading firmware is not recommended and users should proceed at their own risk. +update_nic_firmware: false +{% if nic == 'fvl' %} +#nvmupdate: +# i40e: +# nvmupdate_pkg_url: "https://downloadmirror.intel.com/769287/700Series_NVMUpdatePackage_v9_20_Linux.tar.gz" +# nvmupdate_pkg_checksum: "sha1:87F0BDA58BAAEE0ADF1FADBBCC485AF0A2F0777F" +# required_fw_version: "9.20" +# # min fw version for ddp was taken from: +# # https://www.intel.com/content/www/us/en/developer/articles/technical/dynamic-device-personalization-for-intel-ethernet-700-series.html +# min_ddp_loadable_fw_version: "6.01" +# min_updatable_fw_version: "5.02" +# # when downgrading only, the recommended below version is required to download the supported NVMupdate64E tool. Users should replace the tool at their own risk. 
+# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# supported_nvmupdate_tool_fw_version: "4.0" + +{% endif %} +{% if nic == 'cvl' %} +#nvmupdate: +# ice: +# nvmupdate_pkg_url: "https://downloadmirror.intel.com/769278/E810_NVMUpdatePackage_v4_20_Linux.tar.gz" +# nvmupdate_pkg_checksum: "sha1:36CE159E53E6060F2AC4E3419DB8A21E3D982A85" +# required_fw_version: "4.20" +# # https://builders.intel.com/docs/networkbuilders/intel-ethernet-controller-800-series-device-personalization-ddp-for-telecommunications-workloads-technology-guide.pdf +# # document above does not specify any min fw version needed for ddp feature. So, min_ddp_loadable_fw is the same as min_updatable_fw +# min_ddp_loadable_fw_version: "0.70" +# min_updatable_fw_version: "0.70" +# # when downgrading only, the recommended below version is required to download the supported NVMupdate64E tool. Users should replace the tool at their own risk. 
+# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# supported_nvmupdate_tool_fw_version: "4.0" + +{% endif %} +{% if cloud_mode == 'on' %} # install Intel x700 & x800 series NICs DDP packages # For Cloud RA, the install_ddp_packages option must be false install_ddp_packages: false -{%- endif %} +{% endif %} {% if ddp in ['on', 'optional'] %} # install Intel x700 & x800 series NICs DDP packages install_ddp_packages: {% if ddp == 'on' and nic == 'fvl'%}true{% else %}false{% endif %} + # If following error appears: "Flashing failed: Operation not permitted" # run deployment with update_nic_firmware: true -# or -# Disable ddp installation via install_ddp_packages: false +# or Disable DDP installation via install_ddp_packages: false + +enable_ice_systemd_service: {% if ddp == "on" %}true{% else %}false{% endif %} # Enable custom ddp package to be loaded after reboot -# set 'true' to enable custom ddp package to be loaded after reboot -enable_ice_systemd_service: {% if ddp == "on" %}true{% else %}false{% endif %} {% endif %} +####################### +## CNI Configuration ## +####################### -{%- if sriov_network_dp in ['on', 'optional'] %} +{% if sriov_network_dp in ['on', 'optional'] %} sriov_cni_enabled: {% if sriov_network_dp == 'on' %}true{% else %}false{% endif %} -{% endif %} -{%- if sriov_operator in ['on', 'optional'] %} -# Custom SriovNetworkNodePolicy manifests local path -# custom_sriov_network_policies_dir: /tmp/sriov -{%- endif %} -{%- if bond_cni in ['on', 'optional'] %} +{% endif %} +{% if bond_cni in ['on', 'optional'] %} # Bond CNI bond_cni_enabled: {% if bond_cni == 'on' %}true{% else %}false{% endif %} -{% endif %} -{%- if dpdk in ['on', 'optional'] %} -# Install DPDK (required for SR-IOV networking) -install_dpdk: {% if dpdk == 'on' %}true{% else %}false{% endif %} -# DPDK version 
(will be in action if install_dpdk: true) -dpdk_version: {% if intel_flexran == 'on' %}"21.11"{% elif ovs_dpdk == 'on' %}"22.07"{% else %}"22.11.1"{% endif %} # ovs_version: "v3.0.3" does NOT support dpdk_version: "22.11.1" -# Custom DPDK patches local path -{% if intel_flexran == 'on' %}dpdk_local_patches_dir: "/tmp/flexran"{% else %}#dpdk_local_patches_dir: "/tmp/patches/dpdk"{% endif %} -# It might be necessary to adjust the patch strip parameter, update as required. -{% if intel_flexran == 'on' %}dpdk_local_patches_strip: 1{% else %}#dpdk_local_patches_strip: 0{% endif %} -{%- endif %} + +{% endif %} {% if network_userspace in ['on', 'optional'] %} -# Userspace networking +# Userspace CNI userspace_cni_enabled: {% if network_userspace == 'on' %}true{% else %}false{% endif %} -ovs_dpdk_enabled: {% if ovs_dpdk == 'on' %}true{% else %}false{% endif %} # Should be enabled with Userspace CNI, when VPP is set to "false"; 1G hugepages required -ovs_version: "v3.0.3" # this version has to be compatible/functional with the DPDK version set by 'dpdk_version' + + +ovs_dpdk_enabled: {% if ovs_dpdk == 'on' %}true{% else %}false{% endif %} # Should be enabled with Userspace CNI, when VPP is set to "false"; 1G hugepages required +ovs_version: "v3.1.1" # OVS version has to be compatible/functional with the DPDK version set by 'dpdk_version' # CPU mask for OVS-DPDK PMD threads ovs_dpdk_lcore_mask: 0x1 -# Huge memory pages allocated by OVS-DPDK per NUMA node in megabytes -# example 1: "256,512" will allocate 256MB from node 0 and 512MB from node 1 -# example 2: "1024" will allocate 1GB from node 0 on a single socket board, e.g. 
in a VM -ovs_dpdk_socket_mem: "256,0" -vpp_enabled: {% if vpp == 'on'%}true{% else %}false{% endif %} # Should be enabled with Userspace CNI, when ovs_dpdk is set to "false"; 2M hugepages required +# Hugepages allocated by OVS-DPDK per NUMA node in megabytes +ovs_dpdk_socket_mem: "256,0" # Example 1: "256,512" allocates 256MB from node 0 and 512MB from node 1 + # Example 2: "1024" allocates 1GB from node 0 on a single socket board, e.g. in a VM + +vpp_enabled: {% if vpp == 'on'%}true{% else %}false{% endif %} # Should be enabled with Userspace CNI, when ovs_dpdk is set to "false"; 2M hugepages required + {% endif %} +################## +## CPU Features ## +################## + +{% if native_cpu_manager in ["on", "optional"] %} +# Native CPU Manager (Kubernetes built-in) +# These settings are relevant only if in group_vars native_cpu_manager_enabled: true +native_cpu_manager_system_reserved_cpus: 2000m # Amount of CPU cores reserved for the housekeeping (2000m = 2000 millicores = 2 cores) +native_cpu_manager_kube_reserved_cpus: 1000m # Amount of CPU cores reserved for Kubelet +#native_cpu_manager_reserved_cpus: "0,1,2" # Explicit list of the CPUs reserved for the host level system threads and Kubernetes related threads + # Note: All remaining unreserved CPU cores will be consumed by the workloads. 
+ +{% endif %} +###################### +## Storage Features ## +###################### + +{% if minio in ['on', 'optional'] %} +# MinIO storage configuration +minio_pv: [] +#minio_pv: +# - name: "mnt-data-1" # PV identifier will be used for PVs names followed by node name(e.g., mnt-data-1-hostname) +# storageClassName: "local-storage" # Storage class name to match with PVC +# accessMode: "ReadWriteOnce" # Access mode when mounting a volume, e.g., ReadWriteOnce/ReadOnlyMany/ReadWriteMany/ReadWriteOncePod +# persistentVolumeReclaimPolicy: "Retain" # Reclaim policy when a volume is released once it's bound, e.g., Retain/Recycle/Delete +# mountPath: /mnt/data0 # Mount path of a volume +# device: /dev/nvme0n1 # Target storage device name when creating a volume. + # When group_vars: minio_deploy_test_mode == true, use a file as a loop device for storage + # otherwise, an actual NVME or SSD device for storage on the device name. + +# - name: "mnt-data-2" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data1 +# device: /dev/nvme1n1 + +# - name: "mnt-data-3" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data2 +# device: /dev/nvme2n1 + +# - name: "mnt-data-4" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data3 +# device: /dev/nvme3n1 + +# - name: "mnt-data-5" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data4 +# device: /dev/nvme4n1 + +# - name: "mnt-data-6" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data5 +# device: /dev/nvme5n1 -{%- if hugepages in ['on', 'optional'] %} -# Enables hugepages support -hugepages_enabled: {% if hugepages == 'on' %}true{% else 
%}false{% endif %} -# Hugepage sizes available: 2M, 1G -default_hugepage_size: {% if vpp == 'on' %}2M{% else %}1G{% endif %} -# Sets how many hugepages should be created -{%- if cloud_mode == 'on' %} -number_of_hugepages_1G: 2 -number_of_hugepages_2M: 512 -{%- else %} -number_of_hugepages_1G: 4 -number_of_hugepages_2M: 1024 -{%- endif %} {% endif %} -{%- if dlb in ['on', 'optional'] and arch in ['spr'] %} +########################## +## Device Configuration ## +########################## + +# VMRA does not support fpga yet. +{% if fpga in ['on', 'optional'] and vm_mode != 'on' %} +# Intel FPGA card +configure_fpga: {% if fpga == 'on' %}true{% else %}false{% endif %} + +# When configure_fpga is set to true, uncomment below two lines and fit w/ correct values +# fpga_driver_staging_folder: /tmp/intel_fpga/ +# fpga_install_script: fpga-ofs-2022-10-06-rc3-deb.sh + +{% endif %} + +{% if sgx in ['on', 'optional'] and arch in ['icx', 'spr'] %} +# Intel Software Guard Extensions (SGX) +configure_sgx: {% if sgx == 'on' %}true{% else %}false{% endif %} + + +{% endif %} +{% if gpu in ['on', 'optional'] %} +# Intel custom GPU kernel - this is required to be true in order to deploy Intel GPU Device Plugin on that node +configure_gpu: {% if gpu == 'on' %}true{% else %}false{% endif %} + +{% endif %} +{% if dlb in ['on', 'optional'] and arch in ['spr', 'emr'] %} # Configure SIOV and Intel DLB devices - required for Intel DLB Device Plugin support configure_dlb_devices: {% if dlb == "on" %}true{% else %}false{% endif %} -{% endif %} -{%- if dsa in ['on', 'optional'] and arch in ['spr'] %} + +{% endif %} +{% if dsa in ['on', 'optional'] and arch in ['spr', 'emr'] %} # Configure SIOV and Intel DSA devices - required for Intel DSA Device Plugin support configure_dsa_devices: {% if dsa == "on" %}true{% else %}false{% endif %} + # Example DSA devices configuration list. If left empty and configure_dsa_devices is set to true then default configuration will be applied.
# It is possible to configure more DSA devices by extending dsa_devices list based on example config. dsa_devices: [] - # - name: dsa0 # name of DSA device from /sys/bus/dsa/devices/ - # groups: 1 # number of groups to configure. The maximum number of groups per device can be found on /sys/bus/dsa/devices/dsaX/max_groups - # engines: 1 # number of engines to configure - one engine per group will be configured. - # # The maximum number of engines can be found on /sys/bus/dsa/devices/dsa0/max_engines - # wqs: # work queues will be named as wq., for example wq0.0 - WQ with id 0 owned by dsa0 device - # - id: 0 # work queue id - # mode: "dedicated" # [shared, dedicated] - # type: "user" # [kernel, user] - # size: 8 # sum of all configured WQs size must be less than /sys/bus/dsa/devices/dsa0/max_workqueue_size - # prio: 4 # must be set between 1 and 15 - # group_id: 0 # work queue will be assigned to specific group - # max_batch_size: 1024 # specify the max batch size used by a work queue - powers of 2 are accetable - # max_transfer_size: 2147483648 # specify the max transfer size used by a work queue - powers of 2 are accetable - # block_on_fault: 0 # [0, 1] If block on fault is disabled, - # # if a page fault occurs on a source or destination memory access, the operation stops and the page fault is reported to the software - # - id: 1 - # mode: "shared" - # type: "user" - # size: 8 - # prio: 5 - # threshold: 7 # only for Shared WQ, must be at least one less than size of WQ - # group_id: 0 - # max_batch_size: 1024 - # max_transfer_size: 2147483648 - # block_on_fault: 0 -{% endif %} - -{%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} -# Intel Ethernet Operator for Intel E810 Series network interface cards -intel_ethernet_operator: -{%- if intel_ethernet_operator.ddp in ['on', 'optional'] %} - ddp_update: {% if intel_ethernet_operator.ddp == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} # perform DDP update on PFs listed in dataplane_interfaces 
using selected DDP profile -{%- endif %} - fw_update: {% if intel_ethernet_operator.fw_update == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} # perform firmware update on PFs listed in dataplane_interfaces - # ClusterFlowConfig does not require additional configuration and can be used in conjunction with NodeFlowConfig - node_flow_config_enabled: false # enable NodeFlowConfig - # NodeFlowConfig/ClusterFlowConfig manifests local path - # For more information refer to: - # https://github.com/intel/intel-ethernet-operator/blob/main/docs/flowconfig-daemon/creating-rules.md - # flow_config_dir: /tmp/flow_config +#dsa_devices: +# - name: dsa0 # Name of DSA device from /sys/bus/dsa/devices/ +# groups: 1 # Number of groups to configure. The maximum number of groups per device can be found on /sys/bus/dsa/devices/dsaX/max_groups +# engines: 1 # Number of engines to configure - one engine per group will be configured. +# # The maximum number of engines can be found on /sys/bus/dsa/devices/dsa0/max_engines +# wqs: # Work queues will be named as wq., for example wq0.0 - WQ with id 0 owned by dsa0 device +# - id: 0 # Work queue id +# mode: "dedicated" # Supported values: ["shared", "dedicated"] +# type: "user" # Supported values: ["kernel", "user"] +# size: 8 # Sum of all configured WQs size must be less than /sys/bus/dsa/devices/dsa0/max_workqueue_size +# prio: 4 # Must be set between 1 and 15 +# threshold: 7 # Only for WQ in mode "shared" - must be at least one less than size of WQ +# group_id: 0 # Work queue will be assigned to specific group +# max_batch_size: 1024 # Specify the max batch size used by a work queue - powers of 2 are acceptable +# max_transfer_size: 2147483648 # Specify the max transfer size used by a work queue - powers of 2 are acceptable +# block_on_fault: 0 # Supported values: [0, 1] - enables (1) or disables (0) block on fault.
+# # If a page fault occurs on a source or destination memory access, the operation stops and the page fault is reported to the software + +{% endif %} +{% if intel_sriov_fec_operator in ['on', 'optional'] %} +# Wireless FEC H/W Accelerator Device (e.g. ACC100) PCI ID +fec_acc: "0000:6f:00.0" # must be string in extended Bus:Device.Function (BDF) notation + {% endif %} +{% if qat in ['on', 'optional'] %} +# Enabling this feature will install QAT drivers + services (OOT Drivers), otherwise Intree will be used. +update_qat_drivers: {% if qat == "on" %}true{% else %}false{% endif %} -{%- if intel_sriov_fec_operator in ['on', 'optional'] %} -# Wireless FEC H/W Accelerator Device (e.g. ACC100) PCI ID -fec_acc: "0000:6f:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +{% if arch == "emr" and qat == "on" %} +# EMR QAT driver version +emr_qat_driver_package: QAT20.L.1.1.11-00016.tar.gz +# SHA1 sum value for the driver package +emr_qat_driver_pkg_checksum: 73ba41e63bc83f9437a34131ff5e8fb09b4746ae +# Path to store the EMR QAT package on the ansible host. +emr_qat_driver_staging_folder: /tmp/emr_qat/ {% endif %} -{%- if qat in ['on', 'optional'] %} -# Enabling this feature will install QAT drivers + services -update_qat_drivers: {% if qat == "on" %}true{% else %}false{% endif %} -# Optional: uncomment and update qat_drivers_dir location as required. It will be used to store QAT drivers, QATLibs etc... -# Default location is derived from "project_root_dir" variable, defined in group_vars. Default location is "/qat_drivers" -#qat_drivers_dir: "/opt/intel/QAT/build" +# Enabling the option will configure the QAT device. Must be enabled when qat is on. +configure_qat: {% if qat == "on" %}true{% else %}false{% endif %} + # There are two services on the system which can be used to start qat devices. They can't run in parallel. One of them needs to be enabled and the second one disabled. 
enabled_qat_service: "qat" disabled_qat_service: "qat_service" -{%- if arch in ['spr'] %} -# This package provides user space libraries that allow access to Intel(R) QuickAssist devices and expose the Intel(R) QuickAssist APIs and sample codes. -enable_intel_qatlibs: {% if qat == "on" %}true{% else %}false{% endif %} # Make sure "openssl_install" is set to "true" else this feature will be skipped in deployment. -enable_qat_svm: {% if qat == "on" %}true{% else %}false{% endif %} # Enable QAT Shared Virtual Memory (SVM). +{% if arch in ['spr', 'emr'] %} +enable_qat_svm: false # Enable QAT Shared Virtual Memory (SVM). Only for OOT driver. + {% endif %} -# qat parameters used by auto detection of qat devices -qat_sriov_numvfs_required: 8 -qat_vf_driver_required: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} +# QAT parameters used by auto detection of qat devices +qat_sriov_numvfs_required: {% if on_vms == 'on' %}0{% else %}8{% endif %} + +qat_vf_driver_required: {% if arch in ['spr', 'emr'] %}"4xxxvf"{% else %}"c6xxvf"{% endif %} -# qat interface configuration list + +# QAT interface configuration list qat_devices: [] -{%- if on_vms == 'on' %} +#qat_devices: +{% if on_vms == 'on' %} # - qat_id: "0000:0a:00.0" -# qat_sriov_numvfs: 0 # Have to be set to 0 here to not create any VFs inside VM. +# qat_sriov_numvfs: 0 # Has to be set to 0 here to not create any VFs inside VM. # - qat_id: "0000:0b:00.0" -# qat_sriov_numvfs: 0 # Have to be set to 0 here to not create any VFs inside VM. -{%- else %} -# - qat_id: "0000:ab:00.0" # QAT device id one using DPDK compatible driver for VF devices to be used by vfio-pci kernel driver, replace as required -# qat_sriov_numvfs: 12 # Number of VFs per PF to create - cannot exceed the maximum number of VFs available for the device. Set to 0 to not create any VFs. -# # Note: Currently when trying to create fewer virtual functions than the maximum, the maximum number always gets created. 
-# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} -# qat_vfs: # VFs drivers settings will be overridden by QAT device plugin. -# vf_00: "vfio-pci" -# vf_05: "vfio-pci" +# qat_sriov_numvfs: 0 # Has to be set to 0 here to not create any VFs inside VM. +{% else %} +# - qat_id: "0000:ab:00.0" # QAT device id one using DPDK compatible driver for VF devices to be used by vfio-pci kernel driver, replace as required +# qat_sriov_numvfs: 12 # Number of VFs per PF to create - cannot exceed the maximum number of VFs available for the device. Set to 0 to not create any VFs. +# # Note: Currently when trying to create fewer virtual functions than the maximum, the maximum number always gets created. +# qat_default_vf_driver: {% if arch in ['spr', 'emr'] %}"4xxxvf"{% else %}"c6xxvf"{% endif %} + +# qat_vfs: # Used to configure a non-default VF driver for individual VFs +# vf_00: "vfio-pci" # Configures the 1st VF with "vfio-pci" driver +# vf_05: "vfio-pci" # Configured the 6th VF with "vfio-pci" driver # - qat_id: "0000:xy:00.0" # qat_sriov_numvfs: 10 -# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} -# qat_vfs: {} +# qat_default_vf_driver: {% if arch in ['spr', 'emr'] %}"4xxxvf"{% else %}"c6xxvf"{% endif %} -# - qat_id: "0000:yz:00.0" -# qat_sriov_numvfs: 10 -# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} -# qat_vfs: {} -{%- endif %} +# qat_vfs: {} # To use the default VF driver for all VFs {% endif %} -{%- if openssl in ['on', 'optional'] %} -# Install and configure OpenSSL cryptography -openssl_install: {% if openssl == 'on' %}true{% else %}false{% endif %} -{% endif -%} -{%- if isolcpu in ["on", "optional"] %} -# CPU isolation from Linux scheduler -isolcpus_enabled: {% if isolcpu == 'on' %}true{% else %}false{% endif %} -{%- if on_vms == 'on' %} -isolcpus: "4-15" -{%- else -%} -{% if vm_mode == 'on' %} -# isolcpus variable can't be enabled in case of VMRA deployment. 
-# Its content is generated automatically. -# isolcpus: "" -{%- else %} -isolcpus: "4-11" {% endif %} -{%- endif %} -{%- endif %} -{%- if cpusets in ["on", "optional"] %} -# CPU shielding -cpusets_enabled: {% if cpusets == 'on' %}true{% else %}false{% endif %} -{%- if on_vms == 'on' %} -cpusets: "4-15" -{%- else %} -cpusets: "4-11" -{%- endif %} +############### +## Operators ## +############### + +{% if intel_ethernet_operator.enabled in ['on', 'optional'] %} +# Intel Ethernet Operator for Intel E810 series ethernet network adapters +intel_ethernet_operator: +{% if intel_ethernet_operator.ddp in ['on', 'optional'] %} + ddp_update: {% if intel_ethernet_operator.ddp == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} # Perform DDP update on PFs listed in dataplane_interfaces using selected DDP profile {% endif %} + fw_update: {% if intel_ethernet_operator.fw_update == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} # Perform firmware update on PFs listed in dataplane_interfaces + # ClusterFlowConfig does not require additional configuration and can be used in conjunction with NodeFlowConfig + node_flow_config_enabled: false # Enable NodeFlowConfig + # NodeFlowConfig/ClusterFlowConfig manifests local path + # For more information refer to: + # https://github.com/intel/intel-ethernet-operator/blob/main/docs/flowconfig-daemon/creating-rules.md + #flow_config_dir: /tmp/flow_config +{% endif %} +{% if sriov_operator in ['on', 'optional'] %} +# Custom SriovNetworkNodePolicy manifests local path +#custom_sriov_network_policies_dir: /tmp/sriov -{%- if native_cpu_manager in ["on", "optional"] %} -# Native CPU Manager (Kubernetes built-in) -# These settings are relevant only if in group_vars native_cpu_manager_enabled: true -# Amount of CPU cores that will be reserved for the housekeeping (2000m = 2000 millicores = 2 cores) -native_cpu_manager_system_reserved_cpus: 2000m -# Amount of CPU cores that will be reserved for Kubelet 
-native_cpu_manager_kube_reserved_cpus: 1000m -# Explicit list of the CPUs reserved for the host level system threads and Kubernetes related threads -#native_cpu_manager_reserved_cpus: "0,1,2" -# Note: All remaining unreserved CPU cores will be consumed by the workloads. -{% endif %} -{%- if (pstate in ['on', 'optional'] or sst in ['on', 'optional']) and arch in ['clx', 'icx', 'spr'] %} +{% endif %} +############################### +## Telemetry & Observability ## +############################### + +{% if telemetry.collectd in ['on', 'optional'] %} +# Telemetry configuration +# intel_pmu plugin collects information provided by Linux perf interface. +enable_intel_pmu_plugin: false +{% if on_vms == 'on' %} + +# Temporary Fix for collectd startup issue on VMs +enable_pkgpower_plugin: false +{% endif %} + +# CPU Threads to be monitored by Intel PMU Plugin. +# Please refer to https://collectd.org/wiki/index.php/Plugin:Intel_PMU for configuration details. +intel_pmu_plugin_monitored_cores: "" # If the field is empty, all available cores will be monitored. + +# CPU Threads to be monitored by Intel RDT Plugin. +# Please refer to https://collectd.org/wiki/index.php/Plugin:IntelRDT for configuration details. +intel_rdt_plugin_monitored_cores: "" # If the field is empty, all available cores will be monitored. + +# Additional list of plugins that will be excluded from collectd deployment. 
+exclude_collectd_plugins: [] + +{% endif %} +###################### +## Power Management ## +###################### +{% if power.manager in ['on', 'optional'] and arch in ['icx', 'clx', 'spr', 'emr'] %} +# The performance profile is available for nodes that have CPU max MHz > 3500.0000 - use 'lscpu' command to see your node details +# To use PowerProfiles in this list as sample pods on this node, please set 'deploy_example_pods' to true in group_vars +power_profiles: [balance-performance] # Possible PowerProfiles are: [performance, balance-performance, balance-power] + +# Power Manager Shared Profile/Workload settings. +# It is possible to create node-specific Power Profile +local_shared_profile: + enabled: false # Enable/Disable local shared profile + local_max_frequency: 2000 + local_min_frequency: 1500 + +{% if power.pstate in ['on', 'optional'] and arch in ['icx', 'clx', 'spr', 'emr'] %} + # P-State governor decides what frequency within the CPUfreq policy should be used + # "powersave" - Lowest frequency within the borders of min_freq and max_freq. + # "performance" - Highest frequency within the borders of min_freq and max_freq. + local_pstate_governor: "powersave" +{% endif %} + +# Shared Workload is required to make use of Shared Power Profile +shared_workload: + enabled: true # Enable/Disable shared workload + reserved_cpus: [] # The CPUs in reserved_cpus should match the value of the reserved system CPUs in your Kubelet config file, if none please + # set a dummy core here - the last one to avoid AppQos bug. + shared_workload_type: "global" # Set to node name to make use of node-specific Power Profile, 'global' means use cluster-specific custom Power Profile + +# EMR uncore_frequency is not supported in the kernel driver yet. 
+{% if power.uncore_frequency in ['on', 'optional'] and arch in ['icx', 'clx', 'spr'] %} +uncore_frequency: + enabled: {% if power.uncore_frequency == "on" %}true{% else %}false{% endif %} # Enable/Disable uncore frequency + + # The min/max values must be within the range of values supported by the CPU model. + # Please refer to the documentation of your CPU model to check the supported uncore frequencies. + system_max_frequency: 2300000 + system_min_frequency: 1300000 + + # If needed, you can choose specific frequency per die + die_selector: [] + # die_selector: + # - package: 0 + # die: 0 + # min: 1500000 + # max: 2400000 +{% endif %} + +{% if power.cstate in ['on', 'optional'] and arch in ['icx', 'clx', 'spr', 'emr'] %} +cstates: + enabled: {% if power.cstate == "on" %}true{% else %}false{% endif %} # Enable/Disable cstates + + shared: + C1: true + profile_exclusive: + balance-performance: + C1: false + + # If needed, you can choose specific C-State for each core: + core: {} + # core: + # "3": + # C1: true + # C6: false +{% endif %} + +{% endif %} +{% if (power.pstate in ['on', 'optional'] or sst in ['on', 'optional']) and arch in ['icx', 'clx', 'spr', 'emr'] %} # Enable/Disable Intel PState scaling driver -intel_pstate_enabled: {% if pstate == "on" or sst == "on" %}true{% else %}false{% endif %} -# Config options for intel_pstate: disable, passive, force, no_hwp, hwp_only, support_acpi_ppc, per_cpu_perf_limits -intel_pstate: {% if pstate == "on" or sst == "on" %}hwp_only{% else %}disable{% endif %} +intel_pstate_enabled: {% if power.pstate == "on" or sst == "on" %}true{% else %}false{% endif %} + +intel_pstate: {% if power.pstate == "on" or sst == "on" %}hwp_only{% else %}disable{% endif %} # Supported values: [disable, passive, force, no_hwp, hwp_only, support_acpi_ppc, per_cpu_perf_limits] + # Enable/Disable Intel Turbo Boost PState attribute turbo_boost_enabled: {% if on_vms != 'on' %}true{% else %}false{% endif %} -{% endif -%} -{% if cstate in ['on', 
'optional'] %} -cstate_enabled: {% if cstate == "on" %}true{% else %}false{% endif %} -cstates: -{%- if name == 'access' %} - C6: # default values: C6 for access, C1 for other profiles - cpu_range: '0-9' # change as needed, cpus to modify cstates on - enable: true # true - enable given cstate, false - disable given cstate -{%- else %} - C1: # default values: C6 for access, C1 for other profiles - cpu_range: '0-9' # change as needed, cpus to modify cstates on - enable: true # true - enable given cstate, false - disable given cstate -{% endif -%} -{% endif -%} - -{% if ufs in ['on', 'optional'] %} -ufs_enabled: {% if ufs == "on" %}true{% else %}false{% endif %} -ufs: # uncore frequency scaling - min: 1000 # minimal uncore frequency - max: 2000 # maximal uncore frequency -{% endif -%} +{% endif %} {% if sst in ['on', 'optional'] %} -{%- if arch in ['icx', 'spr'] %} +{% if arch in ['icx', 'spr', 'emr'] %} # Intel(R) SST-PP (perf-profile) configuration -# [true] Enable Intel(R) SST-PP (perf-profile) -# [false] Disable Intel(R) SST-PP (perf-profile) sst_pp_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} -sst_pp_config_list: # "enable" or "disable" list options per SST-PP setup requirements - - sst_bf: "enable" # "enable" or "disable" Intel(R) SST-BF (base-freq) to configure with SST-PP - - sst_cp: "enable" # "enable" or "disable" Intel(R) SST-CP (core-power) to configure with SST-PP. - - sst_tf: "enable" # "enable" or "disable" Intel(R) SST-TF (turbo-freq) to configure with SST-PP. - online_cpus_range: "auto" # "auto" will config turbo-freq for all available online CPUs or else define specific CPUs such as "2,3,5" to prioritize among others. + +sst_pp_config_list: + - sst_bf: "enable" # enable/disable Intel(R) SST-BF (base-freq) configured through SST-PP. + - sst_cp: "enable" # enable/disable Intel(R) SST-CP (core-power) configured through SST-PP. + - sst_tf: "enable" # enable/disable Intel(R) SST-TF (turbo-freq) configured through SST-PP. 
+ online_cpus_range: "auto" # "auto" configures turbo-freq for all available online CPUs. + # Alternatively define specific CPUs such as "2,3,5" to prioritize among others. + {% endif %} +{% if arch == 'clx' %} # Intel Speed Select Base-Frequency configuration. -{%- if arch == 'clx' %} sst_bf_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} + # Intel Speed Select Base-Frequency configuration for Cascade Lake (CLX) Platforms. # CLX support of SST-BF requires 'intel_pstate' to be 'enabled' # Option clx_sst_bf_mode requires sst_bf_configuration_enabled to be set to 'true'. @@ -392,99 +602,50 @@ sst_bf_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} # [m] Set P1 on all cores (set min/max to 2300/2300) # [r] Revert cores to min/Turbo (set min/max to 800/3900) clx_sst_bf_mode: s -{%- endif %} -{%- if arch == 'icx' %} +{% endif %} +{% if arch == 'icx' %} # Intel Speed Select Base-Frequency configuration for Ice Lake (ICX) Platforms. -# [true] Enable Intel Speed Select Base Frequency (SST-BF) -# [false] Disable Intel Speed Select Base Frequency (SST-BF) # Requires `sst_bf_configuration_enabled` variable to be 'true' icx_sst_bf_enabled: {% if sst == "on" %}true{% else %}false{% endif %} + # Prioritze (SST-CP) power flow to high frequency cores in case of CPU power constraints. icx_sst_bf_with_core_priority: {% if sst == "on" %}true{% else %}false{% endif %} + # SST CP config -# Variables are only examples. -# For more information, please visit: +# Variables are only examples. For more information, please visit: # https://www.kernel.org/doc/html/latest/admin-guide/pm/intel-speed-select.html#enable-clos-based-prioritization # Enabling this configuration overrides `icx_sst_bf_with_core_priority`. 
sst_cp_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} -sst_cp_priority_type: "1" # For Proportional select "0" and for Ordered select "1", update as required -sst_cp_clos_groups: # configure up to 4 CLOS groups + +sst_cp_priority_type: "1" # For Proportional select "0" and for Ordered select "1", update as required +sst_cp_clos_groups: # configure up to 4 CLOS groups (id: 0-3) - id: 0 - frequency_weight: 0 # used only with Proportional type + frequency_weight: 0 # used only with Proportional type min_MHz: 0 max_MHz: 25500 - id: 1 - frequency_weight: 0 # used only with Proportional type - min_MHz: 0 - max_MHz: 25500 - - id: 2 - frequency_weight: 0 # used only with Proportional type - min_MHz: 0 - max_MHz: 25500 - - id: 3 - frequency_weight: 0 # used only with Proportional type + frequency_weight: 0 min_MHz: 0 max_MHz: 25500 -sst_cp_cpu_clos: # assign required values to CLOS group after priority type setup - - clos: 0 # associating CPUs with a CLOS group such as "clos: 3", update as required. - cpus: 2,3,5 # define specific CPUs per CLOS group such as "cpus: 2,6,9", update as required. + +sst_cp_cpu_clos: # Assign required values to CLOS group after priority type setup + - clos: 0 # Associate CPUs with a CLOS group 0 (id: 0) + cpus: 2,3,5 # List of CPUs to associate with CLOS group - clos: 1 - cpus: 12 + cpus: 12 # Intel(R) SST-TF (feature turbo-freq) configuration for Ice Lake (ICX) Platforms. 
-# [true] Enable Intel Speed Select Turbo Frequency (SST-TF) -# [false] Disable Intel Speed Select Turbo Frequency (SST-TF) sst_tf_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} -{%- endif %} -{% endif %} -{%- if sgx in ['on', 'optional'] and arch in ['icx', 'spr'] %} -# Intel Software Guard Extensions (SGX) -configure_sgx: {% if sgx == 'on' %}true{% else %}false{% endif %} -{% endif %} -{%- if gpu in ['on', 'optional'] %} -# Intel custom GPU kernel - this is required to be true in order to -# deploy Intel GPU Device Plugin on that node -configure_gpu: {% if gpu == 'on' %}true{% else %}false{% endif %} {% endif %} - -{%- if telemetry.collectd in ['on', 'optional'] %} -# Telemetry configuration -# intel_pmu plugin collects information provided by Linux perf interface. -enable_intel_pmu_plugin: false - -{%- if on_vms == 'on' %} - -# Temporary Fix for collectd startup issue on VMs -enable_pkgpower_plugin: false -{%- endif %} - -# CPU Threads to be monitored by Intel PMU Plugin. -# If the field is empty, all available cores will be monitored. -# Please refer to https://collectd.org/wiki/index.php/Plugin:Intel_PMU for configuration details. -intel_pmu_plugin_monitored_cores: "" - -# CPU Threads to be monitored by Intel RDT Plugin. -# If the field is empty, all available cores will be monitored. -# Please refer to https://collectd.org/wiki/index.php/Plugin:IntelRDT for configuration details. -intel_rdt_plugin_monitored_cores: "" - -# Additional list of plugins that will be excluded from collectd deployment. -exclude_collectd_plugins: [] {% endif %} +####################### +## Workloads & Demos ## +####################### -{%- if cndp in ['on', 'optional'] %} -# Intel Cloud Native Data Plane. 
-cndp_enabled: {% if cndp == 'on' %}true{% else %}false{% endif %} -{%- if cndp_dp in ['on', 'optional'] %} -cndp_dp_pools: - - name: "e2e" - drivers: {% raw %}"{{ dataplane_interfaces | map(attribute='pf_driver') | list | unique }}"{% endraw %} # List of NIC driver to be included in CNDP device plugin ConfigMap. -{% endif %} -{%- endif %} -{%- if adq_dp in ['on', 'optional'] %} +{% if adq_dp in ['on', 'optional'] %} # Note: ADQ is experimental feature and enabling it may lead to unexpected results. # ADQ requires back-to-back connection between control plane and worker node on CVL interfaces. # Name of CVL interfaces must be the same on both nodes, IP address must be present. @@ -494,230 +655,170 @@ adq_dp: enabled: false # IP address of CVL interface located on the worker node interface_address: "192.168.0.11" + +{% if intel_eci and (intel_eci.values() | reject('eq', 'off')) | list | length() > 0 %} +# Intel ECI (Edge Controls for Industrial) +intel_eci_enabled: {% if (intel_eci.values() | select('eq', 'on')) | list | length() > 0 %}true{% else %}false{% endif %} # if true, deploy Intel ECI +intel_eci: + eci-process-automation: {% if intel_eci.process_automation == 'on' %}true{% else %}false{% endif %} + + eci-manufacturing-equipment: {% if intel_eci.manufacturing_equipment == 'on' %}true{% else %}false{% endif %} + + eci-discrete-manufacturing: {% if intel_eci.discrete_manufacturing == 'on' %}true{% else %}false{% endif %} + + eci-realtime: {% if intel_eci.realtime == 'on' %}true{% else %}false{% endif %} + + eci-connectivity: {% if intel_eci.connectivity == 'on' %}true{% else %}false{% endif %} + + eci-infra-clients: {% if intel_eci.infra_clients == 'on' %}true{% else %}false{% endif %} + + eci-inference: {% if intel_eci.inference == 'on' %}true{% else %}false{% endif %} + + eci-softplc: {% if intel_eci.softplc == 'on' %}true{% else %}false{% endif %} + + eci-acrn: {% if intel_eci.acrn == 'on' %}true{% else %}false{% endif %} + +# The following ECI 
meta-packages aren't included because they are NOT supported yet on Ubuntu (as of 05/2023): +# eci-robotics-control +# eci-robotics +# eci-rth +# eci-kvm +# eci-xenomai + +opcua_framework: + codesys_opcua_client: {% if opcua_framework.codesys_opcua_client == 'on' %}true{% else %}false{% endif %} + + standalone_opcua_server: {% if opcua_framework.standalone_opcua_server == 'on' %}true{% else %}false{% endif %} + {% endif %} -{%- if vm_mode in ['on'] and on_vms != 'on' %} -# The only common VM image for all VMs inside deployment is supported at the moment -# -{%- if secondary_host == 'true' %} -# Do not set VM image info here - do it just on the first vm_host +{% endif %} +{% if vm_mode in ['on'] and on_vms != 'on' %} +######################## +## VMRA Configuration ## +######################## + +# Only a common VM image for all VMs inside deployment is supported at the moment +{% if secondary_host == 'true' %} +# Secondary vm_host - VM image can only be changed on the first vm_host # Secondary vm_host - do not change dhcp settings here dhcp: [] + {% else %} -# Default VM image version is Ubuntu 22.04 -# Supported VM image distributions ['ubuntu', 'rocky']. Default is 'ubuntu'. -#vm_image_distribution: "rocky" -# Supported VM image ubuntu versions ['22.04']. Default version is '22.04'. -#vm_image_version_ubuntu: "22.04" -# Supported VM image rocky versions ['8.5', '9.0']. Default version is '8.5'. -#vm_image_version_rocky: "9.0" -# dhcp for vxlan have to be enabled just on the first vm_host +# Default VM image version is Ubuntu 22.04. Uncomment relevant options below to modify. +#vm_image_distribution: "rocky" # VM image distribution. Supports ['ubuntu', 'rocky']. Default is 'ubuntu' +#vm_image_version_ubuntu: "22.04" # Ubuntu VM image version. Supports ['22.04']. +#vm_image_version_rocky: "9.0" # Rocky VM image version. Supports ['8.5', '9.0']. 
Default is '8.5' + +# DHCP for VXLAN has to be enabled only on the first vm_host dhcp: - 120 + vxlan_gw_ip: "172.31.0.1/24" -{% endif -%} +{% endif %} # Set hashed password for root user inside VMs. Current value is just placeholder. # To create hashed password use e.g.: openssl passwd -6 -salt SaltSalt vm_hashed_passwd: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' -# Set hashed password for non root user inside VMs. Current value is just placeholder. + +# Set hashed password for non-root user inside VMs. Current value is just placeholder. # If value is not specified then vm_hashed_passwd value will be used for non root user as well #vm_hashed_passwd_non_root: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' -# Set physical network subnet, which will be used to create VXLANs for VMs communication -# This parameter is mandatory for VM multinode configuration + +# Set physical network subnet, which will be used to create VXLANs for VM communication +# This parameter is required for VM multinode configuration vxlan_physical_network: "11.0.0.0/8" + #cpu_host_os will change number of CPUs reserved for host OS. Default value is 16 #cpu_host_os: 8 + # VM cluster name is used to group all VMs from single deployment together #vm_cluster_name: "cluster1.local" + vms: -{%- if secondary_host == 'true' %} -# - type: "ctrl" +{% if secondary_host == 'true' %} +# - type: "ctrl" # Type of VM (controller, worker). Supported values: ["ctrl", "work"] # name: "vm-ctrl-1" +# # By default CPU and NUMA node allocation is done automatically. +# # The 'cpus' and 'numa' options can be used to manually set values. +# #cpus: "8-11,64-67" +# #numa: 0 +# # if you set cpu_total: to 0 then rest of unallocated CPUs from selected numa will be used # cpu_total: 8 +# # if 'alloc_all: true' is used then 'cpu_total' have to be set to '0' +# # It will take all unallocated CPUs from all NUMA nodes on the vm_host. 
+# #alloc_all: true # memory: 20480 # vxlan: 120 {% else %} - - type: "ctrl" + - type: "ctrl" # Type of VM (controller, worker). Supported values: ["ctrl", "work"] name: "vm-ctrl-1" - # Parameter cpus and numa can be uncommented and specified manually, but by default - # are CPUs and NUMA node allocated automatically. - # cpus: "8-11,64-67" - # numa: 0 - # Parameter emu_cpus is DEPRECATED, uncommenting line below would not have - # any impact, as emulators CPUs are picked-up automatically. - # emu_cpus: "8,64" + # By default CPU and NUMA node allocation is done automatically. + # The 'cpus' and 'numa' options can be used to manually set values. + #cpus: "8-11,64-67" + #numa: 0 # if you set cpu_total: to 0 then rest of unallocated CPUs from selected numa will be used cpu_total: 8 # if 'alloc_all: true' is used then 'cpu_total' have to be set to '0' # It will take all unallocated CPUs from all NUMA nodes on the vm_host. - # alloc_all: true + #alloc_all: true memory: 20480 vxlan: 120 -{% endif -%} +{% endif %} # - type: "ctrl" # name: "vm-ctrl-2" # cpu_total: 8 # memory: 20480 # vxlan: 120 -# - type: "ctrl" # name: "vm-ctrl-3" # cpu_total: 8 # memory: 20480 # vxlan: 120 -{%- if secondary_host == 'true' %} -# - type: "work" -# name: "vm-work-1" -# cpu_total: 16 -# memory: 61440 -# vxlan: 120 -{%- if name not in ['build_your_own'] %} -# pci: -# - "18:02.2" -# - "18:02.3" -# - "18:02.4" -# - "18:02.5" -{%- if qat == "on" %} -## - "3d:01.1" -## - "3f:01.1" -{%- endif %} -{%- else %} -# pci: [] -{%- endif %} -{% else %} - type: "work" +{% if secondary_host == 'true' %} + name: "vm-work-2" +{% else %} name: "vm-work-1" - # Parameter cpus and numa can be uncommented and specified manually, but by default - # are CPUs and NUMA node allocated automatically. - # cpus: "28-35,84-91" - # numa: 1 - # Parameter emu_cpus is DEPRECATED, uncommenting line below would not have - # any impact, as emulators CPUs are picked-up automatically. 
- # emu_cpus: "28,84" - # if you set cpu_total: to 0 then rest of unallocated CPUs from selected numa will be used +{% endif %} + #cpus: "28-35,84-91" + #numa: 1 cpu_total: 16 - # if 'alloc_all: true' is used then 'cpu_total' have to be set to '0' - # It will take all unallocated CPUs from all NUMA nodes on the vm_host. - # alloc_all: true + #alloc_all: true memory: 61440 vxlan: 120 -{%- if name not in ['build_your_own'] %} +{% if name not in ['build_your_own'] %} pci: - - "18:02.2" + - "18:02.2" # 18:xx.x are example VFs for networking - "18:02.3" - "18:02.4" - "18:02.5" -{%- if qat == "on" %} -# - "3d:01.1" -# - "3f:01.1" -{%- endif %} -{%- else %} - pci: [] -{%- endif %} -{% endif -%} -{% if secondary_host == 'true' %} - type: "work" - name: "vm-work-2" - cpu_total: 16 - memory: 61440 - vxlan: 120 -{%- if name not in ['build_your_own'] %} - pci: - - "18:02.0" - - "18:02.1" - - "18:02.6" - - "18:02.7" -{%- if qat == "on" %} -# - "3d:01.2" -# - "3f:01.2" -{%- endif %} -{%- else %} +{% if qat == "on" %} + - "3d:01.1" # 3x:xx.x are example VFs for QAT + - "3f:01.1" +{% endif %} +{% else %} pci: [] -{%- endif %} -{% else -%} +{% endif %} # - type: "work" -# name: "vm-work-2" +{% if secondary_host == 'true' %} +# name: "vm-work-4" +{% else %} +# name: "vm-work-3" +{% endif %} # cpu_total: 16 # memory: 61440 # vxlan: 120 -{%- if name not in ['build_your_own'] %} +{% if name not in ['build_your_own'] %} # pci: -# - "18:02.0" +# - "18:02.0" # 18:xx.x are example VFs for networking # - "18:02.1" # - "18:02.6" # - "18:02.7" -{%- if qat == "on" %} -## - "3d:01.2" -## - "3f:01.2" -{%- endif %} -{%- else %} +{% if qat == "on" %} +# - "3d:01.2" # 3x:xx.x are example VFs for QAT +# - "3f:01.2" +{% endif %} +{% else %} # pci: [] -{%- endif %} -{% endif -%} -{% endif -%} -{%- if power_manager in ['on', 'optional'] and arch in ['icx', 'clx', 'spr'] -%} -# Power Manager Shared Profile/Workload settings. 
-# It is possible to create node-specific Power Profile -local_shared_profile: - enabled: false - node_max_shared_frequency: 2000 - node_min_shared_frequency: 1500 - -# Shared Workload is required to make use of Shared Power Profile -shared_workload: - enabled: false - reserved_cpus: [] # The CPUs in reserved_cpus should match the value of the reserved system CPUs in your Kubelet config file, if none please - # set here a dummy core - the last one to avoid AppQos bug - shared_workload_type: "global" # set to node name to make use of node-specific Power Profile, 'global' means use cluster-specific custom Power Profile {% endif %} -{%- if not cloud_mode == 'on' %} -# Useful if system loses IP after reboot. Note: make sure IP is stable / system gets same IP after reboot else will cause failures. -enable_dhclient_systemd_service: {% if enable_dhclient_systemd_service == "on" %}true{% else %}false{% endif %} -{%- endif %} - -{% if minio in ['on', 'optional'] %} -# MinIO storage configuration -minio_pv: [] -# - name: "mnt-data-1" # PV identifier will be used for PVs names followed by node name(e.g., mnt-data-1-hostname) -# storageClassName: "local-storage" # Storage class name to match with PVC -# accessMode: "ReadWriteOnce" # Access mode when mounting a volume, e.g., ReadWriteOnce/ReadOnlyMany/ReadWriteMany/ReadWriteOncePod -# persistentVolumeReclaimPolicy: "Retain" # Reclaim policy when a volume is released once it's bound, e.g., Retain/Recycle/Delete -# mountPath: /mnt/data0 # Mount path of a volume -# device: /dev/nvme0n1 # Target storage device name when creating a volume. - # When group_vars: minio_deploy_test_mode == true, use a file as a loop device for storage - # otherwise, an actual NVME or SSD device for storage on the device name. 
- -# - name: "mnt-data-2" -# storageClassName: "local-storage" -# accessMode: "ReadWriteOnce" -# persistentVolumeReclaimPolicy: "Retain" -# mountPath: /mnt/data1 -# device: /dev/nvme1n1 - -# - name: "mnt-data-3" -# storageClassName: "local-storage" -# accessMode: "ReadWriteOnce" -# persistentVolumeReclaimPolicy: "Retain" -# mountPath: /mnt/data2 -# device: /dev/nvme2n1 - -# - name: "mnt-data-4" -# storageClassName: "local-storage" -# accessMode: "ReadWriteOnce" -# persistentVolumeReclaimPolicy: "Retain" -# mountPath: /mnt/data3 -# device: /dev/nvme3n1 - -# - name: "mnt-data-5" -# storageClassName: "local-storage" -# accessMode: "ReadWriteOnce" -# persistentVolumeReclaimPolicy: "Retain" -# mountPath: /mnt/data4 -# device: /dev/nvme4n1 - -# - name: "mnt-data-6" -# storageClassName: "local-storage" -# accessMode: "ReadWriteOnce" -# persistentVolumeReclaimPolicy: "Retain" -# mountPath: /mnt/data5 -# device: /dev/nvme5n1 -{% endif -%} +{% endif %} diff --git a/generate/profiles_templates/k8s/inventory.j2 b/generate/profiles_templates/k8s/inventory.j2 index 6f6d9053..8dd98deb 100644 --- a/generate/profiles_templates/k8s/inventory.j2 +++ b/generate/profiles_templates/k8s/inventory.j2 @@ -1,4 +1,4 @@ -{%- if intel_flexran == 'on' %} +{% if intel_flexran == 'on' %} [all] bbu ansible_host=10.0.0.1 ip=10.0.0.1 ansible_user=USER ansible_password=XXXX oru ansible_host=10.0.0.2 ip=10.0.0.2 ansible_user=USER ansible_password=XXXX @@ -24,7 +24,7 @@ kube_node [all:vars] ansible_python_interpreter=/usr/bin/python3 -{%- else %} +{% else %} [all] controller1 ansible_host=10.0.0.1 ip=10.0.0.1 ansible_user=USER ansible_password=XXXX controller2 ansible_host=10.0.0.2 ip=10.0.0.2 ansible_user=USER ansible_password=XXXX @@ -55,4 +55,4 @@ kube_node [all:vars] ansible_python_interpreter=/usr/bin/python3 -{%- endif %} +{% endif %} diff --git a/generate/profiles_templates/k8s/profiles.yml b/generate/profiles_templates/k8s/profiles.yml index 294d4772..d591c780 100644 --- 
a/generate/profiles_templates/k8s/profiles.yml +++ b/generate/profiles_templates/k8s/profiles.yml @@ -10,6 +10,7 @@ # - on_vms - is 'optional(false)' on k8s and on vm_host and is 'on(true)' on VMs # - nfd # - kube_dashboard +# - rancher_manager # - isolcpu # - cpusets # - intel_cpu_controlplane @@ -22,6 +23,8 @@ # - sgx # - sgx_dp # - kmra: +# sbx +# oran # pccs # apphsm # ctk_demo @@ -29,6 +32,7 @@ # - tac # - qat # - dsa +# - dsa_dp # - dlb # - dlb_dp # - gpu @@ -40,11 +44,12 @@ # - network_userspace # - dpdk # - ovs_dpdk -# - pstate -# - cstate -# - ufs - uncore frequency scaling # - sst -# - power_manager +# - power: +# manager +# pstate +# cstate +# uncore_frequency # - telemetry: # prometheus # collectd @@ -58,7 +63,8 @@ # - minio # - lpvsp # - rook_ceph -# - intel_ai +# - intel_media_analytics +# - intel_ffmpeg # - cert_manager # - registry # - hugepages @@ -76,10 +82,28 @@ # ddp # fw_update # - intel_sriov_fec_operator +# - intel_oneapi +# base +# ai +# - rt_kernel # - intel_flexran +# - intel_eci +# process_automation +# manufacturing_equipment +# discrete_manufacturing +# realtime +# connectivity +# softplc +# infra_clients +# inference +# acrn +# - opcua_framework +# codesys_opcua_client +# standalone_opcua_server # - tadk -# - remove_kubespray_host_dns_settings -# - enable_dhclient_systemd_service +# - sigstore_policy_controller +# - cadvisor +# - fpga --- access: name: access @@ -87,6 +111,7 @@ access: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional native_cpu_manager: on @@ -104,12 +129,14 @@ access: dlb_dp: optional gpu: off gpu_dp: off - sgx: off - sgx_dp: off + sgx: optional + sgx_dp: optional kmra: - pccs: off - apphsm: off - ctk_demo: off + sbx: optional + oran: optional + pccs: optional + apphsm: optional + ctk_demo: optional tcs: off tac: off tas: off @@ -118,11 +145,12 @@ access: network_userspace: off dpdk: on ovs_dpdk: off - pstate: off - cstate: off - ufs: off sst: off - 
power_manager: off + power: + manager: off + pstate: off + cstate: off + uncore_frequency: off telemetry: prometheus: on collectd: optional @@ -145,7 +173,8 @@ access: minio: off lpvsp: optional rook_ceph: off - intel_ai: off + intel_media_analytics: off + intel_ffmpeg: off cert_manager: on registry: on hugepages: on @@ -155,10 +184,14 @@ access: ddp: optional fw_update: optional intel_sriov_fec_operator: on + intel_oneapi: + base: on + ai: optional + rt_kernel: optional intel_flexran: on adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: on + cadvisor: on basic: name: basic @@ -166,6 +199,7 @@ basic: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional topology_manager: on @@ -173,8 +207,11 @@ basic: sriov_network_dp: optional nic_drivers: on dpdk: optional - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -189,7 +226,8 @@ basic: minio: off lpvsp: on rook_ceph: off - intel_ai: off + intel_media_analytics: off + intel_ffmpeg: off cert_manager: on registry: on hugepages: optional @@ -198,8 +236,11 @@ basic: flow_config: optional fw_update: optional adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on full_nfv: name: full_nfv @@ -207,6 +248,7 @@ full_nfv: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional intel_cpu_controlplane: optional @@ -228,6 +270,8 @@ full_nfv: sgx: on sgx_dp: on kmra: + sbx: optional + oran: optional pccs: on apphsm: on ctk_demo: on @@ -239,11 +283,12 @@ full_nfv: network_userspace: on dpdk: on ovs_dpdk: on - pstate: optional - cstate: optional - ufs: optional sst: optional - power_manager: 
on + power: + manager: on + pstate: on + cstate: on + uncore_frequency: on telemetry: prometheus: on collectd: optional @@ -266,7 +311,8 @@ full_nfv: minio: optional lpvsp: on rook_ceph: on - intel_ai: off + intel_media_analytics: off + intel_ffmpeg: optional cert_manager: on registry: on hugepages: on @@ -278,8 +324,12 @@ full_nfv: intel_sriov_fec_operator: optional tadk: on adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on + fpga: optional on_prem: name: on_prem @@ -287,6 +337,7 @@ on_prem: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional native_cpu_manager: on @@ -299,6 +350,8 @@ on_prem: sgx: on sgx_dp: on kmra: + sbx: optional + oran: optional pccs: on apphsm: on ctk_demo: on @@ -312,13 +365,15 @@ on_prem: dlb_dp: optional openssl: on tas: on + gas: optional dpdk: on bond_cni: optional - pstate: optional - cstate: optional - ufs: optional sst: optional - power_manager: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -341,7 +396,8 @@ on_prem: minio: optional lpvsp: on rook_ceph: optional - intel_ai: optional + intel_media_analytics: optional + intel_ffmpeg: optional cert_manager: on registry: on hugepages: on @@ -350,8 +406,193 @@ on_prem: flow_config: optional fw_update: optional adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on + fpga: optional + +on_prem_vss: + name: on_prem_vss + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: on + cpusets: optional + native_cpu_manager: optional + topology_manager: optional + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on 
+ bond_cni: optional + qat: on + qat_dp: on + openssl: on + dsa: on + dsa_dp: on + dlb: on + dlb_dp: on + gpu: on + gpu_dp: on + sgx: on + sgx_dp: on + kmra: + sbx: optional + oran: optional + pccs: optional + apphsm: optional + ctk_demo: optional + tcs: optional + tac: optional + tas: on + gas: on + ddp: optional + network_userspace: optional + dpdk: on + ovs_dpdk: optional + sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional + telemetry: + prometheus: optional + collectd: optional + telegraf: optional + jaeger: optional + opentelemetry: optional + elasticsearch: optional + kibana: optional + istio_service_mesh: + enabled: on + tcpip_bypass_ebpf: optional + tls_splicing: optional + sgx_signer: optional + intel_preview: optional + linkerd_service_mesh: + enabled: optional + wireguard: optional + multus: on + firewall: optional + minio: optional + lpvsp: on + rook_ceph: on + intel_media_analytics: on + cert_manager: on + registry: on + hugepages: on + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + adq_dp: optional + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + +on_prem_sw_defined_factory: + name: on_prem_sw_defined_factory + vm_mode: optional + on_vms: optional + nfd: optional + kube_dashboard: optional + rancher_manager: optional + isolcpu: optional + cpusets: optional + native_cpu_manager: optional + intel_cpu_controlplane: optional + topology_manager: optional + sriov_operator: optional + sriov_network_dp: optional + nic_drivers: optional + bond_cni: optional + qat: optional + qat_dp: optional + openssl: optional + dsa: optional + dsa_dp: optional + dlb: optional + dlb_dp: optional + gpu: optional + gpu_dp: optional + sgx: optional + sgx_dp: optional + kmra: + sbx: optional + oran: optional + pccs: optional + apphsm: optional + ctk_demo: optional + tcs: optional + tac: optional + tas: optional + gas: 
optional + ddp: optional + network_userspace: optional + dpdk: optional + ovs_dpdk: optional + sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional + telemetry: + prometheus: optional + collectd: optional + telegraf: optional + jaeger: optional + opentelemetry: optional + elasticsearch: optional + kibana: optional + istio_service_mesh: + enabled: optional + tcpip_bypass_ebpf: optional + tls_splicing: optional + sgx_signer: optional + intel_preview: optional + linkerd_service_mesh: + enabled: optional + wireguard: optional + multus: optional + firewall: optional + minio: optional + lpvsp: optional + rook_ceph: optional + intel_media_analytics: optional + cert_manager: optional + registry: optional + hugepages: optional + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + intel_sriov_fec_operator: optional + rt_kernel: optional + intel_flexran: optional + intel_eci: + process_automation: on + manufacturing_equipment: optional + discrete_manufacturing: optional + realtime: optional + connectivity: optional + softplc: optional + infra_clients: optional + inference: optional + acrn: optional + opcua_framework: + codesys_opcua_client: on + standalone_opcua_server: optional + tadk: optional + adq_dp: optional + sigstore_policy_controller: optional + cadvisor: optional regional_dc: name: regional_dc @@ -359,6 +600,7 @@ regional_dc: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional topology_manager: on @@ -371,16 +613,21 @@ regional_dc: sgx: on sgx_dp: on kmra: + sbx: optional + oran: optional pccs: on apphsm: on ctk_demo: on tcs: on tac: on tas: on - gas: on + gas: optional dpdk: optional - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -403,7 +650,8 @@ regional_dc: minio: 
optional lpvsp: on rook_ceph: optional - intel_ai: off + intel_media_analytics: off + intel_ffmpeg: optional cert_manager: on registry: on hugepages: optional @@ -412,8 +660,12 @@ regional_dc: flow_config: optional fw_update: optional adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on + fpga: optional remote_fp: name: remote_fp @@ -421,6 +673,7 @@ remote_fp: on_vms: optional nfd: on kube_dashboard: on + rancher_manager: optional isolcpu: optional cpusets: optional intel_cpu_controlplane: optional @@ -449,11 +702,12 @@ remote_fp: bond_cni: optional network_userspace: optional dpdk: on - pstate: on - cstate: optional - ufs: optional sst: optional - power_manager: optional + power: + manager: optional + pstate: on + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: on @@ -476,7 +730,8 @@ remote_fp: minio: off lpvsp: on rook_ceph: off - intel_ai: off + intel_media_analytics: off + intel_ffmpeg: off cert_manager: on registry: on hugepages: on @@ -486,8 +741,11 @@ remote_fp: ddp: on fw_update: optional adq_dp: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: optional build_your_own: name: build_your_own @@ -495,6 +753,7 @@ build_your_own: on_vms: optional nfd: optional kube_dashboard: optional + rancher_manager: optional isolcpu: optional cpusets: optional native_cpu_manager: optional @@ -516,6 +775,8 @@ build_your_own: sgx: optional sgx_dp: optional kmra: + sbx: optional + oran: optional pccs: optional apphsm: optional ctk_demo: optional @@ -527,11 +788,12 @@ build_your_own: network_userspace: optional dpdk: optional ovs_dpdk: optional - pstate: optional - cstate: optional - ufs: optional sst: optional - power_manager: optional + power: + manager: optional + 
pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: optional collectd: optional @@ -554,7 +816,8 @@ build_your_own: minio: optional lpvsp: optional rook_ceph: optional - intel_ai: optional + intel_media_analytics: optional + intel_ffmpeg: optional cert_manager: optional registry: optional hugepages: optional @@ -564,8 +827,26 @@ build_your_own: ddp: optional fw_update: optional intel_sriov_fec_operator: optional + intel_oneapi: + base: optional + ai: optional + rt_kernel: optional intel_flexran: optional + intel_eci: + process_automation: optional + manufacturing_equipment: optional + discrete_manufacturing: optional + realtime: optional + connectivity: optional + softplc: optional + infra_clients: optional + inference: optional + acrn: optional + opcua_framework: + codesys_opcua_client: optional + standalone_opcua_server: optional tadk: optional adq_dp: optional - remove_kubespray_host_dns_settings: optional - enable_dhclient_systemd_service: optional + sigstore_policy_controller: optional + cadvisor: optional + fpga: optional diff --git a/generate/profiles_templates/vm/vm_host_profiles.yml b/generate/profiles_templates/vm/vm_host_profiles.yml index bc785de8..4844c2b7 100644 --- a/generate/profiles_templates/vm/vm_host_profiles.yml +++ b/generate/profiles_templates/vm/vm_host_profiles.yml @@ -21,6 +21,8 @@ # - sgx # - sgx_dp # - kmra: +# sbx +# oran # pccs # apphsm # ctk_demo @@ -36,9 +38,11 @@ # - network_userspace # - dpdk # - ovs_dpdk -# - pstate -# - cstate -# - ufs - uncore frequency scaling +# - power: +# manager +# pstate +# cstate +# uncore_frequency # - sst # - telemetry: # prometheus @@ -66,12 +70,14 @@ # flow_config # ddp # fw_update -# - remove_kubespray_host_dns_settings -# - enable_dhclient_systemd_service +# - sigstore_policy_controller +# - intel_oneapi +# base +# ai +# - cadvisor # sriov_operator is permanently disabled in VM mode # sriov_network_dp and dpdk are enabled for all VM mode profiles except 
build_your_own -# kmra is temporary disabled in VM mode - needs to be tested # sst is temporary disabled in VM mode # gpu and gpu_dp are temporary disabled in VM mode # flow_config of intel_ethernet_operator is permanently disabled in VM mode since it depends on sriov_network_operator @@ -94,9 +100,11 @@ access: qat_dp: on openssl: on dpdk: on - pstate: optional - cstate: optional - ufs: optional + power: + manager: off + pstate: off + cstate: off + uncore_frequency: off telemetry: prometheus: on collectd: optional @@ -123,8 +131,11 @@ access: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: on + intel_oneapi: + base: optional + ai: optional + cadvisor: on basic: name: basic @@ -139,8 +150,11 @@ basic: sriov_network_dp: on nic_drivers: on dpdk: on - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -159,8 +173,11 @@ basic: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on full_nfv: name: full_nfv @@ -184,20 +201,24 @@ full_nfv: sgx: on sgx_dp: on kmra: - pccs: optional - apphsm: optional - ctk_demo: optional - tcs: optional - tac: optional + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on tas: on gas: optional ddp: on network_userspace: on dpdk: on ovs_dpdk: on - pstate: optional - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional sst: optional telemetry: prometheus: on @@ -211,7 +232,7 @@ full_nfv: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional 
linkerd_service_mesh: enabled: optional @@ -226,8 +247,11 @@ full_nfv: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on on_prem: name: on_prem @@ -245,20 +269,24 @@ on_prem: sgx: on sgx_dp: on kmra: - pccs: optional - apphsm: optional - ctk_demo: optional - tcs: optional - tac: optional + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on qat: on qat_dp: on openssl: on tas: on dpdk: on bond_cni: optional - pstate: optional - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional sst: optional telemetry: prometheus: on @@ -272,7 +300,7 @@ on_prem: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional linkerd_service_mesh: enabled: optional @@ -286,8 +314,11 @@ on_prem: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on regional_dc: name: regional_dc @@ -304,11 +335,24 @@ regional_dc: native_cpu_manager: on gpu: optional gpu_dp: optional + sgx: on + sgx_dp: on + kmra: + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on tas: on - gas: on + gas: optional dpdk: on - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -321,7 +365,7 @@ regional_dc: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional linkerd_service_mesh: enabled: optional @@ -335,8 +379,11 @@ regional_dc: enabled: optional flow_config: 
optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on remote_fp: name: remote_fp @@ -354,22 +401,26 @@ remote_fp: sgx: on sgx_dp: on kmra: + sbx: optional + oran: optional pccs: optional apphsm: optional ctk_demo: optional tcs: optional tac: optional qat: on - qat_dp: optional + qat_dp: on openssl: on tas: on ddp: on bond_cni: optional network_userspace: optional dpdk: on - pstate: on - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional sst: optional telemetry: prometheus: on @@ -398,8 +449,11 @@ remote_fp: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: optional build_your_own: name: build_your_own @@ -423,6 +477,8 @@ build_your_own: sgx: optional sgx_dp: optional kmra: + sbx: optional + oran: optional pccs: optional apphsm: optional ctk_demo: optional @@ -434,9 +490,11 @@ build_your_own: network_userspace: optional dpdk: optional ovs_dpdk: optional - pstate: optional - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional sst: optional telemetry: prometheus: optional @@ -465,5 +523,8 @@ build_your_own: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: optional - enable_dhclient_systemd_service: optional + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: optional diff --git a/generate/profiles_templates/vm/vms_profiles.yml b/generate/profiles_templates/vm/vms_profiles.yml index 7da027a3..8c2739d51 100644 --- a/generate/profiles_templates/vm/vms_profiles.yml +++ 
b/generate/profiles_templates/vm/vms_profiles.yml @@ -21,6 +21,8 @@ # - sgx # - sgx_dp # - kmra: +# sbx +# oran # pccs # apphsm # ctk_demo @@ -36,10 +38,12 @@ # - network_userspace # - dpdk # - ovs_dpdk -# - pstate -# - cstate -# - ufs - uncore frequency scaling # - sst +# - power: +# manager +# pstate +# cstate +# uncore_frequency # - telemetry: # prometheus # collectd @@ -66,12 +70,14 @@ # flow_config # ddp # fw_update -# - remove_kubespray_host_dns_settings -# - enable_dhclient_systemd_service +# - sigstore_policy_controller +# - intel_oneapi +# base +# ai +# - cadvisor # sriov_operator is permanently disabled in VM mode # sriov_network_dp and dpdk are enabled for all VM mode profiles except build_your_own -# kmra is temporary disabled in VM mode - needs to be tested # sst is temporary disabled in VM mode # gpu and gpu_dp are temporary disabled in VM mode # flow_config of intel_ethernet_operator is permanently disabled in VM mode since it depends on sriov_network_operator @@ -100,9 +106,11 @@ access: qat_dp: on openssl: on dpdk: on - pstate: optional - cstate: optional - ufs: optional + power: + manager: off + pstate: off + cstate: off + uncore_frequency: off telemetry: prometheus: on collectd: optional @@ -129,8 +137,11 @@ access: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: on + intel_oneapi: + base: optional + ai: optional + cadvisor: on basic: name: basic @@ -145,8 +156,11 @@ basic: sriov_network_dp: on nic_drivers: on dpdk: on - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -165,8 +179,11 @@ basic: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + 
ai: optional + cadvisor: on full_nfv: name: full_nfv @@ -190,21 +207,25 @@ full_nfv: sgx: on sgx_dp: on kmra: - pccs: optional - apphsm: optional - ctk_demo: optional - tcs: optional - tac: optional + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on tas: on gas: optional ddp: optional network_userspace: on dpdk: on ovs_dpdk: on - pstate: optional - cstate: optional - ufs: optional sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -217,7 +238,7 @@ full_nfv: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional linkerd_service_mesh: enabled: optional @@ -232,8 +253,11 @@ full_nfv: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on on_prem: name: on_prem @@ -251,21 +275,25 @@ on_prem: sgx: on sgx_dp: on kmra: - pccs: optional - apphsm: optional - ctk_demo: optional - tcs: optional - tac: optional + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on qat: on qat_dp: on openssl: on tas: on dpdk: on bond_cni: optional - pstate: optional - cstate: optional - ufs: optional sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -278,7 +306,7 @@ on_prem: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional linkerd_service_mesh: enabled: optional @@ -292,8 +320,11 @@ on_prem: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: 
optional + cadvisor: on regional_dc: name: regional_dc @@ -310,11 +341,24 @@ regional_dc: native_cpu_manager: on gpu: optional gpu_dp: optional + sgx: on + sgx_dp: on + kmra: + sbx: optional + oran: optional + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tac: on tas: on - gas: on + gas: optional dpdk: on - cstate: optional - ufs: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: optional @@ -327,7 +371,7 @@ regional_dc: enabled: on tcpip_bypass_ebpf: on tls_splicing: on - sgx_signer: optional + sgx_signer: on intel_preview: optional linkerd_service_mesh: enabled: optional @@ -341,8 +385,11 @@ regional_dc: enabled: optional flow_config: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: on remote_fp: name: remote_fp @@ -360,23 +407,27 @@ remote_fp: sgx: on sgx_dp: on kmra: + sbx: optional + oran: optional pccs: optional apphsm: optional ctk_demo: optional tcs: optional tac: optional qat: on - qat_dp: optional + qat_dp: on openssl: on tas: on ddp: optional bond_cni: optional network_userspace: optional dpdk: on - pstate: on - cstate: optional - ufs: optional sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: on collectd: on @@ -404,8 +455,11 @@ remote_fp: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: on - enable_dhclient_systemd_service: on + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: optional build_your_own: name: build_your_own @@ -429,6 +483,8 @@ build_your_own: sgx: optional sgx_dp: optional kmra: + sbx: optional + oran: optional pccs: optional apphsm: optional ctk_demo: optional @@ -440,10 +496,12 @@ build_your_own: network_userspace: 
optional dpdk: optional ovs_dpdk: optional - pstate: optional - cstate: optional - ufs: optional sst: optional + power: + manager: optional + pstate: optional + cstate: optional + uncore_frequency: optional telemetry: prometheus: optional collectd: optional @@ -471,5 +529,8 @@ build_your_own: flow_config: optional ddp: optional fw_update: optional - remove_kubespray_host_dns_settings: optional - enable_dhclient_systemd_service: optional + sigstore_policy_controller: optional + intel_oneapi: + base: optional + ai: optional + cadvisor: optional diff --git a/generate/render.py b/generate/render.py index c1158a03..a7e59990 100644 --- a/generate/render.py +++ b/generate/render.py @@ -21,18 +21,21 @@ import argparse import importlib -from render.common.cli import parse_cli -from render.renderers.playbook import render_playbooks + +from render_util.common.cli import parse_cli +from render_util.renderers.playbook import render_playbooks + def main(): args = parse_cli() - # render profiles in given mode + # render_util profiles in given mode _render_mode(args) - # render playbooks + # render_util playbooks render_playbooks(args.profile) + def _render_mode(args: argparse.Namespace) -> None: # determine function name based on passed mode mode_to_render = "render_{}_profiles".format(args.mode) @@ -43,11 +46,12 @@ def _render_mode(args: argparse.Namespace) -> None: # try to import module and then # obtain and call required method try: - renderer_module = importlib.import_module("render.renderers.{}".format(renderer), package=None) + renderer_module = importlib.import_module("render_util.renderers.{}".format(renderer), package=None) method = getattr(renderer_module, mode_to_render) method(args) except (ImportError, NameError) as e: print("The method '{}' is not defined or cannot be imported... 
\nError: {}".format(method, e)) + if __name__ == "__main__": main() diff --git a/generate/render/__init__.py b/generate/render_util/__init__.py similarity index 100% rename from generate/render/__init__.py rename to generate/render_util/__init__.py diff --git a/generate/render/common/cli.py b/generate/render_util/common/cli.py similarity index 64% rename from generate/render/common/cli.py rename to generate/render_util/common/cli.py index ad31430c..f496d7bb 100644 --- a/generate/render/common/cli.py +++ b/generate/render_util/common/cli.py @@ -16,27 +16,31 @@ import argparse + def parse_cli() -> argparse.Namespace: """parse_cli creates CLI interface and returns parsed arguments""" parser = argparse.ArgumentParser() # common args parser.add_argument('--config', '-c', type=str, default="k8s_profiles.yml", - help='path to the profiles configuration file') + help='path to the profiles configuration file') parser.add_argument('--output', type=str, default="../examples/k8s", - help='directory where generated example files for k8s mode will be stored') + help='directory where generated example files for k8s mode will be stored') parser.add_argument('--inventory', type=str, default="k8s_inventory.j2", - help='inventory template filepath') + help='inventory template filepath') parser.add_argument('--group', '-g', type=str, default="group_vars.j2", - help='group_vars template filepath') + help='group_vars template filepath') parser.add_argument('--host', type=str, default="host_vars.j2", - help='host_vars template filepath') + help='host_vars template filepath') parser.add_argument('--profile', '-p', type=str, default='', - choices={'all_examples', 'access', 'basic', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'}, # add new profiles here - help='''profile name which files, required in deployment, will be copied to the project root directory''') - parser.add_argument('--arch', '-a', type=str, default='icx', choices={"icx", "clx", "skl", "spr"}) # please 
add acronyms for new architectures here + choices={'all_examples', 'access', 'basic', 'full_nfv', 'on_prem', + 'on_prem_vss', 'on_prem_sw_defined_factory', 'regional_dc', 'remote_fp', 'build_your_own'}, + # add new profiles here + help='''profile name which files, required in deployment, will be copied to the project root directory''') + parser.add_argument('--arch', '-a', type=str, default='icx', choices={"atom", "core", "skl", "clx", "icx", "spr", "emr"}) # please add arch acronyms here parser.add_argument('--nic', '-n', type=str, default='cvl', choices={"cvl", "fvl"}) # please add new NICs here - parser.add_argument('--mode', type=str, default='k8s', choices={"k8s", "vm", "cloud"}, help='generate configuration files for selected mode') # please add new modes' name here + parser.add_argument('--mode', type=str, default='k8s', choices={"k8s", "vm", "cloud"}, + help='generate configuration files for selected mode') # please add new modes' name here parser.add_argument('--mirrors', '-m', type=str, default="false", choices=["true", "false"], help='include parameters for setting mirror links') # vm mode specific args diff --git a/generate/render/common/common.py b/generate/render_util/common/common.py similarity index 92% rename from generate/render/common/common.py rename to generate/render_util/common/common.py index 13e9971a..00a4be0e 100644 --- a/generate/render/common/common.py +++ b/generate/render_util/common/common.py @@ -26,6 +26,7 @@ from ruamel.yaml import YAML from jinja2 import Environment, FileSystemLoader + def load_config(path: str) -> dict: """Loads YAML file and returns it as configuration dict.""" with open(path) as config_file: @@ -33,38 +34,49 @@ def load_config(path: str) -> dict: profiles = yaml.load(config_file) return profiles + def create_dir_idempotent(path: str) -> None: """Creates directory if not present.""" if not os.path.exists(path): os.makedirs(path) + def render(template_path: str, jinja_vars: dict, target_path: str) -> None: 
"""Renders Jinja template and writes it to file.""" file_loader = FileSystemLoader('.') - template = Environment(keep_trailing_newline=True, loader=file_loader, autoescape=True).get_template(template_path) + template = Environment(keep_trailing_newline=True, trim_blocks=True, loader=file_loader, autoescape=True).get_template(template_path) out = template.render(jinja_vars) with open(target_path, "w+") as f: f.write(out) + def add_arch_parameter(profiles: dict, args: argparse.Namespace) -> None: """Add architecture information to profiles config""" for p in profiles.values(): p['arch'] = args.arch + def add_nic_parameter(profiles: dict, args: argparse.Namespace) -> None: """Add NIC information to profiles config""" for p in profiles.values(): p['nic'] = args.nic + def add_mirrors_parameter(profiles: dict, args: argparse.Namespace) -> None: """Add mirrors information to profiles config""" for p in profiles.values(): print(args.mirrors) p['mirrors'] = args.mirrors -def create_backups(src: str, dirs: list=[], files: list=[]) -> None: + +def create_backups(src: str, dirs: list = None, files: list = None) -> None: """Create backup for given dirs/files""" # create specific backup dir + if dirs is None: + dirs = [] + if files is None: + files = [] + previous_profile_name = _get_previous_profile_name() if previous_profile_name: backup_dir_name = previous_profile_name + "_" + datetime.now().strftime('%Y%m%d_%H%M%S') @@ -77,22 +89,26 @@ def create_backups(src: str, dirs: list=[], files: list=[]) -> None: for f in files: _backup_files(src, path_to_backup_dir, f) + # Helper Functions def _move(path_to_dir: str, path_to_backup_dir: str) -> None: """Move directory from specific path to backup path""" move(path_to_dir, path_to_backup_dir) + def _backup_dirs(src: str, dst: str, name: str) -> None: path = os.path.join(src, name) if os.path.exists(path): _move(path, dst) + def _backup_files(src: str, dst: str, name: str) -> None: path = os.path.join(src, name) if 
os.path.exists(path): print(path) copy(path, dst) + def _get_previous_profile_name() -> str: group_vars_path = os.path.join('./', 'group_vars', 'all.yml') if os.path.exists(group_vars_path): diff --git a/generate/render/renderers/cloud_profiles.py b/generate/render_util/renderers/cloud_profiles.py similarity index 89% rename from generate/render/renderers/cloud_profiles.py rename to generate/render_util/renderers/cloud_profiles.py index 684d5a05..2eabef5e 100644 --- a/generate/render/renderers/cloud_profiles.py +++ b/generate/render_util/renderers/cloud_profiles.py @@ -20,12 +20,13 @@ import argparse import os -from render.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups +from render_util.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups + def render_cloud_profiles(args: argparse.Namespace) -> None: """Creates example CEK profiles in cloud mode""" # create backup for already generated profile's files - if 'all_examples' != args.profile: + if args.profile != 'all_examples': src = "./" # look for reference in project_root_dir create_backups(src, ['host_vars', 'group_vars']) @@ -41,9 +42,10 @@ def render_cloud_profiles(args: argparse.Namespace) -> None: # add mirrors information add_mirrors_parameter(cloud_profiles, args) - # create example diretory with all profiles and its files + # create example directory with all profiles and its files _create_cloud_examples(cloud_profiles, args) + # Helper functions def _create_example(config: dict, vars_path_prefix: str, args: argparse.Namespace) -> None: group_vars_dir_path = os.path.join(vars_path_prefix, "group_vars") @@ -54,10 +56,11 @@ def _create_example(config: dict, vars_path_prefix: str, args: argparse.Namespac render(args.group, config, os.path.join(group_vars_dir_path, "all.yml")) render(args.host, config, 
os.path.join(host_vars_dir_path, "node1.yml")) + def _create_cloud_examples(profiles: dict, args: argparse.Namespace) -> None: """Creates all sample files for profiles in cloud mode if provided profiles is 'all_examples', otherwise - only files for the specific profile will be generated into project root direcotory""" - if 'all_examples' == args.profile: + only files for the specific profile will be generated into project root directory""" + if args.profile == 'all_examples': for cloud_profile, cloud_config in profiles.items(): vars_path_prefix = os.path.join(args.output, cloud_profile) diff --git a/generate/render/renderers/k8s_profiles.py b/generate/render_util/renderers/k8s_profiles.py similarity index 89% rename from generate/render/renderers/k8s_profiles.py rename to generate/render_util/renderers/k8s_profiles.py index fe8cbc24..01a31900 100644 --- a/generate/render/renderers/k8s_profiles.py +++ b/generate/render_util/renderers/k8s_profiles.py @@ -20,14 +20,15 @@ import argparse import os -from render.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups +from render_util.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups + def render_k8s_profiles(args: argparse.Namespace) -> None: """Creates example CEK profiles in k8s mode""" # create backup for already generated profile's files - if 'all_examples' != args.profile: + if args.profile != 'all_examples': src = "./" # look for reference in project_root_dir - create_backups(src, ['host_vars', 'group_vars'], ['inventory.ini',]) + create_backups(src, ['host_vars', 'group_vars'], ['inventory.ini']) # load config from k8s_profiles.yml k8s_profiles = load_config(args.config) @@ -41,9 +42,10 @@ def render_k8s_profiles(args: argparse.Namespace) -> None: # add mirrors information add_mirrors_parameter(k8s_profiles, args) - # create example 
diretory with all profiles and its files + # create example directory with all profiles and its files _create_k8s_examples(k8s_profiles, args) + # Helper functions def _create_example(config: dict, vars_path_prefix: str, inventory_path: str, args: argparse.Namespace) -> None: group_vars_dir_path = os.path.join(vars_path_prefix, "group_vars") @@ -59,10 +61,11 @@ def _create_example(config: dict, vars_path_prefix: str, inventory_path: str, ar if not os.path.exists(inventory_path): render(args.inventory, config, inventory_path) + def _create_k8s_examples(profiles: dict, args: argparse.Namespace) -> None: """Creates all sample files for profiles in k8s mode if provided profiles is 'all_examples', otherwise - only files for the specific profile will be generated into project root direcotory""" - if 'all_examples' == args.profile: + only files for the specific profile will be generated into project root directory""" + if args.profile == 'all_examples': for k8s_profile, k8s_config in profiles.items(): vars_path_prefix = os.path.join(args.output, k8s_profile) inventory_path = vars_path_prefix diff --git a/generate/render/renderers/playbook.py b/generate/render_util/renderers/playbook.py similarity index 89% rename from generate/render/renderers/playbook.py rename to generate/render_util/renderers/playbook.py index d7386ad2..fc753987 100644 --- a/generate/render/renderers/playbook.py +++ b/generate/render_util/renderers/playbook.py @@ -19,11 +19,13 @@ """ import os -from render.common.common import render +from render_util.common.common import render -_available_playbooks = ['basic', 'full_nfv', 'access', 'remote_fp', 'regional_dc', 'on_prem', 'build_your_own'] +_available_playbooks = [ 'access', 'basic', 'full_nfv', 'on_prem', 'on_prem_vss', + 'on_prem_sw_defined_factory', 'remote_fp', 'regional_dc', 'build_your_own'] _playbook_dir = 'playbooks' + def render_playbooks(profile: str) -> None: """Renders playbooks for all CEK profiles""" # generate playbooks @@ -36,18 +38,21 
@@ def render_playbooks(profile: str) -> None: if profile != 'all_examples': _print_command(profile) -def _create_playbook(template_name: str, playbook_file: str, jinja_vars: dict, playbook_subdir: str='') -> None: + +def _create_playbook(template_name: str, playbook_file: str, jinja_vars: dict, playbook_subdir: str = '') -> None: """Creates one playbook""" playbook_path = os.path.join(_playbook_dir, playbook_subdir, playbook_file) template_path = os.path.join("generate/playbook_templates", template_name) render(template_path, jinja_vars, playbook_path) + def _create_all_playbooks() -> None: """Creates all playbooks files""" for playbook_name in _available_playbooks: _create_playbooks_for_profile(playbook_name) + def _create_playbooks_for_profile(profile: str) -> None: """Creates playbooks only for specific profile""" playbook_file = profile + ".yml" @@ -56,6 +61,7 @@ def _create_playbooks_for_profile(profile: str) -> None: _create_playbook("infra_playbook.j2", playbook_file, jinja_vars, playbook_subdir="infra") _create_playbook("intel_playbook.j2", playbook_file, jinja_vars, playbook_subdir="intel") + def _print_command(profile: str) -> None: print("""To run your deployment configure host vars and group vars, then use the following command: diff --git a/generate/render/renderers/vm_profiles.py b/generate/render_util/renderers/vm_profiles.py similarity index 92% rename from generate/render/renderers/vm_profiles.py rename to generate/render_util/renderers/vm_profiles.py index eac5ea7a..89f9e672 100644 --- a/generate/render/renderers/vm_profiles.py +++ b/generate/render_util/renderers/vm_profiles.py @@ -20,14 +20,15 @@ import argparse import os -from render.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups +from render_util.common.common import create_dir_idempotent, render, load_config, add_arch_parameter, add_nic_parameter, add_mirrors_parameter, create_backups + def 
render_vm_profiles(args: argparse.Namespace) -> None: """Creates example CEK profiles in VM mode""" # create backup for already generated profile's hv, gv and inventory - if 'all_examples' != args.profile: + if args.profile != 'all_examples': src = "./" # look for reference in project_root_dir - create_backups(src, ['host_vars', 'group_vars'], ['inventory.ini',]) + create_backups(src, ['host_vars', 'group_vars'], ['inventory.ini']) # load config from vm_profiles.yml vm_profiles = load_config(args.vmsconfig) @@ -41,7 +42,7 @@ def render_vm_profiles(args: argparse.Namespace) -> None: # add nic information add_nic_parameter(vm_profiles, args) - # create example diretory with all profiles and its files for VM configuration + # create example directory with all profiles and its files for VM configuration _create_vms_examples(vm_profiles, args) # load config for VMs' host @@ -56,9 +57,10 @@ def render_vm_profiles(args: argparse.Namespace) -> None: # add nic information add_nic_parameter(host_vm_profiles, args) - # create example diretory with all profiles and its files + # create example directory with all profiles and its files _create_host_vm_examples(host_vm_profiles, args) + # Helper Functions # creating files needed by the VMs def _create_vm_example(config: dict, vars_path_prefix: str, args: argparse.Namespace) -> None: @@ -72,10 +74,11 @@ def _create_vm_example(config: dict, vars_path_prefix: str, args: argparse.Names render(args.host, config, os.path.join(host_vars_dir_path, "vm-ctrl-1.cluster1.local.yml")) render(args.host, config, os.path.join(host_vars_dir_path, "vm-work-1.cluster1.local.yml")) + def _create_vms_examples(profiles: dict, args: argparse.Namespace) -> None: """Creates sample configuration files required by the VMs. 
If profile is marked as all_examples then all available examples will be created, otherwise only specific files will be generated into project root directory.""" - if 'all_examples' == args.profile: + if args.profile == 'all_examples': for vms_profile, vms_config in profiles.items(): vars_path_prefix = os.path.join(args.output, vms_profile) @@ -87,6 +90,7 @@ def _create_vms_examples(profiles: dict, args: argparse.Namespace) -> None: _create_vm_example(vms_config, vars_path_prefix, args) + # creating files needed by host on top of which VMs will be created def _create_host_example(config: dict, vars_path_prefix: str, inventory_path: str, args: argparse.Namespace) -> None: """Create one sample file required by host on top of which VMs will be created""" @@ -106,11 +110,12 @@ def _create_host_example(config: dict, vars_path_prefix: str, inventory_path: st if not os.path.exists(inventory_path): render(args.inventory, config, inventory_path) + def _create_host_vm_examples(profiles: dict, args: argparse.Namespace) -> None: """Creates sample configuration files required by host on which the VMs will be created. 
If profile is marked as all_examples then all available examples will be created, otherwise only specific files will be generated into project root directory.""" - if 'all_examples' == args.profile: + if args.profile == 'all_examples': for host_vm_profile, host_vm_config in profiles.items(): vars_path_prefix = os.path.join(args.output, host_vm_profile) inventory_path = vars_path_prefix # inventory file is supposed to be created only with host-related files diff --git a/library/check_nic_firmware.py b/library/check_nic_firmware.py index 92bed4cd..6243b850 100644 --- a/library/check_nic_firmware.py +++ b/library/check_nic_firmware.py @@ -13,10 +13,11 @@ # Severity: Low Confidence: High # More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess # pylint: disable=line-too-long # -> considered -import subprocess # nosec B404 +import subprocess # nosec B404 + from ansible.module_utils.basic import AnsibleModule -__metaclass__ = type # pylint: disable=invalid-name +__metaclass__ = type # pylint: disable=invalid-name DOCUMENTATION = r''' --- @@ -155,12 +156,12 @@ def run_module(): # Severity: High Confidence: High # More Info: https://bandit.readthedocs.io/en/latest/plugins/b602_subprocess_popen_with_shell_equals_true.html # pylint: disable=line-too-long # -> considered - nic_name_result = subprocess.run(cmd, shell=True, check=True, stdout=subprocess.PIPE) # nosec + nic_name_result = subprocess.run(cmd, shell=True, check=True, stdout=subprocess.PIPE) # nosec if not nic_name_result.stdout: - module.fail_json(msg="Name for the requested nic interface '" + module.params['pci_id'] + - "' not found. Update 'dataplane_interfaces' accordingly and run " + - "deployment again.", **result) + module.fail_json(msg=(f"Name for the requested nic interface '{module.params['pci_id']}" + "' not found. 
Update 'dataplane_interfaces' accordingly and run " + "deployment again."), **result) nic_name = str(nic_name_result.stdout.rstrip(), encoding) result['interface_name'] = nic_name @@ -174,30 +175,29 @@ def run_module(): # Severity: Low Confidence: High # More Info: https://bandit.readthedocs.io/en/latest/plugins/b603_subprocess_without_shell_equals_true.html # pylint: disable=line-too-long # -> considered - sub_result = subprocess.Popen(["ethtool", "-i", nic_name], stdout=subprocess.PIPE) # pylint: disable=consider-using-with # nosec - - if not sub_result.stdout.readline(): - module.fail_json(msg="Requested nic interface '" + module.params['pci_id'] + - "' with name '" + nic_name + "' not found. " + - "Update 'dataplane_interfaces' accordingly and run deployment again.", + with subprocess.Popen(["ethtool", "-i", nic_name], stdout=subprocess.PIPE) as sub_result: # nosec + if not sub_result.stdout.readline(): + module.fail_json(msg=(f"Requested nic interface '{module.params['pci_id']}" + f"' with name '{nic_name}' not found. " + "Update 'dataplane_interfaces' accordingly and run deployment again."), **result) - for line in sub_result.stdout: - if b'firmware-version' in line: - result['current_firmware_version'] = str(line.rstrip().split()[1], encoding) - if float(result['current_firmware_version']) < float(module.params['min_fw_version']): - if module.params['ddp']: - module.fail_json(msg="Current nic firmware version doesn't allow loading of " + - "DDP profile. Set 'update_nic_firmware' " + - "to 'true' and run deployment again.", **result) - else: - module.fail_json(msg="Current nic firmware version is lower than minimum " + - "version needed for automatic firmware update. 
" + - "Update nic firmware manually and run deployment again.", + for line in sub_result.stdout: + if b'firmware-version' in line: + result['current_firmware_version'] = str(line.rstrip().split()[1], encoding) + if float(result['current_firmware_version']) < float(module.params['min_fw_version']): + if module.params['ddp']: + module.fail_json(msg=("Current nic firmware version doesn't allow loading of " + "DDP profile. Set 'update_nic_firmware' " + "to 'true' and run deployment again."), **result) + else: + module.fail_json(msg=("Current nic firmware version is lower than minimum " + "version needed for automatic firmware update. " + "Update nic firmware manually and run deployment again."), **result) - else: - result['msg'] = "nic firmware version is sufficient to proceed" - module.exit_json(**result) + else: + result['msg'] = "nic firmware version is sufficient to proceed" + module.exit_json(**result) module.exit_json(**result) diff --git a/library/cpupin.py b/library/cpupin.py index ce119c86..d2a19510 100644 --- a/library/cpupin.py +++ b/library/cpupin.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 from __future__ import (absolute_import, division, print_function) -from logging import raiseExceptions + __metaclass__ = type DOCUMENTATION = r''' diff --git a/playbooks/dockerfiles.yml b/playbooks/dockerfiles.yml new file mode 100644 index 00000000..b8439b91 --- /dev/null +++ b/playbooks/dockerfiles.yml @@ -0,0 +1,30 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# This playbook templates all j2 Dockerfile templates from roles to enable complete hadolint scan of repository +- hosts: localhost + vars: + dockerfiles_dir: "{{ (playbook_dir, '..', '.dockerfiles') | path_join }}" + tasks: + - name: Ensure dockerfiles directory exists + ansible.builtin.file: + path: "{{ dockerfiles_dir }}" + state: directory + mode: 0755 + - name: Template media analytics Dockerfile + ansible.builtin.include_role: + name: intel_media_analytics + tasks_from: template_dockerfile diff --git a/playbooks/dyna_config.yml b/playbooks/dyna_config.yml new file mode 100644 index 00000000..84f9a7aa --- /dev/null +++ b/playbooks/dyna_config.yml @@ -0,0 +1,46 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# This playbook contains tasks that can be executed separately after +# RA installation, to dynamically adjust some system configurations.
+ +# Preflight check for config dpdk link operation +- hosts: "{{node | default('kube_node')}}" + tasks: + - name: check config dpdk link nodes count + vars: + link_node_count: "{{ groups['kube_node'] | length }}" + + assert: + that: "{{ link_node_count }} == 2" + msg: "Config dpdk link is a 2 nodes operation, but current nodes count is {{ link_node_count }}" + tags: + - dyna_config_dpdk + when: + - dyna_config_dpdk_link | default(false) | bool + +# Execute config dpdk role tasks according to tag and conditions +- hosts: "{{node | default('kube_node')}}" + roles: + - role: configure_dpdk + dpdk_link_node1: "{{ groups.kube_node[0] }}" + dpdk_link_node2: "{{ groups.kube_node[1] }}" + tags: + - dyna_config_dpdk + when: + - dyna_config_dpdk_bind | default(false) | bool or + dyna_config_dpdk_link | default(false) | bool or + dyna_config_dpdk_unbind | default(false) | bool diff --git a/playbooks/infra/prepare_ipu.yml b/playbooks/infra/prepare_ipu.yml new file mode 100644 index 00000000..afb5b071 --- /dev/null +++ b/playbooks/infra/prepare_ipu.yml @@ -0,0 +1,47 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License.
+## +--- +- hosts: ipu_host + roles: + - role: cluster_defaults + tags: always + - role: bootstrap/configure_proxy + tags: proxy + - role: ipu/common + - role: ipu/flash_ipu_ssd + when: + - not ipu_1gbe_connected_to_linkp + +- hosts: ipu_linkp + roles: + - role: cluster_defaults + tags: always + - role: bootstrap/configure_proxy + tags: proxy + - role: ipu/common + - role: ipu/prepare_ipu_linkp + - role: ipu/flash_ipu_ssd + when: + - ipu_1gbe_connected_to_linkp + - role: ipu/flash_ipu_nvm + +- hosts: ipu_imc + roles: + - role: ipu/imc + +- hosts: ipu_acc + roles: + - role: ipu/acc diff --git a/playbooks/infra/prepare_vms.yml b/playbooks/infra/prepare_vms.yml index 0b92eda2..9bf4d790 100644 --- a/playbooks/infra/prepare_vms.yml +++ b/playbooks/infra/prepare_vms.yml @@ -21,6 +21,8 @@ - ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04" - sgx_dp_enabled | default(false) - role: vm/conf_libvirt + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true - hosts: vm_host gather_facts: false @@ -29,21 +31,24 @@ - role: vm/manage_imgs - role: vm/manage_bridges - role: vm/manage_vms - - role: vm/vm_sgx_enable - when: - - sgx_dp_enabled | default(false) - role: vm/prepare_cek + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true - hosts: vm_host gather_facts: false serial: 1 roles: - role: vm/prepare_bastion_host_config + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true - hosts: vm_host gather_facts: false roles: - vm/prepare_cek_vxlan + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true - hosts: vm_host gather_facts: false @@ -51,6 +56,8 @@ roles: - role: vm/prepare_bastion_host_config_vxlan - role: vm/prepare_vm_inventory + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true - hosts: k8s_cluster handlers: @@ -67,3 +74,5 @@ - reboot VMs changed_when: needs_reboot.rc == 1 when: vm_image_distribution == "rocky" + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true diff --git 
a/playbooks/infra/redeploy_cleanup.yml b/playbooks/infra/redeploy_cleanup.yml old mode 100755 new mode 100644 index d64d14b8..528ef892 --- a/playbooks/infra/redeploy_cleanup.yml +++ b/playbooks/infra/redeploy_cleanup.yml @@ -68,7 +68,7 @@ - hosts: "{{ node | default('k8s_cluster') }}" roles: - role: redeploy_cleanup - tag: cleanup + tags: cleanup handlers: - name: reboot server reboot: {reboot_timeout: 1200} diff --git a/playbooks/infra/roles b/playbooks/infra/roles deleted file mode 120000 index 148b1320..00000000 --- a/playbooks/infra/roles +++ /dev/null @@ -1 +0,0 @@ -../../roles/ \ No newline at end of file diff --git a/playbooks/intel/.gitkeep b/playbooks/intel/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/playbooks/intel/roles b/playbooks/intel/roles deleted file mode 120000 index 148b1320..00000000 --- a/playbooks/intel/roles +++ /dev/null @@ -1 +0,0 @@ -../../roles/ \ No newline at end of file diff --git a/playbooks/k8s/k8s.yml b/playbooks/k8s/k8s.yml index 0da3a7b5..b1d48a73 100644 --- a/playbooks/k8s/k8s.yml +++ b/playbooks/k8s/k8s.yml @@ -59,7 +59,6 @@ when: - kube_network_plugin == "calico" - calico_network_backend == "vxlan" - - not calico_advanced_options - name: prepare calico CNI facts for bird backend set_fact: calico_ipip_mode: 'Always' @@ -71,7 +70,6 @@ when: - kube_network_plugin == "calico" - calico_network_backend == "bird" - - not calico_advanced_options - name: prepare ADQ facts set_fact: calico_ipv4pool_ipip: "Never" @@ -111,22 +109,10 @@ skip_downloads: false etcd_deployment_type: host when: container_runtime == "crio" - # this is workaround for crio as 1.25 currently doesn't work with SGX - # https://github.com/opencontainers/runtime-tools/issues/759 - - name: change cri-o version when sgx is enabled - set_fact: - crio_version: v1.24.3 - when: sgx_dp_enabled | default(false) and container_runtime == "crio" - - # Workaround until kubespray will fix it - # Cillium operator has set 2 replicas by default which is not 
good for single-node deployment - - name: Set cillium replicas to 1 in single-node deployment - ansible.builtin.set_fact: - cilium_operator_replicas: 1 - when: groups.k8s_cluster | length == 1 # Run kubespray to deploy or scale cluster -- import_playbook: "{% if scale | default(false) | bool %}kubespray/scale.yml{% else %}kubespray/cluster.yml{% endif %}" +- name: Deploy cluster via Kubespray + ansible.builtin.import_playbook: "{% if scale | default(false) | bool %}kubernetes_sigs.kubespray.scale{% else %}kubernetes_sigs.kubespray.cluster{% endif %}" vars: any_errors_fatal: true kubeadm_enabled: true @@ -137,9 +123,10 @@ nginx_image_tag: 1.23.3-alpine calico_node_livenessprobe_timeout: 15 calico_node_readinessprobe_timeout: 15 + kube_proxy_mode: iptables enable_network_policy: true override_system_hostname: false - kube_proxy_mode: iptables + cilium_ipam_mode: kubernetes enable_nodelocaldns: false system_reserved: true dashboard_enabled: "{{ kube_dashboard_enabled | default(true) }}" @@ -162,8 +149,6 @@ kube_kubeadm_apiserver_extra_args: service-account-lookup: true service-account-key-file: "{{ kube_cert_dir }}/sa.key" - kube_kubeadm_scheduler_extra_args: - profiling: false kube_kubeadm_controller_extra_args: service-account-private-key-file: "{{ kube_cert_dir }}/sa.key" kubelet_config_extra_args: @@ -359,5 +344,5 @@ any_errors_fatal: true # Run certificate generation for mTLS in kubelet -- import_playbook: kubelet-certificates.yml +- ansible.builtin.import_playbook: kubelet-certificates.yml when: kubernetes | default(true) diff --git a/playbooks/k8s/kubespray b/playbooks/k8s/kubespray deleted file mode 160000 index d8197862..00000000 --- a/playbooks/k8s/kubespray +++ /dev/null @@ -1 +0,0 @@ -Subproject commit d81978625c7548eba10c4e4d455b3996235e6b91 diff --git a/playbooks/k8s/rke2.yml b/playbooks/k8s/rke2.yml new file mode 100644 index 00000000..6f0891f0 --- /dev/null +++ b/playbooks/k8s/rke2.yml @@ -0,0 +1,225 @@ +## +## Copyright (c) 2020-2023 Intel 
Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# Initialize rke2 on server +- hosts: k8s_cluster + roles: + - role: cluster_defaults + - role: rke2_target_setup + +- hosts: kube_control_plane + vars: + kube_pod_security_use_default: true + tasks: + - name: prepare additional rke2 facts + ansible.builtin.set_fact: + rke2_root_dir: "{{ (project_root_dir, 'rke2') | path_join }}" + rke2_conf_dir: "/etc/rancher/rke2" + rke2_manifests_dir: "/var/lib/rancher/rke2/server/manifests" + + - name: Create rke2 workdir + ansible.builtin.file: + path: "{{ rke2_root_dir }}" + state: directory + mode: 0755 + + - name: Set rke2 installer and configuration paths + ansible.builtin.set_fact: + rke2_installer: "{{ (rke2_root_dir, 'rke2_install.sh') | path_join }}" + rke2_audit_policy_file: "{{ (rke2_conf_dir, 'audit-policy.yaml') | path_join }}" + rke2_config_file: "{{ (rke2_conf_dir, 'config.yaml') | path_join }}" + rke2_calico_config_file: "{{ (rke2_manifests_dir, 'calico-custom-config.yaml') | path_join }}" + rke2_admission_config_file: "{{ (rke2_conf_dir, 'rke2-admission-config.yaml') | path_join }}" + + - name: Download rke2 install file + ansible.builtin.get_url: + url: https://get.rke2.io + dest: "{{ rke2_installer }}" + mode: 0755 + register: rke2_downloaded + retries: "{{ number_of_retries | default(3) }}" + until: rke2_downloaded is success + delay: "{{ retry_delay | default(3) }}" + + - name: Install rke2 + ansible.builtin.command: "{{ 
rke2_installer }}" + register: rke2_installed + retries: 3 + environment: + INSTALL_RKE2_VERSION: "{{ rke2_version }}" + until: rke2_installed is success + changed_when: true + + - name: set /usr/local/lib/systemd/system/rke2-server.env proxy settings + ansible.builtin.lineinfile: + path: /usr/local/lib/systemd/system/rke2-server.env + state: present + regexp: '^{{ item.key }}' + line: '{{ item.key }}={{ item.value }}' + create: yes + owner: root + group: root + mode: 0644 + with_dict: "{{ proxy_env }}" + when: '"http_proxy" in proxy_env or "https_proxy" in proxy_env' + + - name: Create directory for rke2 config files + ansible.builtin.file: + path: "{{ rke2_conf_dir }}" + state: directory + mode: 0750 + + - name: Populate rke2 Admission configuration + ansible.builtin.template: + src: rke2-admission-config.yaml.j2 + dest: "{{ rke2_admission_config_file }}" + mode: 0644 + + - name: Set up rke2 audit policy + ansible.builtin.template: + src: rke2-audit-policy.yaml.j2 + dest: "{{ rke2_audit_policy_file }}" + mode: 0644 + + - name: Set up rke2 cluster configuration + ansible.builtin.template: + src: rke2_config.yaml.j2 + dest: "{{ rke2_config_file }}" + mode: 0644 + + - name: Start rke2 server + ansible.builtin.systemd: + name: rke2-server.service + state: started + enabled: true + + - name: Create directory for kube config files + ansible.builtin.file: + path: "{{ ansible_env.HOME }}/.kube/" + state: directory + mode: 0750 + + - name: Set up kube config file + ansible.builtin.copy: + src: /etc/rancher/rke2/rke2.yaml + dest: "{{ ansible_env.HOME }}/.kube/config" + remote_src: yes + mode: 0640 + + - name: Copy rancher binaries to /usr/local/bin/ for cluster access + ansible.builtin.copy: + src: /var/lib/rancher/rke2/bin/ + dest: /usr/local/bin/ + remote_src: yes + force: yes + owner: root + group: root + mode: 0755 + + - name: Link the crictl client config to ease containerd access + block: + - name: remove existing config + ansible.builtin.file: 
path="/etc/crictl.yaml" state=absent + - name: link the crictl client config to the default path + ansible.builtin.file: + src: /var/lib/rancher/rke2/agent/etc/crictl.yaml + path: /etc/crictl.yaml + state: link + + - name: Link Kubernetes CA to align with kubespray's certificate path + block: + - name: remove existing certs and keys + ansible.builtin.file: path="/etc/kubernetes/ssl/" state=absent + - name: ensure that path exists + ansible.builtin.file: + path: "/etc/kubernetes/ssl/" + mode: 0755 + owner: root + group: root + state: directory + - name: link Kubernetes CA cert in the /etc/kubernetes/ssl/ + ansible.builtin.file: + src: /var/lib/rancher/rke2/server/tls/server-ca.crt + path: /etc/kubernetes/ssl/ca.crt + state: link + - name: link Kubernetes CA key in the /etc/kubernetes/ssl/ + ansible.builtin.file: + src: /var/lib/rancher/rke2/server/tls/server-ca.key + path: /etc/kubernetes/ssl/ca.key + state: link + + - name: check for all pods + ansible.builtin.include_role: + name: wait_for_kubernetes_ready + + - name: Enable custom settings for calico + block: + - name: populate calico custom config yaml + ansible.builtin.template: + src: rke2_calico_config.yaml.j2 + dest: "{{ rke2_calico_config_file }}" + mode: 0644 + - name: apply calico custom config + kubernetes.core.k8s: + state: present + src: "{{ rke2_calico_config_file }}" + wait_sleep: 30 + when: kube_network_plugin == "calico" + + - name: Enable dashboard # noqa role-name[path] + ansible.builtin.include_role: + name: rke2_kubernetes_apps/dashboard + when: kube_dashboard_enabled | default(true) + + - name: Install helm # noqa role-name[path] + ansible.builtin.include_role: + name: rke2_kubernetes_apps/helm + + - name: Install cert-manager # noqa role-name[path] + ansible.builtin.include_role: + name: rke2_kubernetes_apps/cert_manager_install + when: cert_manager_enabled | default(false) + + - name: Install podman # noqa role-name[path] + ansible.builtin.include_role: + name: container_engine/podman + + - 
name: registries.conf configuration on containerd + block: + - name: check if registries.conf exists + ansible.builtin.stat: + path: /etc/containers/registries.conf + register: registries_conf + + - name: add "unqualified-search-registries" to registries.conf + ansible.builtin.lineinfile: + path: /etc/containers/registries.conf + regexp: '^# unqualified-search-registries' + line: unqualified-search-registries = ["docker.io"] + when: registries_conf.stat.exists + when: container_runtime == "containerd" + + - name: Install container registry + ansible.builtin.include_role: + name: container_registry + when: registry_enable | default(false) + + - name: Install Rancher # noqa role-name[path] + ansible.builtin.include_role: + name: rke2_kubernetes_apps/rancher + when: rancher_manager_enabled | default(false) + environment: "{{ proxy_env | d({}) }}" + any_errors_fatal: true diff --git a/playbooks/k8s/roles b/playbooks/k8s/roles deleted file mode 120000 index 148b1320..00000000 --- a/playbooks/k8s/roles +++ /dev/null @@ -1 +0,0 @@ -../../roles/ \ No newline at end of file diff --git a/playbooks/k8s/templates/rke2-admission-config.yaml.j2 b/playbooks/k8s/templates/rke2-admission-config.yaml.j2 new file mode 100644 index 00000000..b06f1184 --- /dev/null +++ b/playbooks/k8s/templates/rke2-admission-config.yaml.j2 @@ -0,0 +1,88 @@ +--- +apiVersion: apiserver.config.k8s.io/v1 +kind: AdmissionConfiguration +plugins: + - name: EventRateLimit + configuration: + apiVersion: eventratelimit.admission.k8s.io/v1alpha1 + kind: Configuration + limits: + - type: Server + qps: 10 + burst: 50 + - type: Namespace + qps: 50 + burst: 100 + - type: User + qps: 10 + burst: 50 + - type: SourceAndObject + qps: 10 + burst: 50 + - name: PodSecurity + configuration: +{% if kube_pod_security_use_default is defined and kube_pod_security_use_default %} + apiVersion: pod-security.admission.config.k8s.io/v1beta1 + kind: PodSecurityConfiguration + defaults: + enforce: "privileged" + enforce-version: 
"latest" + audit: "privileged" + audit-version: "latest" + warn: "privileged" + warn-version: "latest" + exemptions: + usernames: [] + runtimeClasses: [] + namespaces: [] +{% else %} + apiVersion: pod-security.admission.config.k8s.io/v1 + kind: PodSecurityConfiguration + defaults: + enforce: "restricted" + enforce-version: "latest" + audit: "restricted" + audit-version: "latest" + warn: "restricted" + warn-version: "latest" + exemptions: + usernames: [] + runtimeClasses: [] + namespaces: [calico-apiserver, + calico-system, + cattle-alerting, + cattle-csp-adapter-system, + cattle-elemental-system, + cattle-epinio-system, + cattle-externalip-system, + cattle-fleet-local-system, + cattle-fleet-system, + cattle-gatekeeper-system, + cattle-global-data, + cattle-global-nt, + cattle-impersonation-system, + cattle-istio, + cattle-istio-system, + cattle-logging, + cattle-logging-system, + cattle-monitoring-system, + cattle-neuvector-system, + cattle-prometheus, + cattle-resources-system, + cattle-sriov-system, + cattle-system, + cattle-ui-plugin-system, + cattle-windows-gmsa-system, + cert-manager, + cis-operator-system, + fleet-default, + ingress-nginx, + istio-system, + kube-node-lease, + kube-public, + kube-system, + longhorn-system, + rancher-alerting-drivers, + security-scan, + tigera-operator] +{% endif %} diff --git a/playbooks/k8s/templates/rke2-audit-policy.yaml.j2 b/playbooks/k8s/templates/rke2-audit-policy.yaml.j2 new file mode 100644 index 00000000..ca7bcf80 --- /dev/null +++ b/playbooks/k8s/templates/rke2-audit-policy.yaml.j2 @@ -0,0 +1,129 @@ +apiVersion: audit.k8s.io/v1 +kind: Policy +rules: +{% if audit_policy_custom_rules is defined and audit_policy_custom_rules != "" %} +{{ audit_policy_custom_rules | indent(2, true) }} +{% else %} + # The following requests were manually identified as high-volume and low-risk, + # so drop them. 
+ - level: None + users: ["system:kube-proxy"] + verbs: ["watch"] + resources: + - group: "" # core + resources: ["endpoints", "services", "services/status"] + - level: None + # Ingress controller reads `configmaps/ingress-uid` through the unsecured port. + # TODO(#46983): Change this to the ingress controller service account. + users: ["system:unsecured"] + namespaces: ["kube-system"] + verbs: ["get"] + resources: + - group: "" # core + resources: ["configmaps"] + - level: None + users: ["kubelet"] # legacy kubelet identity + verbs: ["get"] + resources: + - group: "" # core + resources: ["nodes", "nodes/status"] + - level: None + userGroups: ["system:nodes"] + verbs: ["get"] + resources: + - group: "" # core + resources: ["nodes", "nodes/status"] + - level: None + users: + - system:kube-controller-manager + - system:kube-scheduler + - system:serviceaccount:kube-system:endpoint-controller + verbs: ["get", "update"] + namespaces: ["kube-system"] + resources: + - group: "" # core + resources: ["endpoints"] + - level: None + users: ["system:apiserver"] + verbs: ["get"] + resources: + - group: "" # core + resources: ["namespaces", "namespaces/status", "namespaces/finalize"] + # Don't log HPA fetching metrics. + - level: None + users: + - system:kube-controller-manager + verbs: ["get", "list"] + resources: + - group: "metrics.k8s.io" + # Don't log these read-only URLs. + - level: None + nonResourceURLs: + - /healthz* + - /version + - /swagger* + # Don't log events requests. + - level: None + resources: + - group: "" # core + resources: ["events"] + # Secrets, ConfigMaps, TokenRequest and TokenReviews can contain sensitive & binary data, + # so only log at the Metadata level. + - level: Metadata + resources: + - group: "" # core + resources: ["secrets", "configmaps", "serviceaccounts/token"] + - group: authentication.k8s.io + resources: ["tokenreviews"] + omitStages: + - "RequestReceived" + # Get responses can be large; skip them. 
+ - level: Request + verbs: ["get", "list", "watch"] + resources: + - group: "" # core + - group: "admissionregistration.k8s.io" + - group: "apiextensions.k8s.io" + - group: "apiregistration.k8s.io" + - group: "apps" + - group: "authentication.k8s.io" + - group: "authorization.k8s.io" + - group: "autoscaling" + - group: "batch" + - group: "certificates.k8s.io" + - group: "extensions" + - group: "metrics.k8s.io" + - group: "networking.k8s.io" + - group: "policy" + - group: "rbac.authorization.k8s.io" + - group: "settings.k8s.io" + - group: "storage.k8s.io" + omitStages: + - "RequestReceived" + # Default level for known APIs + - level: RequestResponse + resources: + - group: "" # core + - group: "admissionregistration.k8s.io" + - group: "apiextensions.k8s.io" + - group: "apiregistration.k8s.io" + - group: "apps" + - group: "authentication.k8s.io" + - group: "authorization.k8s.io" + - group: "autoscaling" + - group: "batch" + - group: "certificates.k8s.io" + - group: "extensions" + - group: "metrics.k8s.io" + - group: "networking.k8s.io" + - group: "policy" + - group: "rbac.authorization.k8s.io" + - group: "settings.k8s.io" + - group: "storage.k8s.io" + omitStages: + - "RequestReceived" + # Default level for all other requests. 
+ - level: Metadata + omitStages: + - "RequestReceived" +{% endif %} diff --git a/playbooks/k8s/templates/rke2_calico_config.yaml.j2 b/playbooks/k8s/templates/rke2_calico_config.yaml.j2 new file mode 100644 index 00000000..f9efae2f --- /dev/null +++ b/playbooks/k8s/templates/rke2_calico_config.yaml.j2 @@ -0,0 +1,23 @@ +--- +apiVersion: helm.cattle.io/v1 +kind: HelmChartConfig +metadata: + name: rke2-calico + namespace: kube-system +spec: + valuesContent: |- + installation: + imagePath: "calico" + imagePrefix: "" +{% if calico_bpf_enabled | default(false) %} + calicoNetwork: + linuxDataplane: BPF + hostPorts: null +{% endif %} + tigeraOperator: + image: tigera/operator + version: v1.29.5 + registry: quay.io + calicoctl: + image: docker.io/calico/ctl + tag: v3.25.1 diff --git a/playbooks/k8s/templates/rke2_config.yaml.j2 b/playbooks/k8s/templates/rke2_config.yaml.j2 new file mode 100644 index 00000000..bf02f67e --- /dev/null +++ b/playbooks/k8s/templates/rke2_config.yaml.j2 @@ -0,0 +1,52 @@ +--- +disable: rke2-ingress-nginx +profile: cis-1.23 +protect-kernel-defaults: true +audit-policy-file: {{ rke2_audit_policy_file }} + +kube-apiserver-arg: +- "enable-admission-plugins=NodeRestriction,EventRateLimit" +- "admission-control-config-file={{ rke2_admission_config_file }}" + +{% if kube_controller_manager_bind_address %} +kube-controller-manager-arg: +- "bind-address={{ kube_controller_manager_bind_address }}" +{% endif %} + +{% if kube_proxy_metrics_bind_address or kube_proxy_nodeport_addresses_cidr %} +kube-proxy-arg: +{% if kube_proxy_metrics_bind_address %} +- "metrics-bind-address={{ kube_proxy_metrics_bind_address }}" +{% endif %} +{% if kube_proxy_nodeport_addresses_cidr %} +- "nodeport-addresses={{ kube_proxy_nodeport_addresses_cidr }}" +{% endif %} +{% endif %} + +{% if kube_pods_subnet %} +cluster-cidr: {{ kube_pods_subnet }} +{% endif %} + +{% if kube_service_addresses %} +service-cidr: {{ kube_service_addresses }} +{% endif %} + +{% if kube_network_plugin %} 
+{% if kube_network_plugin_multus == true %} +cni: multus,{{ kube_network_plugin }} +{% else %} +cni: {{ kube_network_plugin }} +{% endif %} +{% endif %} + +{% if native_cpu_manager_enabled == true or topology_manager_enabled == true %} +kubelet-arg: +{% if native_cpu_manager_enabled == true %} +- "cpu-manager-policy=static" +- "kube-reserved=cpu={{ native_cpu_manager_kube_reserved_cpus | default('1000m') }}" +- "system-reserved=cpu={{ native_cpu_manager_system_reserved_cpus | default('1000m') }}" +{% endif %} +{% if topology_manager_enabled == true %} +- "topology-manager-policy={{ topology_manager_policy }}" +{% endif %} +{% endif %} diff --git a/playbooks/preflight.yml b/playbooks/preflight.yml index 922b9461..a8bf8c77 100644 --- a/playbooks/preflight.yml +++ b/playbooks/preflight.yml @@ -46,7 +46,6 @@ # - Check VPP Dependencies (for 2M Hugepages) # - Check CNI Dependencies (for OVS DPDK or VPP and Hugepages) # - Check SST (not on RHEL 8.2 or old OSs) -# - Check Linux distro for cstates # - Warn BIOS VT-d (should be enabled) # - Warn BIOS Intel Virtualization Technology (should be enabled) # - Warn BIOS Hyper-Threading (should be enabled) @@ -91,6 +90,26 @@ that: (ansible_python_version is version_compare(cek_supported_python, '>=')) msg: "Python version must be at least {{ cek_supported_python }}. 
Please update" + - name: Check kubernetes provisioner + ansible.builtin.assert: + that: + - kube_provisioner in ['rke2', 'kubespray'] + fail_msg: "kube_provisioner supports only 'rke2' and 'kubespray' values, please correct the configuration in groups/all.yml" + success_msg: "kube_provisioner set to '{{ kube_provisioner }}'" + when: kubernetes + + - name: Check kubespray version + ansible.builtin.import_role: + name: kubespray_check + tasks_from: check_kubespray_version + when: kube_provisioner != 'rke2' + + - name: Check applied kubespray patch + ansible.builtin.import_role: + name: kubespray_patch + tasks_from: preflight_checksum + when: kube_provisioner != 'rke2' + - name: read Group Vars stat: path: "{{ inventory_dir }}/group_vars/" @@ -217,6 +236,21 @@ - scale is defined - vm_enabled | default(false) + - name: check k8s for rancher manager + block: + - name: check if kube_provisioner is rke2 + ansible.builtin.assert: + that: kube_provisioner == 'rke2' + fail_msg: "rancher manager is only supported on rke2 currently" + - name: regex k8s version + ansible.builtin.set_fact: + kube_version_number: "{{ rke2_version | regex_search('(?<=v)\\d+\\.\\d+') }}" + - name: assert kube_version_number + ansible.builtin.assert: + that: "{{ kube_version_number is version('1.25', '<=') }}" + fail_msg: "Maximum k8s version for rancher manager is v1.25, current version is v{{ kube_version_number }}, please update group_vars" + when: rancher_manager_enabled is defined and rancher_manager_enabled + ############################################## # Prerequisites for Control and Worker Nodes # ############################################## @@ -225,7 +259,7 @@ gather_facts: true vars: cek_supported_distros: [RedHat, Rocky, Ubuntu] - cek_supported_distros_versions: ['8.6', '9.0', '9.1', '22.04'] + cek_supported_distros_versions: ['8.6', '9.0', '9.1', '9.2', '22.04'] cpusets_ranges: [] cpusets_discretes: [] isolcpus_ranges: [] @@ -409,6 +443,12 @@ msg: "Minimum supported k8s version is 
1.22, please update kube_version variable with correct version" when: kubernetes and not container_runtime_only_deployment + - name: check RKE2 requirements + ansible.builtin.include_role: + name: rke2_defaults + tasks_from: rke2_preflight + when: kube_provisioner == "rke2" + - name: assert that Multus is enabled in the config assert: that: @@ -595,15 +635,15 @@ tags: - cpu_ctlplane - - name: check Intel AI configuration + - name: check Intel Media Analytics configuration import_role: - name: intel_ai - tasks_from: preflight_intel_ai.yml + name: intel_media_analytics + tasks_from: preflight_intel_media_analytics.yml when: - kubernetes - - intel_ai_enabled | default(false) + - intel_media_analytics_enabled | default(false) tags: - - intel-ai + - intel-media-analytics #################################### # Prerequisites for Worker Node(s) # @@ -611,11 +651,9 @@ - hosts: kube_node,vm_host any_errors_fatal: true vars: - cstates_supported_distros: [Ubuntu] - cstates_supported_distros_versions: ['22.04'] phy_nics_pciids: [] vars_files: - - "roles/check_machine_type/vars/main.yml" + - "../roles/check_machine_type/vars/main.yml" tasks: - name: end play for VM host @@ -650,6 +688,16 @@ (sriov_cni_enabled is defined and sriov_cni_enabled) or (sriov_network_operator_enabled is defined and sriov_network_operator_enabled) + # FlexRAN needs 2 NIC PFs + - name: check DP Interfaces PF numbers for FlexRAN + assert: + that: + - dataplane_interfaces[0].bus_info is defined + - dataplane_interfaces[1].bus_info is defined + msg: "For FlexRAN, 2 Dataplane (DP) interface(s) on target '{{ ansible_hostname }}' are needed. 
Please correct the configuration" + when: + - intel_flexran_enabled | default(false) + - debug: msg: "Network interfaces present on target '{{ ansible_hostname }}' = {{ ansible_interfaces }}" @@ -668,17 +716,6 @@ with_items: "{{ dataplane_interfaces }}" when: dataplane_interfaces is defined and dataplane_interfaces != [] - - name: check invalid driver for CNDP on DP Interfaces - assert: - that: ("{{ item.pf_driver }}" in "['i40e', 'ice', 'iavf']") - msg: >- - "Dataplane interface defined with PCI ID '{{ item.bus_info }}' have unssupported pf_driver '{{ item.pf_driver }}' for CNDP. - Please correct the configuration. Supported pf_drivers are ['i40e', 'ice', 'iavf']" - with_items: "{{ dataplane_interfaces }}" - when: - - dataplane_interfaces is defined and dataplane_interfaces != [] - - cndp_dp_enabled | default(false) | bool - - name: load firmware specific variables include_vars: "../roles/bootstrap/update_nic_firmware/defaults/main.yml" when: nvmupdate is not defined @@ -771,6 +808,57 @@ Please be aware that by using CPU model that is not confirmed, some features may not work properly." when: (not vm_enabled) or (vm_enabled and (not on_vms | default(false))) + - name: check ubuntu pro token is not a placeholder + assert: + that: + - ubuntu_pro_token is defined + - ubuntu_pro_token != "ffffffffffffffffffffffffffffff" + fail_msg: + - "Please, visit https://ubuntu.com/pro to apply the token for rt kernel install." + - "And update ubuntu_pro_token placeholder inside group_vars with real token." 
+ success_msg: "ubuntu_pro_token verified" + when: + - rt_kernel_enabled | default(false) + + - name: check EMR QAT drvier package + block: + - name: print debug message EMR QAT driver package + ansible.builtin.debug: + msg="Expecting file {{ (emr_qat_driver_staging_folder, emr_qat_driver_package) | path_join }} on local ansible host" + - name: probe for EMR QAT driver package + delegate_to: localhost + become: false + ansible.builtin.stat: + path: "{{ (emr_qat_driver_staging_folder, emr_qat_driver_package) | path_join }}" + checksum_algorithm: sha1 + register: emr_qat_driver + - name: print debug message for emr qat driver existence + ansible.builtin.debug: + msg="{{ emr_qat_driver_package }} exists is {{ emr_qat_driver.stat.exists }}" + - name: check emr qat driver files exists + ansible.builtin.assert: + that: "emr_qat_driver.stat.exists" + msg: + - Mandatory file {{ (emr_qat_driver_staging_folder, emr_qat_driver_package) | path_join }} does NOT exist on localhost. + - Please acquire the zip file and place it in the location indicated above in order to deploy EMR QAT. See docs/emr.md + - name: check the qat driver package integrity + ansible.builtin.assert: + that: "emr_qat_driver.stat.checksum == '{{ emr_qat_driver_pkg_checksum }}'" + msg: + - File {{ (emr_qat_driver_staging_folder, emr_qat_driver_package) | path_join }} on localhost is NOT the expected one. + - Please provide the correct file. See docs/emr.md + when: + - update_qat_drivers is defined and update_qat_drivers + - configured_arch == "emr" + + - name: check QAT SVM precheck + ansible.builtin.assert: + that: + - update_qat_drivers + fail_msg: "qat svm only works on the Out of tree driver, please set the update_qat_drivers to true in host_vars." 
+ when: + - enable_qat_svm | default(false) | bool + - name: check QAT Devices list is configured properly block: - debug: @@ -858,26 +946,90 @@ success_msg: "GPU device plugin is enabled" when: configure_gpu is defined and configure_gpu + - name: FPGA environment preflight check + block: + - name: FPGA OS precheck + ansible.builtin.assert: + that: ((ansible_distribution == "Ubuntu") and (ansible_distribution_version == '22.04')) + msg: >- + Currently fpga is only supported on Ubuntu 22.04. + + - name: fpga dependencies check + ansible.builtin.assert: + that: + - hugepages_enabled + - iommu_enabled + msg: + - "fpga has dependency on hugepages and iommu, please enable them firstly in the host_vars yaml file." + + - name: probe fpga driver installation script + delegate_to: localhost + become: false + ansible.builtin.stat: + path: "{{ (fpga_driver_staging_folder, fpga_install_script) | path_join }}" + register: fpga_register + + - name: check fpga install scripts exists + ansible.builtin.assert: + that: "fpga_register.stat.exists" + msg: + - "Mandatory file {{ (fpga_driver_staging_folder, fpga_install_script) | path_join }} does NOT exist on localhost." + - "Please acquire the file from Intel Resource and Design Center and place it in the location indicated above in order to deploy fpga." 
+ when: + - configure_fpga is defined and configure_fpga + - name: check OpenSSL and OpenSSL*Engine requirements when OOT or Intree QAT setup configured assert: that: - openssl_install - fail_msg: "OpenSSL & OpenSSL*Engine must be configured if 'update_qat_drivers' or 'enable_intel_qatlibs' are set to 'true'" + fail_msg: "OpenSSL & OpenSSL*Engine must be configured if configure_qat is set to 'true'" success_msg: "Assertion of OpenSSL & OpenSSL*Engine for QAT passed" when: - - update_qat_drivers | default(false) or enable_intel_qatlibs | default(false) + - configure_qat | default(false) | bool + + - name: check gas(gpu aware scheduling) configuration + assert: + that: + - gpu_dp_enabled + fail_msg: "gas installation requires gpu_dp_enabled set to true" + success_msg: "gas requirement verified" + when: + - gas_enabled | default(false) - name: check KMRA sgx_dp requirements assert: that: - sgx_dp_enabled fail_msg: "KMRA installation requires sgx_dp_enabled set to 'true'" - success_msg: "KMRA requirements verified" + success_msg: "KMRA sgx_dp requirements verified" when: - kmra.ctk_loadkey_demo.enabled | default(false) or kmra.pccs.enabled | default(false) or kmra.apphsm.enabled | default(false) + - name: check KMRA sbx requirements + include_role: + name: kmra_install + tasks_from: kmra_sbx_preflight + when: + - kmra.sbx | default(false) + + - name: make sure netopeer2 server/client is off when oran disabled + assert: + that: + - not (kmra.oran_netopeer2_server.enabled | default(false)) + - not (kmra.oran_netopeer2_client.enabled | default(false)) + fail_msg: "oran disabled, so netopeer2 server/client cannot set" + when: + - not kmra.oran.enabled | default(false) + + - name: check KMRA oran requirements + include_role: + name: kmra_install + tasks_from: kmra_oran_preflight + when: + - kmra.oran.enabled | default(false) + - name: check Intel SGX DP configuration assert: that: @@ -1097,28 +1249,6 @@ - sst_pp_configuration_enabled is defined and 
sst_pp_configuration_enabled - "'skylake' in verify_platform_for_sst_pp.stdout" - -# Cstates are supported only with kernel version >= 5.13 - - name: Check if distribution is supported by cstates - assert: - that: "ansible_distribution in cstates_supported_distros and ansible_distribution_version in cstates_supported_distros_versions" - msg: - - Linux distribution {{ ansible_distribution }} {{ ansible_distribution_version }} on target '{{ inventory_hostname }}' does NOT support cstates - - Must be one of {{ cstates_supported_distros }} and version {{ cstates_supported_distros_versions }} - when: cstate_enabled is defined and cstate_enabled - - -# Uncore frequency scaling is supported only with kernel version >= 5.13 - - name: Check if distribution is supported by uncore frequency scaling - assert: - that: "ansible_distribution in cstates_supported_distros and ansible_distribution_version in cstates_supported_distros_versions" - msg: - - Linux distribution {{ ansible_distribution }} {{ ansible_distribution_version }} on target '{{ inventory_hostname }}' - - does NOT support uncore frequency scaling - - Must be one of {{ cstates_supported_distros }} and version {{ cstates_supported_distros_versions }} - when: ufs_enabled is defined and ufs_enabled - - # STORY: Intel VT-d should be enabled in BIOS - name: check Intel VT-d on BMs block: @@ -1195,12 +1325,16 @@ - on_cloud is not defined or not on_cloud # STORY: CPU Hyper-Threading should be enabled in BIOS + - debug: msg="ansible_processor_threads_per_core={{ ansible_processor_threads_per_core }}" + - debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa yaml[line-length] + - name: warn about Hyper-Threading fail: - msg: "Warning: Intel Hyper-Threading Tech is DISABLED on target. 
Please check BIOS under 'Advanced > Processor Configuration' and Enable if necessary" + msg: "Warning: Intel Hyper-Threading Tech is DISABLED on target. Please check BIOS under 'Advanced > Processor Configuration' and Enable if necessary" when: - ansible_processor_threads_per_core != 2 - on_cloud is not defined or not on_cloud + - configured_arch not in ['atom', 'core'] # STORY: "collectd and telegraf are mutually exclusive" - name: fail if collectd and telegraf are both enabled @@ -1227,23 +1361,26 @@ - not container_runtime_only_deployment - istio_service_mesh is defined - istio_service_mesh.version is defined -# STORY: "TCS depends on KMRA AppHSM and KMRA PCCS" - - name: check if KMRA Apps are enabled when TCS is enabled + +# STORY: "TCS depends on sgx dp and cert manager" + - name: check if sgx dp and cert manager are enabled when TCS enabled assert: that: - - "{{ kmra.apphsm.enabled | default(false) }}" - - "{{ kmra.pccs.enabled | default(false) }}" - msg: "KMRA AppHSM and PCCS applications should be enabled in order to have TCS functional." + - "{{ sgx_dp_enabled | default(false) }}" + - "{{ cert_manager_enabled | default(false) }}" + msg: "sgx_dp and cert manager should be enabled in order to have TCS functional." when: - - tcs.enabled | default(false) or tac.enabled | default(false) + - tcs.enabled | default(false) - configured_arch in ['icx'] # STORY: "TAC depends on TCS" - name: check if TCS is enabled when TAC enabled assert: that: - - "{{ tcs.enabled | default(false ) }}" - msg: "TCS should be enabled in order to have TAC functional." + - "{{ tcs.enabled | default(false) }}" + - "{{ kmra.apphsm.enabled | default(false) }}" + - "{{ kmra.pccs.enabled | default(false) }}" + msg: "TCS, KMRA AppHSM and PCCS should be enabled in order to have TAC functional." 
when: - tac.enabled | default(false) - configured_arch in ['icx'] @@ -1258,6 +1395,18 @@ - istio_service_mesh.enabled | default(false) - configured_arch not in ['icx'] +# STORY: "istio_service_mesh.sgx_signer' option must be true when profile is ca_custom" + - name: particular service mesh options must be set together + assert: + that: + - "{{ istio_service_mesh.sgx_signer.enabled | default(false) }}" + msg: "'istio_service_mesh.sgx_signer' must be enabled for custom-ca profile." + when: + - istio_service_mesh is defined + - istio_service_mesh.enabled | default(false) + - istio_service_mesh.profile == 'custom-ca' | default('default') + - configured_arch in ['icx'] + # STORY: TCS is available only for icx platforms" - name: TCS is available only for specific platforms assert: @@ -1314,7 +1463,8 @@ - name: check OVS DPDK compatibility assert: that: - (ovs_version >= 'v2.17.0' and ovs_version <= 'v3.0.3') and (dpdk_version >= '21.11' and dpdk_version <= '22.07') + ovs_version == 'v3.1.1' and (dpdk_version == '23.03' or dpdk_version == '22.11.1') + or (ovs_version >= 'v2.17.0' and ovs_version <= 'v3.0.3') and (dpdk_version >= '21.11' and dpdk_version <= '22.07') or (ovs_version < 'v2.16.2' and ovs_version >= 'v2.16.0') and dpdk_version == '21.08' or ovs_version == 'v2.15.0' and dpdk_version == '20.11' or ovs_version == 'v2.14.2' and dpdk_version == '19.11.6' @@ -1333,9 +1483,8 @@ - name: check settings for Intel Power Manager assert: that: - - intel_power_manager.power_profiles | length > 0 - intel_power_manager.power_nodes | length > 0 - fail_msg: "Intel Power Manager is enabled, but either Power Profiles or Power Nodes are not specified in group vars." + fail_msg: "Intel Power Manager is enabled, but Power Nodes are not specified in group vars." 
when: intel_power_manager is defined and intel_power_manager.enabled - name: check if power_nodes are available in inventory @@ -1346,6 +1495,30 @@ loop: "{{ intel_power_manager.power_nodes }}" when: intel_power_manager is defined and intel_power_manager.enabled + # kubelet cpuManagerPolicy should be 'static' for IPM 2.2.0 and higher + - name: check if native_cpu_manager is enabled + assert: + that: + - native_cpu_manager_enabled + fail_msg: "Please set 'native_cpu_manager_enabled' to 'true' in group vars" + when: intel_power_manager is defined and intel_power_manager.enabled + + # kubelet reservedSystemCPUs should be set to desired value + - name: check if reserved cpus are defined in host vars + assert: + that: + - (native_cpu_manager_system_reserved_cpus is defined + and native_cpu_manager_kube_reserved_cpus is defined + and native_cpu_manager_reserved_cpus is not defined) or + (native_cpu_manager_system_reserved_cpus is not defined + and native_cpu_manager_kube_reserved_cpus is not defined + and native_cpu_manager_reserved_cpus is defined) + + fail_msg: "Reserved cpus are not defined. + Please configure ('native_cpu_manager_system_reserved_cpus' and 'native_cpu_manager_kube_reserved_cpus') + or 'native_cpu_manager_reserved_cpus' in host vars." 
+ when: intel_power_manager is defined and intel_power_manager.enabled + - name: check if Intel Power Manager is enabled, the ISST features should be disabled assert: that: @@ -1388,13 +1561,21 @@ include_role: name: intel_sriov_fec_operator tasks_from: preflight_sriov_fec_operator - when: intel_sriov_fec_operator_enabled | default(false) | bool + when: + - intel_sriov_fec_operator_enabled | default(false) | bool + - intel_flexran_type != "pod" - name: check Intel FlexRAN requirements include_role: name: intel_flexran tasks_from: flexran_preflight - when: intel_flexran_enabled | default(false) + when: intel_flexran_enabled | default(false) | bool + + - name: check Intel ECI requirements + include_role: + name: intel_eci + tasks_from: eci_preflight + when: intel_eci_enabled | default(false) | bool - name: check OS when DLB or DSA is enabled assert: @@ -1411,6 +1592,7 @@ If you wish to use DLB or DSA feature set 'update_kernel' as true. {% endif %} when: configure_dsa_devices | d(false) or configure_dlb_devices | d(false) + # SGX on VMs require Ubuntu 22.04 for VM Host - name: Check requirements to enable Intel SGX on VMs block: @@ -1420,6 +1602,17 @@ - ansible_distribution == "Ubuntu" - ansible_distribution_version == "22.04" msg: "Deploying SGX on VMRA is supported only on Ubuntu 22.04 VM host. Please change the o/s for VM host" + + - name: Check if configured SGX memory is not bigger than total memory + assert: + that: + - item.memory > sgx_memory_size + msg: | + Improper memory configuration for vms. + SGX memory size ({{ sgx_memory_size }}MB) can't be bigger than total memory ({{ item.memory }}MB) of VM: {{ item.name }}. 
+ with_items: "{{ vms }}" + when: + - item.type == "work" when: - vm_enabled | default(false) - sgx_dp_enabled | default(false) @@ -1446,6 +1639,18 @@ - hosts: k8s_cluster any_errors_fatal: true tasks: + - name: Include Sigstore policy controller checks + import_role: + name: sigstore_policy_controller + tasks_from: preflight.yml + when: sigstore_policy_controller_install | default(false) | bool + + - name: Include oneAPI kits checks + import_role: + name: intel_oneapi_install + tasks_from: preflight.yml + tags: intel-oneapi + when: intel_oneapi_enabled | default(false) | bool # STORY: "Observability: assert that all required compontents are enabled" - name: assert that all observability/monitoring variables are disabled @@ -1487,6 +1692,24 @@ when: - collectd_enabled | default(false) + - name: Check Telegraf configuration + ansible.builtin.include_role: + name: telegraf_install + tasks_from: preflight + when: telegraf_enabled | default(false) + + - name: Check Collectd configuration + ansible.builtin.include_role: + name: collectd_install + tasks_from: preflight + when: collectd_enabled | default(false) + + - name: Check cAdvisor configuration + ansible.builtin.include_role: + name: cadvisor_install + tasks_from: preflight + when: cadvisor_enabled | default(false) + - name: check ADQ configuration include_role: name: adq_dp_install diff --git a/playbooks/remove_node.yml b/playbooks/remove_node.yml index 2160c896..ad8010f5 100644 --- a/playbooks/remove_node.yml +++ b/playbooks/remove_node.yml @@ -15,8 +15,8 @@ ## --- - name: prepare for removing worker node(s) - import_playbook: k8s/kubespray/remove-node.yml + ansible.builtin.import_playbook: kubernetes_sigs.kubespray.remove_node - name: prepare for Intel cleanup - import_playbook: infra/redeploy_cleanup.yml + ansible.builtin.import_playbook: infra/redeploy_cleanup.yml when: kubernetes | default(true) diff --git a/playbooks/roles b/playbooks/roles deleted file mode 120000 index d8c4472c..00000000 --- 
a/playbooks/roles +++ /dev/null @@ -1 +0,0 @@ -../roles \ No newline at end of file diff --git a/playbooks/versions.yml b/playbooks/versions.yml index 88250664..90cb182a 100644 --- a/playbooks/versions.yml +++ b/playbooks/versions.yml @@ -39,7 +39,8 @@ - name: Show variable values block: - - shell: "echo -n '{{ item.description }}', && scripts/yaml_version_reader {{ item.var_file_path }} {{ item.shortname }}" + - name: Extract versions + shell: "echo -n '{{ item.description }}', && scripts/yaml_version_reader {{ item.var_file_path }} {{ item.shortname }}" changed_when: false args: chdir: ".." @@ -49,17 +50,25 @@ 'shortname' : 'telegraf_image_tag', 'var_file_path' : 'roles/telegraf_install/defaults/main.yml' } + - { 'description' : 'PMU Tools', + 'shortname' : 'telegraf_pmu_tools_version', + 'var_file_path' : 'roles/telegraf_install/defaults/main.yml' + } - { 'description' : 'Prometheus', 'shortname' : 'prometheus_stack_version', 'var_file_path' : 'roles/kube_prometheus/defaults/main.yml' } + - { 'description' : 'Kube State Metrics', + 'shortname' : 'kube_state_metrics_version', + 'var_file_path' : 'roles/kube_prometheus/defaults/main.yml' + } - { 'description' : 'Grafana', 'shortname' : 'grafana_version', 'var_file_path' : 'roles/kube_prometheus/defaults/main.yml' } - { 'description' : 'CollectD', 'shortname' : "image_collectd\\'\\]\\[\\'digest", - 'var_file_path' : 'playbooks/k8s/roles/collectd_install/defaults/main.yml' + 'var_file_path' : 'roles/collectd_install/defaults/main.yml' } - { 'description' : 'Docker', 'shortname' : 'docker_version', @@ -67,19 +76,29 @@ } - { 'description' : 'Docker CLI', 'shortname' : 'docker_cli_version', - 'var_file_path' : 'roles/container_engine/docker/defaults/main.yml' + 'var_file_path' : 'roles/container_engine/docker/defaults/main.yml', + 'optional' : 'true', + 'reason' : 'version is the same as for Docker' } - { 'description' : 'Kubernetes', 'shortname' : 'kube_version', - 'var_file_path' : 
'examples/vm/full_nfv/group_vars/all.yml' + 'var_file_path' : 'examples/k8s/full_nfv/group_vars/all.yml' + } + - { 'description' : 'RKE2', + 'shortname' : 'rke2_version', + 'var_file_path' : 'examples/k8s/full_nfv/group_vars/all.yml' + } + - { 'description' : 'Rancher', + 'shortname' : 'rancher_version', + 'var_file_path' : 'roles/rke2_kubernetes_apps/rancher/defaults/main.yml' } - { 'description' : 'k8s node-exporter', 'var_file_path' : 'roles/kube_prometheus/defaults/main.yml', 'shortname' : 'node_exporter_version' } - { 'description' : 'k8s prometheus-operator', - 'var_file_path' : 'roles/kube_prometheus/files/kube-prometheus-stack/prometheusOperator-clusterRole.yaml', - 'shortname' : "metadata\\'\\]\\[\\'labels\\'\\]\\[\\'app.kubernetes.io/version" + 'var_file_path' : 'roles/kube_prometheus/defaults/main.yml', + 'shortname' : "prometheus_operator_version" } - { 'description' : 'k8s prometheus-adapter', 'var_file_path' : 'roles/kube_prometheus/files/kube-prometheus-stack/prometheusAdapter-clusterRole.yaml', @@ -99,27 +118,31 @@ } - { 'description' : 'CNI plugins', 'shortname' : 'cni_version', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'calico', 'shortname' : 'calico_version', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'flannel', 'shortname' : 'flannel_version', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'coredns', 'shortname' : 'coredns_version', - 'var_file_path' : 
'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'krew', 'shortname' : 'krew_version', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'helm', 'shortname' : 'helm_version', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' + } + - { 'description' : 'helm on rke2', + 'shortname' : 'helm_version', + 'var_file_path' : 'roles/rke2_kubernetes_apps/helm/defaults/main.yml' } - { 'description' : 'SR-IOV CNI', 'var_file_path' : 'roles/sriov_shared_versions/defaults/main.yml', @@ -217,17 +240,9 @@ 'var_file_path' : 'roles/openssl_engine_install/defaults/main.yml', 'shortname' : 'intel_ipsec_version' } - - { 'description' : 'Intel® SGX DCAP Drivers (ubuntu)', - 'var_file_path' : 'roles/bootstrap/configure_sgx/defaults/main.yml', - 'shortname' : 'dcap_driver_series_ubuntu_20' - } - - { 'description' : 'Intel® SGX DCAP Drivers (rhel)', - 'var_file_path' : 'roles/bootstrap/configure_sgx/defaults/main.yml', - 'shortname' : 'dcap_driver_series_rhel' - } - { 'description' : 'Intel® SGX SDK (ubuntu)', 'var_file_path' : 'roles/bootstrap/configure_sgx/defaults/main.yml', - 'shortname' : 'sgx_sdk_version_ubuntu_20' + 'shortname' : 'sgx_sdk_version_ubuntu' } - { 'description' : 'Intel® SGX SDK (rhel)', 'var_file_path' : 'roles/bootstrap/configure_sgx/defaults/main.yml', @@ -265,19 +280,27 @@ 'shortname' : 'tadk_version', 'var_file_path' : 'roles/tadk_install/defaults/main.yml' } - - { 'description' : 'IstIO operator', + - { 'description' : 'IstIO Service Mesh - istio', + 'var_file_path' : 
'roles/istio_service_mesh/vars/main.yml', + 'shortname' : "istio_service_mesh_defaults\\'\\]\\[\\'version" + } + - { 'description' : 'IstIO Service Mesh - istio intel_preview', + 'var_file_path' : 'roles/istio_service_mesh/vars/main.yml', + 'shortname' : "istio_service_mesh_defaults\\'\\]\\[\\'intel_preview\\'\\]\\[\\'version" + } + - { 'description' : 'IstIO operator - default', 'var_file_path' : 'roles/istio_service_mesh/charts/istioctl/values.yaml', 'shortname' : "image\\'\\]\\[\\'tag" } - - { 'description' : 'IstIO pilot-cryptomb (internal)', + - { 'description' : 'IstIO pilot-cryptomb (internal) - default', 'var_file_path' : 'roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', 'shortname' : "spec\\'\\]\\[\\'tag" } - - { 'description' : 'IstIO proxyv2-cryptomb (internal)', + - { 'description' : 'IstIO proxyv2-cryptomb (internal) - default', 'var_file_path' : 'roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', 'shortname' : "spec\\'\\]\\[\\'tag" } - - { 'description' : 'IstIO proxyv2-openssl (internal)', + - { 'description' : 'IstIO proxyv2-openssl (internal) - default', 'var_file_path' : 'roles/istio_service_mesh/files/profiles/intel-qat-sw.yaml', 'shortname' : "spec\\'\\]\\[\\'tag" } @@ -293,14 +316,6 @@ 'var_file_path' : 'roles/tcs_install/defaults/main.yml', 'shortname' : 'tcs_git_version' } - - { 'description' : 'Intel® CNDP DP', - 'var_file_path' : 'roles/cndp_dp_install/defaults/main.yml', - 'shortname' : 'intel_cndp_dp_version' - } - - { 'description' : 'Intel® CNDP CNI', - 'var_file_path' : 'roles/cndp_install/defaults/main.yml', - 'shortname' : 'intel_cndp_version' - } - { 'description' : 'MinIO Operator', 'var_file_path' : 'roles/minio_install/defaults/main.yaml', 'shortname' : "minio_operator_version" @@ -317,10 +332,6 @@ 'var_file_path' : 'roles/intel_power_manager/defaults/main.yml', 'shortname' : 'intel_power_manager_git_ref' } - - { 'description' : 'Intel® RDT telemetry plugin', - 'var_file_path' : 
'roles/intel_power_manager/defaults/main.yml', - 'shortname' : 'intel_appqos_git_ref' - } - { 'description' : 'Intel SR-IOV FEC Operator', 'var_file_path' : 'roles/intel_sriov_fec_operator/defaults/main.yml', 'shortname' : 'intel_sriov_fec_operator_git_ref' @@ -357,17 +368,21 @@ 'var_file_path' : 'roles/bootstrap/configure_openssl/defaults/main.yml', 'shortname' : 'openssl_version' } - - { 'description' : 'Kubernetes', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/kubespray-defaults/defaults/main.yaml', + - { 'description' : 'Kubernetes - kubespray defaults', + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubespray-defaults/defaults/main.yaml', 'shortname' : 'kube_version' } - { 'description' : 'LinkerD', 'var_file_path' : 'roles/linkerd_service_mesh/defaults/main.yml', 'shortname' : 'linkerd_version' } - - { 'description' : 'cadvisor helm charts', + - { 'description' : 'cAdvisor helm chart', 'var_file_path' : 'roles/cadvisor_install/defaults/main.yaml', - 'shortname' : 'cadvisor_helm_charts_version' + 'shortname' : 'cadvisor_helm_chart_version' + } + - { 'description' : 'cAdvisor', + 'var_file_path' : 'roles/cadvisor_install/defaults/main.yaml', + 'shortname' : 'cadvisor_image_version' } - { 'description' : 'Intel® adq dp', 'var_file_path' : 'roles/adq_dp_install/defaults/main.yml', @@ -378,13 +393,21 @@ 'shortname' : 'adq_ice_fw_required_version' } - { 'description' : 'cilium', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml', + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml', 'shortname' : 'cilium_version' } - { 'description' : 'cert manager', - 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml', + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml', + 'shortname' : 'cert_manager_version' + } + - { 'description' : 
'cert manager on rke2', + 'var_file_path' : 'roles/rke2_kubernetes_apps/cert_manager_install/defaults/main.yml', 'shortname' : 'cert_manager_version' } + - { 'description' : 'kube dashboard on rke2', + 'var_file_path' : 'roles/rke2_kubernetes_apps/dashboard/defaults/main.yml', + 'shortname' : 'dashboard_image_tag' + } - { 'description' : 'Telemetry aware scheduling', 'var_file_path' : 'roles/platform_aware_scheduling_install/defaults/main.yml', 'shortname' : 'tas_extender_image_tag_default' @@ -393,16 +416,20 @@ 'var_file_path' : 'roles/platform_aware_scheduling_install/defaults/main.yml', 'shortname' : 'gas_extender_image_tag_default' } + - { 'description' : 'Intel CPU Control Plane', + 'var_file_path' : 'roles/intel_cpu_controlplane/defaults/main.yml', + 'shortname' : 'cpu_ctlplane_version' + } - { 'description' : 'crio', 'var_file_path' : 'roles/container_engine/crio/defaults/main.yml', 'shortname' : 'crio_version' } - { 'description' : 'registry', - 'shortname' : 'registry_image', + 'shortname' : 'registry_version', 'var_file_path' : 'roles/container_registry/defaults/main.yml' } - { 'description' : 'nginx', - 'shortname' : 'nginx_image', + 'shortname' : 'registry_nginx_version', 'var_file_path' : 'roles/container_registry/defaults/main.yml' } - { 'description' : 'Open Telemetry Operator', @@ -418,7 +445,7 @@ 'var_file_path' : 'roles/bootstrap/golang_install/defaults/main.yml' } - { 'description' : 'cluster_name', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/kubespray-defaults/defaults/main.yaml', + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubespray-defaults/defaults/main.yaml', 'shortname' : 'cluster_name' } - { 'description' : 'containerd', @@ -426,7 +453,7 @@ 'shortname' : 'containerd_version' } - { 'description' : 'multus', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml', + 'var_file_path' : 
'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml', 'shortname' : 'multus_version' } - { 'description' : 'nfd', @@ -435,11 +462,11 @@ } - { 'description' : 'weave', 'shortname' : 'weave_version', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'kube-vip', 'shortname' : 'kube_vip_image_tag', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'nginx-ingress', 'shortname' : 'kubernetes_ingress_helm_chart_version', @@ -447,11 +474,11 @@ } - { 'description' : 'argocd', 'shortname' : 'argocd_version', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/kubernetes-apps/argocd/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubernetes-apps/argocd/defaults/main.yml' } - { 'description' : 'metallb', 'shortname' : 'metallb_version', - 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml' + 'var_file_path' : 'collections/ansible_collections/kubernetes_sigs/kubespray/roles/download/defaults/main.yml' } - { 'description' : 'kibana', 'shortname' : 'kibana_chart_version', @@ -461,6 +488,22 @@ 'var_file_path' : 'roles/rook_install/defaults/main.yml', 'shortname' : "rook_git_tag" } + - { 'description' : 'FFmpeg', + 'var_file_path' : 'roles/ffmpeg_install/defaults/main.yml', + 'shortname' : "ffmpeg_commit_hash" + } + - { 'description' : 'Intel oneAPI Base kit', + 'var_file_path' : 'roles/intel_oneapi_install/defaults/main.yml', + 'shortname' : "oneapi_basekit_version" + } + - { 'description' : 'Intel oneAPI AI kit', + 'var_file_path' : 'roles/intel_oneapi_install/defaults/main.yml', + 'shortname' : "oneapi_ai_version" + } + - { 'description' : 
'jaeger', + 'var_file_path' : 'roles/jaeger_install/defaults/main.yml', + 'shortname' : "jaeger_version" + } - name: Remove old version parsing results file: path: "{{ item }}" @@ -472,7 +515,9 @@ - name: Write versions into output file lineinfile: path: "{{ versions_output_file }}" - line: "{{ item.stdout }}{% if item.stderr and item.item.optional | default(false) %} {{ item.item.reason }}{% endif %}" + line: >- + "{{ item.stdout }}{% if (item.stderr and item.item.optional | default(false)) or (item.stdout and item.item.optional | default(false) and + item.stdout is regex('^.*,{{ .* }}')) %} {{ item.item.reason }}{% endif %}" mode: 0644 create: yes loop: "{{ item_value.results }}" @@ -494,11 +539,3 @@ lineinfile: path: "{{ versions_output_file }}" line: "ddp_profile,{{ ddp_profile.stdout }}" - - name: Add jaeger_version variable - shell: grep -oP 'v[0-9]+\.[0-9]+\.[0-9]+' roles/jaeger_install/defaults/main.yml - changed_when: false - register: jaeger_version - - name: Add jaeger version variable - lineinfile: - path: "{{ versions_output_file }}" - line: "jaeger,{{ jaeger_version.stdout }}" diff --git a/playbooks/vm.yml b/playbooks/vm.yml index 01ae40b8..7cdd9c22 100644 --- a/playbooks/vm.yml +++ b/playbooks/vm.yml @@ -26,7 +26,7 @@ import_playbook: infra/prepare_vms.yml - name: deploy CEK on VMs vars: - on_vms: True + on_vms: true group_vars_content: "{{ lookup('file', '../group_vars/all.yml') | from_yaml }}" import_playbook: "{{ group_vars_content['profile_name'] }}.yml" become: false diff --git a/requirements.txt b/requirements.txt index 0bc39062..b0ce6b1a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,6 @@ ansible==5.7.1 ansible-core==2.12.5 -cryptography==39.0.1 +cryptography==41.0.2 jinja2==3.1.2 netaddr==0.7.19 pbr==5.4.4 diff --git a/roles/bootstrap/apply_intel_pstate/tasks/main.yml b/roles/bootstrap/apply_intel_pstate/tasks/main.yml index 9d4d2a64..88d003c6 100644 --- a/roles/bootstrap/apply_intel_pstate/tasks/main.yml +++ 
b/roles/bootstrap/apply_intel_pstate/tasks/main.yml @@ -14,14 +14,6 @@ ## limitations under the License. ## --- -- name: determine machine type - include_role: - name: check_machine_type - when: - - inventory_hostname in groups['kube_node'] or - inventory_hostname in groups['vm_host'] - - not on_vms | default (false) - - name: setup turbo boost include_tasks: setup_turbo.yml when: diff --git a/roles/bootstrap/configure_cstates/tasks/main.yml b/roles/bootstrap/configure_cstates/tasks/main.yml deleted file mode 100644 index 6b4b3d1c..00000000 --- a/roles/bootstrap/configure_cstates/tasks/main.yml +++ /dev/null @@ -1,42 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## -- name: create CommsPowerManagement folder - file: - state: directory - dest: "{{ (project_root_dir, 'CommsPowerManagement') | path_join }}" - mode: 0755 - -- name: download power.py script into machine - get_url: - url: https://raw.githubusercontent.com/intel/CommsPowerManagement/72e58b1939e9aa13d3ad0137e9d674968c45dfcb/power.py - dest: "{{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }}" - mode: "0744" - -- name: create cstates service - template: - src: cstates.service.j2 - dest: /etc/systemd/system/cstates.service - mode: "0644" - register: cstates_service_file - -- name: enable cstates service - # noqa no-handler - intentionally implemented as not a handler - systemd: - name: cstates - enabled: yes - state: restarted - daemon_reload: yes - when: cstates_service_file.changed diff --git a/roles/bootstrap/configure_cstates/templates/cstates.service.j2 b/roles/bootstrap/configure_cstates/templates/cstates.service.j2 deleted file mode 100644 index eee69dfe..00000000 --- a/roles/bootstrap/configure_cstates/templates/cstates.service.j2 +++ /dev/null @@ -1,16 +0,0 @@ -[Unit] -Description=cstates configuration on boot - -[Service] -Type=oneshot - -{% for cstate in cstates %} -{% if cstates[cstate].enable == true %} -ExecStart={{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }} -r {{ cstates[cstate].cpu_range }} -e {{ cstate }} -{% else %} -ExecStart={{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }} -r {{ cstates[cstate].cpu_range }} -d {{ cstate }} -{% endif %} -{% endfor %} - -[Install] -WantedBy=multi-user.target diff --git a/roles/bootstrap/configure_dlb/defaults/main.yml b/roles/bootstrap/configure_dlb/defaults/main.yml index ed5cb6e9..013e3ccb 100644 --- a/roles/bootstrap/configure_dlb/defaults/main.yml +++ b/roles/bootstrap/configure_dlb/defaults/main.yml @@ -14,6 +14,6 @@ ## limitations under the License. 
## --- -intel_dlb_driver_ver: "dlb_linux_src_release_8.1.0" -intel_dlb_driver_url: "https://downloadmirror.intel.com/768381/{{ intel_dlb_driver_ver }}.txz" -intel_dlb_driver_checksum: "sha1:752205B5A1414D42083763C673830FB2319767D0" +intel_dlb_driver_ver: "dlb_linux_src_release_8.4.0" +intel_dlb_driver_url: "https://downloadmirror.intel.com/779942/{{ intel_dlb_driver_ver }}.txz" +intel_dlb_driver_checksum: "sha1:AED28FE711213913D2B34160A7047199BEA46D97" diff --git a/roles/cadvisor_install/tasks/preflight_cadvisor.yml b/roles/bootstrap/configure_docker_daemon/defaults/main.yml similarity index 78% rename from roles/cadvisor_install/tasks/preflight_cadvisor.yml rename to roles/bootstrap/configure_docker_daemon/defaults/main.yml index 846cd329..2c8d3501 100644 --- a/roles/cadvisor_install/tasks/preflight_cadvisor.yml +++ b/roles/bootstrap/configure_docker_daemon/defaults/main.yml @@ -14,9 +14,4 @@ ## limitations under the License. ## --- -# - block: - # - name: preflight cAdvisor installation - # include_role: - # name: cadvisor_install - # tasks_from: preflight_cadvisor - # any_errors_fatal: true +docker_config_directory: "/etc/docker" diff --git a/roles/bootstrap/configure_docker_daemon/tasks/main.yml b/roles/bootstrap/configure_docker_daemon/tasks/main.yml new file mode 100644 index 00000000..1cefaae9 --- /dev/null +++ b/roles/bootstrap/configure_docker_daemon/tasks/main.yml @@ -0,0 +1,44 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: Update docker config and reload daemon + when: + - container_runtime == "docker" + block: + - name: Create a directory if it does not exist + ansible.builtin.file: + path: "{{ docker_config_directory }}" + state: directory + mode: 0755 + - name: Creating daemon.json file for docker configuration + ansible.builtin.copy: + dest: "{{ (docker_config_directory, 'daemon.json') | path_join }}" + content: | + { + "live-restore": true + } + mode: 0644 + - name: Get docker service status + ansible.builtin.systemd: + name: docker + register: docker_service_status + become: yes + - name: Reload docker service + ansible.builtin.systemd: + name: docker + state: reloaded + become: yes + when: "'inactive' not in docker_service_status.status.ActiveState" diff --git a/roles/bootstrap/configure_dsa/tasks/main.yml b/roles/bootstrap/configure_dsa/tasks/main.yml index 15166c8b..976f9a1b 100644 --- a/roles/bootstrap/configure_dsa/tasks/main.yml +++ b/roles/bootstrap/configure_dsa/tasks/main.yml @@ -33,7 +33,7 @@ - name: apply default configuration for DSA devices include_tasks: dsa_default_config.yml vars: - dsa_id: "{{ item.path[-1] }}" + dsa_id: "{{ item.path | basename | replace('dsa', '') }}" with_items: "{{ found_dsa_devices.files }}" when: - configure_dsa_devices | default(false) | bool @@ -54,10 +54,26 @@ - configure_dsa_devices | default(false) | bool - dsa_devices | default([]) | length > 0 +# config will be saved to /etc/accel-config/accel-config.conf as default. - name: save accel-config configuration command: accel-config save-config changed_when: true +# WA for configuring DSA devices +# in some CPU SKUs with specific BIOS version, wq_cap.wq_ats_support is disabled, so wq_ats_disable cannot be written. 
+- name: modify accel-config.conf + block: + - name: remove ats_disable parameter + ansible.builtin.lineinfile: + path: /etc/accel-config/accel-config.conf + regexp: "ats_disable" + state: absent + - name: remove extra comma + ansible.builtin.replace: + path: /etc/accel-config/accel-config.conf + regexp: "\"threshold\":0," + replace: "\"threshold\":0" + - name: create systemd unit file copy: src: "{{ (role_path , 'files', 'dsa_config.service') | path_join }}" diff --git a/roles/bootstrap/configure_fpga/defaults/main.yml b/roles/bootstrap/configure_fpga/defaults/main.yml new file mode 100644 index 00000000..804e5696 --- /dev/null +++ b/roles/bootstrap/configure_fpga/defaults/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# FPGA folder on the node +fpga_drivers_dir: "{{ (project_root_dir, 'fpga_drivers') | path_join }}" diff --git a/roles/bootstrap/configure_fpga/tasks/debian.yml b/roles/bootstrap/configure_fpga/tasks/debian.yml new file mode 100644 index 00000000..82714802 --- /dev/null +++ b/roles/bootstrap/configure_fpga/tasks/debian.yml @@ -0,0 +1,94 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: create directory {{ fpga_drivers_dir }} for fpga + file: + path: "{{ fpga_drivers_dir }}" + state: directory + mode: 0755 + +- name: copy fpga installation script from ansible host to node + ansible.builtin.copy: + src: "{{ (fpga_driver_staging_folder, fpga_install_script) | path_join }}" + dest: "{{ (fpga_drivers_dir, fpga_install_script) | path_join }}" + mode: 0755 + +- name: run fpga installation script + ansible.builtin.command: >- + sh -c {{ (fpga_drivers_dir, fpga_install_script) | path_join }} + register: installation_register + changed_when: true + +- debug: msg={{ installation_register.stdout }} + +- name: check whether the OPAE SDK and DFL packages installed + block: + - name: check whether opae sdk is installed + ansible.builtin.shell: set -o pipefail && dpkg --list |grep -i opae + args: + executable: /bin/bash + register: opae_register + failed_when: opae_register.rc != 0 + changed_when: false + + - name: check whether dfl package is installed + ansible.builtin.shell: set -o pipefail && dpkg --list |grep -i dfl + args: + executable: /bin/bash + register: dfl_register + failed_when: dfl_register.rc != 0 + changed_when: false + +- name: block check fpga environment + block: + - name: check fme info via fpgainfo tool + ansible.builtin.command: "fpgainfo fme" + register: fme_register + changed_when: false + failed_when: + - "'error' in fme_register.stdout" + + - name: check bmc info via fpgainfo tool + ansible.builtin.command: "fpgainfo bmc" + register: bmc_register + changed_when: false + failed_when: + - "'error' in 
bmc_register.stdout" + + - name: check ethernet PHY info via fpgainfo tool + ansible.builtin.command: "fpgainfo phy" + register: phy_register + changed_when: false + failed_when: + - "'error' in phy_register.stdout" + +- name: check the fpgad service file exists + ansible.builtin.stat: + path: /etc/opae/fpgad.cfg + register: stat_register + failed_when: + - not stat_register.stat.exists + +- debug: msg="fpgad.cfg file exists {{ stat_register.stat.exists }}" + +- name: start the fgpad service + ansible.builtin.systemd: + state: started + name: fpgad + enabled: true + register: fpgad_register + failed_when: + - fpgad_register.status.ActiveState != 'active' diff --git a/roles/vm/vm_sgx_enable/tasks/main.yml b/roles/bootstrap/configure_fpga/tasks/main.yml similarity index 75% rename from roles/vm/vm_sgx_enable/tasks/main.yml rename to roles/bootstrap/configure_fpga/tasks/main.yml index 8b541c21..9dbc06c7 100644 --- a/roles/vm/vm_sgx_enable/tasks/main.yml +++ b/roles/bootstrap/configure_fpga/tasks/main.yml @@ -14,11 +14,11 @@ ## limitations under the License. ## --- -- name: Adding SGX memory definition to VM domain - include_tasks: vm-domain-edit.yml - loop: '{{ vms }}' - loop_control: - loop_var: vm +- name: install dependencies for Intel FPGA device + include_role: + name: install_dependencies + +- name: debian fpga install + ansible.builtin.include_tasks: debian.yml when: - - vm.type == 'work' - - sgx_dp_enabled | default(false) + ansible_distribution == "Ubuntu" diff --git a/roles/bootstrap/configure_fpga/vars/main.yml b/roles/bootstrap/configure_fpga/vars/main.yml new file mode 100644 index 00000000..f012eba6 --- /dev/null +++ b/roles/bootstrap/configure_fpga/vars/main.yml @@ -0,0 +1,19 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - libssl-dev diff --git a/roles/bootstrap/configure_openssl/defaults/main.yml b/roles/bootstrap/configure_openssl/defaults/main.yml index f84f9bee..28fe7f07 100644 --- a/roles/bootstrap/configure_openssl/defaults/main.yml +++ b/roles/bootstrap/configure_openssl/defaults/main.yml @@ -15,14 +15,6 @@ ## --- openssl_url: "https://github.com/openssl/openssl.git" -openssl_version: "openssl-3.0.8" +openssl_version: "openssl-3.1.1" openssl_dir: "{{ (project_root_dir, 'openssl') | path_join }}" openssl_pkg_subdir: "{{ openssl_dir }}/{{ openssl_version }}" - -# QATLibs -intel_qatlib_download_url: "https://github.com/intel/qatlib.git" -intel_qatlib_download_url_version: "23.02.0" -intel_qatlib_download_url_dir: "{{ (project_root_dir, 'intel_qatlibs') | path_join }}" - -# Note: mentioned below variable name & folder location must match "roles/bootstrap/install_qat_drivers_services/defaults/main.yml" -qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" diff --git a/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml b/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml deleted file mode 100644 index ae9e72fb..00000000 --- a/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml +++ /dev/null @@ -1,78 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. 
-## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -# Intel QATLibs -- name: Install Intel QATLibs - block: - - name: create directory {{ intel_qatlib_download_url_dir }} for Intel QATLibs configuration - file: - path: "{{ intel_qatlib_download_url_dir }}" - state: directory - mode: '0700' - - - name: download Intel QATLib - git: - repo: "{{ intel_qatlib_download_url }}" - dest: "{{ intel_qatlib_download_url_dir }}" - version: "{{ intel_qatlib_download_url_version }}" - force: true - - # using shell module instead of comand as it was giving aclocal: warning: causing playbook failure - - name: run autogen before configure QATLibs - shell: './autogen.sh' - args: - chdir: "{{ intel_qatlib_download_url_dir }}" - executable: /bin/bash - changed_when: true - - - name: check all packages are present for QATLibs installation - command: './configure' - args: - chdir: "{{ intel_qatlib_download_url_dir }}" - changed_when: true - - - name: make install QAT drivers - make: - chdir: "{{ intel_qatlib_download_url_dir }}" - target: install - become: yes - - - name: reload the dynamic linker cache - command: "ldconfig" - changed_when: true - -# Mentioned below block is also present in "roles/bootstrap/install_qat_drivers_services/tasks/main.yml" that will only occurs if, -# "enable_intel_qatlibs" is "false" in host_vars because, in order to compile QAT configuration, -# QATlibs must be installed before SVM feature is configured -- name: configuration for QAT Shared Virtual Memory (SVM) - block: - - name: set QAT SVM is enabled - set_fact: - svm_value: 1 - - - name: enable address translation services for 
QAT Shared Virtual Memory (SVM) - replace: - path: "{{ item }}" - regexp: '(^SVMEnabled\s)(.*)$' - replace: 'SVMEnabled = {{ svm_value }}' - mode: 0600 - with_items: - - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.vm" - - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.sym.vm" - - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.vm" - - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.asym.vm" - - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.sym.vm" - when: enable_qat_svm and enable_qat_svm is defined diff --git a/roles/bootstrap/configure_openssl/tasks/main.yml b/roles/bootstrap/configure_openssl/tasks/main.yml index bb914101..bb0c6cc6 100644 --- a/roles/bootstrap/configure_openssl/tasks/main.yml +++ b/roles/bootstrap/configure_openssl/tasks/main.yml @@ -18,8 +18,8 @@ include_role: name: install_dependencies -# Block for QAT check -- name: block for QAT check +# Block for OOT drivers status check +- name: block for QAT OOT drivers status check block: - name: confirm module before OpenSSL installation shell: "set -o pipefail && lsmod | grep qat" @@ -111,9 +111,3 @@ - name: reload the dynamic linker cache command: "ldconfig" changed_when: true - -- name: QATLibs and SVM configuration - include_tasks: intel_qatlibs_and_qatsvm_configuration.yml - when: - - configured_arch == "spr" - - enable_intel_qatlibs is defined and enable_intel_qatlibs diff --git a/roles/bootstrap/configure_openssl/vars/main.yml b/roles/bootstrap/configure_openssl/vars/main.yml index f5e7b372..c94638c5 100644 --- a/roles/bootstrap/configure_openssl/vars/main.yml +++ b/roles/bootstrap/configure_openssl/vars/main.yml @@ -27,6 +27,11 @@ install_dependencies: - cpuid - make - nasm + - systemd + - pkg-config + - yasm + - tar + - git RedHat: - "@Development Tools" - cmake @@ -41,3 +46,6 @@ install_dependencies: - libudev-devel - 
perl - nasm + - yasm + - tar + - git diff --git a/roles/bootstrap/configure_proxy/tasks/main.yml b/roles/bootstrap/configure_proxy/tasks/main.yml index 9c073fec..3da27c7b 100644 --- a/roles/bootstrap/configure_proxy/tasks/main.yml +++ b/roles/bootstrap/configure_proxy/tasks/main.yml @@ -45,10 +45,9 @@ file: path: "{{ ansible_env.HOME }}/.docker" state: "directory" - mode: 0755 + mode: 0750 owner: "{{ ansible_user | default(ansible_user_id) }}" group: "{{ ansible_user | default(ansible_user_id) }}" - when: container_runtime == "docker" - name: create Docker config.json file with proxy setttings template: @@ -57,7 +56,7 @@ owner: "{{ ansible_user | default(ansible_user_id) }}" group: "{{ ansible_user | default(ansible_user_id) }}" force: yes - mode: 0755 + mode: 0640 when: - '"http_proxy" in proxy_env or "https_proxy" in proxy_env' - container_runtime == "docker" diff --git a/roles/bootstrap/configure_qat/tasks/check_qat_status.yml b/roles/bootstrap/configure_qat/tasks/check_qat_status.yml index b4d5d114..76354b92 100644 --- a/roles/bootstrap/configure_qat/tasks/check_qat_status.yml +++ b/roles/bootstrap/configure_qat/tasks/check_qat_status.yml @@ -15,7 +15,7 @@ ## --- - name: confirm QAT module is loaded - shell: "set -o pipefail && lsmod | grep qat" + ansible.builtin.shell: "set -o pipefail && lsmod | grep qat" args: executable: /bin/bash register: qat_confirm_mod @@ -23,26 +23,44 @@ ignore_errors: true - name: QAT kernel module not found - fail: + ansible.builtin.fail: msg: "No QAT module found. Please set update_qat_drivers to true in host vars to resolve the issue." 
when: '"intel_qat" not in qat_confirm_mod.stdout' - name: make sure {{ enabled_qat_service }} service is started and enabled - service: + ansible.builtin.service: name: "{{ enabled_qat_service }}" state: started - enabled: yes + enabled: true + +- name: disable the multi-user.target in qat.service to avoid order cycle + ansible.builtin.lineinfile: + path: /lib/systemd/system/qat.service + regexp: "^After=multi-user.target" + line: "#After=multi-user.target" + mode: 0644 + become: true + +- name: restart the qat.service + ansible.builtin.systemd: + name: qat.service + state: restarted + enabled: true # ansible_facts.services is not supported currently on Ubuntu 20.04, once sorted will remove and use ansible service module -- name: check status of {{ enabled_qat_service }} service - shell: "set -o pipefail && service {{ enabled_qat_service }} status | grep qat_dev" - args: - executable: /bin/bash - register: qat_status_check - changed_when: false - ignore_errors: true +- name: block to check {{ enabled_qat_service }} service + block: + - name: check status of {{ enabled_qat_service }} service + ansible.builtin.shell: "set -o pipefail && service {{ enabled_qat_service }} status | grep qat_dev" + args: + executable: /bin/bash + register: qat_status_check + changed_when: false + ignore_errors: true -- name: configure_qat - {{ enabled_qat_service }} service not running properly, playbook terminated - fail: - msg: "Failed to start {{ enabled_qat_service }} service on system. Please set update_qat_drivers to true in host vars to resolve the issue." - when: "'up' not in qat_status_check.stdout" + - name: configure_qat - {{ enabled_qat_service }} service not running properly, playbook terminated + ansible.builtin.fail: + msg: "Failed to start {{ enabled_qat_service }} service on system. Please set update_qat_drivers to true in host vars to resolve the issue." 
+ when: "'up' not in qat_status_check.stdout" + when: + - update_qat_drivers diff --git a/roles/bootstrap/configure_sgx/defaults/main.yml b/roles/bootstrap/configure_sgx/defaults/main.yml index fe060ee2..358f4db2 100644 --- a/roles/bootstrap/configure_sgx/defaults/main.yml +++ b/roles/bootstrap/configure_sgx/defaults/main.yml @@ -14,44 +14,31 @@ ## limitations under the License. ## --- -# Intel SGX-DCAP drivers module for Ubuntu 20.04 -dcap_driver_series_ubuntu_20: "1.41" -dcap_driver_version_ubuntu_20: "sgx_linux_x64_driver_{{ dcap_driver_series_ubuntu_20 }}.bin" -dcap_driver_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.15/linux/distro/ubuntu20.04-server/{{ dcap_driver_version_ubuntu_20 }}" -dcap_driver_checksum_ubuntu_20: "sha256:665E3EACEDDE85F4D449B2298B2F274BF7F9C86107648F7B8ABAFA6A40061EEE" -sgx_folder_check_ubuntu_20: "{{ project_root_dir }}/sgx-{{ dcap_driver_series_ubuntu_20 }}" - -sgx_sdk_version_ubuntu_20: "sgx_linux_x64_sdk_2.18.100.3.bin" -sgx_sdk_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.15/linux/distro/ubuntu20.04-server/{{ sgx_sdk_version_ubuntu_20 }}" -sgx_sdk_checksum_ubuntu_20: "sha256:9B6E5259909B35C3C269C4E2FBBB193B518F5BEBD08226E1FC51EBC47A904125" +# Intel SGX SDK for Ubuntu +sgx_sdk_version_ubuntu: "sgx_linux_x64_sdk_2.19.100.3.bin" +sgx_sdk_url_ubuntu: "https://download.01.org/intel-sgx/sgx-dcap/1.16/linux/distro/ubuntu22.04-server/{{ sgx_sdk_version_ubuntu }}" +sgx_sdk_checksum_ubuntu: "sha256:B99B66A2E7D3842D106CF37747A124C53A9B49B07649E1EE26C0DA2BEB5AB3CE" # Intel SGX-SGX Key configuration for Ubuntu >= 18.04.4 sgx_apt_source_list: "intel-sgx" sgx_apt_repo_url: "https://download.01.org/intel-sgx/sgx_repo/ubuntu" sgx_apt_repo_key: "{{ sgx_apt_repo_url }}/intel-sgx-deb.key" -# Intel SGX-DCAP drivers module for RHEL -dcap_driver_series_rhel: "1.41" -dcap_driver_version_rhel: "sgx_linux_x64_driver_{{ dcap_driver_series_rhel }}.bin" -dcap_driver_url_rhel: 
"https://download.01.org/intel-sgx/sgx-dcap/1.15/linux/distro/rhel8.6-server/{{ dcap_driver_version_rhel }}" -dcap_driver_checksum_rhel: "sha256:B1BD47CA702C69672A6C9B5130DFF556EADD796445FA91AC2A2BA57AA75DB5B2" -sgx_folder_check_rhel: "{{ project_root_dir }}/sgx-{{ dcap_driver_series_rhel }}" - -sgx_sdk_version_rhel: "sgx_linux_x64_sdk_2.18.100.3.bin" -sgx_sdk_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.15/linux/distro/rhel8.6-server/{{ sgx_sdk_version_rhel }}" -sgx_sdk_checksum_rhel: "sha256:27B578395EE82985305053D7B95BE96837098BFB2FDBFA05D29C514F253AE861" +# Intel SGX SDK for RHEL +sgx_sdk_version_rhel: "sgx_linux_x64_sdk_2.19.100.3.bin" +sgx_sdk_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.16/linux/distro/rhel8.6-server/{{ sgx_sdk_version_rhel }}" +sgx_sdk_checksum_rhel: "sha256:E293D0179F81264AAD81E5A9864065B117C99A6EAD2388BC2A807093CB7E837A" # Intel SGX RPM local repository for RHEL sgx_rpm_local_repo_version_rhel: "sgx_rpm_local_repo.tgz" -sgx_rpm_local_repo_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.15/linux/distro/rhel8.6-server/{{ sgx_rpm_local_repo_version_rhel }}" -sgx_rpm_local_repo_checksum_rhel: "sha256:55370364E1A6853101CDC04C7650B9AD229465486DEF8BFDD3F488B11CDB62EA" - +sgx_rpm_local_repo_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.16/linux/distro/rhel8.6-server/{{ sgx_rpm_local_repo_version_rhel }}" +sgx_rpm_local_repo_checksum_rhel: "sha256:CFC42AC150E167510190F6D56677B17619869E1B528A03DB713C3D41C85E3EA5" sgx_config_dir: "{{ project_root_dir }}" sgx_rpm_directory: "{{ (project_root_dir, 'sgx_rpm_local_repo') | path_join }}" -sgx_pkg_version: "2.18.100.3" -sgx_pkg_dcap_version: "1.15.100.3" +sgx_pkg_version: "2.19.100.3" +sgx_pkg_dcap_version: "1.16.100.2" protobuf_version: protobuf-3.5.0-13.el8.x86_64.rpm protobuf_repository: https://dl.rockylinux.org/vault/rocky/8.6/AppStream/x86_64/os/Packages/p diff --git a/roles/bootstrap/configure_sgx/files/configure-sgx-udev.service 
b/roles/bootstrap/configure_sgx/files/configure-sgx-udev.service new file mode 100644 index 00000000..2cf9565b --- /dev/null +++ b/roles/bootstrap/configure_sgx/files/configure-sgx-udev.service @@ -0,0 +1,10 @@ +[Unit] +Description=enabled the sgx udev rule on reboot +AssertPathExists=/usr/bin/udevadm + +[Service] +Type=simple +ExecStart=udevadm trigger + +[Install] +WantedBy=multi-user.target diff --git a/roles/bootstrap/configure_sgx/tasks/main.yml b/roles/bootstrap/configure_sgx/tasks/main.yml index f099de4a..eec54deb 100644 --- a/roles/bootstrap/configure_sgx/tasks/main.yml +++ b/roles/bootstrap/configure_sgx/tasks/main.yml @@ -14,14 +14,6 @@ ## limitations under the License. ## --- -- name: determine machine type - include_role: - name: check_machine_type - when: - - inventory_hostname in groups['kube_node'] or - inventory_hostname in groups['vm_host'] - - not on_vms | default (false) - - name: install dependencies - cpuid package: name: cpuid diff --git a/roles/bootstrap/configure_sgx/tasks/rhel.yml b/roles/bootstrap/configure_sgx/tasks/rhel.yml index d54f98c3..7a716e5f 100644 --- a/roles/bootstrap/configure_sgx/tasks/rhel.yml +++ b/roles/bootstrap/configure_sgx/tasks/rhel.yml @@ -25,43 +25,6 @@ mode: 0700 become: yes -- name: confirm DCAP driver module installed - command: "lsmod" - register: lsmod_output - changed_when: false - -- name: if intel_sgx module is not available, the block is executed - block: - - name: download DCAP drivers - get_url: - url: "{{ dcap_driver_url_rhel }}" - dest: "{{ project_root_dir }}" - mode: 0755 - checksum: "{{ dcap_driver_checksum_rhel }}" - register: get_url_results - retries: "{{ number_of_retries | default(5) }}" - until: get_url_results is success - delay: "{{ retry_delay | default(3) }}" - - - name: install DCAP driver - # noqa command-instead-of-shell - shell is used intentionally here - shell: "./{{ dcap_driver_version_rhel }}" - args: - chdir: "{{ project_root_dir }}" - executable: /bin/bash - register: 
dcap_output_rhel - failed_when: '"Installation is successful!" not in dcap_output_rhel.stdout' - changed_when: '"Installation is successful!" in dcap_output_rhel.stdout' - - - name: Load SGX module (DCAP) - modprobe: - name: intel_sgx - state: present - when: - - not update_kernel - - ansible_os_family == "RedHat" and ansible_distribution_version < '8.4' - - '"intel_sgx" not in lsmod_output.stdout' - - name: download SGX RPM local repository get_url: url: "{{ sgx_rpm_local_repo_url_rhel }}" diff --git a/roles/bootstrap/configure_sgx/tasks/ubuntu.yml b/roles/bootstrap/configure_sgx/tasks/ubuntu.yml index 04a41e3b..39f0bffa 100644 --- a/roles/bootstrap/configure_sgx/tasks/ubuntu.yml +++ b/roles/bootstrap/configure_sgx/tasks/ubuntu.yml @@ -25,44 +25,6 @@ mode: 0700 become: yes -- name: confirm DCAP driver module installed - command: "lsmod" - register: lsmod_output - changed_when: false - -- name: if intel_sgx module is not available, the block is executed - block: - - name: download DCAP drivers - get_url: - url: "{{ dcap_driver_url_ubuntu_20 }}" - dest: "{{ sgx_config_dir }}" - mode: 0755 - checksum: "{{ dcap_driver_checksum_ubuntu_20 }}" - register: get_url_results - retries: "{{ number_of_retries | default(5) }}" - until: get_url_results is success - delay: "{{ retry_delay | default(3) }}" - - - name: install DCAP driver - # noqa command-instead-of-shell - shell is used intentionally here - shell: "./{{ dcap_driver_version_ubuntu_20 }}" - args: - chdir: "{{ sgx_config_dir }}" - executable: /bin/bash - register: dcap_output_ubuntu_20 - failed_when: '"Installation is successful!" not in dcap_output_ubuntu_20.stdout' - changed_when: '"Installation is successful!" 
in dcap_output_ubuntu_20.stdout' - - - name: Load SGX module (DCAP) - modprobe: - name: intel_sgx - state: present - when: - - not update_kernel - - '"intel_sgx" not in lsmod_output.stdout' - - "not configure_gpu | default(false) | bool" - - ansible_distribution_version < '21.04' - - name: add {{ sgx_apt_source_list }} repo key apt_key: url: "{{ sgx_apt_repo_key }}" @@ -105,30 +67,19 @@ state: started name: aesmd -# ansible_facts.services is not supported currently on Ubuntu 20.04, once sorted will remove when conditions and merge code as one task. -- name: check status of aesmd service after started - command: systemctl status aesmd.service - args: - warn: false - register: aesmd_enabled - changed_when: true - -- debug: - var: aesmd_enabled.stdout_lines - - name: download sgx sdk get_url: - url: "{{ sgx_sdk_url_ubuntu_20 }}" + url: "{{ sgx_sdk_url_ubuntu }}" dest: "{{ sgx_config_dir }}" mode: 0755 - checksum: "{{ sgx_sdk_checksum_ubuntu_20 }}" + checksum: "{{ sgx_sdk_checksum_ubuntu }}" register: get_url_results retries: "{{ number_of_retries | default(5) }}" until: get_url_results is success delay: "{{ retry_delay | default(3) }}" - name: install sgx sdk - shell: "set -o pipefail && echo 'yes' | ./{{ sgx_sdk_version_ubuntu_20 }}" + shell: "set -o pipefail && echo 'yes' | ./{{ sgx_sdk_version_ubuntu }}" args: chdir: "{{ sgx_config_dir }}" executable: /bin/bash @@ -157,3 +108,44 @@ - debug: var: psw_confirm.stdout_lines when: '"Succeed" in psw_confirm.stdout' + +- name: prepare worker node with sgx enabled + block: + - name: ensure sgx_prv group exists + ansible.builtin.group: + name: sgx_prv + state: present + + - name: add user to sgx_prv group + ansible.builtin.user: + name: "{{ ansible_user_id }}" + groups: sgx_prv + append: yes + + - name: create udev rules + ansible.builtin.blockinfile: + path: /etc/udev/rules.d/10-sgx.rules + create: yes + mode: '0644' + block: | + SUBSYSTEM=="misc",KERNEL=="enclave",MODE="0666" + 
SUBSYSTEM=="misc",KERNEL=="provision",GROUP="sgx_prv",MODE="0660" + SUBSYSTEM=="sgx",KERNEL=="sgx/enclave",MODE="0666" + SUBSYSTEM=="sgx",KERNEL=="sgx/provision",MODE="0660" + SUBSYSTEM=="misc",KERNEL=="sgx_enclave",MODE="0666",SYMLINK+="sgx/enclave" + SUBSYSTEM=="misc",KERNEL=="sgx_provision",GROUP="sgx_prv",MODE="0660",SYMLINK+="sgx/provision" + + - name: copy configure-sgx-udev.service file + ansible.builtin.copy: + src: configure-sgx-udev.service + dest: /lib/systemd/system/configure-sgx-udev.service + mode: 0755 + + - name: ensure configure-sgx-udev.service started + ansible.builtin.systemd: + state: started + name: configure-sgx-udev + enabled: true + when: + - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') + or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.4') diff --git a/roles/bootstrap/configure_sst/defaults/main.yml b/roles/bootstrap/configure_sst/defaults/main.yml index 63054b0f..ec1c4bbe 100644 --- a/roles/bootstrap/configure_sst/defaults/main.yml +++ b/roles/bootstrap/configure_sst/defaults/main.yml @@ -20,5 +20,5 @@ clx_sst_bf_dir: "{{ project_root_dir }}/CommsPowerManagement" clx_sst_bf_exec: "/usr/local/bin/sst_bf.py" isst_tool_git_url: "https://github.com/torvalds/linux.git" -isst_tool_git_version: "v6.1" +isst_tool_git_version: "v6.3" isst_tool_src_dir: "{{ (project_root_dir, 'speedselect') | path_join }}" diff --git a/roles/bootstrap/configure_sst/tasks/main.yml b/roles/bootstrap/configure_sst/tasks/main.yml index c9c73c5e..93eea337 100644 --- a/roles/bootstrap/configure_sst/tasks/main.yml +++ b/roles/bootstrap/configure_sst/tasks/main.yml @@ -14,14 +14,6 @@ ## limitations under the License. 
## --- -- name: determine machine type - include_role: - name: check_machine_type - when: - - inventory_hostname in groups['kube_node'] or - inventory_hostname in groups['vm_host'] - - not on_vms | default (false) - # Configuartion for Intel(R) Speed Select Technology "SST-BF,SST-CP,SST-TF and SST-PP" - name: configure Intel Speed Select Technology (ISST) include_tasks: sst_bf_cp_tf_pp_setup.yml diff --git a/roles/bootstrap/configure_ufs/tasks/main.yml b/roles/bootstrap/configure_ufs/tasks/main.yml deleted file mode 100644 index f0d78425..00000000 --- a/roles/bootstrap/configure_ufs/tasks/main.yml +++ /dev/null @@ -1,49 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## -- name: check presence of power.py script file - stat: - path: "{{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }}" - register: power_script - -- name: create CommsPowerManagement folder if not present - file: - state: directory - dest: "{{ (project_root_dir, 'CommsPowerManagement') | path_join }}" - mode: 0755 - when: not power_script.stat.exists - -- name: download power.py script file if not present - get_url: - url: https://raw.githubusercontent.com/intel/CommsPowerManagement/72e58b1939e9aa13d3ad0137e9d674968c45dfcb/power.py - dest: "{{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }}" - mode: "0744" - when: not power_script.stat.exists - -- name: create ufs service - template: - src: ufs.service.j2 - dest: /etc/systemd/system/ufs.service - mode: "0644" - register: ufs_service_file - -- name: enable ufs service -# noqa no-handler - intentionally implemented as not a handler - systemd: - name: ufs - enabled: yes - state: restarted - daemon_reload: yes - when: ufs_service_file.changed diff --git a/roles/bootstrap/configure_ufs/templates/ufs.service.j2 b/roles/bootstrap/configure_ufs/templates/ufs.service.j2 deleted file mode 100644 index a98ac209..00000000 --- a/roles/bootstrap/configure_ufs/templates/ufs.service.j2 +++ /dev/null @@ -1,15 +0,0 @@ -[Unit] -Description=ufs configuration on boot - -[Service] -Type=oneshot - -{% if ufs.max is defined %} -ExecStart={{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }} -U {{ ufs.max }} -{% endif %} -{% if ufs.min is defined %} -ExecStart={{ (project_root_dir, 'CommsPowerManagement', 'power.py') | path_join }} -u {{ ufs.min }} -{% endif %} - -[Install] -WantedBy=multi-user.target diff --git a/roles/bootstrap/determine_dataplane_interfaces/defaults/main.yml b/roles/bootstrap/determine_dataplane_interfaces/vars/main.yml similarity index 100% rename from roles/bootstrap/determine_dataplane_interfaces/defaults/main.yml rename to 
roles/bootstrap/determine_dataplane_interfaces/vars/main.yml diff --git a/roles/bootstrap/golang_install/defaults/main.yml b/roles/bootstrap/golang_install/defaults/main.yml index abe7b52c..b4104aff 100644 --- a/roles/bootstrap/golang_install/defaults/main.yml +++ b/roles/bootstrap/golang_install/defaults/main.yml @@ -14,6 +14,6 @@ ## limitations under the License. ## --- -golang_version: "1.19.3" +golang_version: "1.20.4" golang_download_url: "https://dl.google.com/go/go{{ golang_version }}.linux-amd64.tar.gz" -golang_download_checksum: "sha256:74b9640724fd4e6bb0ed2a1bc44ae813a03f1e72a4c76253e2d5c015494430ba" +golang_download_checksum: "sha256:698ef3243972a51ddb4028e4a1ac63dc6d60821bf18e59a807e051fee0a385bd" diff --git a/roles/cndp_dp_install/tasks/cndp_device_plugin_deploy.yml b/roles/bootstrap/install-qatlibs/defaults/main.yml similarity index 56% rename from roles/cndp_dp_install/tasks/cndp_device_plugin_deploy.yml rename to roles/bootstrap/install-qatlibs/defaults/main.yml index 004f9699..f6cf3ed9 100644 --- a/roles/cndp_dp_install/tasks/cndp_device_plugin_deploy.yml +++ b/roles/bootstrap/install-qatlibs/defaults/main.yml @@ -14,15 +14,10 @@ ## limitations under the License. 
## --- -- name: populate Intel CNDP Device Plugin {{ cndp_k8s_object }} yaml file and push to controller - template: - src: "intel-cndp-plugin-{{ cndp_k8s_object }}.yml.j2" - dest: "{{ cndp_k8s_manifest_dir }}/intel-cndp-plugin-{{ cndp_k8s_object }}.yml" - trim_blocks: no - force: yes - mode: preserve +# QATLibs +intel_qatlib_download_url: "https://github.com/intel/qatlib.git" +intel_qatlib_download_url_version: "23.02.0" +intel_qatlib_download_url_dir: "{{ (project_root_dir, 'intel_qatlibs') | path_join }}" -- name: deploy Intel CNDP Device Plugin {{ cndp_k8s_object }} - k8s: - state: present - src: "{{ cndp_k8s_manifest_dir }}/intel-cndp-plugin-{{ cndp_k8s_object }}.yml" +intel_qat_4xxx_firmware_download_url: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/qat_4xxx.bin +intel_qat_4xxx_mmp_firmware_download_url: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/qat_4xxx_mmp.bin diff --git a/roles/bootstrap/install-qatlibs/tasks/main.yml b/roles/bootstrap/install-qatlibs/tasks/main.yml new file mode 100644 index 00000000..224c6a8e --- /dev/null +++ b/roles/bootstrap/install-qatlibs/tasks/main.yml @@ -0,0 +1,133 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: install dependencies for qatlibs + include_role: + name: install_dependencies + +- name: download qat_4xxx firmware if not exists + block: + - name: check qat_4xxx.bin firmware existence + ansible.builtin.stat: + path: /lib/firmware/qat_4xxx.bin + register: qat_4xxx_exist + + - name: check qat_4xxx.bin.xz firmware existence + ansible.builtin.stat: + path: /lib/firmware/qat_4xxx.bin.xz + register: qat_4xxx_xz_exist + + - name: download qat_4xxx firmware to /lib/firmware folder + ansible.builtin.get_url: + url: "{{ intel_qat_4xxx_firmware_download_url }}" + dest: /lib/firmware/qat_4xxx.bin + mode: 0755 + register: get_url_results + retries: "{{ number_of_retries | default(5) }}" + until: get_url_results is success + delay: "{{ retry_delay | default(3) }}" + when: + - not qat_4xxx_exist.stat.exists + - not qat_4xxx_xz_exist.stat.exists + + - name: check qat_4xxx_mmp.bin firmware existence + ansible.builtin.stat: + path: /lib/firmware/qat_4xxx_mmp.bin + register: qat_4xxx_mmp_exist + + - name: check qat_4xxx_mmp.bin.xz firmware existence + ansible.builtin.stat: + path: /lib/firmware/qat_4xxx_mmp.bin.xz + register: qat_4xxx_mmp_xz_exist + + - name: download qat_4xxx_mmp firmware to /lib/firmware folder + ansible.builtin.get_url: + url: "{{ intel_qat_4xxx_mmp_firmware_download_url }}" + dest: /lib/firmware/qat_4xxx_mmp.bin + mode: 0755 + register: get_url_results + retries: "{{ number_of_retries | default(5) }}" + until: get_url_results is success + delay: "{{ retry_delay | default(3) }}" + when: + - not qat_4xxx_mmp_exist.stat.exists + - not qat_4xxx_mmp_xz_exist.stat.exists + + - name: uninstall the qat_4xxx driver + community.general.modprobe: + name: qat_4xxx + state: absent + when: + - (not qat_4xxx_mmp_exist.stat.exists and + not qat_4xxx_mmp_xz_exist.stat.exists) or + (not qat_4xxx_exist.stat.exists and + not qat_4xxx_xz_exist.stat.exists) + + - name: reinstall the qat_4xxx driver + community.general.modprobe: + name: qat_4xxx + state: present + 
when: + - (not qat_4xxx_mmp_exist.stat.exists and + not qat_4xxx_mmp_xz_exist.stat.exists) or + (not qat_4xxx_exist.stat.exists and + not qat_4xxx_xz_exist.stat.exists) + when: + - not on_vms | default(false) | bool + +# Intel QATLibs +- name: Install Intel QATLibs + block: + - name: create directory {{ intel_qatlib_download_url_dir }} for Intel QATLibs configuration + file: + path: "{{ intel_qatlib_download_url_dir }}" + state: directory + mode: '0700' + + - name: download Intel QATLib + git: + repo: "{{ intel_qatlib_download_url }}" + dest: "{{ intel_qatlib_download_url_dir }}" + version: "{{ intel_qatlib_download_url_version }}" + force: true + + # using shell module instead of comand as it was giving aclocal: warning: causing playbook failure + - name: run autogen before configure QATLibs + shell: './autogen.sh' + args: + chdir: "{{ intel_qatlib_download_url_dir }}" + executable: /bin/bash + changed_when: true + + - name: check all packages are present for QATLibs installation + command: './configure --enable-service' + args: + chdir: "{{ intel_qatlib_download_url_dir }}" + changed_when: true + + - name: make install QAT drivers + make: + chdir: "{{ intel_qatlib_download_url_dir }}" + target: install + become: yes + + - name: reload the dynamic linker cache + command: "ldconfig" + changed_when: true + when: + - configured_arch in ["spr", "emr"] + - configure_qat | default(false) | bool + - not update_qat_drivers | default(false) | bool diff --git a/roles/cndp_install/vars/main.yml b/roles/bootstrap/install-qatlibs/vars/main.yml similarity index 60% rename from roles/cndp_install/vars/main.yml rename to roles/bootstrap/install-qatlibs/vars/main.yml index e83554b4..433cf199 100644 --- a/roles/cndp_install/vars/main.yml +++ b/roles/bootstrap/install-qatlibs/vars/main.yml @@ -16,28 +16,42 @@ --- install_dependencies: Debian: - - git - - make - - build-essential - - libbsd-dev - - libelf-dev - - libjson-c-dev - - libnl-3-dev - - libnl-cli-3-dev - - libnuma-dev - - 
libpcap-dev - - meson + - cmake + - g++ - pkg-config - RedHat: - - "https://dl.fedoraproject.org/pub/epel/epel-release-latest-{{ ansible_distribution_major_version }}.noarch.rpm" + - wget + - make + - yasm + - libboost-all-dev + - libnl-genl-3-dev + - zlib1g + - zlib1g-dev + - libssl-dev - git - - cmake - - "@Development tools" - - libbsd-devel - - libbpf-devel - - numactl-devel - - json-c-devel - - libpcap-devel + - autoconf + - libtool + - m4 + - nasm + - systemd + RedHat: + - "@Development Tools" + - pciutils + - libudev-devel + - gcc-c++ + - elfutils-devel + - gcc + - openssl-devel + - elfutils-libelf-devel + - wget + - make + - perl + - usbutils + - yasm + - boost-devel - libnl3-devel - - meson - - pkg-config + - git + - autoconf + - libtool + - m4 + - nasm + - systemd diff --git a/roles/bootstrap/install_gpu_driver/defaults/main.yml b/roles/bootstrap/install_gpu_driver/defaults/main.yml index b6325668..7519ac97 100644 --- a/roles/bootstrap/install_gpu_driver/defaults/main.yml +++ b/roles/bootstrap/install_gpu_driver/defaults/main.yml @@ -17,91 +17,69 @@ gpu_repo_key_url: https://repositories.intel.com/graphics/intel-graphics.key gpu_key_text_path: /tmp/intel-graphic-key.txt gpu_usr_key_path: /usr/share/keyrings/intel-graphics.gpg -gpu_oem_kernel_image: linux-image-{{ gpu_oem_kernel_version }} -gpu_tool_packages: - - hwinfo - - vainfo - - clinfo -# Variables for Ubuntu 20.04 -gpu_repo_list_path_focal: /etc/apt/sources.list.d/intel.gpu.focal.list -gpu_repo_spec_focal: "focal main" -gpu_repo_focal_url: https://repositories.intel.com/graphics/ubuntu +# repo for different OS and different GPU type +gpu_repo_ubuntu_url: https://repositories.intel.com/graphics/ubuntu +gpu_repo_list_path: /etc/apt/sources.list.d/intel.gpu.list -gpu_dkms_packages_focal: +gpu_repo_spec_u2204_flex: "jammy flex" +gpu_repo_spec_u2204_arc: "jammy arc" + +kernel_dkms_packages: - gawk - dkms - - linux-headers-{{ gpu_oem_kernel_version }} - - libc-dev - - intel-i915-dkms - - 
intel-platform-cse-dkms - - pmt + - libc6-dev -gpu_runtime_packages_focal: - - intel-opencl-icd - - intel-level-zero-gpu - - level-zero - - intel-media-va-driver-non-free - - libmfx1 - - libmfxgen1 - - libvpl2 - - libegl-mesa0 - - libegl1-mesa - - libegl1-mesa-dev - - libgbm1 - - libgl1-mesa-dev - - libgl1-mesa-dri - - libglapi-mesa - - libgles2-mesa-dev - - libglx-mesa0 - - libigdgmm11 - - libxatracker2 - - mesa-va-drivers - - mesa-vdpau-drivers - - mesa-vulkan-drivers - - va-driver-all +# intel dgpu release 20230526 for Ubuntu 22.04 +gpu_kmd_packages_u2204_20230526: + - {pkg: intel-platform-vsec-dkms, ver: 2023.20.0-3} + - {pkg: intel-platform-cse-dkms, ver: 2023.11.1-36} + - {pkg: intel-i915-dkms, ver: 1.23.4.15.230307.15.5.17.0.1030+i28-1} + - {pkg: intel-fw-gpu, ver: 2023.12.2+207} -# Variables for Ubuntu 22.04 -gpu_repo_list_path_jammy: /etc/apt/sources.list.d/intel.gpu.jammy.list -gpu_repo_spec_jammy: "jammy flex" +gpu_umd_rt_packages_u2204_20230526: + - {pkg: intel-opencl-icd, ver: 23.13.26032.26-627~22.04} + - {pkg: intel-level-zero-gpu, ver: 1.3.26032.26-627~22.04} + - {pkg: level-zero, ver: 1.9.9-625~22.04} + - {pkg: intel-media-va-driver-non-free, ver: 23.1.6-622~22.04} + - {pkg: libmfx1, ver: 23.1.6-622~22.04} + - {pkg: libmfxgen1, ver: 23.1.5-622~22.04} + - {pkg: libvpl2, ver: 2023.1.3.0-622~22.04} + - {pkg: libegl-mesa0, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libegl1-mesa, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libegl1-mesa-dev, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libgbm1, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libgl1-mesa-dev, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libgl1-mesa-dri, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libglapi-mesa, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libgles2-mesa-dev, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libglx-mesa0, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: libigdgmm12, ver: 22.3.5-622~22.04} + - {pkg: libxatracker2, ver: 23.2.0.20230414.1+2061~u22.04} + - 
{pkg: mesa-va-drivers, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: mesa-vdpau-drivers, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: mesa-vulkan-drivers, ver: 23.2.0.20230414.1+2061~u22.04} + - {pkg: va-driver-all, ver: 2.18.0.2-60~u22.04} -gpu_dkms_packages_jammy: - - gawk - - dkms - - linux-headers-{{ gpu_oem_kernel_version }} - - libc6-dev - - intel-platform-vsec-dkms - - intel-platform-cse-dkms - - intel-i915-dkms - - intel-fw-gpu - - linux-modules-extra-{{ gpu_oem_kernel_version }} +gpu_dev_packages_u2204_20230526: + - {pkg: libigc1, ver: 1.0.13700.13-627~22.04} + - {pkg: libigc-dev, ver: 1.0.13700.13-627~22.04} + - {pkg: intel-igc-cm, ver: 1.0.176+i600~22.04} + - {pkg: libigdfcl1, ver: 1.0.13700.13-627~22.04} + - {pkg: libigdfcl-dev, ver: 1.0.13700.13-627~22.04} + - {pkg: libigfxcmrt7, ver: 23.1.6-622~22.04} + - {pkg: libigfxcmrt-dev, ver: 23.1.6-622~22.04} + - {pkg: level-zero-dev, ver: 1.9.9-625~22.04} + +gpu_tool_packages_u2204_20230526: + - {pkg: xpu-smi, ver: 1.2.3-13~u22.04} -gpu_runtime_packages_jammy: - - intel-opencl-icd - - intel-level-zero-gpu - - level-zero - - intel-media-va-driver-non-free - - libmfx1 - - libmfxgen1 - - libvpl2 - - libegl-mesa0 - - libegl1-mesa - - libegl1-mesa-dev - - libgbm1 - - libgl1-mesa-dev - - libgl1-mesa-dri - - libglapi-mesa - - libgles2-mesa-dev - - libglx-mesa0 - - libigdgmm12 - - libxatracker2 - - mesa-va-drivers - - mesa-vdpau-drivers - - mesa-vulkan-drivers - - va-driver-all - - libigc-dev - - intel-igc-cm - - libigdfcl-dev - - libigfxcmrt-dev - - level-zero-dev +# intel dgpu release independent test packages +gpu_test_packages: + - hwinfo + - vainfo + - clinfo + - mesa-utils + - vulkan-tools + - intel-gpu-tools diff --git a/roles/bootstrap/install_gpu_driver/files/cek_detect_gpu_type.py b/roles/bootstrap/install_gpu_driver/files/cek_detect_gpu_type.py new file mode 100644 index 00000000..4a80a919 --- /dev/null +++ b/roles/bootstrap/install_gpu_driver/files/cek_detect_gpu_type.py @@ -0,0 +1,56 @@ + +import os 
+import sys + +intel_dgpu_types = { + "56c0" : "Flex", + "56c1" : "Flex", + + "5690" : "Arc", + "5691" : "Arc", + "5692" : "Arc", + "5693" : "Arc", + "5694" : "Arc", + "5695" : "Arc", + "5696" : "Arc", + "5697" : "Arc", + + "56a0" : "Arc", + "56a1" : "Arc", + "56a2" : "Arc", + "56a3" : "Arc", + "56a4" : "Arc", + "56a5" : "Arc", + "56a6" : "Arc", + + "56b0" : "Arc", + "56b1" : "Arc", + "56b2" : "Arc", + "56b3" : "Arc", +} + + +def detect_gpu_type(): + cmd = 'lspci | grep -i -E "Display|VGA" | grep Intel' + result = os.popen(cmd) + info_list = result.read() + lines = info_list.splitlines() + line_count = len(lines) + if line_count > 0 : + line = lines[0] + idx1 = line.find("Device") + len("Device ") + idx2 = line.find(" ", idx1+1) + chip_id = line[idx1 : idx2].lower() + else: + chip_id = "Unknown" + + if chip_id in intel_dgpu_types : + gpu_type = intel_dgpu_types[chip_id] + else : + gpu_type = "Unknown" + + print(gpu_type) + print(chip_id) + +detect_gpu_type() +sys.exit(0) diff --git a/roles/bootstrap/install_gpu_driver/tasks/debian.yml b/roles/bootstrap/install_gpu_driver/tasks/debian.yml index 52aa2c83..a97e10ab 100644 --- a/roles/bootstrap/install_gpu_driver/tasks/debian.yml +++ b/roles/bootstrap/install_gpu_driver/tasks/debian.yml @@ -15,146 +15,127 @@ ## --- -# The installation steps based on get started guide at https://dgpu-docs.intel.com +# The installation steps based https://dgpu-docs.intel.com - name: Download Intel graphic gpg key in text format - get_url: + ansible.builtin.get_url: url: "{{ gpu_repo_key_url }}" dest: "{{ gpu_key_text_path }}" - force: yes + force: true mode: 0644 # TODO: This file will block the gpg command if not removed. 
- name: Remove the key file - file: + ansible.builtin.file: state: absent path: "{{ gpu_usr_key_path }}" - name: Add Intel graphic gpg key to system - expect: - command: "gpg --dearmor --output {{ gpu_usr_key_path }} {{ gpu_key_text_path }}" - responses: - enter: 'y' + command: "gpg --dearmor --output {{ gpu_usr_key_path }} {{ gpu_key_text_path }}" + changed_when: false - name: Add Intel graphic driver repo - apt_repository: + ansible.builtin.apt_repository: filename: "{{ gpu_repo_list_path }}" - repo: "deb [arch=amd64 signed-by={{ gpu_usr_key_path }}] {{ gpu_repo_focal_url }} {{ gpu_repo_spec }}" + repo: "deb [arch=amd64 signed-by={{ gpu_usr_key_path }}] {{ gpu_repo_ubuntu_url }} {{ gpu_repo_spec }}" state: present - update_cache: yes + update_cache: true - name: Run apt update before kernel installation - apt: - update_cache: yes + ansible.builtin.apt: + update_cache: true register: update_cache_results retries: "{{ number_of_retries | default(5) }}" until: update_cache_results is success delay: "{{ retry_delay | default(3) }}" -- name: Install OEM kernel - apt: - name: "{{ gpu_oem_kernel_image }}" - -- name: Fetch kernel fisrt entry - shell: "set -o pipefail && cat /boot/grub/grub.cfg | grep submenu | awk -F \"'\" '{print $2}'" - args: - executable: /bin/bash - register: kernel_fisrt_entry - failed_when: kernel_fisrt_entry.rc > 1 - changed_when: false - -- debug: msg={{ kernel_fisrt_entry.stdout }} - -- name: Fetch kernel second entry - shell: "set -o pipefail && cat /boot/grub/grub.cfg | grep menuentry | grep {{ gpu_oem_kernel_version }} | grep -v recovery | awk -F \"'\" '{print $2}'" - args: - executable: /bin/bash - register: kernel_second_entry - failed_when: kernel_second_entry.rc > 1 - changed_when: false - -- debug: msg={{ kernel_second_entry.stdout }} - -- name: Set OEM kernel(2-level entries) as default boot kernel - lineinfile: - path: /etc/default/grub - regexp: "^GRUB_DEFAULT" - line: GRUB_DEFAULT="{{ kernel_fisrt_entry.stdout }}>{{ 
kernel_second_entry.stdout }}" - when: kernel_fisrt_entry.stdout != "" - -- name: Set OEM kernel(1-level entry) as default boot kernel - lineinfile: - path: /etc/default/grub - regexp: "^GRUB_DEFAULT" - line: GRUB_DEFAULT="{{ kernel_second_entry.stdout }}" - when: kernel_fisrt_entry.stdout == "" - -- name: Update boot configure - command: "update-grub" - changed_when: false - -- name: Reboot to updated kernel - reboot: - reboot_timeout: 1200 - -- name: Get update kernel version - command: "uname -r" - register: oem_kernel_ver - changed_when: false - -- name: Show the new kernel version - debug: - msg: "New kernel version is {{ oem_kernel_ver.stdout }}" +- name: Set fact for kernel version + ansible.builtin.set_fact: + kernel_ver: "{{ ansible_kernel }}" -- name: Check new kernel version - assert: - that: oem_kernel_ver.stdout == "{{ gpu_oem_kernel_version }}" - msg: "Wrong kernel version: {{ oem_kernel_ver.stdout }}" +- name: Install kernel headers incase it missed + ansible.builtin.apt: + name: linux-headers-{{ kernel_ver }} -- name: Run apt update before dkms installation - apt: - update_cache: yes - register: update_cache_results - retries: "{{ number_of_retries | default(5) }}" - until: update_cache_results is success - delay: "{{ retry_delay | default(3) }}" - -- name: Remove the unused kernel headers before dkms installation +- name: Set current kernel as default boot kernel in case there are multiple kernels in system block: - - name: Fetch the installed kernel headers - shell: "set -o pipefail && dpkg --list | grep linux-headers | awk '{ print $2 }' |grep -v {{ gpu_oem_kernel_version }} | grep -v hwe" + - name: Fetch kernel first entry + ansible.builtin.shell: + "set -o pipefail && cat /boot/grub/grub.cfg | grep submenu | awk -F \"'\" '{print $2}'" + args: + executable: /bin/bash + register: kernel_first_entry + failed_when: kernel_first_entry.rc > 1 + changed_when: false + + - name: Fetch kernel second entry + ansible.builtin.shell: + "set -o pipefail && 
cat /boot/grub/grub.cfg | grep menuentry | grep {{ kernel_ver }} | grep -v recovery | awk -F \"'\" '{print $2}'" args: executable: /bin/bash - register: installed_kernel_headers - failed_when: installed_kernel_headers.rc > 1 + register: kernel_second_entry + failed_when: kernel_second_entry.rc > 1 changed_when: false - tags: - - atsm - - - debug: - msg: "{{ installed_kernel_headers.stdout }}" - tags: - - atsm - - name: Remove the unused linux kernel headers - apt: - name: "{{ item }}" - state: absent - with_items: "{{installed_kernel_headers.stdout_lines}}" - when: installed_kernel_headers.rc == 0 - tags: - - atsm - -- name: Install DKMS(Dynamic Kernel Module Support) and kernel header files - apt: + + - name: Set OEM kernel(2-level entries) as default boot kernel + ansible.builtin.lineinfile: + path: /etc/default/grub + regexp: "^GRUB_DEFAULT" + line: GRUB_DEFAULT="{{ kernel_first_entry.stdout }}>{{ kernel_second_entry.stdout }}" + when: kernel_first_entry.stdout != "" + + - name: Set OEM kernel(1-level entry) as default boot kernel + ansible.builtin.lineinfile: + path: /etc/default/grub + regexp: "^GRUB_DEFAULT" + line: GRUB_DEFAULT="{{ kernel_second_entry.stdout }}" + when: kernel_first_entry.stdout == "" + + - name: Add support for multi-gpu system + ansible.builtin.lineinfile: + path: /etc/default/grub + line: GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} pci=realloc=off" + + - name: Update boot configure + ansible.builtin.command: "update-grub" + changed_when: false + +- name: Install kernel dkms packages + ansible.builtin.apt: name: "{{ item }}" - with_items: "{{ gpu_dkms_packages }}" + with_items: "{{ kernel_dkms_packages }}" + +- name: Install gpu kernel mode driver packages + ansible.builtin.apt: + name: "{{ item.pkg }}={{ item.ver }}" + allow_downgrade: true + with_items: "{{ gpu_kmd_packages }}" + +- name: Install gpu user mode driver and runtime packages + ansible.builtin.apt: + name: "{{ item.pkg }}={{ item.ver }}" + allow_downgrade: true + with_items: 
"{{ gpu_umd_rt_packages }}" + +- name: Install gpu dev packages + ansible.builtin.apt: + name: "{{ item.pkg }}={{ item.ver }}" + allow_downgrade: true + with_items: "{{ gpu_dev_packages }}" + +- name: Install gpu tool packages + ansible.builtin.apt: + name: "{{ item.pkg }}={{ item.ver }}" + allow_downgrade: true + with_items: "{{ gpu_tool_packages }}" -- name: Install run-time packages - apt: +- name: Install gpu test packages + ansible.builtin.apt: name: "{{ item }}" - with_items: "{{ gpu_runtime_packages }}" + allow_downgrade: true + with_items: "{{ gpu_test_packages }}" - name: Reboot the system for these changes to take effect - reboot: + ansible.builtin.reboot: reboot_timeout: 1200 - name: Create render group if it doesn't exist @@ -166,55 +147,8 @@ user: name: "{{ ansible_user_id }}" groups: render - append: yes + append: true - name: Apply the current user to the new group id immediately meta: reset_connection - -- name: Install graphic driver tools for inspection - apt: - name: "{{ item }}" - with_items: "{{ gpu_tool_packages }}" - -- name: Run hwinfo - command: hwinfo --display - register: hwinfo_msg - changed_when: false - -- name: Set fact for i915 driver string - set_fact: - drv_list: "{{ hwinfo_msg.stdout | regex_findall('Driver: \"i915\"', multiline=True) }}" - -- name: Check if i915 installed successfully - assert: - that: (drv_list | length) > 0 - msg: "Can't find i915 driver" - -- name: Run vainfo to check libVA readiness - command: vainfo - register: vainfo_msg - changed_when: false - -- name: Show vainfo output - debug: - msg: "{{ vainfo_msg.stdout }}" - -- name: Check libVA VA-API version greater than 1.14 - assert: - that: "{{ (vainfo_msg.stdout | regex_search('vainfo: VA-API version: \\d.\\d{2} \\(libva \\d.\\d{2}.\\d\\)')).split()[3] }} > 1.14" - msg: "VA-API version must be greater than 1.14" - -- name: Run clinfo to check OpenCL readiness - command: clinfo - register: clinfo_msg - changed_when: false - -- name: Show clinfo output - debug: - 
msg: "{{ clinfo_msg.stdout }}" - -- name: Check OpenCL driver-- 'Number of platforms' must be greater than 0 - assert: - that: "{{ (clinfo_msg.stdout | regex_search('Number of platforms\\s+\\d')).split()[3] }} > 0" - msg: "'Number of platforms' must be greater than 0" diff --git a/roles/bootstrap/install_gpu_driver/tasks/main.yml b/roles/bootstrap/install_gpu_driver/tasks/main.yml index 57af9fe0..efff7cb0 100644 --- a/roles/bootstrap/install_gpu_driver/tasks/main.yml +++ b/roles/bootstrap/install_gpu_driver/tasks/main.yml @@ -14,30 +14,56 @@ ## limitations under the License. ## --- -- name: Set fact for the installation for Ubuntu 22.04 +- name: Install gpu type detection script to /usr/local/bin + copy: + src: "{{ item }}" + dest: "/usr/local/bin/" + mode: 0700 + owner: root + group: root + force: true + with_items: + - 'cek_detect_gpu_type.py' + become: true + +- name: Detect gpu type + command: "python /usr/local/bin/cek_detect_gpu_type.py" + register: gpu_type_result + changed_when: false + +- name: Output gpu type detection result + debug: + msg: "{{ gpu_type_result.stdout_lines }}" + +- name: Set repo for Flex GPU installation on Ubuntu 22.04 set_fact: - gpu_repo_list_path: "{{ gpu_repo_list_path_jammy }}" - gpu_repo_spec: "{{ gpu_repo_spec_jammy }}" - gpu_dkms_packages: "{{ gpu_dkms_packages_jammy }}" - gpu_runtime_packages: "{{ gpu_runtime_packages_jammy }}" - when: ansible_distribution_version == "22.04" + gpu_repo_spec: "{{ gpu_repo_spec_u2204_flex }}" + when: + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04") + - gpu_type_result.stdout_lines[0] == "Flex" -- name: Set fact for the installation for Ubuntu 20.04 +- name: Set repo for Arc GPU installation on Ubuntu 22.04 set_fact: - gpu_repo_list_path: "{{ gpu_repo_list_path_focal }}" - gpu_repo_spec: "{{ gpu_repo_spec_focal }}" - gpu_dkms_packages: "{{ gpu_dkms_packages_focal }}" - gpu_runtime_packages: "{{ gpu_runtime_packages_focal }}" - when: ansible_distribution_version 
== "20.04" + gpu_repo_spec: "{{ gpu_repo_spec_u2204_arc }}" + when: + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04") + - gpu_type_result.stdout_lines[0] == "Arc" -# Based on Release Site: Only Ubuntu 22.04, 20.04 and RHEL 8.5, 8.6 are supported. -# https://dgpu-docs.intel.com/installation-guides/index.html#intel-data-center-gpu-flex-series -- name: install GPU drivers on RHEL 8.5 - include_tasks: rhel.yml +- name: Set gpu package version to specific release on Ubuntu 22.04 + set_fact: + gpu_kmd_packages: "{{ gpu_kmd_packages_u2204_20230526 }}" + gpu_umd_rt_packages: "{{ gpu_umd_rt_packages_u2204_20230526 }}" + gpu_dev_packages: "{{ gpu_dev_packages_u2204_20230526 }}" + gpu_tool_packages: "{{ gpu_tool_packages_u2204_20230526 }}" when: - - (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.5") + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04") -- name: install GPU drivers on Ubuntu +- name: Install GPU drivers on Ubuntu include_tasks: debian.yml when: - (ansible_distribution == 'Ubuntu') + +- name: Install GPU drivers on RHEL 8.x + include_tasks: rhel.yml + when: + - (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.6") diff --git a/roles/bootstrap/install_gpu_driver/tasks/rhel.yml b/roles/bootstrap/install_gpu_driver/tasks/rhel.yml index 7e04d61c..4b08b78d 100644 --- a/roles/bootstrap/install_gpu_driver/tasks/rhel.yml +++ b/roles/bootstrap/install_gpu_driver/tasks/rhel.yml @@ -14,4 +14,5 @@ ## limitations under the License. ## --- -- debug: msg="RedHat installation TBD" +- name: GPU driver installation on RedHat + debug: msg="RedHat installation TBD" diff --git a/roles/bootstrap/install_packages/defaults/main.yml b/roles/bootstrap/install_packages/defaults/main.yml new file mode 100644 index 00000000..d3e739f2 --- /dev/null +++ b/roles/bootstrap/install_packages/defaults/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +fedora_epel_repo_url: "https://dl.fedoraproject.org/pub/epel" diff --git a/roles/bootstrap/install_packages/tasks/debian.yml b/roles/bootstrap/install_packages/tasks/debian.yml index ddc255d1..2ec88b11 100644 --- a/roles/bootstrap/install_packages/tasks/debian.yml +++ b/roles/bootstrap/install_packages/tasks/debian.yml @@ -27,21 +27,33 @@ - {type: 'https', value: "{{ https_proxy | default('') }}"} when: http_proxy is defined or https_proxy is defined -- name: reconfigure unattended-upgrades package - command: dpkg-reconfigure --priority=low -f noninteractive unattended-upgrades - args: - creates: "/etc/apt/apt.conf.d/20auto-upgrades" - - name: disable automatic package updates - replace: - path: "{{ item }}" - regexp: "(APT::Periodic::.* )\"1\";$" - replace: "\\1\"0\";" - mode: 0600 + apt: + name: unattended-upgrades + purge: true + state: absent + lock_timeout: 120 + when: ansible_os_family == "Debian" + +- name: disable daily apt timers + ansible.builtin.systemd: + name: "{{ item }}" + state: stopped + enabled: false with_items: - - "/etc/apt/apt.conf.d/20auto-upgrades" - - "/etc/apt/apt.conf.d/10periodic" - failed_when: false + - "apt-daily-upgrade.timer" + - "apt-daily.timer" + when: ansible_os_family == "Debian" + +- name: disable all periodic routines by apt + copy: + dest: "/etc/apt/apt.conf.d/99periodic-disable" + content: | + APT::Periodic::Enable "0"; + owner: root + group: root + mode: 
0644 + when: ansible_os_family == "Debian" - name: install build-essential package apt: @@ -130,6 +142,13 @@ link: /usr/bin/python when: ansible_os_family == "Debian" +- name: WA for libudev-dev version issue on Ubuntu + apt: + state: latest # noqa package-latest + name: udev + when: + - ansible_os_family == "Debian" + - name: perform dist-upgrade on Debian OS family apt: upgrade: dist @@ -144,6 +163,12 @@ state: present when: ansible_os_family == "Debian" +- name: ensure iptables is installed + apt: + name: iptables + state: present + when: ansible_os_family == "Debian" + - name: install command line tools to collect hardware details apt: name: diff --git a/roles/bootstrap/install_packages/tasks/main.yml b/roles/bootstrap/install_packages/tasks/main.yml index d07be723..62fcf10d 100644 --- a/roles/bootstrap/install_packages/tasks/main.yml +++ b/roles/bootstrap/install_packages/tasks/main.yml @@ -44,6 +44,7 @@ - six>=1.15.0 - websocket-client==0.58.0 - oauthlib==3.1.0 + - docker==6.1.3 state: present register: pip_result retries: 5 diff --git a/roles/bootstrap/install_packages/tasks/rhel.yml b/roles/bootstrap/install_packages/tasks/rhel.yml index eb7e00e1..e623d997 100644 --- a/roles/bootstrap/install_packages/tasks/rhel.yml +++ b/roles/bootstrap/install_packages/tasks/rhel.yml @@ -60,29 +60,31 @@ - name: obtain RPM-GPG-KEY-EPEL-8 rpm_key: state: present - key: https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-8 + key: "{{ fedora_epel_repo_url }}/RPM-GPG-KEY-EPEL-8" when: - ansible_distribution in ['RedHat', 'Rocky'] - ansible_distribution_version >= '8' + - ansible_distribution_version < '9' - name: install RPM-GPG-KEY-EPEL-8 package: - name: https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm + name: "{{ fedora_epel_repo_url }}/epel-release-latest-8.noarch.rpm" when: - ansible_distribution in ['RedHat', 'Rocky'] - ansible_distribution_version >= '8' + - ansible_distribution_version < '9' - name: obtain RPM-GPG-KEY-EPEL-9 rpm_key: state: 
present - key: https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9 + key: "{{ fedora_epel_repo_url }}/RPM-GPG-KEY-EPEL-9" when: - ansible_distribution in ["RedHat", "Rocky"] - ansible_distribution_version >= "9" - name: install RPM-GPG-KEY-EPEL-9 package: - name: https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm + name: "{{ fedora_epel_repo_url }}/epel-release-latest-9.noarch.rpm" when: - ansible_distribution in ["RedHat", "Rocky"] - ansible_distribution_version >= "9" @@ -92,7 +94,7 @@ block: - name: get list of packages uri: - url: "https://download-ib01.fedoraproject.org/pub/epel/9/Everything/x86_64/Packages/c/" + url: "{{ fedora_epel_repo_url }}/9/Everything/x86_64/Packages/c/" return_content: true register: epel_output @@ -104,7 +106,7 @@ - name: download CPUID on Rocky >= 9.0 get_url: - url: "https://download-ib01.fedoraproject.org/pub/epel/9/Everything/x86_64/Packages/c/{{ cpuid_rpm }}" + url: "{{ fedora_epel_repo_url }}/9/Everything/x86_64/Packages/c/{{ cpuid_rpm }}" dest: "{{ project_root_dir }}" mode: 0755 @@ -196,7 +198,7 @@ - name: install epel-release on Amazon Linux 2 package: - name: https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm + name: "{{ fedora_epel_repo_url }}/epel-release-latest-7.noarch.rpm" state: present retries: 5 delay: 10 diff --git a/roles/bootstrap/install_qat_drivers_services/defaults/main.yml b/roles/bootstrap/install_qat_drivers_services/defaults/main.yml index a40261b5..d74c2f3f 100644 --- a/roles/bootstrap/install_qat_drivers_services/defaults/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/defaults/main.yml @@ -15,14 +15,14 @@ ## --- # QAT 1.x (for QAT add-on cards) -qat_drivers_version: 'QAT.L.4.20.0-00001' -qat_drivers_download_url: 'https://downloadmirror.intel.com/761891/{{ qat_drivers_version }}.tar.gz' -qat_drivers_pkg_checksum: 'sha1:A78B360F33D270A1FBF7BBD60B2BA69253771771' +qat_drivers_version: 'QAT.L.4.22.0-00001' +qat_drivers_download_url: 
'https://downloadmirror.intel.com/780675/{{ qat_drivers_version }}.tar.gz' +qat_drivers_pkg_checksum: 'sha1:36EF82C00802F6A6D1494BC31B5C1264E2DFDFC4' # QAT 2.x (for QAT embedded into SPR) -qat_spr_drivers_version: 'QAT20.L.1.0.10-00005' -qat_spr_drivers_download_url: 'https://downloadmirror.intel.com/769043/{{ qat_spr_drivers_version }}.tar.gz' -qat_spr_drivers_pkg_checksum: 'sha1:15524FCB45A12B2372258475CA2BA774FF8B97BD' +qat_spr_drivers_version: 'QAT20.L.1.0.40-00004' +qat_spr_drivers_download_url: 'https://downloadmirror.intel.com/781387/{{ qat_spr_drivers_version }}.tar.gz' +qat_spr_drivers_pkg_checksum: 'sha1:27B687C0D72D83BF5A8DE10A7FEF1FF3D57828F2' # If updating mentioned below folder location kindly update similar in roles/redeploy_cleanup/defaults/main.yml qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" diff --git a/roles/bootstrap/install_qat_drivers_services/tasks/main.yml b/roles/bootstrap/install_qat_drivers_services/tasks/main.yml index c253d14b..9f13a7f3 100644 --- a/roles/bootstrap/install_qat_drivers_services/tasks/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/tasks/main.yml @@ -18,29 +18,9 @@ include_role: name: install_dependencies -- name: WA for libudev-dev version issue on Ubuntu - apt: - name: 'udev' - state: latest # noqa package-latest - when: ansible_distribution == "Ubuntu" - -- name: get current udev package version - shell: "set -o pipefail && apt list --installed 2>/dev/null |grep '^udev' | awk 'NR==1{ print $2 }'" - args: - executable: /bin/bash - register: udev_pkg_version - changed_when: false - failed_when: "udev_pkg_version.stdout | length==0" - when: ansible_distribution == "Ubuntu" - -- name: current udev package version - debug: - msg: "udev_pkg_version={{ udev_pkg_version.stdout }}" - when: ansible_distribution == "Ubuntu" - - name: install libudev-dev package on Ubuntu apt: - name: 'libudev-dev={{ udev_pkg_version.stdout }}' + name: libudev-dev when: ansible_distribution == "Ubuntu" - 
name: create directory {{ qat_drivers_dir }} for all QAT dependencies @@ -67,7 +47,8 @@ dest: "{{ qat_drivers_dir }}" remote_src: yes mode: 0755 - when: configured_arch != "spr" + when: + - configured_arch not in ["spr", "emr"] - name: block for QAT 2.x block: @@ -87,7 +68,26 @@ dest: "{{ qat_drivers_dir }}" remote_src: yes mode: 0755 - when: configured_arch == "spr" + when: + - configured_arch in ["spr"] + +# Because EMR is not launched yet, the EMR QAT driver is temporarily copied from the Ansible host +# When the external driver officially supports the EMR platform, merge this with the task above +- name: block for EMR QAT driver package + block: + - name: copy EMR QAT driver package + ansible.builtin.copy: + src: "{{ (emr_qat_driver_staging_folder, emr_qat_driver_package) | path_join }}" + dest: "{{ (qat_drivers_dir, emr_qat_driver_package) | path_join }}" + mode: 0644 + - name: unarchive EMR QAT driver package + ansible.builtin.unarchive: + src: "{{ (qat_drivers_dir, emr_qat_driver_package) | path_join }}" + dest: "{{ qat_drivers_dir }}" + remote_src: yes + mode: 0755 + when: + - configured_arch == "emr" - name: check all packages are present for QAT drivers installation command: ./configure @@ -120,7 +120,8 @@ chdir: "{{ qat_drivers_dir }}" target: samples-install become: yes - when: configured_arch != "spr" + when: + - configured_arch not in ["spr", "emr"] # Reboot with driver ver: QAT20.L.0.8.0-00071 causing issues, there is no need to reboot. 
- name: block for QAT 2.x drivers and samples compilation @@ -137,7 +138,8 @@ chdir: "{{ qat_drivers_dir }}" target: samples-install become: yes - when: configured_arch == "spr" + when: + - configured_arch in ["spr", "emr"] - name: confirm QAT module installed shell: "set -o pipefail && lsmod | grep qat" @@ -171,9 +173,6 @@ name: "{{ enabled_qat_service }}" enabled: yes -# Mentioned below block is also present in "roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml" that will only occurs if, -# "enable_intel_qatlibs" is "true" in host_vars because, in order to compile QAT configuration, -# QATlibs must be installed before SVM feature is configured - name: configuration for QAT Shared Virtual Memory (SVM) block: - name: set QAT SVM is enabled @@ -193,5 +192,5 @@ - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.asym.vm" - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.sym.vm" when: - - configured_arch == "spr" - - enable_intel_qatlibs is defined and not enable_intel_qatlibs + - configured_arch in ["spr", "emr"] + - enable_qat_svm | default(false) diff --git a/roles/bootstrap/install_qat_drivers_services/vars/main.yml b/roles/bootstrap/install_qat_drivers_services/vars/main.yml index e3ff015d..f306c3a1 100644 --- a/roles/bootstrap/install_qat_drivers_services/vars/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/vars/main.yml @@ -42,3 +42,4 @@ install_dependencies: - usbutils - yasm - boost-devel + - libnl3-devel diff --git a/roles/bootstrap/install_realtime_kernel/defaults/main.yml b/roles/bootstrap/install_realtime_kernel/defaults/main.yml new file mode 100644 index 00000000..86498807 --- /dev/null +++ b/roles/bootstrap/install_realtime_kernel/defaults/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +realtime_kernel_version: 5.15.0-1036-realtime +ubuntu_pro_token: "ffffffffffffffffffffffffffffff" diff --git a/roles/bootstrap/install_realtime_kernel/tasks/main.yml b/roles/bootstrap/install_realtime_kernel/tasks/main.yml index a19e41a0..e99d760f 100644 --- a/roles/bootstrap/install_realtime_kernel/tasks/main.yml +++ b/roles/bootstrap/install_realtime_kernel/tasks/main.yml @@ -16,18 +16,100 @@ --- # Installing the RT kernel is intended only for FlexRAN support -- debug: msg="NOP. 
Until Ubuntu 22.04 RT patch is made public by Canonical, realtime image is expected to be pre-built by user" - -# - name: Install realtime headers from the local DEB packages -# shell: "apt install -y /opt/rt-kits/*.deb" -# ignore_errors: True - -# - name: Find existing local DEB files -# find: -# paths: /opt/rt-kits/ -# patterns: "*.deb" -# register: found_debs -# - name: Install the local DEB packages -# apt: -# deb: "{{ item.path }}" -# with_items: "{{ found_debs.files }}" +- name: Get the candidate version of pro tool + shell: "set -o pipefail && apt-cache policy ubuntu-advantage-tools | grep Candidate |awk '{print $NF}'" + args: + executable: /bin/bash + register: candidate_version + changed_when: false + +- debug: + msg: "pro tool candidate version is {{ candidate_version.stdout }}" + +- name: Get the Installed version of pro tool + shell: "set -o pipefail && apt-cache policy ubuntu-advantage-tools | grep Installed |awk '{print $NF}'" + args: + executable: /bin/bash + register: installed_version + changed_when: false + +- debug: + msg: "pro tool installed version is {{ installed_version.stdout }}" + +- name: Install the pro tool + ansible.builtin.apt: + name: ubuntu-advantage-tools={{ candidate_version.stdout }} + state: present + when: installed_version.stdout != candidate_version.stdout + +- name: Configure the proxy settings for pro tool + command: "{{item}}" + with_items: + - pro config set http_proxy={{ http_proxy }} + - pro config set https_proxy={{ https_proxy }} + - pro config set apt_http_proxy={{ http_proxy }} + - pro config set apt_https_proxy={{ https_proxy }} + - pro config set ua_apt_http_proxy={{ http_proxy }} + - pro config set ua_apt_https_proxy={{ https_proxy }} + changed_when: false + +- name: Attach Ubuntu Pro token + command: "pro attach {{ ubuntu_pro_token }}" + no_log: True + changed_when: false + +- name: Enable Ubuntu RT kernel install + command: "echo y | pro enable realtime-kernel --access-only" + changed_when: false + +- name: 
Install Ubuntu RT kernel and related packages + ansible.builtin.apt: + name: + - linux-image-{{ realtime_kernel_version }} + - linux-headers-{{ realtime_kernel_version }} + - linux-modules-extra-{{ realtime_kernel_version }} + - linux-tools-{{ realtime_kernel_version }} + - linux-cloud-tools-{{ realtime_kernel_version }} + state: present + +- name: Fetch kernel first entry + shell: "set -o pipefail && cat /boot/grub/grub.cfg | grep submenu | awk -F \"'\" '{print $2}'" + args: + executable: /bin/bash + register: kernel_fisrt_entry + failed_when: kernel_fisrt_entry.rc > 1 + changed_when: false + +- debug: msg={{ kernel_fisrt_entry.stdout }} + +- name: Fetch kernel second entry + shell: "set -o pipefail && cat /boot/grub/grub.cfg | grep menuentry | grep {{ realtime_kernel_version }} | grep -v recovery | awk -F \"'\" '{print $2}'" + args: + executable: /bin/bash + register: kernel_second_entry + failed_when: kernel_second_entry.rc > 1 + changed_when: false + +- debug: msg={{ kernel_second_entry.stdout }} + +- name: Set RT kernel(2-level entries) as default boot kernel + lineinfile: + path: /etc/default/grub + regexp: "^GRUB_DEFAULT" + line: GRUB_DEFAULT="{{ kernel_fisrt_entry.stdout }}>{{ kernel_second_entry.stdout }}" + when: kernel_fisrt_entry.stdout != "" + +- name: Set RT kernel(1-level entry) as default boot kernel + lineinfile: + path: /etc/default/grub + regexp: "^GRUB_DEFAULT" + line: GRUB_DEFAULT="{{ kernel_second_entry.stdout }}" + when: kernel_fisrt_entry.stdout == "" + +- name: Update boot configure + command: "update-grub" + changed_when: false + +- name: Reboot to updated kernel + reboot: + reboot_timeout: 1200 diff --git a/roles/bootstrap/reset_qat_option/tasks/main.yml b/roles/bootstrap/reset_qat_option/tasks/main.yml new file mode 100644 index 00000000..19710959 --- /dev/null +++ b/roles/bootstrap/reset_qat_option/tasks/main.yml @@ -0,0 +1,32 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# workaround for Redhat9.2 qat OOT(out-of-tree) not supported issue. +- name: block to force QAT InTree driver for RedHat9.2 OS + block: + - name: print warning message for users to force QAT InTree driver via reset the update_qat_drivers to false + ansible.builtin.debug: + msg="[warning:] RedHat9.2 only support intree driver currently, reset the update_qat_drivers to false" + + - name: reset the update_qat_drivers to false + ansible.builtin.set_fact: + update_qat_drivers: false + when: + - configure_qat | default(false) | bool + - update_qat_drivers | default(false) | bool + - ansible_os_family == "RedHat" + - ansible_distribution_version >= '9.2' + - configured_arch in ["spr", "emr"] diff --git a/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml b/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml index e09977c4..e1c059c1 100644 --- a/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml +++ b/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml @@ -109,15 +109,46 @@ - ansible_processor_cores == 56 - intel_flexran_type == "host" -- name: set Intel FlexRAN kernel flags for Docker POD on Host-32c-single (6338N CPU). See https://hub.docker.com/r/intel/flexran_vdu +- name: >- + set Intel FlexRAN kernel flags for Docker POD on Host-32c-single (6338N CPU) when kernel version is older than 5.15.0-1019RT. 
+ See https://hub.docker.com/r/intel/flexran_vdu set_fact: - intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 audit=0 cgroup_disable=memory mce=off hugepagesz=1G hugepages=60 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0,28 irqaffinity=0,28" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_cmdline: >- + GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 + audit=0 cgroup_disable=memory mce=off hugepagesz=1G hugepages=60 hugepagesz=2M hugepages=0 default_hugepagesz=1G + kthread_cpus=0,28 irqaffinity=0,28" {{ intel_flexran_marker }} intel_flexran_isol_cores: "1-27,29-55" intel_flexran_cpu_supported: true when: - ansible_processor_count == 1 - ansible_processor_cores == 32 - intel_flexran_type == "pod" + - ansible_kernel < "5.15.0-1019-realtime" + +# For 5.15.0-1019RT and later, cgroup_disable=memory is no longer needed. +# RKE2 cannot be installed with cgroup_disable=memory, so use 5.15.0-1019RT or later for FlexRAN deployment on RKE2. +- name: >- + set Intel FlexRAN kernel flags for Docker POD on Host-32c-single (6338N CPU) when kernel version is 5.15.0-1019RT and later. 
+ See https://hub.docker.com/r/intel/flexran_vdu + set_fact: + intel_flexran_cmdline: >- + GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 + audit=0 mce=off hugepagesz=1G hugepages=60 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0,28 irqaffinity=0,28" + {{ intel_flexran_marker }} + intel_flexran_isol_cores: "1-27,29-55" + intel_flexran_cpu_supported: true + when: + - ansible_processor_count == 1 + - ansible_processor_cores == 32 + - intel_flexran_type == "pod" + - ansible_kernel >= "5.15.0-1019-realtime" + +# This is for DSA when we enable it with FlexRAN at the same time +- name: add sm_on in iommu to be compatible with DSA requirements + set_fact: + intel_flexran_cmdline: "{{ intel_flexran_cmdline | replace('intel_iommu=on', 'intel_iommu=on,sm_on') }}" + when: + - configure_dsa_devices is defined and configure_dsa_devices - debug: msg="final kernel cmdline is {{ intel_flexran_cmdline }}" diff --git a/roles/net_attach_defs_create/defaults/main.yml b/roles/bootstrap/set_pcie_kernel_flags/defaults/main.yml similarity index 95% rename from roles/net_attach_defs_create/defaults/main.yml rename to roles/bootstrap/set_pcie_kernel_flags/defaults/main.yml index 4bba0611..7c9cc779 100644 --- a/roles/net_attach_defs_create/defaults/main.yml +++ b/roles/bootstrap/set_pcie_kernel_flags/defaults/main.yml @@ -14,4 +14,4 @@ ## limitations under the License. ## --- -cndp_cni_version: "0.3.0" +pcie_marker: "# pcie flag" diff --git a/roles/bootstrap/set_pcie_kernel_flags/tasks/main.yml b/roles/bootstrap/set_pcie_kernel_flags/tasks/main.yml new file mode 100644 index 00000000..d27a80ed --- /dev/null +++ b/roles/bootstrap/set_pcie_kernel_flags/tasks/main.yml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: set pcie kernel flags + ansible.builtin.set_fact: + pcie_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} pcie=realloc " {{ pcie_marker }}' + +- name: set fpga kernel flags in /etc/default/grub + ansible.builtin.lineinfile: + dest: /etc/default/grub + regexp: '^GRUB_CMDLINE_LINUX="\${GRUB_CMDLINE_LINUX}(.*?)" {{ pcie_marker }}$' + line: '{{ pcie_cmdline }}' + state: present + mode: 0664 + notify: + - reboot server diff --git a/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml b/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml index b106459c..387dbfdf 100644 --- a/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml +++ b/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml @@ -28,18 +28,6 @@ (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "21.04") or (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.4") -- name: set vfio default kernel flags for noiommu mode - set_fact: - vfio_noiommu_cmdline: "" - -- name: set vfio kernel flags for noiommu mode - set_fact: - vfio_noiommu_cmdline: " vfio.enable_unsafe_noiommu_mode=1" - when: - - qat_devices is defined and (qat_devices|length>0) - - install_dpdk | default(false) - - on_vms is defined and on_vms - - name: set noiommu default kernel flags set_fact: iommu_cmdline: "" @@ -52,7 +40,7 @@ - name: set sriov kernel flags set_fact: - sriov_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX}{{ iommu_cmdline }}{{ vfio_noiommu_cmdline }}{{ vfio_cmdline }}" {{ 
sriov_marker }}' + sriov_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX}{{ iommu_cmdline }}{{ vfio_cmdline }}" {{ sriov_marker }}' - name: set sriov kernel flags in /etc/default/grub lineinfile: diff --git a/roles/bootstrap/update_nic_drivers/defaults/main.yml b/roles/bootstrap/update_nic_drivers/defaults/main.yml index 45548a5a..f659c6c9 100644 --- a/roles/bootstrap/update_nic_drivers/defaults/main.yml +++ b/roles/bootstrap/update_nic_drivers/defaults/main.yml @@ -16,21 +16,19 @@ --- # i40e i40e_driver_name: i40e -i40e_driver_version: 2.22.8 +i40e_driver_version: 2.22.18 i40e_driver_url: https://sourceforge.net/projects/e1000/files/i40e%20stable/{{ i40e_driver_version }}/i40e-{{ i40e_driver_version }}.tar.gz -i40e_driver_checksum: sha1:9ae9a51b8d16f5d6ea9a817de5d3f37eb96101a1 +i40e_driver_checksum: sha1:0c94bd91014a0d81bd6b99fb41d0e4f1c12b09ff # ice ice_driver_name: ice -ice_driver_version: 1.10.1.2.2 +ice_driver_version: 1.11.14 ice_driver_url: https://sourceforge.net/projects/e1000/files/ice%20stable/{{ ice_driver_version }}/ice-{{ ice_driver_version }}.tar.gz -ice_driver_checksum: sha1:a71d0497307b462059b5819cf8686b2f9361a930 +ice_driver_checksum: sha1:730cd04fcfd0ba1b33ba21aaf671d0e1654c999a # iavf iavf_driver_name: iavf -# iavf_driver_version: 4.6.1 -# iavf_driver_url: https://sourceforge.net/projects/e1000/files/iavf%20stable/{{ iavf_driver_version }}/iavf-{{ iavf_driver_version }}.tar.gz -# iavf_driver_checksum: sha1:7102e6fcb6271f6cb14bcd9e64eccc58fcafd788 -iavf_driver_version: 4.7.0 -iavf_driver_url: https://downloadmirror.intel.com/762473/iavf-{{ iavf_driver_version }}.tar.gz -iavf_driver_checksum: sha1:999897D953B82F36BCD3DA46261AC7E491A3DB9D +iavf_driver_version: 4.8.2 +# iavf_driver_url: https://downloadmirror.intel.com/772532/iavf-{{ iavf_driver_version }}.tar.gz +iavf_driver_url: https://sourceforge.net/projects/e1000/files/iavf%20stable/{{ iavf_driver_version }}/iavf-{{ iavf_driver_version }}.tar.gz +iavf_driver_checksum: 
sha1:fcc997aebeee3744e621e0fd3290205bd18f6a45 diff --git a/roles/bootstrap/update_nic_drivers/tasks/i40e.yml b/roles/bootstrap/update_nic_drivers/tasks/i40e.yml index 416403b1..c7325332 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/i40e.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/i40e.yml @@ -23,6 +23,10 @@ - debug: msg: "Currently installed i40e version: {{ i40e_installed_version.stdout }}" +- name: set i40e driver build status + set_fact: + i40e_driver_build_failed: false + - name: unload i40e module modprobe: name: i40e @@ -65,6 +69,14 @@ with_items: - clean - install + rescue: + - name: handle driver build error + debug: + msg: "i40e driver build or installation failed. Rolling back to use inbox driver - functionality might be limited" + + - name: set i40e driver build failed status + set_fact: + i40e_driver_build_failed: true when: i40e_installed_version.stdout != i40e_driver_version - name: reboot node after driver update @@ -72,6 +84,7 @@ reboot: reboot_timeout: 1200 when: + - not i40e_driver_build_failed - (i40e_installed_version.stdout != i40e_driver_version and mgmt_interface_driver.stdout == i40e_driver_name) or (i40e_installed_version.stdout != i40e_driver_version and ((ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0") or (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or update_kernel)) diff --git a/roles/bootstrap/update_nic_drivers/tasks/iavf.yml b/roles/bootstrap/update_nic_drivers/tasks/iavf.yml index 9ab13c9b..0c995c5b 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/iavf.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/iavf.yml @@ -23,6 +23,10 @@ - debug: msg: "Currently installed iavf version: {{ iavf_installed_version.stdout }}" +- name: set iavf driver build status + set_fact: + iavf_driver_build_failed: false + - name: unload iavf module modprobe: name: iavf @@ -62,6 +66,14 @@ with_items: - clean - install + rescue: + - name: handle driver build error + debug: + msg: 
"iavf driver build or installation failed. Rolling back to use inbox driver - functionality might be limited" + + - name: set iavf driver build failed status + set_fact: + iavf_driver_build_failed: true when: iavf_installed_version.stdout != iavf_driver_version - name: reboot if driver is used by management interface @@ -69,6 +81,7 @@ reboot: reboot_timeout: 1200 # wait up to 20 minutes when: + - not iavf_driver_build_failed - mgmt_interface_driver.stdout == iavf_driver_name - iavf_installed_version.stdout != iavf_driver_version diff --git a/roles/bootstrap/update_nic_drivers/tasks/ice.yml b/roles/bootstrap/update_nic_drivers/tasks/ice.yml index 32107849..1f2ac94c 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/ice.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/ice.yml @@ -23,6 +23,10 @@ - debug: msg: "Currently installed ice version: {{ ice_installed_version.stdout }}" +- name: set ice driver build status + set_fact: + ice_driver_build_failed: false + # unloading before update is probably not necessay and does not work anyway when irdma is using ice # - name: unload ice module # modprobe: @@ -69,7 +73,7 @@ - install when: not adq_dp.enabled |d(false) | bool - - name: build and install ice driver + - name: build and install ice driver with adq make: chdir: "{{ (ice_untar.dest, ice_untar.files[0], 'src') | path_join }}" target: "{{ item }}" @@ -80,6 +84,14 @@ - clean - install when: adq_dp.enabled |d(false) | bool + rescue: + - name: handle driver build error + debug: + msg: "ice driver build or installation failed. 
Rolling back to use inbox driver - functionality might be limited" + + - name: set ice driver build failed status + set_fact: + ice_driver_build_failed: true when: ice_installed_version.stdout != ice_driver_version - name: reboot node after driver update @@ -87,6 +99,7 @@ reboot: reboot_timeout: 1200 when: + - not ice_driver_build_failed - (ice_installed_version.stdout != ice_driver_version and mgmt_interface_driver.stdout == ice_driver_name) or (ice_installed_version.stdout != ice_driver_version and ((ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0") or (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or update_kernel)) diff --git a/roles/cadvisor_install/defaults/main.yaml b/roles/cadvisor_install/defaults/main.yaml index 8bc545d4..ee9407fb 100644 --- a/roles/cadvisor_install/defaults/main.yaml +++ b/roles/cadvisor_install/defaults/main.yaml @@ -16,10 +16,14 @@ --- cadvisor_application_name: "cadvisor" # cAdvisor Main application name cadvisor_release_name: "cadvisor" # cAdvisor Helm Charts release name -perf_events_config_filename: "sample-perf-event.json" + +cadvisor_image: gcr.io/cadvisor/cadvisor # cAdvisor Docker image +cadvisor_image_version: 0.44.0 # cAdvisor Version cadvisor_helm_repo_url: "https://ckotzbauer.github.io/helm-charts" # cAdvisor Helm Repo URL -cadvisor_helm_chart_repo_name: "ckotzbauer" # cAdvisor Repo Name -cadvisor_helm_charts_ref: "ckotzbauer/cadvisor" # cAdvisor Helm Chart Reference -cadvisor_helm_charts_version: "2.2.4" # cAdvisor Version -cadvisor_helm_release_namespace: "kube-system" # cAdvisor Namespace +cadvisor_helm_repo_name: "ckotzbauer" # cAdvisor Repo Name +cadvisor_helm_chart_ref: "ckotzbauer/cadvisor" # cAdvisor Helm Chart Reference +cadvisor_helm_chart_version: "2.2.4" # cAdvisor Helm Chart Version +cadvisor_namespace: "cadvisor" # cAdvisor Namespace + +cadvisor_perf_config_filename: "perf-events.json" diff --git a/roles/cadvisor_install/files/pik-perf-event.json 
b/roles/cadvisor_install/files/pik-perf-event.json new file mode 100644 index 00000000..a34cd3f8 --- /dev/null +++ b/roles/cadvisor_install/files/pik-perf-event.json @@ -0,0 +1,12 @@ +{ + "core": { + "events": ["INST_RETIRED.ANY"], + "custom_events": [ + { + "config": ["0xc0"], + "name": "INST_RETIRED.ANY", + "type": 4 + } + ] + } +} diff --git a/roles/cadvisor_install/files/sample-perf-event.json b/roles/cadvisor_install/files/sample-perf-event.json index aad2f1a8..47d4ffb3 100644 --- a/roles/cadvisor_install/files/sample-perf-event.json +++ b/roles/cadvisor_install/files/sample-perf-event.json @@ -1,17 +1,16 @@ { - "core": { - "events": [ - "LLC-load-misses" - ], - "custom_events": [ - { - "type": 3, - "config": [ - "0x10002" - ], - "name": "LLC-load-misses" - } - ] - } + "core": { + "events": [ + "LLC-load-misses" + ], + "custom_events": [ + { + "type": 3, + "config": [ + "0x10002" + ], + "name": "LLC-load-misses" + } + ] } - \ No newline at end of file +} diff --git a/roles/cadvisor_install/tasks/cadvisor_install.yml b/roles/cadvisor_install/tasks/cadvisor_install.yml deleted file mode 100644 index 25d78bbb..00000000 --- a/roles/cadvisor_install/tasks/cadvisor_install.yml +++ /dev/null @@ -1,70 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## ---- -- name: check cAdvisor Helm charts directory - stat: - path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" - register: cadvisor_dir - -- name: create cAdvisor Helm charts directory if needed - file: - path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" - state: directory - mode: 0755 - when: - - cadvisor_dir.stat.exists is defined and not cadvisor_dir.stat.exists - -- name: check cAdvisor Helm charts temp directory. - stat: - path: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" - register: cadvisor_temp_dir - -- name: create the temp folder for cAdvisor custom values - file: - path: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" - state: directory - mode: 0755 - when: - - cadvisor_temp_dir.stat.exists is defined and not cadvisor_temp_dir.stat.exists - -- name: copy {{ perf_events_config_filename }} - copy: - src: "{{ (role_path, 'files', perf_events_config_filename) | path_join }}" - dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" - mode: preserve - -- name: populate cAdvisor Helm charts values template and push to controller node - template: - src: "cadvisor_custom_values.yml.j2" - dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp', 'cadvisor-custom-values.yml') | path_join }}" - force: yes - mode: preserve - -- name: Add "{{ cadvisor_application_name }}" Helm Chart Repository - command: >- - helm repo add "{{ cadvisor_helm_chart_repo_name }}" "{{ cadvisor_helm_repo_url }}" - changed_when: true - -- name: Deploy {{ cadvisor_helm_charts_version }} of {{ cadvisor_application_name }} - command: >- - helm install - {{ cadvisor_release_name }} - {{ cadvisor_helm_charts_ref }} - --version={{ cadvisor_helm_charts_version }} - --namespace {{ cadvisor_helm_release_namespace }} - --create-namespace - -f {{ (project_root_dir, 'charts', 'cadvisor', 'temp', 'cadvisor-custom-values.yml') | path_join }} - changed_when: true diff --git 
a/roles/cadvisor_install/tasks/cleanup_cadvisor.yml b/roles/cadvisor_install/tasks/cleanup.yml similarity index 53% rename from roles/cadvisor_install/tasks/cleanup_cadvisor.yml rename to roles/cadvisor_install/tasks/cleanup.yml index 1d336398..de980224 100644 --- a/roles/cadvisor_install/tasks/cleanup_cadvisor.yml +++ b/roles/cadvisor_install/tasks/cleanup.yml @@ -14,20 +14,23 @@ ## limitations under the License. ## --- -- block: - - name: delete cAdvisor Helm Charts - command: >- - helm delete {{ cadvisor_release_name }} --namespace {{ cadvisor_helm_release_namespace }} - when: - - inventory_hostname == groups['kube_control_plane'][0] - changed_when: false - failed_when: false - - name: delete cAdvisor Helm Repo - command: >- - helm repo remove {{ cadvisor_helm_chart_repo_name }} - when: - - inventory_hostname == groups['kube_control_plane'][0] - changed_when: false - failed_when: false +- name: Cleanup cAdvisor + when: + - inventory_hostname == groups['kube_control_plane'][0] + block: + - name: Uninstall cAdvisor Helm Chart + kubernetes.core.helm: + name: "{{ cadvisor_release_name }}" + namespace: "{{ cadvisor_namespace }}" + state: absent + - name: Remove cAdvisor Helm Repo + kubernetes.core.helm_repository: + name: "{{ cadvisor_helm_repo_name }}" + state: absent tags: - cadvisor + +- name: Delete cAdvisor directory + ansible.builtin.file: + path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" + state: absent diff --git a/roles/cadvisor_install/tasks/install.yml b/roles/cadvisor_install/tasks/install.yml new file mode 100644 index 00000000..0dc47595 --- /dev/null +++ b/roles/cadvisor_install/tasks/install.yml @@ -0,0 +1,37 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: template cAdvisor Helm chart values and push to controller node + ansible.builtin.template: + src: "cadvisor_custom_values.yml.j2" + dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'cadvisor-custom-values.yml') | path_join }}" + mode: 0644 + +- name: Add Helm Repository - {{ cadvisor_helm_repo_url }} + kubernetes.core.helm_repository: + name: "{{ cadvisor_helm_repo_name }}" + url: "{{ cadvisor_helm_repo_url }}" + state: present + +- name: Deploy cAdvisor + kubernetes.core.helm: + name: "{{ cadvisor_release_name }}" + chart_ref: "{{ cadvisor_helm_chart_ref }}" + chart_version: "{{ cadvisor_helm_chart_version }}" + namespace: "{{ cadvisor_namespace }}" + create_namespace: true + values_files: "{{ (project_root_dir, 'charts', 'cadvisor', 'cadvisor-custom-values.yml') | path_join }}" + wait: true diff --git a/roles/cadvisor_install/tasks/main.yml b/roles/cadvisor_install/tasks/main.yml index 14a0d71f..f9904aa1 100644 --- a/roles/cadvisor_install/tasks/main.yml +++ b/roles/cadvisor_install/tasks/main.yml @@ -14,8 +14,22 @@ ## limitations under the License. 
## --- +- name: create cAdvisor Helm chart directory + ansible.builtin.file: + path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" + state: directory + mode: 0755 + +- name: Check if perf events config enabled + ansible.builtin.set_fact: + cadvisor_perf_events: true + when: cadvisor_sample_perf_events_enabled or cadvisor_pik_perf_events_enabled | default(false) + +- name: Prepare perf events config for all nodes + ansible.builtin.import_tasks: perf_events_config.yml + when: cadvisor_perf_events | default(false) + - name: install cAdvisor Helm charts - import_tasks: cadvisor_install.yml + ansible.builtin.import_tasks: install.yml when: - - cadvisor_enabled is defined and cadvisor_enabled - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/cadvisor_install/tasks/perf_events_config.yml b/roles/cadvisor_install/tasks/perf_events_config.yml new file mode 100644 index 00000000..791cca22 --- /dev/null +++ b/roles/cadvisor_install/tasks/perf_events_config.yml @@ -0,0 +1,43 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +- name: create the perf_config folder for cAdvisor custom perf events + ansible.builtin.file: + path: "{{ (project_root_dir, 'charts', 'cadvisor', 'perf_config') | path_join }}" + state: directory + mode: 0755 + +- name: load sample perf events json file + ansible.builtin.set_fact: + sample_perf_conf_json: "{{ lookup('file', 'sample-perf-event.json' ) | from_json }}" + when: cadvisor_sample_perf_events_enabled + +- name: load perf events json file supplied for PIK + ansible.builtin.set_fact: + pik_perf_conf_json: "{{ lookup('file', 'pik-perf-event.json') | from_json }}" + when: cadvisor_pik_perf_events_enabled | default(false) + +- name: construct perf events config + vars: + sample_perf: "{{ sample_perf_conf_json | default({}) }}" + pik_perf: "{{ pik_perf_conf_json | default({}) }}" + ansible.builtin.set_fact: + perf_conf_json: "{{ sample_perf | combine(pik_perf, recursive=true, list_merge='append_rp') | to_json(indent=2) }}" + +- name: create perf events configuration json file + ansible.builtin.copy: + content: "{{ perf_conf_json }}" + dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'perf_config', cadvisor_perf_config_filename) | path_join }}" + mode: 0644 diff --git a/roles/cadvisor_install/tasks/preflight.yml b/roles/cadvisor_install/tasks/preflight.yml new file mode 100644 index 00000000..eeb86ab9 --- /dev/null +++ b/roles/cadvisor_install/tasks/preflight.yml @@ -0,0 +1,26 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Check requirements for perf events metrics + ansible.builtin.assert: + that: + - not on_vms | default(false) + - not on_cloud | default(false) + fail_msg: + cAdvisor perf events counting can only be enabled on BMRA. + Please disable perf events counting in group_vars. + when: + - inventory_hostname == groups["kube_control_plane"][0] + - cadvisor_pik_perf_events_enabled | default(false) or cadvisor_sample_perf_events_enabled | default(false) diff --git a/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 b/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 index 8f692f8f..81df240d 100644 --- a/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 +++ b/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 @@ -1,6 +1,6 @@ image: - repository: gcr.io/cadvisor/cadvisor - tag: v0.44.0 + repository: {{ cadvisor_image }} + tag: v{{ cadvisor_image_version }} pullPolicy: IfNotPresent ## Reference to one or more secrets to be used when pulling images @@ -16,11 +16,16 @@ container: - --event_storage_event_limit=default=0 - --event_storage_age_limit=default=0 - --disable_metrics=percpu,process,sched,tcp,udp # enable only diskIO, cpu, memory, network, disk - {% if cadvisor_custom_events_config_on | default(false) -%} - - --perf_events_config={{ (project_root_dir, 'charts', 'cadvisor', 'temp', perf_events_config_filename) | path_join }} + {% if cadvisor_perf_events | default(false) -%} + - --perf_events_config={{ ('/mnt/perf-config', cadvisor_perf_config_filename) | path_join }} {% endif -%} - --docker_only hostPaths: + {% if cadvisor_perf_events | default(false) -%} + - name: custom-events + path: "{{ (project_root_dir, 'charts', 'cadvisor', 'perf_config') | path_join }}" + mount: "/mnt/perf-config" + {% endif -%} - name: varrun path: "/var/run" - name: sys @@ -63,8 +68,8 @@ podSecurityPolicy: # Specifies whether a 
securityContext should be created. Required for privileged operations. podSecurityContext: - create: false - privileged: false + create: true + privileged: true nodeSelector: {} diff --git a/roles/check_machine_type/tasks/check_machine_type.yml b/roles/check_machine_type/tasks/check_machine_type.yml new file mode 100644 index 00000000..300b6b7c --- /dev/null +++ b/roles/check_machine_type/tasks/check_machine_type.yml @@ -0,0 +1,66 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: set CPU ID + ansible.builtin.set_fact: + cpu_id: "{{ ansible_processor[2] | regex_search('\\$?\\d\\d\\d\\d\\%?\\@?\\w?|\\d\\d/\\d\\w') }}" # noqa jinja[spacing] + +- name: print CPU ID + ansible.builtin.debug: + msg: "CPU ID: {{ cpu_id }}" + +- name: check if CPU has confirmed support + ansible.builtin.assert: + that: "cpu_id in {{ lookup('ansible.builtin.vars', 'confirmed_' + configured_arch + '_cpus') }} \ + {% if configured_arch == 'clx' %} or cpu_id in {{ confirmed_clx_ncpus }} {% endif %} \ + or cpu_id in {{ unconfirmed_cpu_models }}" + fail_msg: + "CPU model '{{ cpu_id }}' present on target is not in the confirmed CPUs list.\n + To proceed, please add '{{ cpu_id }}' to the list of unconfirmed CPUs in variable 'unconfirmed_cpu_models' in group_vars.\n + Please be aware that by using CPU model that is not confirmed, some features may not work properly." 
+ +- name: set skl, icx, clx, spr to false + ansible.builtin.set_fact: + is_skl: false + is_clx: false + is_clx_ncpu: false + is_icx: false + is_spr: false + +- name: set is_skl architecture variable + ansible.builtin.set_fact: + is_skl: true + when: cpu_id in confirmed_skl_cpus + +- name: set is_clx architecture variable + ansible.builtin.set_fact: + is_clx: true + when: cpu_id in confirmed_clx_cpus + +- name: set is_icx architecture variable + ansible.builtin.set_fact: + is_icx: true + when: cpu_id in confirmed_icx_cpus + +- name: set is_spr architecture variable + ansible.builtin.set_fact: + is_spr: true + when: cpu_id in confirmed_spr_cpus + +- name: check if clx_ncpu mode + ansible.builtin.set_fact: + is_clx_ncpu: true + when: cpu_id in confirmed_clx_ncpus diff --git a/roles/check_machine_type/tasks/main.yml b/roles/check_machine_type/tasks/main.yml index 6abdaa44..2fa8dcf5 100644 --- a/roles/check_machine_type/tasks/main.yml +++ b/roles/check_machine_type/tasks/main.yml @@ -14,53 +14,16 @@ ## limitations under the License. ## --- -- name: set CPU ID - set_fact: - cpu_id: "{{ ansible_processor[2] | regex_search('\\$?\\d\\d\\d\\d\\%?\\@?\\w?|\\d\\d/\\d\\w') }}" # noqa jinja[spacing] - -- name: print CPU ID - debug: - msg: "CPU ID: {{ cpu_id }}" - -- name: check if CPU has confirmed support - assert: - that: "cpu_id in {{ lookup('ansible.builtin.vars', 'confirmed_' + configured_arch + '_cpus') }} \ - {% if configured_arch == 'clx' %} or cpu_id in {{ confirmed_clx_ncpus }} {% endif %} \ - or cpu_id in {{ unconfirmed_cpu_models }}" - fail_msg: - "CPU model '{{ cpu_id }}' present on target is not in the confirmed CPUs list.\n - To proceed, please add '{{ cpu_id }}' to the list of unconfirmed CPUs in variable 'unconfirmed_cpu_models' in group_vars.\n - Please be aware that by using CPU model that is not confirmed, some features may not work properly." 
- -- name: set skl, icx, clx, spr to false - set_fact: - is_skl: false - is_clx: false - is_clx_ncpu: false - is_icx: false - is_spr: false - -- name: set is_skl architecture variable - set_fact: - is_skl: true - when: cpu_id in confirmed_skl_cpus - -- name: set is_clx architecture variable - set_fact: - is_clx: true - when: cpu_id in confirmed_clx_cpus - -- name: set is_icx architecture variable - set_fact: - is_icx: true - when: cpu_id in confirmed_icx_cpus - -- name: set is_spr architecture variable - set_fact: - is_spr: true - when: cpu_id in confirmed_spr_cpus - -- name: check if clx_ncpu mode - set_fact: - is_clx_ncpu: true - when: cpu_id in confirmed_clx_ncpus +- name: determine machine type for BM + ansible.builtin.include_tasks: check_machine_type.yml + when: + - inventory_hostname in groups['kube_node'] + - not vm_enabled | default (false) + - not on_vms | default (false) + +- name: determine machine type for VM + ansible.builtin.include_tasks: check_machine_type.yml + when: + - inventory_hostname in groups['vm_host'] + - vm_enabled | default (false) + - not on_vms | default (false) diff --git a/roles/check_machine_type/vars/main.yml b/roles/check_machine_type/vars/main.yml index d6d74c8b..461a6154 100644 --- a/roles/check_machine_type/vars/main.yml +++ b/roles/check_machine_type/vars/main.yml @@ -14,15 +14,33 @@ ## limitations under the License. 
## --- +confirmed_atom_cpus: + - "x6200FE" + - "x6212RE" + - "x6414RE" + - "x6425RE" + - "x6427FE" + +confirmed_core_cpus: + - "1115GRE" + - "1145GRE" + - "1185GRE" + - "12100E" + - "12500E" + - "12700E" + - "12900E" + confirmed_skl_cpus: # Sky Lake Xeon Gold (quad) - "6152" # Sky Lake Xeon Platinum (octa) - "8176" + confirmed_clx_ncpus: - "5218N" - "6252N" - "6230N" + confirmed_clx_cpus: - "6252" @@ -49,3 +67,6 @@ confirmed_spr_cpus: - "8487C" - "8490H" - "0000%@" + +confirmed_emr_cpus: + - "0000" diff --git a/roles/cluster_defaults/defaults/main.yml b/roles/cluster_defaults/defaults/main.yml index 68905393..b0f09bee 100644 --- a/roles/cluster_defaults/defaults/main.yml +++ b/roles/cluster_defaults/defaults/main.yml @@ -36,5 +36,5 @@ proxy_env: {} registry_containerd: "/var/lib/kubelet/config.json" kube_rbac_proxy_image_repo: "quay.io/brancz/kube-rbac-proxy" -kube_rbac_proxy_image_tag: "v0.14.0" +kube_rbac_proxy_image_tag: "v0.14.1" kube_rbac_proxy_tls_ciphers: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305" # noqa yaml[line-length] diff --git a/roles/cluster_defaults/tasks/main.yml b/roles/cluster_defaults/tasks/main.yml index 9156f3a4..d26e3432 100644 --- a/roles/cluster_defaults/tasks/main.yml +++ b/roles/cluster_defaults/tasks/main.yml @@ -45,3 +45,9 @@ sysctl_file: "/etc/sysctl.d/99-sysctl.conf" state: present reload: yes + +- name: set kube_apiserver fact + ansible.builtin.set_fact: + kube_apiserver_cert: "{{ (kube_provisioner == 'kubespray') | ternary('/etc/kubernetes/ssl/ca.crt', '/var/lib/rancher/rke2/server/tls/client-ca.crt') }}" + kube_apiserver_key: "{{ (kube_provisioner == 'kubespray') | ternary('/etc/kubernetes/ssl/ca.key', '/var/lib/rancher/rke2/server/tls/client-ca.key') }}" + when: kubernetes diff --git a/roles/cndp_dp_install/tasks/main.yml 
b/roles/cndp_dp_install/tasks/main.yml deleted file mode 100644 index 5772b623..00000000 --- a/roles/cndp_dp_install/tasks/main.yml +++ /dev/null @@ -1,123 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: install dependencies - include_role: - name: install_dependencies - -- name: install libbpf on Ubuntu - include_tasks: "../../cndp_install/tasks/install_libbpf_{{ ansible_distribution | lower }}.yml" - when: - - ansible_distribution | lower == "ubuntu" - - inventory_hostname == groups['kube_control_plane'][0] - -- name: install libbpf for CNDP Device Plugin build on RadHat, Rocky - package: - name: libbpf-devel - state: present - when: - - ansible_os_family == "RedHat" - - inventory_hostname == groups['kube_control_plane'][0] - -- name: put additional CNDP labels for nodes - include_tasks: add_cndp_labels.yml - loop: "{{ groups['kube_node'] }}" - loop_control: - loop_var: node_name - when: inventory_hostname == groups['kube_control_plane'][0] - -- name: clone Intel CNDP Device Plugin repository - git: - repo: "{{ intel_cndp_dp_git_url }}" - dest: "{{ intel_cndp_dp_dir }}" - version: "{{ intel_cndp_dp_version }}" - update: yes - - when: - - inventory_hostname == groups['kube_control_plane'][0] - -- name: prepare CNDP containers images - block: - - name: clean up unused dependencies for golang - command: "go mod tidy" - args: - chdir: "{{ intel_cndp_dp_dir }}" - changed_when: true - 
- - name: build Intel CNDP Device Plugin - make: - target: build - chdir: "{{ intel_cndp_dp_dir }}" - when: - - inventory_hostname == groups['kube_control_plane'][0] - -- name: Build CNDP Device Plugin image - block: - - name: build Intel CNDP Device Plugin image when docker is used as container runtime - make: - target: image - chdir: "{{ intel_cndp_dp_dir }}" - - - name: tag Intel CNDP Device Plugin image when docker is used as container runtime - command: docker tag afxdp-device-plugin {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} - changed_when: true - - - name: push Intel CNDP Device Plugin image to local registry when docker is used as container runtime - command: docker push {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} - changed_when: true - when: - - inventory_hostname == groups['kube_control_plane'][0] - - container_runtime == "docker" - -- name: prepare CNDP containers images when containerd/cri-o is used as container runtime - block: - - name: build Intel CNDP Device Plugin image when containerd/cri-o is used as container runtime - command: podman build -t afxdp-device-plugin -f images/amd64.dockerfile . 
- changed_when: true - args: - chdir: "{{ intel_cndp_dp_dir }}" - - - name: tag Intel CNDP Device Plugin image when containerd/cri-o is used as container runtime - command: podman tag afxdp-device-plugin {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} - changed_when: true - - - name: push Intel CNDP Device Plugin image to local registry when containerd/cri-o is used as container runtime - command: podman push {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} - changed_when: true - - when: - - inventory_hostname == groups['kube_control_plane'][0] - - '"docker" not in container_runtime' - -- name: prepare and deploy Intel CNDP Device Plugin - vars: - cndp_k8s_manifest_dir: "{{ (project_root_dir, 'cndp_k8s_manifest') | path_join }}" - block: - - name: create directory for CNDP k8s manifest files - file: - path: "{{ cndp_k8s_manifest_dir }}" - state: directory - mode: "644" - - - include_tasks: cndp_device_plugin_deploy.yml - loop: - - configmap - - serviceaccount - - daemonset - loop_control: - loop_var: cndp_k8s_object - when: - - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/cndp_dp_install/templates/intel-cndp-plugin-configmap.yml.j2 b/roles/cndp_dp_install/templates/intel-cndp-plugin-configmap.yml.j2 deleted file mode 100644 index 75c1541d..00000000 --- a/roles/cndp_dp_install/templates/intel-cndp-plugin-configmap.yml.j2 +++ /dev/null @@ -1,35 +0,0 @@ -{%- set driver_pool = [] -%} -{%- for i_node in groups['kube_node'] -%} -{%- for i_pool in hostvars[i_node]['cndp_dp_pools'] -%} -{%- for i_attr, i_value in (i_pool | dict2items | selectattr("key", "in", ["drivers"]) | list | items2dict).items() -%} -{%- for vv in i_value|list -%} -{{ driver_pool.append(vv|string) }} -{%- endfor -%} -{%- endfor -%} -{%- endfor -%} -{%- endfor -%} ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: afxdp-dp-config - namespace: kube-system -data: - config.json: | - { - "logLevel": "debug", - "logFile": "afxdp-dp.log", - "pools": [ 
- { - "name": "raPool", - "mode": "primary", - "drivers": [ -{%- for pf_driver in driver_pool|unique %} - { - "name":{{pf_driver|tojson}} - }{% if not loop.last %},{% endif %} -{%- endfor %} - ] - } - ] - } diff --git a/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 b/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 deleted file mode 100644 index bb9b1178..00000000 --- a/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 +++ /dev/null @@ -1,84 +0,0 @@ ---- -apiVersion: apps/v1 -kind: DaemonSet -metadata: - name: kube-afxdp-device-plugin - namespace: kube-system - labels: - tier: node - app: afxdp -spec: - selector: - matchLabels: - name: afxdp-device-plugin - template: - metadata: - labels: - name: afxdp-device-plugin - tier: node - app: afxdp - spec: - hostNetwork: true - nodeSelector: - kubernetes.io/arch: amd64 - tolerations: - - key: node-role.kubernetes.io/master - operator: Exists - effect: NoSchedule - - key: node-role.kubernetes.io/control-plane - operator: Exists - effect: NoSchedule - serviceAccountName: afxdp-device-plugin - containers: - - name: kube-afxdp - image: {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} - imagePullPolicy: IfNotPresent - securityContext: - capabilities: - drop: - - all - add: - - SYS_ADMIN - - NET_ADMIN - resources: - requests: - cpu: "250m" - memory: "40Mi" - limits: - cpu: "1" - memory: "200Mi" - volumeMounts: - - name: unixsock - mountPath: /tmp/afxdp_dp/ - - name: devicesock - mountPath: /var/lib/kubelet/device-plugins/ - - name: resources - mountPath: /var/lib/kubelet/pod-resources/ - - name: config-volume - mountPath: /afxdp/config - - name: log - mountPath: /var/log/afxdp-k8s-plugins/ - - name: cnibin - mountPath: /opt/cni/bin/ - volumes: - - name: unixsock - hostPath: - path: /tmp/afxdp_dp/ - - name: devicesock - hostPath: - path: /var/lib/kubelet/device-plugins/ - - name: resources - hostPath: - path: /var/lib/kubelet/pod-resources/ - - name: 
config-volume - configMap: - name: afxdp-dp-config - items: - - key: config.json - path: config.json - - name: log - hostPath: - path: /var/log/afxdp-k8s-plugins/ - - name: cnibin - hostPath: - path: /opt/cni/bin/ diff --git a/roles/cndp_dp_install/templates/intel-cndp-plugin-serviceaccount.yml.j2 b/roles/cndp_dp_install/templates/intel-cndp-plugin-serviceaccount.yml.j2 deleted file mode 100644 index 85d13f05..00000000 --- a/roles/cndp_dp_install/templates/intel-cndp-plugin-serviceaccount.yml.j2 +++ /dev/null @@ -1,6 +0,0 @@ ---- -apiVersion: v1 -kind: ServiceAccount -metadata: - name: afxdp-device-plugin - namespace: kube-system diff --git a/roles/cndp_install/defaults/main.yml b/roles/cndp_install/defaults/main.yml deleted file mode 100644 index 2aa20b94..00000000 --- a/roles/cndp_install/defaults/main.yml +++ /dev/null @@ -1,31 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## ---- -intel_cndp_git_url: "https://github.com/CloudNativeDataPlane/cndp.git" -intel_cndp_version: "v22.08.0" -intel_cndp_dir: "{{ (project_root_dir, 'intel-cndp') | path_join }}" - -docker_bin_dir: "/usr/bin" - -# Drop-in docker service file what defines DOCKER_OPTS environment variable -docker_options_conf_file: "/etc/systemd/system/docker.service.d/docker-options.conf" -daemon_conf_file: "daemon.json" -daemon_conf_file_directory: "/etc/docker/{{ daemon_conf_file }}" -containerd_service_dir: "/etc/systemd/system/containerd.service.d" -containerd_options_conf_file: "{{ (containerd_service_dir, 'limits.conf') | path_join }}" -crio_service_dir: "/etc/systemd/system/crio.service.d" -crio_options_conf_file: "{{ (crio_service_dir, 'limits.conf') | path_join }}" -containerd_bin_dir: "/usr/bin" diff --git a/roles/cndp_install/files/daemon.json b/roles/cndp_install/files/daemon.json deleted file mode 100644 index 2556de3a..00000000 --- a/roles/cndp_install/files/daemon.json +++ /dev/null @@ -1,3 +0,0 @@ -{ - "live-restore": true -} diff --git a/roles/cndp_install/handlers/main.yml b/roles/cndp_install/handlers/main.yml deleted file mode 100644 index bff785b8..00000000 --- a/roles/cndp_install/handlers/main.yml +++ /dev/null @@ -1,72 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## ---- -- name: restart docker - command: /bin/true - notify: - - reload systemd - - reload docker - - restarted docker - - wait for docker - -- name: reload systemd - systemd: - daemon_reload: true - -- name: reload docker - service: - name: docker - state: reloaded - -- name: restarted docker - service: - name: docker - state: restarted - -- name: wait for docker - command: "{{ docker_bin_dir }}/docker images" - register: docker_ready - retries: 20 - delay: 1 - until: docker_ready.rc == 0 - -- name: restart containerd - command: /bin/true - notify: - - containerd | restart containerd - -- name: containerd | restart containerd - systemd: - name: containerd - state: restarted - enabled: yes - daemon-reload: yes - -- name: restart crio - command: /bin/true - notify: - - reload systemd - - reload crio - -- name: reload systemd - systemd: - daemon_reload: true - -- name: reload crio - service: - name: crio - state: restarted - enabled: yes diff --git a/roles/cndp_install/tasks/install_libbpf_ubuntu.yml b/roles/cndp_install/tasks/install_libbpf_ubuntu.yml deleted file mode 100644 index 87e0b435..00000000 --- a/roles/cndp_install/tasks/install_libbpf_ubuntu.yml +++ /dev/null @@ -1,63 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## ---- -- name: install libbpf from package - package: - name: - - libbpf-dev - state: present - when: - - ansible_distribution_version >= "20.10" - -- name: install libbpf from source - vars: - libbpf_git_url: "https://github.com/libbpf/libbpf.git" - libbpf_dir: "{{ (project_root_dir, 'libbpf') | path_join }}" - libbpf_version: "v0.6.1" - block: - - name: clone libbpf repo - git: - repo: "{{ libbpf_git_url }}" - dest: "{{ libbpf_dir }}" - version: "{{ libbpf_version }}" - force: yes - - - name: install libelf-dev for the dependency - apt: - name: - - libelf-dev - - pkg-config - state: present - - - name: build libbpf - make: - chdir: "{{ libbpf_dir }}/src" - - - name: install libbpf - make: - target: install - chdir: "{{ libbpf_dir }}/src" - - - name: Add /usr/lib64 to ldconfig - command: ldconfig - changed_when: true - - - name: Set cndp build environment - set_fact: - cndp_build_env: - PKG_CONFIG_PATH: "/usr/lib64/pkgconfig" - when: - - ansible_distribution_version < "20.10" diff --git a/roles/cndp_install/tasks/main.yml b/roles/cndp_install/tasks/main.yml deleted file mode 100644 index 213ae546..00000000 --- a/roles/cndp_install/tasks/main.yml +++ /dev/null @@ -1,176 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. 
-## ---- -- name: install dependencies - include_role: - name: install_dependencies - -- name: install libbpf - include_tasks: "install_libbpf_{{ ansible_distribution | lower }}.yml" - when: ansible_distribution | lower == "ubuntu" - -- name: clone Intel CNDP repository - block: - - name: clone Intel CNDP repository - git: - repo: "{{ intel_cndp_git_url }}" - version: "{{ intel_cndp_version }}" - dest: "{{ intel_cndp_dir }}" - force: yes - - - name: Set fact with correct dir - set_fact: - cndp_lib_dest_dir: "{{ project_root_dir | regex_replace('\\/$', '') }}" - - - name: Replace /tmp dir with {{ cndp_lib_dest_dir }} - replace: - path: "{{ item }}" - regexp: '^(.*)\/tmp(.*)$' - replace: '\1{{ cndp_lib_dest_dir }}\2' - with_items: - - "{{ (intel_cndp_dir, 'tools', 'mklib') | path_join }}.sh" - - "{{ (intel_cndp_dir, 'meson') | path_join }}.build" - - # Suggested from CNDP product for a SPR feature on RHEL / ROCKY >= 9.0 - # `uintr` flag from the compiler, the compiler is reporting the `uintr` flag when `uintr` is not supported - # CNDP team advised to remove this flag for Rocky / RHEL - # make install not working on RHEL / ROCKY >= 9.0 need to build it with meson (ninja install) - # Ansible builtin module replace / lineinfile not responding as expected so replacing sed command as an alternative - - name: delete lines from meson.build on RHEL / ROCKY - command: "sed -i '201d;202d;203d;204d;205d' meson.build" # noqa command-instead-of-module - args: - chdir: "{{ intel_cndp_dir }}" - changed_when: true - register: test - when: - - ansible_os_family == "RedHat" - - ansible_distribution_version >= "9.0" - - - name: block for install CNDP - block: - - name: build cndp - make: - chdir: "{{ intel_cndp_dir }}" - - - name: install cndp - make: - target: install - chdir: "{{ intel_cndp_dir }}" - when: - - cndp_enabled | default(false) - -# TODO: Check if docker daemon is managed by systemd; get option file from DropInPaths attribute using systemctl show -P DropInPaths docker 
command - -- name: set memlock limit for docker containers via docker-options.conf of docker.service - block: - - name: check if --default-ulimit memlock=-1:-1 option does not exist - lineinfile: - path: "{{ docker_options_conf_file }}" - state: absent - regex: '.*--default-ulimit memlock=-1:-1.*$' - check_mode: yes - register: memlock_ulimit_option - changed_when: not memlock_ulimit_option.changed # set changed to true when option does not exist - - - name: check if DOCKER_OPTS is formatted in single line in the file - lineinfile: - path: "{{ docker_options_conf_file }}" - state: absent - regex: '^Environment="DOCKER_OPTS=.*"$' - check_mode: yes - register: docker_opts_oneline - - - name: add --default-ulimit memlock=-1:-1 to DOCKER_OPTS when it is formatted in single line - lineinfile: - path: "{{ docker_options_conf_file }}" - backrefs: yes - regexp: '^(Environment="DOCKER_OPTS=.*)(")$' - line: '\1 --default-ulimit memlock=-1:-1\2' - when: - - memlock_ulimit_option.changed - - docker_opts_oneline.changed - notify: restart docker - - - name: check if DOCKER_OPTS is spread over multi lines in the file - lineinfile: - path: "{{ docker_options_conf_file }}" - state: absent - regex: '^Environment="DOCKER_OPTS=.*\\s*$' - check_mode: yes - register: docker_opts_multiline - - - name: add --default-ulimit memlock=-1:-1 to DOCKER_OPTS when it is spread over multi lines - lineinfile: - path: "{{ docker_options_conf_file }}" - insertafter: 'Environment="DOCKER_OPTS=' - line: '--default-ulimit memlock=-1:-1 \' - when: - - memlock_ulimit_option.changed - - docker_opts_multiline.changed - notify: restart docker - - - name: docker daemon.json configuration - copy: - src: "{{ daemon_conf_file }}" - dest: "{{ daemon_conf_file_directory }}" - mode: 0644 - notify: reload docker - when: - - cndp_enabled | default(false) - - container_runtime == "docker" - -- name: set memlock limit for containerd via limits.conf of containerd.service - block: - - name: check if limits.conf file 
exists - stat: - path: "{{ containerd_options_conf_file }}" - register: containerd_conf_file - - - name: create limits.conf when it doesn't exist - template: - src: limits.conf.j2 - dest: "{{ containerd_options_conf_file }}" - mode: 0644 - notify: restart containerd - when: not containerd_conf_file.stat.exists - - when: - - cndp_enabled | default(false) - - container_runtime == "containerd" - -- name: set memlock limit for cri-o via limits.conf of crio.service - block: - - name: check if limits.conf file exists - stat: - path: "{{ crio_options_conf_file }}" - register: crio_conf_file - - - name: create limits.conf when it doesn't exist - template: - src: limits.conf.j2 - dest: "{{ crio_options_conf_file }}" - mode: 0644 - notify: restart crio - when: not crio_conf_file.stat.exists - - when: - - cndp_enabled | default(false) - - container_runtime == "crio" - -- name: wait for resources to be ready - include_tasks: wait_for_resources.yml - when: - - cndp_enabled | default(false) diff --git a/roles/cndp_install/templates/limits.conf.j2 b/roles/cndp_install/templates/limits.conf.j2 deleted file mode 100644 index bf022036..00000000 --- a/roles/cndp_install/templates/limits.conf.j2 +++ /dev/null @@ -1,2 +0,0 @@ -[Service] -LimitMEMLOCK=infinity diff --git a/roles/collectd_install/defaults/main.yml b/roles/collectd_install/defaults/main.yml index f0037943..f58a6eaf 100644 --- a/roles/collectd_install/defaults/main.yml +++ b/roles/collectd_install/defaults/main.yml @@ -63,24 +63,7 @@ pkgpower_dir: "{{ (project_root_dir, 'commspowermanagement') | path_join }}" # currently excluded plugins were not delivered with latest stable # opnfv/barometer-collectd image (digest sha256:ed5c574f653e) collectd_plugins: - basic: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower - - unixsock - - network - - turbostat - # - write_http - # - smart - on_prem: + basic: &basic - logfile - cpu - cpufreq @@ -97,103 +80,29 @@ 
collectd_plugins: - turbostat # - write_http # - smart + on_prem: &on_prem + - *basic access: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower + - *basic - dpdk_telemetry - hugepages - - unixsock - - network - - turbostat - # - write_http - # - smart remote_fp: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower - - unixsock - - network - - turbostat - # - write_http - # - smart + - *basic regional_dc: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower - - unixsock - - network - - turbostat - # - write_http - # - smart - full_nfv: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower + - *basic + full_nfv: &full_nfv + - *basic - dpdk_telemetry - hugepages - ovs_events - ovs_pmd_stats - ovs_stats - - unixsock - - network - - turbostat - # - write_http - # - smart build_your_own: - - logfile - - cpu - - cpufreq - - disk - - ipmi - - numa - - ethstat - - netlink - - intel_pmu - - rdt - - pkgpower - - dpdk_telemetry - - hugepages - - ovs_events - - ovs_pmd_stats - - ovs_stats - - unixsock - - network - - turbostat - # - write_http - # - smart + - *full_nfv + on_prem_vss: + - *on_prem + on_prem_industrial: + - *on_prem # List of plugins that will be excluded from collectd deployment. 
exclude_collectd_plugins: [] diff --git a/roles/collectd_install/tasks/copy-configs.yml b/roles/collectd_install/tasks/copy-configs.yml index 37c2d035..9f8b759e 100644 --- a/roles/collectd_install/tasks/copy-configs.yml +++ b/roles/collectd_install/tasks/copy-configs.yml @@ -36,6 +36,20 @@ exclude_collectd_plugins: "{{ exclude_collectd_plugins + [ 'rdt' ] }}" when: not enable_intel_rdt_plugin +- name: set rdt monitored cores to avoid core dump on ICX + block: + - name: get cores number + shell: "set -o pipefail && cat /proc/cpuinfo | grep processor | wc -l" + args: + executable: /bin/bash + register: cores_number + changed_when: false + + - name: set intel_rdt_plugin_monitored_cores if using default value + set_fact: + intel_rdt_plugin_monitored_cores: "0-{{ cores_number.stdout|int - 1 }}" + when: enable_intel_rdt_plugin and intel_rdt_plugin_monitored_cores == "" + - name: disable pkgpower plugin set_fact: exclude_collectd_plugins: "{{ exclude_collectd_plugins + [ 'pkgpower' ] }}" @@ -43,7 +57,7 @@ - name: prepare list of plugins to be deployed set_fact: - plugins: "{{ collectd_plugins[collectd_profile | default('basic')] | difference(exclude_collectd_plugins) }}" + plugins: "{{ collectd_plugins[profile_name] | flatten | difference(exclude_collectd_plugins) }}" - name: rename ipmi to 0_ipmi to ensure that ipmi plugin will be loaded first (https://sourceforge.net/p/openipmi/bugs/86/) set_fact: diff --git a/roles/collectd_install/tasks/preflight.yml b/roles/collectd_install/tasks/preflight.yml new file mode 100644 index 00000000..e0400946 --- /dev/null +++ b/roles/collectd_install/tasks/preflight.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Check deployment profile exists in collectd plugins selection list + ansible.builtin.assert: + that: + - collectd_plugins[profile_name] is defined + msg: + - Deployment profile '{{ profile_name }}' has no collectd plugins selection defined. + - Please define collectd plugins selection for the current profile in {{ role_name }} role defaults. diff --git a/roles/configure_dpdk/defaults/main.yml b/roles/configure_dpdk/defaults/main.yml new file mode 100644 index 00000000..6098b17b --- /dev/null +++ b/roles/configure_dpdk/defaults/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- + +config_dpdk_bind_nic_type: "E810" +config_dpdk_bind_drv_type: "vfio-pci" +config_dpdk_bind_port_offset: 0 +config_dpdk_bind_port_count: 2 diff --git a/roles/configure_dpdk/files/cek_config_dpdk.service b/roles/configure_dpdk/files/cek_config_dpdk.service new file mode 100644 index 00000000..6e87eec4 --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk.service @@ -0,0 +1,13 @@ +[Unit] +Description=CEK Config DPDK Service +AssertPathExists=/usr/local/bin/cek_config_dpdk.sh +After=network.target + +[Service] +Type=oneshot +RemainAfterExit=true +#ExecStartPre=/bin/sleep 10 +ExecStart=/usr/local/bin/cek_config_dpdk.sh + +[Install] +WantedBy=multi-user.target diff --git a/roles/configure_dpdk/files/cek_config_dpdk.sh b/roles/configure_dpdk/files/cek_config_dpdk.sh new file mode 100644 index 00000000..7cb7384a --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +echo "cek config dpdk service starts" +python /usr/local/bin/cek_config_dpdk_rebind.py +echo "cek config dpdk service exits" diff --git a/roles/configure_dpdk/files/cek_config_dpdk_bind.py b/roles/configure_dpdk/files/cek_config_dpdk_bind.py new file mode 100644 index 00000000..e3d579f4 --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk_bind.py @@ -0,0 +1,275 @@ +# +# Copyright (c) 2023 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import os +import sys +import re + +import cek_config_dpdk_util as util + +def dpdk_bind_port(nic_type, new_drv, port_offset, port_count) : + print('dpdk_bind_port(' + nic_type + ' ' + new_drv + ' ' + port_offset + ' ' + port_count + ')') + + ret = 0 + idx1 = 0 + idx2 = 0 + nic_lines = None + + offset = int(port_offset) + count = int(port_count) + if ( util.validate_nic_type(nic_type) and util.validate_drv(new_drv) and + (offset >= 0) and (count > 0) ) : + cmd = 'dpdk-devbind.py --status-dev net | grep ' + nic_type + print('execute cmd : ' + cmd) + result = os.popen(cmd) + info_list = result.read() + if result is not None : + result.close() + + nic_lines = info_list.splitlines() + idx1 = max(0, offset) + idx2 = min(offset + count, len(nic_lines)) + print('bind range ' + str(idx1) + " : " + str(idx2) ) + else : + print ("invalid input parameter, do nothing") + return -1 + + if idx1 >= idx2 : + print("bind range is null, do nothing") + return 0 + + conf_filename = '/etc/network_env.conf' + restore_conf_filename = '/etc/network_restore.conf' + + conf_port_lines = [] + conf_mac_lines = [] + conf_restore_lines = [] + + # check whether conf files exsit + append_conf = 0 + if os.path.exists(conf_filename) : + conf_flag = 'r+' + append_conf = 1 + else : + conf_flag = 'w' + + append_restore_conf = 0 + if os.path.exists(restore_conf_filename): + restore_conf_flag = 'r+' + append_restore_conf = 1 + else : + restore_conf_flag = 'w' + + with open(conf_filename, conf_flag) as conf : + with open(restore_conf_filename, restore_conf_flag) as restore_conf : + + # read existing conf files + if append_conf : + conf.seek(0) + lines = conf.readlines() + for line in lines: + if re.match(r'^dpdk_port[0-9]*=', line) : + conf_port_lines.append(line) + elif re.match(r'^dpdk_port[0-9]*_srcmac=', line) : + conf_mac_lines.append(line) + + if append_restore_conf : + restore_conf.seek(0) + conf_restore_lines = restore_conf.readlines() + + # loop nic devices for procession + idx = 0 + 
dpdk_idx = 1 + for idx in range(idx1, idx2) : + nic_line = nic_lines[idx] + bdf = nic_line[ : nic_line.find(' ')] + print("bdf : " + bdf) + + # check whether bdf already in conf + conf_exists = 0 + restore_line = "" + conf_line_count = len(conf_restore_lines) + for conf_line_idx in range(0, conf_line_count): + restore_line = conf_restore_lines[conf_line_idx] + if bdf in restore_line : + print("conf exists for " + bdf + " in line " + str(conf_line_idx) + " :") + print(restore_line) + conf_exists = 1 + break + + if conf_exists : + # restore conf line example : + # 0000:ca:00.0 dpdk_port=1 if=ens25f0 curr_drv=vfio-pci prev_drv=ice prev_active=1 + # 0000:ca:00.1 dpdk_port=2 if=ens25f1 curr_drv=vfio-pci prev_drv=ice prev_active=0 + items = restore_line.split(' ') + dpdk_port_str = items[1] + dpdk_port = dpdk_port_str[dpdk_port_str.find("=")+1 : ] + dpdk_idx = int(dpdk_port) + + dev_str = items[2] + dev = dev_str[dev_str.find("=")+1 : ] + + curr_drv_str = items[3] + curr_drv = curr_drv_str[curr_drv_str.find("=")+1 : ] + + prev_drv_str = items[4] + prev_drv = prev_drv_str[prev_drv_str.find("=")+1 : ] + + prev_act_str = items[5] + prev_act = int(prev_act_str[prev_act_str.find("=")+1 : ]) + + if curr_drv != new_drv : + # if driver change, rebind and update current conf line + if ( util.validate_bdf(bdf) and util.validate_drv(new_drv)) : + cmd = 'dpdk-devbind.py -u' + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + cmd = 'dpdk-devbind.py --bind=' + new_drv + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + curr_drv = new_drv + + new_restore_line = bdf + \ + " dpdk_port=" + str(dpdk_idx) + \ + " if=" + dev + \ + " curr_drv=" + curr_drv + \ + " prev_drv=" + prev_drv + \ + " prev_active=" + str(prev_act) + "\n" + print("new restore line : " + new_restore_line) + + conf_restore_lines[conf_line_idx] = new_restore_line + else : + print("no action for same conf") + ret = 0 + + else : + # nic line examples : + # 0000:ca:00.0 'Ethernet 
Controller E810-C for QSFP 1592' drv=vfio-pci unused=ice + # 0000:ca:00.1 'Ethernet Controller E810-C for QSFP 1592' drv=vfio-pci unused=ice + # 0000:4b:00.0 'I350 Gigabit Network Connection 1521' if=ens9f0 drv=igb unused=vfio-pci *Active* + # 0000:4b:00.1 'I350 Gigabit Network Connection 1521' if=ens9f1 drv=igb unused=vfio-pci + items = nic_line.split(' ') + act = 0 + dev_ready = 0 + for item in items : + #print(item) + if 'if=' in item : + dev = item[item.find('=')+1 : ] + #print(dev) + dev_ready = 1 + if 'drv=' in item : + drv = item[item.find('=')+1 : ] + #print(drv) + if '*Active*' in item : + act = 1 + #print(act) + + if dev_ready : + # get mac address + if util.validate_dev(dev) : + cmd = 'ifconfig ' + dev + ' | grep ether' + print('execute cmd : ' + cmd) + result2 = os.popen(cmd) + info_list2 = result2.read().split() + mac = '0x' + info_list2[1].replace(':', ',0x') + if result2 is not None : + result2.close() + + # down dev before dpdk bind + cmd = 'ifconfig ' + dev + ' down' + print('execute cmd : ' + cmd) + ret = os.system(cmd) + else : + dev = '' + mac = 'unknown' + else : + dev = '' + mac = 'unknown' + + if util.validate_bdf(bdf) and util.validate_drv(new_drv) : + # dpdk bind + cmd = 'dpdk-devbind.py -u' + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + cmd = 'dpdk-devbind.py --bind=' + new_drv + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + # append new conf lines + dpdk_idx = 1 + len(conf_port_lines) + + new_port_line = "dpdk_port" + str(dpdk_idx) + "=" + bdf +"\n" + print("new conf line : " + new_port_line) + + new_mac_line = "dpdk_port" + str(dpdk_idx) + "_srcmac=" + mac + "\n" + print("new mac line : " + new_mac_line) + + + new_restore_line = bdf + \ + " dpdk_port=" + str(dpdk_idx) + \ + " if=" + dev + \ + " curr_drv=" + new_drv + \ + " prev_drv=" + drv + \ + " prev_active=" + str(act) + "\n" + print("new restore line : " + new_restore_line) + + conf_port_lines.append(new_port_line) + 
+ conf_mac_lines.append(new_mac_line) + conf_restore_lines.append(new_restore_line) + + conf.seek(0) + conf.writelines(conf_port_lines) + conf.writelines(conf_mac_lines) + + restore_conf.seek(0) + restore_conf.writelines(conf_restore_lines) + + return ret + + +# Usage : +# cek_config_dpdk_bind.py nic_type drv_type port_offset port_count +# +# Examples : +# cek_config_dpdk_bind.py E810 vfio-pci 0 2 +# => bind E810 card port 0 and port 1 to vfio-pci driver +# +# cek_config_dpdk_bind.py E810 vfio-pci 1 1 +# => bind E810 card port 1 to vfio-pci driver +# +# Result : +# If binding the related nic ports to dpdk succeeds, return 0; +# Else return error value (< 0); +# +# On success, the files below are generated for the user : +# /etc/network_env.conf and +# /etc/network_restore.conf +# /etc/network_env.conf content example as below : +# dpdk_port1=0000:4b:00.0 +# dpdk_port2=0000:4b:00.1 +# dpdk_port1_srcmac=0xb4,0x96,0x91,0xb2,0xa6,0x48 +# dpdk_port2_srcmac=0xb4,0x96,0x91,0xb2,0xa6,0x49 +# /etc/network_restore.conf content example as below : +# 0000:ca:00.0 dpdk_port=1 if=ens25f0 curr_drv=vfio-pci prev_drv=ice prev_active=1 +# 0000:ca:00.1 dpdk_port=2 if=ens25f1 curr_drv=vfio-pci prev_drv=ice prev_active=0 +# + +r = dpdk_bind_port(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]) +print("dpdk_bind_port() return " + str(r)) +sys.exit(r) diff --git a/roles/configure_dpdk/files/cek_config_dpdk_link.py b/roles/configure_dpdk/files/cek_config_dpdk_link.py new file mode 100644 index 00000000..b028fd5f --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk_link.py @@ -0,0 +1,103 @@ +# +# Copyright (c) 2023 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import sys + +def dpdk_link_port(filename1, filename2): + print("dpdk_link_port : " + filename1 + " " + filename2) + + ret = 0 + + with open(filename1, 'r+') as file1: + with open(filename2, 'r+') as file2: + append1 = [] + append2 = [] + + file1.seek(0) + lines1 = file1.readlines() + #print(lines1) + for line in lines1 : + if 'srcmac' in line : + new_line = line.replace('srcmac', 'destmac') + append2 = append2 + [new_line] + + file2.seek(0) + lines2 = file2.readlines() + #print(lines2) + for line in lines2 : + if 'srcmac' in line : + new_line = line.replace('srcmac', 'destmac') + append1 = append1 + [new_line] + + #print(append1) + #print(append2) + for line in append1 : + if line not in lines1 : + file1.write(line) + + for line in append2 : + if line not in lines2 : + file2.write(line) + + return ret + + +# Usage : +# cek_config_dpdk_link.py {filename1} {filename2} +# +# Examples : +# cek_config_dpdk_link.py network_src.conf network_dst.conf +# +# Result : +# If the link succeeds, return 0; +# Else return error value (< 0); +# +# On success, it will : +# 1) Append src mac in file 1 to file 2 as dst mac. +# 2) Append src mac in file 2 to file 1 as dst mac.
+# +# Input file content may look like this : +# file 1: +# dpdk_port1=0000:ca:00.0 +# dpdk_port2=0000:ca:00.1 +# dpdk_port1_srcmac=0x6c,0xfe,0x54,0x41,0x13,0x20 +# dpdk_port2_srcmac=0x6c,0xfe,0x54,0x41,0x13,0x21 +# file 2 : +# dpdk_port1=0000:ca:00.0 +# dpdk_port2=0000:ca:00.1 +# dpdk_port1_srcmac=0x6c,0xfe,0x54,0x40,0xe6,0xe0 +# dpdk_port2_srcmac=0x6c,0xfe,0x54,0x40,0xe6,0xe1 +# +# After link, the file content may become : +# file 1: +# dpdk_port1=0000:ca:00.0 +# dpdk_port2=0000:ca:00.1 +# dpdk_port1_srcmac=0x6c,0xfe,0x54,0x41,0x13,0x20 +# dpdk_port2_srcmac=0x6c,0xfe,0x54,0x41,0x13,0x21 +# dpdk_port1_destmac=0x6c,0xfe,0x54,0x40,0xe6,0xe0 +# dpdk_port2_destmac=0x6c,0xfe,0x54,0x40,0xe6,0xe1 +# file 2 : +# dpdk_port1=0000:ca:00.0 +# dpdk_port2=0000:ca:00.1 +# dpdk_port1_srcmac=0x6c,0xfe,0x54,0x40,0xe6,0xe0 +# dpdk_port2_srcmac=0x6c,0xfe,0x54,0x40,0xe6,0xe1 +# dpdk_port1_destmac=0x6c,0xfe,0x54,0x41,0x13,0x20 +# dpdk_port2_destmac=0x6c,0xfe,0x54,0x41,0x13,0x21 +# + +r = dpdk_link_port(sys.argv[1], sys.argv[2]) +print("dpdk_link_port() return " + str(r)) +sys.exit(r) diff --git a/roles/configure_dpdk/files/cek_config_dpdk_rebind.py b/roles/configure_dpdk/files/cek_config_dpdk_rebind.py new file mode 100644 index 00000000..ec43ea33 --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk_rebind.py @@ -0,0 +1,75 @@ +# +# Copyright (c) 2023 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# + +import os +import sys + +import cek_config_dpdk_util as util + +def dpdk_rebind_port() : + ret = 0 + + restore_conf_filename = '/etc/network_restore.conf' + if os.path.exists(restore_conf_filename) : + print(restore_conf_filename + " exists.") + + with open(restore_conf_filename, 'r') as conf : + # restore conf file content looks like this : + # 0000:ca:00.0 dpdk_port=1 if=ens25f0 curr_drv=vfio-pci prev_drv=ice prev_active=1 + # 0000:ca:00.1 dpdk_port=2 if=ens25f1 curr_drv=vfio-pci prev_drv=ice prev_active=0 + lines = conf.readlines() + for line in lines : + print(line) + + items = line.split(' ') + + bdf = items[0] + + dev_str = items[2] + dev = dev_str[dev_str.find("=")+1 : ] + + curr_drv_str = items[3] + curr_drv = curr_drv_str[curr_drv_str.find("=")+1 : ] + + # down dev before dpdk bind + if dev != "" : + if util.validate_dev(dev) : + cmd = 'ifconfig ' + dev + ' down' + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + # dpdk bind + if util.validate_drv(curr_drv) and util.validate_bdf(bdf) : + cmd = 'dpdk-devbind.py --bind=' + curr_drv + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + else : + print(restore_conf_filename + " does not exist.") + + return ret + +# Usage : +# cek_config_dpdk_rebind.py +# +# Result : +# It will check whether /etc/network_restore.conf exists. +# If the restore conf file exists, it will rebind dpdk with NIC dev +# according to the file content. +# +r = dpdk_rebind_port() +print("dpdk_rebind_port() return " + str(r)) +sys.exit(r) diff --git a/roles/configure_dpdk/files/cek_config_dpdk_unbind.py b/roles/configure_dpdk/files/cek_config_dpdk_unbind.py new file mode 100644 index 00000000..4187dadc --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk_unbind.py @@ -0,0 +1,85 @@ +# +# Copyright (c) 2023 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import sys + +import cek_config_dpdk_util as util + +def dpdk_unbind_port() : + + ret = 0 + conf_filename = '/etc/network_env.conf' + restore_conf_filename = '/etc/network_restore.conf' + + with open(restore_conf_filename, 'r') as restore_conf : + lines = restore_conf.readlines() + for line in lines : + # network_restore.conf content looks like this : + # 0000:ca:00.0 dpdk_port=1 if=ens25f0 curr_drv=vfio-pci prev_drv=ice prev_active=1 + # 0000:ca:00.1 dpdk_port=2 if=ens25f1 curr_drv=vfio-pci prev_drv=ice prev_active=0 + print(line) + items = line.split(' ') + + bdf = items[0] + + dev_str = items[2] + dev = dev_str[dev_str.find("=")+1 : ] + + #curr_drv_str = items[2] + #curr_drv = curr_drv_str[curr_drv_str.find("=")+1 : ] + + prev_drv_str = items[4] + prev_drv = prev_drv_str[prev_drv_str.find("=")+1 : ] + + if 'prev_active=1' in items[5]: + act = 1 + else : + act = 0 + + if util.validate_drv(prev_drv) and util.validate_bdf(bdf) : + cmd = 'dpdk-devbind.py -u ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + cmd = 'dpdk-devbind.py --bind=' + prev_drv + ' ' + bdf + print('execute cmd : ' + cmd) + ret = os.system(cmd) + + if dev != "" : + if act and util.validate_dev(dev) : + cmd = 'ifconfig ' + dev + ' up' + print('execute cmd : ' + cmd) + ret = os.system(cmd) + os.remove(conf_filename) + os.remove(restore_conf_filename) + + return ret + + +# Usage : +# cek_config_dpdk_unbind.py +# +# Result : +# It will read /etc/network_restore.conf, to restore nic +# to previous status before calling cek_config_dpdk_bind.py.
+# After status restore, the /etc/network_restore.conf file +# will be removed. +# + +r = dpdk_unbind_port() +print("dpdk_unbind_port() return " + str(r)) +sys.exit(r) diff --git a/roles/configure_dpdk/files/cek_config_dpdk_util.py b/roles/configure_dpdk/files/cek_config_dpdk_util.py new file mode 100644 index 00000000..5955d6b7 --- /dev/null +++ b/roles/configure_dpdk/files/cek_config_dpdk_util.py @@ -0,0 +1,56 @@ +# +# Copyright (c) 2023 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import re + +def validate_nic_type(nic_type) : + ret = False + if re.match(r'^[A-Za-z0-9]+$', nic_type) : + ret = True + else : + print("invalid nic_type : " + nic_type) + return ret + +def validate_bdf(bdf) : + ret = False + if re.match(r'^([0-9a-f]{4}:){0,1}[0-9a-f]{2,4}:[0-9a-f]{2}\.[0-9a-f]$', bdf) : + ret = True + else : + print("invalid bdf : " + bdf) + return ret + +def validate_drv(drv) : + ret = False + nic_drvs = ["i40e", "ice", "iavf", "vfio-pci", "igb_uio"] + if drv in nic_drvs : + ret = True + else : + print("invalid drv : " + drv) + return ret + +def validate_dev(dev) : + ret = False + if re.match(r'^[A-Za-z0-9]+$', dev) : + ret = True + else : + print("invalid dev : " + dev) + return ret + +def validate_conf_name(filename): + ret = False + if re.match(r'^/etc/network_[A-Za-z0-9]+.conf', filename): + ret = True + return ret diff --git a/roles/configure_dpdk/tasks/cleanup.yml b/roles/configure_dpdk/tasks/cleanup.yml new file mode 100644 index 00000000..4edd7304 --- /dev/null +++ b/roles/configure_dpdk/tasks/cleanup.yml @@ -0,0 +1,60 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- block: + - name: check whether service exists + stat: + path: '/lib/systemd/system/cek_config_dpdk.service' + register: config_dpdk_service_status + + - name: disable that service which will do dpdk rebind after reboot + systemd: + name: cek_config_dpdk + state: stopped + enabled: false + daemon_reload: true + when: config_dpdk_service_status.stat.exists + become: true + + - name: check whether unbind script exists + stat: + path: '/usr/local/bin/cek_config_dpdk_unbind.py' + register: config_dpdk_unbind_script_status + + - block: + - name: execute dpdk unbind script + command: "python /usr/local/bin/cek_config_dpdk_unbind.py" + register: dpdk_unbind_result + changed_when: false + + - name: output dpdk unbind result + debug: + msg: "{{ dpdk_unbind_result.stdout_lines }}" + when: config_dpdk_unbind_script_status.stat.exists + + - name: remove config dpdk scripts + file: + path: "{{ item }}" + state: absent + with_items: + - '/usr/local/bin/cek_config_dpdk_bind.py' + - '/usr/local/bin/cek_config_dpdk_rebind.py' + - '/usr/local/bin/cek_config_dpdk_link.py' + - '/usr/local/bin/cek_config_dpdk_unbind.py' + - '/usr/local/bin/cek_config_dpdk_util.py' + - '/usr/local/bin/cek_config_dpdk.sh' + - '/lib/systemd/system/cek_config_dpdk.service' + become: true diff --git a/roles/configure_dpdk/tasks/main.yml b/roles/configure_dpdk/tasks/main.yml new file mode 100644 index 00000000..430424d2 --- /dev/null +++ b/roles/configure_dpdk/tasks/main.yml @@ -0,0 +1,110 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- + +# dpdk dev bind block +- block: + - name: install config dpdk scripts to /usr/local/bin + copy: + src: "{{ item }}" + dest: "/usr/local/bin/" + mode: 0700 + owner: root + group: root + force: true + with_items: + - 'cek_config_dpdk_bind.py' + - 'cek_config_dpdk_rebind.py' + - 'cek_config_dpdk_link.py' + - 'cek_config_dpdk_unbind.py' + - 'cek_config_dpdk_util.py' + - 'cek_config_dpdk.sh' + become: true + + - name: execute dpdk bind script + command: "python /usr/local/bin/cek_config_dpdk_bind.py + {{ config_dpdk_bind_nic_type }} + {{ config_dpdk_bind_drv_type }} + {{ config_dpdk_bind_port_offset }} + {{ config_dpdk_bind_port_count }}" + register: dpdk_bind_result + changed_when: false + + - name: output dpdk bind result + debug: + msg: "{{ dpdk_bind_result.stdout_lines }}" + + - name: install config dpdk service to /lib/systemd/system + copy: + src: "cek_config_dpdk.service" + dest: /lib/systemd/system/ + owner: root + group: root + mode: '0644' + become: true + + - name: enable the service which will do dpdk rebind after reboot + systemd: + name: cek_config_dpdk + state: started + enabled: true + daemon_reload: true + become: true + + when: + - dyna_config_dpdk_bind | default(false) | bool + + +# dpdk dev link block, to match the real network connection +- block: + - name: push network_env.conf from node2 to node1 + synchronize: + src: /etc/network_env.conf + dest: /etc/network_dst.conf + mode: push + delegate_to: "{{ dpdk_link_node2 }}" + + - name: execute dpdk link script on node1 + command: "python /usr/local/bin/cek_config_dpdk_link.py + /etc/network_env.conf /etc/network_dst.conf" + register: dpdk_link_result + changed_when: false + + - name: output dpdk link result + debug: + msg: "{{ dpdk_link_result.stdout_lines }}" + + - name: pull network_dst.conf from node1 to node2 + synchronize: + src: /etc/network_dst.conf + dest:
/etc/network_env.conf + mode: pull + delegate_to: "{{ dpdk_link_node2 }}" + + - name: remove network_dst.conf on node1 + file: + path: /etc/network_dst.conf + state: absent + + when: + - dyna_config_dpdk_link | default(false) | bool + - inventory_hostname == dpdk_link_node1 + +# dpdk dev unbind block +- block: + - import_tasks: cleanup.yml + when: + - dyna_config_dpdk_unbind | default(false) | bool diff --git a/roles/container_engine/containerd/defaults/main.yml b/roles/container_engine/containerd/defaults/main.yml index 9d4fdae7..5fc71cb8 100644 --- a/roles/container_engine/containerd/defaults/main.yml +++ b/roles/container_engine/containerd/defaults/main.yml @@ -14,8 +14,8 @@ ## limitations under the License. ## --- -containerd_version: 1.6.16 -containerd_archive_checksum: "2415b431a900275c14942f87f751e1e13d513c1c2f062322b5ca5a9a2190f22a" +containerd_version: 1.7.0 +containerd_archive_checksum: "b068b05d58025dc9f2fc336674cac0e377a478930f29b48e068f97c783a423f0" containerd_download_url: "https://github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-amd64.tar.gz" # noqa yaml[line-length] containerd_bin_dir: "/usr/local/bin" diff --git a/roles/container_engine/crio/defaults/main.yml b/roles/container_engine/crio/defaults/main.yml index 2c6a951e..f71dedc6 100644 --- a/roles/container_engine/crio/defaults/main.yml +++ b/roles/container_engine/crio/defaults/main.yml @@ -24,9 +24,9 @@ crio_log_level: "info" crio_metrics_port: "9090" crio_pause_image: "k8s.gcr.io/pause:3.3" -crio_version: "v1.26.0" +crio_version: "v1.26.3" crio_download_url: "https://storage.googleapis.com/cri-o/artifacts/cri-o.amd64.{{ crio_version }}.tar.gz" -crio_archive_checksums: "79837d8b7af95547b92dbab105268dd6382ce2a7afbddad93cc168ab0ca766c8" +crio_archive_checksums: "942772081d9cd4bd0c07e466439b76a1ca95d3f10a7b53dc524d2946b2b17a71" crio: version: "{{ crio_version }}" diff --git 
a/roles/container_registry/charts/container-registry/Chart.yaml b/roles/container_registry/charts/container-registry/Chart.yaml new file mode 100644 index 00000000..15285007 --- /dev/null +++ b/roles/container_registry/charts/container-registry/Chart.yaml @@ -0,0 +1,39 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +apiVersion: v2 +name: container-registry +description: Container registry application + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 1.0.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. 
They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "" diff --git a/roles/container_registry/charts/container-registry/templates/_helpers.tpl b/roles/container_registry/charts/container-registry/templates/_helpers.tpl new file mode 100644 index 00000000..23d124aa --- /dev/null +++ b/roles/container_registry/charts/container-registry/templates/_helpers.tpl @@ -0,0 +1,51 @@ +{{/* +Expand the name of the chart. +*/}} +{{- define "container-registry.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. +*/}} +{{- define "container-registry.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- $name := default .Chart.Name .Values.nameOverride }} +{{- if contains $name .Release.Name }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "container-registry.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "container-registry.labels" -}} +helm.sh/chart: {{ include "container-registry.chart" . }} +{{ include "container-registry.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "container-registry.selectorLabels" -}} +app.kubernetes.io/name: {{ include "container-registry.name" . 
}} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} diff --git a/roles/container_registry/templates/container-registry/nginx-configmap.yaml.j2 b/roles/container_registry/charts/container-registry/templates/configmap-nginx.yaml similarity index 84% rename from roles/container_registry/templates/container-registry/nginx-configmap.yaml.j2 rename to roles/container_registry/charts/container-registry/templates/configmap-nginx.yaml index f4bd3459..20ebbf67 100644 --- a/roles/container_registry/templates/container-registry/nginx-configmap.yaml.j2 +++ b/roles/container_registry/charts/container-registry/templates/configmap-nginx.yaml @@ -2,7 +2,10 @@ apiVersion: v1 kind: ConfigMap metadata: - name: nginx-conf + name: {{ include "container-registry.fullname" . }}-nginx-conf + namespace: {{ .Release.Namespace }} + labels: + {{- include "container-registry.labels" . | nindent 4 }} data: nginx.conf: | events { @@ -12,7 +15,7 @@ data: http { upstream docker-registry { - server {{ registry_addr }}:{{ registry_port }}; + server 127.0.0.1:{{ .Values.registry.port }}; } ## Set a variable to help us decide if we need to add the @@ -25,17 +28,17 @@ data: } server { - listen {{ registry_proxy }} ssl; - server_name {{ registry_addr }}; + listen {{ .Values.nginx.port }} ssl; + server_name {{ .Values.registry.listen_addr }}; # SSL ssl_certificate /etc/nginx/conf.d/tls.crt; ssl_certificate_key /etc/nginx/conf.d/tls.key; # Recommendations from https://raymii.org/s/tutorials/Strong_SSL_Security_On_nginx.html - ssl_protocols {{ nginx_ssl_protocols }}; + ssl_protocols {{ .Values.nginx.ssl_protocols }}; ssl_prefer_server_ciphers on; - ssl_ciphers {{ nginx_ssl_ciphers }}; + ssl_ciphers {{ .Values.nginx.ssl_ciphers }}; ssl_session_cache shared:SSL:10m; diff --git a/roles/container_registry/charts/container-registry/templates/configmap.yaml b/roles/container_registry/charts/container-registry/templates/configmap.yaml new file mode 100644 index 00000000..9b5a21c4 --- /dev/null +++ 
b/roles/container_registry/charts/container-registry/templates/configmap.yaml @@ -0,0 +1,38 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "container-registry.fullname" . }}-config + namespace: {{ .Release.Namespace }} + labels: + {{- include "container-registry.labels" . | nindent 4 }} +data: + config.yml: |- + health: + storagedriver: + enabled: true + interval: 10s + threshold: 3 + http: + addr: 127.0.0.1:{{ .Values.registry.port }} + headers: + X-Content-Type-Options: + - nosniff + log: + fields: + service: registry + storage: + cache: + blobdescriptor: inmemory + version: 0.1 +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "container-registry.fullname" . }}-probe + labels: + {{- include "container-registry.labels" . | nindent 4 }} +data: + probe.sh: |- + #!/bin/bash + wget --server-response "$1" 2>&1 | awk '/^ HTTP/{print $2}' diff --git a/roles/container_registry/templates/container-registry/deployment.yaml.j2 b/roles/container_registry/charts/container-registry/templates/deployment.yaml similarity index 51% rename from roles/container_registry/templates/container-registry/deployment.yaml.j2 rename to roles/container_registry/charts/container-registry/templates/deployment.yaml index 03c63229..df73e8d7 100644 --- a/roles/container_registry/templates/container-registry/deployment.yaml.j2 +++ b/roles/container_registry/charts/container-registry/templates/deployment.yaml @@ -1,45 +1,40 @@ --- -# Source: container-registry/templates/deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: - name: {{ release_name }} + name: {{ include "container-registry.fullname" . }} + namespace: {{ .Release.Namespace }} labels: - app: container-registry - chart: container-registry-1.9.4 - release: {{ release_name }} - heritage: Helm + {{- include "container-registry.labels" . | nindent 4 }} spec: selector: matchLabels: - app: container-registry - release: {{ release_name }} + {{- include "container-registry.selectorLabels" . 
| nindent 6 }} replicas: 1 minReadySeconds: 5 template: metadata: labels: - app: container-registry - release: {{ release_name }} - annotations: - checksum/config: b6512fd9a59e099a4176321795e2a88258d3210614308faf9e0fc97da6e4fb6b + {{- include "container-registry.selectorLabels" . | nindent 8 }} spec: containers: - name: nginx - image: "{{ nginx_image }}" + image: {{ .Values.nginx.image }}:{{ .Values.nginx.tag }} ports: - - containerPort: {{ registry_proxy }} + - containerPort: {{ .Values.nginx.port }} name: nginx-https volumeMounts: - name: tls - mountPath: "/etc/nginx/conf.d/" + mountPath: /etc/nginx/conf.d/ - name: nginx-conf mountPath: /etc/nginx - name: htpasswd - mountPath: "/etc/nginx/conf.d/auth" + mountPath: /etc/nginx/conf.d/auth - name: container-registry - image: "{{ registry_image }}" - imagePullPolicy: IfNotPresent + image: {{ .Values.registry.image }}:{{ .Values.registry.tag }} + ports: + - containerPort: {{ .Values.registry.port }} + name: registry-http command: - /bin/registry - serve @@ -48,35 +43,32 @@ spec: exec: command: - sh - - /tmp/probe.sh - - {{ registry_addr }}:{{ registry_port }} + - /etc/probe/probe.sh + - 127.0.0.1:{{ .Values.registry.port }} initialDelaySeconds: 5 periodSeconds: 5 readinessProbe: exec: command: - sh - - /tmp/probe.sh - - {{ registry_addr }}:{{ registry_port }} + - /etc/probe/probe.sh + - 127.0.0.1:{{ .Values.registry.port }} initialDelaySeconds: 5 periodSeconds: 5 - resources: - {} env: - name: REGISTRY_HTTP_ADDR - value: "{{ registry_addr }}:{{ registry_port }}" + value: 127.0.0.1:{{ .Values.registry.port }} - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY - value: "/var/lib/registry" + value: /var/lib/registry volumeMounts: - name: data mountPath: /var/lib/registry/ - - name: "{{ release_name }}-config" - mountPath: "/etc/container/registry" + - name: config + mountPath: /etc/container/registry - name: probe - mountPath: /tmp/probe.sh - + mountPath: /etc/probe nodeSelector: - kubernetes.io/hostname: {{ 
hostvars[groups['kube_control_plane'][0]]['ansible_hostname'] }} + kubernetes.io/hostname: {{ .Values.node_name }} tolerations: - effect: NoSchedule key: node-role.kubernetes.io/master @@ -87,22 +79,22 @@ spec: volumes: - name: data persistentVolumeClaim: - claimName: container-registry - - name: {{ release_name }}-config + claimName: {{ .Values.storage.pvc }} + - name: config configMap: - name: {{ release_name }}-config + name: {{ include "container-registry.fullname" . }}-config - name: htpasswd secret: - secretName: {{ release_name }}-secret + secretName: {{ .Values.secrets.htpasswd }} - name: tls secret: - secretName: {{ registry_secret_name }} + secretName: {{ .Values.secrets.tls }} - name: probe - hostPath: - path: /etc/probe.sh + configMap: + name: {{ include "container-registry.fullname" . }}-probe - name: nginx-conf configMap: - name: nginx-conf + name: {{ include "container-registry.fullname" . }}-nginx-conf items: - key: nginx.conf path: nginx.conf diff --git a/roles/container_registry/charts/container-registry/templates/service.yaml b/roles/container_registry/charts/container-registry/templates/service.yaml new file mode 100644 index 00000000..0a42fc87 --- /dev/null +++ b/roles/container_registry/charts/container-registry/templates/service.yaml @@ -0,0 +1,20 @@ +--- +apiVersion: v1 +kind: Service +metadata: + name: {{ include "container-registry.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "container-registry.labels" . | nindent 4 }} +spec: + type: {{ .Values.service.type }} + ports: + - port: {{ default .Values.nginx.port .Values.service.port }} + protocol: TCP + name: nginx-https + targetPort: nginx-https + {{- if eq .Values.service.type "NodePort" }} + nodePort: {{ .Values.service.node_port }} + {{- end }} + selector: + {{- include "container-registry.selectorLabels" . 
| nindent 4 }} diff --git a/roles/container_registry/charts/container-registry/values.yaml b/roles/container_registry/charts/container-registry/values.yaml new file mode 100644 index 00000000..f0d22b37 --- /dev/null +++ b/roles/container_registry/charts/container-registry/values.yaml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +nameOverride: "" +fullnameOverride: "" diff --git a/roles/container_registry/defaults/main.yml b/roles/container_registry/defaults/main.yml index 389f3a08..92fbe159 100644 --- a/roles/container_registry/defaults/main.yml +++ b/roles/container_registry/defaults/main.yml @@ -14,21 +14,20 @@ ## limitations under the License. 
## --- -registry_secret_name: "container-registry-tls" registry_namespace: "kube-system" -registry_proxy: 5043 -registry_port: 5000 -registry_addr: 127.0.0.1 +registry_nodeport: "30500" +registry_user: docker +# user can provide own password through group_vars +registry_password: -registry_image: "docker.io/library/registry:2.8.1" -nginx_image: "docker.io/library/nginx:1.23.3-alpine" +registry_size: 10Gi -nginx_ssl_ciphers: - "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE\ - -ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" -nginx_ssl_protocols: "TLSv1.2 TLSv1.3" +registry_image: "docker.io/library/registry" +registry_version: 2.8.1 +registry_nginx_image: "docker.io/library/nginx" +registry_nginx_version: 1.24.0 +docker_pip_pkg_version: 6.0.0 -release_name: container-registry - -registry_auth: "/var/lib/kubelet/config.json" -registry_auth_env: "REGISTRY_AUTH_FILE=/var/lib/kubelet/config.json" +registry_tls_secret_name: container-registry-tls +registry_storage_dir: /var/lib/registry +registry_auth_path: /var/lib/kubelet/config.json diff --git a/roles/container_registry/files/probe.sh b/roles/container_registry/files/probe.sh deleted file mode 100755 index 5fea326d..00000000 --- a/roles/container_registry/files/probe.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/bash - -wget --server-response "$1" 2>&1 | awk '/^ HTTP/{print $2}' diff --git a/roles/container_registry/files/pv.yml b/roles/container_registry/files/pv.yml deleted file mode 100644 index 1cf76005..00000000 --- a/roles/container_registry/files/pv.yml +++ /dev/null @@ -1,13 +0,0 @@ ---- -apiVersion: v1 -kind: PersistentVolume -metadata: - name: container-registry -spec: - storageClassName: manual - capacity: - storage: 10Gi - accessModes: - - ReadWriteOnce - hostPath: - path: "/var/lib/registry" diff --git a/roles/container_registry/files/pvc.yml b/roles/container_registry/files/pvc.yml deleted file mode 100644 index f6c263fc..00000000 --- 
a/roles/container_registry/files/pvc.yml +++ /dev/null @@ -1,12 +0,0 @@ -kind: PersistentVolumeClaim -apiVersion: v1 -metadata: - name: container-registry - namespace: kube-system -spec: - accessModes: - - ReadWriteOnce - resources: - requests: - storage: 10Gi - storageClassName: "manual" diff --git a/roles/container_registry/meta/main.yml b/roles/container_registry/meta/main.yml new file mode 100644 index 00000000..8abed973 --- /dev/null +++ b/roles/container_registry/meta/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +dependencies: + - role: container_registry_shared_vars diff --git a/roles/container_registry/tasks/cleanup.yml b/roles/container_registry/tasks/cleanup.yml new file mode 100644 index 00000000..cd8c04b0 --- /dev/null +++ b/roles/container_registry/tasks/cleanup.yml @@ -0,0 +1,73 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Delete container registry deployment + kubernetes.core.helm: + release_name: "{{ registry_release_name }}" + release_namespace: "{{ registry_namespace }}" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry persistent volume + kubernetes.core.k8s: + namespace: "{{ registry_namespace }}" + name: "{{ registry_pv_name }}" + kind: "PersistentVolume" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry persistent volume claim + kubernetes.core.k8s: + namespace: "{{ registry_namespace }}" + name: "{{ registry_pvc_name }}" + kind: "PersistentVolumeClaim" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry CSR + kubernetes.core.k8s: + namespace: "{{ registry_namespace }}" + name: "{{ registry_csr_name }}" + kind: "CertificateSigningRequest" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry tls secret + kubernetes.core.k8s: + name: "{{ registry_tls_secret_name }}" + namespace: "{{ registry_namespace }}" + kind: "Secret" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry htpasswd secret + kubernetes.core.k8s: + name: "{{ registry_htpasswd_secret_name }}" + namespace: "{{ registry_namespace }}" + kind: "Secret" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry PV directory + ansible.builtin.file: + path: "{{ registry_storage_dir }}" + state: absent + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: Delete container registry files + ansible.builtin.file: + path: "{{ registry_root_dir }}" + state: absent + when: inventory_hostname == 
groups['kube_control_plane'][0] diff --git a/roles/container_registry/tasks/main.yml b/roles/container_registry/tasks/main.yml index b4284024..ab7ec99c 100644 --- a/roles/container_registry/tasks/main.yml +++ b/roles/container_registry/tasks/main.yml @@ -14,165 +14,179 @@ ## limitations under the License. ## --- -- name: install dependencies - include_role: +- name: Install dependencies + ansible.builtin.include_role: name: install_dependencies when: inventory_hostname == groups['kube_control_plane'][0] -- name: wait for kube-apiserver to be up - uri: +- name: Wait for kube-apiserver to be up + ansible.builtin.uri: url: "https://127.0.0.1:6443/healthz" - client_cert: "/etc/kubernetes/ssl/ca.crt" - client_key: "/etc/kubernetes/ssl/ca.key" - validate_certs: no - register: result - until: result.status == 200 + client_cert: "{{ kube_apiserver_cert }}" + client_key: "{{ kube_apiserver_key }}" + validate_certs: false + register: kube_api + until: kube_api.status == 200 retries: 15 delay: 5 when: inventory_hostname == groups['kube_control_plane'][0] -- name: generate server and client certificates - include_tasks: tls.yml - -- name: load user-provided registry password or generate a random one - set_fact: - password: "{{ registry_password | default(lookup('password', 'registry_htpasswd')) }}" - -- name: clean up temporary files - file: path=registry_htpasswd state=absent - delegate_to: localhost - become: false - run_once: yes - when: inventory_hostname == groups['kube_control_plane'][0] - -- name: generate htpasswd file - command: htpasswd -Bbn docker {{ password }} - register: htpasswd - when: inventory_hostname == groups['kube_control_plane'][0] - -- name: create container registry directory - file: - path: "{{ container_registry_path }}" +- name: Create container registry directory + ansible.builtin.file: + path: "{{ registry_root_dir }}" state: directory mode: 0755 when: inventory_hostname == groups['kube_control_plane'][0] -- name: create persistent volume - k8s: - 
state: present - definition: "{{ lookup('file', '../files/pv.yml') | from_yaml }}" - when: inventory_hostname == groups['kube_control_plane'][0] +- name: Generate server and client certificates + ansible.builtin.include_tasks: tls.yml -- name: create persistent volume claim - k8s: - state: present - definition: "{{ lookup('file', '../files/pvc.yml') | from_yaml }}" - when: inventory_hostname == groups['kube_control_plane'][0] - -- name: copy probe.sh - copy: - src: probe.sh - dest: /etc/probe.sh - owner: root - group: root - mode: u+rwx,g-rwx,o- - when: inventory_hostname == groups['kube_control_plane'][0] +- name: Generate registry password if not provided + ansible.builtin.set_fact: + registry_password: "{{ lookup('ansible.builtin.password', '/dev/null') }}" + no_log: true + when: + - (registry_password is not defined) or (not registry_password) + run_once: true -- name: template container-registry - template: - src: "{{ item }}" - dest: "{{ container_registry_path }}/{{ item | basename | regex_replace('.j2','') }}" - owner: root - group: root - mode: u+rwx,g-rwx,o- - with_fileglob: - - ../templates/container-registry/*.j2 +- name: Prepare registry htpasswd configuration when: inventory_hostname == groups['kube_control_plane'][0] - -- name: delete old container registry - command: "kubectl delete -f {{ container_registry_path }} --namespace {{ registry_namespace }}" - register: delete_result - changed_when: delete_result is not failed - failed_when: false + block: + - name: Generate htpasswd contents + ansible.builtin.command: >- + htpasswd -Bbn {{ registry_user }} {{ registry_password }} + changed_when: true + register: registry_htpasswd + no_log: true + + - name: Create htpasswd secret + kubernetes.core.k8s: + state: present + template: registry_htpasswd.yaml.j2 + wait: true + no_log: true + + +- name: Prepare registry storage when: inventory_hostname == groups['kube_control_plane'][0] - -- name: install container registry application - command: "kubectl apply
-f {{ container_registry_path }} --namespace {{ registry_namespace }}" - changed_when: true + block: + - name: Create registry storage directory + ansible.builtin.file: + path: "{{ registry_storage_dir }}" + owner: '1000' + group: root + mode: '0700' + state: directory + + - name: Create registry persistent volume + kubernetes.core.k8s: + state: present + template: registry_pv.yaml.j2 + wait: true + + - name: Create registry persistent volume claim + kubernetes.core.k8s: + state: present + template: registry_pvc.yaml.j2 + wait: true + + +- name: Copy container-registry helm chart + copy: + src: "{{ (role_path, 'charts', 'container-registry/') | path_join }}" + dest: "{{ registry_chart_dir }}" + mode: 0644 when: inventory_hostname == groups['kube_control_plane'][0] -- name: clean up any certs/key/CSR files - file: path=/etc/ssl/registry state=absent +- name: Template Chart appVersion + replace: + path: "{{ (registry_chart_dir, 'Chart.yaml') | path_join }}" + regexp: "" + replace: "{{ registry_version }}" when: inventory_hostname == groups['kube_control_plane'][0] - failed_when: false - become: yes -- name: create registry storage directory - file: - path: /var/lib/registry - owner: '1000' - group: root - mode: '0700' - state: directory +- name: Template container registry helm values + ansible.builtin.template: + src: values_registry.yaml.j2 + dest: "{{ (registry_chart_dir, 'values_registry.yaml') | path_join }}" + force: yes + mode: preserve when: inventory_hostname == groups['kube_control_plane'][0] -- name: check nodes status - block: - - name: check nodes status - # noqa command-instead-of-shell - shell is used intentionally here - shell: kubectl get nodes - register: status - changed_when: false - - debug: - var: status.stdout_lines - failed_when: status.stdout is search ("NotReady") or status.stdout is search ("Unknown") +- name: Deploy container registry + kubernetes.core.helm: + chart_ref: "{{ registry_chart_dir }}" + release_name: "{{ registry_release_name 
}}" + release_namespace: "{{ registry_namespace }}" + values_files: "{{ (registry_chart_dir, 'values_registry.yaml') | path_join }}" + create_namespace: true + force: true + wait: true + timeout: 10m0s when: inventory_hostname == groups['kube_control_plane'][0] -- name: wait for container registry to come up on all nodes - uri: +- name: Wait for container registry to be reachable by all nodes + ansible.builtin.uri: url: "https://{{ registry_local_address }}" - validate_certs: no - user: docker - password: "{{ password }}" + user: "{{ registry_user }}" + password: "{{ registry_password }}" method: GET - force_basic_auth: yes + ca_path: /etc/kubernetes/ssl/ca.crt + force_basic_auth: true + force: true register: result until: result.status == 200 - retries: 600 - delay: 1 - -- name: install Python docker module - pip: - name: - - docker==4.3.1 - -- name: grant access to the registry to Docker on all nodes - docker_login: - username: docker - password: "{{ password }}" - registry_url: "{{ registry_local_address }}" - tls_hostname: "{{ hostvars[inventory_hostname]['ansible_hostname'] }}" - validate_certs: yes - when: container_runtime == "docker" - -- name: copy auth file - copy: + retries: 20 + delay: 5 + +- name: Check docker config file exists + ansible.builtin.stat: + path: "{{ ansible_env.HOME }}/.docker/config.json" + register: docker_conf_stat + +- name: Fetch existing docker config + ansible.builtin.slurp: + src: "{{ ansible_env.HOME }}/.docker/config.json" + register: docker_conf_content + no_log: true + when: docker_conf_stat.stat.exists + +- name: Add auth to docker config + vars: + original_conf: |- + {%- if docker_conf_stat.stat.exists -%} + {{ (docker_conf_content.content | b64decode | from_json) }} + {%- else -%} + {} + {%- endif -%} + auth_section: "{{ lookup('template', 'docker_auth.json.j2') }}" + ansible.builtin.copy: + content: "{{ original_conf | combine(auth_section) | to_nice_json(indent=8) }}" + dest: "{{ ansible_env.HOME }}/.docker/config.json" 
+ owner: "{{ ansible_user | default(ansible_user_id) }}" + group: "{{ ansible_user | default(ansible_user_id) }}" + mode: 0640 + no_log: true + +- name: Copy config file to Kubelet config dir + ansible.builtin.copy: src: "{{ ansible_env.HOME }}/.docker/config.json" - dest: /var/lib/kubelet/config.json - remote_src: yes - mode: '0755' - when: container_runtime == "docker" - -- name: grant access to the registry - command: podman login --authfile="{{ registry_auth }}" -u docker -p "{{ password }}" "{{ registry_local_address }}" - changed_when: false - when: '"docker" not in container_runtime' - -- name: add registry environment variable to /etc/environment - lineinfile: + dest: "{{ registry_auth_path }}" + remote_src: true + owner: root + group: root + mode: 0640 + +- name: Add registry environment variable to /etc/environment + ansible.builtin.lineinfile: path: /etc/environment line: "{{ registry_auth_env }}" owner: root group: root - mode: '0644' - when: '"docker" not in container_runtime' + mode: 0644 + +- name: Setup registry config file for rke2 provisioner + ansible.builtin.include_role: + name: rke2_defaults + tasks_from: rke2_registries + when: kube_provisioner == 'rke2' diff --git a/roles/container_registry/tasks/tls.yml b/roles/container_registry/tasks/tls.yml index 35e88b06..11339b01 100644 --- a/roles/container_registry/tasks/tls.yml +++ b/roles/container_registry/tasks/tls.yml @@ -14,161 +14,140 @@ ## limitations under the License. 
## --- -# server -- name: configure master node +- name: Generate Key and CSR + when: inventory_hostname == groups['kube_control_plane'][0] block: - - name: clean up any preexisting certs/key/CSR files - file: path=/etc/ssl/registry state=absent - failed_when: false - become: yes - - - name: create registry SSL directory - become: yes - file: - path: /etc/ssl/registry + - name: Create registry SSL directory + ansible.builtin.file: + path: "{{ registry_tls_dir }}" state: directory - mode: '0700' + mode: 0700 owner: root group: root + become: true - - name: delete any preexisting certs/key/CSR from Kubernetes - command: kubectl delete csr registry.{{ registry_namespace }} - changed_when: true - failed_when: false + - name: Populate registry CSR template + ansible.builtin.template: + src: "registry_csr_template.json.j2" + dest: "{{ (registry_tls_dir, 'registry-csr.json') | path_join }}" + mode: 0600 + owner: root + group: root + become: true - - name: delete any preexisting secrets from Kubernetes - command: kubectl delete secret -n {{ registry_namespace }} {{ registry_secret_name }} - changed_when: true - failed_when: false - - - name: populate registry CSR template - template: - src: "registry_csr.json.j2" - dest: "/etc/ssl/registry/registry-csr.json" - force: yes - mode: preserve - become: yes - - - name: get GOPATH - command: /usr/local/go/bin/go env GOPATH + - name: Get GOPATH + ansible.builtin.command: >- + go env GOPATH register: gopath changed_when: false - - name: generate key and CSR - shell: >- - set -o pipefail && {{ gopath.stdout }}/bin/cfssl genkey registry-csr.json | {{ gopath.stdout }}/bin/cfssljson -bare registry + - name: Generate key and CSR using cfssl + ansible.builtin.shell: + cmd: >- + set -o pipefail && + {{ gopath.stdout }}/bin/cfssl genkey registry-csr.json | + {{ gopath.stdout }}/bin/cfssljson -bare registry + creates: registry-key.pem + chdir: "{{ registry_tls_dir }}" args: - chdir: "/etc/ssl/registry/" executable: /bin/bash - changed_when: 
true - become: yes - - - name: read generated key - command: cat registry-key.pem - args: - chdir: "/etc/ssl/registry/" - changed_when: false - register: key - - - name: load generated key - set_fact: - registry_key: "{{ key.stdout }}" - - - name: read generated csr - command: cat registry.csr - args: - chdir: "/etc/ssl/registry/" - changed_when: false - register: csr + become: true - - name: load generated csr - set_fact: - registry_csr: "{{ csr.stdout | b64encode }}" + - name: Read generated key + ansible.builtin.slurp: + src: "{{ (registry_tls_dir, 'registry-key.pem') | path_join }}" + register: generated_key + no_log: true - - name: populate registry Kubernetes CA CSR template - template: - src: "kube_registry_csr.yml.j2" - dest: "/etc/ssl/registry/kube-registry-csr.yml" - force: yes - mode: preserve + - name: Read generated csr + ansible.builtin.slurp: + src: "{{ (registry_tls_dir, 'registry.csr') | path_join }}" + register: generated_csr - - name: send CSR to the Kubernetes API Server - command: kubectl apply -f /etc/ssl/registry/kube-registry-csr.yml - changed_when: true + - name: Load generated key & csr + ansible.builtin.set_fact: + registry_key_base64: "{{ generated_key.content }}" + registry_csr_base64: "{{ generated_csr.content }}" + no_log: true - - name: approve request - command: kubectl certificate approve registry.kube-system - changed_when: true - - name: get approved certificate - shell: kubectl get csr registry.kube-system -o jsonpath='{.status.certificate}' - args: - chdir: "/etc/ssl/registry" - changed_when: false - register: cert - retries: 30 - delay: 1 - until: cert.stdout | length > 0 - - - name: load generated cert - set_fact: - registry_cert: "{{ cert.stdout | b64decode }}" - - - name: create TLS secret for registry - command: >- - kubectl create -n {{ registry_namespace }} secret generic {{ registry_secret_name }} - --from-literal=tls.crt='{{ registry_cert }}' - --from-literal=tls.key='{{ registry_key }}' +- name: Get certificate signed 
by kubernetes + when: inventory_hostname == groups['kube_control_plane'][0] + block: + - name: Send CSR to the Kubernetes API Server + kubernetes.core.k8s: + template: registry_csr.yaml.j2 + state: present + register: csr_signing + + - name: Approve CSR to sign certificate + ansible.builtin.command: >- + kubectl -n {{ registry_namespace }} certificate approve {{ registry_csr_name }} changed_when: true + when: csr_signing.changed + + - name: Get signed certificate + kubernetes.core.k8s_info: + kind: CertificateSigningRequest + name: "{{ registry_csr_name }}" + namespace: "{{ registry_namespace }}" + register: registry_csr + retries: 10 + delay: 3 + until: | + registry_csr.resources | length() != 0 and + registry_csr.resources[0].status is defined and + registry_csr.resources[0].status.certificate is defined + no_log: true + + - name: Load signed cert + ansible.builtin.set_fact: + registry_cert_base64: "{{ registry_csr.resources[0].status.certificate }}" + + +- name: Create TLS secret for registry + kubernetes.core.k8s: + template: registry_tls_secret.yaml.j2 + state: present + no_log: true + when: inventory_hostname == groups['kube_control_plane'][0] - - name: clean up - file: path=/etc/ssl/registry state=absent - failed_when: false - become: yes +- name: Clean up generated files from system + ansible.builtin.file: + path: "{{ registry_tls_dir }}" + state: absent + become: true when: inventory_hostname == groups['kube_control_plane'][0] # copy CA file so that registry clients can validate its certificate -- name: copy Kubernetes CA so that registry client can validate registry's certificate - become: yes +- name: Copy Kubernetes CA so that registry client can validate registry's certificate + become: true + vars: + certs_path: |- + {%- if container_runtime == "docker" -%} + /etc/docker/certs.d/ + {%- else -%} + /etc/containers/certs.d/ + {%- endif -%} block: - - name: remove existing certs and keys - file: path="/etc/docker/certs.d/{{ registry_local_address }}" 
state=absent - - name: ensure that path exists - file: - path: "/etc/docker/certs.d/{{ registry_local_address }}" + - name: Remove existing certs and keys + ansible.builtin.file: + path: "{{ (certs_path, registry_local_address) | path_join }}" + state: absent + + - name: Ensure that path exists + ansible.builtin.file: + path: "{{ (certs_path, registry_local_address) | path_join }}" mode: '0700' owner: root group: root state: directory - - name: place Kubernetes CA in the /etc/docker/certs.d - copy: - src: /etc/kubernetes/ssl/ca.crt - dest: "/etc/docker/certs.d/{{ registry_local_address }}/ca.crt" - remote_src: yes - mode: '0600' - owner: root - group: root - when: container_runtime == "docker" -# copy CA file so that registry clients can validate its certificate -- name: copy Kubernetes CA so that registry client can validate registry's certificate - become: yes - block: - - name: remove existing certs and keys - file: path="/etc/containers/certs.d/{{ registry_local_address }}ca.crt" state=absent - - name: ensure that path exists - file: - path: "/etc/containers/certs.d/{{ registry_local_address }}" - mode: '0700' - owner: root - group: root - state: directory - - name: place Kubernetes CA in the /etc/containers/certs.d/ - copy: + - name: Place Kubernetes CA in the /etc/docker/certs.d + ansible.builtin.copy: src: /etc/kubernetes/ssl/ca.crt - dest: "/etc/containers/certs.d/{{ registry_local_address }}/ca.crt" - remote_src: yes + dest: "{{ (certs_path, registry_local_address, 'ca.crt') | path_join }}" + remote_src: true mode: '0600' owner: root group: root - when: '"docker" not in container_runtime' diff --git a/roles/container_registry/templates/container-registry/configmap.yaml.j2 b/roles/container_registry/templates/container-registry/configmap.yaml.j2 deleted file mode 100644 index 5030d376..00000000 --- a/roles/container_registry/templates/container-registry/configmap.yaml.j2 +++ /dev/null @@ -1,30 +0,0 @@ ---- -# Source: 
container-registry/templates/configmap.yaml -apiVersion: v1 -kind: ConfigMap -metadata: - name: {{ release_name }}-config - labels: - app: container-registry - chart: container-registry-1.9.4 - heritage: Helm - release: {{ release_name }} -data: - config.yml: |- - health: - storagedriver: - enabled: true - interval: 10s - threshold: 3 - http: - addr: {{ registry_addr }}:{{ registry_port }} - headers: - X-Content-Type-Options: - - nosniff - log: - fields: - service: registry - storage: - cache: - blobdescriptor: inmemory - version: 0.1 diff --git a/roles/container_registry/templates/container-registry/secret.yaml.j2 b/roles/container_registry/templates/container-registry/secret.yaml.j2 deleted file mode 100644 index b4271577..00000000 --- a/roles/container_registry/templates/container-registry/secret.yaml.j2 +++ /dev/null @@ -1,12 +0,0 @@ ---- -# Source: container-registry/templates/secret.yaml -apiVersion: v1 -kind: Secret -metadata: - name: {{ release_name }}-secret - labels: - app: container-registry - release: {{ release_name }} -type: Opaque -data: - haSharedSecret: {{ htpasswd.stdout_lines[0] | b64encode }} diff --git a/roles/container_registry/templates/container-registry/service.yaml.j2 b/roles/container_registry/templates/container-registry/service.yaml.j2 deleted file mode 100644 index 5643c6fe..00000000 --- a/roles/container_registry/templates/container-registry/service.yaml.j2 +++ /dev/null @@ -1,20 +0,0 @@ ---- -# Source: container-registry/templates/service.yaml -apiVersion: v1 -kind: Service -metadata: - name: {{release_name }} - labels: - app: container-registry - release: {{ release_name }} -spec: - type: NodePort - ports: - - port: {{ registry_proxy }} - protocol: TCP - name: nginx-https - targetPort: nginx-https - nodePort: {{ registry_nodeport }} - selector: - app: container-registry - release: {{ release_name }} diff --git a/roles/container_registry/templates/docker_auth.json.j2 b/roles/container_registry/templates/docker_auth.json.j2 new file 
mode 100644 index 00000000..d4ea40ad --- /dev/null +++ b/roles/container_registry/templates/docker_auth.json.j2 @@ -0,0 +1,12 @@ +{ + "auths": { +{% if intel_sriov_fec_operator_enabled | default(false) and container_runtime == "containerd" %} + "registry.redhat.io": { + "auth": "{{ (redhat_user + ':' + redhat_password) | b64encode }}" + }, +{% endif %} + "{{ registry_local_address }}": { + "auth": "{{ (registry_user + ':' + registry_password) | b64encode }}" + } + } +} diff --git a/roles/container_registry/templates/kube_registry_csr.yml.j2 b/roles/container_registry/templates/registry_csr.yaml.j2 similarity index 62% rename from roles/container_registry/templates/kube_registry_csr.yml.j2 rename to roles/container_registry/templates/registry_csr.yaml.j2 index 19fc83ed..0171ca4c 100644 --- a/roles/container_registry/templates/kube_registry_csr.yml.j2 +++ b/roles/container_registry/templates/registry_csr.yaml.j2 @@ -1,10 +1,13 @@ +--- apiVersion: certificates.k8s.io/v1 kind: CertificateSigningRequest metadata: - name: registry.{{ registry_namespace }} + name: {{ registry_csr_name }} + labels: + app: {{ registry_release_name }} spec: signerName: kubernetes.io/kubelet-serving - request: {{ registry_csr }} + request: {{ registry_csr_base64 }} usages: - digital signature - key encipherment diff --git a/roles/container_registry/templates/registry_csr.json.j2 b/roles/container_registry/templates/registry_csr_template.json.j2 similarity index 91% rename from roles/container_registry/templates/registry_csr.json.j2 rename to roles/container_registry/templates/registry_csr_template.json.j2 index 29ea1401..0342aa34 100644 --- a/roles/container_registry/templates/registry_csr.json.j2 +++ b/roles/container_registry/templates/registry_csr_template.json.j2 @@ -4,6 +4,7 @@ "container-registry.kube-system.svc", "container-registry.kube-system", "localhost", + "{{ groups['kube_control_plane'][0] }}", {% if hostvars[groups['kube_control_plane'][0]].adq_dp.enabled | default(false) %} 
"{{ hostvars[groups['kube_control_plane'][0]].adq_dp.interface_address }}", {% endif %} diff --git a/roles/container_registry/templates/registry_htpasswd.yaml.j2 b/roles/container_registry/templates/registry_htpasswd.yaml.j2 new file mode 100644 index 00000000..320588ee --- /dev/null +++ b/roles/container_registry/templates/registry_htpasswd.yaml.j2 @@ -0,0 +1,11 @@ +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: {{ registry_htpasswd_secret_name }} + namespace: {{ registry_namespace }} + labels: + app: {{ registry_release_name }} +stringData: + haSharedSecret: {{ registry_htpasswd.stdout_lines[0] }} diff --git a/roles/container_registry/templates/registry_pv.yaml.j2 b/roles/container_registry/templates/registry_pv.yaml.j2 new file mode 100644 index 00000000..29d13d40 --- /dev/null +++ b/roles/container_registry/templates/registry_pv.yaml.j2 @@ -0,0 +1,15 @@ +--- +apiVersion: v1 +kind: PersistentVolume +metadata: + name: {{ registry_pv_name }} + labels: + app: {{ registry_release_name }} +spec: + storageClassName: manual + capacity: + storage: {{ registry_size }} + accessModes: + - ReadWriteOnce + hostPath: + path: {{ registry_storage_dir }} diff --git a/roles/container_registry/templates/registry_pvc.yaml.j2 b/roles/container_registry/templates/registry_pvc.yaml.j2 new file mode 100644 index 00000000..c337f998 --- /dev/null +++ b/roles/container_registry/templates/registry_pvc.yaml.j2 @@ -0,0 +1,15 @@ +--- +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: {{ registry_pvc_name }} + namespace: {{ registry_namespace }} + labels: + app: {{ registry_release_name }} +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: {{ registry_size }} + storageClassName: manual diff --git a/roles/container_registry/templates/registry_tls_secret.yaml.j2 b/roles/container_registry/templates/registry_tls_secret.yaml.j2 new file mode 100644 index 00000000..a4b95048 --- /dev/null +++ 
b/roles/container_registry/templates/registry_tls_secret.yaml.j2 @@ -0,0 +1,12 @@ +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: {{ registry_tls_secret_name }} + namespace: {{ registry_namespace }} + labels: + app: {{ registry_release_name }} +data: + tls.crt: '{{ registry_cert_base64 }}' + tls.key: '{{ registry_key_base64 }}' diff --git a/roles/container_registry/templates/values_registry.yaml.j2 b/roles/container_registry/templates/values_registry.yaml.j2 new file mode 100644 index 00000000..70998c92 --- /dev/null +++ b/roles/container_registry/templates/values_registry.yaml.j2 @@ -0,0 +1,25 @@ +service: + type: NodePort + node_port: {{ registry_nodeport }} + +registry: + image: {{ registry_image }} + tag: {{ registry_version }} + listen_addr: 127.0.0.1 + port: 5000 + +nginx: + image: {{ registry_nginx_image }} + tag: {{ registry_nginx_version }} + ssl_ciphers: {{ registry_nginx_ssl_ciphers }} + ssl_protocols: {{ registry_nginx_ssl_protocols }} + port: 5001 + +node_name: {{ hostvars[groups['kube_control_plane'][0]]['ansible_hostname'] }} + +storage: + pvc: {{ registry_pvc_name }} + +secrets: + tls: {{ registry_tls_secret_name }} + htpasswd: {{ registry_htpasswd_secret_name }} diff --git a/roles/container_registry/vars/main.yml b/roles/container_registry/vars/main.yml index 33dd974a..70641e09 100644 --- a/roles/container_registry/vars/main.yml +++ b/roles/container_registry/vars/main.yml @@ -14,7 +14,22 @@ ## limitations under the License. 
## --- -container_registry_path: "{{ (project_root_dir, 'container_registry') | path_join }}" +registry_root_dir: "{{ (project_root_dir, 'container_registry') | path_join }}" +registry_tls_dir: "{{ (registry_root_dir, 'ssl') | path_join }}" +registry_chart_dir: "{{ (project_root_dir, 'charts', 'container-registry') | path_join }}" + +registry_release_name: "container-registry" +registry_htpasswd_secret_name: "{{ registry_release_name }}-htpasswd" +registry_pv_name: "{{ registry_release_name }}-pv" +registry_pvc_name: "{{ registry_release_name }}-pvc" +registry_csr_name: "{{ registry_release_name }}-csr" + +registry_auth_env: "REGISTRY_AUTH_FILE={{ registry_auth_path }}" + +registry_nginx_ssl_ciphers: + "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE\ + -ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" +registry_nginx_ssl_protocols: "TLSv1.2 TLSv1.3" install_dependencies: Debian: diff --git a/roles/container_registry_shared_vars/defaults/main.yml b/roles/container_registry_shared_vars/defaults/main.yml new file mode 100644 index 00000000..749ef788 --- /dev/null +++ b/roles/container_registry_shared_vars/defaults/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +registry_namespace: kube-system +registry_tls_secret_name: container-registry-tls +registry_auth_path: "/var/lib/kubelet/config.json" diff --git a/roles/container_registry_shared_vars/tasks/main.yml b/roles/container_registry_shared_vars/tasks/main.yml new file mode 100644 index 00000000..60d4a20e --- /dev/null +++ b/roles/container_registry_shared_vars/tasks/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +- name: Configure container registry shared variables + debug: + msg: "Check roles/container_registry_shared_vars/defaults/main.yml" diff --git a/roles/elasticsearch_install/files/elasticsearch_certs.yml b/roles/elasticsearch_install/files/elasticsearch_certs.yml index b816dbae..8443f084 100644 --- a/roles/elasticsearch_install/files/elasticsearch_certs.yml +++ b/roles/elasticsearch_install/files/elasticsearch_certs.yml @@ -45,6 +45,11 @@ spec: - elasticsearch-master.monitoring - elasticsearch-master - elasticsearch + - kibana-kibana.monitoring.svc.cluster.local + - kibana-kibana.monitoring.svc + - kibana-kibana.monitoring + - kibana-kibana + - kibana isCA: false privateKey: algorithm: RSA diff --git a/roles/elasticsearch_install/tasks/main.yml b/roles/elasticsearch_install/tasks/main.yml index b993c518..472b186f 100644 --- a/roles/elasticsearch_install/tasks/main.yml +++ b/roles/elasticsearch_install/tasks/main.yml @@ -13,7 +13,10 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## -- block: +- name: Deploy elasticsearch + when: + - inventory_hostname == groups['kube_control_plane'][0] + block: - name: create storage directory ansible.builtin.file: path: "/etc/elasticsearch" @@ -101,4 +104,3 @@ values_files: "{{ (project_root_dir, 'elasticsearch', 'elasticsearch_values.yml') | path_join }}" wait: true timeout: 15m0s - when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/bootstrap/install_gpu_driver/tasks/set_oem_kernel_version.yml b/roles/ffmpeg_install/defaults/main.yml similarity index 59% rename from roles/bootstrap/install_gpu_driver/tasks/set_oem_kernel_version.yml rename to roles/ffmpeg_install/defaults/main.yml index 3124a004..938fff2c 100644 --- a/roles/bootstrap/install_gpu_driver/tasks/set_oem_kernel_version.yml +++ b/roles/ffmpeg_install/defaults/main.yml @@ -14,14 +14,11 @@ ## limitations under the License. 
## --- +ffmpeg_path: "{{ (project_root_dir, 'ffmpeg') | path_join }}" +ffmpeg_patch_path: "{{ (ffmpeg_path, 'ffmpeg_patch') | path_join }}" -# Set GPU kernel version, reload variables anytime via "include_role" when referring to gpu_oem_kernel_version -- name: set kernel version for Ubuntu 22.04 - set_fact: - gpu_oem_kernel_version: "5.15.0-48-generic" - when: ansible_distribution_version == "22.04" - -- name: set kernel version for Ubuntu 20.04 - set_fact: - gpu_oem_kernel_version: "5.14.0-1047-oem" - when: ansible_distribution_version == "20.04" +# Define the FFmpeg version using the release tag or commit hash. If both are used, the commit hash is used. +# ffmpeg_version: "n4.2.9" +ffmpeg_commit_hash: "c3a7999" +ffmpeg_git_url: "https://github.com/FFmpeg/FFmpeg.git" +ffmpeg_configure_options: "--enable-shared --enable-vaapi --enable-libvpl" diff --git a/roles/ffmpeg_install/tasks/ffmpeg_archive_patch.yml b/roles/ffmpeg_install/tasks/ffmpeg_archive_patch.yml new file mode 100644 index 00000000..3f3a8842 --- /dev/null +++ b/roles/ffmpeg_install/tasks/ffmpeg_archive_patch.yml @@ -0,0 +1,77 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: create patch directory + ansible.builtin.file: + path: '{{ ffmpeg_patch_path }}' + state: directory + mode: 0755 + +- name: download patch archive + ansible.builtin.get_url: + dest: '{{ ffmpeg_path }}' + sha256sum: '{{ patch_item.sha256 }}' + url: "{{ patch_item.url }}" + mode: '0640' + register: patch_download + retries: "{{ number_of_retries | default(5) }}" + until: patch_download is succeeded + delay: "{{ retry_delay | default(3) }}" + +- name: download and extract FFmpeg patches archive file to machine + ansible.builtin.unarchive: + src: '{{ patch_download.dest }}' + dest: '{{ ffmpeg_patch_path }}' + remote_src: true + list_files: true + register: tar_out + +- name: get patch files + block: + - name: find all patch files + ansible.builtin.find: + paths: "{{ (ffmpeg_patch_path, tar_out.files[0], patch_item.subdirectory) | path_join }}" + register: patch_file_list + - name: sort patch files by Name + set_fact: + files_list: "{{ (patch_file_list.files | map(attribute='path') | list | sort) }}" + +- name: patch FFmpeg sources + ansible.posix.patch: + src: "{{ item }}" + basedir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + remote_src: true + strip: 1 + with_items: "{{ files_list }}" + when: + - patch_item.apply_all_patches + +- name: patch FFmpeg sources + ansible.posix.patch: + src: "{{ (ffmpeg_patch_path, tar_out.files[0], patch_item.subdirectory, item) | path_join }}" + basedir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + remote_src: true + strip: 1 + with_items: "{{ patch_item.patches_to_apply }}" + when: + - ((patch_item.apply_all_patches is not defined) or + (not patch_item.apply_all_patches)) and + (patch_item.patches_to_apply is defined) + +- name: remove patch directory + ansible.builtin.file: + path: '{{ ffmpeg_patch_path }}' + state: absent diff --git a/roles/cndp_dp_install/tasks/add_cndp_labels.yml b/roles/ffmpeg_install/tasks/ffmpeg_cleanup.yml similarity index 71% rename from 
roles/cndp_dp_install/tasks/add_cndp_labels.yml rename to roles/ffmpeg_install/tasks/ffmpeg_cleanup.yml index 65b21d5e..ca88d15a 100644 --- a/roles/cndp_dp_install/tasks/add_cndp_labels.yml +++ b/roles/ffmpeg_install/tasks/ffmpeg_cleanup.yml @@ -14,8 +14,13 @@ ## limitations under the License. ## --- -- name: add labels for nodes with CNDP - command: kubectl label nodes {{ hostvars[node_name]['ansible_hostname'] }} cndp=true --overwrite +- block: + - name: delete FFmpeg folder + ansible.builtin.file: + path: "{{ (ffmpeg_path) | path_join }}" + state: absent when: - - cndp_dp_enabled | default(false) - - hostvars[node_name]['cndp_enabled'] | default(false) + - ffmpeg_install_enabled | default (false) + - inventory_hostname in groups['kube_node'] + tags: + - intel-ffmpeg diff --git a/roles/ffmpeg_install/tasks/ffmpeg_git_patch.yml b/roles/ffmpeg_install/tasks/ffmpeg_git_patch.yml new file mode 100644 index 00000000..4ad6695e --- /dev/null +++ b/roles/ffmpeg_install/tasks/ffmpeg_git_patch.yml @@ -0,0 +1,64 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: create patch directory + ansible.builtin.file: + path: "{{ ffmpeg_patch_path }}" + state: directory + mode: 0755 + +- name: clone patch git repository + ansible.builtin.git: + repo: "{{ patch_item.url }}" + dest: "{{ ffmpeg_patch_path }}" + version: "{{ patch_item.git_tag }}" + +- name: get patch files + block: + - name: find all patch files + ansible.builtin.find: + paths: "{{ (ffmpeg_patch_path, patch_item.subdirectory) | path_join }}" + register: patch_file_list + - name: sort patch files by Name + ansible.builtin.set_fact: + files_list: "{{ (patch_file_list.files | map(attribute='path') | list | sort) }}" + +- name: patch FFmpeg sources + ansible.posix.patch: + src: "{{ item }}" + basedir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + remote_src: true + strip: 1 + with_items: "{{ files_list }}" + when: + - patch_item.apply_all_patches + +- name: patch FFmpeg sources + ansible.posix.patch: + src: "{{ (ffmpeg_patch_path, patch_item.subdirectory, item) | path_join }}" + basedir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + remote_src: true + strip: 1 + loop: "{{ patch_item.patches_to_apply }}" + when: + - ((patch_item.apply_all_patches is not defined) or + (not patch_item.apply_all_patches)) and + (patch_item.patches_to_apply is defined) + +- name: remove patch directory + ansible.builtin.file: + path: '{{ ffmpeg_patch_path }}' + state: absent diff --git a/roles/ffmpeg_install/tasks/ffmpeg_install.yml b/roles/ffmpeg_install/tasks/ffmpeg_install.yml new file mode 100644 index 00000000..ed4974fa --- /dev/null +++ b/roles/ffmpeg_install/tasks/ffmpeg_install.yml @@ -0,0 +1,76 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: create FFmpeg folder + ansible.builtin.file: + path: "{{ (ffmpeg_path) | path_join }}" + state: directory + mode: 0755 + +- name: install FFmpeg dependencies + include_role: + name: install_dependencies + +- name: clone FFmpeg git repository + ansible.builtin.git: + repo: "{{ ffmpeg_git_url }}" + dest: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + version: "{{ ffmpeg_version | default(ffmpeg_commit_hash ) }}" + +- name: apply FFmpeg patches + block: + - name: apply FFmpeg patches from git repository + ansible.builtin.include_tasks: ffmpeg_git_patch.yml + when: patch_item.type == "git" + loop: "{{ ffmpeg_patches }}" + loop_control: + loop_var: patch_item + - name: apply FFmpeg patches from archive file + ansible.builtin.include_tasks: ffmpeg_archive_patch.yml + when: patch_item.type in ['zip', 'tar.gz'] + loop: "{{ ffmpeg_patches }}" + loop_control: + loop_var: patch_item + +- name: configure FFmpeg source + ansible.builtin.command: "{{ item }}" + changed_when: false + args: + chdir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + with_items: + - ./configure --prefix={{ ffmpeg_path }} --libdir=/usr/lib {{ ffmpeg_configure_options }} + +- name: get number of CPUs + ansible.builtin.command: nproc + register: proc_number + changed_when: false + +- name: build FFmpeg tool (1/2) + community.general.make: + chdir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + jobs: "{{ proc_number.stdout }}" + +- name: build FFmpeg tool (2/2) + community.general.make: + target: "{{ item }}" + chdir: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join 
}}" + with_items: + - install + +- name: remove FFmpeg source directory + ansible.builtin.file: + path: "{{ (ffmpeg_path, 'ffmpeg_src') | path_join }}" + state: absent diff --git a/roles/ffmpeg_install/tasks/main.yml b/roles/ffmpeg_install/tasks/main.yml new file mode 100644 index 00000000..bcc0f6c1 --- /dev/null +++ b/roles/ffmpeg_install/tasks/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install FFmpeg + ansible.builtin.include_tasks: ffmpeg_install.yml + when: + - ffmpeg_install_enabled | default(false) + - inventory_hostname in groups['kube_node'] diff --git a/roles/ffmpeg_install/vars/main.yml b/roles/ffmpeg_install/vars/main.yml new file mode 100644 index 00000000..b86841ea --- /dev/null +++ b/roles/ffmpeg_install/vars/main.yml @@ -0,0 +1,75 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +install_dependencies: + Debian: + - autoconf + - automake + - build-essential + - cmake + - git-core + - libass-dev + - libfreetype6-dev + - libgnutls28-dev + - libmp3lame-dev + - libsdl2-dev + - libtool + - libva-dev + - libvdpau-dev + - libvorbis-dev + - libxcb1-dev + - libxcb-shm0-dev + - libxcb-xfixes0-dev + - meson + - ninja-build + - pkg-config + - texinfo + - yasm + - zlib1g-dev + - nasm + - libx264-dev + - libx265-dev + - libnuma-dev + - libvpx-dev + - libfdk-aac-dev + - libopus-dev + - libdav1d-dev + - libmfx-dev + - libvpl-dev + RedHat: + - autoconf + - automake + - bzip2 + - bzip2-devel + - cmake + - freetype-devel + - gcc + - gcc-c++ + - git + - libtool + - make + - pkgconfig + - zlib-devel + - nasm + - yasm-devel + - libx264-devel + - libx265-devel + - fdk-aac-free + - lame-devel + - opus + - libvpx6-devel + - libmfx-devel + - libvpl-devel diff --git a/roles/gpu_dp_install/tasks/main.yml b/roles/gpu_dp_install/tasks/main.yml index 06c9bf4c..a138ee48 100644 --- a/roles/gpu_dp_install/tasks/main.yml +++ b/roles/gpu_dp_install/tasks/main.yml @@ -79,19 +79,12 @@ intel_gpu_dp_init_image: "{{ registry_local_address }}/intel-gpu-initcontainer:{{ intel_dp_version }}" when: gpu_dp_build_image_locally - - name: Set GPU oem kernel version based on OS version - include_role: - name: bootstrap/install_gpu_driver # noqa role-name[path] - role in bootstrap - tasks_from: set_oem_kernel_version - - name: populate Intel GPU Device Plugin yaml file and push to controller node template: src: "intel-gpu-plugin.yml.j2" dest: "{{ (intel_dp_templates_dir, 'intel-gpu-plugin.yml') | path_join }}" force: yes mode: preserve - # vars: - # target_kernel_version: "{{ gpu_oem_kernel_version }}" - name: deploy Intel GPU Device Plugin with the Intel Device Plugin Operator k8s: diff --git a/roles/gpu_dp_install/tasks/preflight_gpu_dp.yml b/roles/gpu_dp_install/tasks/preflight_gpu_dp.yml index 03349d42..a865e785 100644 --- a/roles/gpu_dp_install/tasks/preflight_gpu_dp.yml +++ 
b/roles/gpu_dp_install/tasks/preflight_gpu_dp.yml @@ -69,8 +69,3 @@ when: " item | regex_search('8086:[0-9a-zA-Z]{4}') != none " when: - configure_gpu is defined and configure_gpu - -- name: Set GPU oem kernel version based on OS version - include_role: - name: bootstrap/install_gpu_driver # noqa role-name[path] - role in bootstrap - tasks_from: set_oem_kernel_version diff --git a/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 b/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 index edc1febb..df0ea412 100644 --- a/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 +++ b/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 @@ -15,5 +15,3 @@ spec: intel.feature.node.kubernetes.io/gpu: "true" # check if node has required PCI IDs feature.node.kubernetes.io/pci-0380_8086.present: 'true' - # check if node custom gpu kernel installed - feature.node.kubernetes.io/kernel-version.full: '{{ gpu_oem_kernel_version }}' diff --git a/roles/install_ddp_pkgs/defaults/main.yml b/roles/install_ddp_pkgs/defaults/main.yml index c9ae8516..e7d3eb5e 100644 --- a/roles/install_ddp_pkgs/defaults/main.yml +++ b/roles/install_ddp_pkgs/defaults/main.yml @@ -36,3 +36,4 @@ ddp_pkgs: - "https://downloadmirror.intel.com/713853/800%20Series%20DDP%20Comms%20Package%201.3.31.0.zip" - "https://downloadmirror.intel.com/727568/ice_comms-1.3.35.0.zip" - "https://downloadmirror.intel.com/738733/800%20Series%20DDP%20Comms%20Package%201.3.37.0.zip" + - "https://downloadmirror.intel.com/772040/800%20Series%20DDP%20for%20Comms%20Package%201.3.40.0.zip" diff --git a/roles/install_ddp_pkgs/tasks/install_a_pkg.yml b/roles/install_ddp_pkgs/tasks/install_a_pkg.yml index 92ccc28e..1a9c4141 100644 --- a/roles/install_ddp_pkgs/tasks/install_a_pkg.yml +++ b/roles/install_ddp_pkgs/tasks/install_a_pkg.yml @@ -69,6 +69,14 @@ mode: 0644 when: '"1.3.37.0" in pkgurl' +- name: unarchive DDP package subfolder excluding from list of URLs + unarchive: + src: "{{ temp_ddp_path }}/ice_comms-1.3.40.0.zip" + 
dest: "{{ temp_ddp_path }}" + remote_src: yes + mode: 0644 + when: '"1.3.40.0" in pkgurl' + - name: find PKG files find: paths: "{{ temp_ddp_path }}" diff --git a/roles/install_dependencies/tasks/main.yml b/roles/install_dependencies/tasks/main.yml index de995b90..c31b4ffe 100644 --- a/roles/install_dependencies/tasks/main.yml +++ b/roles/install_dependencies/tasks/main.yml @@ -28,8 +28,7 @@ - name: install packages action: "{{ ansible_pkg_mgr }} \ - name={{ install_dependencies[ansible_os_family] | difference(['linux-headers-' + gpu_oem_kernel_version if gpu_oem_kernel_version is defined else '']) }} \ - state=present" + name={{ install_dependencies[ansible_os_family] }} state=present" register: pkg_mgr_results retries: "{{ number_of_retries | default(5) }}" until: pkg_mgr_results is success diff --git a/roles/install_dpdk/vars/main.yml b/roles/install_dpdk/vars/main.yml index e65caa56..6898066f 100644 --- a/roles/install_dpdk/vars/main.yml +++ b/roles/install_dpdk/vars/main.yml @@ -22,6 +22,7 @@ install_dependencies: - linux-headers-{{ ansible_kernel }} - pkg-config - python3-pip + - git RedHat: - numactl-devel - libpcap-devel @@ -30,3 +31,4 @@ install_dependencies: - elfutils-libelf-devel - pkgconfig - python3-pip + - git diff --git a/roles/intel_ai/files/run_vehicle_detection_attribute.sh b/roles/intel_ai/files/run_vehicle_detection_attribute.sh deleted file mode 100644 index 03efa13d..00000000 --- a/roles/intel_ai/files/run_vehicle_detection_attribute.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/bin/bash - -VIDEO_IN=${1:-cars-on-highway.1920x1080.mp4} -VIDEO_OUT=${2:-cars-on-highway-annotated.mp4} - -# shellcheck source=/dev/null -source /opt/intel/openvino_2022/setupvars.sh -# shellcheck source=/dev/null -source /opt/intel/dlstreamer/setupvars.sh - -DET_MODEL=models/public/yolov5m/FP16/yolov5m.xml -DET_MODEL_PROC=models/public/yolov5m/yolov5m.json -DET_LABEL='labels-file=models/public/yolov5m/coco_80cl.txt' - 
-CLS_MODEL=models/intel/vehicle-attributes-recognition-barrier-0039/FP16-INT8/vehicle-attributes-recognition-barrier-0039.xml -CLS_MODEL_PROC=models/intel/vehicle-attributes-recognition-barrier-0039/vehicle-attributes-recognition-barrier-0039.json - -INC_DETECT="gvadetect pre-process-backend=vaapi-surface-sharing \ - model=${DET_MODEL} \ - model-proc=${DET_MODEL_PROC} \ - ${DET_LABEL} \ - ie-config=CACHE_DIR=./cl_cache \ - device=GPU ! "\ - -#INC_TRACK="gvatrack tracking-type=short-term-imageless ! " - -INC_CLASSIFY="gvaclassify pre-process-backend=vaapi-surface-sharing \ - model=${CLS_MODEL} \ - model-proc=${CLS_MODEL_PROC} \ - ${CLS_LABEL} \ - inference-region=roi-list object-class=car \ - ie-config=CACHE_DIR=./cl_cache \ - device=GPU ! " - -#INC_METAPUBLISH='gvametaconvert ! gvametapublish !' - -INC_WATERMARK='meta_overlay device=GPU !' - -set -x -# shellcheck disable=SC2086 -gst-launch-1.0 filesrc location=${VIDEO_IN} ! \ - decodebin ! video/x-raw\(memory:VASurface\) ! \ - ${INC_DETECT} \ - ${INC_TRACK} \ - ${INC_CLASSIFY} \ - ${INC_METAPUBLISH} \ - ${INC_WATERMARK} \ - gvafpscounter ! \ - queue ! vaapih264enc bitrate=2048 ! h264parse ! \ - mp4mux ! filesink location=/tmp/${VIDEO_OUT} diff --git a/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/templates/daemonset.yaml b/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/templates/daemonset.yaml index 9dc4af3e..3c14feb6 100755 --- a/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/templates/daemonset.yaml +++ b/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/templates/daemonset.yaml @@ -21,20 +21,46 @@ spec: volumeMounts: - name: state mountPath: /daemonstate + securityContext: + privileged: true + seccompProfile: + type: RuntimeDefault + capabilities: + drop: + - all + resources: + limits: + cpu: 2 + memory: "128M" + requests: + cpu: 1 + memory: "64M" containers: - name: ctlplane-daemonset - image: {{ dig "image" "repository" "intel.io/intel_cpu_controlplane" . 
}}:{{ dig "image" "tag" "v0.1" . }} + image: {{ dig "image" "repository" "" . }}:{{ dig "image" "tag" "" . }} imagePullPolicy: Always ports: - containerPort: 31000 securityContext: privileged: true - args: ["-cpath", "/cgroup", "-spath", "/daemonstate/daemon.state", "-runtime", {{ dig "runtime" "containerd" .| quote }}, "-allocator", {{ dig "allocator" "default" . | quote }}] + seccompProfile: + type: RuntimeDefault + capabilities: + drop: + - all + args: ["-cpath", "/cgroup", "-spath", "/daemonstate/daemon.state", "-runtime", {{ dig "runtime" "" .| quote }}, "-allocator", {{ dig "allocator" "default" . | quote }} {{ if dig "enable_memory_pinning" "true" . | eq true }}, "-mem" {{ end }}] volumeMounts: - name: host mountPath: /cgroup - name: state mountPath: /daemonstate + resources: + limits: + cpu: 4 + memory: "512M" + requests: + cpu: 2 + memory: "64M" readinessProbe: tcpSocket: port: 31000 @@ -46,16 +72,33 @@ spec: initialDelaySeconds: 15 periodSeconds: 20 - name: ctlplane-agent - image: {{ dig "image" "repository" "intel.io/intel_cpu_controlplane" . }}:{{ dig "image" "tag" "v0.1" . }} + image: {{ dig "image" "repository" "" . }}:{{ dig "image" "tag" "" . 
}} imagePullPolicy: Always securityContext: - privileged: true + privileged: false + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 10001 + runAsGroup: 10001 + seccompProfile: + type: RuntimeDefault + capabilities: + drop: + - all args: ["-a", "-namespace-prefix", {{ dig "agent_namespace_prefix" "" .| quote }}] env: - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName + resources: + limits: + cpu: 4 + memory: "512M" + requests: + cpu: 2 + memory: "64M" volumes: - name: host hostPath: diff --git a/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/values.yaml b/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/values.yaml index 58733c76..a3cccfa4 100755 --- a/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/values.yaml +++ b/roles/intel_cpu_controlplane/charts/intel-cpu-controlplane/values.yaml @@ -18,20 +18,22 @@ intel_cpu_controlplane: image: - repository: "intel.io/intel_cpu_controlplane" - tag: "v0.1" + repository: "" + tag: "" pullPolicy: IfNotPresent # container runtime: one of ['contianerd', 'docker'] - runtime: "containerd" + runtime: "" namespace: "ctlplane" # NUMA allocator: one of ['default', 'numa', 'numa-namespace', 'numa-namespace-exclusive'] - # https://github.com/intel-innersource/libraries.orchestrators.resourcemanagement.controlplane.daemon/blob/master/README.md - allocator: "numa-namespace-exclusive=2" + # https://github.com/intel/cpu-control-plane-plugin-for-kubernetes/blob/main/README.md + allocator: "default" # control plane agent namespace agent_namespace_prefix: "test-" + enable_memory_pinning: false + imagePullSecrets: [] diff --git a/roles/intel_cpu_controlplane/defaults/main.yml b/roles/intel_cpu_controlplane/defaults/main.yml index f34399bf..09173bbb 100755 --- a/roles/intel_cpu_controlplane/defaults/main.yml +++ b/roles/intel_cpu_controlplane/defaults/main.yml @@ -19,7 +19,6 @@ cpu_ctlplane_release_name: "intel-cpu-controlplane" # CPU control plane 
releas cpu_ctlplane_golang_version: "1.19.1" # CPU control plane golang version cpu_ctlplane_git_url: "https://github.com/intel/cpu-control-plane-plugin-for-kubernetes.git" -cpu_ctlplane_commit_hash: "a2b5f286cfa0ae05b0504fa0085023ea0e1fbdea" # "7e843ea8742d93759fe34a1a7d1d00293c58efd0" +cpu_ctlplane_version: "0.1.2" cpu_ctlplane_local_build_dir: "{{ (project_root_dir, 'intel-cpu-controlplane') | path_join }}" cpu_ctlplane_local_build_name: "intel-cpu-controlplane" -cpu_ctlplane_local_tag_name: "v0.1" diff --git a/roles/intel_cpu_controlplane/files/controlplane.daemon.zip b/roles/intel_cpu_controlplane/files/controlplane.daemon.zip deleted file mode 100755 index f2f71c99..00000000 Binary files a/roles/intel_cpu_controlplane/files/controlplane.daemon.zip and /dev/null differ diff --git a/roles/intel_cpu_controlplane/tasks/build_cpu_controlplane_daemon_image.yml b/roles/intel_cpu_controlplane/tasks/build_cpu_controlplane_daemon_image.yml index ebb14d13..0fdfb4ed 100755 --- a/roles/intel_cpu_controlplane/tasks/build_cpu_controlplane_daemon_image.yml +++ b/roles/intel_cpu_controlplane/tasks/build_cpu_controlplane_daemon_image.yml @@ -23,7 +23,7 @@ - name: clone CPU Control Plane to the controller node ansible.builtin.git: repo: "{{ cpu_ctlplane_git_url }}" - version: "{{ cpu_ctlplane_commit_hash }}" + version: "{{ ( 'v' + cpu_ctlplane_version ) | path_join }}" dest: "{{ cpu_ctlplane_local_build_dir }}" force: yes @@ -36,14 +36,14 @@ - name: build local CPU Control Plane image ansible.builtin.command: >- - docker build -f ./docker/Dockerfile -t {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_local_tag_name }} ./ + docker build -f ./docker/Dockerfile -t {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_version }} ./ args: chdir: "{{ (cpu_ctlplane_local_build_dir) | path_join }}" changed_when: true - name: push the local CPU Control Plane image to local registry ansible.builtin.command: >- - docker push {{ 
registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_local_tag_name }} + docker push {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_version }} changed_when: true when: - container_runtime == "docker" @@ -57,14 +57,14 @@ - name: build local CPU Control Plane image ansible.builtin.command: >- - podman build -f ./docker/Dockerfile -t {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_local_tag_name }} ./ + podman build -f ./docker/Dockerfile -t {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_version }} ./ args: chdir: "{{ (cpu_ctlplane_local_build_dir) | path_join }}" changed_when: true - name: push the local CPU Control Plane image to local registry ansible.builtin.command: >- - podman push {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_local_tag_name }} + podman push {{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}:{{ cpu_ctlplane_version }} changed_when: true when: - container_runtime == "containerd" diff --git a/roles/intel_cpu_controlplane/tasks/cleanup_cpu_controlplane.yml b/roles/intel_cpu_controlplane/tasks/cleanup_cpu_controlplane.yml index 2364b2bf..02b9ecc9 100755 --- a/roles/intel_cpu_controlplane/tasks/cleanup_cpu_controlplane.yml +++ b/roles/intel_cpu_controlplane/tasks/cleanup_cpu_controlplane.yml @@ -42,7 +42,7 @@ community.docker.docker_image: state: absent name: "{{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}" - tag: "{{ cpu_ctlplane_local_tag_name }}" + tag: "{{ cpu_ctlplane_version }}" changed_when: false failed_when: false @@ -50,7 +50,7 @@ containers.podman.podman_image: state: absent name: "{{ registry_local_address }}/{{ cpu_ctlplane_local_build_name }}" - tag: "{{ cpu_ctlplane_local_tag_name }}" + tag: "{{ cpu_ctlplane_version }}" changed_when: false failed_when: false @@ -66,4 +66,4 @@ when: - inventory_hostname == groups['kube_control_plane'][0] tags: - - 
cpu_ctlplane + - cpu-ctlplane diff --git a/roles/intel_cpu_controlplane/tasks/install_cpu_controlplane_helmchart.yml b/roles/intel_cpu_controlplane/tasks/install_cpu_controlplane_helmchart.yml index 2672f404..96df8efd 100755 --- a/roles/intel_cpu_controlplane/tasks/install_cpu_controlplane_helmchart.yml +++ b/roles/intel_cpu_controlplane/tasks/install_cpu_controlplane_helmchart.yml @@ -58,5 +58,5 @@ release_values: "{{ lookup('template', 'custom_values.yml.j2') | from_yaml }}" create_namespace: true chart_ref: "{{ (project_root_dir, 'charts', 'intel-cpu-controlplane') | path_join }}" - chart_version: "{{ cpu_ctlplane_local_tag_name }}" + chart_version: "{{ cpu_ctlplane_version }}" wait: true diff --git a/roles/intel_cpu_controlplane/tasks/preflight_cpu_controlplane.yml b/roles/intel_cpu_controlplane/tasks/preflight_cpu_controlplane.yml index 0a572f42..b2fa13eb 100755 --- a/roles/intel_cpu_controlplane/tasks/preflight_cpu_controlplane.yml +++ b/roles/intel_cpu_controlplane/tasks/preflight_cpu_controlplane.yml @@ -14,7 +14,7 @@ ## limitations under the License. ## --- -- name: preflight Intel CPU Control Plane +- name: Preflight Intel CPU Control Plane block: - name: Intel CPU Control Plane - check if ansible_host distro is Ubuntu 20.04/22.04 assert: @@ -34,7 +34,7 @@ when: - golang_version is not defined - - name: check golang version + - name: Check golang version ansible.builtin.assert: that: golang_version is version(cpu_ctlplane_golang_version, '>') msg: | @@ -43,7 +43,7 @@ The current golang version: {{ golang_version }} Please install golang >= {{ cpu_ctlplane_golang_version }} on the controller. 
- - name: check container runtime + - name: Check container runtime ansible.builtin.assert: that: container_runtime in ['containerd', 'docker'] msg: | @@ -51,9 +51,20 @@ The Intel CPU Control Plane (group_vars/intel_cpu_controlplane) support only one of ['containerd', 'docker'] The current container_runtime: {{ container_runtime }}" Please change group_vars/contanier_runtime or disable/remove intel_cpu_controlplane. + + - name: Check if registry is enabled + ansible.builtin.assert: + that: + - registry_enable | default(false) + msg: | + Incorrect configuration !! + Intel CPU Control Plane requires a container registry. + Please enable with registry_enable: true in group_vars + when: not on_cloud | default(false) + when: - kubernetes - intel_cpu_controlplane is defined and intel_cpu_controlplane.enabled any_errors_fatal: true tags: - - cpu_ctlplane + - cpu-ctlplane diff --git a/roles/intel_cpu_controlplane/templates/custom_values.yml.j2 b/roles/intel_cpu_controlplane/templates/custom_values.yml.j2 index 8dfbbebc..0db3c4f9 100755 --- a/roles/intel_cpu_controlplane/templates/custom_values.yml.j2 +++ b/roles/intel_cpu_controlplane/templates/custom_values.yml.j2 @@ -2,17 +2,19 @@ intel_cpu_controlplane: image: - repository: "{{ registry_local_address}}/{{cpu_ctlplane_local_build_name}}" - tag: "{{ cpu_ctlplane_local_tag_name }}" + repository: "{{ registry_local_address}}/{{ cpu_ctlplane_local_build_name }}" + tag: "{{ cpu_ctlplane_version }}" pullPolicy: IfNotPresent # container runtime: one of ['contianerd', 'docker'] runtime : "{{ container_runtime }}" # NUMA allocator: one of ['default', 'numa', 'numa-namespace', 'numa-namespace-exclusive'] - # https://github.com/intel-innersource/libraries.orchestrators.resourcemanagement.controlplane.daemon/blob/master/README.md + # https://github.com/intel/cpu-control-plane-plugin-for-kubernetes/blob/main/README.md allocator: "{{ intel_cpu_controlplane.allocator }}" + enable_memory_pinning: {{ 
intel_cpu_controlplane.enable_memory_pinning }} + # control plane agent namespace agent_namespace_prefix: "{{ intel_cpu_controlplane.agent_namespace_prefix }}" diff --git a/roles/intel_dp_operator/tasks/add_dp_labels.yml b/roles/intel_dp_operator/tasks/add_dp_labels.yml index 6b079bcf..c2e7b9da 100644 --- a/roles/intel_dp_operator/tasks/add_dp_labels.yml +++ b/roles/intel_dp_operator/tasks/add_dp_labels.yml @@ -18,7 +18,7 @@ command: kubectl label nodes {{ hostvars[node_name]['ansible_hostname'] }} qat.configured=true --overwrite when: - qat_dp_enabled | default(false) - - hostvars[node_name]['update_qat_drivers'] | default(false) + - hostvars[node_name]['configure_qat'] | default(false) - hostvars[node_name]['qat_devices'] | length > 0 - name: add labels for nodes with configured SGX diff --git a/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml b/roles/intel_eci/defaults/main.yml similarity index 64% rename from roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml rename to roles/intel_eci/defaults/main.yml index 3129d45c..a048103c 100644 --- a/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml +++ b/roles/intel_eci/defaults/main.yml @@ -14,14 +14,10 @@ ## limitations under the License. ## --- -- name: create definition yaml file from template - template: - src: "cndp.yml.j2" - dest: "{{ cndp_k8s_manifest_dir }}/cndp_net_attach_def.yml" - force: yes - mode: preserve +# Intel Edge Controls for Industrial (ECI) -- name: deploy network attachment definition for CNDP - k8s: - state: present - src: "{{ cndp_k8s_manifest_dir }}/cndp_net_attach_def.yml" +intel_eci_version: "3.0.2" +# Please contact eci-support@intel.com on how to access this repo. 
+# Also refer to ESH (https://www.intel.com/content/www/us/en/edge-computing/edge-software-hub.html) +intel_eci_repo: "" +intel_eci_repo_checksum: "4c397b998e18a88e9a918742c27d3c0bff4c67c4" diff --git a/roles/intel_eci/tasks/eci_preflight.yml b/roles/intel_eci/tasks/eci_preflight.yml new file mode 100644 index 00000000..a92c94af --- /dev/null +++ b/roles/intel_eci/tasks/eci_preflight.yml @@ -0,0 +1,47 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: validate linux distro version for Intel ECI + ansible.builtin.assert: + that: ansible_distribution == 'Ubuntu' and ansible_distribution_version == '22.04' + fail_msg: "Intel ECI is supported only on Ubuntu 22.04 ({{ ansible_distribution }} {{ ansible_distribution_version }} is not supported)" + success_msg: "Assertion passed. Intel ECI is supported and can be deployed on target with Ubuntu 22.04" + +# check Codesys Benchmarking for Intel ECI +- name: validate OPC UA Client and Server are mutually exclusive + ansible.builtin.assert: + that: > + (opcua_framework.codesys_opcua_client | bool and not opcua_framework.standalone_opcua_server | bool) or + (opcua_framework.standalone_opcua_server | bool and not opcua_framework.codesys_opcua_client | bool) + fail_msg: "OPC UA Client and Server roles are mutually exclusive; they cannot be both enabled on the same target" + success_msg: "Assertion passed. 
Target role is unique" + when: opcua_framework.codesys_opcua_client | default(false) | bool or opcua_framework.standalone_opcua_server | default(false) | bool + +- name: validate Intel ECI is enabled for OPC UA Client or Server + ansible.builtin.assert: + that: intel_eci_enabled + fail_msg: "OPC UA Client and Server roles require Intel ECI to be enabled in host vars (intel_eci_enabled: true)" + success_msg: "Assertion passed. Intel ECI is enabled" + when: opcua_framework.codesys_opcua_client | default(false) | bool or opcua_framework.standalone_opcua_server | default(false) | bool + +- name: validate Intel ECI repo + ansible.builtin.assert: + that: intel_eci_repo_checksum == "{{ intel_eci_repo | checksum }}" + msg: + - Please configure intel_eci_repo in group vars. + - Please contact eci-support@intel.com on how to access this repo. + +# TODO: check CPU for Intel ECI (Atom or Core) diff --git a/roles/intel_eci/tasks/main.yml b/roles/intel_eci/tasks/main.yml new file mode 100644 index 00000000..e687307b --- /dev/null +++ b/roles/intel_eci/tasks/main.yml @@ -0,0 +1,160 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- debug: msg="Entering Intel ECI Role" + +- name: add the ECI APT key + ansible.builtin.apt_key: + url: "{{ intel_eci_repo }}/repos/gpg-keys/GPG-PUB-KEY-INTEL-ECI.gpg" + keyring: /usr/share/keyrings/eci-archive-keyring.gpg + state: present + +- name: add the ECI APT repo + ansible.builtin.apt_repository: + repo: "deb [signed-by=/usr/share/keyrings/eci-archive-keyring.gpg] {{ intel_eci_repo }}/repos/{{ ansible_distribution_release }} isar main" + state: present + filename: eci + +- name: add the ECI APT source repo + ansible.builtin.apt_repository: + repo: "deb-src [signed-by=/usr/share/keyrings/eci-archive-keyring.gpg] {{ intel_eci_repo }}/repos/{{ ansible_distribution_release }} isar main" + state: present + filename: eci + +- name: set ECI APT repo priority above all + ansible.builtin.copy: + dest: /etc/apt/preferences.d/isar + content: | + Package: * + Pin: "origin {{ intel_eci_repo }}" + Pin-Priority: 1000 + mode: '0644' + +- name: install dependencies for Intel ECI (including RT kernel) + include_role: + name: install_dependencies + +- name: reboot into RT kernel + ansible.builtin.reboot: + +- name: re-gather o/s facts + ansible.builtin.setup: + filter: + - 'ansible_kernel' + +- name: check RT kernel + ansible.builtin.assert: + that: "'intel-ese-standard-lts-rt' in ansible_kernel" + fail_msg: "System failed to boot the RT kernel. Detected '{{ ansible_kernel }}' kernel" + success_msg: "Assertion passed. 
Kernel is now '{{ ansible_kernel }}'" + +- name: install ECI meta-packages + ansible.builtin.apt: + name: "{{ item.key }}" + state: present + update_cache: true + with_items: "{{ intel_eci | dict2items }}" + when: item.value + +- name: deploy Codesys OPC UA Client + block: + - name: install packages for Codesys OPC UA Client + ansible.builtin.apt: + name: + - codesys-opcua-benchmark + - codesys-benchmark-scripts + state: present + update_cache: true + + # sudo /opt/benchmarking/codesys/utility/start_codesys_native.sh + - name: start the Codesys runtime + ansible.builtin.command: /opt/benchmarking/codesys/utility/start_codesys_native.sh + register: codesys_runtime + changed_when: '"Changing affinity of Codesys Runtime tasks" in codesys_runtime.stdout' + failed_when: + - codesys_runtime.rc != 0 + - '"Codesys preparation complete" not in codesys_runtime.stdout' + + - name: restart docker service (codesys_native script killed it) + ansible.builtin.service: + name: docker + state: restarted + + - name: gather service facts + ansible.builtin.service_facts: + + # sudo systemctl status codesyscontrol + - name: print codesyscontrol status + debug: + var: ansible_facts.services.codesyscontrol.state + + - name: check codesyscontrol status + ansible.builtin.assert: + that: ansible_facts.services.codesyscontrol.state == "running" + success_msg: "Assertion passed. 
The codesyscontrol service is active (running)" + fail_msg: "The codesyscontrol service is in {{ ansible_facts.services.codesyscontrol.state }} state (not running)" + + - debug: msg="Intel ECI with Codesys OPC UA Client is ready on target '{{ inventory_hostname }}'" + when: opcua_framework.codesys_opcua_client | bool + +- name: deploy Standalone OPC UA Server + block: + - name: install packages for Standalone OPC UA Server + ansible.builtin.apt: + name: eci-connectivity-ec-bridge + state: present + update_cache: true + + - name: scan for existing opcsvr process(es) + ansible.builtin.shell: "set -o pipefail && ps -A | grep -i opcsvr | awk '{print $1}'" # noqa command-instead-of-shell + args: + executable: /bin/bash + register: opcsvr_pids + changed_when: false + failed_when: false + + - name: kill any existing opcsvr process(es) + ansible.builtin.shell: "kill -9 {{ item }}" # noqa command-instead-of-shell + with_items: "{{ opcsvr_pids.stdout_lines }}" + changed_when: true + when: opcsvr_pids.stdout | length() != 0 + + # sudo chrt -f 37 /opt/ec-protocol-bridge/opcsvr /opt/ec-protocol-bridge/config/opcsvr-pubsub.yaml + - name: start the EC-Protocol OPC UA Server + ansible.builtin.command: "chrt -f 37 /opt/ec-protocol-bridge/opcsvr /opt/ec-protocol-bridge/config/opcsvr-pubsub.yaml" + async: 9999999 # run "forever" (until killed) + poll: 0 + register: opcsvr + changed_when: true + + - name: scan for new opcsvr process +# community.general.pids: # needs psutil(python module) +# name: opcsvr + ansible.builtin.shell: "set -o pipefail && ps -A | grep -i opcsvr | awk '{print $1}'" # noqa command-instead-of-shell + args: + executable: /bin/bash + register: opcsvr_pid + changed_when: false + failed_when: false + + - name: check opcsvr status + ansible.builtin.assert: + that: opcsvr_pid.stdout | length() != 0 + success_msg: "Assertion passed. The OPC UA Server is running as 'opcsvr' process with PID {{ opcsvr_pid.stdout }}" + fail_msg: "The OPC UA Server failed to start.
No 'opcsvr' process is running" + + - debug: msg="Intel ECI with Standalone OPC UA Server is ready on target '{{ inventory_hostname }}'" + when: opcua_framework.standalone_opcua_server | bool diff --git a/roles/intel_eci/vars/main.yml b/roles/intel_eci/vars/main.yml new file mode 100644 index 00000000..ecbd4c5c --- /dev/null +++ b/roles/intel_eci/vars/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - eci-customizations + - linux-intel-rt diff --git a/roles/intel_ethernet_operator/defaults/main.yml b/roles/intel_ethernet_operator/defaults/main.yml index 898c072b..8b014be7 100644 --- a/roles/intel_ethernet_operator/defaults/main.yml +++ b/roles/intel_ethernet_operator/defaults/main.yml @@ -59,6 +59,8 @@ intel_ethernet_operator_ddp_urls: 'ice_comms-1.3.31.0.pkg': https://downloadmirror.intel.com/713853/800%20Series%20DDP%20Comms%20Package%201.3.31.0.zip 'ice_comms-1.3.35.0.pkg': https://downloadmirror.intel.com/727568/ice_comms-1.3.35.0.zip 'ice_comms-1.3.37.0.pkg': https://downloadmirror.intel.com/738733/800%20Series%20DDP%20Comms%20Package%201.3.37.0.zip + 'ice_comms-1.3.40.0.pkg': https://downloadmirror.intel.com/772040/800%20Series%20DDP%20for%20Comms%20Package%201.3.40.0.zip + # SHA-1 sums of DDP packages intel_ethernet_operator_ddp_sums: @@ -72,3 +74,4 @@ intel_ethernet_operator_ddp_sums: 'ice_comms-1.3.31.0.pkg': 
5dbe3ae8d2ada5b78de05da150e5df5befb3bf75 'ice_comms-1.3.35.0.pkg': c61189b98bb116e05853f67ba21ca915416aef46 'ice_comms-1.3.37.0.pkg': e73d24bdf6b3c8fe46b52ccc31ee534034b0b3e0 + 'ice_comms-1.3.40.0.pkg': 8fcc3eab682f2a023ae49831669e42e63bf05f8f diff --git a/roles/intel_ethernet_operator/tasks/ddp.yml b/roles/intel_ethernet_operator/tasks/ddp.yml index 5b500528..2706fc93 100644 --- a/roles/intel_ethernet_operator/tasks/ddp.yml +++ b/roles/intel_ethernet_operator/tasks/ddp.yml @@ -35,6 +35,8 @@ port: 22 state: stopped timeout: 60 + delegate_to: localhost + become: false - name: Wait for node after reboot ansible.builtin.wait_for: @@ -42,11 +44,15 @@ port: 22 connect_timeout: 5 timeout: 1200 + delegate_to: localhost + become: false # Update could be started on master node - name: Wait for kube-apiserver to be up ansible.builtin.uri: url: "https://127.0.0.1:6443/healthz" + client_cert: "{{ kube_apiserver_cert }}" + client_key: "{{ kube_apiserver_key }}" validate_certs: no use_proxy: no register: ddp_update_api_info @@ -59,11 +65,16 @@ kind: EthernetNodeConfig name: "{{ hostvars[node_name]['ansible_hostname'] }}" namespace: "{{ intel_ethernet_operator_namespace }}" - wait: true - wait_timeout: 1200 - wait_condition: - type: Updated - reason: Succeeded + retries: 60 + delay: 5 + register: enc_status + until: | + enc_status.failed or + ( + enc_status.resources[0].status.conditions[0].status == "True" + and + enc_status.resources[0].status.conditions[0].reason == "Succeeded" + ) - name: Check cluster after reboot ansible.builtin.include_role: diff --git a/roles/intel_ethernet_operator/tasks/fw.yml b/roles/intel_ethernet_operator/tasks/fw.yml index 225a2ddb..c53c0e94 100644 --- a/roles/intel_ethernet_operator/tasks/fw.yml +++ b/roles/intel_ethernet_operator/tasks/fw.yml @@ -35,6 +35,8 @@ port: 22 state: stopped timeout: 60 + delegate_to: localhost + become: false - name: Wait for node reboot ansible.builtin.wait_for: @@ -42,11 +44,15 @@ port: 22 connect_timeout: 5 timeout: 
1200 + delegate_to: localhost + become: false # Update could be started on master node - name: Wait for kube-apiserver to be up ansible.builtin.uri: url: "https://127.0.0.1:6443/healthz" + client_cert: "{{ kube_apiserver_cert }}" + client_key: "{{ kube_apiserver_key }}" validate_certs: no use_proxy: no register: ddp_update_api_info diff --git a/roles/intel_ethernet_operator/templates/ddp-service.j2 b/roles/intel_ethernet_operator/templates/ddp-service.j2 index a3d70b44..4de028fa 100644 --- a/roles/intel_ethernet_operator/templates/ddp-service.j2 +++ b/roles/intel_ethernet_operator/templates/ddp-service.j2 @@ -5,9 +5,7 @@ Before=kubelet.service [Service] Type=oneshot -{% if (not hostvars[node_name]['update_nic_drivers'] and -((hostvars[node_name]['ansible_distribution'] == "Ubuntu" and hostvars[node_name]['ansible_distribution_version'] >= "22.04") or -(hostvars[node_name]['ansible_os_family'] == "RedHat" and hostvars[node_name]['ansible_distribution_version'] >= "8.6"))) %} +{% if 'irdma' in ieo_lsmod.stdout %} ExecStart=/sbin/modprobe -r irdma ice ExecStart=/sbin/modprobe -a ice irdma {% else %} diff --git a/roles/intel_flexran/defaults/main.yml b/roles/intel_flexran/defaults/main.yml index 39fb5b5a..05a42708 100644 --- a/roles/intel_flexran/defaults/main.yml +++ b/roles/intel_flexran/defaults/main.yml @@ -27,14 +27,14 @@ # intel_flexran_repo: "Intel’s Developer Zone Portal aka RDC" # intel_flexran_token: "not public. pkg access requires NDA. 
see docs/flexran_guide.md" intel_flexran_staging_location: "/tmp/flexran/" # a directory on localhost (ansible host) -intel_flexran_ver: "22.11" # "22.03" (RA22.06) "22.07" (RA22.08) "22.07.3" (RA22.11) "22.11" (RA23.02) +intel_flexran_ver: "23.03" # "22.03" (RA22.06) "22.07" (RA22.08) "22.07.3" (RA22.11) "22.11" (RA23.02) "23.03" (RA23.07) intel_flexran_pod_version: "22.07" # (RA23.02) intel_flexran_namespace: "default" # intel_flexran_tarball: "FlexRAN-22.03.tar.gz" # intel_flexran_tar_chk: "65e59ac1295ef392f54b80047db2efe458962fc78e5d84c5d54703439a364cda" # SHA256 intel_flexran_dir: "{{ (project_root_dir, 'intel-flexran') | path_join }}" intel_flexran_files_dir: "{{ (project_root_dir, 'intel-flexran-files') | path_join }}" # for FEC ACC CRs, kernel cmdline, etc -intel_flexran_dpdk_ver: "21.11" # for FlexRAN 22.03, 22.07, 22.07.3, 22.11 +intel_flexran_dpdk_ver: "22.11.1" # "21.11" for FlexRAN 22.03, 22.07, 22.07.3, 22.11 intel_flexran_dpdk_dir: "{{ (project_root_dir, 'dpdk-' + intel_flexran_dpdk_ver) | path_join }}" intel_flexran_dpdk_zip: "dpdk_patch-{{ intel_flexran_ver }}.patch.zip" @@ -42,23 +42,12 @@ intel_flexran_dpdk_zip_chk: "dab1a0c3a0530be9904d62d3c3f4f88166b73360dcc11402500 intel_flexran_dpdk_patch: "dpdk_patch-{{ intel_flexran_ver }}.patch" # intel_flexran_dpdk_patch_chk: "bd136f939609545d70e4d6a8dca83a1550d6a28a0b6b70fdfb10d8283922f4c5" # SHA256 for dpdk_patch-22.07.3.patch -intel_flexran_dpdk_patch_chk: "d7943817a04d58ee3a78a36b2cbbaa18f45df15402f6566e904648820e657ad7" # SHA256 for dpdk_patch-22.11.patch +# intel_flexran_dpdk_patch_chk: "d7943817a04d58ee3a78a36b2cbbaa18f45df15402f6566e904648820e657ad7" # SHA256 for dpdk_patch-22.11.patch +intel_flexran_dpdk_patch_chk: "91b4d1911568f59d349adc0152d0d60fb4c5fd927373d021f7de188ba20f1267" # SHA256 for dpdk_patch-23.03.patch intel_flexran_patch: "FlexRAN-R{{ intel_flexran_ver }}.zip" intel_flexran_patch_chk: "1089d1bd3d86fe2f2198c497fa26e6f9322fd867f5f6ece087190499ff427593" # SHA256 for 
FlexRAN-R22.07.3.zip -# Intel oneAPI Base Toolkit -# Reference: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html -intel_oneapi_dir: "{{ (project_root_dir, 'intel-oneapi') | path_join }}" - -# intel_oneapi_ver: "2022.1.2.146" -# intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146_offline.sh" -# intel_oneapi_chk: "91682e4410c17a82147ce574c30e57271cc12adfab198c8547612f13d4dd21c8d77ce12153d29b3774bc27f0c6b604cd" # SHA384 - -intel_oneapi_ver: "2022.2" -intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18673/l_BaseKit_p_2022.2.0.262_offline.sh" -intel_oneapi_chk: "e508b0a64f048d9518cc3706e1fa3f400dbb0a07fdc0f91e02b371b18a35715fa0fad7a960dbb7fc04595f77ae65a333" # SHA384 - -intel_pfbb_version: "v22.11" +intel_pfbb_version: "v23.03" inih_version: "r44" diff --git a/roles/intel_flexran/tasks/flexran.yml b/roles/intel_flexran/tasks/flexran.yml index 42221be6..0d5d30c4 100644 --- a/roles/intel_flexran/tasks/flexran.yml +++ b/roles/intel_flexran/tasks/flexran.yml @@ -32,6 +32,15 @@ # args: # chdir: "{{ intel_flexran_dir }}" +# As the path to the libnuma library is different with Ubuntu, a soft link is created to avoid multiple changes to the makefiles +# (FlexRAN hardcoded the path for libnuma) +- name: create libnuma symlink + file: + src: "/usr/lib/x86_64-linux-gnu/libnuma.so" + dest: "/usr/lib64/libnuma.so" + state: link + when: ansible_distribution in ['Ubuntu'] + - name: patch Intel FlexRAN for xx.yy.z release block: - name: copy Intel FlexRAN patch file @@ -60,10 +69,14 @@ content: "{{ intel_flexran_dpdk_dir }}" mode: '0755' +- name: include oneAPI vars + ansible.builtin.include_vars: + file: ../intel_oneapi_install/vars/main.yml + - name: set oneAPI path for Intel FlexRAN copy: dest: "{{ (intel_flexran_dir, '.flexran_icx.path') | path_join }}" - content: "{{ intel_oneapi_dir }}" + content: "{{ intel_oneapi_install_dir }}" mode: '0755' - debug: 
msg="Intel FlexRAN mode is '{{ intel_flexran_mode }}'" diff --git a/roles/intel_flexran/tasks/flexran_preflight.yml b/roles/intel_flexran/tasks/flexran_preflight.yml index c539c19f..a75b2f76 100644 --- a/roles/intel_flexran/tasks/flexran_preflight.yml +++ b/roles/intel_flexran/tasks/flexran_preflight.yml @@ -174,6 +174,14 @@ msg: - DPDK version '{{ dpdk_version }}' set in the worker node host_vars file does NOT match the DPDK version required for FlexRAN. - Must be '{{ intel_flexran_dpdk_ver }}' + + - name: check intel oneAPI basekit enabled + assert: + that: + - intel_oneapi_enabled | default(false) # basekit must be enabled by default + fail_msg: > + Intel oneAPI is not enabled. + Intel oneAPI must be enabled for FlexRAN. when: - intel_flexran_enabled | default(false) | bool - intel_flexran_type == "host" @@ -187,9 +195,9 @@ assert: that: > (ansible_distribution == 'Ubuntu' and ansible_distribution_version == '22.04' and 'realtime' in ansible_kernel) or - (ansible_distribution == 'RedHat' and ansible_distribution_version == '8.6' and 'rt' in ansible_kernel) + (ansible_distribution == 'RedHat' and ansible_distribution_version == '9.2' and 'rt' in ansible_kernel) msg: - - Deploying Intel FlexRAN is supported only on Ubuntu 22.04 or RHEL 8.6 and with real-time kernel. + - Deploying Intel FlexRAN is supported only on Ubuntu 22.04 or RHEL 9.2 and with real-time kernel. - Please prepare accordingly the o/s image on target(s) or disable FlexRAN. See docs/flexran_guide.md # check package for FlexRAN @@ -244,6 +252,9 @@ - Deploying Intel FlexRAN in Docker POD is supported only on Ubuntu 22.04 with real-time kernel. - Please prepare accordingly the o/s image on target(s) or disable FlexRAN. See docs/flexran_guide.md success_msg: "Assertion passed. 
Intel FlexRAN in Docker POD is supported and can be deployed on {{ ansible_distribution }} {{ ansible_distribution_version }} {{ ansible_kernel }} target" # noqa yaml[line-length] + when: + - rt_kernel_enabled is not defined or + not rt_kernel_enabled | default(false) | bool # check CPU for FlexRAN in Docker POD - debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa yaml[line-length] @@ -257,8 +268,8 @@ - debug: msg="Container runtime is set to {{ container_runtime }}" - name: check runtime for FlexRAN in Docker POD ansible.builtin.assert: - that: container_runtime == 'docker' - fail_msg: "Deploying Intel FlexRAN as a Docker POD is supported only for docker runtime. Please correct the group_vars configuration" + that: container_runtime in ['docker', 'containerd'] + fail_msg: "Deploying Intel FlexRAN as a Docker POD is supported only for docker/containerd runtime. Please correct the group_vars configuration" success_msg: "Assertion passed. Intel FlexRAN as a Docker POD is supported and can be deployed on '{{ container_runtime }}' runtime" # check SRIOV for FlexRAN in Docker POD diff --git a/roles/intel_flexran/tasks/main.yml b/roles/intel_flexran/tasks/main.yml index 3a033c2b..7c17130c 100644 --- a/roles/intel_flexran/tasks/main.yml +++ b/roles/intel_flexran/tasks/main.yml @@ -52,9 +52,6 @@ include_tasks: power.yml when: inventory_hostname == groups['kube_node'][0] -- name: deploy Intel oneAPI - include_tasks: oneapi.yml - - name: deploy Intel FlexRAN include_tasks: flexran.yml when: intel_flexran_type == "host" diff --git a/roles/intel_flexran/tasks/oneapi.yml b/roles/intel_flexran/tasks/oneapi.yml deleted file mode 100644 index 558a185e..00000000 --- a/roles/intel_flexran/tasks/oneapi.yml +++ /dev/null @@ -1,46 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. 
-## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: create Intel oneAPI directory - file: - path: "{{ intel_oneapi_dir }}" - state: directory - mode: '0755' - -- name: download Intel oneAPI - get_url: - url: "{{ intel_oneapi_url }}" - dest: "{{ (intel_oneapi_dir, 'intel-oneapi-basekit-offline.sh') | path_join }}" - checksum: "sha384:{{ intel_oneapi_chk }}" - mode: '0755' - use_proxy: yes - -# ln -s /usr/lib/x86_64-linux-gnu/libnuma.so /usr/lib64/libnuma.so -# RHEL 8.6 RT ERR: src file does not exist, use "force=yes" if you really want to create the link: /usr/lib/x86_64-linux-gnu/libnuma.so -- name: create libnuma symlink - file: - src: "/usr/lib/x86_64-linux-gnu/libnuma.so" - dest: "/usr/lib64/libnuma.so" - state: link - when: ansible_distribution == 'Ubuntu' - -- name: install Intel oneAPI -# command: "sh {{ intel_oneapi_dir }}/intel-oneapi-basekit-offline.sh -a --silent --eula accept --install-dir {{ intel_oneapi_dir }}" - command: "sh {{ intel_oneapi_dir }}/intel-oneapi-basekit-offline.sh -a --silent --eula accept --components intel.oneapi.lin.dpcpp-cpp-compiler:intel.oneapi.lin.ipp.devel:intel.oneapi.lin.ippcp.devel:intel.oneapi.lin.mkl.devel:intel.oneapi.lin.dpcpp-ct:intel.oneapi.lin.dpl:intel.oneapi.lin.dpcpp_dbg --install-dir {{ intel_oneapi_dir }}" # noqa yaml[line-length] - changed_when: true - failed_when: false # to allow re-run install without uninstall -# environment: -# PATH: "{{ gopath.stdout 
}}/bin:/usr/local/go/bin:/usr/sbin:/usr/bin:/sbin:/bin:{{ intel_oneapi_dir }}" diff --git a/roles/intel_flexran/vars/main.yml b/roles/intel_flexran/vars/main.yml index ad48830c..adb643b7 100644 --- a/roles/intel_flexran/vars/main.yml +++ b/roles/intel_flexran/vars/main.yml @@ -31,6 +31,7 @@ install_dependencies: - flex - bison - msr-tools + - linux-tools-{{ ansible_kernel }} RedHat: - git - make @@ -38,7 +39,7 @@ install_dependencies: - elfutils-libelf-devel - cmake - gcc-c++ - - libhugetlbfs* +# - libhugetlbfs* # RH9.2RT: "failures": "No package available." - libstdc++* - kernel-devel - numactl* diff --git a/roles/intel_media_analytics/defaults/main.yaml b/roles/intel_media_analytics/defaults/main.yaml new file mode 100644 index 00000000..c3a21bf6 --- /dev/null +++ b/roles/intel_media_analytics/defaults/main.yaml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +intel_media_analytics_namespace: "intel-media" +intel_media_analytics__release_name: "intel-media" + +# Media Analytics +intel_media_analytics_image_src: "intel/dlstreamer" +intel_media_analytics_image_tag: "2022.3.0-ubuntu22-gpu555-dpcpp" + +intel_media_analytics_local_folder: "{{ (project_root_dir, 'intel-media') | path_join }}" + +intel_media_analytics_local_build_name: "intel-media" +intel_media_analytics_local_build_tag: "v23.02" + +intel_media_analytics_sample_pod_name: "intel-media" diff --git a/roles/intel_media_analytics/files/run_vehicle_detection_attribute.sh b/roles/intel_media_analytics/files/run_vehicle_detection_attribute.sh new file mode 100644 index 00000000..87ff9d10 --- /dev/null +++ b/roles/intel_media_analytics/files/run_vehicle_detection_attribute.sh @@ -0,0 +1,88 @@ +#!/bin/bash + +VIDEO_IN=${1:-cars-on-highway.1920x1080.mp4} +VIDEO_OUT=${2:-cars-on-highway-annotated.mp4} + +# shellcheck source=/dev/null +source /opt/intel/openvino_2022/setupvars.sh +# shellcheck source=/dev/null +source /opt/intel/dlstreamer/setupvars.sh + +DET_MODEL=models/public/yolov5m/FP16/yolov5m.xml +DET_MODEL_PROC=models/public/yolov5m/yolov5m.json +DET_LABEL='labels-file=models/public/yolov5m/coco_80cl.txt' + +CLS_MODEL=models/intel/vehicle-attributes-recognition-barrier-0039/FP16-INT8/vehicle-attributes-recognition-barrier-0039.xml +CLS_MODEL_PROC=models/intel/vehicle-attributes-recognition-barrier-0039/vehicle-attributes-recognition-barrier-0039.json + +INC_DETECT_CMD=( + "gvadetect" + "pre-process-backend=vaapi-surface-sharing" + "model=${DET_MODEL}" + "model-proc=${DET_MODEL_PROC}" + "${DET_LABEL}" + "ie-config=CACHE_DIR=./cl_cache" + "device=GPU" +) + +#INC_TRACK_CMD=( +# "gvatrack" +# "tracking-type=short-term-imageless" +#) + + +if [[ -n "${CLS_LABEL}" ]]; +then +INC_CLASSIFY_CMD=( + "gvaclassify" + "pre-process-backend=vaapi-surface-sharing" + "model=${CLS_MODEL}" + "model-proc=${CLS_MODEL_PROC}" + "${CLS_LABEL}" + "inference-region=roi-list" + 
"object-class=car" + "ie-config=CACHE_DIR=./cl_cache" + "device=GPU" +) +else +INC_CLASSIFY_CMD=( + "gvaclassify" + "pre-process-backend=vaapi-surface-sharing" + "model=${CLS_MODEL}" + "model-proc=${CLS_MODEL_PROC}" + "inference-region=roi-list" + "object-class=car" + "ie-config=CACHE_DIR=./cl_cache" + "device=GPU" +) +fi + +#INC_METAPUBLISH_PIPLINE=( +# 'gvametaconvert' ! +# 'gvametapublish' +#) + +INC_WATERMARK_CMD=( + "meta_overlay" + "device=GPU" +) + +FULL_PIPELINE=( + "filesrc" "location=${VIDEO_IN}" ! + "decodebin" ! + "video/x-raw(memory:VASurface)" ! + "${INC_DETECT_CMD[@]}" ! +# "${INC_TRACK_CMD[@]}" ! + "${INC_CLASSIFY_CMD[@]}" ! +# "${INC_METAPUBLISH_PIPELINE[@]}" ! + "${INC_WATERMARK_CMD[@]}" ! + "gvafpscounter" ! + "queue" ! + "vaapih264enc" "bitrate=2048" ! + "h264parse" ! + "mp4mux" ! + "filesink" "location=/tmp/${VIDEO_OUT}" +) + +set -x +gst-launch-1.0 "${FULL_PIPELINE[@]}" diff --git a/roles/intel_ai/tasks/cleanup_intel_ai.yml b/roles/intel_media_analytics/tasks/cleanup_intel_media_analytics.yml similarity index 73% rename from roles/intel_ai/tasks/cleanup_intel_ai.yml rename to roles/intel_media_analytics/tasks/cleanup_intel_media_analytics.yml index a8da0de7..91b2b3eb 100644 --- a/roles/intel_ai/tasks/cleanup_intel_ai.yml +++ b/roles/intel_media_analytics/tasks/cleanup_intel_media_analytics.yml @@ -20,33 +20,33 @@ state: absent api_version: v1 kind: Pod - name: "{{ intel_ai_sample_pod_name }}" - namespace: "{{ intel_ai_namespace }}" + name: "{{ intel_media_analytics_sample_pod_name }}" + namespace: "{{ intel_media_analytics_namespace }}" - name: remove Media Analytics image from local registry block: - name: delete the tag community.docker.docker_image: state: absent - name: "{{ registry_local_address }}/{{ intel_ai_local_build_name }}" - tag: "{{ intel_ai_local_build_tag }}" + name: "{{ registry_local_address }}/{{ intel_media_analytics_local_build_name }}" + tag: "{{ intel_media_analytics_local_build_tag }}" force_absent: true when: - 
container_runtime == "docker" - name: remove Media Analytics folder ansible.builtin.file: - path: "{{ (intel_ai_local_folder) | path_join }}" + path: "{{ (intel_media_analytics_local_folder) | path_join }}" state: absent - name: remove a k8s namespace kubernetes.core.k8s: - name: "{{ intel_ai_namespace }}" + name: "{{ intel_media_analytics_namespace }}" api_version: v1 kind: Namespace state: absent when: - kubernetes - - intel_ai_enabled | default (false) + - intel_media_analytics_enabled | default (false) tags: - - intel-ai + - intel-media-analytics diff --git a/roles/intel_ai/tasks/intel_ai_install.yml b/roles/intel_media_analytics/tasks/intel_media_analytics_install.yml similarity index 62% rename from roles/intel_ai/tasks/intel_ai_install.yml rename to roles/intel_media_analytics/tasks/intel_media_analytics_install.yml index 738397df..7b8c8f14 100644 --- a/roles/intel_ai/tasks/intel_ai_install.yml +++ b/roles/intel_media_analytics/tasks/intel_media_analytics_install.yml @@ -16,51 +16,51 @@ --- - name: create Media Analytics folder ansible.builtin.file: - path: "{{ (intel_ai_local_folder) | path_join }}" + path: "{{ (intel_media_analytics_local_folder) | path_join }}" state: directory mode: 0755 - name: copy Media Analytics shell script to the controller node ansible.builtin.copy: src: "{{ item }}" - dest: "{{ (intel_ai_local_folder) | path_join }}" + dest: "{{ (intel_media_analytics_local_folder) | path_join }}" mode: 0644 with_fileglob: - ./*.sh -- name: get the group ID for GPU when gpu_dp_enabled - ansible.builtin.stat: - path: /dev/dri/renderD128 - register: gpu_stat_gid - when: gpu_dp_enabled +- name: Copy YAML templates to the controller node for each node + ansible.builtin.template: + src: "templates/media_analytics_sample_pod.yaml.j2" + dest: "{{ intel_media_analytics_local_folder }}/{{ item }}.yaml" + mode: 0644 + loop: "{{ groups['kube_node'] }}" + when: hostvars[item].gpu_stat_gid.stat.gid is defined - name: copy Media Analytics Dockerfile to the 
controller node ansible.builtin.template: - src: "{{ item }}" - dest: "{{ intel_ai_local_folder}}/{{ item | basename | regex_replace('.j2','') }}" + src: "templates/Dockerfile.j2" + dest: "{{ intel_media_analytics_local_folder}}/Dockerfile" mode: 0644 - with_fileglob: - - ../templates/*.j2 # docker is used as container runtime: - name: prepare containers images block: - name: prepare and push containers images vars: - image: "{{ registry_local_address }}/{{ intel_ai_local_build_name }}" - tag: "{{ intel_ai_local_build_tag }}" + image: "{{ registry_local_address }}/{{ intel_media_analytics_local_build_name }}" + tag: "{{ intel_media_analytics_local_build_tag }}" ansible.builtin.shell: cmd: |- docker build -t {{ image }}:{{ tag }} -f Dockerfile . docker push {{ image }}:{{ tag }} - chdir: "{{ (intel_ai_local_folder) | path_join }}" + chdir: "{{ (intel_media_analytics_local_folder) | path_join }}" changed_when: true when: - container_runtime is in ['docker'] - name: create a k8s namespace for Media Analytics kubernetes.core.k8s: - name: "{{ intel_ai_namespace }}" + name: "{{ intel_media_analytics_namespace }}" api_version: v1 kind: Namespace state: present @@ -68,4 +68,5 @@ - name: create Media Analytics sample pod kubernetes.core.k8s: state: present - src: "{{ (intel_ai_local_folder, 'media_analytics_sample_pod.yaml' ) | path_join }}" + src: "{{ intel_media_analytics_local_folder }}/{{ item }}.yaml" + loop: "{{ groups['kube_node'] }}" diff --git a/roles/intel_media_analytics/tasks/main.yml b/roles/intel_media_analytics/tasks/main.yml new file mode 100644 index 00000000..bd1ea6d0 --- /dev/null +++ b/roles/intel_media_analytics/tasks/main.yml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Get the group ID for GPU when gpu_dp_enabled + ansible.builtin.stat: + path: /dev/dri/renderD128 + register: gpu_stat_gid + when: + - gpu_dp_enabled + - inventory_hostname in groups['kube_node'] + +- name: install Media Analytics + import_tasks: intel_media_analytics_install.yml + when: + - kubernetes + - intel_media_analytics_enabled | default(false) + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/intel_ai/tasks/preflight_intel_ai.yml b/roles/intel_media_analytics/tasks/preflight_intel_media_analytics.yml similarity index 94% rename from roles/intel_ai/tasks/preflight_intel_ai.yml rename to roles/intel_media_analytics/tasks/preflight_intel_media_analytics.yml index 6b698ddb..7148fe32 100644 --- a/roles/intel_ai/tasks/preflight_intel_ai.yml +++ b/roles/intel_media_analytics/tasks/preflight_intel_media_analytics.yml @@ -33,7 +33,7 @@ Make sure 'container_runtime: docker' to enable Media Analytics when: - kubernetes - - intel_ai_enabled | default(false) + - intel_media_analytics_enabled | default(false) any_errors_fatal: true tags: - - intel-ai + - intel-media-analytics diff --git a/roles/intel_media_analytics/tasks/template_dockerfile.yml b/roles/intel_media_analytics/tasks/template_dockerfile.yml new file mode 100644 index 00000000..75031395 --- /dev/null +++ b/roles/intel_media_analytics/tasks/template_dockerfile.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: copy Media Analytics Dockerfile to the controller node + ansible.builtin.template: + src: "Dockerfile.j2" + dest: "{{ (dockerfiles_dir, 'Dockerfile-intel_media_analytics') | path_join }}" + mode: 0644 diff --git a/roles/intel_ai/templates/Dockerfile.j2 b/roles/intel_media_analytics/templates/Dockerfile.j2 similarity index 64% rename from roles/intel_ai/templates/Dockerfile.j2 rename to roles/intel_media_analytics/templates/Dockerfile.j2 index be6c053b..ac29e443 100755 --- a/roles/intel_ai/templates/Dockerfile.j2 +++ b/roles/intel_media_analytics/templates/Dockerfile.j2 @@ -1,4 +1,19 @@ -FROM {{ intel_ai_image_src}}:{{ intel_ai_image_tag }} +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## + +FROM {{ intel_media_analytics_image_src}}:{{ intel_media_analytics_image_tag }} ARG http_proxy ARG https_proxy @@ -20,9 +35,9 @@ ENV HOME=/home/dlstreamer RUN python3 -m venv venv --prompt dls2022.3 ENV PATH="$HOME/venv/bin:$PATH" RUN python3 -m pip install --no-cache-dir -U pip \ - && pip install --no-cache-dir openvino-dev[onnx]==2022.3.0 \ - openvino-dev[tensorflow2]==2022.3.0 \ - openvino-dev[pytorch]==2022.3.0 + && pip install --no-cache-dir "openvino-dev[onnx]==2022.3.0" \ + "openvino-dev[tensorflow2]==2022.3.0" \ + "openvino-dev[pytorch]==2022.3.0" ENV MODEL_PATH=${HOME}/models RUN mkdir -p ${MODEL_PATH} diff --git a/roles/intel_ai/templates/media_analytics_sample_pod.yaml.j2 b/roles/intel_media_analytics/templates/media_analytics_sample_pod.yaml.j2 similarity index 53% rename from roles/intel_ai/templates/media_analytics_sample_pod.yaml.j2 rename to roles/intel_media_analytics/templates/media_analytics_sample_pod.yaml.j2 index 34b6e908..e54cc7a3 100644 --- a/roles/intel_ai/templates/media_analytics_sample_pod.yaml.j2 +++ b/roles/intel_media_analytics/templates/media_analytics_sample_pod.yaml.j2 @@ -1,15 +1,17 @@ apiVersion: v1 kind: Pod metadata: - name: "{{ intel_ai_sample_pod_name }}" - namespace: "{{ intel_ai_namespace }}" + name: "{{ intel_media_analytics_sample_pod_name }}" + namespace: "{{ intel_media_analytics_namespace }}" spec: + nodeSelector: + kubernetes.io/hostname: "{{ hostvars[item].inventory_hostname }}" securityContext: runAsUser: 1000 - runAsGroup: {{ gpu_stat_gid.stat.gid }} + runAsGroup: {{ hostvars[item].gpu_stat_gid.stat.gid }} containers: - - name: "{{ intel_ai_sample_pod_name }}" - image: {{ registry_local_address }}/{{ intel_ai_local_build_name }}:{{ intel_ai_local_build_tag }} + - name: "{{ intel_media_analytics_sample_pod_name }}" + image: {{ registry_local_address }}/{{ intel_media_analytics_local_build_name }}:{{ intel_media_analytics_local_build_tag }} command: ['sh', '-c', 'echo "Hello, Media Analytics!" 
&& sleep infinity'] {%- if gpu_dp_enabled == true %} resources: diff --git a/roles/intel_oneapi_install/defaults/main.yml b/roles/intel_oneapi_install/defaults/main.yml new file mode 100644 index 00000000..502284e2 --- /dev/null +++ b/roles/intel_oneapi_install/defaults/main.yml @@ -0,0 +1,42 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +supported_intel_oneapi_kits: + - basekit # Reference: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html + - ai_analytics # Reference: https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html + +default_intel_oneapi_kit: basekit + +oneapi_basekit_version: 2023.1.0 # reference for versions playbook +oneapi_ai_version: 2023.1.1 # reference for versions playbook + +intel_oneapi_checksum: + basekit: "aa874c08c985095c710f849af7e3d1f0cfecf561398056d391aae2528c40ea12994b17e244646597be4e55cb762490e1" # SHA384 + ai_analytics: "720f9f0bc10c92a8591142881e5f87bf2a254a550fdc948182af620f9c0f83f2c3987b71d25747c6e80f53cdb2c4646a" # SHA384 + +intel_oneapi_url: + basekit: "https://registrationcenter-download.intel.com/akdlm/IRC_NAS/7deeaac4-f605-4bcf-a81b-ea7531577c61/l_BaseKit_p_2023.1.0.46401_offline.sh" + ai_analytics: "https://registrationcenter-download.intel.com/akdlm/IRC_NAS/ef4efa4d-9b83-4994-a122-d5a4d2dec84c/l_AIKit_p_2023.1.1.48862_offline.sh" + +intel_oneapi_components: + basekit: + - 
"intel.oneapi.lin.dpcpp-cpp-compiler" + - "intel.oneapi.lin.ipp.devel" + - "intel.oneapi.lin.ippcp.devel" + - "intel.oneapi.lin.mkl.devel" + - "intel.oneapi.lin.dpcpp-ct" + - "intel.oneapi.lin.dpl" + - "intel.oneapi.lin.dpcpp_dbg" + ai_analytics: diff --git a/roles/intel_ai/defaults/main.yaml b/roles/intel_oneapi_install/tasks/cleanup.yml similarity index 59% rename from roles/intel_ai/defaults/main.yaml rename to roles/intel_oneapi_install/tasks/cleanup.yml index 51b26f8e..96b3f47a 100644 --- a/roles/intel_ai/defaults/main.yaml +++ b/roles/intel_oneapi_install/tasks/cleanup.yml @@ -13,17 +13,15 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## ---- -intel_ai_namespace: "intel-ai" -intel_ai_release_name: "intel-ai" +- name: Remove Intel oneAPI kits + vars: + intel_oneapi_dir: "{{ intel_oneapi_root_dir }}/{{ oneapi_kit }}" + ansible.builtin.include_tasks: remove_kit.yml + loop: "{{ intel_oneapi | dict2items | rejectattr('value', 'false') | map(attribute='key') | list }}" + loop_control: + loop_var: "oneapi_kit" -# Media Analytics -intel_ai_image_src: "intel/dlstreamer" -intel_ai_image_tag: "2022.3.0-ubuntu22-gpu555-dpcpp" - -intel_ai_local_folder: "{{ (project_root_dir, 'intel-ai') | path_join }}" - -intel_ai_local_build_name: "intel-ai" -intel_ai_local_build_tag: "v23.02" - -intel_ai_sample_pod_name: "intel-ai" +- name: Remove Intel oneAPI directory + ansible.builtin.file: + path: "{{ intel_oneapi_root_dir }}" + state: absent diff --git a/roles/intel_oneapi_install/tasks/install_kit.yml b/roles/intel_oneapi_install/tasks/install_kit.yml new file mode 100644 index 00000000..e3da8521 --- /dev/null +++ b/roles/intel_oneapi_install/tasks/install_kit.yml @@ -0,0 +1,58 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: create Intel oneAPI directory - {{ oneapi_kit }} + ansible.builtin.file: + path: "{{ intel_oneapi_dir }}" + state: directory + mode: 0755 + +- name: download Intel oneAPI kit - {{ oneapi_kit }} + ansible.builtin.get_url: + url: "{{ intel_oneapi_url[oneapi_kit] }}" + dest: "{{ (intel_oneapi_dir, 'intel-oneapi-' + oneapi_kit + '-offline.sh') | path_join }}" + checksum: "sha384:{{ intel_oneapi_checksum[oneapi_kit] }}" + mode: 0755 + use_proxy: yes + +- name: template selected components to be installed + ansible.builtin.set_fact: + oneapi_selected_components: >- + {%- if intel_oneapi_components[oneapi_kit] | default(false) -%} + --components {{ intel_oneapi_components[oneapi_kit] | list | join(':') }} + {%- endif -%} + +- name: install Intel oneAPI kit - {{ oneapi_kit }} + vars: + oneapi_cmd: > + {{ intel_oneapi_dir }}/intel-oneapi-{{ oneapi_kit }}-offline.sh -a + --silent --eula accept {{ oneapi_selected_components | default('') }} --install-dir {{ intel_oneapi_install_dir }} + oneapi_output_installed: "It is already installed." 
+ block: + - name: try installation of Intel oneAPI - {{ oneapi_kit }} + ansible.builtin.command: + cmd: "sh {{ oneapi_cmd }}" + register: oneapi_install + changed_when: oneapi_output_installed not in oneapi_install.stdout + failed_when: + - oneapi_install.rc != 0 + - oneapi_output_installed not in oneapi_install.stdout + + - name: attempt to repair installation of Intel oneAPI kit - {{ oneapi_kit }} + ansible.builtin.command: + cmd: "sh {{ oneapi_cmd }} --action repair" + changed_when: true + when: oneapi_output_installed in oneapi_install.stdout diff --git a/roles/intel_oneapi_install/tasks/main.yml b/roles/intel_oneapi_install/tasks/main.yml new file mode 100644 index 00000000..aa3beecf --- /dev/null +++ b/roles/intel_oneapi_install/tasks/main.yml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: create Intel oneAPI directory + ansible.builtin.file: + path: "{{ intel_oneapi_root_dir }}" + state: directory + mode: 0755 + +- name: Download and install Intel oneAPI kits + vars: + intel_oneapi_dir: "{{ intel_oneapi_root_dir }}/{{ oneapi_kit }}" + ansible.builtin.include_tasks: install_kit.yml + loop: "{{ intel_oneapi | default({}) | dict2items | rejectattr('value', 'false') | map(attribute='key') | list }}" + loop_control: + loop_var: "oneapi_kit" diff --git a/roles/intel_oneapi_install/tasks/preflight.yml b/roles/intel_oneapi_install/tasks/preflight.yml new file mode 100644 index 00000000..657177c1 --- /dev/null +++ b/roles/intel_oneapi_install/tasks/preflight.yml @@ -0,0 +1,35 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: load enabled oneAPI kits + ansible.builtin.set_fact: + enabled_kits: "{{ intel_oneapi | default({}) | dict2items | rejectattr('value', 'false') | map(attribute='key') | list }}" + +# STORY basekit must be enabled if at least one other kit is enabled +- name: Check oneAPI basekit is enabled + ansible.builtin.assert: + that: default_intel_oneapi_kit in enabled_kits + fail_msg: + Intel oneAPI basekit not enabled. + Intel oneAPI basekit must be enabled when at least one other Intel oneAPI kit is enabled. 
+ +# STORY all enabled kits must be supported by role +- name: Check all defined kits are supported + ansible.builtin.assert: + that: enabled_kits | difference(supported_intel_oneapi_kits) | length == 0 + fail_msg: > + There are intel oneAPI kits enabled in group_vars that are not supported in RA deployment. + Please check roles/intel_oneapi_install/defaults/main.yml:supported_intel_oneapi_kits for a list of supported kits. diff --git a/roles/intel_oneapi_install/tasks/remove_kit.yml b/roles/intel_oneapi_install/tasks/remove_kit.yml new file mode 100644 index 00000000..05363298 --- /dev/null +++ b/roles/intel_oneapi_install/tasks/remove_kit.yml @@ -0,0 +1,31 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +- name: Check if Intel oneAPI directory present + ansible.builtin.stat: + path: "{{ intel_oneapi_dir }}/intel-oneapi-{{ oneapi_kit }}-offline.sh" + register: oneapi_dir + +- name: remove Intel oneAPI kit - {{ oneapi_kit }} + vars: + oneapi_cmd: "{{ intel_oneapi_dir }}/intel-oneapi-{{ oneapi_kit }}-offline.sh -a --silent --action remove" + ansible.builtin.command: + cmd: "sh {{ oneapi_cmd }}" + register: oneapi_remove + changed_when: true + failed_when: + - oneapi_remove.rc != 0 + - "'it is not installed' not in oneapi_remove.stdout" + when: oneapi_dir.stat.exists diff --git a/roles/intel_oneapi_install/vars/main.yml b/roles/intel_oneapi_install/vars/main.yml new file mode 100644 index 00000000..27b42596 --- /dev/null +++ b/roles/intel_oneapi_install/vars/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +intel_oneapi_root_dir: "{{ (project_root_dir, 'intel_oneapi') | path_join }}" +intel_oneapi_install_dir: "{{ (project_root_dir, 'intel_oneapi', 'install_dir') | path_join }}" diff --git a/roles/intel_power_manager/defaults/main.yml b/roles/intel_power_manager/defaults/main.yml index 8deb404b..b982bdd1 100644 --- a/roles/intel_power_manager/defaults/main.yml +++ b/roles/intel_power_manager/defaults/main.yml @@ -15,12 +15,10 @@ ## --- intel_power_manager_git_url: "https://github.com/intel/kubernetes-power-manager.git" -intel_power_manager_git_ref: "v1.0.2" # project is consistent with git ref and image version +intel_power_manager_git_ref: "v2.2.0" # project is consistent with git ref and image version intel_power_manager_dir: "{{ (project_root_dir, 'intel-power-manager') | path_join }}" intel_power_manager_namespace: "intel-power" - -intel_appqos_git_url: "https://github.com/intel/intel-cmt-cat.git" -intel_appqos_git_ref: "v4.4.1" -intel_appqos_version: "v4.4.1" -intel_appqos_dir: "{{ (project_root_dir, 'intel-appqos') | path_join }}" -intel_appqos_cert_dir: "/etc/certs/public" +intel_power_operator_image: "docker.io/intel/power-operator" +intel_power_operator_image_local: "{{ registry_local_address }}/intel-power-operator" +intel_power_node_agent_image: "intel/power-node-agent" +intel_power_node_agent_image_local: "{{ registry_local_address }}/intel-power-node-agent" diff --git a/roles/intel_power_manager/files/appqos.conf b/roles/intel_power_manager/files/appqos.conf deleted file mode 100644 index 0aea96db..00000000 --- a/roles/intel_power_manager/files/appqos.conf +++ /dev/null @@ -1,8 +0,0 @@ -{ - "apps": [], - "sstbf": { - "configured": false - }, - "pools": [], - "power_profiles_expert_mode": true -} diff --git a/roles/intel_power_manager/tasks/app_qos.yml b/roles/intel_power_manager/tasks/app_qos.yml deleted file mode 100644 index c12f2864..00000000 --- a/roles/intel_power_manager/tasks/app_qos.yml +++ /dev/null @@ -1,100 +0,0 @@ -## -## Copyright (c) 
2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: clone Intel CMT CAT repository - ansible.builtin.git: - repo: "{{ intel_appqos_git_url }}" - version: "{{ intel_appqos_git_ref }}" - dest: "{{ intel_appqos_dir }}" - force: yes - when: inventory_hostname in groups['kube_node'] - -# NOTE(pklimowx): since AppQoS image is not available on docker hub -# and public images of the Power Manager use `appqos:latest` image, -# we have to build AppQoS image on each node, and push it to localregistry -# only once. 
-# -# docker runtime is in use -- name: prepare image for Application Quality of Service - block: - - name: build image of App QoS - ansible.builtin.command: docker build --no-cache -t appqos -f Dockerfile ../../ - changed_when: true - args: - chdir: "{{ (intel_appqos_dir, 'appqos', 'docker') | path_join }}" - - - name: tag App QoS image - ansible.builtin.command: docker tag appqos:latest {{ registry_local_address }}/appqos:{{ intel_appqos_version }} - changed_when: true - when: inventory_hostname == groups['kube_node'][0] - - - name: push App QoS image to local registry - ansible.builtin.command: docker push {{ registry_local_address }}/appqos:{{ intel_appqos_version }} - changed_when: true - when: inventory_hostname == groups['kube_node'][0] - when: - - container_runtime == "docker" - - inventory_hostname in groups['kube_node'] - -# crio/containerd runtime is in use -- name: prepare image for Application Quality of Service - block: - - name: build and tag App QoS image - ansible.builtin.command: podman build -f Dockerfile -t {{ registry_local_address }}/appqos:{{ intel_appqos_version }} ../../ - changed_when: true - args: - chdir: "{{ (intel_appqos_dir, 'appqos', 'docker') | path_join }}" - - - name: push App QoS image to local registry - ansible.builtin.command: podman push {{ registry_local_address }}/appqos:{{ intel_appqos_version }} - changed_when: true - when: inventory_hostname == groups['kube_node'][0] - when: - - container_runtime in ["crio", "containerd"] - - inventory_hostname in groups['kube_node'] - -- name: generate App QoS certificates - block: - - name: create directory for App QoS certs - ansible.builtin.file: - state: directory - path: "{{ intel_appqos_cert_dir }}" - owner: "{{ ansible_user | default(ansible_user_id) }}" - group: "{{ ansible_user | default(ansible_user_id) }}" - mode: 0755 - recurse: yes - - - name: generate certificates - ansible.builtin.command: "{{ item }}" - args: - chdir: "{{ intel_appqos_cert_dir }}" - changed_when: true - 
with_items: - - openssl req -nodes -x509 -newkey rsa:4096 -keyout ca.key -out ca.crt -days 365 -subj "/O=AppQoS/OU=root/CN=localhost" - - openssl req -nodes -newkey rsa:3072 -keyout appqos.key -out appqos.csr -subj "/O=AppQoS/OU=AppQoS Server/CN=localhost" - - openssl x509 -req -in appqos.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out appqos.crt - - - name: copy example App QoS config to /etc/certs/public - ansible.builtin.copy: - src: appqos.conf - dest: "{{ intel_appqos_cert_dir }}" - owner: "{{ ansible_user | default(ansible_user_id) }}" - group: "{{ ansible_user | default(ansible_user_id) }}" - mode: 0644 - -- name: set facts for Intel App QoS templates - ansible.builtin.set_fact: - app_qos_image: "{{ registry_local_address }}/appqos:{{ intel_appqos_version }}" diff --git a/roles/intel_power_manager/tasks/deploy_features.yml b/roles/intel_power_manager/tasks/deploy_features.yml new file mode 100644 index 00000000..82c905e4 --- /dev/null +++ b/roles/intel_power_manager/tasks/deploy_features.yml @@ -0,0 +1,63 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## + +# Uncore Frequency +- name: prepare and deploy Uncore Frequency + when: hostvars[power_node]['uncore_frequency']['enabled'] | default(false) | bool + block: + - name: prepare required varibles to deploy Uncore Frequency + ansible.builtin.set_fact: + system_max_frequency: "{{ hostvars[power_node]['uncore_frequency']['system_max_frequency'] }}" + system_min_frequency: "{{ hostvars[power_node]['uncore_frequency']['system_min_frequency'] }}" + die_selector: "{{ hostvars[power_node]['uncore_frequency']['die_selector'] }}" + when: inventory_hostname == groups['kube_control_plane'][0] + + - name: populate Uncore Frequency template + ansible.builtin.template: + src: uncore_frequency.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'uncore_frequency_' + power_node + '.yaml') | path_join }}" + force: yes + mode: preserve + when: inventory_hostname == groups['kube_control_plane'][0] + + - name: apply Uncore Frequency + kubernetes.core.k8s: + state: present + src: "{{ (intel_power_manager_dir, 'uncore_frequency_' + power_node + '.yaml') | path_join }}" + when: inventory_hostname == groups['kube_control_plane'][0] + +# C-States +- name: prepare and deploy C-States + when: hostvars[power_node]['cstates']['enabled'] + block: + - name: prepare required varibles to deploy C-States + ansible.builtin.set_fact: + cstates: "{{ hostvars[power_node]['cstates'] }}" + when: inventory_hostname == groups['kube_control_plane'][0] + + - name: populate C-States template + ansible.builtin.template: + src: cstates.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'cstates_' + power_node + '.yaml') | path_join }}" + force: yes + mode: preserve + when: inventory_hostname == groups['kube_control_plane'][0] + + - name: apply C-States + kubernetes.core.k8s: + state: present + src: "{{ (intel_power_manager_dir, 'cstates_' + power_node + '.yaml') | path_join }}" + when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/intel_power_manager/tasks/deploy_sample_pods.yml 
b/roles/intel_power_manager/tasks/deploy_sample_pods.yml index ae771d63..79ede989 100644 --- a/roles/intel_power_manager/tasks/deploy_sample_pods.yml +++ b/roles/intel_power_manager/tasks/deploy_sample_pods.yml @@ -19,22 +19,21 @@ state: directory mode: 0755 path: "{{ (intel_power_manager_dir, 'sample_power_pods') | path_join }}" + when: inventory_hostname == groups['kube_control_plane'][0] -# NOTE(pklimowx): # this task will generate yaml files for each PowerProfile from -# intel_power_manager.power_profiles list for each node in -# intel_power_manager.power_nodes list +# power_profiles list for each node in intel_power_manager.power_nodes list - name: generate templates for each available profile for the node ansible.builtin.include_tasks: power_pod_template_helper.yml - loop: "{{ intel_power_manager.power_profiles }}" + loop: "{{ intel_power_manager.power_nodes }}" loop_control: - loop_var: profile_name + loop_var: power_node - name: get yaml files to deploy ansible.builtin.find: path: "{{ (intel_power_manager_dir, 'sample_power_pods') | path_join }}" file_type: file - patterns: "*.yml" + patterns: "*.yaml" register: sample_pod_files - name: deploy sample power pods @@ -43,3 +42,4 @@ src: "{{ item.path }}" wait: true loop: "{{ sample_pod_files.files }}" + when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/intel_power_manager/tasks/deploy_shared_resources.yml b/roles/intel_power_manager/tasks/deploy_shared_resources.yml index 8ae91b86..b2bc16ef 100644 --- a/roles/intel_power_manager/tasks/deploy_shared_resources.yml +++ b/roles/intel_power_manager/tasks/deploy_shared_resources.yml @@ -16,7 +16,7 @@ --- - name: prepare and deploy node-specific Shared Power Profiles block: - - name: make sure that direcotry for node-specific Shared Power Profiles exists + - name: make sure that directory for node-specific Shared Power Profiles exists ansible.builtin.file: state: directory path: "{{ (intel_power_manager_dir, 'local_shared_power_profiles') 
| path_join }}" @@ -25,25 +25,26 @@ - name: obtain variables needed for deployment of node-specific Shared Power Profile ansible.builtin.set_fact: node_name: "{{ node_name }}" - max_frequency: "{{ hostvars[node_name]['local_shared_profile']['node_max_shared_frequency'] }}" - min_frequency: "{{ hostvars[node_name]['local_shared_profile']['node_min_shared_frequency'] }}" + local_max_frequency: "{{ hostvars[node_name]['local_shared_profile']['local_max_frequency'] }}" + local_min_frequency: "{{ hostvars[node_name]['local_shared_profile']['local_min_frequency'] }}" + local_pstate_governor: "{{ hostvars[node_name]['local_shared_profile']['local_pstate_governor'] }}" - name: populate template for node-specific Shared Power Profile ansible.builtin.template: - src: local_shared_profile.yml.j2 - dest: "{{ (intel_power_manager_dir, 'local_shared_power_profiles', node_name + '_local_shared_profile.yml') | path_join }}" + src: local_shared_profile.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'local_shared_power_profiles', node_name + '_local_shared_profile.yaml') | path_join }}" mode: preserve force: yes - name: deploy node-specific Shared Power Profile kubernetes.core.k8s: state: present - src: "{{ (intel_power_manager_dir, 'local_shared_power_profiles', node_name + '_local_shared_profile.yml') | path_join }}" + src: "{{ (intel_power_manager_dir, 'local_shared_power_profiles', node_name + '_local_shared_profile.yaml') | path_join }}" when: hostvars[node_name]['local_shared_profile']['enabled'] - name: prepare and deploy node-specific Shared Power Workload block: - - name: make sure that direcotry for node-specific Shared Power Workloads exists + - name: make sure that directory for node-specific Shared Power Workloads exists ansible.builtin.file: state: directory path: "{{ (intel_power_manager_dir, 'shared_power_workloads') | path_join }}" @@ -57,13 +58,13 @@ - name: populate template for Shared Power Workload ansible.builtin.template: - src: shared_workload.yml.j2 - dest: 
"{{ (intel_power_manager_dir, 'shared_power_workloads', node_name + '_shared_workload.yml') | path_join }}" + src: shared_workload.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'shared_power_workloads', node_name + '_shared_workload.yaml') | path_join }}" mode: preserve force: yes - - name: deploy node-specific Shared Power Profile + - name: deploy node-specific Shared Power Workload kubernetes.core.k8s: state: present - src: "{{ (intel_power_manager_dir, 'shared_power_workloads', node_name + '_shared_workload.yml') | path_join }}" + src: "{{ (intel_power_manager_dir, 'shared_power_workloads', node_name + '_shared_workload.yaml') | path_join }}" when: hostvars[node_name]['shared_workload']['enabled'] diff --git a/roles/intel_power_manager/tasks/enable_drivers.yml b/roles/intel_power_manager/tasks/enable_drivers.yml new file mode 100644 index 00000000..2a486187 --- /dev/null +++ b/roles/intel_power_manager/tasks/enable_drivers.yml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +- name: Uncore Frequency driver + when: hostvars[power_node]['uncore_frequency']['enabled'] | default(false) | bool + block: + # Ubuntu only + - name: obtain mandatory package for Uncore Frequency driver + ansible.builtin.apt: + name: "linux-generic-hwe-{{ ansible_distribution_version.split('.')[:2]|join('.') }}" + state: present + when: ansible_distribution == "Ubuntu" + + - name: enable Uncore Frequency driver + community.general.modprobe: + name: intel_uncore_frequency + state: present diff --git a/roles/intel_power_manager/tasks/main.yml b/roles/intel_power_manager/tasks/main.yml index 2bf21ea8..024bc77a 100644 --- a/roles/intel_power_manager/tasks/main.yml +++ b/roles/intel_power_manager/tasks/main.yml @@ -23,8 +23,11 @@ loop: "{{ intel_power_manager.power_nodes }}" when: inventory_hostname == groups['kube_control_plane'][0] -- name: prepare App QoS - ansible.builtin.include_tasks: app_qos.yml +- name: enable power manager drivers + ansible.builtin.include_tasks: enable_drivers.yml + loop: "{{ intel_power_manager.power_nodes }}" + loop_control: + loop_var: power_node - name: prepare Intel Kubernetes Power Manager ansible.builtin.include_tasks: power_manager.yml @@ -35,19 +38,25 @@ - intel_power_manager.deploy_example_pods - inventory_hostname == groups['kube_control_plane'][0] +- name: deploy power manager features + ansible.builtin.include_tasks: deploy_features.yml + loop: "{{ intel_power_manager.power_nodes }}" + loop_control: + loop_var: power_node + # The Shared Profiles and Workloads deployment starts here - name: prepare and deploy Global Shared Power Profile block: - name: populate Global Shared Profile template to the controller node ansible.builtin.template: - src: global_shared_profile.yml.j2 - dest: "{{ (intel_power_manager_dir, 'global_shared_profile.yml') | path_join }}" + src: global_shared_profile.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'global_shared_profile.yaml') | path_join }}" force: yes mode: preserve - name: deploy Global 
Shared Profile kubernetes.core.k8s: - src: "{{ (intel_power_manager_dir, 'global_shared_profile.yml') | path_join }}" + src: "{{ (intel_power_manager_dir, 'global_shared_profile.yaml') | path_join }}" state: present when: - intel_power_manager.global_shared_profile_enabled diff --git a/roles/intel_power_manager/tasks/power_manager.yml b/roles/intel_power_manager/tasks/power_manager.yml index ac95b47a..0c288524 100644 --- a/roles/intel_power_manager/tasks/power_manager.yml +++ b/roles/intel_power_manager/tasks/power_manager.yml @@ -14,49 +14,6 @@ ## limitations under the License. ## --- -# workaround to be removed once component will be updated to version, which supports higher golang version -# Required golang version here is 1.17.8 -- name: check current golang version - ansible.builtin.shell: "set -o pipefail && /usr/local/go/bin/go version|sed -e 's/go version go//g'|cut -d' ' -f1" - args: - executable: /bin/bash - failed_when: false - changed_when: false - register: go_version - -- name: set intel_power_manager - 'ipm' go version - ansible.builtin.set_fact: - ipm_go_version: 1.17.8 - -- name: install ipm specific golang version - when: go_version.stdout > ipm_go_version - block: - - name: workaround for old golang version needed for IPM - vars: - golang_version: "{{ ipm_go_version }}" - golang_download_checksum: "sha256:980e65a863377e69fd9b67df9d8395fd8e93858e7a24c9f55803421e453f4f99" - additional_go_version: "go_{{ ipm_go_version }}" - ansible.builtin.include_role: - name: bootstrap/golang_install # noqa role-name[path] - role in bootstrap - - - name: set gopath for ipm go version - ansible.builtin.set_fact: - ipm_go_root_path: "/usr/local/go_{{ ipm_go_version }}/go" - ipm_go_path: "/root/go_{{ ipm_go_version }}/go" - -- name: set gopath for ipm go version - ansible.builtin.set_fact: - ipm_go_root_path: "/usr/local/go" - ipm_go_path: "/root/go" - when: go_version.stdout == ipm_go_version - -- name: show remote PATH variable - ansible.builtin.command: echo 
$PATH - changed_when: false - register: remote_path - -# NOTE(pklimowx): repo must be cloned to the controller and, if we want to build -# images locally, to the first node as well. - name: clone Intel Kubernetes Power Manager repository ansible.builtin.git: repo: "{{ intel_power_manager_git_url }}" @@ -64,51 +21,54 @@ dest: "{{ intel_power_manager_dir }}" force: yes when: - - inventory_hostname == groups['kube_control_plane'][0] or - (inventory_hostname == groups['kube_node'][0] and intel_power_manager.build_image_locally | default(false) | bool) - -- name: set facts for Intel Kubernetes Power Manager templates - ansible.builtin.set_fact: - power_operator_image: "{{ registry_local_address }}/intel-power-operator" - power_operator_image_version: "{{ intel_power_manager_git_ref }}" - node_agent_image: "{{ registry_local_address }}/intel-power-node-agent" - node_agent_image_version: "{{ intel_power_manager_git_ref }}" - when: - - intel_power_manager.build_image_locally | default(false) | bool + - inventory_hostname == groups['kube_control_plane'][0] # NOTE(pklimowx): node-agent DS is deployed automatically via Power Manager after providing # PowerProfile. The yaml file needs to be patched before building image to provide correct source for it. 
-# Both images depend on intel_power_manager* variable as there is no public image for AppQoS -- name: patch Node Agent DaemonSet yaml +- name: patch image to use local registry + ansible.builtin.lineinfile: + path: "{{ intel_power_manager_dir }}/build/manifests/power-node-agent-ds.yaml" + regexp: "^.*image: {{ intel_power_node_agent_image }}:{{ intel_power_manager_git_ref }}" + line: " - image: {{ intel_power_node_agent_image_local }}:{{ intel_power_manager_git_ref }}" when: - intel_power_manager.build_image_locally | default(false) | bool - - inventory_hostname == groups['kube_node'][0] - block: - - name: use node-agent image from local registry - ansible.builtin.lineinfile: - path: "{{ intel_power_manager_dir }}/build/manifests/power-node-agent-ds.yaml" - regexp: "^ - image: intel/power-node-agent:{{ intel_power_manager_git_ref }}" - line: " - image: {{ node_agent_image }}:{{ node_agent_image_version }}" - - - name: use appqos image from local registry - ansible.builtin.lineinfile: - path: "{{ intel_power_manager_dir }}/build/manifests/power-node-agent-ds.yaml" - regexp: "^ - image: 'appqos:latest'" - line: " - image: {{ app_qos_image }}" + - inventory_hostname == groups['kube_control_plane'][0] + +- name: count cpu quota + set_fact: + cpu_quota: "{{ 200 + ( 200 * multiplier | float ) | int | abs }}" + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: patch cpu quota + ansible.builtin.lineinfile: + path: "{{ intel_power_manager_dir }}/build/manifests/power-node-agent-ds.yaml" + regexp: "^.*cpu: 100m" + line: " cpu: {{ cpu_quota }}m" + loop: [1, 2] + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: count memory quota + set_fact: + memory_quota: "{{ 300 + ( 300 * multiplier | float ) | int | abs }}" + when: inventory_hostname == groups['kube_control_plane'][0] + +- name: patch memory quota + ansible.builtin.lineinfile: + path: "{{ intel_power_manager_dir }}/build/manifests/power-node-agent-ds.yaml" + regexp: "^.*memory: 
64Mi" + line: " memory: {{ memory_quota }}Mi" + loop: [1, 2] + when: inventory_hostname == groups['kube_control_plane'][0] # docker runtime is in use - name: prepare images for Intel Kubernetes Power Manager when: - container_runtime == "docker" - intel_power_manager.build_image_locally | default(false) | bool - - inventory_hostname == groups['kube_node'][0] - environment: - GOROOT: "{{ ipm_go_root_path }}" - GOPATH: "{{ ipm_go_path }}" - PATH: "{{ ipm_go_root_path }}/bin:{{ remote_path.stdout }}" + - inventory_hostname == groups['kube_control_plane'][0] block: - name: build images for Intel Kubernetes Power Manager - ansible.builtin.command: docker build -f build/{{ item.file }} -t {{ item.name }}:latest . + ansible.builtin.command: docker build -f build/{{ item.file }} -t {{ registry_local_address }}/{{ item.name }}:{{ intel_power_manager_git_ref }} . changed_when: true args: chdir: "{{ intel_power_manager_dir }}" @@ -116,13 +76,6 @@ - {file: Dockerfile, name: intel-power-operator} - {file: Dockerfile.nodeagent, name: intel-power-node-agent} - - name: tag Intel Kubernetes Power Manager images - ansible.builtin.command: docker tag {{ item }}:latest {{ registry_local_address }}/{{ item }}:{{ intel_power_manager_git_ref }} - changed_when: true - with_items: - - intel-power-operator - - intel-power-node-agent - - name: push Intel Kubernetes Power Manager images to local registry ansible.builtin.command: docker push {{ registry_local_address }}/{{ item }}:{{ intel_power_manager_git_ref }} changed_when: true @@ -135,11 +88,7 @@ when: - container_runtime in ["crio", "containerd"] - intel_power_manager.build_image_locally | default(false) | bool - - inventory_hostname == groups['kube_node'][0] - environment: - GOROOT: "{{ ipm_go_root_path }}" - GOPATH: "{{ ipm_go_path }}" - PATH: "{{ ipm_go_root_path }}/bin:{{ remote_path.stdout }}" + - inventory_hostname == groups['kube_control_plane'][0] block: - name: build and tag images for Intel Kubernetes Power Manager 
ansible.builtin.command: podman build -f build/{{ item.file }} -t {{ registry_local_address }}/{{ item.name }}:{{ intel_power_manager_git_ref }} . @@ -159,10 +108,6 @@ - name: prepare and deploy Intel Power Manager when: inventory_hostname == groups['kube_control_plane'][0] - environment: - GOROOT: "{{ ipm_go_root_path }}" - GOPATH: "{{ ipm_go_path }}" - PATH: "{{ ipm_go_root_path }}/bin:{{ remote_path.stdout }}" block: - name: create Intel Power Manager namespace kubernetes.core.k8s: @@ -179,21 +124,28 @@ state: present src: "{{ (intel_power_manager_dir, 'config', 'rbac', 'rbac.yaml') | path_join }}" + # WA: go mod tidy is needed, until upstream issue is fixed. + - name: run go mod tidy + ansible.builtin.command: "go mod tidy -v" + args: + chdir: "{{ intel_power_manager_dir }}" + changed_when: true + - name: create and install Intel Power Manager CRDs community.general.make: chdir: "{{ intel_power_manager_dir }}" - name: populate Intel Kubernetes Power Manager Controller Manager template ansible.builtin.template: - src: controller_manager.yml.j2 - dest: "{{ (intel_power_manager_dir, 'controller_manager.yml') | path_join }}" + src: controller_manager.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'controller_manager.yaml') | path_join }}" force: yes mode: preserve - name: deploy Intel Kubernetes Power Manager Controller Manager kubernetes.core.k8s: state: present - src: "{{ (intel_power_manager_dir, 'controller_manager.yml') | path_join }}" + src: "{{ (intel_power_manager_dir, 'controller_manager.yaml') | path_join }}" - name: wait for Power Manager to be up and running kubernetes.core.k8s_info: @@ -206,17 +158,23 @@ reason: MinimumReplicasAvailable wait_timeout: 300 + - name: combine power profiles from each power node + set_fact: + combined_profiles: "{{ combined_profiles + hostvars[item]['power_profiles'] }}" + loop: "{{ intel_power_manager.power_nodes }}" + when: inventory_hostname == groups['kube_control_plane'][0] + - name: populate Power Config template 
ansible.builtin.template: - src: power_config.yml.j2 - dest: "{{ (intel_power_manager_dir, 'power_config.yml') | path_join }}" + src: power_config.yaml.j2 + dest: "{{ (intel_power_manager_dir, 'power_config.yaml') | path_join }}" force: yes mode: preserve - name: apply Power Config kubernetes.core.k8s: state: present - src: "{{ (intel_power_manager_dir, 'power_config.yml') | path_join }}" + src: "{{ (intel_power_manager_dir, 'power_config.yaml') | path_join }}" - name: check that all pods are running ansible.builtin.include_role: diff --git a/roles/intel_power_manager/tasks/power_pod_template_helper.yml b/roles/intel_power_manager/tasks/power_pod_template_helper.yml index d454e0ed..85eb6b74 100644 --- a/roles/intel_power_manager/tasks/power_pod_template_helper.yml +++ b/roles/intel_power_manager/tasks/power_pod_template_helper.yml @@ -16,10 +16,12 @@ --- - name: populate sample power pods templates ansible.builtin.template: - src: "sample_power_pod.yml.j2" - dest: "{{ (intel_power_manager_dir, 'sample_power_pods', profile_name + '_power_pod.yml') | path_join }}" + src: "sample_power_pod.yaml.j2" + dest: "{{ (intel_power_manager_dir, 'sample_power_pods', power_profile_name + '_power_pod_' + power_node + '.yaml') | path_join }}" force: yes mode: preserve - loop: "{{ intel_power_manager.power_nodes }}" + loop: "{{ hostvars[power_node]['power_profiles'] }}" loop_control: - loop_var: node_name + loop_var: + power_profile_name + when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/intel_power_manager/templates/controller_manager.yaml.j2 b/roles/intel_power_manager/templates/controller_manager.yaml.j2 new file mode 100644 index 00000000..e9674497 --- /dev/null +++ b/roles/intel_power_manager/templates/controller_manager.yaml.j2 @@ -0,0 +1,52 @@ +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: controller-manager + namespace: {{ intel_power_manager_namespace }} + labels: + control-plane: controller-manager +spec: + selector: + 
matchLabels: + control-plane: controller-manager + replicas: 1 + template: + metadata: + labels: + control-plane: controller-manager + spec: + serviceAccountName: intel-power-operator + containers: + - command: + - /manager + args: + - --enable-leader-election + imagePullPolicy: IfNotPresent + {% if intel_power_manager.build_image_locally -%} + image: {{ intel_power_operator_image_local }}:{{ intel_power_manager_git_ref }} + {% else -%} + image: {{ intel_power_operator_image }}:{{ intel_power_manager_git_ref }} + {% endif -%} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: ["ALL"] + name: manager + resources: + limits: + cpu: "{{ ( 100 + ( 100 * multiplier | float ) | int | abs ) }}m" + memory: "{{ ( 30 + ( 30 * multiplier | float ) | int | abs ) }}Mi" + requests: + cpu: "{{ ( 100 + ( 100 * multiplier | float ) | int | abs ) }}m" + memory: "{{ ( 30 + ( 30 * multiplier | float ) | int | abs ) }}Mi" + volumeMounts: + - mountPath: /sys/fs + name: cgroup + mountPropagation: HostToContainer + readOnly: true + terminationGracePeriodSeconds: 10 + volumes: + - name: cgroup + hostPath: + path: /sys/fs diff --git a/roles/intel_power_manager/templates/controller_manager.yml.j2 b/roles/intel_power_manager/templates/controller_manager.yml.j2 deleted file mode 100644 index 43132b0e..00000000 --- a/roles/intel_power_manager/templates/controller_manager.yml.j2 +++ /dev/null @@ -1,48 +0,0 @@ ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: controller-manager - namespace: {{ intel_power_manager_namespace }} - labels: - control-plane: controller-manager -spec: - selector: - matchLabels: - control-plane: controller-manager - replicas: 1 - template: - metadata: - labels: - control-plane: controller-manager - spec: - serviceAccountName: intel-power-operator - containers: - - command: - - /manager - args: - - --enable-leader-election - imagePullPolicy: IfNotPresent - image: {{ power_operator_image | default("docker.io/intel/power-operator") 
}}:{{ power_operator_image_version | default("v1.0.2") }} - securityContext: - allowPrivilegeEscalation: false - capabilities: - drop: ["ALL"] - name: manager - resources: - limits: - cpu: 100m - memory: 30Mi - requests: - cpu: 100m - memory: 20Mi - volumeMounts: - - mountPath: /sys/fs - name: cgroup - mountPropagation: HostToContainer - readOnly: true - terminationGracePeriodSeconds: 10 - volumes: - - name: cgroup - hostPath: - path: /sys/fs diff --git a/roles/intel_power_manager/templates/cstates.yaml.j2 b/roles/intel_power_manager/templates/cstates.yaml.j2 new file mode 100644 index 00000000..17a8b7c6 --- /dev/null +++ b/roles/intel_power_manager/templates/cstates.yaml.j2 @@ -0,0 +1,12 @@ +apiVersion: power.intel.com/v1 +kind: CStates +metadata: + name: {{ power_node }} + namespace: {{ intel_power_manager_namespace }} +spec: + sharedPoolCStates: + {{ cstates.shared }} + exclusivePoolCStates: + {{ cstates.profile_exclusive }} + individualCoreCStates: + {{ cstates.core }} diff --git a/roles/intel_power_manager/templates/global_shared_profile.yaml.j2 b/roles/intel_power_manager/templates/global_shared_profile.yaml.j2 new file mode 100644 index 00000000..22a8d06a --- /dev/null +++ b/roles/intel_power_manager/templates/global_shared_profile.yaml.j2 @@ -0,0 +1,12 @@ +--- +apiVersion: "power.intel.com/v1" +kind: PowerProfile +metadata: + name: shared-global + namespace: {{ intel_power_manager_namespace }} +spec: + name: "shared-global" + max: {{ intel_power_manager.global_max_frequency }} + min: {{ intel_power_manager.global_min_frequency }} + epp: "power" + governor: {{ intel_power_manager.global_pstate_governor }} diff --git a/roles/intel_power_manager/templates/global_shared_profile.yml.j2 b/roles/intel_power_manager/templates/global_shared_profile.yml.j2 deleted file mode 100644 index d4e08aca..00000000 --- a/roles/intel_power_manager/templates/global_shared_profile.yml.j2 +++ /dev/null @@ -1,11 +0,0 @@ ---- -apiVersion: "power.intel.com/v1alpha1" -kind: 
PowerProfile -metadata: - name: shared-global - namespace: {{ intel_power_manager_namespace }} -spec: - name: "shared-global" - max: {{ intel_power_manager.max_shared_frequency }} - min: {{ intel_power_manager.min_shared_frequency }} - epp: "power" diff --git a/roles/intel_power_manager/templates/local_shared_profile.yml.j2 b/roles/intel_power_manager/templates/local_shared_profile.yaml.j2 similarity index 54% rename from roles/intel_power_manager/templates/local_shared_profile.yml.j2 rename to roles/intel_power_manager/templates/local_shared_profile.yaml.j2 index fa5a8c5c..26379eb3 100644 --- a/roles/intel_power_manager/templates/local_shared_profile.yml.j2 +++ b/roles/intel_power_manager/templates/local_shared_profile.yaml.j2 @@ -1,11 +1,12 @@ --- -apiVersion: "power.intel.com/v1alpha1" +apiVersion: "power.intel.com/v1" kind: PowerProfile metadata: name: shared-{{ node_name }} namespace: {{ intel_power_manager_namespace }} spec: name: "shared-{{ node_name }}" - max: {{ max_frequency }} - min: {{ min_frequency }} + max: {{ local_max_frequency }} + min: {{ local_min_frequency }} epp: "power" + governor: {{ local_pstate_governor }} diff --git a/roles/intel_power_manager/templates/power_config.yml.j2 b/roles/intel_power_manager/templates/power_config.yaml.j2 similarity index 73% rename from roles/intel_power_manager/templates/power_config.yml.j2 rename to roles/intel_power_manager/templates/power_config.yaml.j2 index c182247a..e05cc073 100644 --- a/roles/intel_power_manager/templates/power_config.yml.j2 +++ b/roles/intel_power_manager/templates/power_config.yaml.j2 @@ -1,11 +1,10 @@ --- -apiVersion: "power.intel.com/v1alpha1" +apiVersion: "power.intel.com/v1" kind: PowerConfig metadata: name: power-config namespace: {{ intel_power_manager_namespace }} spec: - powerImage: {{ app_qos_image }} powerNodeSelector: # Add labels here for the Nodes you want the PowerNodeAgent to be applied to intel.power.node: "true" @@ -13,4 +12,4 @@ spec: # performance # 
balance-performance # balance-power - powerProfiles: {{ intel_power_manager.power_profiles }} + powerProfiles: {{ combined_profiles }} diff --git a/roles/intel_power_manager/templates/sample_power_pod.yaml.j2 b/roles/intel_power_manager/templates/sample_power_pod.yaml.j2 new file mode 100644 index 00000000..99d573e2 --- /dev/null +++ b/roles/intel_power_manager/templates/sample_power_pod.yaml.j2 @@ -0,0 +1,25 @@ +# Do not change the name of this file +--- +apiVersion: v1 +kind: Pod +metadata: + name: {{ power_profile_name }}-power-pod-{{ power_node }} + namespace: {{ intel_power_manager_namespace }} +spec: + containers: + - name: {{ power_profile_name }}-container + image: busybox + command: ["/bin/sh"] + args: ["-c", "sleep 15000"] + resources: + # IMPORTANT: The amount of the Power Cores have to be the same as the amount of requested CPUs + requests: + memory: "{{ ( 200 + ( 200 * multiplier | float ) | int | abs ) }}Mi" + cpu: "2" + power.intel.com/{{ power_profile_name }}: "2" + limits: + memory: "{{ ( 200 + ( 200 * multiplier | float ) | int | abs ) }}Mi" + cpu: "2" + power.intel.com/{{ power_profile_name }}: "2" + nodeSelector: + kubernetes.io/hostname: {{ power_node }} diff --git a/roles/intel_power_manager/templates/sample_power_pod.yml.j2 b/roles/intel_power_manager/templates/sample_power_pod.yml.j2 deleted file mode 100644 index 8fe8b56c..00000000 --- a/roles/intel_power_manager/templates/sample_power_pod.yml.j2 +++ /dev/null @@ -1,23 +0,0 @@ -# Do not change the name of this file ---- -apiVersion: v1 -kind: Pod -metadata: - name: {{ profile_name }}-power-pod - namespace: {{ intel_power_manager_namespace }} -spec: - containers: - - name: {{ profile_name }}-container - image: busybox - command: ["/bin/sh"] - args: ["-c", "sleep 15000"] - resources: - # IMPORTANT: The amount of the Power Cores have to be the same as the amount of requested CPUs - requests: - memory: "200Mi" - cpu: "2" - power.intel.com/{{ profile_name }}-{{ node_name }}: "2" - limits: - 
memory: "200Mi" - cpu: "2" - power.intel.com/{{ profile_name }}-{{ node_name }}: "2" diff --git a/roles/intel_power_manager/templates/shared_workload.yml.j2 b/roles/intel_power_manager/templates/shared_workload.yaml.j2 similarity index 92% rename from roles/intel_power_manager/templates/shared_workload.yml.j2 rename to roles/intel_power_manager/templates/shared_workload.yaml.j2 index 2293a4d1..8ef54dd6 100644 --- a/roles/intel_power_manager/templates/shared_workload.yml.j2 +++ b/roles/intel_power_manager/templates/shared_workload.yaml.j2 @@ -1,5 +1,5 @@ --- -apiVersion: "power.intel.com/v1alpha1" +apiVersion: "power.intel.com/v1" kind: PowerWorkload metadata: name: shared-{{ node_name }}-workload diff --git a/roles/intel_power_manager/templates/uncore_frequency.yaml.j2 b/roles/intel_power_manager/templates/uncore_frequency.yaml.j2 new file mode 100644 index 00000000..e659813a --- /dev/null +++ b/roles/intel_power_manager/templates/uncore_frequency.yaml.j2 @@ -0,0 +1,9 @@ +apiVersion: power.intel.com/v1 +kind: Uncore +metadata: + name: {{ power_node }} + namespace: {{ intel_power_manager_namespace }} +spec: + sysMax: {{ system_max_frequency }} + sysMin: {{ system_min_frequency }} + dieSelector: {{ die_selector }} diff --git a/roles/intel_power_manager/vars/main.yml b/roles/intel_power_manager/vars/main.yml index 002148f3..b67da666 100644 --- a/roles/intel_power_manager/vars/main.yml +++ b/roles/intel_power_manager/vars/main.yml @@ -21,3 +21,9 @@ install_dependencies: RedHat: - git - make + +# need union of all profiles for power config +combined_profiles: [] + +# used in calculating Cpu or Mem quotas +multiplier: '{{ [intel_power_manager.power_nodes | length / 20, 1.0] | min }}' # 20+ nodes will double the max basic value diff --git a/roles/intel_sriov_fec_operator/tasks/preflight_sriov_fec_operator.yml b/roles/intel_sriov_fec_operator/tasks/preflight_sriov_fec_operator.yml index 9aeab9ad..4d663c04 100644 --- 
a/roles/intel_sriov_fec_operator/tasks/preflight_sriov_fec_operator.yml +++ b/roles/intel_sriov_fec_operator/tasks/preflight_sriov_fec_operator.yml @@ -28,8 +28,8 @@ - name: SRIOV-FEC Operator - check distro ansible.builtin.assert: - that: ansible_distribution_version == "22.04" or ansible_distribution_version == "8.6" - fail_msg: "Deploying Intel SR-IOV FEC Operator is supported only on Ubuntu 22.04 or RHEL 8.6. Please change the o/s or correct group_vars configuration" # noqa yaml[line-length] + that: ansible_distribution_version == "22.04" or ansible_distribution_version == "9.2" + fail_msg: "Deploying Intel SR-IOV FEC Operator is supported only on Ubuntu 22.04 or RHEL 9.2. Please change the o/s or correct group_vars configuration" # noqa yaml[line-length] success_msg: "Assertion passed. Intel SR-IOV FEC Operator is supported and can be deployed on '{{ ansible_distribution }}' distro" - name: SRIOV-FEC Operator - check h/w acc @@ -45,10 +45,22 @@ - name: FEC Operator - check runtime ansible.builtin.assert: - that: container_runtime == 'docker' - fail_msg: "Deploying Intel SR-IOV FEC Operator is supported only for docker runtime. Please correct the group_vars configuration" + that: container_runtime in ['docker', 'containerd'] + fail_msg: "Deploying Intel SR-IOV FEC Operator is supported only for docker/containerd runtime. Please correct the group_vars configuration" success_msg: "Assertion passed. Intel SR-IOV FEC Operator is supported and can be deployed on '{{ container_runtime }}' runtime" + - name: SRIOV-FEC Operator - check Red Hat Login Account for containerd runtime + assert: + that: + - redhat_user is defined + - redhat_user != "ffffffffffffffffffffffffffffff" + - redhat_password is defined + - redhat_password != "ffffffffffffffffffffffffffffff" + fail_msg: "update Red Hat Account in group_vars, refer to https://access.redhat.com/RegistryAuthentication." 
+ when: + - intel_sriov_fec_operator_enabled | default(false) | bool + - container_runtime == 'containerd' + # TODO # - name: FEC Operator - check Cert Manager is enabled # assert: diff --git a/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml b/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml index d95fb663..72f7424e 100644 --- a/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml +++ b/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml @@ -26,6 +26,24 @@ register: gopath changed_when: false +# /sys is mounted as ro in privileged container(sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0) under containerd runtime +# which leads to reset FEC PF failure. The issue is reported in https://github.com/containerd/containerd/issues/8445. +# Here we manually mount /sys as WA to solve. +- name: Workaround for /sys mounted as ro in privileged container under containerd runtime + block: + - name: Add /sys to volumeMounts + ansible.builtin.lineinfile: + path: "{{ intel_sriov_fec_operator_dir }}/assets/300-daemon.yaml" + insertafter: "mountPath: /lib/modules" + line: " - name: sys\n mountPath: /sys" + - name: Add /sys to volumes + ansible.builtin.lineinfile: + path: "{{ intel_sriov_fec_operator_dir }}/assets/300-daemon.yaml" + insertafter: "path: /lib/modules" + line: " - name: sys\n hostPath:\n path: /sys" + when: + - container_runtime == "containerd" + # Workaoround to get SRIOV-FEC Operator daemon image build working - name: patch Intel Smart Edge Open (SEO) SRIOV-FEC Operator Dockerfile ansible.builtin.replace: diff --git a/roles/ipu/acc/tasks/main.yml b/roles/ipu/acc/tasks/main.yml new file mode 100644 index 00000000..f9ae5e5a --- /dev/null +++ b/roles/ipu/acc/tasks/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: get ACC version + ansible.builtin.command: + cmd: "cat /etc/issue" + changed_when: false + register: acc_version diff --git a/roles/ipu/common/defaults/main.yml b/roles/ipu/common/defaults/main.yml new file mode 100644 index 00000000..673a62aa --- /dev/null +++ b/roles/ipu/common/defaults/main.yml @@ -0,0 +1,40 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +ipu_project_root_dir: "{{ (project_root_dir, 'ipu') | path_join }}" +ipu_tmp_dir: "/tmp/ipu" + +ipu_1gbe_connected_to_linkp: true +ipu_1gbe_link_interface: "eno2" +ipu_1gbe_link_interface_ip: "100.0.0.1/24" + +ipu_flavor: "release" +ipu_build: "ci" +ipu_build_number: "4988" + +# IPU host or IPU linkp based on variable ipu_1gbe_connected_to_linkp +ipu_ssd_image_tarball: "mev-hw-b0-{{ ipu_build }}-ts.{{ ipu_flavor }}.{{ ipu_build_number }}-mev-rl.tgz" + +# IPU linkp +ipu_nvm_image_tarball: "mev-hw-b0-{{ ipu_build }}-ts.{{ ipu_flavor }}.{{ ipu_build_number }}-imc.tgz" +ipu_eth_programmer_version: "2.0.1" +ipu_eth_programmer_zip: "EthProgrammer-{{ ipu_eth_programmer_version }}.zip" + +ssh_options: "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" +imc_ssh_port: "22" +imc_user: "root" +imc_static_ip: "100.0.0.100" +acc_static_ip: "192.168.0.2" diff --git a/roles/ipu/common/tasks/main.yml b/roles/ipu/common/tasks/main.yml new file mode 100644 index 00000000..e78ae5f8 --- /dev/null +++ b/roles/ipu/common/tasks/main.yml @@ -0,0 +1,55 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Load IPU common variables + ansible.builtin.debug: + msg: Load IPU common variables + +- name: check supported OS for IPU + ansible.builtin.assert: + that: + - (ansible_distribution == "Rocky" and ansible_distribution_version == "9.1") or + (ansible_distribution == "Fedora") + fail_msg: + - "Current OS - {{ ansible_distribution }} {{ ansible_distribution_version }} - is not supported for IPU" + - "Supported OSes are Rocky 9.1 and Fedora" + +- name: select the right connection host group + ansible.builtin.set_fact: + connection_host_group: "{% if ipu_1gbe_connected_to_linkp %}ipu_linkp{% else %}ipu_host{% endif %}" + +- name: setup 1GbE connection + when: + - inventory_hostname in groups[connection_host_group] + block: + - name: check IP address + ansible.builtin.shell: "set -o pipefail && ip a show {{ ipu_1gbe_link_interface }} |grep \"inet \"" + args: + executable: /bin/bash + register: ip_status + changed_when: false + failed_when: '" does not exist." in ip_status.stderr' + + - name: assign IP address to interface for 1GbE link from IPU + ansible.builtin.command: + cmd: "ip a add {{ ipu_1gbe_link_interface_ip }} dev {{ ipu_1gbe_link_interface }}" + when: + - ipu_1gbe_link_interface_ip not in ip_status.stdout + + - name: bring 1GbE link interface up + ansible.builtin.command: + cmd: "ip link set dev {{ ipu_1gbe_link_interface }} up" + changed_when: true diff --git a/roles/ipu/flash_ipu_nvm/defaults/main.yml b/roles/ipu/flash_ipu_nvm/defaults/main.yml new file mode 100644 index 00000000..4b1fee80 --- /dev/null +++ b/roles/ipu/flash_ipu_nvm/defaults/main.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +eth_programmer_dir: "{{ (ipu_project_root_dir, 'net6') | path_join }}" +ipu_nvm_image_file: "{{ (ipu_project_root_dir, 'imc', 'images', 'anvm-image', 'release', 'image_256M', 'anvm-image.bin') | path_join }}" +# USB addresses of USB1 and USB3 to be unbound from ftdi_sio driver +usb_addresses: + - '1-3:1.1' + - '1-3:1.3' diff --git a/roles/ipu/flash_ipu_nvm/tasks/main.yml b/roles/ipu/flash_ipu_nvm/tasks/main.yml new file mode 100644 index 00000000..d055c2b4 --- /dev/null +++ b/roles/ipu/flash_ipu_nvm/tasks/main.yml @@ -0,0 +1,352 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: copy nvm image from ansible host + ansible.builtin.copy: + src: "{{ (ipu_tmp_dir, ipu_nvm_image_tarball) | path_join }}" + dest: "{{ (ipu_project_root_dir, ipu_nvm_image_tarball) | path_join }}" + mode: 0644 + +- name: unarchive nvm image + ansible.builtin.unarchive: + src: "{{ (ipu_project_root_dir, ipu_nvm_image_tarball) | path_join }}" + dest: "{{ ipu_project_root_dir }}" + remote_src: yes + mode: 0755 + +- name: list USB device binding + ansible.builtin.find: + path: "{{ ftdi_sio_driver_dir }}" + file_type: "link" + register: ftdi_sio_out + +- name: unbind USB1 and USB3 from ftdi_sio driver + ansible.builtin.shell: "set -o pipefail && echo '{{ item }}' > /sys/bus/usb/drivers/ftdi_sio/unbind" + args: + executable: /bin/bash + with_items: "{{ usb_addresses }}" + when: ftdi_sio_out.files|selectattr("path", "search", item)|list|length == 1 + +- name: add executable permission for EthProgrammer + ansible.builtin.file: + path: "{{ (eth_programmer_dir, 'EthProgrammer') | path_join }}" + state: touch + mode: a+x + +- name: flash nvm image + ansible.builtin.command: + cmd: "./EthProgrammer --flash {{ ipu_nvm_image_file }} --no-preservation" + chdir: "{{ eth_programmer_dir }}" + become: true + register: flash_nvm_out + changed_when: true + async: 1200 # Maximum allowed timeout in Seconds + poll: 10 # Polling Interval in Seconds + environment: + DOTNET_SYSTEM_GLOBALIZATION_INVARIANT: 1 + +# ipmi ansible module does not support power cycle +- name: power off IPU host + ansible.builtin.command: + cmd: "ipmitool -I lan -H {{ ipmi_ip }} -U {{ ipmi_user }} -P '{{ ipma_password }}' chassis power off" + changed_when: true + +- name: wait before power on again + ansible.builtin.pause: + prompt: "Waiting before power on..." 
+ seconds: 10 + +- name: power on IPU host + ansible.builtin.command: + cmd: "ipmitool -I lan -H {{ ipmi_ip }} -U {{ ipmi_user }} -P '{{ ipma_password }}' chassis power on" + changed_when: true + +- name: wait for ssh connection to IPU host + ansible.builtin.wait_for: + port: 22 + host: "{{ hostvars[groups['ipu_host'][0]]['ansible_default_ipv4']['address'] }}" + search_regex: OpenSSH + delay: 1 + +- name: wait for ssh connection to IPU-IMC + ansible.builtin.wait_for: + port: 22 + host: '{{ imc_static_ip }}' + search_regex: OpenSSH + delay: 1 + +- name: get IMC hostname + ansible.builtin.command: + cmd: "ssh {{ ssh_options }} root@{{ imc_static_ip }} hostname" + changed_when: false + register: imc_hostname + +- name: Update /etc/hosts with ipu_linkp + ansible.builtin.blockinfile: + path: /etc/hosts + block: | + {{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} {{ inventory_hostname }} + marker: "# {mark} ANSIBLE MANAGED BLOCK {{ inventory_hostname }}" + delegate_to: localhost + become: false + +- name: Set login_user fact + ansible.builtin.set_fact: + login_user: "{{ ansible_env.SUDO_USER | default(ansible_env.USER, True) }}" + +- name: Set local_login_user fact + ansible.builtin.set_fact: + local_login_user: "{{ lookup('ansible.builtin.env', 'SUDO_USER')|default( lookup('ansible.builtin.env', 'USER'), True) }}" + +- name: Get login_user home dir + ansible.builtin.getent: + database: passwd + key: "{{ login_user }}" + +- name: Set login_user_dir fact + ansible.builtin.set_fact: + login_user_dir: "{{ ansible_facts.getent_passwd[login_user][4] }}" + +- name: Get local_login_user home dir + ansible.builtin.getent: + database: passwd + key: "{{ local_login_user }}" + delegate_to: localhost + become: false + +- name: Set local_login_user_dir fact + ansible.builtin.set_fact: + local_login_user_dir: "{{ ansible_facts.getent_passwd[local_login_user][4] }}" + +- name: Prepare ipu inventory file name + ansible.builtin.set_fact: + ipu_inventory_name: "{{ 
inventory_file | splitext | first + '_mev' + inventory_file | splitext | last }}" + run_once: true + delegate_to: localhost + become: false + +- name: Copy inventory file {{ inventory_file }} to new file {{ ipu_inventory_name }} + ansible.builtin.copy: + src: "{{ inventory_file }}" + dest: "{{ ipu_inventory_name }}" + mode: '0644' + force: yes + owner: "{{ local_login_user }}" + group: "{{ local_login_user }}" + run_once: true + delegate_to: localhost + become: false + +- name: Prepare bastion host configuration in ~/.ssh/config for IMC + ansible.builtin.blockinfile: + path: "{{ local_login_user_dir }}/.ssh/config" + block: | + Host {{ imc_hostname.stdout }} + ProxyCommand ssh {{ ssh_options }} {{ login_user }}@{{ inventory_hostname }} -p 22 -W %h:%p + StrictHostKeyChecking no + UserKnownHostsFile /dev/null + marker: "# {mark} ANSIBLE MANAGED BLOCK {{ imc_hostname.stdout }}" + create: yes + owner: "{{ local_login_user }}" + group: "{{ local_login_user }}" + mode: '0644' + delegate_to: localhost + become: false + +- name: Add IMC to inventory - all + ansible.builtin.add_host: + hostname: "{{ imc_hostname.stdout }}" + ansible_host: "{{ imc_static_ip }}" + ip: "{{ imc_static_ip }}" + ansible_user: "root" # IMC user - the only root user is available in flashed image + ansible_ssh_common_args: '{{ ssh_options }} -o ProxyCommand="ssh -p 22 -W %h:%p {{ ssh_options }} -q {{ inventory_hostname }}"' + ansible_ssh_user: '{{ ansible_user }}' + ansible_ssh_password: '{{ ansible_password }}' + inventory_dir: '{{ inventory_dir }}' + groups: all + +- name: Add IMC to inventory - ipu_imc + ansible.builtin.add_host: + hostname: "{{ imc_hostname.stdout }}" + groups: "ipu_imc" + inventory_dir: '{{ inventory_dir }}' + +- name: Update ipu inventory file for IMC - all + community.general.ini_file: + dest: "{{ ipu_inventory_name }}" + section: "all" + option: >- + {{ imc_hostname.stdout }} ansible_host={{ imc_static_ip }} ip={{ imc_static_ip }} ansible_user={{ login_user }} + 
ansible_ssh_user={{ ansible_user }} ansible_ssh_password='{{ ansible_password }}' inventory_dir={{ inventory_dir }} + ansible_ssh_common_args='{{ ssh_options }} -o ProxyCommand="ssh -p 22 -W %h:%p {{ ssh_options }} -q {{ inventory_hostname }}"' + no_extra_spaces: yes + allow_no_value: yes + mode: '0644' + state: present + backup: no + delegate_to: localhost + become: false + +- name: Update ipu inventory file for IMC - ipu_imc + community.general.ini_file: + dest: "{{ ipu_inventory_name }}" + section: "ipu_imc" + option: "{{ imc_hostname.stdout }}" + no_extra_spaces: yes + allow_no_value: yes + mode: '0644' + state: present + backup: no + delegate_to: localhost + become: false + +- name: Update /etc/hosts with ipu_imc + ansible.builtin.blockinfile: + path: /etc/hosts + block: | + {{ imc_static_ip }} {{ imc_hostname.stdout }} + marker: "# {mark} ANSIBLE MANAGED BLOCK {{ imc_hostname.stdout }}" + +- name: check hostname of machine where we delegate to + ansible.builtin.command: + cmd: "hostname" + changed_when: false + delegate_to: '{{ imc_hostname.stdout }}' + ignore_unreachable: true + register: imc_connection_result + +- name: wait for ssh connection from IPU-IMC to IPU-ACC + ansible.builtin.wait_for: + port: 22 + host: '{{ acc_static_ip }}' + search_regex: OpenSSH + delay: 1 + timeout: 3600 + delegate_to: '{{ imc_hostname.stdout }}' + ignore_unreachable: true + register: acc_connection_result + +- name: handle one IMC restart based on watchdog + block: + - name: IMC restart detected + ansible.builtin.debug: + msg: "IMC is not reachable - it is restarted because of ACC" + + - name: clear host error caused by reboot + ansible.builtin.meta: clear_host_errors + + - name: wait for ssh connection to IPU-IMC after restart + ansible.builtin.wait_for: + port: 22 + host: '{{ imc_static_ip }}' + search_regex: OpenSSH + delay: 1 + + - name: wait for ssh connection from IPU-IMC to IPU-ACC after restart + ansible.builtin.wait_for: + port: 22 + host: '{{ acc_static_ip }}' + 
search_regex: OpenSSH + delay: 1 + timeout: 3600 + delegate_to: '{{ imc_hostname.stdout }}' + when: + - acc_connection_result.unreachable | default(false) or imc_connection_result.unreachable | default(false) + +- name: get ACC hostname + ansible.builtin.command: + cmd: "ssh {{ ssh_options }} root@{{ acc_static_ip }} hostname" + register: acc_hostname + changed_when: false + delegate_to: '{{ imc_hostname.stdout }}' + +- name: Prepare bastion host configuration in ~/.ssh/config for ACC + ansible.builtin.blockinfile: + path: "{{ local_login_user_dir }}/.ssh/config" + block: | + Host {{ acc_hostname.stdout }} + ProxyCommand ssh {{ ssh_options }} {{ login_user }}@{{ imc_hostname.stdout }} -p 22 -W %h:%p + StrictHostKeyChecking no + UserKnownHostsFile /dev/null + marker: "# {mark} ANSIBLE MANAGED BLOCK {{ acc_hostname.stdout }}" + create: yes + owner: "{{ local_login_user }}" + group: "{{ local_login_user }}" + mode: '0644' + delegate_to: localhost + become: false + +- name: Add ACC to inventory - all + ansible.builtin.add_host: + hostname: "{{ acc_hostname.stdout }}" + ansible_host: "{{ acc_static_ip }}" + ip: "{{ acc_static_ip }}" + ansible_user: "root" # ACC user - the only root user is available in flashed image + ansible_ssh_common_args: >- + '{{ ssh_options }} -o ProxyCommand="ssh -W %h:%p {{ ssh_options }} -o ProxyCommand=\"ssh -W {{ imc_static_ip }}:22 {{ ssh_options }} + {{ inventory_hostname }}\" {{ imc_static_ip }}"' + ansible_ssh_user: '{{ ansible_user }}' + ansible_ssh_password: '{{ ansible_password }}' + inventory_dir: '{{ inventory_dir }}' + groups: all + +- name: Add ACC to inventory - ipu_acc + ansible.builtin.add_host: + hostname: "{{ acc_hostname.stdout }}" + groups: "ipu_acc" + inventory_dir: '{{ inventory_dir }}' + +- name: Update ipu inventory file for ACC - all + community.general.ini_file: + dest: "{{ ipu_inventory_name }}" + section: "all" + option: >- + {{ acc_hostname.stdout }} ansible_host={{ acc_static_ip }} ip={{ acc_static_ip }} 
ansible_user=root + ansible_ssh_user={{ ansible_user }} ansible_ssh_password='{{ ansible_password }}' inventory_dir={{ inventory_dir }} + ansible_ssh_common_args='{{ ssh_options }} -o ProxyCommand="ssh -W %h:%p {{ ssh_options }} -o ProxyCommand=\"ssh -W {{ imc_static_ip }}:22 + {{ ssh_options }} {{ inventory_hostname }}\" {{ imc_static_ip }}"' + no_extra_spaces: yes + allow_no_value: yes + mode: '0644' + state: present + backup: no + delegate_to: localhost + become: false + +- name: Update ipu inventory file for ACC - ipu_acc + community.general.ini_file: + dest: "{{ ipu_inventory_name }}" + section: "ipu_acc" + option: "{{ acc_hostname.stdout }}" + no_extra_spaces: yes + allow_no_value: yes + mode: '0644' + state: present + backup: no + delegate_to: localhost + become: false + +- name: Update /etc/hosts with ipu_acc + ansible.builtin.blockinfile: + path: /etc/hosts + block: | + {{ acc_static_ip }} {{ acc_hostname.stdout }} + marker: "# {mark} ANSIBLE MANAGED BLOCK {{ acc_hostname.stdout }}" + delegate_to: '{{ imc_hostname.stdout }}' + +- name: IPU is up and running + ansible.builtin.debug: + msg: "IPU is up and running" diff --git a/roles/ipu/flash_ipu_ssd/defaults/main.yml b/roles/ipu/flash_ipu_ssd/defaults/main.yml new file mode 100644 index 00000000..67f5cece --- /dev/null +++ b/roles/ipu/flash_ipu_ssd/defaults/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +nvme1_dev: "/dev/nvme0n1" +dd_prefix: "dd bs=16M" +ipu_ssd_image_file: "{{ (ipu_project_root_dir, 'mev', 'images', 'nvme-image-mev.bin') | path_join }}" +dd_src_cmd: "{{ dd_prefix }} if={{ ipu_ssd_image_file }}" +dd_dst_cmd: "{{ dd_prefix }} of={{ nvme1_dev }}" diff --git a/roles/ipu/flash_ipu_ssd/tasks/main.yml b/roles/ipu/flash_ipu_ssd/tasks/main.yml new file mode 100644 index 00000000..6dfebfde --- /dev/null +++ b/roles/ipu/flash_ipu_ssd/tasks/main.yml @@ -0,0 +1,99 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Install dependencies + ansible.builtin.include_role: + name: install_dependencies + +- name: Create IPU project directory + ansible.builtin.file: + path: "{{ ipu_project_root_dir }}" + state: directory + mode: 0755 + +- name: copy ssd image from ansible host + ansible.builtin.copy: + src: "{{ (ipu_tmp_dir, ipu_ssd_image_tarball) | path_join }}" + dest: "{{ (ipu_project_root_dir, ipu_ssd_image_tarball) | path_join }}" + mode: 0644 + +- name: unarchive ssd image + ansible.builtin.unarchive: + src: "{{ (ipu_project_root_dir, ipu_ssd_image_tarball) | path_join }}" + dest: "{{ ipu_project_root_dir }}" + remote_src: yes + mode: 0755 + +- name: check if ssd image is available + ansible.builtin.stat: + path: "{{ ipu_ssd_image_file }}" + register: ssd_image_stats + +- name: fail if ssd image does not exist + ansible.builtin.fail: + msg: "ssd image {{ ipu_ssd_image_file }} does not exist" + when: not ssd_image_stats.stat.exists + +- name: syslogmode DEV + ansible.builtin.shell: "set -o pipefail && ssh -p {{ imc_ssh_port }} {{ ssh_options }} {{ imc_user }}@{{ imc_static_ip }} /etc/init.d/syslogmode DEV" + args: + executable: /bin/bash + register: syslogmode_out + changed_when: "'DEV mode, syslog disabled' in syslogmode_out.stdout" + failed_when: syslogmode_out.rc != 0 + +- name: umount loop0 + ansible.builtin.shell: "set -o pipefail && ssh -p {{ imc_ssh_port }} {{ ssh_options }} {{ imc_user }}@{{ imc_static_ip }} umount /dev/loop0" + args: + executable: /bin/bash + register: loop0_out + changed_when: loop0_out.stderr | length == 0 + failed_when: + - loop0_out.rc != 0 + - "'t unmount /dev/loop0' not in loop0_out.stderr" + - "'Invalid argument' not in loop0_out.stderr" + +- name: umount nvme0n1 + ansible.builtin.shell: "set -o pipefail && ssh -p {{ imc_ssh_port }} {{ ssh_options }} {{ imc_user }}@{{ imc_static_ip }} umount -l /dev/nvme0n1*" + args: + executable: /bin/bash + register: nvme0n1_out + changed_when: nvme0n1_out.stderr | length == 0 + failed_when: + 
- nvme0n1_out.rc != 0 + - "'t unmount /dev/nvme0n1' not in nvme0n1_out.stderr" + - "'Invalid argument' not in nvme0n1_out.stderr" + +- name: kill tgtd + ansible.builtin.shell: "set -o pipefail && ssh -p {{ imc_ssh_port }} {{ ssh_options }} {{ imc_user }}@{{ imc_static_ip }} killall -9 tgtd" + args: + executable: /bin/bash + register: tgtd_out + changed_when: tgtd_out.stderr | length == 0 + failed_when: + - tgtd_out.rc != 0 + - "' no process killed' not in tgtd_out.stderr" + - "' No such process' not in tgtd_out.stderr" + +- name: flash ssd image + ansible.builtin.shell: + cmd: "set -o pipefail && {{ dd_src_cmd }} | ssh -p {{ imc_ssh_port }} {{ ssh_options }} -q {{ imc_user }}@{{ imc_static_ip }} {{ dd_dst_cmd }}" + args: + executable: /bin/bash + register: flash_ssd_out + changed_when: true + async: 1200 # Maximum allowed timeout in Seconds + poll: 10 # Polling Interval in Seconds diff --git a/roles/ipu/imc/tasks/main.yml b/roles/ipu/imc/tasks/main.yml new file mode 100644 index 00000000..fc76a0c5 --- /dev/null +++ b/roles/ipu/imc/tasks/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: get IMC version + ansible.builtin.command: + cmd: "cat /etc/issue" + changed_when: false + register: imc_version diff --git a/roles/ipu/prepare_ipu_linkp/defaults/main.yml b/roles/ipu/prepare_ipu_linkp/defaults/main.yml new file mode 100644 index 00000000..705bebb0 --- /dev/null +++ b/roles/ipu/prepare_ipu_linkp/defaults/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +ftdi_sio_driver_dir: "/sys/bus/usb/drivers/ftdi_sio" diff --git a/roles/ipu/prepare_ipu_linkp/tasks/main.yml b/roles/ipu/prepare_ipu_linkp/tasks/main.yml new file mode 100644 index 00000000..bcb7bb03 --- /dev/null +++ b/roles/ipu/prepare_ipu_linkp/tasks/main.yml @@ -0,0 +1,77 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Install dependencies + ansible.builtin.include_role: + name: install_dependencies + +- name: create IMC minicom file + ansible.builtin.copy: + dest: "/etc/minirc.IMC" + content: | + pu port /dev/ttyUSB2 + pu baudrate 460800 + pu bits 8 + pu parity N + pu stopbits 1 + pu rtscts No + mode: 0644 + +- name: create ACC minicom file + ansible.builtin.copy: + dest: "/etc/minirc.ACC" + content: | + pu port /dev/ttyUSB0 + pu baudrate 115200 + pu bits 8 + pu parity N + pu stopbits 1 + pu rtscts No + mode: 0644 + +- name: Create IPU project directory + ansible.builtin.file: + path: "{{ ipu_project_root_dir }}" + state: directory + mode: 0755 + +- name: copy EthProgrammer from ansible host + ansible.builtin.copy: + src: "{{ (ipu_tmp_dir, ipu_eth_programmer_zip) | path_join }}" + dest: "{{ (ipu_project_root_dir, ipu_eth_programmer_zip) | path_join }}" + mode: 0644 + +- name: unarchive EthProgrammer + ansible.builtin.unarchive: + src: "{{ (ipu_project_root_dir, ipu_eth_programmer_zip) | path_join }}" + dest: "{{ ipu_project_root_dir }}" + remote_src: yes + mode: 0755 + register: unzip_result + failed_when: (unzip_result.extract_results.rc != 0) and ('appears to use backslashes as path separators' not in unzip_result.extract_results.err) + +- name: create a symbolic link for libdl.so + ansible.builtin.file: + src: /usr/lib64/libdl.so.2 + dest: /usr/lib64/libdl.so + owner: root + group: root + state: link + +- name: ensure that ftdi_sio module is loaded + community.general.modprobe: + name: ftdi_sio + state: present diff --git a/roles/cndp_dp_install/vars/main.yml b/roles/ipu/prepare_ipu_linkp/vars/main.yml similarity index 84% rename from roles/cndp_dp_install/vars/main.yml rename to roles/ipu/prepare_ipu_linkp/vars/main.yml index d6d96946..c8f87e02 100644 --- a/roles/cndp_dp_install/vars/main.yml +++ b/roles/ipu/prepare_ipu_linkp/vars/main.yml @@ -16,12 +16,7 @@ --- install_dependencies: Debian: - - git - - make - - build-essential - - binutils RedHat: - - git 
- - cmake - - "@Development tools" - - binutils + - minicom + - ipmitool + - dotnet-sdk-6.0 diff --git a/roles/istio_service_mesh/tasks/main.yml b/roles/istio_service_mesh/tasks/main.yml index 91f5d2a5..90cedc22 100644 --- a/roles/istio_service_mesh/tasks/main.yml +++ b/roles/istio_service_mesh/tasks/main.yml @@ -20,14 +20,6 @@ when: - inventory_hostname == groups['kube_control_plane'][0] -- name: determine machine type - include_role: - name: check_machine_type - when: - - inventory_hostname in groups['kube_node'] or - inventory_hostname in groups['vm_host'] - - not on_vms | default (false) - - name: remove existing istio service mesh resources include_tasks: cleanup.yml when: diff --git a/roles/istio_service_mesh/templates/custom-ca.yaml.j2 b/roles/istio_service_mesh/templates/custom-ca.yaml.j2 index cb9d2ae5..15e326fb 100644 --- a/roles/istio_service_mesh/templates/custom-ca.yaml.j2 +++ b/roles/istio_service_mesh/templates/custom-ca.yaml.j2 @@ -39,6 +39,20 @@ spec: - signers verbs: - approve + - path: rules[-1] + value: | + apiGroups: + - certificates.k8s.io + resources: + - certificatesigningrequests/approval + - certificatesigningrequests/status + - certificatesigningrequests + verbs: + - update + - create + - get + - delete + - watch meshConfig: defaultConfig: proxyMetadata: diff --git a/roles/istio_service_mesh/vars/main.yml b/roles/istio_service_mesh/vars/main.yml index 9cf2cb4f..bda58119 100644 --- a/roles/istio_service_mesh/vars/main.yml +++ b/roles/istio_service_mesh/vars/main.yml @@ -16,11 +16,11 @@ istio_service_mesh_defaults: enabled: false image: istio/istioctl - version: 1.17.1 + version: 1.18.1 intel_preview: enabled: false image: intel/istioctl - version: 1.16.1-intel.0 + version: 1.18.0-intel.0 context: '' filename: [] namespace: '' diff --git a/roles/jaeger_install/defaults/main.yml b/roles/jaeger_install/defaults/main.yml index 31115004..18dff597 100644 --- a/roles/jaeger_install/defaults/main.yml +++ b/roles/jaeger_install/defaults/main.yml @@ 
-13,7 +13,8 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## -jaeger_version: v1.42.0 +jaeger_version: v1.44.0 jaeger_crd_url: https://github.com/jaegertracing/jaeger-operator/releases/download/{{ jaeger_version }}/jaeger-operator.yaml +jaeger_annotations_key_to_remove: 'sidecar.jaegertracing.io/inject' jaeger_query_remove: >- -p='[{"op": "remove", "path": /metadata/annotations/sidecar.jaegertracing.io~1inject}]' diff --git a/roles/jaeger_install/tasks/main.yml b/roles/jaeger_install/tasks/main.yml index cbacfaf6..3c93b181 100644 --- a/roles/jaeger_install/tasks/main.yml +++ b/roles/jaeger_install/tasks/main.yml @@ -14,7 +14,8 @@ ## limitations under the License. ## - name: Deploy Jaeger - when: inventory_hostname == groups['kube_control_plane'][0] + when: + - inventory_hostname == groups['kube_control_plane'][0] block: - name: Create monitoring and observability namespace kubernetes.core.k8s: @@ -52,17 +53,29 @@ - jaeger_operator.yml - jaeger_rolebinding.yml - - name: Create secret with Elasticsearch credentials + - name: Get Elasticsearch credentials ansible.builtin.shell: >- - kubectl create secret generic jaeger-secret - --from-literal=ES_PASSWORD=$(kubectl get secrets --namespace=monitoring - elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d) - --from-literal=ES_USERNAME=elastic - -n monitoring - changed_when: true + kubectl get secrets --namespace=monitoring + elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d + changed_when: false + register: elastic_pass args: executable: /bin/bash + - name: Create secret with Elasticsearch credentials + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Secret + type: Opaque + metadata: + name: "jaeger-secret" + namespace: "monitoring" + stringData: + ES_USERNAME: "elastic" + ES_PASSWORD: "{{ elastic_pass.stdout }}" + - name: Wait for jaeger operator CRD kubernetes.core.k8s_info: kind: 
CustomResourceDefinition @@ -101,6 +114,15 @@ retries: 30 delay: 10 + - name: Get jaeger-query deployment info + kubernetes.core.k8s_info: + api_version: v1 + kind: Deployment + name: jaeger-query + namespace: monitoring + register: jaeger_query_deployment + - name: Remove annotation from jaeger-query ansible.builtin.command: "kubectl patch deployment jaeger-query -n monitoring --type=json {{ jaeger_query_remove }}" changed_when: true + when: "jaeger_annotations_key_to_remove in (jaeger_query_deployment.resources | map(attribute='metadata') | map(attribute='annotations'))[0].keys()" diff --git a/roles/kibana_install/tasks/main.yml b/roles/kibana_install/tasks/main.yml index be204c17..e7765410 100644 --- a/roles/kibana_install/tasks/main.yml +++ b/roles/kibana_install/tasks/main.yml @@ -13,7 +13,10 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## -- block: +- name: Deploy kibana + when: + - inventory_hostname == groups['kube_control_plane'][0] + block: - name: create kibana folder ansible.builtin.file: state: directory @@ -37,4 +40,3 @@ create_namespace: true wait: true timeout: 15m0s - when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml index ab853d87..e5c7c137 100644 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml +++ b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml @@ -8,7 +8,7 @@ data: apphsm.conf: | { "port": {{ .Values.apphsm.main.port | int }}, - "ip": {{ .Values.apphsm.main.hostname | quote }}, + "ip": {{ .Values.apphsm.main.ip | quote }}, "nonce_lifetime": {{ .Values.apphsm.nonce_lifetime | int }}, "clients": [ {{- if eq .Values.apphsm.ctk_loadkey_demo_enabled "true" -}} @@ -17,6 +17,12 @@ data: "permission": "allow_all" }, {{- end -}} + 
{{- if .Values.apphsm.oran -}} + { + "id": {{ .Values.apphsm.oran_netopeer2_cert_user_id | quote }}, + "permission": "allow_all" + }, + {{- end -}} { "id": {{ .Values.apphsm.generic_client_cert_id | quote }}, "permission": "allow_all" diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-deployment.yml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-deployment.yml index 15c93401..ce082ef5 100644 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-deployment.yml +++ b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-deployment.yml @@ -17,7 +17,7 @@ spec: labels: app: {{ .Release.Name }} spec: - hostNetwork: true + hostNetwork: false serviceAccountName: {{ .Release.Name }} initContainers: - name: init-tmpfs @@ -25,10 +25,17 @@ spec: command: ['sh', '-c', "rm -rf /var/lib/softhsm/tokens/*"] containers: - name: {{ .Release.Name }} + ports: + - name: apphsm + containerPort: {{ .Values.apphsm.main.port }} + readinessProbe: + tcpSocket: + port: {{ .Values.apphsm.main.port }} + initialDelaySeconds: 15 + periodSeconds: 5 + successThreshold: 2 image: "{{ .Values.apphsm.main.image.repo }}/{{ .Values.apphsm.main.image.name }}:{{ .Values.apphsm.main.image.tag }}" imagePullPolicy: {{ .Values.apphsm.main.image.pullPolicy }} - command: - - /workspace/apphsm_commands.sh envFrom: - configMapRef: name: {{ .Release.Name }}-env @@ -46,14 +53,34 @@ spec: - name: tmpfs mountPath: /var/lib/softhsm/tokens subPath: tokens - - name: appshm-conf +{{- if not (eq .Values.apphsm.oran "true") }} + - name: apphsm-conf mountPath: /opt/intel/apphsm/apphsm.conf subPath: apphsm.conf readOnly: true - - name: appshm-cmd - mountPath: /workspace/apphsm_commands.sh - subPath: apphsm_commands.sh +{{- else }} + - name: apphsm-conf + mountPath: /opt/apphsm_config/apphsm.conf + subPath: apphsm.conf + readOnly: true +# we have to mount these keys one by one, since apphsm doesn't like symbol links + - name: custom-config + mountPath: 
/opt/intel/custom_tls/server.key + subPath: server.key + readOnly: true + - name: custom-config + mountPath: /opt/intel/custom_tls/server.crt + subPath: server.crt readOnly: true + - name: custom-config + mountPath: /opt/intel/custom_tls/client.key + subPath: client.key + readOnly: true + - name: custom-config + mountPath: /opt/intel/custom_tls/client.crt + subPath: client.crt + readOnly: true +{{ end }} resources: limits: cpu: 500m @@ -88,10 +115,15 @@ spec: - name: sgx-qcnl-conf configMap: name: {{ .Release.Name }}-qcnl-conf - - name: appshm-conf + - name: apphsm-conf configMap: name: {{ .Release.Name }}-config - - name: appshm-cmd +{{- if eq .Values.apphsm.oran "true" }} + - name: custom-config + configMap: + name: {{ .Release.Name }}-custom-config +{{ end }} + - name: apphsm-cmd configMap: name: {{ .Release.Name }}-entrypoint defaultMode: 0777 diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml deleted file mode 100644 index f94d8d86..00000000 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml +++ /dev/null @@ -1,55 +0,0 @@ ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: {{ .Release.Name }}-entrypoint - namespace: {{ .Release.Namespace }} -data: - apphsm_commands.sh: | - #!/bin/bash -e - set -eu - export no_proxy="$no_proxy,localhost,127.0.0.1/8" - - echo "Cleaning up softhsm tokens directory" - rm -rf /var/lib/softhsm/tokens/* - mkdir -p /tmp/apphsm - cp /opt/intel/ca/{apphsm.crt,apphsm.key,ca.crt} /tmp/apphsm/ - cp /opt/intel/apphsm/apphsm.conf /tmp/apphsm/apphsm.conf - cd /tmp/apphsm/ - - {{- range $key := .Values.apphsm.keys }} - echo "Create private key & certificate..." 
- openssl req \ - -nodes \ - -x509 \ - -newkey rsa:3072 \ - -keyout {{ $key.token_key }} \ - -out {{ $key.token_cert }} \ - -subj "/O={{ $key.crt_subj.O }}/CN={{ $key.crt_subj.CN }}" - - echo "Create sample token..." - sample_key_gen \ - --so-pin {{ $.Values.apphsm.default_so_pin }} \ - --pin {{ $key.pin }} \ - --token-label {{ $key.token_name }} \ - --key-label {{ $key.key_name }} \ - --import-key {{ $key.token_key }} - - echo "Remove private key as the key is stored in the HSM..." - rm -rf {{ $key.token_key }} - {{ end }} - - echo -e "\033[0;31m------------------------------------------------------------" - echo -e "KMRA (Key Management Reference Application) is a proof-of-concept" - echo -e "software not suitable for production usage. AppHSM Key Server has" - echo -e "limited functionality and provisions private keys to non-production" - echo -e "SGX enclaves. Please note that the enclave is signed with a test" - echo -e "signing key. A production enclave should go through the process of" - echo -e "signing an enclave as explained in the section Enclave Signing Tool" - echo -e "in the Intel(R) SGX Developer Reference for Linux* OS" - echo -e "(https://download.01.org/intel-sgx/latest/linux-latest/docs/)" - echo -e "---------------------------------------------------------------\033[0m" - - source /opt/intel/apphsm/env_*/bin/activate && \ - python3 /opt/intel/apphsm/apphsm.py && \ - deactivate diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-service.yaml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-service.yaml new file mode 100644 index 00000000..5dca41f5 --- /dev/null +++ b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-service.yaml @@ -0,0 +1,14 @@ +apiVersion: v1 +kind: Service +metadata: + name: {{ .Values.apphsm.main.hostname }} + namespace: {{ .Release.Namespace }} + labels: + app: {{ .Release.Name }} +spec: + selector: + app: {{ .Release.Name }} + ports: + - protocol: TCP + port: {{ 
.Values.apphsm.main.servicePort }} + targetPort: {{ .Values.apphsm.main.port }} diff --git a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml index dda8e3a6..a1636e3d 100644 --- a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml +++ b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml @@ -18,12 +18,17 @@ spec: annotations: sgx.intel.com/quote-provider: {{ .Release.Name }} spec: - hostNetwork: true + hostNetwork: false serviceAccountName: {{ .Release.Name }} initContainers: - name: init-tmpfs image: "{{ .Values.ctk_loadkey.init.image.repo }}/{{ .Values.ctk_loadkey.init.image.name }}:{{ .Values.ctk_loadkey.init.image.tag }}" command: ['sh', '-c', "rm -rf /opt/intel/cryptoapitoolkit/tokens/*"] + securityContext: + runAsUser: 65333 + runAsGroup: {{ .Values.ctk_loadkey.sgx_prv_gid }} + supplementalGroups: + - {{ .Values.ctk_loadkey.sgx_gid }} containers: - name: {{ .Release.Name }} image: "{{ .Values.ctk_loadkey.main.image.repo }}/{{ .Values.ctk_loadkey.main.image.name }}:{{ .Values.ctk_loadkey.main.image.tag }}" @@ -61,10 +66,6 @@ spec: cpu: 200m memory: 200Mi securityContext: - runAsUser: 65333 - runAsGroup: {{ .Values.ctk_loadkey.sgx_prv_gid }} - supplementalGroups: - - {{ .Values.ctk_loadkey.sgx_gid }} readOnlyRootFilesystem: true - name: {{ .Release.Name }}-nginx image: "{{ .Values.ctk_loadkey.nginx.image.repo }}/{{ .Values.ctk_loadkey.nginx.image.name }}:{{ .Values.ctk_loadkey.nginx.image.tag }}" @@ -91,7 +92,6 @@ spec: cpu: 100m memory: 200Mi securityContext: - runAsUser: 65333 runAsGroup: 65333 readOnlyRootFilesystem: true affinity: diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/.helmignore b/roles/kmra_install/charts/kmra-oran-netopeer2-client/.helmignore new file mode 100644 index 00000000..50af0317 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/.helmignore 
@@ -0,0 +1,22 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/Chart.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/Chart.yaml new file mode 100644 index 00000000..ccc541f0 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/Chart.yaml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +apiVersion: v1 +description: Key Management Reference Application - ORAN netopeer2 client Application +name: kmra-oran-netopeer2-client +version: 2.3 +appVersion: '2.3' diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/NOTES.txt b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/NOTES.txt new file mode 100644 index 00000000..fc3352e9 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/NOTES.txt @@ -0,0 +1,8 @@ +{{ .Chart.Name }} was installed. + +Your release is named {{ .Release.Name }}.
+ +To learn more about the release, try: + + $ helm status {{ .Release.Name }} + $ helm get all {{ .Release.Name }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/_helpers.tpl b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/_helpers.tpl new file mode 100644 index 00000000..e3b0b7c3 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/_helpers.tpl @@ -0,0 +1,32 @@ +{{/* vim: set filetype=mustache: */}} +{{/* +Expand the name of the chart. +*/}} +{{- define "kmra-oran-netopeer2-client.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}} +{{- end -}} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. +*/}} +{{- define "kmra-oran-netopeer2-client.fullname" -}} +{{- if .Values.fullnameOverride -}} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- $name := default .Chart.Name .Values.nameOverride -}} +{{- if contains $name .Release.Name -}} +{{- .Release.Name | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}} +{{- end -}} +{{- end -}} +{{- end -}} + +{{/* +Create chart name and version as used by the chart label.
+*/}} +{{- define "kmra-oran-netopeer2-client.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}} +{{- end -}} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-deployment.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-deployment.yml new file mode 100644 index 00000000..22a814aa --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-deployment.yml @@ -0,0 +1,140 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ .Release.Name }} + labels: + app: {{ .Release.Name }} +spec: + replicas: 1 + strategy: + type: Recreate + selector: + matchLabels: + app: {{ .Release.Name }} + template: + metadata: + labels: + app: {{ .Release.Name }} + annotations: + sgx.intel.com/quote-provider: {{ .Release.Name }} + spec: + hostNetwork: false + serviceAccountName: {{ .Release.Name }} + initContainers: + - name: init-tmpfs + image: "{{ .Values.oran_netopeer2_client.init.image.repo }}/{{ .Values.oran_netopeer2_client.init.image.name }}:{{ .Values.oran_netopeer2_client.init.image.tag }}" + command: ['sh', '-c', "rm -rf /opt/intel/cryptoapitoolkit/tokens/*"] + securityContext: + runAsUser: 65533 + runAsGroup: {{ .Values.oran_netopeer2_client.sgx_prv_gid }} + supplementalGroups: + - {{ .Values.oran_netopeer2_client.sgx_gid }} + containers: + - name: {{ .Release.Name }} + image: "{{ .Values.oran_netopeer2_client.main.image.repo }}/{{ .Values.oran_netopeer2_client.main.image.name }}:{{ .Values.oran_netopeer2_client.main.image.tag }}" + imagePullPolicy: {{ .Values.oran_netopeer2_client.main.image.pullPolicy }} + ports: + - name: ctk-loader + containerPort: {{ .Values.oran_netopeer2_client.pkcs11_daemon_socket_port }} + envFrom: + - configMapRef: + name: {{ .Release.Name }}-env-cm + volumeMounts: + - name: mtls + mountPath: /opt/intel/ca + readOnly: true + - name: 
sgx-qcnl-conf + mountPath: /etc/sgx_default_qcnl.conf + subPath: sgx_default_qcnl.conf + readOnly: true + - name: tmpfs + mountPath: /tmp + subPath: tmp + - name: tmpfs + mountPath: /opt/intel/cryptoapitoolkit/tokens + subPath: tokens + - name: p11-proxy-tls-psk + mountPath: "{{ .Values.oran_netopeer2_client.pkcs11_proxy_tls_psk_file }}" + subPath: p11_proxy_tls.psk + readOnly: true + resources: + limits: + cpu: 500m + memory: 500Mi + sgx.intel.com/epc: "512Ki" + requests: + cpu: 200m + memory: 200Mi + securityContext: + readOnlyRootFilesystem: true + - name: {{ .Release.Name }}-oran + image: "{{ .Values.oran_netopeer2_client.oran.image.repo }}/{{ .Values.oran_netopeer2_client.oran.image.name }}:{{ .Values.oran_netopeer2_client.oran.image.tag }}" + imagePullPolicy: {{ .Values.oran_netopeer2_client.oran.image.pullPolicy }} + envFrom: + - configMapRef: + name: {{ .Release.Name }}-oran-env + volumeMounts: + - name: tmpfs-oran + mountPath: /tmp + subPath: tmp + - name: p11-proxy-tls-psk + mountPath: "{{ .Values.oran_netopeer2_client.pkcs11_proxy_tls_psk_file }}" + subPath: p11_proxy_tls.psk + readOnly: true + - name: sysrepo-config + mountPath: /opt/intel/sysrepo_config + readOnly: true + resources: + limits: + cpu: 200m + memory: 300Mi + requests: + cpu: 100m + memory: 200Mi + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + readOnlyRootFilesystem: true + imagePullSecrets: + - name: {{ .Values.oran_netopeer2_client.pullSecret }} + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 1 + preference: + matchExpressions: + - key: app + operator: In + values: + - kmra + volumes: + - name: mtls + secret: + secretName: {{ .Release.Name }}-tls + items: + - key: tls.key + path: ctk_loadkey.key + - key: tls.cert + path: ctk_loadkey.crt + - key: ca.cert + path: ca.crt + - name: sgx-qcnl-conf + configMap: + name: {{ .Release.Name }}-qcnl-conf + - name: tmpfs + emptyDir: + medium: Memory + sizeLimit: 64Mi + - name: tmpfs-oran + emptyDir: 
+ medium: Memory + sizeLimit: 64Mi + - name: p11-proxy-tls-psk + configMap: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + - name: oran-env + configMap: + name: {{ .Release.Name }}-oran-env + - name: sysrepo-config + configMap: + name: oran-sysrepo-config diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-env-configmap.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-env-configmap.yml new file mode 100644 index 00000000..787e0d81 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-env-configmap.yml @@ -0,0 +1,21 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-env-cm + namespace: {{ .Release.Namespace }} +data: + http_proxy: {{ .Values.http_proxy | default "" | quote }} + https_proxy: {{ .Values.https_proxy | default "" | quote }} + no_proxy: {{ .Values.no_proxy | default "" | quote }} + PCCS_PORT: {{ .Values.oran_netopeer2_client.pccs_port | quote }} + PCCS_HOSTNAME: {{ .Values.oran_netopeer2_client.pccs_hostname | quote }} + APPHSM_PORT: {{ .Values.oran_netopeer2_client.apphsm_port | quote }} + APPHSM_HOSTNAME: {{ .Values.oran_netopeer2_client.apphsm_hostname | quote }} + CLIENT_TOKEN: {{ .Values.oran_netopeer2_client.client_token | quote }} + CLIENT_KEY_LABEL: {{ .Values.oran_netopeer2_client.client_key_label | quote }} + TEST_UNIQUE_UID: {{ .Values.oran_netopeer2_client.test_unique_uid | quote }} + DEFAULT_USER_PIN: {{ .Values.oran_netopeer2_client.default_user_pin | quote }} + DEFAULT_SO_PIN: {{ .Values.oran_netopeer2_client.default_so_pin | quote }} + DEFAULT_CLIENT_TOKEN_ID: {{ .Values.oran_netopeer2_client.default_client_token_id | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.oran_netopeer2_client.pkcs11_proxy_tls_psk_file | quote }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-oran-env-configmap.yml 
b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-oran-env-configmap.yml new file mode 100644 index 00000000..05576b9d --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-oran-env-configmap.yml @@ -0,0 +1,21 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-oran-env + namespace: {{ .Release.Namespace }} +data: + http_proxy: {{ .Values.http_proxy | default "" | quote }} + https_proxy: {{ .Values.https_proxy | default "" | quote }} + no_proxy: {{ .Values.no_proxy | default "" | quote }} + CLIENT_TOKEN: {{ .Values.oran_netopeer2_client.client_token | quote }} + CLIENT_KEY_LABEL: {{ .Values.oran_netopeer2_client.client_key_label | quote }} + TEST_UNIQUE_UID: {{ .Values.oran_netopeer2_client.test_unique_uid | quote }} + DEFAULT_USER_PIN: {{ .Values.oran_netopeer2_client.default_user_pin | quote }} + DEFAULT_SO_PIN: {{ .Values.oran_netopeer2_client.default_so_pin | quote }} + DEFAULT_CLIENT_TOKEN_ID: {{ .Values.oran_netopeer2_client.default_client_token_id | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.oran_netopeer2_client.pkcs11_proxy_tls_psk_file | quote }} + PKCS11_PROXY_SOCKET: "tls://{{ .Values.oran_netopeer2_client.pkcs11_daemon_socket_hostname }}:{{ .Values.oran_netopeer2_client.pkcs11_daemon_socket_port }}" + NETOPEER2_SERVER_HOSTNAME: "{{ .Values.oran_netopeer2_client.oran.netopeer2_server_name }}.{{ .Release.Namespace }}.svc.{{ .Values.oran_netopeer2_client.oran.netopeer2_server_domain }}" + NETOPEER2_SERVER_PORT: {{ .Values.oran_netopeer2_client.oran.netopeer2_server_port | quote }} + NETOPEER_TYPE: {{ .Values.oran_netopeer2_client.oran.type | quote }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-pkcs11-proxy-tls-psk-configmap.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-pkcs11-proxy-tls-psk-configmap.yaml new file mode 
100644 index 00000000..be6256b3 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-pkcs11-proxy-tls-psk-configmap.yaml @@ -0,0 +1,9 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + namespace: {{ .Release.Namespace }} +data: + p11_proxy_tls.psk: | + {{ .Values.oran_netopeer2_client.pkcs11_proxy_tls_psk }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-qcnl-configmap.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-qcnl-configmap.yaml new file mode 100644 index 00000000..927a9ae6 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-qcnl-configmap.yaml @@ -0,0 +1,11 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-qcnl-conf + namespace: {{ .Release.Namespace }} +data: + sgx_default_qcnl.conf: | + PCCS_URL=https://{{ .Values.oran_netopeer2_client.pccs_hostname }}:{{ .Values.oran_netopeer2_client.pccs_port }}/sgx/certification/v3/ + # To accept insecure HTTPS cert, set this option to FALSE + USE_SECURE_CERT={{ (upper .Values.oran_netopeer2_client.use_secure_cert) }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-rbac-cluster-role-binding.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-rbac-cluster-role-binding.yml new file mode 100644 index 00000000..da0d1956 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-client-rbac-cluster-role-binding.yml @@ -0,0 +1,16 @@ +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ .Release.Name }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ .Release.Name }} +subjects: + - kind: ServiceAccount + name: {{ 
.Release.Name }} + namespace: "{{ .Release.Namespace }}" + - kind: Group + apiGroup: rbac.authorization.k8s.io + name: system:serviceaccounts diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-server-rbac-client-account.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-server-rbac-client-account.yml new file mode 100644 index 00000000..ec2b2d57 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-client/templates/kmra-oran-netopeer2-server-rbac-client-account.yml @@ -0,0 +1,5 @@ +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ .Release.Name }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/.helmignore b/roles/kmra_install/charts/kmra-oran-netopeer2-server/.helmignore new file mode 100644 index 00000000..50af0317 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/.helmignore @@ -0,0 +1,22 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/Chart.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/Chart.yaml new file mode 100644 index 00000000..d2ce210a --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/Chart.yaml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +apiVersion: v1 +description: Key Management Reference Application - ORAN netopeer2 server Application +name: kmra-oran-netopeer2-server +version: 2.3 +appVersion: '2.3' diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/NOTES.txt b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/NOTES.txt new file mode 100644 index 00000000..fc3352e9 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/NOTES.txt @@ -0,0 +1,8 @@ +{{ .Chart.Name }} was installed . + +Your release is named {{ .Release.Name }}. + +To learn more about the release, try: + + $ helm status {{ .Release.Name }} + $ helm get {{ .Release.Name }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/_helpers.tpl b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/_helpers.tpl new file mode 100644 index 00000000..70e612ad --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/_helpers.tpl @@ -0,0 +1,32 @@ +{{/* vim: set filetype=mustache: */}} +{{/* +Expand the name of the chart. +*/}} +{{- define "kmra-oran-netopeer2-server.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}} +{{- end -}} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. 
+*/}} +{{- define "kmra-oran-netopeer2-server.fullname" -}} +{{- if .Values.fullnameOverride -}} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- $name := default .Chart.Name .Values.nameOverride -}} +{{- if contains $name .Release.Name -}} +{{- .Release.Name | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}} +{{- end -}} +{{- end -}} +{{- end -}} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "kmra-oran-netopeer2-server.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}} +{{- end -}} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-deployment.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-deployment.yml new file mode 100644 index 00000000..09ac5a70 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-deployment.yml @@ -0,0 +1,143 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ .Release.Name }} + labels: + app: {{ .Release.Name }} +spec: + replicas: 1 + strategy: + type: Recreate + selector: + matchLabels: + app: {{ .Release.Name }} + template: + metadata: + labels: + app: {{ .Release.Name }} + annotations: + sgx.intel.com/quote-provider: {{ .Release.Name }} + spec: + hostNetwork: false + serviceAccountName: {{ .Release.Name }} + initContainers: + - name: init-tmpfs + image: "{{ .Values.oran_netopeer2_server.init.image.repo }}/{{ .Values.oran_netopeer2_server.init.image.name }}:{{ .Values.oran_netopeer2_server.init.image.tag }}" + command: ['sh', '-c', "rm -rf /opt/intel/cryptoapitoolkit/tokens/*"] + securityContext: + runAsUser: 65533 + runAsGroup: {{ .Values.oran_netopeer2_server.sgx_prv_gid }} + supplementalGroups: + - {{ .Values.oran_netopeer2_server.sgx_gid }} + containers: + - name: {{ 
.Release.Name }} + image: "{{ .Values.oran_netopeer2_server.main.image.repo }}/{{ .Values.oran_netopeer2_server.main.image.name }}:{{ .Values.oran_netopeer2_server.main.image.tag }}" + imagePullPolicy: {{ .Values.oran_netopeer2_server.main.image.pullPolicy }} + ports: + - name: ctk-loader + containerPort: {{ .Values.oran_netopeer2_server.pkcs11_daemon_socket_port }} + envFrom: + - configMapRef: + name: {{ .Release.Name }}-env-cm + volumeMounts: + - name: mtls + mountPath: /opt/intel/ca + readOnly: true + - name: sgx-qcnl-conf + mountPath: /etc/sgx_default_qcnl.conf + subPath: sgx_default_qcnl.conf + readOnly: true + - name: tmpfs + mountPath: /tmp + subPath: tmp + - name: tmpfs + mountPath: /opt/intel/cryptoapitoolkit/tokens + subPath: tokens + - name: p11-proxy-tls-psk + mountPath: "{{ .Values.oran_netopeer2_server.pkcs11_proxy_tls_psk_file }}" + subPath: p11_proxy_tls.psk + readOnly: true + resources: + limits: + cpu: 500m + memory: 500Mi + sgx.intel.com/epc: "512Ki" + requests: + cpu: 200m + memory: 200Mi + securityContext: + readOnlyRootFilesystem: true + - name: {{ .Release.Name }}-oran + image: "{{ .Values.oran_netopeer2_server.oran.image.repo }}/{{ .Values.oran_netopeer2_server.oran.image.name }}:{{ .Values.oran_netopeer2_server.oran.image.tag }}" + imagePullPolicy: {{ .Values.oran_netopeer2_server.oran.image.pullPolicy }} + ports: + - name: netopeer-server + containerPort: {{ .Values.oran_netopeer2_server.oran.servicePort }} + envFrom: + - configMapRef: + name: {{ .Release.Name }}-oran-env + volumeMounts: + - name: tmpfs-oran + mountPath: /tmp + subPath: tmp + - name: p11-proxy-tls-psk + mountPath: "{{ .Values.oran_netopeer2_server.pkcs11_proxy_tls_psk_file }}" + subPath: p11_proxy_tls.psk + readOnly: true + - name: sysrepo-config + mountPath: /opt/intel/sysrepo_config + readOnly: true + resources: + limits: + cpu: 200m + memory: 300Mi + requests: + cpu: 100m + memory: 200Mi + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + readOnlyRootFilesystem: 
true + imagePullSecrets: + - name: {{ .Values.oran_netopeer2_server.pullSecret }} + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 1 + preference: + matchExpressions: + - key: app + operator: In + values: + - kmra + volumes: + - name: mtls + secret: + secretName: {{ .Release.Name }}-tls + items: + - key: tls.key + path: ctk_loadkey.key + - key: tls.cert + path: ctk_loadkey.crt + - key: ca.cert + path: ca.crt + - name: sgx-qcnl-conf + configMap: + name: {{ .Release.Name }}-qcnl-conf + - name: tmpfs + emptyDir: + medium: Memory + sizeLimit: 64Mi + - name: tmpfs-oran + emptyDir: + medium: Memory + sizeLimit: 64Mi + - name: p11-proxy-tls-psk + configMap: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + - name: oran-env + configMap: + name: {{ .Release.Name }}-oran-env + - name: sysrepo-config + configMap: + name: oran-sysrepo-config diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-env-configmap.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-env-configmap.yml new file mode 100644 index 00000000..00b8c2ae --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-env-configmap.yml @@ -0,0 +1,21 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-env-cm + namespace: {{ .Release.Namespace }} +data: + http_proxy: {{ .Values.http_proxy | default "" | quote }} + https_proxy: {{ .Values.https_proxy | default "" | quote }} + no_proxy: {{ .Values.no_proxy | default "" | quote }} + PCCS_PORT: {{ .Values.oran_netopeer2_server.pccs_port | quote }} + PCCS_HOSTNAME: {{ .Values.oran_netopeer2_server.pccs_hostname | quote }} + APPHSM_PORT: {{ .Values.oran_netopeer2_server.apphsm_port | quote }} + APPHSM_HOSTNAME: {{ .Values.oran_netopeer2_server.apphsm_hostname | quote }} + CLIENT_TOKEN: {{ .Values.oran_netopeer2_server.client_token | quote }} + 
CLIENT_KEY_LABEL: {{ .Values.oran_netopeer2_server.client_key_label | quote }} + TEST_UNIQUE_UID: {{ .Values.oran_netopeer2_server.test_unique_uid | quote }} + DEFAULT_USER_PIN: {{ .Values.oran_netopeer2_server.default_user_pin | quote }} + DEFAULT_SO_PIN: {{ .Values.oran_netopeer2_server.default_so_pin | quote }} + DEFAULT_CLIENT_TOKEN_ID: {{ .Values.oran_netopeer2_server.default_client_token_id | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.oran_netopeer2_server.pkcs11_proxy_tls_psk_file | quote }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-oran-env-configmap.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-oran-env-configmap.yml new file mode 100644 index 00000000..662f8e9f --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-oran-env-configmap.yml @@ -0,0 +1,19 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-oran-env + namespace: {{ .Release.Namespace }} +data: + http_proxy: {{ .Values.http_proxy | default "" | quote }} + https_proxy: {{ .Values.https_proxy | default "" | quote }} + no_proxy: {{ .Values.no_proxy | default "" | quote }} + CLIENT_TOKEN: {{ .Values.oran_netopeer2_server.client_token | quote }} + CLIENT_KEY_LABEL: {{ .Values.oran_netopeer2_server.client_key_label | quote }} + TEST_UNIQUE_UID: {{ .Values.oran_netopeer2_server.test_unique_uid | quote }} + DEFAULT_USER_PIN: {{ .Values.oran_netopeer2_server.default_user_pin | quote }} + DEFAULT_SO_PIN: {{ .Values.oran_netopeer2_server.default_so_pin | quote }} + DEFAULT_CLIENT_TOKEN_ID: {{ .Values.oran_netopeer2_server.default_client_token_id | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.oran_netopeer2_server.pkcs11_proxy_tls_psk_file | quote }} + PKCS11_PROXY_SOCKET: "tls://{{ .Values.oran_netopeer2_server.pkcs11_daemon_socket_hostname }}:{{ .Values.oran_netopeer2_server.pkcs11_daemon_socket_port }}" 
+ NETOPEER_TYPE: {{ .Values.oran_netopeer2_server.oran.type | quote }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-pkcs11-proxy-tls-psk-configmap.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-pkcs11-proxy-tls-psk-configmap.yaml new file mode 100644 index 00000000..4dfc2858 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-pkcs11-proxy-tls-psk-configmap.yaml @@ -0,0 +1,9 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + namespace: {{ .Release.Namespace }} +data: + p11_proxy_tls.psk: | + {{ .Values.oran_netopeer2_server.pkcs11_proxy_tls_psk }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-qcnl-configmap.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-qcnl-configmap.yaml new file mode 100644 index 00000000..564d3df5 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-qcnl-configmap.yaml @@ -0,0 +1,11 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-qcnl-conf + namespace: {{ .Release.Namespace }} +data: + sgx_default_qcnl.conf: | + PCCS_URL=https://{{ .Values.oran_netopeer2_server.pccs_hostname }}:{{ .Values.oran_netopeer2_server.pccs_port }}/sgx/certification/v3/ + # To accept insecure HTTPS cert, set this option to FALSE + USE_SECURE_CERT={{ (upper .Values.oran_netopeer2_server.use_secure_cert) }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-cluster-role-binding.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-cluster-role-binding.yml new file mode 100644 index 00000000..da0d1956 --- /dev/null +++ 
b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-cluster-role-binding.yml @@ -0,0 +1,16 @@ +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ .Release.Name }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ .Release.Name }} +subjects: + - kind: ServiceAccount + name: {{ .Release.Name }} + namespace: "{{ .Release.Namespace }}" + - kind: Group + apiGroup: rbac.authorization.k8s.io + name: system:serviceaccounts diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-service-account.yml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-service-account.yml new file mode 100644 index 00000000..ec2b2d57 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-rbac-service-account.yml @@ -0,0 +1,5 @@ +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ .Release.Name }} diff --git a/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-service.yaml b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-service.yaml new file mode 100644 index 00000000..b2643e42 --- /dev/null +++ b/roles/kmra_install/charts/kmra-oran-netopeer2-server/templates/kmra-oran-netopeer2-server-service.yaml @@ -0,0 +1,14 @@ +apiVersion: v1 +kind: Service +metadata: + name: {{ .Values.oran_netopeer2_server.oran.hostname }} + namespace: {{ .Release.Namespace }} + labels: + app: {{ .Release.Name }} +spec: + selector: + app: {{ .Release.Name }} + ports: + - protocol: TCP + port: {{ .Values.oran_netopeer2_server.oran.servicePort }} + targetPort: netopeer-server diff --git a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-configmap.yml b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-configmap.yml index 9be23d7a..c5f2c0b3 100644 --- 
a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-configmap.yml +++ b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-configmap.yml @@ -8,7 +8,7 @@ data: default.json: | { "HTTPS_PORT" : {{ .Values.pccs.main.port | quote }}, - "hosts" : {{ .Values.pccs.hostname | quote }}, + "hosts" : {{ .Values.pccs.listen_ip | quote }}, "uri": {{ .Values.pccs.main.sgx_provisioning_api_url | quote }}, "ApiKey": {{ .Values.pccs.main.api_key | quote }}, "proxy" : {{ .Values.https_proxy | default "" | quote }}, diff --git a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-deployment.yml b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-deployment.yml index c4a212b0..fe2b34b6 100644 --- a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-deployment.yml +++ b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-deployment.yml @@ -17,12 +17,21 @@ spec: labels: app: {{ .Release.Name }} spec: - hostNetwork: true + hostNetwork: false serviceAccountName: {{ .Release.Name }} containers: - name: {{ .Release.Name }} image: "{{ .Values.pccs.main.image.repo }}/{{ .Values.pccs.main.image.name }}:{{ .Values.pccs.main.image.tag }}" imagePullPolicy: {{ .Values.pccs.main.image.pullPolicy }} + ports: + - name: pccs + containerPort: {{ .Values.pccs.main.port }} + readinessProbe: + tcpSocket: + port: {{ .Values.pccs.main.port }} + initialDelaySeconds: 5 + periodSeconds: 5 + successThreshold: 2 volumeMounts: - name: pccs-config mountPath: /opt/intel/pccs/config @@ -75,4 +84,3 @@ spec: emptyDir: medium: Memory sizeLimit: 64Mi - diff --git a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-service.yml b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-service.yml new file mode 100644 index 00000000..e2f78107 --- /dev/null +++ b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-service.yml @@ -0,0 +1,14 @@ +apiVersion: v1 +kind: Service +metadata: + name: {{ .Values.pccs.hostname }} + namespace: {{ .Release.Namespace }} + labels: + app: {{ 
.Release.Name }} +spec: + selector: + app: {{ .Release.Name }} + ports: + - protocol: TCP + port: {{ .Values.pccs.main.port }} + targetPort: {{ .Values.pccs.main.port }} diff --git a/roles/kmra_install/defaults/main.yml b/roles/kmra_install/defaults/main/main.yml similarity index 85% rename from roles/kmra_install/defaults/main.yml rename to roles/kmra_install/defaults/main/main.yml index 241b2601..ece1fc24 100644 --- a/roles/kmra_install/defaults/main.yml +++ b/roles/kmra_install/defaults/main/main.yml @@ -32,19 +32,18 @@ kmra_defaults: image_repo: "docker.io" image_name: "intel/apphsm" # image_tag: "" + sbx_image_repo: "{{ registry_local_address }}" + sbx_image_name: "apphsm" + sbx_image_staging_location: "/tmp/apphsm/apphsm.sbx.tar" + sbx_image_checksum: "b32757e1263c6de52903ba31583e3d61274044f30d2270541c3d85c523224f02" + # sbx_image_tag: "" init_image_repo: "docker.io" init_image_name: "busybox" init_image_tag: "1.35" + port: 5000 upstream_port: 5000 - hostname: | - {%- if vm_enabled %} - {{ hostvars[groups['kube_node'][0]]['ansible_all_ipv4_addresses'] | - ansible.utils.ipaddr(hostvars[groups['vm_host'][0]]['vxlan_gw_ip']) | - join('') | - trim }} - {%- else %} - {{ hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] }} - {%- endif %} + listen_ip: "0.0.0.0" + hostname: kmra-apphsm test_ctk_loadkey_cert_user_id: "ctk_loadkey_user_id_01234" generic_client_cert_id: "generic_client_id_01234" default_user_pin: "1234" @@ -54,7 +53,8 @@ kmra_defaults: crt_subj: O: "AppHSM" OU: "AppHSM" - CN: "localhost" + # TODO create dynamically with values from namespace and hostname vars + CN: "kmra-apphsm.kmra.svc.{{ cluster_name | default('cluster.local') }}" app_keys: - id: "unique_id_1234" token_name: "token_1" @@ -89,12 +89,15 @@ kmra_defaults: release_name: "kmra-pccs" helm_values_file: "{{ (project_root_dir, 'charts', 'kmra-pccs-values.yml') | path_join }}" chart_path: "{{ (project_root_dir, 'charts', 'kmra-pccs') | path_join }}" + 
sbx_sgx_provisioning_api_url: "https://sbx.api.trustedservices.intel.com/sgx/certification/v3/" sgx_provisioning_api_url: "https://api.trustedservices.intel.com/sgx/certification/v3/" image_repo: "docker.io" image_name: "intel/pccs" # image_tag: "" upstream_port: 8081 - hostname: "localhost" + listen_ip: "0.0.0.0" + hostname: "kmra-pccs" + dns_name: "kmra-pccs.kmra.svc.{{ cluster_name | default('cluster.local') }}" crt_subj: O: "SGX-PCCS" OU: "root" diff --git a/roles/kmra_install/defaults/main/oran.yml b/roles/kmra_install/defaults/main/oran.yml new file mode 100644 index 00000000..ee311912 --- /dev/null +++ b/roles/kmra_install/defaults/main/oran.yml @@ -0,0 +1,117 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +# This var is not intended to be customized by user +# Define respective field in the group_vars/all.yml instead +kmra_oran: + apphsm: + oran_netopeer2_sample_tls_url: "https://raw.githubusercontent.com/CESNET/netopeer2/v2.1.62/example_configuration/" + oran_netopeer2_cert_user_id: "ctk_loadkey_user_id_01234" + enabled: true + app_keys: + - id: "unique_id_1234s" + token_name: "token_server" + pin: "1234" + key_name: "/opt/intel/custom_tls/server.key" + token_cert: "/opt/intel/custom_tls/server.crt" + token_key: "/opt/intel/custom_tls/server.key" + crt_subj: + O: "SampleOrganisation" + CN: "localhost" + - id: "unique_id_1234c" + token_name: "token_client" + pin: "1234" + key_name: "/opt/intel/custom_tls/client.key" + token_cert: "/opt/intel/custom_tls/client.crt" + token_key: "/opt/intel/custom_tls/client.key" + crt_subj: + O: "SampleOrganisation" + CN: "localhost" + oran: + enabled: true + local_build: true + oran_image_staging_location: "/tmp/oran/oran.tar" + oran_image_checksum: "553b91005e19f5dfee1b3052b69146f027022d51784988f6521341092253fa3b" + sw_provider_name: "oranprovider" + sw_provider_crt_subj: + O: "SampleOrganisation" + CN: "localhost" + sw_operator_name: "oranoperator" + sw_operator_crt_subj: + O: "SampleOrganisation" + CN: "localhost" + + oran_netopeer2_server: + enabled: true + release_name: "kmra-oran-netopeer2-server" + helm_values_file: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-server-values.yml') | path_join }}" + chart_path: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-server') | path_join }}" + image_repo: "{{ registry_local_address }}" + image_name: "oran/ctk_loadkey" + # image_tag: "" + init_image_repo: "{{ registry_local_address }}" + init_image_name: "oran/busybox" + init_image_tag: "1.35" + client_token: "token_server" + client_key_label: "client_key_priv" + test_unique_uid: "unique_id_1234s" + default_user_pin: "1234" + default_so_pin: "12345678" + default_client_token_id: "0xDEADBEEF" + use_secure_cert: 
false + oran_image_repo: "{{ registry_local_address }}" + oran_image_name: "oran/oran" + # oran_image_tag: "" + oran_netopeer2_server_hostname: "kmra-oran-netopeer2-server" + oran_netopeer2_server_port: 6513 + pkcs11_proxy_tls_psk: "test:e9622c85018998993fcc16f5ce9c15e9" + pkcs11_proxy_tls_psk_file: "/etc/p11_proxy_tls.psk" + pkcs11_daemon_socket_hostname: "127.0.0.1" + pkcs11_daemon_socket_port: 5657 + crt_subj: + O: "oran" + OU: "ctk_loadkey_user_id_01234" + + oran_netopeer2_client: + enabled: true + release_name: "kmra-oran-netopeer2-client" + helm_values_file: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-client-values.yml') | path_join }}" + chart_path: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-client') | path_join }}" + image_repo: "{{ registry_local_address }}" + image_name: "oran/ctk_loadkey" + # image_tag: "" + init_image_repo: "{{ registry_local_address }}" + init_image_name: "oran/busybox" + init_image_tag: "1.35" + client_token: "token_client" + client_key_label: "client_key_priv" + test_unique_uid: "unique_id_1234c" + default_user_pin: "1234" + default_so_pin: "12345678" + default_client_token_id: "0xDEADBEEF" + use_secure_cert: false + oran_image_repo: "{{ registry_local_address }}" + oran_image_name: "oran/oran" + # oran_image_tag: "" + oran_netopeer2_server_hostname: "kmra-oran-netopeer2-server" + oran_netopeer2_server_port: 6513 + pkcs11_proxy_tls_psk: "test:e9622c85018998993fcc16f5ce9c15e9" + pkcs11_proxy_tls_psk_file: "/etc/p11_proxy_tls.psk" + pkcs11_daemon_socket_hostname: "127.0.0.1" + pkcs11_daemon_socket_port: 5657 + crt_subj: + O: "oran" + OU: "ctk_loadkey_user_id_01234" diff --git a/roles/kmra_install/files/oran/0001-Fix-slotID.patch b/roles/kmra_install/files/oran/0001-Fix-slotID.patch new file mode 100644 index 00000000..8a8a42bd --- /dev/null +++ b/roles/kmra_install/files/oran/0001-Fix-slotID.patch @@ -0,0 +1,31 @@ +From d156b0d44722ef5de37685accd308a08f6ba0931 Mon Sep 17 00:00:00 2001 +From: Michal Motyl +Date: 
Wed, 13 Apr 2022 09:43:50 +0100 +Subject: [PATCH] Fix slotID + + +diff --git a/gck-rpc-dispatch.c b/gck-rpc-dispatch.c +index fd0ef38..9fb509a 100644 +--- a/gck-rpc-dispatch.c ++++ b/gck-rpc-dispatch.c +@@ -1063,7 +1063,7 @@ static CK_RV rpc_C_WaitForSlotEvent(CallState * cs) + BEGIN_CALL(C_WaitForSlotEvent); + IN_ULONG(flags); + PROCESS_CALL((flags, &slot_id, NULL)); +- slot_id = CK_GNOME_APPARTMENT_SLOT(slot_id); ++ //slot_id = CK_GNOME_APPARTMENT_SLOT(slot_id); + OUT_ULONG(slot_id); + END_CALL; + } +@@ -1194,7 +1194,7 @@ static CK_RV rpc_C_GetSessionInfo(CallState * cs) + BEGIN_CALL(C_GetSessionInfo); + IN_ULONG(session); + PROCESS_CALL((session, &info)); +- info.slotID = CK_GNOME_APPARTMENT_SLOT(info.slotID); ++ //info.slotID = CK_GNOME_APPARTMENT_SLOT(info.slotID); + OUT_SESSION_INFO(info); + END_CALL; + } +-- +2.17.1 + diff --git a/roles/kmra_install/files/oran/0003-libnetconf2-add-pkcs11-support.patch b/roles/kmra_install/files/oran/0003-libnetconf2-add-pkcs11-support.patch new file mode 100644 index 00000000..62ba61e9 --- /dev/null +++ b/roles/kmra_install/files/oran/0003-libnetconf2-add-pkcs11-support.patch @@ -0,0 +1,204 @@ +commit df3352a42d8fec1ff721655a743f03a4cd78eb66 +Author: Karpenko, Veronika +Date: Thu Jun 1 11:47:49 2023 +0000 + + pkcs11 support + +diff --git a/src/session_client_tls.c b/src/session_client_tls.c +index f95fd46..0959ef1 100644 +--- a/src/session_client_tls.c ++++ b/src/session_client_tls.c +@@ -22,6 +22,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -508,6 +509,10 @@ nc_client_tls_update_opts(struct nc_client_tls_opts *opts, const char *peername) + char *key; + X509_LOOKUP *lookup; + X509_VERIFY_PARAM *vpm = NULL; ++ const int CMD_MANDATORY = 0; ++ EVP_PKEY *pkey = NULL; ++ ENGINE * pkcs11 = NULL; ++ const char* opensc_pkcs11_so = getenv("MODULE"); + + if (!opts->tls_ctx || opts->tls_ctx_change) { + SSL_CTX_free(opts->tls_ctx); +@@ -540,13 +545,57 @@ nc_client_tls_update_opts(struct nc_client_tls_opts 
*opts, const char *peername) + } else { + key = opts->key_path; + } +- if (SSL_CTX_use_PrivateKey_file(opts->tls_ctx, key, SSL_FILETYPE_PEM) != 1) { +- ERR(NULL, "Loading the client private key from \'%s\' failed (%s).", key, +- ERR_reason_error_string(ERR_get_error())); ++ ++ ENGINE_load_dynamic(); ++ pkcs11 = ENGINE_by_id( "pkcs11" ); ++ if ( pkcs11 == NULL ) ++ { ++ ERR(NULL, "Error retrieving 'pkcs11' engine"); ++ rc = -1; ++ goto cleanup; ++ } ++ ++ if ( 0 != access( opensc_pkcs11_so, R_OK ) ) ++ { ++ ERR(NULL, "Error finding '/usr/local/lib/libpkcs11-proxy.so'"); + rc = -1; + goto cleanup; + } + ++ if ( 1 != ENGINE_ctrl_cmd_string( pkcs11, "MODULE_PATH", opensc_pkcs11_so, CMD_MANDATORY ) ) ++ { ++ ERR(NULL, "Error setting module_path <= '/usr/local/lib/libpkcs11-proxy.so'"); ++ rc = -1; ++ goto cleanup; ++ } ++ ++ if ( 1 != ENGINE_init( pkcs11 ) ) ++ { ++ ERR(NULL, "Error pkcs11: unable to initialize engine"); ++ rc = -1; ++ goto cleanup; ++ } ++ ++ if ( 1 != ENGINE_ctrl_cmd_string( pkcs11, "PIN", "1234", CMD_MANDATORY ) ) ++ { ++ ERR(NULL, "Error setting pin"); ++ rc = -1; ++ goto cleanup; ++ } ++ ++ pkey = ENGINE_load_private_key( pkcs11, key, NULL, NULL ); ++ if (!key) ++ { ++ ERR(NULL, "Error reading private key"); ++ rc = -1; ++ goto cleanup; ++ } ++ if ((SSL_CTX_use_PrivateKey(opts->tls_ctx, pkey) != 1)) ++ { ++ ERR(NULL, "Loading the client private key failed (%s).", ERR_reason_error_string(ERR_get_error())); ++ rc = -1; ++ goto cleanup; ++ } + if (!SSL_CTX_load_verify_locations(opts->tls_ctx, opts->ca_file, opts->ca_dir)) { + ERR(NULL, "Failed to load the locations of trusted CA certificates (%s).", + ERR_reason_error_string(ERR_get_error())); +@@ -617,6 +666,7 @@ nc_client_tls_update_opts(struct nc_client_tls_opts *opts, const char *peername) + + cleanup: + X509_VERIFY_PARAM_free(vpm); ++ EVP_PKEY_free(pkey); + return rc; + } + +diff --git a/src/session_server_tls.c b/src/session_server_tls.c +index 040836f..e5f814e 100644 +--- 
a/src/session_server_tls.c ++++ b/src/session_server_tls.c +@@ -18,6 +18,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -1770,7 +1771,12 @@ nc_tls_ctx_set_server_cert_key(SSL_CTX *tls_ctx, const char *cert_name) + int ret = 0; + NC_SSH_KEY_TYPE privkey_type; + X509 *cert = NULL; +- EVP_PKEY *pkey = NULL; ++ EVP_PKEY *key = NULL; ++ ENGINE * pkcs11 = NULL; ++ const int CMD_MANDATORY = 0; ++ const char* opensc_pkcs11_so = getenv("MODULE"); ++ const char* uri = getenv("TOKEN_KEY_URI"); ++ const char* pin = getenv("DEFAULT_USER_PIN"); + + if (!cert_name) { + ERR(NULL, "Server certificate not set."); +@@ -1803,26 +1809,70 @@ nc_tls_ctx_set_server_cert_key(SSL_CTX *tls_ctx, const char *cert_name) + } + + /* load the private key */ +- if (privkey_path) { +- if (SSL_CTX_use_PrivateKey_file(tls_ctx, privkey_path, SSL_FILETYPE_PEM) != 1) { +- ERR(NULL, "Loading the server private key failed (%s).", ERR_reason_error_string(ERR_get_error())); ++ if (privkey_path) { ++ if (SSL_CTX_use_PrivateKey_file(tls_ctx, privkey_path, SSL_FILETYPE_PEM) != 1) { ++ ERR(NULL, "1 Loading the server private key failed (%s).", ERR_reason_error_string(ERR_get_error())); ++ ret = -1; ++ goto cleanup; ++ } ++ } else { ++ ++ ENGINE_load_dynamic(); ++ pkcs11 = ENGINE_by_id( "pkcs11" ); ++ if ( pkcs11 == NULL ) ++ { ++ ERR(NULL, "Error retrieving 'pkcs11' engine"); + ret = -1; + goto cleanup; + } +- } else { +- pkey = base64der_to_privatekey(privkey_data, nc_keytype2str(privkey_type)); +- if (!pkey || (SSL_CTX_use_PrivateKey(tls_ctx, pkey) != 1)) { +- ERR(NULL, "Loading the server private key failed (%s).", ERR_reason_error_string(ERR_get_error())); ++ ++ if ( 0 != access( opensc_pkcs11_so, R_OK ) ) ++ { ++ ERR(NULL, "Error finding pkcs module"); ++ ret = -1; ++ goto cleanup; ++ } ++ ++ if ( 1 != ENGINE_ctrl_cmd_string( pkcs11, "MODULE_PATH", opensc_pkcs11_so, CMD_MANDATORY ) ) ++ { ++ ERR(NULL, "Error setting module_path"); ++ ret = -1; ++ goto cleanup; ++ } ++ ++ if ( 1 
!= ENGINE_init( pkcs11 ) ) ++ { ++ ERR(NULL, "Error pkcs11: unable to initialize engine"); + ret = -1; + goto cleanup; + } +- } + ++ if ( 1 != ENGINE_ctrl_cmd_string( pkcs11, "PIN", pin, CMD_MANDATORY ) ) ++ { ++ ERR(NULL, "Error setting pin"); ++ ret = -1; ++ goto cleanup; ++ } ++ ++ key = ENGINE_load_private_key( pkcs11, uri, NULL, NULL ); ++ if (!key) ++ { ++ ERR(NULL, "Error reading private key using uri"); ++ ret = -1; ++ goto cleanup; ++ } ++ ++ if ((SSL_CTX_use_PrivateKey(tls_ctx, key) != 1)) { ++ ERR(NULL, "Loading the server private key failed (%s).", ERR_reason_error_string(ERR_get_error())); ++ ret = -1; ++ goto cleanup; ++ } ++ } + ret = nc_tls_ctx_set_server_cert_chain(tls_ctx, cert_name); + + cleanup: + X509_free(cert); +- EVP_PKEY_free(pkey); ++ EVP_PKEY_free(key); + free(cert_path); + free(cert_data); + free(privkey_path); diff --git a/roles/kmra_install/files/oran/0004-netopeer2-comms-fix.patch b/roles/kmra_install/files/oran/0004-netopeer2-comms-fix.patch new file mode 100644 index 00000000..730bab8b --- /dev/null +++ b/roles/kmra_install/files/oran/0004-netopeer2-comms-fix.patch @@ -0,0 +1,13 @@ +diff --git a/src/common.c b/src/common.c +index 8606869..4d03959 100644 +--- a/src/common.c ++++ b/src/common.c +@@ -327,7 +327,7 @@ np2srv_new_session_cb(const char *UNUSED(client_name), struct nc_session *new_se + sr_session_set_orig_name(sr_sess, "netopeer2"); + nc_id = nc_session_get_id(new_session); + sr_session_push_orig_data(sr_sess, sizeof nc_id, &nc_id); +- username = nc_session_get_username(new_session); ++ username = "netopeer2"; //nc_session_get_username(new_session); + sr_session_push_orig_data(sr_sess, strlen(username) + 1, username); + + /* set NACM username for it to be applied */ diff --git a/roles/kmra_install/files/oran/Dockerfile b/roles/kmra_install/files/oran/Dockerfile new file mode 100644 index 00000000..699b5adc --- /dev/null +++ b/roles/kmra_install/files/oran/Dockerfile @@ -0,0 +1,109 @@ +FROM ubuntu:22.04 as builder + +RUN 
apt-get update -y && \ + apt-get upgrade -y && \ + DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ + build-essential \ + git \ + libglib2.0-dev \ + libssl-dev \ + python3-pip \ + python3 \ + libseccomp-dev \ + cmake + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] + +WORKDIR /workspace + +COPY openssl.cnf.j2 openssl.cnf.j2 + +RUN python3 -m pip install --no-cache-dir j2cli==0.3.10 +ENV openssl_install_path=/usr/ +ENV p11_module_path=/usr/local/lib/libpkcs11-proxy.so +RUN j2 openssl.cnf.j2 > /workspace/openssl.cnf + +RUN git clone https://github.com/SUNET/pkcs11-proxy.git /workspace/pkcs11-proxy +COPY 0001-Fix-slotID.patch /workspace +WORKDIR /workspace/pkcs11-proxy + +RUN git apply /workspace/0001-Fix-slotID.patch && \ + cmake . && make && make install + +FROM ubuntu:22.04 as runtime + +EXPOSE 6513 + +HEALTHCHECK --interval=5s --timeout=3s CMD curl -s --insecure https://localhost:6513 || exit 1 + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] +RUN apt-get update && \ + DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \ + ncat \ + opensc \ + libengine-pkcs11-openssl \ + python3-pip \ + python3 \ + curl \ + g++ \ + git \ + libc6-dev \ + libglib2.0-dev \ + libssl-dev \ + libssh-dev \ + make \ + cmake && \ + apt-get clean && \ + rm -rf /var/log/*log /var/lib/apt/lists/* /var/log/apt/* /var/lib/dpkg/*-old /var/cache/debconf/*-old + +WORKDIR /oran/NetConfServer +RUN git clone https://github.com/CESNET/libyang.git +WORKDIR /oran/NetConfServer/libyang/build +RUN cmake -DENABLE_BUILD_TESTS=OFF .. && make && make install + +WORKDIR /oran/NetConfServer/libyang +RUN git clone https://github.com/sysrepo/sysrepo.git +WORKDIR /oran/NetConfServer/libyang/sysrepo/build +RUN cmake -DSHM_DIR=/tmp/shm -DREPO_PATH=/tmp/sysrepo -DENABLE_TESTS=OFF .. 
&& make && make install +RUN ldconfig + +WORKDIR /oran/NetConfServer/libyang/sysrepo +RUN git clone https://github.com/CESNET/libnetconf2.git +WORKDIR /oran/NetConfServer/libyang/sysrepo/libnetconf2 +COPY 0003-libnetconf2-add-pkcs11-support.patch . +RUN git apply 0003-libnetconf2-add-pkcs11-support.patch +WORKDIR /oran/NetConfServer/libyang/sysrepo/libnetconf2/build +RUN cmake -DENABLE_TLS=ON -DENABLE_SSH=ON -DENABLE_DNSSEC=OFF .. && make && make install + +WORKDIR /oran/NetConfServer/libyang/sysrepo/libnetconf2 +RUN git clone https://github.com/CESNET/netopeer2.git +WORKDIR /oran/NetConfServer/libyang/sysrepo/libnetconf2/netopeer2 +COPY 0004-netopeer2-comms-fix.patch . +RUN git apply 0004-netopeer2-comms-fix.patch +WORKDIR /oran/NetConfServer/libyang/sysrepo/libnetconf2/netopeer2/build +RUN cmake .. && make && make install + +# create new user +ARG USER=kmra +ARG UID=1000 +ENV CTK_USER=${USER} + +RUN useradd -m -d /tmp ${USER} --uid=${UID} --user-group + +WORKDIR / + +COPY --from=builder /usr/local/lib/lib* /usr/local/lib/ +COPY --from=builder --chown=${USER}:${USER} /workspace/openssl.cnf /etc/openssl.cnf +COPY --from=builder /workspace/pkcs11-proxy/lib* /usr/local/lib/ + +ENV LD_LIBRARY_PATH="/usr/lib:/usr/local/lib" + +COPY oran_commands.sh /workspace/oran_commands.sh +RUN chmod u+x /workspace/oran_commands.sh && \ + mkdir /opt/intel/ && cp -R /tmp/sysrepo /opt/intel/ && \ + chown -R ${USER}:${USER} /opt/intel/sysrepo && \ + chown -R ${USER}:${USER} /workspace/oran_commands.sh + +USER ${USER} + +ENTRYPOINT ["/workspace/oran_commands.sh"] diff --git a/roles/kmra_install/files/oran/openssl.cnf.j2 b/roles/kmra_install/files/oran/openssl.cnf.j2 new file mode 100644 index 00000000..9c2b342b --- /dev/null +++ b/roles/kmra_install/files/oran/openssl.cnf.j2 @@ -0,0 +1,365 @@ +# +# OpenSSL example configuration file. +# This is mostly being used for generation of certificate requests. 
+# + +# Note that you can include other files from the main configuration +# file using the .include directive. +#.include filename + +# This definition stops the following lines choking if HOME isn't +# defined. +HOME = . + +# Extra OBJECT IDENTIFIER info: +#oid_file = $ENV::HOME/.oid +oid_section = new_oids + +# To use this configuration file with the "-extfile" option of the +# "openssl x509" utility, name here the section containing the +# X.509v3 extensions to use: +# extensions = +# (Alternatively, use a configuration file that has only +# X.509v3 extensions in its main [= default] section.) + +openssl_conf = openssl_p11_engines_init +[ new_oids ] + +# We can add new OIDs in here for use by 'ca', 'req' and 'ts'. +# Add a simple OID like this: +# testoid1=1.2.3.4 +# Or use config file substitution like this: +# testoid2=${testoid1}.5.6 + +# Policies used by the TSA examples. +tsa_policy1 = 1.2.3.4.1 +tsa_policy2 = 1.2.3.4.5.6 +tsa_policy3 = 1.2.3.4.5.7 + +#################################################################### +[ ca ] +default_ca = CA_default # The default ca section + +#################################################################### +[ CA_default ] + +dir = ./demoCA # Where everything is kept +certs = $dir/certs # Where the issued certs are kept +crl_dir = $dir/crl # Where the issued crl are kept +database = $dir/index.txt # database index file. +#unique_subject = no # Set to 'no' to allow creation of + # several certs with same subject. +new_certs_dir = $dir/newcerts # default place for new certs. 
+ +certificate = $dir/cacert.pem # The CA certificate +serial = $dir/serial # The current serial number +crlnumber = $dir/crlnumber # the current crl number + # must be commented out to leave a V1 CRL +crl = $dir/crl.pem # The current CRL +private_key = $dir/private/cakey.pem# The private key +RANDFILE = $dir/private/.rand # private random number file + +x509_extensions = usr_cert # The extensions to add to the cert + +# Comment out the following two lines for the "traditional" +# (and highly broken) format. +name_opt = ca_default # Subject Name options +cert_opt = ca_default # Certificate field options + +# Extension copying option: use with caution. +# copy_extensions = copy + +# Extensions to add to a CRL. Note: Netscape communicator chokes on V2 CRLs +# so this is commented out by default to leave a V1 CRL. +# crlnumber must also be commented out to leave a V1 CRL. +# crl_extensions = crl_ext + +default_days = 365 # how long to certify for +default_crl_days= 30 # how long before next CRL +default_md = default # use public key default MD +preserve = no # keep passed DN ordering + +# A few difference way of specifying how similar the request should look +# For type CA, the listed attributes must be the same, and the optional +# and supplied fields are just that :-) +policy = policy_match + +# For the CA policy +[ policy_match ] +countryName = match +stateOrProvinceName = match +organizationName = match +organizationalUnitName = optional +commonName = supplied +emailAddress = optional + +# For the 'anything' policy +# At this point in time, you must list all acceptable 'object' +# types. 
+[ policy_anything ] +countryName = optional +stateOrProvinceName = optional +localityName = optional +organizationName = optional +organizationalUnitName = optional +commonName = supplied +emailAddress = optional + +#################################################################### +[ req ] +default_bits = 2048 +default_keyfile = privkey.pem +distinguished_name = req_distinguished_name +attributes = req_attributes +x509_extensions = v3_ca # The extensions to add to the self signed cert + +# Passwords for private keys if not present they will be prompted for +# input_password = secret +# output_password = secret + +# This sets a mask for permitted string types. There are several options. +# default: PrintableString, T61String, BMPString. +# pkix : PrintableString, BMPString (PKIX recommendation before 2004) +# utf8only: only UTF8Strings (PKIX recommendation after 2004). +# nombstr : PrintableString, T61String (no BMPStrings or UTF8Strings). +# MASK:XXXX a literal mask value. +# WARNING: ancient versions of Netscape crash on BMPStrings or UTF8Strings. +string_mask = utf8only + +# req_extensions = v3_req # The extensions to add to a certificate request + +[ req_distinguished_name ] +countryName = Country Name (2 letter code) +countryName_default = AU +countryName_min = 2 +countryName_max = 2 + +stateOrProvinceName = State or Province Name (full name) +stateOrProvinceName_default = Some-State + +localityName = Locality Name (eg, city) + +0.organizationName = Organization Name (eg, company) +0.organizationName_default = Internet Widgits Pty Ltd + +# we can do this but it is not needed normally :-) +#1.organizationName = Second Organization Name (eg, company) +#1.organizationName_default = World Wide Web Pty Ltd + +organizationalUnitName = Organizational Unit Name (eg, section) +#organizationalUnitName_default = + +commonName = Common Name (e.g. 
server FQDN or YOUR name) +commonName_max = 64 + +emailAddress = Email Address +emailAddress_max = 64 + +# SET-ex3 = SET extension number 3 + +[ req_attributes ] +challengePassword = A challenge password +challengePassword_min = 4 +challengePassword_max = 20 + +unstructuredName = An optional company name + +[ usr_cert ] + +# These extensions are added when 'ca' signs a request. + +# This goes against PKIX guidelines but some CAs do it and some software +# requires this to avoid interpreting an end user certificate as a CA. + +basicConstraints=CA:FALSE + +# Here are some examples of the usage of nsCertType. If it is omitted +# the certificate can be used for anything *except* object signing. + +# This is OK for an SSL server. +# nsCertType = server + +# For an object signing certificate this would be used. +# nsCertType = objsign + +# For normal client use this is typical +# nsCertType = client, email + +# and for everything including object signing: +# nsCertType = client, email, objsign + +# This is typical in keyUsage for a client certificate. +# keyUsage = nonRepudiation, digitalSignature, keyEncipherment + +# This will be displayed in Netscape's comment listbox. +nsComment = "OpenSSL Generated Certificate" + +# PKIX recommendations harmless if included in all certificates. +subjectKeyIdentifier=hash +authorityKeyIdentifier=keyid,issuer + +# This stuff is for subjectAltName and issuerAltname. +# Import the email address. +# subjectAltName=email:copy +# An alternative to produce certificates that aren't +# deprecated according to PKIX. +# subjectAltName=email:move + +# Copy subject details +# issuerAltName=issuer:copy + +#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem +#nsBaseUrl +#nsRevocationUrl +#nsRenewalUrl +#nsCaPolicyUrl +#nsSslServerName + +# This is required for TSA certificates. 
+# extendedKeyUsage = critical,timeStamping + +[ v3_req ] + +# Extensions to add to a certificate request + +basicConstraints = CA:FALSE +keyUsage = nonRepudiation, digitalSignature, keyEncipherment + +[ v3_ca ] + + +# Extensions for a typical CA + + +# PKIX recommendation. + +subjectKeyIdentifier=hash + +authorityKeyIdentifier=keyid:always,issuer + +basicConstraints = critical,CA:true + +# Key usage: this is typical for a CA certificate. However since it will +# prevent it being used as an test self-signed certificate it is best +# left out by default. +# keyUsage = cRLSign, keyCertSign + +# Some might want this also +# nsCertType = sslCA, emailCA + +# Include email address in subject alt name: another PKIX recommendation +# subjectAltName=email:copy +# Copy issuer details +# issuerAltName=issuer:copy + +# DER hex encoding of an extension: beware experts only! +# obj=DER:02:03 +# Where 'obj' is a standard or added object +# You can even override a supported extension: +# basicConstraints= critical, DER:30:03:01:01:FF + +[ crl_ext ] + +# CRL extensions. +# Only issuerAltName and authorityKeyIdentifier make any sense in a CRL. + +# issuerAltName=issuer:copy +authorityKeyIdentifier=keyid:always + +[ proxy_cert_ext ] +# These extensions should be added when creating a proxy certificate + +# This goes against PKIX guidelines but some CAs do it and some software +# requires this to avoid interpreting an end user certificate as a CA. + +basicConstraints=CA:FALSE + +# Here are some examples of the usage of nsCertType. If it is omitted +# the certificate can be used for anything *except* object signing. + +# This is OK for an SSL server. +# nsCertType = server + +# For an object signing certificate this would be used. +# nsCertType = objsign + +# For normal client use this is typical +# nsCertType = client, email + +# and for everything including object signing: +# nsCertType = client, email, objsign + +# This is typical in keyUsage for a client certificate. 
+# keyUsage = nonRepudiation, digitalSignature, keyEncipherment + +# This will be displayed in Netscape's comment listbox. +nsComment = "OpenSSL Generated Certificate" + +# PKIX recommendations harmless if included in all certificates. +subjectKeyIdentifier=hash +authorityKeyIdentifier=keyid,issuer + +# This stuff is for subjectAltName and issuerAltname. +# Import the email address. +# subjectAltName=email:copy +# An alternative to produce certificates that aren't +# deprecated according to PKIX. +# subjectAltName=email:move + +# Copy subject details +# issuerAltName=issuer:copy + +#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem +#nsBaseUrl +#nsRevocationUrl +#nsRenewalUrl +#nsCaPolicyUrl +#nsSslServerName + +# This really needs to be in place for it to be a proxy certificate. +proxyCertInfo=critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo + +#################################################################### +[ tsa ] + +default_tsa = tsa_config1 # the default TSA section + +[ tsa_config1 ] + +# These are used by the TSA reply generation only. +dir = ./demoCA # TSA root directory +serial = $dir/tsaserial # The current serial number (mandatory) +crypto_device = builtin # OpenSSL engine to use for signing +signer_cert = $dir/tsacert.pem # The TSA signing certificate + # (optional) +certs = $dir/cacert.pem # Certificate chain to include in reply + # (optional) +signer_key = $dir/private/tsakey.pem # The TSA private key (optional) +signer_digest = sha256 # Signing digest to use. (Optional) +default_policy = tsa_policy1 # Policy if request did not specify it + # (optional) +other_policies = tsa_policy2, tsa_policy3 # acceptable policies (optional) +digests = sha1, sha256, sha384, sha512 # Acceptable message digests (mandatory) +accuracy = secs:1, millisecs:500, microsecs:100 # (optional) +clock_precision_digits = 0 # number of digits after dot. (optional) +ordering = yes # Is ordering defined for timestamps? 
+ # (optional, default: no) +tsa_name = yes # Must the TSA name be included in the reply? + # (optional, default: no) +ess_cert_id_chain = no # Must the ESS cert id chain be included? + # (optional, default: no) +ess_cert_id_alg = sha1 # algorithm to compute certificate + # identifier (optional, default: sha1) + +# openssl engines configuration +[openssl_p11_engines_init] +engines = engine_section + +[engine_section] +pkcs11 = pkcs11_section + +[pkcs11_section] +engine_id = pkcs11 +dynamic_path = {{ openssl_install_path }}/lib/engines-1.1/libpkcs11.so +MODULE_PATH = {{ p11_module_path }} +init = 1 \ No newline at end of file diff --git a/roles/kmra_install/files/oran/oran_commands.sh b/roles/kmra_install/files/oran/oran_commands.sh new file mode 100644 index 00000000..ae5d897e --- /dev/null +++ b/roles/kmra_install/files/oran/oran_commands.sh @@ -0,0 +1,70 @@ +#!/bin/bash -e +export no_proxy="$no_proxy,localhost,127.0.0.1/8" + +export NETOPEER2_SERVER_HOSTNAME="${NETOPEER2_SERVER_HOSTNAME:=netopeer2-server}" +export NETOPEER2_SERVER_PORT="${NETOPEER2_SERVER_PORT:=6513}" + +export CLIENT_TOKEN="${CLIENT_TOKEN:=client_token}" +export CLIENT_KEY_LABEL="${CLIENT_KEY_LABEL:=client_key_priv}" +export TEST_UNIQUE_UID="${TEST_UNIQUE_UID:=unique_id_1234}" + +export DEFAULT_USER_PIN="${DEFAULT_USER_PIN:=1234}" +export DEFAULT_SO_PIN="${DEFAULT_SO_PIN:=12345678}" + +export PKCS11_PROXY_TLS_PSK_FILE="/etc/p11_proxy_tls.psk" + +set -eu + +# Import the certificate +pkcs11-tool --module=/usr/local/lib/libpkcs11-proxy.so -l -p "${DEFAULT_USER_PIN}" --type cert --read-object --label client_cert -o /tmp/cert.der + +# Convert DER to PEM +openssl x509 -inform der -in /tmp/cert.der -out /tmp/oran_cert.pem + +PKCS11_ENGINE_NAME="pkcs11" +TOKEN_KEY_URI="${PKCS11_ENGINE_NAME}:token=${CLIENT_TOKEN};object=${CLIENT_KEY_LABEL};pin-value=${DEFAULT_USER_PIN}" + +export MODULE=/usr/local/lib/libpkcs11-proxy.so +export OPENSSL_CONF=/etc/openssl.cnf +export ansible_user_id="${CTK_USER}" +export 
ctk_loadkey_token_cert_path=/tmp/oran_cert.pem +export token_key_uri="${TOKEN_KEY_URI}" + +cp -R /opt/intel/sysrepo /tmp/sysrepo + +# Configure sysrepo +cd /opt/intel/sysrepo_config +sysrepocfg --edit=tls_keystore.xml --format=xml --datastore=running --module=ietf-keystore -v3 +sysrepocfg --edit=tls_truststore.xml --format=xml --datastore=running --module=ietf-truststore -v3 +sysrepocfg --edit=tls_listen.xml --format=xml --datastore=running --module=ietf-netconf-server -v3 +sysrepocfg --copy-from=running --datastore=startup + +echo -e "\033[0;31m------------------------------------------------------------" +echo -e "KMRA (Key Management Reference Application) is a proof-of-concept" +echo -e "software not suitable for production usage. Please note that the enclave" +echo -e "is signed with a test signing key. A production enclave should go through" +echo -e "the process of signing an enclave as explained in the section Enclave" +echo -e "Signing Tool in the Intel(R) SGX Developer Reference for Linux* OS" +echo -e "(https://download.01.org/intel-sgx/latest/linux-latest/docs/)" +echo -e "---------------------------------------------------------------\033[0m" + +if [[ ${NETOPEER_TYPE} == "server" ]]; then + echo "Starting netopeer2-server..." + /usr/local/sbin/netopeer2-server -d -v3 -p /tmp/netopeer2-server.pid -f /tmp/.netopeer2-server +fi + +if [[ ${NETOPEER_TYPE} == "client" ]]; then + echo "Starting netopeer2-client..." 
+ # add the root ca to netopeer2 server + echo -----BEGIN CERTIFICATE----- > /tmp/ca.pem + sysrepocfg --export -v3 --xpath "/ietf-truststore:truststore/certificates[name='cacerts']/certificate[name='cacert']" --format json | grep "\"cert\":" | awk '{print $2}' | sed "s/\"//g" >> /tmp/ca.pem + echo -----END CERTIFICATE----- >> /tmp/ca.pem + # the server certificate was issued for localhost + ncat -k -l localhost 6513 --sh-exec "ncat ${NETOPEER2_SERVER_HOSTNAME} 6513" & + netopeer2-cli <- + openssl req -nodes -x509 -newkey rsa:2048 + -keyout {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} + -out {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} + -subj "/O={{ kmra.ca_root_crt_subj.O }}/OU={{ kmra.ca_root_crt_subj.OU }}/CN={{ kmra.ca_root_crt_subj.CN }}" + changed_when: true + +- name: generate csr for the cosign key + ansible.builtin.command: >- + openssl req -nodes -newkey rsa:2048 + -keyout {{ (mtls_tmp_dir.path, item.name) | path_join }}.key + -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.csr + -subj "/O={{ item.subj.O | default('') }}/OU={{ item.subj.OU | default('') }}/CN={{ item.subj.CN | default('') }}" + loop: "{{ secrets }}" + changed_when: true + +- name: generate cert for the cosign key + ansible.builtin.shell: >- + set -o pipefail && + openssl x509 -req -in {{ (mtls_tmp_dir.path, item.name) | path_join }}.csr + -days {{ kmra.certs_validity_period_days }} + -CA {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} + -CAkey {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} + {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + '")' + if item.subj.CN | default('') | length > 0 }} + -CAcreateserial -CAserial {{ (mtls_tmp_dir.path, 'ca.srl' ) | path_join }} + -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.crt + args: + executable: /bin/bash + loop: "{{ secrets }}" + changed_when: true + +- name: get GOPATH + ansible.builtin.command: go env GOPATH + register: gopath + changed_when: false + +- name: generate cosign password if not provided + 
ansible.builtin.set_fact: + cosign_password: "{{ lookup('ansible.builtin.password', '/dev/null') }}" + no_log: true + run_once: true + when: + - (cosign_password is not defined) or (not cosign_password) + +- name: import secret for cosign + ansible.builtin.command: >- + env COSIGN_PASSWORD={{ cosign_password }} {{ gopath.stdout }}/bin/cosign + import-key-pair --key {{ (mtls_tmp_dir.path, item.name) | path_join }}.key + --output-key-prefix {{ item.name }}.cosign + args: + chdir: "{{ mtls_tmp_dir.path }}" + loop: "{{ secrets }}" + changed_when: true + +- name: generate a list of all secrets files + ansible.builtin.find: + paths: "{{ mtls_tmp_dir.path }}" + file_type: file + recurse: no + register: secret_list + +- name: read all secrets + ansible.builtin.slurp: + src: "{{ item.path }}" + register: secret_files + no_log: true + loop: "{{ secret_list.files }}" + +- name: set fact of all secrets + ansible.builtin.set_fact: + "cosign_{{ item['source'] | basename | replace('.','')}}": "{{ item['content'] | replace(\"'\",'') }}" + no_log: true + loop: "{{ secret_files.results }}" + +- name: create provider and operator secrets for cosign + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Secret + type: Opaque + metadata: + name: "{{ item.name }}-cosign" + namespace: "{{ item.namespace | default(kmra.namespace) }}" + data: + cosign.ca: "{{ cosign_cacrt }}" + cosign.cert: "{{ hostvars[inventory_hostname]['cosign_' + item.name + 'crt'] }}" + cosign.key: "{{ hostvars[inventory_hostname]['cosign_' + item.name + 'cosignkey'] }}" + cosign.pub: "{{ hostvars[inventory_hostname]['cosign_' + item.name + 'cosignpub'] }}" + stringData: + cosign.password: "{{ cosign_password }}" + loop: "{{ secrets }}" + +- name: create pubkey secret for policy-controller + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Secret + type: Opaque + metadata: + name: "{{ item.name }}-cosign-pubkey" + namespace: "{{ cosign_namespace }}" + data: + 
cosign.pub: "{{ hostvars[inventory_hostname]['cosign_' + item.name + 'cosignpub'] }}" + loop: "{{ secrets }}" + +- name: clean up tmp directory + ansible.builtin.file: + path: "{{ mtls_tmp_dir.path }}" + state: absent diff --git a/roles/kmra_install/tasks/create_custom_tls_configmap.yml b/roles/kmra_install/tasks/create_custom_tls_configmap.yml new file mode 100644 index 00000000..6e7f6588 --- /dev/null +++ b/roles/kmra_install/tasks/create_custom_tls_configmap.yml @@ -0,0 +1,134 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: create tmp dir for keys-certs + ansible.builtin.tempfile: + state: directory + suffix: kmra-custom-tls + register: tls_tmp_dir + +- name: download netopeer2 server and client keys-certs + ansible.builtin.get_url: + url: "{{ (kmra.apphsm.oran_netopeer2_sample_tls_url, 'tls_certs', item) | path_join }}" + dest: "{{ (tls_tmp_dir.path, item) | path_join }}" + mode: 0644 + loop: + - "server.key" + - "server.crt" + - "client.key" + - "client.crt" + +- name: generate a list of all secrets files + ansible.builtin.find: + paths: "{{ tls_tmp_dir.path }}" + file_type: file + recurse: no + register: secret_list + +- name: read all secrets + ansible.builtin.slurp: + src: "{{ item.path }}" + register: secret_files + no_log: true + loop: "{{ secret_list.files }}" + +- name: set fact of all secrets + ansible.builtin.set_fact: + "custom_tls_{{ item['source'] | basename | replace('.','')}}": "{{ item['content'] | replace(\"'\",'') | b64decode }}" + no_log: true + loop: "{{ secret_files.results }}" + +- name: create configmap for the kmra app custom-tls + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: ConfigMap + metadata: + name: kmra-apphsm-custom-config + namespace: "{{ kmra.namespace }}" + data: + server.key: "{{ custom_tls_serverkey }}" + server.crt: "{{ custom_tls_servercrt }}" + client.key: "{{ custom_tls_clientkey }}" + client.crt: "{{ custom_tls_clientcrt }}" + +- name: clean up tmp directory + ansible.builtin.file: + path: "{{ tls_tmp_dir.path }}" + state: absent + +- name: create tmp dir for sysrepo + ansible.builtin.tempfile: + state: directory + suffix: oran-sysrepo + register: sys_tmp_dir + +- name: download netopeer2 sysrepo config + ansible.builtin.get_url: + url: "{{ (kmra.apphsm.oran_netopeer2_sample_tls_url, item) | path_join }}" + dest: "{{ (sys_tmp_dir.path, item) | path_join }}" + mode: 0644 + loop: + - "tls_keystore.xml" + - "tls_listen.xml" + - "tls_truststore.xml" + +- name: hide the private key as we use pkcs11 
to load it from ctk_loader + ansible.builtin.replace: + path: "{{ (sys_tmp_dir.path, 'tls_keystore.xml') | path_join }}" + regexp: '.+' + replace: "{{ ('pkcs11:token=token_server;object=client_key_priv;pin-value=' + \ + kmra.oran_netopeer2_server.default_user_pin + ';') | b64encode }}" + +- name: generate a list of all sysrepo files + ansible.builtin.find: + paths: "{{ sys_tmp_dir.path }}" + file_type: file + recurse: no + register: sysrepo_list + +- name: read all sysrepo files + ansible.builtin.slurp: + src: "{{ item.path }}" + register: sysrepo_files + no_log: true + loop: "{{ sysrepo_list.files }}" + +- name: set fact of all sysrepo files + ansible.builtin.set_fact: + "sysrepo_{{ item['source'] | basename | replace('.','')}}": "{{ item['content'] | replace(\"'\",'') | b64decode }}" + no_log: true + loop: "{{ sysrepo_files.results }}" + +- name: create configmap for the netopeer2 app + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: ConfigMap + metadata: + name: oran-sysrepo-config + namespace: "{{ cosign_enforce_namespace }}" + data: + tls_keystore.xml: "{{ sysrepo_tls_keystorexml }}" + tls_listen.xml: "{{ sysrepo_tls_listenxml }}" + tls_truststore.xml: "{{ sysrepo_tls_truststorexml }}" + +- name: clean up tmp directory + ansible.builtin.file: + path: "{{ sys_tmp_dir.path }}" + state: absent diff --git a/roles/kmra_install/tasks/create_tls_secrets.yml b/roles/kmra_install/tasks/create_tls_secrets.yml index 6d540973..28b8c3b6 100644 --- a/roles/kmra_install/tasks/create_tls_secrets.yml +++ b/roles/kmra_install/tasks/create_tls_secrets.yml @@ -46,8 +46,7 @@ -days {{ kmra.certs_validity_period_days }} -CA {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} -CAkey {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} - {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + ',IP:' - + hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] + '")' + {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + '")' if item.subj.CN 
| default('') | length > 0 }} -CAcreateserial -CAserial {{ (mtls_tmp_dir.path, 'ca.srl' ) | path_join }} -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.crt @@ -67,14 +66,8 @@ openssl x509 -req -in {{ (mtls_tmp_dir.path, item.name) | path_join }}.csr -CA {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} -CAkey {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} - {{ '-extfile <(printf "subjectAltName=DNS:' - + item.subj.CN - + ',IP:' - + hostvars[groups['kube_node'][0]]['ansible_all_ipv4_addresses'] | - ansible.utils.ipaddr(hostvars[groups['vm_host'][0]]['vxlan_gw_ip']) | - join('') | trim + '")' - if item.subj.CN | default('') | length > 0 - }} + {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + '")' + if item.subj.CN | default('') | length > 0 }} -CAcreateserial -CAserial {{ (mtls_tmp_dir.path, 'ca.srl' ) | path_join }} -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.crt args: @@ -93,7 +86,7 @@ --from-file=tls.cert={{ (mtls_tmp_dir.path, item.name) | path_join }}.crt --from-file=tls.key={{ (mtls_tmp_dir.path, item.name) | path_join }}.key --from-file=ca.cert={{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} - -n {{ kmra.namespace }} + -n {{ item.namespace | default(kmra.namespace) }} -o yaml --dry-run=client | kubectl apply -f - args: executable: /bin/bash diff --git a/roles/kmra_install/tasks/kmra_oran_preflight.yml b/roles/kmra_install/tasks/kmra_oran_preflight.yml new file mode 100644 index 00000000..3cc5c05d --- /dev/null +++ b/roles/kmra_install/tasks/kmra_oran_preflight.yml @@ -0,0 +1,54 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: check KMRA oran mode requirements + assert: + that: + - kmra.pccs.enabled + - kmra.apphsm.enabled + - not (kmra.ctk_loadkey_demo.enabled | default(false)) + - kmra.oran_netopeer2_server.enabled or kmra.oran_netopeer2_client.enabled + - sigstore_policy_controller_install + fail_msg: >- + KMRA oran installation requires pccs and apphsm set to 'true', ctk_loadkey_demo to 'false', + sigstore_policy_controller set 'true', netopeer2 server or client set 'true' + success_msg: "KMRA oran requirements verified" + +- name: check KMRA oran container runtime requirements + assert: + that: + - container_runtime == "docker" + fail_msg: "KMRA oran installation requires container_runtime set to 'docker'" + success_msg: "KMRA oran container requirements verified" + +- name: check oran docker image + when: + - not (kmra.oran.local_build | default(false)) + block: + - name: check if oran docker image exists + delegate_to: localhost + become: false + stat: + path: "{{ kmra_oran.oran.oran_image_staging_location }}" + checksum_algorithm: sha256 + register: provided_oran + + - name: check the oran image integrity + assert: + that: "provided_oran.stat.checksum == '{{ kmra_oran.oran.oran_image_checksum }}'" + msg: + - File {{ kmra_oran.oran.oran_image_staging_location }} on localhost is NOT the expected one. + - Please provide the correct file. 
diff --git a/roles/kmra_install/tasks/kmra_sbx_preflight.yml b/roles/kmra_install/tasks/kmra_sbx_preflight.yml new file mode 100644 index 00000000..9b27a9d0 --- /dev/null +++ b/roles/kmra_install/tasks/kmra_sbx_preflight.yml @@ -0,0 +1,43 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: check KMRA sbx apphsm requirements + assert: + that: + - kmra.apphsm.enabled + msg: "KMRA sbx is on top of apphsm, make sure apphsm is enabled" + +- name: check KMRA sbx container runtime requirements + assert: + that: + - container_runtime == "docker" + fail_msg: "KMRA sbx installation requires container_runtime set to 'docker'" + success_msg: "KMRA sbx container requirements verified" + +- name: check if sbx docker image exists + delegate_to: localhost + become: false + stat: + path: "{{ kmra_defaults.apphsm.sbx_image_staging_location }}" + checksum_algorithm: sha256 + register: provided_sbx + +- name: check the sbx image integrity + assert: + that: "provided_sbx.stat.checksum == '{{ kmra_defaults.apphsm.sbx_image_checksum }}'" + msg: + - File {{ kmra_defaults.apphsm.sbx_image_staging_location }} on localhost is NOT the expected one. + - Please provide the correct file. 
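Review note on kmra_sbx_preflight.yml: the last two tasks above hash the staged image on localhost with `stat` (`checksum_algorithm: sha256`) and compare the result to the configured checksum. A minimal Python sketch of the same check follows, assuming the expected value is a hex-encoded SHA-256 digest. The function names `sha256_of` and `image_matches` are hypothetical and appear nowhere in this change.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large image tarball is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: Path, expected_checksum: str) -> bool:
    """Return True only if the file exists and its SHA-256 equals the expected hex digest."""
    if not path.is_file():
        return False
    return sha256_of(path) == expected_checksum.strip().lower()
```

Unlike the playbook, which fails the `assert` task when the file is missing or differs, this sketch returns `False` in both cases and leaves the reaction to the caller.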
diff --git a/roles/kmra_install/tasks/main.yml b/roles/kmra_install/tasks/main.yml index f03a9b88..2ed31c31 100644 --- a/roles/kmra_install/tasks/main.yml +++ b/roles/kmra_install/tasks/main.yml @@ -14,67 +14,39 @@ ## limitations under the License. ## --- +- name: combine defaults and user provided vars + set_fact: + kmra: "{{ kmra_oran | combine(kmra | default({}), recursive=True) }}" + no_log: true + when: + - kmra.oran.enabled | default(false) + - name: combine defaults and user provided vars set_fact: kmra: "{{ kmra_defaults | combine(kmra | default({}), recursive=True) }}" no_log: true -- name: determine machine type - include_role: - name: check_machine_type +- name: include sigstore variables when needed + include_vars: "{{ (role_path, '../', 'sigstore_policy_controller/defaults/main.yml') | path_join }}" when: - - inventory_hostname in groups['kube_node'] or - inventory_hostname in groups['vm_host'] - - not on_vms | default (false) + - kmra.oran.enabled | default(false) - name: prepare worker node block: - - name: create sgx_prv group - group: + - name: ensure sgx_prv group exist + ansible.builtin.group: name: sgx_prv state: present when: - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.4') - - name: add user to sgx_prv group - user: - name: "{{ ansible_user_id }}" - groups: sgx_prv - append: yes - when: - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') - or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.4') - - - name: create udev rules - blockinfile: - path: /etc/udev/rules.d/10-sgx.rules - create: yes - mode: '0644' - block: | - SUBSYSTEM=="misc",KERNEL=="enclave",MODE="0666" - SUBSYSTEM=="misc",KERNEL=="provision",GROUP="sgx_prv",MODE="0660" - SUBSYSTEM=="sgx",KERNEL=="sgx/enclave",MODE="0666" - SUBSYSTEM=="sgx",KERNEL=="sgx/provision",MODE="0660" - 
SUBSYSTEM=="misc",KERNEL=="sgx_enclave",MODE="0666",SYMLINK+="sgx/enclave" - SUBSYSTEM=="misc",KERNEL=="sgx_provision",GROUP="sgx_prv",MODE="0660",SYMLINK+="sgx/provision" - when: - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') - or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.4') - - - name: load udev rules - # noqa command-instead-of-shell - shell is used intentionally here - shell: udevadm trigger - when: - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') - or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.4') - - name: determine sgx_prv GID - getent: + ansible.builtin.getent: database: group key: sgx_prv when: - - kmra.ctk_loadkey_demo.enabled | bool + - kmra.ctk_loadkey_demo.enabled or kmra.oran.enabled | default (false) - inventory_hostname == groups['kube_node'][0] - name: update aesmd/qcnl host settings @@ -164,6 +136,23 @@ subj: "{{ kmra.ctk_loadkey_demo.crt_subj }}", deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" } + - { + name: "{{ kmra.oran_netopeer2_server.release_name | default('') }}", + namespace: "{{ cosign_enforce_namespace | default('default') }}", + subj: "{{ kmra.oran_netopeer2_server.crt_subj | default('') }}", + deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}", + } + - { + name: "{{ kmra.oran_netopeer2_client.release_name | default('') }}", + namespace: "{{ cosign_enforce_namespace | default('default')}}", + subj: "{{ kmra.oran_netopeer2_client.crt_subj | default('')}}", + deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}", + } + + - name: prepare sbx apphsm images for pre-PRQ sgx + include_tasks: prepare_sbx_apphsm.yml + when: + - kmra.sbx | default(false) - name: create Helm charts directory if needed file: @@ -180,6 +169,8 @@ - {chart: 'kmra-pccs', deploy: "{{ kmra.pccs.enabled | default(false) }}"} - {chart: 'kmra-apphsm', deploy: "{{ kmra.apphsm.enabled | default(false) 
}}"} - {chart: 'kmra-ctk', deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}"} + - {chart: 'kmra-oran-netopeer2-server', deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}"} + - {chart: 'kmra-oran-netopeer2-client', deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}"} when: - item.deploy @@ -220,6 +211,28 @@ dest: "{{ (kmra.ctk_loadkey_demo.chart_path, 'templates', 'kmra-ctk-loadkey-rbac-cluster-role.yml') | path_join }}", deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" } + - { + src: "kmra-oran-netopeer2-server-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-server-values.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-server-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.oran_netopeer2_server.chart_path | default(project_root_dir), \ + 'templates', 'kmra-oran-netopeer2-server-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-client-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-client-values.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-client-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.oran_netopeer2_client.chart_path | default(project_root_dir), \ + 'templates', 'kmra-oran-netopeer2-client-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}" + } when: - item.deploy @@ -232,6 +245,22 @@ when: - kmra.pccs.enabled | default(false) + - name: Wait for pccs to start + kubernetes.core.k8s_info: + kind: Deployment + name: kmra-pccs + namespace: "{{ kmra.namespace }}" + wait: true + wait_condition: + reason: MinimumReplicasAvailable + type: Available + wait_timeout: 600 + + - name: create apphsm custom_tls for oran case + 
include_tasks: create_custom_tls_configmap.yml + when: + - kmra.oran.enabled | default(false) + - name: install KMRA AppHSM helm chart command: >- helm upgrade -i {{ kmra.apphsm.release_name }} @@ -260,5 +289,107 @@ {{ kmra.ctk_loadkey_demo.chart_path }} when: - kmra.ctk_loadkey_demo.enabled | default(false) + + - name: Wait for ctk_loadkey to start + kubernetes.core.k8s_info: + kind: Deployment + name: kmra-ctk + namespace: "{{ kmra.namespace }}" + wait: true + wait_condition: + reason: MinimumReplicasAvailable + type: Available + wait_timeout: 600 + when: + - kmra.ctk_loadkey_demo.enabled | default(false) + when: + - inventory_hostname == groups['kube_control_plane'][0] + +- name: deploy the oran ctk_loadkey based apps + block: + - name: create k8s tls secrets for oran apps cosign usage + include_tasks: create_cosign_tls_secrets.yml + vars: + secrets: + - { + name: "{{ kmra.oran.sw_provider_name }}", + namespace: "{{ cosign_enforce_namespace }}", + subj: "{{ kmra.oran.sw_provider_crt_subj }}", + deploy: true, + } + - { + name: "{{ kmra.oran.sw_operator_name }}", + namespace: "{{ cosign_enforce_namespace }}", + subj: "{{ kmra.oran.sw_operator_crt_subj }}", + deploy: true, + } + + - name: create enforce pubkey policy crd yaml + ansible.builtin.template: + src: "kmra-oran-key-cosign-verification.yaml.j2" + dest: "{{ (policy_controller_dir, 'kmra-oran-key-cosign-verification.yaml') | path_join }}" + force: yes + mode: preserve + + - name: apply enforce pubkey policy crd yaml + kubernetes.core.k8s: + state: present + src: "{{ (policy_controller_dir, 'kmra-oran-key-cosign-verification.yaml') | path_join }}" + + - name: prepare image for oran case + include_tasks: prepare_oran_image.yml + + - name: populate oran helm charts values templates and push to controller node + ansible.builtin.template: + src: "{{ item.src }}" + dest: "{{ item.dest }}" + force: yes + mode: preserve + loop: + - { + src: "kmra-oran-netopeer2-server-values.yaml.j2", + dest: "{{ (project_root_dir, 
'charts', 'kmra-oran-netopeer2-server-values.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-server-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.oran_netopeer2_server.chart_path, 'templates', 'kmra-oran-netopeer2-server-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_server.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-client-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-oran-netopeer2-client-values.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}" + } + - { + src: "kmra-oran-netopeer2-client-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.oran_netopeer2_client.chart_path, 'templates', 'kmra-oran-netopeer2-client-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.oran_netopeer2_client.enabled | default(false) }}" + } + when: + - item.deploy + + - name: install KMRA oran netopeer2 server helm chart + kubernetes.core.helm: + chart_ref: "{{ kmra.oran_netopeer2_server.chart_path }}" + release_name: "{{ kmra.oran_netopeer2_server.release_name }}" + release_namespace: "{{ cosign_enforce_namespace }}" + values_files: "{{ kmra.oran_netopeer2_server.helm_values_file }}" + wait: true + timeout: 4m0s + when: + - kmra.oran_netopeer2_server.enabled | default(false) + + - name: install KMRA oran netopeer2 client helm chart + kubernetes.core.helm: + chart_ref: "{{ kmra.oran_netopeer2_client.chart_path }}" + release_name: "{{ kmra.oran_netopeer2_client.release_name }}" + release_namespace: "{{ cosign_enforce_namespace }}" + values_files: "{{ kmra.oran_netopeer2_client.helm_values_file }}" + wait: true + timeout: 4m0s + when: + - kmra.oran_netopeer2_client.enabled | default(false) when: + - kmra.oran.enabled | default(false) - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/kmra_install/tasks/prepare_oran_image.yml 
b/roles/kmra_install/tasks/prepare_oran_image.yml new file mode 100644 index 00000000..2377ec6a --- /dev/null +++ b/roles/kmra_install/tasks/prepare_oran_image.yml @@ -0,0 +1,152 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# preflight already make sure container_manager == 'docker' +- name: check if oran container exists + community.docker.docker_image_info: + name: "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + register: result + +- name: load pre-built oran container image + when: + - result.images | length != 1 + - not (kmra.oran.local_build | default(false)) + block: + - name: create tmp dir for container import + ansible.builtin.tempfile: + state: directory + suffix: docker-load + register: tmp_dir + + - name: copy oran image to controller node + ansible.builtin.copy: + src: "{{ kmra.oran.oran_image_staging_location }}" + dest: "{{ (tmp_dir.path, 'oran.tar') | path_join }}" + mode: 0755 + + - name: load oran image from archive and push to local registry + community.docker.docker_image: + name: "oran:latest" + repository: "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + push: yes + source: load + load_path: "{{ (tmp_dir.path, 
'oran.tar') | path_join }}" + + - name: clean up tmp directory + ansible.builtin.file: + path: "{{ tmp_dir.path }}" + state: absent + +- name: build local oran container image + when: + - result.images | length != 1 + - kmra.oran.local_build | default(false) + block: + - name: create tmp dir for container build + ansible.builtin.tempfile: + state: directory + suffix: docker-build + register: tmp_dir + + - name: copy oran dockerfiles to the controller node + ansible.builtin.copy: + src: "{{ item }}" + dest: "{{ tmp_dir.path }}" + mode: 0644 + with_fileglob: + - ./oran/* + + - name: build oran container and push to local registry + community.docker.docker_image: + name: "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + repository: "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + push: yes + source: build + build: + path: "{{ tmp_dir.path }}" + use_config_proxy: yes + + - name: clean up tmp directory + ansible.builtin.file: + path: "{{ tmp_dir.path }}" + state: absent + +- name: check if local ctk_loadkey exists + community.docker.docker_image_info: + name: "{{ kmra.oran_netopeer2_server.image_repo }}/{{ kmra.oran_netopeer2_server.image_name }}:\ + {{ kmra.oran_netopeer2_server.image_tag | default(kmra.image_tag) }}" + register: result + +- name: pull and tag ctk_loadkey image + when: result.images | length != 1 + community.docker.docker_image: + name: "{{ kmra.ctk_loadkey_demo.image_repo }}/{{ kmra.ctk_loadkey_demo.image_name}}:\ + {{ kmra.ctk_loadkey_demo.image_tag | default(kmra.image_tag) }}" + push: yes + source: pull + repository: "{{ kmra.oran_netopeer2_server.image_repo }}/{{ kmra.oran_netopeer2_server.image_name }}:\ + {{ kmra.oran_netopeer2_server.image_tag | default(kmra.image_tag) }}" + +- name: 
check if local busybox exists + community.docker.docker_image_info: + name: "{{ kmra.oran_netopeer2_server.init_image_repo }}/{{ kmra.oran_netopeer2_server.init_image_name }}:\ + {{ kmra.oran_netopeer2_server.init_image_tag | default(kmra.image_tag) }}" + register: result + +- name: pull and tag busybox image + when: result.images | length != 1 + community.docker.docker_image: + name: "{{ kmra.ctk_loadkey_demo.init_image_repo }}/{{ kmra.ctk_loadkey_demo.init_image_name}}:\ + {{ kmra.ctk_loadkey_demo.init_image_tag | default(kmra.image_tag) }}" + push: yes + source: pull + repository: "{{ kmra.oran_netopeer2_server.init_image_repo }}/{{ kmra.oran_netopeer2_server.init_image_name }}:\ + {{ kmra.oran_netopeer2_server.init_image_tag | default(kmra.image_tag) }}" + +- name: get GOPATH + ansible.builtin.command: go env GOPATH + register: gopath + changed_when: false + +- name: signing local registry images with sw provider key + ansible.builtin.command: >- + {{ gopath.stdout }}/bin/cosign sign -y + --key k8s://{{ cosign_enforce_namespace }}/{{ kmra.oran.sw_provider_name }}-cosign {{ item }} + with_items: + - "{{ kmra.oran_netopeer2_server.image_repo }}/{{ kmra.oran_netopeer2_server.image_name }}:\ + {{ kmra.oran_netopeer2_server.image_tag | default(kmra.image_tag) }}" + - "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + - "{{ kmra.oran_netopeer2_server.init_image_repo }}/{{ kmra.oran_netopeer2_server.init_image_name }}:\ + {{ kmra.oran_netopeer2_server.init_image_tag | default(kmra.image_tag) }}" + changed_when: true + +- name: signing local registry images with sw operator key + ansible.builtin.command: >- + {{ gopath.stdout }}/bin/cosign sign -y + --key k8s://{{ cosign_enforce_namespace }}/{{ kmra.oran.sw_operator_name }}-cosign {{ item }} + with_items: + - "{{ kmra.oran_netopeer2_server.image_repo }}/{{ 
kmra.oran_netopeer2_server.image_name }}:\ + {{ kmra.oran_netopeer2_server.image_tag | default(kmra.image_tag) }}" + - "{{ kmra.oran_netopeer2_server.oran_image_repo }}/{{ kmra.oran_netopeer2_server.oran_image_name }}:\ + {{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + - "{{ kmra.oran_netopeer2_server.init_image_repo }}/{{ kmra.oran_netopeer2_server.init_image_name }}:\ + {{ kmra.oran_netopeer2_server.init_image_tag | default(kmra.image_tag) }}" + changed_when: true diff --git a/roles/kmra_install/tasks/prepare_sbx_apphsm.yml b/roles/kmra_install/tasks/prepare_sbx_apphsm.yml new file mode 100644 index 00000000..f80263d8 --- /dev/null +++ b/roles/kmra_install/tasks/prepare_sbx_apphsm.yml @@ -0,0 +1,38 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +# preflight already make sure container_manager == 'docker' +- name: check if sbx apphsm exists + community.docker.docker_image_info: + name: "{{ kmra.apphsm.sbx_image_repo }}/{{ kmra.apphsm.sbx_image_name }}:{{ kmra.apphsm.sbx_image_tag | default(kmra.image_tag) }}" + register: result + +- name: check local sbx apphsm container image + when: result.images | length != 1 + block: + - name: copy sbx apphsm image to controller node + ansible.builtin.copy: + src: "{{ kmra.apphsm.sbx_image_staging_location }}" + dest: "{{ (project_root_dir, 'apphsm.sbx.tar') | path_join }}" + mode: 0755 + + - name: load sbx apphsm image from archive and push to local registry + community.docker.docker_image: + name: "apphsm:latest" + push: yes + repository: "{{ kmra.apphsm.sbx_image_repo }}/{{ kmra.apphsm.sbx_image_name }}:{{ kmra.apphsm.sbx_image_tag | default(kmra.image_tag) }}" + source: load + load_path: "{{ (project_root_dir, 'apphsm.sbx.tar') | path_join }}" diff --git a/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 b/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 index af5864ff..bd964473 100644 --- a/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 +++ b/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 @@ -12,12 +12,20 @@ no_proxy: "{{ proxy_env.no_proxy }}" apphsm: main: image: +{% if kmra.sbx == true %} + repo: "{{ kmra.apphsm.sbx_image_repo }}" + name: "{{ kmra.apphsm.sbx_image_name }}" + tag: "{{ kmra.apphsm.sbx_image_tag | default(kmra.image_tag) }}" +{% else %} repo: "{{ kmra.apphsm.image_repo }}" name: "{{ kmra.apphsm.image_name }}" tag: "{{ kmra.apphsm.image_tag | default(kmra.image_tag) }}" +{% endif %} pullPolicy: IfNotPresent - port: "{{ kmra.apphsm.upstream_port }}" - hostname: "{{ kmra.apphsm.hostname | trim}}" + port: "{{ kmra.apphsm.port }}" + servicePort: "{{ kmra.apphsm.upstream_port }}" + ip: "{{ kmra.apphsm.listen_ip }}" + hostname: "{{ kmra.apphsm.hostname }}" init: image: repo: "{{ kmra.apphsm.init_image_repo }}" @@ 
-25,11 +33,13 @@ apphsm: tag: "{{ kmra.apphsm.init_image_tag }}" pullPolicy: "IfNotPresent" pccs_port: "{{ kmra.pccs.upstream_port }}" - pccs_hostname: "{{ kmra.pccs.hostname }}" + pccs_hostname: "{{ kmra.pccs.dns_name }}" use_secure_cert: "{{ kmra.apphsm.use_secure_cert | quote }}" test_ctk_loadkey_cert_user_id: "{{ kmra.apphsm.test_ctk_loadkey_cert_user_id }}" generic_client_cert_id: "{{ kmra.apphsm.generic_client_cert_id }}" ctk_loadkey_demo_enabled: "{{ kmra.ctk_loadkey_demo.enabled | bool | lower }}" + oran: "{{ kmra.oran.enabled | bool | default(false) | lower }}" + oran_netopeer2_cert_user_id: "{{ kmra.apphsm.oran_netopeer2_cert_user_id | default('') }}" default_user_pin: "{{ kmra.apphsm.default_user_pin }}" default_so_pin: "{{ kmra.apphsm.default_so_pin }}" nonce_lifetime: "{{ kmra.apphsm.nonce_lifetime }}" diff --git a/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 b/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 index 6a6a3dda..50bf6f46 100644 --- a/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 +++ b/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 @@ -31,9 +31,9 @@ ctk_loadkey: port: "{{ kmra.ctk_loadkey_demo.nginx_demo_port }}" hostname: "{{ kmra.ctk_loadkey_demo.nginx_demo_server_name | default('0.0.0.0')}}" pccs_port: "{{ kmra.pccs.upstream_port }}" - pccs_hostname: "{{ kmra.pccs.hostname }}" + pccs_hostname: "{{ kmra.pccs.dns_name }}" apphsm_port: "{{ kmra.apphsm.upstream_port }}" - apphsm_hostname: "{{ kmra.apphsm.hostname | trim}}" + apphsm_hostname: "{{ kmra.apphsm.crt_subj.CN }}" sgx_prv_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx_prv'][1] | default('1002')}}" sgx_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx'][1] | default('107')}}" use_secure_cert: "{{ kmra.ctk_loadkey_demo.use_secure_cert | quote }}" diff --git a/roles/kmra_install/templates/kmra-oran-key-cosign-verification.yaml.j2 b/roles/kmra_install/templates/kmra-oran-key-cosign-verification.yaml.j2 new file mode 100644 index 
00000000..c578f58e --- /dev/null +++ b/roles/kmra_install/templates/kmra-oran-key-cosign-verification.yaml.j2 @@ -0,0 +1,14 @@ +apiVersion: policy.sigstore.dev/v1alpha1 +kind: ClusterImagePolicy +metadata: + name: oran-cosign-image-policy-pubkey +spec: + images: + - glob: "{{ cosign_registry_address }}/oran/*" + authorities: + - key: + secretRef: + name: "{{ kmra.oran.sw_provider_name }}-cosign-pubkey" + - key: + secretRef: + name: "{{ kmra.oran.sw_operator_name }}-cosign-pubkey" diff --git a/roles/kmra_install/templates/kmra-oran-netopeer2-client-rbac-cluster-role.yml.j2 b/roles/kmra_install/templates/kmra-oran-netopeer2-client-rbac-cluster-role.yml.j2 new file mode 100644 index 00000000..e671faa9 --- /dev/null +++ b/roles/kmra_install/templates/kmra-oran-netopeer2-client-rbac-cluster-role.yml.j2 @@ -0,0 +1,14 @@ +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {% raw %}{{ .Release.Name }}{% endraw %}{{''}} +rules: + - apiGroups: ["authentication.k8s.io"] + resources: + - tokenreviews + verbs: ["create"] + - apiGroups: ["authorization.k8s.io"] + resources: + - subjectaccessreviews + verbs: ["create"] diff --git a/roles/kmra_install/templates/kmra-oran-netopeer2-client-values.yaml.j2 b/roles/kmra_install/templates/kmra-oran-netopeer2-client-values.yaml.j2 new file mode 100644 index 00000000..51d96e1d --- /dev/null +++ b/roles/kmra_install/templates/kmra-oran-netopeer2-client-values.yaml.j2 @@ -0,0 +1,52 @@ +--- +{% if "http_proxy" in proxy_env %} +http_proxy: "{{ proxy_env.http_proxy }}" +{% endif %} +{% if "https_proxy" in proxy_env %} +https_proxy: "{{ proxy_env.https_proxy }}" +{% endif %} +{% if "no_proxy" in proxy_env %} +no_proxy: "{{ proxy_env.no_proxy }}" +{% endif %} + +oran_netopeer2_client: + main: + image: + repo: "{{ kmra.oran_netopeer2_client.image_repo }}" + name: "{{ kmra.oran_netopeer2_client.image_name }}" + tag: "{{ kmra.oran_netopeer2_client.image_tag | default(kmra.image_tag) }}" + pullPolicy: IfNotPresent + 
init: + image: + repo: "{{ kmra.oran_netopeer2_client.init_image_repo }}" + name: "{{ kmra.oran_netopeer2_client.init_image_name }}" + tag: "{{ kmra.oran_netopeer2_client.init_image_tag }}" + pullPolicy: IfNotPresent + oran: + type: client + image: + repo: "{{ kmra.oran_netopeer2_client.oran_image_repo }}" + name: "{{ kmra.oran_netopeer2_client.oran_image_name }}" + tag: "{{ kmra.oran_netopeer2_client.oran_image_tag | default(kmra.image_tag) }}" + pullPolicy: IfNotPresent + netopeer2_server_port: "{{ kmra.oran_netopeer2_client.oran_netopeer2_server_port }}" + netopeer2_server_name: "{{ kmra.oran_netopeer2_client.oran_netopeer2_server_hostname | default('oran_netopeer2_server')}}" + netopeer2_server_domain: {{ cluster_name | default('cluster.local') }} + pullSecret: "{{ container_registry_secret }}" + pccs_port: "{{ kmra.pccs.upstream_port }}" + pccs_hostname: "{{ kmra.pccs.dns_name }}" + apphsm_port: "{{ kmra.apphsm.upstream_port }}" + apphsm_hostname: "{{ kmra.apphsm.crt_subj.CN }}" + sgx_prv_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx_prv'][1] | default('1002')}}" + sgx_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx'][1] | default('107')}}" + use_secure_cert: "{{ kmra.oran_netopeer2_client.use_secure_cert | quote }}" + client_token: "{{ kmra.oran_netopeer2_client.client_token }}" + client_key_label: "{{kmra.oran_netopeer2_client.client_key_label }}" + test_unique_uid: "{{ kmra.oran_netopeer2_client.test_unique_uid }}" + default_user_pin: "{{ kmra.oran_netopeer2_client.default_user_pin }}" + default_so_pin: "{{ kmra.oran_netopeer2_client.default_so_pin }}" + default_client_token_id: "{{ kmra.oran_netopeer2_client.default_client_token_id }}" + pkcs11_proxy_tls_psk: "{{ kmra.oran_netopeer2_client.pkcs11_proxy_tls_psk }}" + pkcs11_proxy_tls_psk_file: "{{ kmra.oran_netopeer2_client.pkcs11_proxy_tls_psk_file }}" + pkcs11_daemon_socket_hostname: "{{ kmra.oran_netopeer2_client.pkcs11_daemon_socket_hostname }}" + 
pkcs11_daemon_socket_port: "{{ kmra.oran_netopeer2_client.pkcs11_daemon_socket_port }}" diff --git a/roles/kmra_install/templates/kmra-oran-netopeer2-server-rbac-cluster-role.yml.j2 b/roles/kmra_install/templates/kmra-oran-netopeer2-server-rbac-cluster-role.yml.j2 new file mode 100644 index 00000000..e671faa9 --- /dev/null +++ b/roles/kmra_install/templates/kmra-oran-netopeer2-server-rbac-cluster-role.yml.j2 @@ -0,0 +1,14 @@ +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {% raw %}{{ .Release.Name }}{% endraw %}{{''}} +rules: + - apiGroups: ["authentication.k8s.io"] + resources: + - tokenreviews + verbs: ["create"] + - apiGroups: ["authorization.k8s.io"] + resources: + - subjectaccessreviews + verbs: ["create"] diff --git a/roles/kmra_install/templates/kmra-oran-netopeer2-server-values.yaml.j2 b/roles/kmra_install/templates/kmra-oran-netopeer2-server-values.yaml.j2 new file mode 100644 index 00000000..ab9eaae4 --- /dev/null +++ b/roles/kmra_install/templates/kmra-oran-netopeer2-server-values.yaml.j2 @@ -0,0 +1,51 @@ +--- +{% if "http_proxy" in proxy_env %} +http_proxy: "{{ proxy_env.http_proxy }}" +{% endif %} +{% if "https_proxy" in proxy_env %} +https_proxy: "{{ proxy_env.https_proxy }}" +{% endif %} +{% if "no_proxy" in proxy_env %} +no_proxy: "{{ proxy_env.no_proxy }}" +{% endif %} + +oran_netopeer2_server: + main: + image: + repo: "{{ kmra.oran_netopeer2_server.image_repo }}" + name: "{{ kmra.oran_netopeer2_server.image_name }}" + tag: "{{ kmra.oran_netopeer2_server.image_tag | default(kmra.image_tag) }}" + pullPolicy: IfNotPresent + init: + image: + repo: "{{ kmra.oran_netopeer2_server.init_image_repo }}" + name: "{{ kmra.oran_netopeer2_server.init_image_name }}" + tag: "{{ kmra.oran_netopeer2_server.init_image_tag }}" + pullPolicy: IfNotPresent + oran: + type: server + hostname: "{{ kmra.oran_netopeer2_server.oran_netopeer2_server_hostname }}" + servicePort: "{{ kmra.oran_netopeer2_server.oran_netopeer2_server_port }}" + 
image: + repo: "{{ kmra.oran_netopeer2_server.oran_image_repo }}" + name: "{{ kmra.oran_netopeer2_server.oran_image_name }}" + tag: "{{ kmra.oran_netopeer2_server.oran_image_tag | default(kmra.image_tag) }}" + pullPolicy: IfNotPresent + pullSecret: "{{ container_registry_secret }}" + pccs_port: "{{ kmra.pccs.upstream_port }}" + pccs_hostname: "{{ kmra.pccs.dns_name }}" + apphsm_port: "{{ kmra.apphsm.upstream_port }}" + apphsm_hostname: "{{ kmra.apphsm.crt_subj.CN }}" + sgx_prv_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx_prv'][1] | default('1002')}}" + sgx_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx'][1] | default('107')}}" + use_secure_cert: "{{ kmra.oran_netopeer2_server.use_secure_cert | quote }}" + client_token: "{{ kmra.oran_netopeer2_server.client_token }}" + client_key_label: "{{kmra.oran_netopeer2_server.client_key_label }}" + test_unique_uid: "{{ kmra.oran_netopeer2_server.test_unique_uid }}" + default_user_pin: "{{ kmra.oran_netopeer2_server.default_user_pin }}" + default_so_pin: "{{ kmra.oran_netopeer2_server.default_so_pin }}" + default_client_token_id: "{{ kmra.oran_netopeer2_server.default_client_token_id }}" + pkcs11_proxy_tls_psk: "{{ kmra.oran_netopeer2_server.pkcs11_proxy_tls_psk }}" + pkcs11_proxy_tls_psk_file: "{{ kmra.oran_netopeer2_server.pkcs11_proxy_tls_psk_file }}" + pkcs11_daemon_socket_hostname: "{{ kmra.oran_netopeer2_server.pkcs11_daemon_socket_hostname }}" + pkcs11_daemon_socket_port: "{{ kmra.oran_netopeer2_server.pkcs11_daemon_socket_port }}" diff --git a/roles/kmra_install/templates/kmra-pccs-values.yaml.j2 b/roles/kmra_install/templates/kmra-pccs-values.yaml.j2 index 1233a6de..a85c882e 100644 --- a/roles/kmra_install/templates/kmra-pccs-values.yaml.j2 +++ b/roles/kmra_install/templates/kmra-pccs-values.yaml.j2 @@ -11,12 +11,17 @@ pccs: tag: "{{ kmra.pccs.image_tag | default(kmra.image_tag) }}" pullPolicy: "IfNotPresent" port: "{{ kmra.pccs.upstream_port }}" +{% if kmra.sbx == true %} + 
sgx_provisioning_api_url: "{{ kmra.pccs.sbx_sgx_provisioning_api_url }}" +{% else %} sgx_provisioning_api_url: "{{ kmra.pccs.sgx_provisioning_api_url }}" +{% endif %} api_key: "{{ kmra.pccs.api_key }}" admin_pass: "{{ kmra.pccs.admin_pass | default('pccs_admin') | password_hash('sha512') }}" user_pass: "{{ kmra.pccs.user_pass | default('pccs_user') | password_hash('sha512') }}" log_level: "{{ kmra.pccs.log_level | default('info') }}" hostname: "{{ kmra.pccs.hostname }}" + listen_ip: "{{ kmra.pccs.listen_ip }}" db_name: "{{ kmra.pccs.db_name }}" db_user: "{{ kmra.pccs.db_user }}" db_password: "{{ kmra.pccs.db_passwd }}" diff --git a/roles/kube_prometheus/defaults/main.yml b/roles/kube_prometheus/defaults/main.yml index 66ad593e..6a919a9e 100644 --- a/roles/kube_prometheus/defaults/main.yml +++ b/roles/kube_prometheus/defaults/main.yml @@ -16,12 +16,13 @@ kube_prometheus_stack_directory: "{{ (project_root_dir, 'kube-prometheus-stack') | path_join }}" kube_prometheus_stack_namespace: monitoring -prometheus_stack_version: 2.42.0 -prometheus_operator_version: 0.63.0 -grafana_version: 9.3.6 +prometheus_stack_version: 2.43.0 +prometheus_operator_version: 0.64.1 +grafana_version: 9.4.7 node_exporter_version: 1.5.0 +kube_state_metrics_version: 2.8.2 -tas_demo_policy_dir: "{{ ('/tmp', 'node-metrics') | path_join }}" +tas_demo_policy_dir: "{{ (project_root_dir, 'tas-demo-policy') | path_join }}" # expose prometheus server API prometheus_srv_expose: false diff --git a/roles/cndp_install/tasks/wait_for_resources.yml b/roles/kube_prometheus/tasks/cleanup.yml similarity index 61% rename from roles/cndp_install/tasks/wait_for_resources.yml rename to roles/kube_prometheus/tasks/cleanup.yml index f3fa6ad3..cc92d06a 100644 --- a/roles/cndp_install/tasks/wait_for_resources.yml +++ b/roles/kube_prometheus/tasks/cleanup.yml @@ -14,18 +14,12 @@ ## limitations under the License. 
## --- -- name: containerd | wait for containerd - command: "ctr images ls -q" - register: containerd_ready - retries: 8 - delay: 4 - until: containerd_ready.rc == 0 - changed_when: false - delegate_to: "{{ groups['kube_control_plane'][0] }}" - run_once: true - when: - - container_runtime == "containerd" +- name: delete previous grafana deployment + command: "kubectl delete -f {{ (kube_prometheus_stack_directory, 'grafana-deployment.yml') | path_join }}" + changed_when: true + failed_when: false -- name: wait for kubernetes - include_role: - name: wait_for_kubernetes_ready +- name: delete previous prometheus deployment + command: "kubectl delete -f {{ (kube_prometheus_stack_directory, 'prometheus-prometheus.yml') | path_join }}" + changed_when: true + failed_when: false diff --git a/roles/kube_prometheus/tasks/main.yml b/roles/kube_prometheus/tasks/main.yml index 27cc25cf..327c6801 100644 --- a/roles/kube_prometheus/tasks/main.yml +++ b/roles/kube_prometheus/tasks/main.yml @@ -14,6 +14,11 @@ ## limitations under the License. 
## --- +- name: remove existing grafana and prometheus deployment + include_tasks: cleanup.yml + when: + - inventory_hostname == groups['kube_control_plane'][0] + - name: install dependencies ansible.builtin.include_role: name: install_dependencies @@ -118,16 +123,23 @@ when: - tas_enable_demo_policy | d(false) -- name: create persistent folder for grafana and prometheus +- name: create persistent folder for prometheus + ansible.builtin.file: + path: "/etc/prometheus" + state: directory + owner: 1000 + group: 1000 + mode: 0744 + when: + - inventory_hostname == groups['kube_node'][0] + +- name: create persistent folder for grafana ansible.builtin.file: - path: "{{ item }}" + path: "/etc/grafana" state: directory - owner: root - group: root + owner: 472 + group: 472 mode: 0744 - loop: - - /etc/grafana - - /etc/prometheus when: - inventory_hostname == groups['kube_node'][0] diff --git a/roles/kube_prometheus/templates/grafana-deployment.yml.j2 b/roles/kube_prometheus/templates/grafana-deployment.yml.j2 index 2206bc28..27569481 100644 --- a/roles/kube_prometheus/templates/grafana-deployment.yml.j2 +++ b/roles/kube_prometheus/templates/grafana-deployment.yml.j2 @@ -220,9 +220,10 @@ spec: nodeSelector: kubernetes.io/os: linux securityContext: - fsGroup: 65534 runAsNonRoot: true - runAsUser: 65534 + runAsGroup: 472 + runAsUser: 472 + fsGroup: 472 serviceAccountName: grafana volumes: - name: grafana-tls diff --git a/roles/kube_prometheus/templates/kubeStateMetrics-deployment.yaml.j2 b/roles/kube_prometheus/templates/kubeStateMetrics-deployment.yaml.j2 index 0053803a..0a1253a3 100644 --- a/roles/kube_prometheus/templates/kubeStateMetrics-deployment.yaml.j2 +++ b/roles/kube_prometheus/templates/kubeStateMetrics-deployment.yaml.j2 @@ -5,7 +5,7 @@ metadata: app.kubernetes.io/component: exporter app.kubernetes.io/name: kube-state-metrics app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.6.0 + app.kubernetes.io/version: {{ kube_state_metrics_version }} 
name: kube-state-metrics namespace: monitoring spec: @@ -23,7 +23,7 @@ spec: app.kubernetes.io/component: exporter app.kubernetes.io/name: kube-state-metrics app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.6.0 + app.kubernetes.io/version: {{ kube_state_metrics_version }} spec: automountServiceAccountToken: true containers: @@ -32,7 +32,7 @@ spec: - --port=8081 - --telemetry-host=127.0.0.1 - --telemetry-port=8082 - image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.6.0 + image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v{{ kube_state_metrics_version }} name: kube-state-metrics resources: limits: diff --git a/roles/kube_prometheus/templates/nodeExporter-daemonset.yml.j2 b/roles/kube_prometheus/templates/nodeExporter-daemonset.yml.j2 index 050eba25..48c84be4 100644 --- a/roles/kube_prometheus/templates/nodeExporter-daemonset.yml.j2 +++ b/roles/kube_prometheus/templates/nodeExporter-daemonset.yml.j2 @@ -146,7 +146,7 @@ spec: name: node-exporter-config {% if tas_enable_demo_policy |d(false) %} - hostPath: - path: /tmp/node-metrics + path: {{ tas_demo_policy_dir }} name: tas-demo-policy {% endif %} updateStrategy: diff --git a/roles/kubespray_check/tasks/check_kubespray_version.yml b/roles/kubespray_check/tasks/check_kubespray_version.yml new file mode 100644 index 00000000..ad6377d4 --- /dev/null +++ b/roles/kubespray_check/tasks/check_kubespray_version.yml @@ -0,0 +1,59 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Load requirements.yml file + set_fact: + ansible_reqs_file: "{{ lookup('file', collections_file) | from_yaml }}" + +- name: Load desired kubespray version + set_fact: + kubespray_version_desired: |- + {{ + (ansible_reqs_file.collections | + selectattr('name', '==', collection_name) | + list)[0].version + }} + +- name: Load kubespray directory state + ansible.builtin.stat: + path: "{{ kubespray_dir }}" + register: kubespray_stat + +- name: Check if kubespray module present + ansible.builtin.assert: + that: kubespray_stat.stat.exists + fail_msg: |- + Kubespray module not installed. + Please install the kubespray module and apply kubespray patch. + +- name: Load current version of kubespray + set_fact: + kubespray_version_current: "{{ lookup('file', version_file, errors='ignore') }}" + +- name: Check if desired version of kubespray present + ansible.builtin.assert: + that: kubespray_version_desired == kubespray_version_current + fail_msg: |- + Wrong kubespray version detected. + Please re-install the kubespray module and apply kubespray patch. + when: + - kubespray_version_current is defined + - kubespray_version_current + +- name: Write kubespray version + ansible.builtin.copy: + dest: "{{ version_file }}" + content: "{{ kubespray_version_desired }}" + mode: preserve diff --git a/roles/kubespray_check/vars/main.yml b/roles/kubespray_check/vars/main.yml new file mode 100644 index 00000000..1329e6d6 --- /dev/null +++ b/roles/kubespray_check/vars/main.yml @@ -0,0 +1,19 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +kubespray_dir: "{{ (role_path, '../..', 'collections/ansible_collections/kubernetes_sigs/kubespray') | path_join }}" +collections_file: "{{ (role_path, '../..', 'collections/requirements.yml') | path_join }}" +collection_name: "https://github.com/kubernetes-sigs/kubespray" +version_file: "{{ (kubespray_dir , 'kubespray_version') | path_join }}" diff --git a/roles/kubespray_mirrors/tasks/main.yml b/roles/kubespray_mirrors/tasks/main.yml index 9711cd8f..5bcefdfd 100644 --- a/roles/kubespray_mirrors/tasks/main.yml +++ b/roles/kubespray_mirrors/tasks/main.yml @@ -15,7 +15,7 @@ ## - name: Patch kubespray url mirrors replace: - path: "{{ playbook_dir }}/kubespray/roles/download/defaults/main.yml" + path: "{{ kubespray_dir }}/roles/download/defaults/main.yml" regexp: "(.*){{ item.original }}(.*)" replace: "\\1{{ item.mirror }}\\2" mode: 0644 diff --git a/roles/kubespray_mirrors/vars/main.yml b/roles/kubespray_mirrors/vars/main.yml new file mode 100644 index 00000000..7cc84a8f --- /dev/null +++ b/roles/kubespray_mirrors/vars/main.yml @@ -0,0 +1,16 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +kubespray_dir: "{{ (role_path, '../../collections/ansible_collections/kubernetes_sigs/kubespray') | path_join }}" diff --git a/roles/kubespray_patch/files/kubespray_crio.patch b/roles/kubespray_patch/files/kubespray_crio.patch index 5b06a698..d10d598a 100644 --- a/roles/kubespray_patch/files/kubespray_crio.patch +++ b/roles/kubespray_patch/files/kubespray_crio.patch @@ -1,8 +1,8 @@ diff --git a/roles/container-engine/cri-o/tasks/cleanup.yaml b/roles/container-engine/cri-o/tasks/cleanup.yaml -index 28c0c3af2..c7e9237bc 100644 +index ab06ca01a..c7e9237bc 100644 --- a/roles/container-engine/cri-o/tasks/cleanup.yaml +++ b/roles/container-engine/cri-o/tasks/cleanup.yaml -@@ -9,6 +9,11 @@ +@@ -9,6 +9,10 @@ apt_key: url: "https://{{ crio_download_base }}/{{ crio_kubic_debian_repo_name }}/Release.key" state: absent @@ -10,7 +10,6 @@ index 28c0c3af2..c7e9237bc 100644 + until: kubic_repo_key_result is succeeded + retries: 4 + delay: "{{ retry_stagger | d(3) }}" -+ environment: "{{ proxy_env | d({}) }}" + environment: "{{ proxy_env }}" when: crio_kubic_debian_repo_name is defined - - - name: Remove legacy CRI-O kubic apt repo + diff --git a/roles/kubespray_patch/files/kubespray_delay_wait.patch b/roles/kubespray_patch/files/kubespray_delay_wait.patch new file mode 100644 index 00000000..33986ee1 --- /dev/null +++ b/roles/kubespray_patch/files/kubespray_delay_wait.patch @@ -0,0 +1,30 @@ +diff --git a/roles/kubernetes/preinstall/handlers/main.yml b/roles/kubernetes/preinstall/handlers/main.yml +index 631ea743e..333930b3e 100644 +--- a/roles/kubernetes/preinstall/handlers/main.yml ++++ b/roles/kubernetes/preinstall/handlers/main.yml +@@ -9,6 +9,7 @@ + - Preinstall | restart kube-controller-manager crio/containerd + - Preinstall | restart kube-apiserver docker + - Preinstall | restart kube-apiserver crio/containerd ++ - Preinstall | delay wait for the apiserver to be 
running + - Preinstall | wait for the apiserver to be running + when: not ansible_os_family in ["Flatcar", "Flatcar Container Linux by Kinvolk"] and not is_fedora_coreos + +@@ -107,6 +108,17 @@ + - resolvconf_mode == 'host_resolvconf' + - kube_apiserver_set.stat.exists + ++# Ensure apiserver is already restarting before wait is started ++- name: Preinstall | delay wait for the apiserver to be running ++ pause: ++ seconds: 5 ++ when: ++ - container_manager == "docker" ++ - inventory_hostname in groups['kube_control_plane'] ++ - dns_mode != 'none' ++ - resolvconf_mode == 'host_resolvconf' ++ - kube_apiserver_set.stat.exists ++ + # When running this as the last phase ensure we wait for kube-apiserver to come up + - name: Preinstall | wait for the apiserver to be running + uri: diff --git a/roles/kubespray_patch/files/kubespray_dnsstublistener.patch b/roles/kubespray_patch/files/kubespray_dnsstublistener.patch new file mode 100644 index 00000000..03e09923 --- /dev/null +++ b/roles/kubespray_patch/files/kubespray_dnsstublistener.patch @@ -0,0 +1,13 @@ +diff --git a/roles/kubernetes/preinstall/templates/resolved.conf.j2 b/roles/kubernetes/preinstall/templates/resolved.conf.j2 +index 901fd2473..50f3886e0 100644 +--- a/roles/kubernetes/preinstall/templates/resolved.conf.j2 ++++ b/roles/kubernetes/preinstall/templates/resolved.conf.j2 +@@ -14,7 +14,7 @@ Domains={{ searchdomains|default([]) | join(' ') }} + #MulticastDNS=no + DNSSEC=no + Cache=no-negative +-{% if ansible_os_family in ["Flatcar", "Flatcar Container Linux by Kinvolk"] %} ++{% if ansible_os_family in ["Debian", "RedHat", "Flatcar", "Flatcar Container Linux by Kinvolk"] %} + DNSStubListener=no + {% else %} + #DNSStubListener=yes diff --git a/roles/kubespray_patch/tasks/load_checksum.yml b/roles/kubespray_patch/tasks/load_checksum.yml new file mode 100644 index 00000000..880548e3 --- /dev/null +++ b/roles/kubespray_patch/tasks/load_checksum.yml @@ -0,0 +1,25 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Load files of kubespray_patch role + find: + paths: "{{ role_path }}" + recurse: true + get_checksum: true + register: patch_files + +- name: Calculate checksum of kubespray_patch role + set_fact: + patch_checksum: "{{ patch_files.files | map(attribute='checksum') | checksum }}" diff --git a/roles/kubespray_patch/tasks/main.yml b/roles/kubespray_patch/tasks/main.yml index 7c696f74..4e4328c4 100644 --- a/roles/kubespray_patch/tasks/main.yml +++ b/roles/kubespray_patch/tasks/main.yml @@ -16,25 +16,34 @@ --- # Needs to be fixed in Kubespray Repo. 
- name: skip RHEL subscription manager processing - replace: - path: "{{ playbook_dir }}/kubespray/roles/bootstrap-os/tasks/main.yml" + ansible.builtin.replace: + path: "{{ kubespray_dir }}/roles/bootstrap-os/tasks/main.yml" regexp: "(.*)bootstrap-redhat.yml(.*)" replace: "\\1bootstrap-centos.yml\\2" mode: 0600 - when: target_distribution == "RedHat" - -- name: check Host Vars for WA - stat: - path: "{{ inventory_dir }}/host_vars/{{ item }}.yml" - register: host_vars_details - with_items: "{{ groups['vm_host'] | default([]) }}" - -- name: read Host Vars for WA - include_vars: "{{ item.stat.path }}" - with_items: "{{ host_vars_details.results }}" - when: item.stat.exists - name: apply kubespray patch for crio cleanup ansible.posix.patch: src: "files/kubespray_crio.patch" - dest: "{{ playbook_dir }}/kubespray/roles/container-engine/cri-o/tasks/cleanup.yaml" + dest: "{{ kubespray_dir }}/roles/container-engine/cri-o/tasks/cleanup.yaml" + +- name: apply kubespray patch to delay wait for apiserver after restart + ansible.posix.patch: + src: "files/kubespray_delay_wait.patch" + dest: "{{ kubespray_dir }}/roles/kubernetes/preinstall/handlers/main.yml" + +- name: apply kubespray patch for DNSStubListener + ansible.posix.patch: + src: "files/kubespray_dnsstublistener.patch" + dest: "{{ kubespray_dir }}/roles/kubernetes/preinstall/templates/resolved.conf.j2" + when: + - dns_disable_stub_listener | default(true) | bool + +- name: Load patch checksum + ansible.builtin.include_tasks: load_checksum.yml + +- name: Write checksum to kubespray directory + ansible.builtin.copy: + dest: "{{ patch_checksum_file }}" + content: "{{ patch_checksum }}" + mode: preserve diff --git a/roles/kubespray_patch/tasks/preflight_checksum.yml b/roles/kubespray_patch/tasks/preflight_checksum.yml new file mode 100644 index 00000000..a73f939e --- /dev/null +++ b/roles/kubespray_patch/tasks/preflight_checksum.yml @@ -0,0 +1,36 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Load patch checksum + ansible.builtin.include_tasks: load_checksum.yml + +- name: Load checksum from applied patch + set_fact: + applied_patch_checksum: "{{ lookup('file', patch_checksum_file, errors='ignore') }}" + +- name: Check if checksum of applied patch exists + ansible.builtin.assert: + that: + - applied_patch_checksum is defined + - applied_patch_checksum + fail_msg: |- + Kubespray patch is not applied. Please apply kubespray patch before running playbooks. + +- name: Compare checksums + ansible.builtin.assert: + that: patch_checksum == applied_patch_checksum + fail_msg: |- + Code of kubespray patch checksum is different from currently applied patch checksum. + Please re-install the kubespray module and apply new kubespray patch. diff --git a/roles/intel_ai/tasks/main.yml b/roles/kubespray_patch/vars/main.yml similarity index 76% rename from roles/intel_ai/tasks/main.yml rename to roles/kubespray_patch/vars/main.yml index 6a3d4167..3da59f80 100644 --- a/roles/intel_ai/tasks/main.yml +++ b/roles/kubespray_patch/vars/main.yml @@ -13,9 +13,5 @@ ## See the License for the specific language governing permissions and ## limitations under the License. 
## -- name: install Media Analytics - import_tasks: intel_ai_install.yml - when: - - kubernetes - - intel_ai_enabled | default(false) - - inventory_hostname == groups['kube_control_plane'][0] +kubespray_dir: "{{ (role_path, '../..', 'collections/ansible_collections/kubernetes_sigs/kubespray') | path_join }}" +patch_checksum_file: "{{ (kubespray_dir, 'ra_patch_checksum') | path_join }}" diff --git a/roles/kubespray_target_setup/templates/multus.conf.j2 b/roles/kubespray_target_setup/templates/multus.conf.j2 index 726afe3a..0b3a673e 100644 --- a/roles/kubespray_target_setup/templates/multus.conf.j2 +++ b/roles/kubespray_target_setup/templates/multus.conf.j2 @@ -39,10 +39,17 @@ {% if kube_network_plugin == "flannel" %} "type": "flannel", "name": "flannel.1", - "delegate": { - "isDefaultGateway": true, - "hairpinMode": true - } + "delegate": { + "isDefaultGateway": true, + "hairpinMode": true + } +{% endif %} +{% if kube_network_plugin == "cilium" %} + "cniVersion": "0.3.1", + "name": "cilium", + "type": "cilium-cni", + "enable-debug": true, + "log-file": "/var/run/cilium/cilium-cni.log" {% endif %} }, { diff --git a/roles/linkerd_service_mesh/defaults/main.yml b/roles/linkerd_service_mesh/defaults/main.yml index bd5593c8..1d03db2d 100644 --- a/roles/linkerd_service_mesh/defaults/main.yml +++ b/roles/linkerd_service_mesh/defaults/main.yml @@ -17,7 +17,7 @@ # defaults file for linkerd-cli linkerd_cli_arch: "amd64" linkerd_release: "stable" -linkerd_version: "2.12.4" +linkerd_version: "2.13.3" linkerd_cli_version: "{{ linkerd_version }}" linkerd_cli_uri: "https://github.com/linkerd/linkerd2/releases/download/{{ linkerd_release }}-{{ linkerd_cli_version }}/\ diff --git a/roles/linkerd_service_mesh/tasks/uninstall.yml b/roles/linkerd_service_mesh/tasks/uninstall.yml index 8be27d49..e744a6a7 100644 --- a/roles/linkerd_service_mesh/tasks/uninstall.yml +++ b/roles/linkerd_service_mesh/tasks/uninstall.yml @@ -17,7 +17,7 @@ - name: Uninstall LinkerD when: inventory_hostname == 
groups["kube_control_plane"][0] tags: - - linkerd_service_mesh + - linkerd-service-mesh block: - name: Uninstall LinkerD control plane Helm Chart kubernetes.core.helm: diff --git a/roles/load_ddp/tasks/load_ice_ddp.yml b/roles/load_ddp/tasks/load_ice_ddp.yml index 9a127546..aae91211 100644 --- a/roles/load_ddp/tasks/load_ice_ddp.yml +++ b/roles/load_ddp/tasks/load_ice_ddp.yml @@ -28,6 +28,11 @@ when: item.bus_info is not regex (".*0$") loop: "{{ dataplane_interfaces }}" +- name: check if irdma is loaded + ansible.builtin.command: "lsmod" + register: ldddp_lsmod + changed_when: false + - name: template the ddp-ice systemd service template: src: ddp_ice_service.j2 diff --git a/roles/load_ddp/templates/ddp_ice_service.j2 b/roles/load_ddp/templates/ddp_ice_service.j2 index 33eb04e2..167016a7 100644 --- a/roles/load_ddp/templates/ddp_ice_service.j2 +++ b/roles/load_ddp/templates/ddp_ice_service.j2 @@ -5,9 +5,7 @@ AssertPathExists=/sbin/modprobe [Service] Type=oneshot ExecStartPre=/bin/sleep 5 -{% if (not update_nic_drivers and -((ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or -(ansible_os_family == "RedHat" and ansible_distribution_version >= "8.6"))) %} +{% if 'irdma' in ldddp_lsmod.stdout %} ExecStart=/sbin/modprobe -r irdma ice ExecStart=/sbin/modprobe -a ice irdma {% else %} diff --git a/roles/net_attach_defs_create/tasks/main.yml b/roles/net_attach_defs_create/tasks/main.yml index d7566093..22b6e023 100644 --- a/roles/net_attach_defs_create/tasks/main.yml +++ b/roles/net_attach_defs_create/tasks/main.yml @@ -67,21 +67,3 @@ - example_net_attach_defs is defined - example_net_attach_defs.sriov_net_dp | default(false) | bool - inventory_hostname == groups['kube_control_plane'][0] - -- name: create net-attach-def object to be used with CNDP device plugin - block: - - name: create directory for CNDP k8s manifest files - file: - path: "{{ cndp_k8s_manifest_dir }}" - state: directory - mode: "644" - - - name: create CNDP network attachment 
definitions - include_tasks: cndp_net_attach_def.yml - - when: - - cndp_dp_enabled | default(false) - - cndp_net_attach_def_enabled | default(false) - - inventory_hostname == groups['kube_control_plane'][0] - - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "20.04") or - (ansible_os_family == "RedHat" and ansible_distribution_version >= "8.5") diff --git a/roles/net_attach_defs_create/templates/cndp.yml.j2 b/roles/net_attach_defs_create/templates/cndp.yml.j2 deleted file mode 100644 index ff08c28e..00000000 --- a/roles/net_attach_defs_create/templates/cndp.yml.j2 +++ /dev/null @@ -1,30 +0,0 @@ ---- -apiVersion: "k8s.cni.cncf.io/v1" -kind: NetworkAttachmentDefinition -metadata: - name: {{ cndp_net_attach_def_conf.name | default("cndp-cni-e2e") }} - namespace: default - annotations: - k8s.v1.cni.cncf.io/resourceName: afxdp/raPool -spec: - config: '{ - "cniVersion": "{{ cndp_cni_version }}", - "type": "afxdp", - "mode": "primary", - "logFile": "afxdp-cni.log", - "logLevel": "debug", -{% if not (on_vms | default(false) | bool) %} - "queues": "4", -{% endif %} - "ipam": { - "type": "host-local", - "subnet": "{{ cndp_net_attach_def_conf.ipam.subnet | default('192.168.1.0/24') }}", - "rangeStart": "{{ cndp_net_attach_def_conf.ipam.rangeStart | default('192.168.1.200') }}", - "rangeEnd": "{{ cndp_net_attach_def_conf.ipam.rangeEnd | default('192.168.1.220') }}", - "routes": [ - { "dst": "0.0.0.0/0" } - ], - "gateway": "{{ cndp_net_attach_def_conf.ipam.gateway | default('192.168.1.1') }}" - } - }' - diff --git a/roles/net_attach_defs_create/vars/main.yml b/roles/net_attach_defs_create/vars/main.yml index f0251db5..c101cbb3 100644 --- a/roles/net_attach_defs_create/vars/main.yml +++ b/roles/net_attach_defs_create/vars/main.yml @@ -15,4 +15,3 @@ ## --- userspace_cni_dir: "{{ (project_root_dir, 'userspace_cni_manifests') | path_join }}" -cndp_k8s_manifest_dir: "{{ (project_root_dir, 'cndp_k8s_manifest') | path_join }}" diff --git 
a/roles/nfd_install/charts/node-feature-discovery/crds/nfd-api-crds.yaml b/roles/nfd_install/charts/node-feature-discovery/crds/nfd-api-crds.yaml index ab6c8178..99f6d431 100644 --- a/roles/nfd_install/charts/node-feature-discovery/crds/nfd-api-crds.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/crds/nfd-api-crds.yaml @@ -18,7 +18,7 @@ apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: annotations: - controller-gen.kubebuilder.io/version: v0.9.2 + controller-gen.kubebuilder.io/version: v0.11.3 creationTimestamp: null name: nodefeatures.nfd.k8s-sigs.io spec: @@ -67,6 +67,8 @@ spec: required: - elements type: object + description: Attributes contains all the attribute-type features + of the node. type: object flags: additionalProperties: @@ -82,6 +84,8 @@ spec: required: - elements type: object + description: Flags contains all the flag-type features of the + node. type: object instances: additionalProperties: @@ -104,11 +108,9 @@ spec: required: - elements type: object + description: Instances contains all the instance-type features + of the node. type: object - required: - - attributes - - flags - - instances type: object labels: additionalProperties: @@ -116,8 +118,6 @@ spec: description: Labels is the set of node labels that are requested to be created. type: object - required: - - features type: object required: - spec @@ -129,7 +129,7 @@ apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: annotations: - controller-gen.kubebuilder.io/version: v0.9.2 + controller-gen.kubebuilder.io/version: v0.11.3 creationTimestamp: null name: nodefeaturerules.nfd.k8s-sigs.io spec: @@ -170,6 +170,11 @@ spec: description: Rule defines a rule for node customization such as labeling. properties: + extendedResources: + additionalProperties: + type: string + description: ExtendedResources to create if the rule matches. 
+ type: object labels: additionalProperties: type: string diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml index 3dd6f6f3..9e75927e 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml @@ -10,20 +10,12 @@ rules: - "" resources: - nodes -{{- if .Values.master.resourceLabels | empty | not }} - nodes/status -{{- end }} verbs: - get - patch - update - list -- apiGroups: - - "" - resources: - - nodes/proxy - verbs: - - get - apiGroups: - nfd.k8s-sigs.io resources: @@ -36,7 +28,7 @@ rules: {{- end }} --- -{{- if .Values.topologyUpdater.rbac.create }} +{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.rbac.create }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: @@ -51,6 +43,12 @@ rules: verbs: - get - list +- apiGroups: + - "" + resources: + - nodes/proxy + verbs: + - get - apiGroups: - "" resources: @@ -66,3 +64,34 @@ rules: - get - update {{- end }} + +--- +{{- if and .Values.topologyGC.enable .Values.topologyGC.rbac.create .Values.topologyUpdater.enable }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-gc + labels: + {{- include "node-feature-discovery.labels" . 
| nindent 4 }} +rules: +- apiGroups: + - "" + resources: + - nodes + verbs: + - list + - watch +- apiGroups: + - "" + resources: + - nodes/proxy + verbs: + - get +- apiGroups: + - topology.node.k8s.io + resources: + - noderesourcetopologies + verbs: + - delete + - list +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml index 5bceb41e..227bce5e 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml @@ -32,3 +32,21 @@ subjects: name: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }} namespace: {{ include "node-feature-discovery.namespace" . }} {{- end }} + +--- +{{- if and .Values.topologyGC.enable .Values.topologyGC.rbac.create .Values.topologyUpdater.enable }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-gc + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ include "node-feature-discovery.fullname" . }}-topology-gc +subjects: +- kind: ServiceAccount + name: {{ .Values.topologyGC.serviceAccount.name | default "nfd-topology-gc" }} + namespace: {{ include "node-feature-discovery.namespace" . }} +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml index 26a326cb..264e3bb7 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml @@ -27,6 +27,7 @@ spec: {{- toYaml . 
| nindent 8 }} {{- end }} serviceAccountName: {{ include "node-feature-discovery.master.serviceAccountName" . }} + enableServiceLinks: false securityContext: {{- toYaml .Values.master.podSecurityContext | nindent 8 }} containers: @@ -39,7 +40,7 @@ spec: exec: command: - "/usr/bin/grpc_health_probe" - - "-addr=:8080" + - "-addr=:{{ .Values.master.port | default "8080" }}" {{- if .Values.tls.enable }} - "-tls" - "-tls-ca-cert=/etc/kubernetes/node-feature-discovery/certs/ca.crt" @@ -52,7 +53,7 @@ spec: exec: command: - "/usr/bin/grpc_health_probe" - - "-addr=:8080" + - "-addr=:{{ .Values.master.port | default "8080" }}" {{- if .Values.tls.enable }} - "-tls" - "-tls-ca-cert=/etc/kubernetes/node-feature-discovery/certs/ca.crt" @@ -63,7 +64,7 @@ spec: periodSeconds: 10 failureThreshold: 10 ports: - - containerPort: 8080 + - containerPort: {{ .Values.master.port | default "8080" }} name: grpc env: - name: NODE_NAME @@ -76,16 +77,23 @@ spec: {{- toYaml .Values.master.resources | nindent 12 }} args: {{- if .Values.master.instance | empty | not }} - - "--instance={{ .Values.master.instance }}" + - "-instance={{ .Values.master.instance }}" {{- end }} + - "-port={{ .Values.master.port | default "8080" }}" {{- if .Values.enableNodeFeatureApi }} - "-enable-nodefeature-api" {{- end }} {{- if .Values.master.extraLabelNs | empty | not }} - - "--extra-label-ns={{- join "," .Values.master.extraLabelNs }}" + - "-extra-label-ns={{- join "," .Values.master.extraLabelNs }}" + {{- end }} + {{- if .Values.master.denyLabelNs | empty | not }} + - "-deny-label-ns={{- join "," .Values.master.denyLabelNs }}" {{- end }} {{- if .Values.master.resourceLabels | empty | not }} - - "--resource-labels={{- join "," .Values.master.resourceLabels }}" + - "-resource-labels={{- join "," .Values.master.resourceLabels }}" + {{- end }} + {{- if .Values.master.enableTaints }} + - "-enable-taints" {{- end }} {{- if .Values.master.crdController | kindIs "invalid" | not }} - "-crd-controller={{ 
.Values.master.crdController }}" @@ -96,20 +104,36 @@ spec: {{- if .Values.master.featureRulesController | kindIs "invalid" | not }} - "-featurerules-controller={{ .Values.master.featureRulesController }}" {{- end }} - {{- if .Values.tls.enable }} - - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" - - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" - - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- if .Values.master.resyncPeriod }} + - "-resync-period={{ .Values.master.resyncPeriod }}" + {{- end }} + {{- if .Values.tls.enable }} + - "-ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "-key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "-cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- end }} volumeMounts: + {{- if .Values.tls.enable }} - name: nfd-master-cert mountPath: "/etc/kubernetes/node-feature-discovery/certs" readOnly: true + {{- end }} + - name: nfd-master-conf + mountPath: "/etc/kubernetes/node-feature-discovery" + readOnly: true volumes: + {{- if .Values.tls.enable }} - name: nfd-master-cert secret: secretName: nfd-master-cert - ## /TLS ## - {{- end }} + {{- end }} + - name: nfd-master-conf + configMap: + name: {{ include "node-feature-discovery.fullname" . }}-master-conf + items: + - key: nfd-master.conf + path: nfd-master.conf + {{- with .Values.master.nodeSelector }} nodeSelector: {{- toYaml . | nindent 8 }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/nfd-master-conf.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/nfd-master-conf.yaml new file mode 100644 index 00000000..c806a8e5 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/nfd-master-conf.yaml @@ -0,0 +1,10 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-master-conf + namespace: {{ include "node-feature-discovery.namespace" . 
}} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +data: + nfd-master.conf: |- + {{- .Values.master.config | toYaml | nindent 4 }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml index 6731ca43..0d478981 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml @@ -9,7 +9,7 @@ metadata: spec: type: {{ .Values.master.service.type }} ports: - - port: {{ .Values.master.service.port }} + - port: {{ .Values.master.service.port | default "8080" }} targetPort: grpc protocol: TCP name: grpc diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml index 883e5daa..022961e4 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml @@ -27,6 +27,21 @@ metadata: {{- end }} {{- end }} +--- +{{- if and .Values.topologyGC.enable .Values.topologyGC.serviceAccount.create .Values.topologyUpdater.enable }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ .Values.topologyGC.serviceAccount.name | default "nfd-topology-gc" }} + namespace: {{ include "node-feature-discovery.namespace" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + {{- with .Values.topologyUpdater.serviceAccount.annotations }} + annotations: + {{- toYaml . 
| nindent 4 }} + {{- end }} +{{- end }} + --- {{- if .Values.worker.serviceAccount.create }} apiVersion: v1 diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/topology-gc.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/topology-gc.yaml new file mode 100644 index 00000000..642fec45 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/topology-gc.yaml @@ -0,0 +1,64 @@ +{{- if and .Values.topologyGC.enable .Values.topologyUpdater.enable -}} +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-gc + namespace: {{ include "node-feature-discovery.namespace" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + role: topology-gc +spec: + replicas: {{ .Values.topologyGC.replicaCount | default 1 }} + selector: + matchLabels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 6 }} + role: topology-gc + template: + metadata: + labels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 8 }} + role: topology-gc + annotations: + {{- toYaml .Values.topologyGC.annotations | nindent 8 }} + spec: + serviceAccountName: {{ .Values.topologyGC.serviceAccountName | default "nfd-topology-gc" }} + dnsPolicy: ClusterFirstWithHostNet + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . 
| nindent 8 }} + {{- end }} + securityContext: + {{- toYaml .Values.topologyGC.podSecurityContext | nindent 8 }} + containers: + - name: topology-gc + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: "{{ .Values.image.pullPolicy }}" + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + command: + - "nfd-topology-gc" + args: + {{- if .Values.topologyGC.interval | empty | not }} + - "-gc-interval={{ .Values.topologyGC.interval }}" + {{- end }} + resources: + {{- toYaml .Values.topologyGC.resources | nindent 12 }} + securityContext: + {{- toYaml .Values.topologyGC.securityContext | nindent 12 }} + + {{- with .Values.topologyGC.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.topologyGC.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.topologyGC.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml index cf5daf27..f5a1c6e3 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml @@ -4,7 +4,7 @@ kind: CustomResourceDefinition metadata: annotations: api-approved.kubernetes.io: https://github.com/kubernetes/enhancements/pull/1870 - controller-gen.kubebuilder.io/version: v0.7.0 + controller-gen.kubebuilder.io/version: v0.11.2 creationTimestamp: null name: noderesourcetopologies.topology.node.k8s.io spec: @@ -135,6 +135,139 @@ spec: - zones type: object served: true +storage: false + - name: v1alpha2 + schema: + openAPIV3Schema: + description: NodeResourceTopology describes node resources and their topology. 
+ properties: + apiVersion: + description: 'APIVersion defines the versioned schema of this representation + of an object. Servers should convert recognized schemas to the latest + internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' + type: string + attributes: + description: AttributeList contains an array of AttributeInfo objects. + items: + description: AttributeInfo contains one attribute of a Zone. + properties: + name: + type: string + value: + type: string + required: + - name + - value + type: object + type: array + kind: + description: 'Kind is a string value representing the REST resource this + object represents. Servers may infer this from the endpoint the client + submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + metadata: + type: object + topologyPolicies: + description: 'DEPRECATED (to be removed in v1beta1): use top level attributes + if needed' + items: + type: string + type: array + zones: + description: ZoneList contains an array of Zone objects. + items: + description: Zone represents a resource topology zone, e.g. socket, + node, die or core. + properties: + attributes: + description: AttributeList contains an array of AttributeInfo objects. + items: + description: AttributeInfo contains one attribute of a Zone. + properties: + name: + type: string + value: + type: string + required: + - name + - value + type: object + type: array + costs: + description: CostList contains an array of CostInfo objects. + items: + description: CostInfo describes the cost (or distance) between + two Zones. 
+ properties: + name: + type: string + value: + format: int64 + type: integer + required: + - name + - value + type: object + type: array + name: + type: string + parent: + type: string + resources: + description: ResourceInfoList contains an array of ResourceInfo + objects. + items: + description: ResourceInfo contains information about one resource + type. + properties: + allocatable: + anyOf: + - type: integer + - type: string + description: Allocatable quantity of the resource, corresponding + to allocatable in node status, i.e. total amount of this + resource available to be used by pods. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + available: + anyOf: + - type: integer + - type: string + description: Available is the amount of this resource currently + available for new (to be scheduled) pods, i.e. Allocatable + minus the resources reserved by currently running pods. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + capacity: + anyOf: + - type: integer + - type: string + description: Capacity of the resource, corresponding to capacity + in node status, i.e. total amount of this resource that + the node has. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + name: + description: Name of the resource. 
+ type: string + required: + - allocatable + - available + - capacity + - name + type: object + type: array + type: + type: string + required: + - name + - type + type: object + type: array + required: + - zones + type: object + served: true storage: true status: acceptedNames: diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml index 2f28b969..4963a52b 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml @@ -41,27 +41,40 @@ spec: - "nfd-topology-updater" args: {{- if .Values.topologyUpdater.updateInterval | empty | not }} - - "--sleep-interval={{ .Values.topologyUpdater.updateInterval }}" + - "-sleep-interval={{ .Values.topologyUpdater.updateInterval }}" {{- else }} - - "--sleep-interval=3s" + - "-sleep-interval=3s" {{- end }} {{- if .Values.topologyUpdater.watchNamespace | empty | not }} - - "--watch-namespace={{ .Values.topologyUpdater.watchNamespace }}" + - "-watch-namespace={{ .Values.topologyUpdater.watchNamespace }}" {{- else }} - - "--watch-namespace=*" + - "-watch-namespace=*" {{- end }} {{- if .Values.tls.enable }} - - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" - - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" - - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + - "-ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "-key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "-cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- end }} + {{- if .Values.topologyUpdater.podSetFingerprint }} + - "-pods-fingerprint" + {{- end }} + {{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }} + - "-kubelet-config-uri=file:///host-var/kubelet-config" {{- end }} volumeMounts: + {{- if 
.Values.topologyUpdater.kubeletConfigPath | empty | not }} - name: kubelet-config - mountPath: /host-var/lib/kubelet/config.yaml + mountPath: /host-var/kubelet-config + {{- end }} - name: kubelet-podresources-sock mountPath: /host-var/lib/kubelet/pod-resources/kubelet.sock - name: host-sys mountPath: /host-sys + {{- if .Values.topologyUpdater.kubeletStateDir | empty | not }} + - name: kubelet-state-files + mountPath: /host-var/lib/kubelet + readOnly: true + {{- end }} {{- if .Values.tls.enable }} - name: nfd-topology-updater-cert mountPath: "/etc/kubernetes/node-feature-discovery/certs" @@ -79,13 +92,11 @@ spec: - name: host-sys hostPath: path: "/sys" + {{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }} - name: kubelet-config hostPath: - {{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }} path: {{ .Values.topologyUpdater.kubeletConfigPath }} - {{- else }} - path: /var/lib/kubelet/config.yaml - {{- end }} + {{- end }} - name: kubelet-podresources-sock hostPath: {{- if .Values.topologyUpdater.kubeletPodResourcesSockPath | empty | not }} @@ -93,6 +104,11 @@ spec: {{- else }} path: /var/lib/kubelet/pod-resources/kubelet.sock {{- end }} + {{- if .Values.topologyUpdater.kubeletStateDir | empty | not }} + - name: kubelet-state-files + hostPath: + path: {{ .Values.topologyUpdater.kubeletStateDir }} + {{- end }} - name: nfd-topology-updater-conf configMap: name: {{ include "node-feature-discovery.fullname" . }}-topology-updater-conf diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml index e723cc5c..c1240bdc 100644 --- a/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml @@ -45,14 +45,14 @@ spec: command: - "nfd-worker" args: - - "--server={{ include "node-feature-discovery.fullname" . 
}}-master:{{ .Values.master.service.port }}" + - "-server={{ include "node-feature-discovery.fullname" . }}-master:{{ .Values.master.service.port }}" {{- if .Values.enableNodeFeatureApi }} - "-enable-nodefeature-api" {{- end }} {{- if .Values.tls.enable }} - - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" - - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" - - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + - "-ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "-key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "-cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" {{- end }} volumeMounts: - name: host-boot @@ -67,6 +67,9 @@ spec: - name: host-usr-lib mountPath: "/host-usr/lib" readOnly: true + - name: host-lib + mountPath: "/host-lib" + readOnly: true {{- if .Values.worker.mountUsrSrc }} - name: host-usr-src mountPath: "/host-usr/src" @@ -99,6 +102,9 @@ spec: - name: host-usr-lib hostPath: path: "/usr/lib" + - name: host-lib + hostPath: + path: "/lib" {{- if .Values.worker.mountUsrSrc }} - name: host-usr-src hostPath: diff --git a/roles/nfd_install/charts/node-feature-discovery/values.yaml b/roles/nfd_install/charts/node-feature-discovery/values.yaml index ca406d45..ee9b25e1 100644 --- a/roles/nfd_install/charts/node-feature-discovery/values.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/values.yaml @@ -28,10 +28,24 @@ namespaceOverride: "" enableNodeFeatureApi: false master: + config: ### + # noPublish: false + # extraLabelNs: ["added.ns.io","added.kubernets.io"] + # denyLabelNs: ["denied.ns.io","denied.kubernetes.io"] + # resourceLabels: ["vendor-1.com/feature-1","vendor-2.io/feature-2"] + # enableTaints: false + # labelWhiteList: "foo" + # resyncPeriod: "2h" + ### + # The TCP port that nfd-master listens for incoming requests. 
Default: 8080 + port: 8080 instance: featureApi: + resyncPeriod: + denyLabelNs: [] extraLabelNs: [] resourceLabels: [] + enableTaints: false crdController: null featureRulesController: null deploymentAnnotations: {} @@ -411,6 +425,7 @@ topologyUpdater: kubeletPodResourcesSockPath: updateInterval: 60s watchNamespace: "*" + kubeletStateDir: /host-var/lib/kubelet podSecurityContext: {} securityContext: @@ -436,6 +451,45 @@ topologyUpdater: tolerations: [] annotations: {} affinity: {} + podSetFingerprint: true + +topologyGC: + enable: true + replicaCount: 1 + + serviceAccount: + create: true + annotations: {} + name: + rbac: + create: true + + interval: 1h + + podSecurityContext: {} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsNonRoot: true + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. 
+ # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + tolerations: [] + annotations: {} + affinity: {} # Optionally use encryption for worker <--> master comms # TODO: verify hostname is not yet supported diff --git a/roles/nfd_install/defaults/main.yml b/roles/nfd_install/defaults/main.yml index 01e931ee..ff0e4f59 100644 --- a/roles/nfd_install/defaults/main.yml +++ b/roles/nfd_install/defaults/main.yml @@ -15,7 +15,7 @@ ## --- nfd_image: "registry.k8s.io/nfd/node-feature-discovery" -nfd_image_tag: "v0.12.1-minimal" +nfd_image_tag: "v0.13.1-minimal" nfd_namespace: "kube-system" diff --git a/roles/nfd_install/templates/node-feature-rules.yml.j2 b/roles/nfd_install/templates/node-feature-rules.yml.j2 index 7c5a7480..e847b36c 100644 --- a/roles/nfd_install/templates/node-feature-rules.yml.j2 +++ b/roles/nfd_install/templates/node-feature-rules.yml.j2 @@ -55,10 +55,12 @@ spec: vendor: {op: In, value: ["8086"]} device: {op: In, value: {{ qat_supported_pf_dev_ids | list + qat_supported_vf_dev_ids | list }}} class: {op: In, value: ["0b40"]} +{% if not (on_vms | default(false) and not update_qat_drivers | default(false) and configure_qat | default(false)) %} - feature: kernel.loadedmodule matchExpressions: intel_qat: {op: Exists} {% endif %} +{% endif %} {% if sgx_dp_enabled | d(false) %} - name: "intel.sgx" labels: diff --git a/roles/openssl_engine_install/defaults/main.yml b/roles/openssl_engine_install/defaults/main.yml index ae942e82..da1b6bbe 100644 --- a/roles/openssl_engine_install/defaults/main.yml +++ b/roles/openssl_engine_install/defaults/main.yml @@ -16,9 +16,9 @@ --- openssl_engine_dir: "{{ (project_root_dir, 'openssl') | path_join }}" openssl_engine_url: "https://github.com/intel/QAT_Engine.git" -openssl_engine_version: "v0.6.19" +openssl_engine_version: "v1.2.0" libarchive_url: "https://github.com/libarchive/libarchive/releases/download/v3.5.1/libarchive-3.5.1.tar.xz" ipp_crypto_url: 
"https://github.com/intel/ipp-crypto.git" -ipp_crypto_version: "ippcp_2021.7" +ipp_crypto_version: "ippcp_2021.7.1" intel_ipsec_url: "https://github.com/intel/intel-ipsec-mb.git" intel_ipsec_version: "v1.3" diff --git a/roles/opentelemetry_install/defaults/main.yml b/roles/opentelemetry_install/defaults/main.yml index 9d54c5d6..7d78e5c0 100644 --- a/roles/opentelemetry_install/defaults/main.yml +++ b/roles/opentelemetry_install/defaults/main.yml @@ -17,4 +17,10 @@ opentelemetry_repo: "https://open-telemetry.github.io/opentelemetry-helm-charts" opentelemetry_operator_namespace: "monitoring" opentelemetry_operator_chart_name: "opentelemetry-operator" -opentelemetry_operator_chart_version: "0.24.0" +opentelemetry_operator_chart_version: "0.27.0" + +opentelemetry_collectors: + gateway: true + cadvisor: "{{ cadvisor_enabled | default(false) }}" + telegraf: "{{ telegraf_enabled | default(false) }}" + elasticsearch: "{{ elasticsearch_enabled | default(false) }}" diff --git a/roles/opentelemetry_install/files/otel-agent-cadvisor-certs.yaml b/roles/opentelemetry_install/files/otel-agent-cadvisor-certs.yaml new file mode 100644 index 00000000..2a9be3a6 --- /dev/null +++ b/roles/opentelemetry_install/files/otel-agent-cadvisor-certs.yaml @@ -0,0 +1,25 @@ +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: otel-agent-cadvisor + namespace: monitoring +spec: + secretName: otel-agent-cadvisor + dnsNames: + - otel-gateway-collector-headless.monitoring.svc.cluster.local + - otel-gateway-collector-headless.monitoring.svc + - otel-gateway-collector-headless.monitoring + - otel-gateway-collector-headless + isCA: false + privateKey: + algorithm: RSA + size: 2048 + issuerRef: + name: elasticsearch-tls-ca-issuer + kind: Issuer + group: cert-manager.io + usages: + - client auth + - server auth + - digital signature + - key encipherment diff --git a/roles/opentelemetry_install/files/otel-agent-cadvisor.yaml b/roles/opentelemetry_install/files/otel-agent-cadvisor.yaml new 
file mode 100644 index 00000000..dee52845 --- /dev/null +++ b/roles/opentelemetry_install/files/otel-agent-cadvisor.yaml @@ -0,0 +1,87 @@ +apiVersion: opentelemetry.io/v1alpha1 +kind: OpenTelemetryCollector +metadata: + name: otel-agent-cadvisor + namespace: monitoring +spec: + mode: daemonset + serviceAccount: otel-agent + volumeMounts: + - name: otel-agent-cadvisor + mountPath: "/var/run/secrets/otel-agent-tls" + - mountPath: /var/log + name: varlog + readOnly: true + - mountPath: /var/lib/docker/containers + name: varlibdockercontainers + readOnly: true + volumes: + - name: otel-agent-cadvisor + secret: + secretName: otel-agent-cadvisor + - name: varlog + hostPath: + path: /var/log + - name: varlibdockercontainers + hostPath: + path: /var/lib/docker/containers + env: + - name: API_NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + config: | + receivers: + prometheus: + config: + scrape_configs: + - job_name: "otel-cadvisor-collector" + scrape_interval: 5s + scheme: http + kubernetes_sd_configs: + - role: endpoints + relabel_configs: + - source_labels: + - '__meta_kubernetes_namespace' + action: keep + regex: cadvisor + - source_labels: + - '__meta_kubernetes_endpoint_node_name' + action: keep + regex: ${API_NODE_NAME} + + processors: + batch: + # cAdvisor can have metrics huge spans and maximum exported byte size cannot exceed GRPC 4MB size + send_batch_size: 1000 + send_batch_max_size: 1000 + timeout: 5s + + metricstransform: + transforms: + - include: .* + match_type: regexp + action: update + operations: + - action: add_label + new_label: node_name + new_value: "${API_NODE_NAME}" + + exporters: + logging: + loglevel: info + + otlp: + endpoint: otel-gateway-collector-headless.monitoring.svc:4317 + tls: + insecure: false + ca_file: "/var/run/secrets/otel-agent-tls/ca.crt" + cert_file: "/var/run/secrets/otel-agent-tls/tls.crt" + key_file: "/var/run/secrets/otel-agent-tls/tls.key" + + service: + pipelines: + metrics: + receivers: [prometheus] + 
processors: [metricstransform, batch] + exporters: [logging,otlp] diff --git a/roles/opentelemetry_install/files/otel-agent-rbac.yaml b/roles/opentelemetry_install/files/otel-agent-rbac.yaml new file mode 100644 index 00000000..498fc926 --- /dev/null +++ b/roles/opentelemetry_install/files/otel-agent-rbac.yaml @@ -0,0 +1,41 @@ +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: otel-agent +rules: +- apiGroups: [""] + resources: + - nodes + - nodes/proxy + - services + - endpoints + - pods + - namespaces + verbs: ["get", "list", "watch"] +- apiGroups: [""] + resources: + - nodes/metrics + verbs: ["get"] +- nonResourceURLs: + - /metrics + verbs: ["get"] +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: otel-agent + namespace: monitoring +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: otel-agent +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: otel-agent +subjects: +- kind: ServiceAccount + name: otel-agent + namespace: monitoring diff --git a/roles/opentelemetry_install/files/otel-agent-telegraf-certs.yml b/roles/opentelemetry_install/files/otel-agent-telegraf-certs.yaml similarity index 100% rename from roles/opentelemetry_install/files/otel-agent-telegraf-certs.yml rename to roles/opentelemetry_install/files/otel-agent-telegraf-certs.yaml diff --git a/roles/opentelemetry_install/files/otel-agent-telegraf.yml b/roles/opentelemetry_install/files/otel-agent-telegraf.yaml similarity index 83% rename from roles/opentelemetry_install/files/otel-agent-telegraf.yml rename to roles/opentelemetry_install/files/otel-agent-telegraf.yaml index 329fb66a..17488ec9 100644 --- a/roles/opentelemetry_install/files/otel-agent-telegraf.yml +++ b/roles/opentelemetry_install/files/otel-agent-telegraf.yaml @@ -5,7 +5,7 @@ metadata: namespace: monitoring spec: mode: daemonset - serviceAccount: prometheus-k8s + serviceAccount: otel-agent volumeMounts: - name: telegraf-ca 
mountPath: "/var/run/secrets/telegraf-tls" @@ -30,6 +30,11 @@ spec: - name: varlibdockercontainers hostPath: path: /var/lib/docker/containers + env: + - name: API_NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName config: | receivers: prometheus: @@ -37,13 +42,27 @@ spec: scrape_configs: - job_name: "otel-telegraf-collector" scrape_interval: 5s - static_configs: - - targets: ["telegraf:9273"] + kubernetes_sd_configs: + - role: endpoints + relabel_configs: + - source_labels: + - '__meta_kubernetes_namespace' + action: keep + regex: monitoring + - source_labels: + - '__meta_kubernetes_endpoint_node_name' + action: keep + regex: ${API_NODE_NAME} + - source_labels: + - '__meta_kubernetes_service_name' + action: keep + regex: telegraf authorization: credentials_file: "/var/run/secrets/kubernetes.io/serviceaccount/token" scheme: https tls_config: ca_file: "/var/run/secrets/telegraf-tls/ca.crt" + server_name: telegraf filelog: include: @@ -128,17 +147,19 @@ spec: - k8s.pod.name - k8s.pod.uid - k8s.deployment.name - - k8s.cluster.name - k8s.namespace.name - k8s.node.name - k8s.pod.start_time # Pod association using resource attributes and connection pod_association: - - from: resource_attribute - name: k8s.pod.uid - - from: resource_attribute - name: k8s.pod.ip - - from: connection + - sources: + - from: resource_attribute + name: k8s.pod.uid + - sources: + - from: resource_attribute + name: k8s.pod.ip + - sources: + - from: connection exporters: logging: diff --git a/roles/opentelemetry_install/files/otel-elasticsearch-certs.yml b/roles/opentelemetry_install/files/otel-elasticsearch-certs.yaml similarity index 100% rename from roles/opentelemetry_install/files/otel-elasticsearch-certs.yml rename to roles/opentelemetry_install/files/otel-elasticsearch-certs.yaml diff --git a/roles/opentelemetry_install/files/otel-gateway-certs.yml b/roles/opentelemetry_install/files/otel-gateway-certs.yaml similarity index 100% rename from 
roles/opentelemetry_install/files/otel-gateway-certs.yml rename to roles/opentelemetry_install/files/otel-gateway-certs.yaml diff --git a/roles/opentelemetry_install/files/otel-gateway-log-rbac.yml b/roles/opentelemetry_install/files/otel-gateway-log-rbac.yaml similarity index 100% rename from roles/opentelemetry_install/files/otel-gateway-log-rbac.yml rename to roles/opentelemetry_install/files/otel-gateway-log-rbac.yaml diff --git a/roles/opentelemetry_install/files/otel-telegraf-servicemonitor.yml b/roles/opentelemetry_install/files/otel-gateway-servicemonitor.yaml similarity index 72% rename from roles/opentelemetry_install/files/otel-telegraf-servicemonitor.yml rename to roles/opentelemetry_install/files/otel-gateway-servicemonitor.yaml index b958c1d6..462f1fed 100644 --- a/roles/opentelemetry_install/files/otel-telegraf-servicemonitor.yml +++ b/roles/opentelemetry_install/files/otel-gateway-servicemonitor.yaml @@ -6,11 +6,13 @@ metadata: spec: endpoints: - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token - interval: 30s + interval: 10s port: prometheus scheme: http tlsConfig: insecureSkipVerify: true selector: + matchExpressions: + - { key: operator.opentelemetry.io/collector-headless-service, operator: NotIn, values: [Exists]} matchLabels: app.kubernetes.io/name: otel-gateway-collector diff --git a/roles/opentelemetry_install/files/otel-sidecar.yml b/roles/opentelemetry_install/files/otel-sidecar.yaml similarity index 100% rename from roles/opentelemetry_install/files/otel-sidecar.yml rename to roles/opentelemetry_install/files/otel-sidecar.yaml diff --git a/roles/opentelemetry_install/tasks/main.yml b/roles/opentelemetry_install/tasks/main.yml index 9220ebf7..d180b2bc 100644 --- a/roles/opentelemetry_install/tasks/main.yml +++ b/roles/opentelemetry_install/tasks/main.yml @@ -46,6 +46,7 @@ release_name: "{{ opentelemetry_operator_chart_name }}" release_namespace: "{{ opentelemetry_operator_namespace }}" wait: true + timeout: 15m0s values: 
kubeRBACProxy: enabled: true @@ -72,46 +73,54 @@ executable: /bin/bash changed_when: false register: es_otel_credentials + when: opentelemetry_collectors['elasticsearch'] | default(false) - name: template opentelemetry deployment file ansible.builtin.template: - src: "otel-gateway.yml.j2" - dest: "{{ (project_root_dir, 'opentelemetry', 'otel-gateway.yml') | path_join }}" + src: "otel-gateway.yaml.j2" + dest: "{{ (project_root_dir, 'opentelemetry', 'otel-gateway.yaml') | path_join }}" mode: '0644' - name: create opentelemetry certs kubernetes.core.k8s: state: present - src: "{{ (project_root_dir, 'opentelemetry', item) | path_join }}" + src: "{{ (project_root_dir, 'opentelemetry', item[1]) | path_join }}" loop: - - otel-agent-telegraf-certs.yml - - otel-gateway-certs.yml - - otel-elasticsearch-certs.yml + - ['telegraf', 'otel-agent-telegraf-certs.yaml'] + - ['cadvisor', 'otel-agent-cadvisor-certs.yaml'] + - ['gateway', 'otel-gateway-certs.yaml'] + - ['elasticsearch', 'otel-elasticsearch-certs.yaml'] + when: opentelemetry_collectors[item[0]] | default(false) - name: wait for certs creation kubernetes.core.k8s_info: kind: Certificate - name: "{{ item }}" + name: "{{ item[1] }}" namespace: monitoring wait: true wait_condition: type: Ready wait_timeout: 60 loop: - - otel-gateway-collector - - otel-agent-telegraf - - otel-elasticsearch-tls + - ['telegraf', 'otel-gateway-collector'] + - ['cadvisor', 'otel-agent-cadvisor'] + - ['gateway', 'otel-gateway-certs.yaml'] + - ['elasticsearch', 'otel-elasticsearch-tls'] + when: opentelemetry_collectors[item[0]] | default(false) - name: create opentelemetry resources kubernetes.core.k8s: state: present - src: "{{ (project_root_dir, 'opentelemetry', item) | path_join }}" + src: "{{ (project_root_dir, 'opentelemetry', item[1]) | path_join }}" loop: - - otel-gateway-log-rbac.yml - - otel-gateway.yml - - otel-agent-telegraf.yml - - otel-telegraf-servicemonitor.yml - - otel-sidecar.yml + - ['gateway', 'otel-gateway-log-rbac.yaml'] + - 
['gateway', 'otel-gateway.yaml'] + - ['gateway', 'otel-agent-rbac.yaml'] + - ['cadvisor', "otel-agent-cadvisor.yaml"] + - ['telegraf', 'otel-agent-telegraf.yaml'] + - ['gateway', 'otel-gateway-servicemonitor.yaml'] + - ['elasticsearch', 'otel-sidecar.yaml'] + when: opentelemetry_collectors[item[0]] | default(false) - name: check for all pods ansible.builtin.include_role: diff --git a/roles/opentelemetry_install/templates/otel-gateway.yml.j2 b/roles/opentelemetry_install/templates/otel-gateway.yaml.j2 similarity index 100% rename from roles/opentelemetry_install/templates/otel-gateway.yml.j2 rename to roles/opentelemetry_install/templates/otel-gateway.yaml.j2 diff --git a/roles/platform_aware_scheduling_install/defaults/main.yml b/roles/platform_aware_scheduling_install/defaults/main.yml index b86b59a1..d371b684 100644 --- a/roles/platform_aware_scheduling_install/defaults/main.yml +++ b/roles/platform_aware_scheduling_install/defaults/main.yml @@ -14,14 +14,6 @@ ## limitations under the License. ## --- -install_dependencies: - Debian: - - git - - make - RedHat: - - git - - make - # Platform Aware Scheduler pas_git_url: "https://github.com/intel/platform-aware-scheduling.git" pas_dir: "{{ (project_root_dir, 'platform-aware-scheduling') | path_join }}" @@ -29,14 +21,18 @@ pas_namespace: kube-system # Descheduler descheduler_git_url: https://github.com/kubernetes-sigs/descheduler.git -descheduler_git_version: "v0.24.1" +# Descheduler version can't be bumped until below bug is resolved. 
+# https://github.com/intel/platform-aware-scheduling/issues/90 +# Bug has been submitted: +# https://github.com/kubernetes-sigs/descheduler/issues/863 +descheduler_git_version: "v0.23.1" descheduler_dir: "{{ (project_root_dir, 'sigs.k8s.io/descheduler') | path_join }}" sigs_k8s_io_dir: "{{ (project_root_dir, 'sigs.k8s.io') | path_join }}" # TAS deployment tas_enabled: false tas_build_image_locally: false -tas_extender_image_tag_default: "0.4.0" +tas_extender_image_tag_default: "0.5.0" tas_git_version: "telemetry-aware-scheduling/v{{ tas_extender_image_tag_default }}" tas_version: |- {{ ('telemetry-aware-scheduling' in tas_git_version) @@ -69,7 +65,7 @@ tas_verbosity: 4 # GAS deployment gas_enabled: false gas_build_image_locally: false -gas_extender_image_tag_default: "0.5.1" +gas_extender_image_tag_default: "0.5.2" gas_git_version: "gpu-aware-scheduling/v{{ gas_extender_image_tag_default }}" gas_version: |- {{ ('gpu-aware-scheduling' in gas_git_version) diff --git a/roles/platform_aware_scheduling_install/vars/main.yml b/roles/platform_aware_scheduling_install/vars/main.yml index 3b27d61a..f24d6b42 100644 --- a/roles/platform_aware_scheduling_install/vars/main.yml +++ b/roles/platform_aware_scheduling_install/vars/main.yml @@ -14,6 +14,14 @@ ## limitations under the License.
## --- +install_dependencies: + Debian: + - git + - make + RedHat: + - git + - make + # variables in this file are not intended to be modified by the user directly # please use defaults/main.yml instead extenders: diff --git a/roles/redeploy_cleanup/defaults/main.yml b/roles/redeploy_cleanup/defaults/main.yml index f00fae5f..0a5765fe 100644 --- a/roles/redeploy_cleanup/defaults/main.yml +++ b/roles/redeploy_cleanup/defaults/main.yml @@ -26,6 +26,11 @@ k8s_dirs_to_remove: - "/var/run/kubernetes" - "$HOME/.kube/" +rke2_dirs_to_remove: + - "{{ project_root_dir }}/rke2" + - "/etc/kubernetes" + - "$HOME/.kube/" + intel_dirs_to_remove: - "/etc/ssl/tas" - "/etc/ssl/gas" @@ -51,6 +56,7 @@ intel_services_to_stop: - "sst-bf-configure.service" - "sst-tf-configure.service" - "vpp.service" + - "configure-sgx-udev.service" # Mentioned below folder location must match with roles/bootstrap/install_qat_drivers_services/defaults/main.yml qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" diff --git a/roles/redeploy_cleanup/tasks/k8s_cleanup.yml b/roles/redeploy_cleanup/tasks/k8s_cleanup.yml index a0e72365..38c918d3 100644 --- a/roles/redeploy_cleanup/tasks/k8s_cleanup.yml +++ b/roles/redeploy_cleanup/tasks/k8s_cleanup.yml @@ -236,7 +236,6 @@ - include_role: name: remove_kubespray_host_dns_settings - when: remove_kubespray_host_dns_settings | default(false) - debug: msg: "k8s cluster has been removed ..." diff --git a/roles/redeploy_cleanup/tasks/main.yml b/roles/redeploy_cleanup/tasks/main.yml index 9b440ff6..75284516 100644 --- a/roles/redeploy_cleanup/tasks/main.yml +++ b/roles/redeploy_cleanup/tasks/main.yml @@ -14,70 +14,132 @@ ## limitations under the License. 
## --- -- name: uninstall elasticsearch +- name: uninstall prometheus and grafana include_role: + name: kube_prometheus + tasks_from: cleanup.yml + tags: + - kube_prometheus + +- name: uninstall elasticsearch + ansible.builtin.include_role: name: elasticsearch_install tasks_from: cleanup tags: - elasticsearch - name: uninstall opentelemetry - include_role: + ansible.builtin.include_role: name: opentelemetry_install tasks_from: cleanup tags: - opentelemetry +# TODO: missing cleanup for Collectd +# - name: Remove Collectd +# ansible.builtin.include_role: +# name: collectd_install +# tasks_from: cleanup +# tags: +# - monitoring + +- name: Remove Telegraf + ansible.builtin.include_role: + name: telegraf_install + tasks_from: cleanup + tags: + - monitoring + - name: uninstall cAdvisor - include_role: + ansible.builtin.include_role: name: cadvisor_install - tasks_from: cleanup_cadvisor + tasks_from: cleanup tags: - cadvisor - name: uninstall LinkerD - include_role: + ansible.builtin.include_role: name: linkerd_service_mesh tasks_from: uninstall tags: - - linkerd_service_mesh + - linkerd-service-mesh - name: cleanup cpu_ctlplane - include_role: + ansible.builtin.include_role: name: intel_cpu_controlplane tasks_from: cleanup_cpu_controlplane tags: - - cpu_ctlplane + - cpu-ctlplane - name: cleanup Rook - include_role: + ansible.builtin.include_role: name: rook_install tasks_from: cleanup_rook tags: - rook-ceph -- name: cleanup Intel AI +- name: cleanup Intel Media Analytics + ansible.builtin.include_role: + name: intel_media_analytics + tasks_from: cleanup_intel_media_analytics + tags: + - intel-media-analytics + +- name: cleanup FFmpeg include_role: - name: intel_ai - tasks_from: cleanup_intel_ai + name: ffmpeg_install + tasks_from: ffmpeg_cleanup tags: - - intel-ai + - intel-ffmpeg + +- name: cleanup sigstore policy controller + ansible.builtin.include_role: + name: sigstore_policy_controller + tasks_from: cleanup + tags: + - sigstore + +- name: cleanup container_registry 
+ ansible.builtin.include_role: + name: container_registry + tasks_from: cleanup + tags: + - registry + +- name: cleanup dyna config dpdk + include_role: + name: configure_dpdk + tasks_from: cleanup + tags: + - dyna_config_dpdk + +- name: cleanup intel oneAPI kits + include_role: + name: intel_oneapi_install + tasks_from: cleanup + tags: + - intel-oneapi - name: reset and remove Kubernetes cluster - import_tasks: k8s_cleanup.yml + ansible.builtin.import_tasks: k8s_cleanup.yml + when: kube_provisioner == "kubespray" + +- name: reset and remove rke2 cluster + ansible.builtin.import_tasks: rke2_cleanup.yml + when: kube_provisioner == "rke2" - name: remove Intel Container Experience Kit features artifacts - import_tasks: intel_cleanup.yml + ansible.builtin.import_tasks: intel_cleanup.yml - name: uninstall MinIO - include_role: + ansible.builtin.include_role: name: minio_install tasks_from: cleanup_minio_main tags: - minio - name: reboot - debug: + ansible.builtin.debug: msg: rebooting after pre-redeploy cleanup changed_when: true notify: diff --git a/roles/redeploy_cleanup/tasks/rke2_cleanup.yml b/roles/redeploy_cleanup/tasks/rke2_cleanup.yml new file mode 100644 index 00000000..611db55e --- /dev/null +++ b/roles/redeploy_cleanup/tasks/rke2_cleanup.yml @@ -0,0 +1,70 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- ansible.builtin.debug: + msg: "Starting to remove rke2 cluster" + +- name: remove rke2 cluster + block: + - name: check if rke2-uninstall.sh exists + ansible.builtin.stat: + path: /usr/local/bin/rke2-uninstall.sh + register: rke2_uninstall_sh + when: + - inventory_hostname == groups['kube_control_plane'][0] + + - name: use rke2-uninstall.sh to uninstall rke2 + ansible.builtin.command: >- + /usr/local/bin/rke2-uninstall.sh + when: + - rke2_uninstall_sh.stat.exists + - inventory_hostname == groups['kube_control_plane'][0] + register: result + failed_when: "'error' in result.stderr" + +- name: remove rke2 cluster files + ansible.builtin.import_tasks: remove_files.yml + vars: + files_to_delete: "{{ rke2_dirs_to_remove }}" + changed_when: false + +- name: remove copied file + block: + - name: remove binary copied from rke2 + ansible.builtin.file: + path: /usr/local/bin/{{ item }} + state: absent + with_items: + - "kubelet" + - "kubectl" + - "containerd" + - "containerd-shim" + - "containerd-shim-runc-v1" + - "containerd-shim-runc-v2" + - "crictl" + - "ctr" + - "runc" + + - name: remove crictl config copied during rke2 + ansible.builtin.file: path="/etc/crictl.yaml" state=absent + +- name: uninstall helm + ansible.builtin.file: + path: /usr/local/bin/helm + state: absent + +- ansible.builtin.debug: + msg: "Done removing rke2 cluster ..." 
diff --git a/roles/remove_kubespray_host_dns_settings/tasks/main.yml b/roles/remove_kubespray_host_dns_settings/tasks/main.yml index 305a4941..faf4bd0b 100644 --- a/roles/remove_kubespray_host_dns_settings/tasks/main.yml +++ b/roles/remove_kubespray_host_dns_settings/tasks/main.yml @@ -32,6 +32,13 @@ marker: "# Ansible inventory hosts {mark}" failed_when: false +- name: reset entries in /etc/systemd/resolved.conf + lineinfile: + path: "/etc/systemd/resolved.conf" + state: absent + regexp: "^[A-Z#].*" + failed_when: false + - name: run dhclient to get IP after restarting network in case of failure command: "dhclient" changed_when: true diff --git a/roles/rke2_defaults/tasks/rke2_preflight.yml b/roles/rke2_defaults/tasks/rke2_preflight.yml new file mode 100644 index 00000000..73dfc0fd --- /dev/null +++ b/roles/rke2_defaults/tasks/rke2_preflight.yml @@ -0,0 +1,41 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: check linux distro version and kernel for RKE2 + ansible.builtin.assert: + that: > + - (ansible_distribution == 'Ubuntu' and ansible_distribution_version == '22.04') + msg: + - "RKE2 is supported only on Ubuntu 22.04 with RA" + +- name: check container runtime for RKE2 + ansible.builtin.assert: + that: container_runtime == 'containerd' + fail_msg: + - "RKE2 is supported only with containerd, please set it in group_vars/all.yml" + success_msg: "RKE2 container runtime set to containerd" + +- name: check k8s network plugin for rke2 + block: + - name: check kube_network_plugin for rke2 + ansible.builtin.assert: + that: kube_network_plugin in ['canal', 'calico', 'cilium'] + fail_msg: "{{ kube_network_plugin }} is not supported on rke2, please correct the configuration in groups/all.yml" + - name: check calico_network_backend for rke2 + ansible.builtin.assert: + that: calico_network_backend == "vxlan" + fail_msg: "{{ calico_network_backend }} is not supported on rke2, please correct the configuration in groups/all.yml" + when: kube_network_plugin == 'calico' diff --git a/roles/rke2_defaults/tasks/rke2_registries.yaml b/roles/rke2_defaults/tasks/rke2_registries.yaml new file mode 100644 index 00000000..c59be1b5 --- /dev/null +++ b/roles/rke2_defaults/tasks/rke2_registries.yaml @@ -0,0 +1,27 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: setup registry config file + ansible.builtin.template: + src: rke2_registries.yaml.j2 + dest: "{{ rke2_conf_dir }}/registries.yaml" + mode: 0644 + force: yes + +- name: restart rke2 server + ansible.builtin.systemd: + name: rke2-server.service + state: restarted diff --git a/roles/rke2_defaults/templates/rke2_registries.yaml.j2 b/roles/rke2_defaults/templates/rke2_registries.yaml.j2 new file mode 100644 index 00000000..7892e239 --- /dev/null +++ b/roles/rke2_defaults/templates/rke2_registries.yaml.j2 @@ -0,0 +1,22 @@ +mirrors: +{% if registry_enable == true %} + {{ registry_local_address }}: + endpoint: + - "https://{{ registry_local_address }}" +{% endif %} + +configs: +{% if registry_enable == true %} + "{{ registry_local_address }}": + auth: + auth: {{ ("docker" + ':' + registry_password ) | b64encode }} + tls: + insecure_skip_verify: true +{% endif %} +{% if intel_sriov_fec_operator_enabled | default(false) and container_runtime == "containerd" %} + "registry.redhat.io": + auth: + auth: {{ (redhat_user + ':' + redhat_password) | b64encode }} + tls: + insecure_skip_verify: true +{% endif %} diff --git a/roles/rke2_kubernetes_apps/cert_manager_install/defaults/main.yml b/roles/rke2_kubernetes_apps/cert_manager_install/defaults/main.yml new file mode 100644 index 00000000..d64cbfad --- /dev/null +++ b/roles/rke2_kubernetes_apps/cert_manager_install/defaults/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +cert_manager_version: "v1.11.1" +cert_manager_crd_url: "https://github.com/cert-manager/cert-manager/releases/download/{{ cert_manager_version }}/cert-manager.crds.yaml" diff --git a/roles/rke2_kubernetes_apps/cert_manager_install/tasks/main.yml b/roles/rke2_kubernetes_apps/cert_manager_install/tasks/main.yml new file mode 100644 index 00000000..9a846ee8 --- /dev/null +++ b/roles/rke2_kubernetes_apps/cert_manager_install/tasks/main.yml @@ -0,0 +1,53 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Apply cert manager CRD + block: + - name: download cert manager CRD yaml file + ansible.builtin.get_url: + url: "{{ cert_manager_crd_url }}" + dest: "{{ (rke2_root_dir, 'cert_manager_crd.yaml') | path_join }}" + mode: 0640 + register: cert_manager_downloaded + retries: "{{ number_of_retries | default(10) }}" + until: cert_manager_downloaded is succeeded + delay: "{{ retry_delay | default(3) }}" + + - name: Apply cert manager CRD + kubernetes.core.k8s: + state: present + src: "{{ (rke2_root_dir, 'cert_manager_crd.yaml') | path_join }}" + +- name: Add Jetstack Helm Repository + kubernetes.core.helm_repository: + name: jetstack + repo_url: https://charts.jetstack.io + +- name: Create cert manager namespace + kubernetes.core.k8s: + kind: Namespace + state: present + name: cert-manager + +- name: deploy cert manager + kubernetes.core.helm: + chart_ref: "jetstack/cert-manager" + chart_version: "{{ cert_manager_version }}" + release_name: cert-manager + release_namespace: cert-manager + state: present + wait: true + timeout: 15m0s diff --git a/roles/rke2_kubernetes_apps/dashboard/defaults/main.yml b/roles/rke2_kubernetes_apps/dashboard/defaults/main.yml new file mode 100644 index 00000000..6955a28b --- /dev/null +++ b/roles/rke2_kubernetes_apps/dashboard/defaults/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +dashboard_image_tag: "v2.7.0" +dashboard_install_url: "https://raw.githubusercontent.com/kubernetes/dashboard/{{ dashboard_image_tag }}/aio/deploy/recommended.yaml" diff --git a/roles/rke2_kubernetes_apps/dashboard/tasks/main.yml b/roles/rke2_kubernetes_apps/dashboard/tasks/main.yml new file mode 100644 index 00000000..016f323f --- /dev/null +++ b/roles/rke2_kubernetes_apps/dashboard/tasks/main.yml @@ -0,0 +1,32 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: Engage dashboard + block: + - name: download dashboard yaml file + ansible.builtin.get_url: + url: "{{ dashboard_install_url }}" + dest: "{{ (rke2_root_dir, 'dashboard.yaml') | path_join }}" + mode: 0640 + register: dashboard_download + retries: "{{ number_of_retries | default(5) }}" + until: dashboard_download is succeeded + delay: "{{ retry_delay | default(3) }}" + + - name: Apply dashboard yaml file + kubernetes.core.k8s: + state: present + src: "{{ (rke2_root_dir, 'dashboard.yaml') | path_join }}" diff --git a/roles/rke2_kubernetes_apps/helm/defaults/main.yml b/roles/rke2_kubernetes_apps/helm/defaults/main.yml new file mode 100644 index 00000000..be6b55a4 --- /dev/null +++ b/roles/rke2_kubernetes_apps/helm/defaults/main.yml @@ -0,0 +1,31 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +_host_architecture_groups: + x86_64: amd64 + aarch64: arm64 + armv7l: arm +host_architecture: >- + {%- if ansible_architecture in _host_architecture_groups -%} + {{ _host_architecture_groups[ansible_architecture] }} + {%- else -%} + {{ ansible_architecture }} + {%- endif -%} +image_arch: "{{host_architecture | default('amd64')}}" + +helm_version: "v3.11.3" +helm_download_url: "https://get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz" +helm_dest: "{{ rke2_root_dir }}/helm-{{ helm_version }}" diff --git a/roles/rke2_kubernetes_apps/helm/tasks/main.yml b/roles/rke2_kubernetes_apps/helm/tasks/main.yml new file mode 100644 index 00000000..dfdd74b8 --- /dev/null +++ b/roles/rke2_kubernetes_apps/helm/tasks/main.yml @@ -0,0 +1,45 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Create helm dest + ansible.builtin.file: + path: "{{ helm_dest }}" + state: directory + mode: 0755 + +- name: Get Helm + ansible.builtin.get_url: + url: "{{ helm_download_url }}" + dest: "{{ helm_dest }}" + mode: 0644 + register: helm_download + retries: 3 + until: helm_download is success + +- name: Unpack helm + ansible.builtin.unarchive: + src: "{{ helm_download.dest }}" + dest: "{{ helm_dest }}" + remote_src: true + list_files: yes + mode: 0774 + +- name: Helm | Copy helm binary from download dir + ansible.builtin.copy: + src: "{{ helm_dest }}/linux-{{ image_arch }}/helm" + dest: "/usr/local/bin/helm" + mode: 0755 + remote_src: true diff --git a/roles/rke2_kubernetes_apps/rancher/defaults/main.yml b/roles/rke2_kubernetes_apps/rancher/defaults/main.yml new file mode 100644 index 00000000..213cdc16 --- /dev/null +++ b/roles/rke2_kubernetes_apps/rancher/defaults/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +rancher_namespace: cattle-system +rancher_version: '2.7.3' diff --git a/roles/rke2_kubernetes_apps/rancher/tasks/main.yml b/roles/rke2_kubernetes_apps/rancher/tasks/main.yml new file mode 100644 index 00000000..9388ae56 --- /dev/null +++ b/roles/rke2_kubernetes_apps/rancher/tasks/main.yml @@ -0,0 +1,54 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: Install rancher + block: + - name: Add Rancher repo + kubernetes.core.helm_repository: + name: rancher-stable + repo_url: https://releases.rancher.com/server-charts/stable + + - name: Create rancher namespace + kubernetes.core.k8s: + kind: Namespace + state: present + name: "{{ rancher_namespace}}" + + - name: Generate rancher bootstrap password if not provided + ansible.builtin.set_fact: + bootstrap_password: "{{ lookup('ansible.builtin.password', '/dev/null', chars=['ascii_letters', 'digits']) }}" + no_log: true + when: + - (bootstrap_password is not defined) or (not bootstrap_password) + run_once: true + + - name: Install rancher + kubernetes.core.helm: + chart_ref: "rancher-stable/rancher" + chart_version: "{{ rancher_version }}" + release_name: rancher + release_namespace: "{{ rancher_namespace }}" + values: + hostname: "{{ ansible_default_ipv4.address }}.sslip.io" + bootstrapPassword: "{{ bootstrap_password }}" + replicas: 1 + global: + cattle: + psp: + enabled: false + state: present + wait: true + timeout: 15m0s diff --git a/roles/rke2_target_setup/tasks/main.yml b/roles/rke2_target_setup/tasks/main.yml new file mode 100644 index 00000000..6aa70b01 --- /dev/null +++ b/roles/rke2_target_setup/tasks/main.yml @@ -0,0 +1,52 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install dependencies + ansible.builtin.include_role: + name: install_dependencies + +- name: Ensure etcd is configured properly + block: + - name: Ensure group "etcd" exists + ansible.builtin.group: + name: etcd + state: present + - name: Add the user 'etcd' with a primary group of 'etcd' + ansible.builtin.user: + name: etcd + comment: etcd user + create_home: false + system: true + shell: /sbin/nologin + group: etcd + become: true + when: inventory_hostname in groups['etcd'] + +- name: set kubelet requirements in sysctl + ansible.builtin.copy: + dest: /etc/sysctl.d/90-kubelet.conf + content: | + vm.panic_on_oom=0 + vm.overcommit_memory=1 + kernel.panic=10 + kernel.panic_on_oops=1 + mode: 0755 + become: true + +- name: apply kubelet sysctl changes + ansible.builtin.command: sysctl -p /etc/sysctl.d/90-kubelet.conf + become: true + changed_when: true diff --git a/roles/rke2_target_setup/vars/main.yml b/roles/rke2_target_setup/vars/main.yml new file mode 100644 index 00000000..68ce82c5 --- /dev/null +++ b/roles/rke2_target_setup/vars/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - sshpass + - curl diff --git a/roles/rook_install/tasks/preflight_rook.yml b/roles/rook_install/tasks/preflight_rook.yml index a1878cd0..ef443a2a 100755 --- a/roles/rook_install/tasks/preflight_rook.yml +++ b/roles/rook_install/tasks/preflight_rook.yml @@ -56,4 +56,4 @@ # - rook_install is defined and rook_install.enabled # any_errors_fatal: true # tags: -# - rook_ceph +# - rook-ceph diff --git a/roles/sigstore_policy_controller/defaults/main.yml b/roles/sigstore_policy_controller/defaults/main.yml new file mode 100644 index 00000000..64287a1b --- /dev/null +++ b/roles/sigstore_policy_controller/defaults/main.yml @@ -0,0 +1,32 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +cosign_version: v2.0.0 +cosign_url: github.com/sigstore/cosign/v2/cmd/cosign +cosign_namespace: cosign-system +cosign_pubkey_secret: cosign-pubkey + +cosign_enforce_namespace: my-cosign-namespace +cosign_password: +cosign_key_secret: cosign-key +container_registry_secret: container-registry-secret + +sigstore_chart_name: sigstore +sigstore_chart_repo: https://sigstore.github.io/helm-charts +policy_controller_release: 0.5.6 +sigstore_chart_tag: "policy-controller-{{ policy_controller_release }}" +policy_controller_release_name: policy-controller +policy_controller_dir: "{{ (project_root_dir, 'policy-controller') | path_join }}" diff --git a/roles/sigstore_policy_controller/meta/main.yml b/roles/sigstore_policy_controller/meta/main.yml new file mode 100644 index 00000000..8abed973 --- /dev/null +++ b/roles/sigstore_policy_controller/meta/main.yml @@ -0,0 +1,18 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +dependencies: + - role: container_registry_shared_vars diff --git a/roles/sigstore_policy_controller/tasks/cleanup.yml b/roles/sigstore_policy_controller/tasks/cleanup.yml new file mode 100644 index 00000000..99bd2673 --- /dev/null +++ b/roles/sigstore_policy_controller/tasks/cleanup.yml @@ -0,0 +1,65 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- debug: + msg: "start to remove sigstore policy controller feature ..." + tags: + - sigstore + +- name: uninstall sigstore policy controller feature + block: + - name: delete sigstore custom resources + kubernetes.core.k8s: + state: absent + wait: true + definition: + api_version: v1 + kind: CustomResourceDefinition + metadata: + name: "{{ item }}" + loop: + - clusterimagepolicies.policy.sigstore.dev + - trustroots.policy.sigstore.dev + + - name: helm uninstall previous setup + kubernetes.core.helm: + name: "policy-controller" + state: absent + namespace: "{{ cosign_namespace }}" + wait: true + timeout: 4m0s + + - name: clear example enforce namespace + kubernetes.core.k8s: + name: "{{ cosign_enforce_namespace }}" + api_version: v1 + kind: Namespace + state: absent + wait: true + wait_timeout: 240 + + - name: clear cosign-system namespace + kubernetes.core.k8s: + name: "{{ cosign_namespace }}" + api_version: v1 + kind: Namespace + state: absent + wait: true + wait_timeout: 240 + when: + - inventory_hostname == groups['kube_control_plane'][0] + tags: + - sigstore diff --git a/roles/sigstore_policy_controller/tasks/enforce_namespace.yml b/roles/sigstore_policy_controller/tasks/enforce_namespace.yml new file mode 100644 index 00000000..065c7c43 --- /dev/null +++ b/roles/sigstore_policy_controller/tasks/enforce_namespace.yml @@ -0,0 +1,128 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: create enforce namespace if doesn't exist and mark the namespace to be enforced by policy-controller + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Namespace + metadata: + name: "{{ cosign_enforce_namespace }}" + labels: + 'policy.sigstore.dev/include': 'true' + +- name: create the cosign public/private key pair for enforce namespace + block: + - name: query existing cosign key pair + kubernetes.core.k8s_info: + api_version: v1 + kind: Secret + name: "{{ cosign_key_secret }}" + namespace: "{{ cosign_enforce_namespace }}" + register: cosign_secret + + - name: Generate cosign password if not provided + ansible.builtin.set_fact: + cosign_password: "{{ lookup('ansible.builtin.password', '/dev/null') }}" + no_log: true + run_once: true + when: + - (cosign_password is not defined) or (not cosign_password) + - cosign_secret.resources | length == 0 + + - name: create the cosign key pair + shell: > + export COSIGN_PASSWORD={{ cosign_password }} && + source /etc/profile.d/golang.sh && + cosign generate-key-pair k8s://{{ cosign_enforce_namespace }}/{{ cosign_key_secret }} + args: + executable: /bin/bash + changed_when: true + when: cosign_secret.resources | length == 0 + +- name: import public key for policy-controller + block: + - name: get cosign secret + kubernetes.core.k8s_info: + api_version: v1 + kind: Secret + name: "{{ cosign_key_secret }}" + namespace: "{{ 
cosign_enforce_namespace }}" + register: cosign_secret + no_log: true + + - name: create public key secret for policy-controller + kubernetes.core.k8s: + state: present + definition: + api_version: v1 + kind: Secret + metadata: + name: "{{ cosign_pubkey_secret }}" + namespace: "{{ cosign_namespace }}" + data: + cosign.pub: "{{ cosign_secret['resources'][0]['data']['cosign.pub'] }}" + +- name: create the registry auth secret in the enforce namespace to pull image + block: + - name: get registry auth + ansible.builtin.slurp: + src: "{{ registry_auth_path }}" + register: reg_auth + no_log: true + + - name: create secret in enforce namespace + kubernetes.core.k8s: + state: present + definition: + api_version: v1 + kind: Secret + metadata: + name: "{{ container_registry_secret }}" + namespace: "{{ cosign_enforce_namespace }}" + data: + .dockerconfigjson: "{{ reg_auth['content'] }}" + type: kubernetes.io/dockerconfigjson + no_log: true + +- name: create the enforce pubkey policy example + block: + - name: create enforce pubkey policy crd yaml + ansible.builtin.template: + src: "key-cosign-verification.yaml.j2" + dest: "{{ (policy_controller_dir, 'key-cosign-verification.yaml') | path_join }}" + force: yes + mode: preserve + + - name: apply enforce pubkey policy crd yaml + kubernetes.core.k8s: + state: present + src: "{{ (policy_controller_dir, 'key-cosign-verification.yaml') | path_join }}" + +- name: create the enforce keyless policy example + block: + - name: create enforce keyless policy crd yaml + ansible.builtin.template: + src: "keyless-cosign-verification.yaml.j2" + dest: "{{ (policy_controller_dir, 'keyless-cosign-verification.yaml') | path_join }}" + force: yes + mode: preserve + + - name: apply enforce keyless policy crd yaml + kubernetes.core.k8s: + state: present + src: "{{ (policy_controller_dir, 'keyless-cosign-verification.yaml') | path_join }}" diff --git a/roles/sigstore_policy_controller/tasks/main.yml b/roles/sigstore_policy_controller/tasks/main.yml 
new file mode 100644 index 00000000..5bd550b7 --- /dev/null +++ b/roles/sigstore_policy_controller/tasks/main.yml @@ -0,0 +1,121 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: create policy_controller directory if needed + ansible.builtin.file: + path: "{{ policy_controller_dir }}" + state: directory + mode: 0755 + when: + - inventory_hostname == groups['kube_control_plane'][0] + +- name: create cosign namespace if doesn't exist + kubernetes.core.k8s: + name: "{{ cosign_namespace }}" + api_version: v1 + kind: Namespace + state: present + when: + - inventory_hostname == groups['kube_control_plane'][0] + +- name: import container_registry public certificate + block: + - name: get registry secret + kubernetes.core.k8s_info: + api_version: v1 + kind: Secret + name: "{{ registry_tls_secret_name }}" + namespace: "{{ registry_namespace }}" + register: registry_secret + no_log: true + + - name: install kubernetes root ca for cosign + ansible.builtin.copy: + dest: |- + {% if ansible_os_family == "Debian" -%} + /usr/local/share/ca-certificates/{{ inventory_hostname }}_ca.crt + {%- elif ansible_os_family == "RedHat" -%} + /etc/pki/ca-trust/source/anchors/{{ inventory_hostname }}_ca.crt + {%- endif %} + src: /etc/kubernetes/ssl/ca.crt + remote_src: true + mode: preserve + changed_when: true + + - name: update ca-certificates (Debian) + ansible.builtin.command: update-ca-certificates + 
changed_when: true + when: + - ansible_os_family == "Debian" + + - name: update ca-certificates (RedHat) + ansible.builtin.command: update-ca-trust extract + changed_when: true + when: + - ansible_os_family == "RedHat" + + - name: create configmap for policy-controller access this registry + kubernetes.core.k8s: + state: present + definition: + api_version: v1 + kind: ConfigMap + metadata: + name: "ca-bundle-config" + namespace: "{{ cosign_namespace }}" + data: + ca-bundle.crt: "{{ registry_secret['resources'][0]['data']['tls.crt'] | b64decode }}" + when: + - registry_enable | default(false) + - inventory_hostname == groups['kube_control_plane'][0] + +- name: helm install policy-controller + block: + - name: create helm values yaml file + ansible.builtin.template: + src: "values.yaml.j2" + dest: "{{ (policy_controller_dir, 'policy-controller-values.yaml') | path_join }}" + force: yes + mode: preserve + + - name: add policy-controller chart repo + kubernetes.core.helm_repository: + name: "{{ sigstore_chart_name }}" + repo_url: "{{ sigstore_chart_repo }}" + + - name: deploy policy-controller + kubernetes.core.helm: + chart_ref: "{{ sigstore_chart_name }}/{{ policy_controller_release_name }}" + chart_version: "{{ policy_controller_release }}" + release_name: "{{ policy_controller_release_name }}" + release_namespace: "{{ cosign_namespace }}" + values_files: "{{ (policy_controller_dir, 'policy-controller-values.yaml') | path_join }}" + wait: true + timeout: 4m0s + when: + - inventory_hostname == groups['kube_control_plane'][0] + +- name: install cosign tool for container image signing + ansible.builtin.command: go install {{ cosign_url }}@{{ cosign_version }} + changed_when: true + when: + - inventory_hostname == groups['kube_control_plane'][0] + +- name: install example namespace to enforce policy-controller + include_tasks: enforce_namespace.yml + when: + - registry_enable | default(false) + - inventory_hostname == groups['kube_control_plane'][0] diff --git 
a/roles/sigstore_policy_controller/tasks/preflight.yml b/roles/sigstore_policy_controller/tasks/preflight.yml new file mode 100644 index 00000000..62a8e091 --- /dev/null +++ b/roles/sigstore_policy_controller/tasks/preflight.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: Sigstore | Container registry must be enabled + assert: + that: registry_enable | default(false) + fail_msg: |- + Sigstore policy controller is enabled, but Container Registry is disabled. + Please enable container registry in group_vars. 
+ when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/sigstore_policy_controller/templates/key-cosign-verification.yaml.j2 b/roles/sigstore_policy_controller/templates/key-cosign-verification.yaml.j2 new file mode 100644 index 00000000..87b9d81a --- /dev/null +++ b/roles/sigstore_policy_controller/templates/key-cosign-verification.yaml.j2 @@ -0,0 +1,11 @@ +apiVersion: policy.sigstore.dev/v1alpha1 +kind: ClusterImagePolicy +metadata: + name: cosign-image-policy-pubkey +spec: + images: + - glob: "{{ cosign_registry_address }}/key/*" + authorities: + - key: + secretRef: + name: "{{ cosign_pubkey_secret }}" diff --git a/roles/sigstore_policy_controller/templates/keyless-cosign-verification.yaml.j2 b/roles/sigstore_policy_controller/templates/keyless-cosign-verification.yaml.j2 new file mode 100644 index 00000000..bbe3e54c --- /dev/null +++ b/roles/sigstore_policy_controller/templates/keyless-cosign-verification.yaml.j2 @@ -0,0 +1,13 @@ +apiVersion: policy.sigstore.dev/v1alpha1 +kind: ClusterImagePolicy +metadata: + name: cosign-image-policy-keyless +spec: + images: + - glob: "{{ cosign_registry_address }}/keyless/*" + authorities: + - keyless: + url: https://fulcio.sigstore.dev + identities: + - issuerRegExp: {{ cosign_issuer | default('https://github.com/login/oauth') }} + subjectRegExp: {{ cosign_subject | default('john.doe@example.com') }} diff --git a/roles/sigstore_policy_controller/templates/values.yaml.j2 b/roles/sigstore_policy_controller/templates/values.yaml.j2 new file mode 100644 index 00000000..1f84d2a5 --- /dev/null +++ b/roles/sigstore_policy_controller/templates/values.yaml.j2 @@ -0,0 +1,13 @@ +webhook: + env: +{% if "https_proxy" in proxy_env %} + https_proxy: "{{ proxy_env.https_proxy }}" +{% endif %} +{% if "no_proxy" in proxy_env %} + no_proxy: "{{ proxy_env.no_proxy }}" +{% endif %} +{% if registry_enable|d(false) %} + registryCaBundle: + name: ca-bundle-config + key: ca-bundle.crt +{% endif %} diff --git 
a/roles/sigstore_policy_controller/vars/main.yml b/roles/sigstore_policy_controller/vars/main.yml new file mode 100644 index 00000000..fa61a6ec --- /dev/null +++ b/roles/sigstore_policy_controller/vars/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +cosign_registry_address: "{{ registry_local_address }}" diff --git a/roles/sriov_cni_install/tasks/main.yml b/roles/sriov_cni_install/tasks/main.yml index cf541c58..a3c6d1c0 100644 --- a/roles/sriov_cni_install/tasks/main.yml +++ b/roles/sriov_cni_install/tasks/main.yml @@ -18,6 +18,11 @@ include_role: name: install_dependencies +# WA till upstream fix reach official version - issue https://github.com/k8snetworkplumbingwg/sriov-cni/issues/241 +- name: WA till upstream fix reach official version for issue 241 - use commit id + set_fact: + sriov_cni_version: "c1faa0805c92be6e5c629e9caf481f43cfee866c" + - name: clone sriov-cni repository git: repo: "{{ sriov_cni_url }}" @@ -35,6 +40,12 @@ command: "go env -w GOFLAGS=-mod=mod" changed_when: true +# WA for upstream idempotency issue https://github.com/k8snetworkplumbingwg/sriov-cni/issues/263 +- name: WA for issue 263 - cleanup sriov-cni plugin + make: + chdir: "{{ sriov_cni_dir }}" + target: clean + - name: build sriov-cni plugin make: chdir: "{{ sriov_cni_dir }}" diff --git a/roles/sriov_network_operator_install/defaults/main.yml 
b/roles/sriov_network_operator_install/defaults/main.yml index a2c785a1..c8de552a 100644 --- a/roles/sriov_network_operator_install/defaults/main.yml +++ b/roles/sriov_network_operator_install/defaults/main.yml @@ -24,7 +24,7 @@ sriov_network_operator_helm_release_name: "sriov-network-operator" sriov_network_operator_version: "v1.2.0" network_resources_injector_version: "v1.5" -ib_sriov_cni_version: "v1.0.2" +ib_sriov_cni_version: "v1.0.3" # helm values defaults sriov_network_operator_images: diff --git a/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml b/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml index b5822087..0c4c6e6a 100644 --- a/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml +++ b/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml @@ -99,7 +99,7 @@ - name: wait for SriovNetworkNodeState CR to be created and sync completed k8s_info: kind: SriovNetworkNodeState - name: "{{ item }}" + name: "{{ hostvars[item]['ansible_hostname'] }}" namespace: "{{ sriov_network_operator_namespace }}" register: cr_status retries: 30 diff --git a/roles/sriov_shared_versions/defaults/main.yml b/roles/sriov_shared_versions/defaults/main.yml index 847b3cee..c783f0ab 100644 --- a/roles/sriov_shared_versions/defaults/main.yml +++ b/roles/sriov_shared_versions/defaults/main.yml @@ -19,3 +19,5 @@ sriov_net_dp_tag: "v3.5.1" sriov_cni_url: "https://github.com/k8snetworkplumbingwg/sriov-cni.git" sriov_cni_version: "v2.7.0" +# Once new version is released, remove version WA from roles/sriov_cni_install/tasks/main.yml +# sriov_cni_version: "c1faa0805c92be6e5c629e9caf481f43cfee866c" diff --git a/roles/tac_install/vars/main.yml b/roles/tac_install/vars/main.yml index 610914a1..08491f74 100644 --- a/roles/tac_install/vars/main.yml +++ b/roles/tac_install/vars/main.yml @@ -18,14 +18,7 @@ # Define respective field in the group_vars/all.yml instead tac_defaults: enabled: false - 
apphsm_hostname: | - {%- if vm_enabled %} - {{ hostvars[groups['kube_node'][0]]['ansible_all_ipv4_addresses'] | - ansible.utils.ipaddr(hostvars[groups['vm_host'][0]]['vxlan_gw_ip']) | - join('') }} - {%- else %} - {{ hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] }} - {%- endif %} + apphsm_hostname: "kmra-apphsm.kmra.svc.{{ cluster_name | default('cluster.local') }}" apphsm_port: 5000 client_mtls_secret_name: "generic-apphsm-client-tls" client_mtls_secret_namespace: "kmra" diff --git a/roles/tadk_install/defaults/main.yml b/roles/tadk_install/defaults/main.yml index ef190499..b02a0373 100644 --- a/roles/tadk_install/defaults/main.yml +++ b/roles/tadk_install/defaults/main.yml @@ -18,7 +18,7 @@ dest_path: "{{ (project_root_dir, 'charts') | path_join }}" image_registry: intel image_name: tadk-waf -tadk_version: "v22.09.0" +tadk_version: "v23.03.0" container_port: 8005 service_type: NodePort diff --git a/roles/tcs_install/defaults/main.yml b/roles/tcs_install/defaults/main.yml index db570244..37723a2d 100644 --- a/roles/tcs_install/defaults/main.yml +++ b/roles/tcs_install/defaults/main.yml @@ -15,7 +15,7 @@ ## --- tcs_git_repo_url: https://github.com/intel/trusted-certificate-issuer -tcs_git_version: 0.4.0 +tcs_git_version: 0.5.0 tcs_git_path: "{{ (project_root_dir, 'tcs') | path_join }}" tcs_image_tag: "{{ tcs_git_version }}" tac_image_name: intel/trusted-certificate-issuer diff --git a/roles/tcs_install/tasks/local_build.yml b/roles/tcs_install/tasks/local_build.yml index 2fe8ecfc..fd74cbad 100644 --- a/roles/tcs_install/tasks/local_build.yml +++ b/roles/tcs_install/tasks/local_build.yml @@ -96,14 +96,3 @@ changed_when: false when: - '"docker" not in container_runtime' - -# this task should be removed when TCS fix issue with containerd -- name: change owner of tokens directory - file: - path: /var/lib/tcs-issuer/tokens - state: directory - recurse: yes - owner: 5000 - group: 5000 - when: - - container_runtime != "docker" diff --git 
a/roles/tcs_install/tasks/main.yml b/roles/tcs_install/tasks/main.yml index 938ba6b5..b6efca52 100644 --- a/roles/tcs_install/tasks/main.yml +++ b/roles/tcs_install/tasks/main.yml @@ -31,6 +31,17 @@ - inventory_hostname in groups['kube_node'] - tcs.build_image_locally | default(false) +# this task should be removed when TCS fix issue with containerd +- name: change owner of tokens directory + ansible.builtin.file: + path: /var/lib/tcs-issuer/tokens + state: directory + recurse: yes + owner: 5000 + group: 5000 + when: + - container_runtime != "docker" + - name: install TCS include_tasks: tcs_install.yml when: diff --git a/roles/telegraf_install/charts/telegraf/templates/clusterrole.yml b/roles/telegraf_install/charts/telegraf/templates/clusterrole.yaml similarity index 100% rename from roles/telegraf_install/charts/telegraf/templates/clusterrole.yml rename to roles/telegraf_install/charts/telegraf/templates/clusterrole.yaml diff --git a/roles/telegraf_install/charts/telegraf/templates/clusterrolebinding.yml b/roles/telegraf_install/charts/telegraf/templates/clusterrolebinding.yaml similarity index 100% rename from roles/telegraf_install/charts/telegraf/templates/clusterrolebinding.yml rename to roles/telegraf_install/charts/telegraf/templates/clusterrolebinding.yaml diff --git a/roles/telegraf_install/charts/telegraf/templates/configmap_pmu_events.yaml b/roles/telegraf_install/charts/telegraf/templates/configmap_pmu_events.yaml new file mode 100644 index 00000000..2428031d --- /dev/null +++ b/roles/telegraf_install/charts/telegraf/templates/configmap_pmu_events.yaml @@ -0,0 +1,12 @@ +--- +{{- if .Values.telegraf.pmu_events -}} +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "telegraf.fullname" . }}-pmu-events + namespace: {{ .Release.Namespace }} + labels: + {{- include "telegraf.labels" . 
| nindent 4 }} +data: + event_definitions.json: {{ toJson .Values.telegraf.pmu_events | indent 2 }} +{{- end -}} diff --git a/roles/telegraf_install/charts/telegraf/templates/daemonset.yml b/roles/telegraf_install/charts/telegraf/templates/daemonset.yaml similarity index 83% rename from roles/telegraf_install/charts/telegraf/templates/daemonset.yml rename to roles/telegraf_install/charts/telegraf/templates/daemonset.yaml index 360465b7..77f7c4e7 100644 --- a/roles/telegraf_install/charts/telegraf/templates/daemonset.yml +++ b/roles/telegraf_install/charts/telegraf/templates/daemonset.yaml @@ -55,20 +55,16 @@ spec: - name: config mountPath: /etc/telegraf/ readOnly: true - - name: hostself - mountPath: /hostfs/proc/self/mounts + {{- if .Values.telegraf.pmu_events }} + - name: pmu-events + mountPath: /etc/telegraf-pmu/event_definitions.json + # IMPORTANT: telegraf cannot accept symlink created by k8s mount + # subPath do not use symlink but disables auto-update of mount + subPath: event_definitions.json readOnly: true - - name: hostdiskstats - mountPath: /hostfs/proc/diskstats - readOnly: true - - name: hostprocstat - mountPath: /hostfs/proc/stat - readOnly: true - - name: hostnetdev - mountPath: /hostfs/proc/net/dev - readOnly: true - - name: hostutmp - mountPath: /hostfs/var/run/utmp + {{- end }} + - name: hostroot + mountPath: /hostfs readOnly: true - name: hostkdebug mountPath: /sys/kernel/debug @@ -122,24 +118,20 @@ spec: items: - key: telegraf.conf path: telegraf.conf + {{- if .Values.telegraf.pmu_events }} + - name: pmu-events + configMap: + name: {{ include "telegraf.fullname" . }}-pmu-events + items: + - key: event_definitions.json + path: event_definitions.json + {{- end }} - name: tls secret: secretName: {{ include "telegraf.fullname" . 
}}-tls - - name: hostself - hostPath: - path: /proc/self/mounts - - name: hostdiskstats - hostPath: - path: /proc/diskstats - - name: hostprocstat - hostPath: - path: /proc/stat - - name: hostnetdev - hostPath: - path: /proc/net/dev - - name: hostutmp + - name: hostroot hostPath: - path: /var/run/utmp + path: / - name: hostkdebug hostPath: path: /sys/kernel/debug diff --git a/roles/kubespray_patch/defaults/main.yml b/roles/telegraf_install/charts/telegraf/values.yaml similarity index 94% rename from roles/kubespray_patch/defaults/main.yml rename to roles/telegraf_install/charts/telegraf/values.yaml index d75cde29..cdae118f 100644 --- a/roles/kubespray_patch/defaults/main.yml +++ b/roles/telegraf_install/charts/telegraf/values.yaml @@ -13,5 +13,5 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## ---- -target_ansible_pkg_mgr: "yum" +telegraf: + pmu_events: diff --git a/roles/telegraf_install/defaults/main.yml b/roles/telegraf_install/defaults/main.yml index f7f5a2e1..6b6d073a 100644 --- a/roles/telegraf_install/defaults/main.yml +++ b/roles/telegraf_install/defaults/main.yml @@ -14,121 +14,13 @@ ## limitations under the License. 
## --- -telegraf_plugins_config: - agent: | - [agent] - interval = "5s" - round_interval = true - metric_batch_size = 10000 - metric_buffer_limit = 100000 - collection_jitter = "0s" - flush_interval = "5s" - flush_jitter = "0s" - precision = "" - debug = true - quiet = false - logfile = "" - hostname = "$HOSTNAME" - omit_hostname = false - prometheus_client: | - [[outputs.prometheus_client]] - listen = "127.0.0.1:9272" - metric_version = 2 - export_timestamp = true - output_to_file: | - [[outputs.file]] - files = ["/tmp/metrics.out"] - data_format = "prometheus" - intel_rdt: | - [[inputs.intel_rdt]] - pqos_path = "/usr/local/bin/pqos" - cores = ["0-{{ ansible_processor_vcpus - 1 }}"] - ras: | - [[inputs.ras]] - intel_powerstat: | - [[inputs.intel_powerstat]] - cpu_metrics = [ - "cpu_frequency", - "cpu_busy_frequency", - "cpu_temperature", - "cpu_c1_state_residency", - "cpu_c6_state_residency", - "cpu_busy_cycles" - ] - smart: | - [[inputs.smart]] - use_sudo = true - attributes = true - cpu: | - [[inputs.cpu]] - percpu = true - totalcpu = true - collect_cpu_time = true - report_active = false - diskio: | - [[inputs.diskio]] - device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"] - ethtool: | - [[inputs.ethtool]] - net: | - [[inputs.net]] - iptables: | - [[inputs.iptables]] - use_sudo = true - use_lock = false - table = "filter" - chains = [ "INPUT" ] - system: | - [[inputs.system]] - kernel_vmstat: | - [[inputs.kernel_vmstat]] - cgroups: | - [[inputs.cgroup]] - paths = [ - "/sys/fs/cgroup/cpu", - "/sys/fs/cgroup/cpu/*", - "/sys/fs/cgroup/cpu/*/*", - ] - files = ["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares", "cpu.stat"] - disk: | - [[inputs.disk]] - ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"] - ping: | - [[inputs.ping]] - urls = ["google.com"] - method = "exec" - count = 1 - ping_interval = 1.0 - timeout = 1.0 - deadline = 10 - interface = "" - percentiles = [50, 95, 99] - binary = "ping" - dns_query: | - 
[[inputs.dns_query]] - servers = ["8.8.8.8"] - network = "tcp" - domains = ["google.com"] - record_type = "A" - port = 53 - timeout = 2 - mem: | - [[inputs.mem]] - temp: | - [[inputs.temp]] - ipmi_sensor: | - [[inputs.ipmi_sensor]] - use_sudo = true - interval = "30s" - timeout = "20s" - metric_version = 2 - -telegraf_profiles: +telegraf_config_profiles: basic: &basic - agent - prometheus_client - - output_to_file +# - output_to_file - intel_rdt + - intel_pmu - intel_powerstat - ras - cpu @@ -156,21 +48,36 @@ telegraf_profiles: - *basic full_nfv: &full_nfv - *basic - storage: &storage - - *basic build_your_own: &build_your_own - *basic + on_prem_vss: + - *on_prem + on_prem_sw_defined_factory: + - *on_prem + telegraf_release_name: telegraf telegraf_namespace: monitoring -# telegraf_dpdk_socket_path: /var/run/dpdk/rte - telegraf_chart_path: "{{ (project_root_dir, 'charts', 'telegraf') | path_join }}" +telegraf_root_path: "{{ (project_root_dir, 'telegraf') | path_join }}" telegraf_helm_values_file: "{{ telegraf_chart_path }}/values.yaml" telegraf_scrap_interval: 30 telegraf_prometheus_metrics_endpoint_port: 9273 telegraf_image_name: "docker.io/intel/observability-telegraf" -telegraf_image_tag: "1.2.0" +telegraf_image_tag: "1.2.0" # IMPORTANT see warning below telegraf_image_pullpolicy: IfNotPresent + +# WARNING: Current download script from PMU tools downloads events definition in old format. +# The old format is used by current telegraf version correctly. In case of update +# of telegraf OR PMU tools, changes may be required to achieve that intel_pmu plugin works. +# Problem is following: format of event definitions json file has changed and now includes +# 'headers' section. Neither pinned versions of PMU tools nor telegraf is prepared to use +# the section so its working correctly now. 
+# Please read more here: https://github.com/intel/perfmon/issues/22 + +# PMU tools scripts to get performance event definitions of target system +# definitions are used in configuration of intel_pmu telegraf input plugin +telegraf_pmu_tools_git: "https://github.com/andikleen/pmu-tools.git" +telegraf_pmu_tools_version: r220420 # IMPORTANT see warning above diff --git a/roles/telegraf_install/tasks/cleanup.yml b/roles/telegraf_install/tasks/cleanup.yml index 45d514de..86bf032f 100644 --- a/roles/telegraf_install/tasks/cleanup.yml +++ b/roles/telegraf_install/tasks/cleanup.yml @@ -14,12 +14,22 @@ ## limitations under the License. ## --- -- name: delete previous telegraf deployment - command: helm delete {{ telegraf_release_name }} --namespace {{ telegraf_namespace }} - failed_when: false - changed_when: true +- name: Cleanup telegraf stuff + block: + - name: Delete telegraf deployment + kubernetes.core.helm: + name: "{{ telegraf_release_name }}" + namespace: "{{ telegraf_namespace }}" + state: absent -- name: remove telegraf charts directory - file: - path: "{{ telegraf_chart_path }}" - state: absent + - name: Remove telegraf charts directory + ansible.builtin.file: + path: "{{ telegraf_chart_path }}" + state: absent + + - name: Remove telegraf root directory + ansible.builtin.file: + path: "{{ telegraf_root_path }}" + state: absent + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/telegraf_install/tasks/main.yml b/roles/telegraf_install/tasks/main.yml index d6196752..e2826a3b 100644 --- a/roles/telegraf_install/tasks/main.yml +++ b/roles/telegraf_install/tasks/main.yml @@ -28,8 +28,6 @@ - name: remove existing telegraf deployment include_tasks: cleanup.yml - when: - - inventory_hostname == groups['kube_control_plane'][0] - name: configure msr include_tasks: msr-config.yml diff --git a/roles/telegraf_install/tasks/pmu_events.yml b/roles/telegraf_install/tasks/pmu_events.yml new file mode 100644 index 00000000..0c4281ec ---
/dev/null +++ b/roles/telegraf_install/tasks/pmu_events.yml @@ -0,0 +1,55 @@ +## +## Copyright (c) 2020-2023 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- name: create Telegraf directory if needed + ansible.builtin.file: + path: "{{ telegraf_root_path }}" + state: directory + mode: 0755 + +- name: fetch scripts to load + ansible.builtin.git: + repo: "{{ telegraf_pmu_tools_git }}" + dest: "{{ (telegraf_root_path, 'pmu-tools') | path_join }}" + version: "{{ telegraf_pmu_tools_version }}" + +- name: Fetch PMU events definitions for current CPU + become: true + ansible.builtin.command: + cmd: 'python3 ./event_download.py' + chdir: "{{ (telegraf_root_path, 'pmu-tools') | path_join }}" + changed_when: true # script will redownload files each time + register: pmu_results + environment: + PMU_EVENTS_PATH: "{{ (telegraf_root_path, 'pmu-events') | path_join }}" + +- name: Prepare PMU events for current CPU + vars: + pmu_events_file: '{{ pmu_results.stdout_lines | last | regex_search("\/[^ ]+\.json$") }}' + when: pmu_events_file | length() != 0 + block: + - name: Load encoded PMU events definitions json + ansible.builtin.slurp: + src: "{{ pmu_events_file }}" + become: true + register: pmu_events_json_encoded + + - name: Decode PMU events json + ansible.builtin.set_fact: + pmu_event_definitions: "{{ (pmu_events_json_encoded['content'] | b64decode | from_json) }}" + + - name: Get all PMU C0 events available for the target CPU + 
ansible.builtin.set_fact: + cstates_events_available: "{{ pmu_event_definitions | map(attribute='EventName') | list | select('contains', 'CPU_CLK_UNHALTED.C0') }}" diff --git a/roles/cndp_dp_install/defaults/main.yml b/roles/telegraf_install/tasks/preflight.yml similarity index 63% rename from roles/cndp_dp_install/defaults/main.yml rename to roles/telegraf_install/tasks/preflight.yml index 4708b707..4db76792 100644 --- a/roles/cndp_dp_install/defaults/main.yml +++ b/roles/telegraf_install/tasks/preflight.yml @@ -13,9 +13,10 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## ---- -intel_cndp_dp_git_url: "https://github.com/intel/afxdp-plugins-for-kubernetes.git" -intel_cndp_dp_version: "v0.0.2" -intel_cndp_dp_dir: "{{ (project_root_dir, 'intel-afxdp-dp') | path_join }}" -intel_cndp_dp_image: "{{ registry_local_address }}/afxdp-device-plugin" -intel_cndp_dp_image_version: "latest" # TODO update this version when the docker image published. +- name: Check deployment profile exists in telegraf config profiles list + ansible.builtin.assert: + that: + - telegraf_config_profiles[profile_name] is defined + msg: + - Deployment profile '{{ profile_name }}' has no telegraf configuration defined. + - Please define telegraf configuration for the current profile in {{ role_name }} role defaults. 
diff --git a/roles/telegraf_install/tasks/telegraf.yml b/roles/telegraf_install/tasks/telegraf.yml index 3cd2fb60..69995828 100644 --- a/roles/telegraf_install/tasks/telegraf.yml +++ b/roles/telegraf_install/tasks/telegraf.yml @@ -15,41 +15,50 @@ ## --- - name: create Helm charts directory if needed - file: + ansible.builtin.file: path: "{{ (project_root_dir, 'charts') | path_join }}" state: directory mode: 0755 - name: copy telegraf Helm chart to the controller node - copy: - src: "{{ role_path }}/charts/telegraf" - dest: "{{ (project_root_dir, 'charts') | path_join }}" + ansible.builtin.copy: + src: "{{ role_path }}/charts/telegraf/" # Copy contents of charts/telegraf + dest: "{{ telegraf_chart_path }}" mode: 0755 -- name: build telegraf configuration - set_fact: - telegraf_config: "{{ telegraf_config | default('') + telegraf_plugins_config[telegraf_plugin] }}" - loop: "{{ telegraf_profiles[telegraf_profile] | flatten(levels=1) }}" +- name: prepare PMU events definitions + ansible.builtin.include_tasks: pmu_events.yml + +- name: template all telegraf configuration options + ansible.builtin.set_fact: + telegraf_plugins_config: "{{ lookup('ansible.builtin.template', 'telegraf_plugins_conf.yml.j2') | from_yaml }}" + +- name: build telegraf configuration based on current RA profile + ansible.builtin.set_fact: + telegraf_config: "{{ telegraf_config | default('') + telegraf_plugins_config[telegraf_plugin] }}" # add enabled plugin section to config + loop: "{{ telegraf_config_profiles[profile_name] | flatten(levels=1) }}" # select plugins enabled for current RA profile loop_control: loop_var: telegraf_plugin - name: print out effective telegraf config - debug: + ansible.builtin.debug: msg: | - "Effective telegraf configuration to use:" - "{{ telegraf_config }}" + Effective telegraf configuration to use: + {{ telegraf_config }} - name: populate values.yaml template with values - template: + ansible.builtin.template: src: "values.yaml.j2" dest: "{{
telegraf_helm_values_file }}" force: yes mode: preserve - name: install telegraf helm chart - command: >- - helm upgrade -i {{ telegraf_release_name }} - --namespace {{ telegraf_namespace }} - -f {{ telegraf_helm_values_file }} - {{ telegraf_chart_path }} - changed_when: true + kubernetes.core.helm: + chart_ref: "{{ telegraf_chart_path }}" + name: "{{ telegraf_release_name }}" + namespace: "{{ telegraf_namespace }}" + create_namespace: true + values_files: + - "{{ telegraf_helm_values_file }}" + wait: true diff --git a/roles/telegraf_install/templates/telegraf_plugins_conf.yml.j2 b/roles/telegraf_install/templates/telegraf_plugins_conf.yml.j2 new file mode 100644 index 00000000..5789c9be --- /dev/null +++ b/roles/telegraf_install/templates/telegraf_plugins_conf.yml.j2 @@ -0,0 +1,145 @@ +agent: | + [agent] + interval = "5s" + round_interval = true + metric_batch_size = 10000 + metric_buffer_limit = 100000 + collection_jitter = "0s" + flush_interval = "5s" + flush_jitter = "0s" + precision = "" + debug = true + quiet = false + logfile = "" + hostname = "$HOSTNAME" + omit_hostname = false + +prometheus_client: | + [[outputs.prometheus_client]] + listen = "127.0.0.1:9272" + metric_version = 2 + export_timestamp = true + +output_to_file: | + [[outputs.file]] + files = ["/tmp/metrics.out"] + data_format = "prometheus" + +intel_rdt: | + [[inputs.intel_rdt]] + pqos_path = "/usr/local/bin/pqos" + cores = ["0-{{ ansible_processor_vcpus - 1 }}"] + +ras: | + [[inputs.ras]] + +intel_pmu: | +{% if not vm_enabled | default(false) and pmu_event_definitions | default(false) %} + [[inputs.intel_pmu]] + event_definitions = ["/etc/telegraf-pmu/event_definitions.json"] + [[inputs.intel_pmu.core_events]] + events = ["INST_RETIRED.ANY"] +{% if (cstates_events_available | default(false)) %} + [[inputs.intel_pmu.core_events]] + events = ["CPU_CLK_UNHALTED.THREAD", "{{ cstates_events_available | join('", "') }}"] + events_tag = "c0wait" + perf_group = true +{%- endif %} +{% endif %} + 
+intel_powerstat: | + [[inputs.intel_powerstat]] + package_metrics = [ + "current_power_consumption", + "current_dram_power_consumption" + ] + cpu_metrics = [ + "cpu_frequency", + "cpu_busy_frequency", + "cpu_temperature", + "cpu_c1_state_residency", + "cpu_c6_state_residency", + "cpu_busy_cycles" + ] + +smart: | + [[inputs.smart]] + use_sudo = true + attributes = true + +cpu: | + [[inputs.cpu]] + percpu = true + totalcpu = true + collect_cpu_time = true + report_active = false + +diskio: | + [[inputs.diskio]] + device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"] + +ethtool: | + [[inputs.ethtool]] + +net: | + [[inputs.net]] + +iptables: | + [[inputs.iptables]] + use_sudo = true + use_lock = false + table = "filter" + chains = [ "INPUT" ] + +system: | + [[inputs.system]] + +kernel_vmstat: | + [[inputs.kernel_vmstat]] + +cgroups: | + [[inputs.cgroup]] + paths = [ + "/sys/fs/cgroup/cpu", + "/sys/fs/cgroup/cpu/*", + "/sys/fs/cgroup/cpu/*/*", + ] + files = ["cpuacct.usage", "cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares", "cpu.stat"] + +disk: | + [[inputs.disk]] + ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"] + +ping: | + [[inputs.ping]] + urls = ["google.com"] + method = "exec" + count = 1 + ping_interval = 1.0 + timeout = 1.0 + deadline = 10 + interface = "" + percentiles = [50, 95, 99] + binary = "ping" + +dns_query: | + [[inputs.dns_query]] + servers = ["8.8.8.8"] + network = "tcp" + domains = ["google.com"] + record_type = "A" + port = 53 + timeout = 2 + +mem: | + [[inputs.mem]] + +temp: | + [[inputs.temp]] + +ipmi_sensor: | + [[inputs.ipmi_sensor]] + use_sudo = true + interval = "30s" + timeout = "20s" + metric_version = 1 diff --git a/roles/telegraf_install/templates/values.yaml.j2 b/roles/telegraf_install/templates/values.yaml.j2 index b031fd76..c821aaeb 100644 --- a/roles/telegraf_install/templates/values.yaml.j2 +++ b/roles/telegraf_install/templates/values.yaml.j2 @@ -19,6 +19,8 @@ telegraf: securityContext: privileged: 
true allowPrivilegeEscalation: true + seccompProfile: + type: "RuntimeDefault" resources: limits: cpu: 1000m @@ -27,10 +29,16 @@ telegraf: cpu: 500m memory: 512Mi config: | - {{ telegraf_config | indent(4) -}} - {% if telegraf_dpdk_socket_path is defined and telegraf_dpdk_socket_path | length > 0 %} + {{ telegraf_config | indent(4) }} + +{% if pmu_event_definitions | default(false) %} + pmu_events: | + {{ pmu_event_definitions | to_json(indent=2) | indent(4) }} +{% endif %} + +{% if telegraf_dpdk_socket_path is defined and telegraf_dpdk_socket_path | length > 0 %} dpdk_socket_path: {{ telegraf_dpdk_socket_path }} - {% endif %} +{% endif %} # rbac-proxy container rbacproxy: diff --git a/roles/vm/compile_libvirt/defaults/main.yml b/roles/vm/compile_libvirt/defaults/main.yml index a61bcd27..d2bdc652 100644 --- a/roles/vm/compile_libvirt/defaults/main.yml +++ b/roles/vm/compile_libvirt/defaults/main.yml @@ -17,3 +17,4 @@ libvirt_groups: - libvirt - libvirtd - libvirt-qemu +libvirt_tag: 9.3.0 diff --git a/roles/vm/compile_libvirt/files/preferences b/roles/vm/compile_libvirt/files/preferences index 0b9d758a..d6bdfc7f 100644 --- a/roles/vm/compile_libvirt/files/preferences +++ b/roles/vm/compile_libvirt/files/preferences @@ -10,3 +10,21 @@ Pin-Priority: -1 Package: libvirt-daemon Pin: release * Pin-Priority: -1 +Package: qemu-system-x86 +Pin: release n=jammy +Pin-Priority: -1 +Package: qemu-system-x86 +Pin: release n=kinetic +Pin-Priority: 20 +Package: python3-libvirt +Pin: release n=jammy +Pin-Priority: -1 +Package: python3-libvirt +Pin: release n=kinetic +Pin-Priority: 20 +Package: libvirt-daemon-driver-qemu +Pin: release n=jammy +Pin-Priority: -1 +Package: libvirt-daemon-driver-qemu +Pin: release n=kinetic +Pin-Priority: 20 diff --git a/roles/vm/compile_libvirt/tasks/compile_libvirt.yml b/roles/vm/compile_libvirt/tasks/compile_libvirt.yml index e3d50c29..01455431 100644 --- a/roles/vm/compile_libvirt/tasks/compile_libvirt.yml +++ 
b/roles/vm/compile_libvirt/tasks/compile_libvirt.yml @@ -20,15 +20,19 @@ dest: /etc/apt/preferences mode: '0644' +- name: add 'kinetic' apt repository for qemu packages + apt_repository: + repo: "deb http://archive.ubuntu.com/ubuntu kinetic main" + - name: Install dependencies include_role: name: install_dependencies - name: Clone libvirt fork with sgx support git: - repo: 'https://github.com/hhb584520/libvirt.git' + repo: 'https://github.com/libvirt/libvirt.git' dest: "{{ (project_root_dir, 'libvirt') | path_join }}" - version: sgx-dev + version: v{{ libvirt_tag }} - name: Disabling apparmor systemd: diff --git a/roles/vm/compile_libvirt/tasks/main.yml b/roles/vm/compile_libvirt/tasks/main.yml index abfd09ce..378afe14 100644 --- a/roles/vm/compile_libvirt/tasks/main.yml +++ b/roles/vm/compile_libvirt/tasks/main.yml @@ -16,12 +16,10 @@ --- - name: check if libvirt is already compiled command: "libvirtd --version" - args: - executable: /bin/bash register: libvirtd_version changed_when: false failed_when: false - name: Compile libvirt include_tasks: compile_libvirt.yml - when: "'8.4.0' not in libvirtd_version.stdout" + when: "libvirt_tag not in libvirtd_version.stdout" diff --git a/roles/vm/compile_libvirt/vars/main.yml b/roles/vm/compile_libvirt/vars/main.yml index 4d5f54cb..f50dabdf 100644 --- a/roles/vm/compile_libvirt/vars/main.yml +++ b/roles/vm/compile_libvirt/vars/main.yml @@ -15,6 +15,7 @@ ## install_dependencies: Debian: + - dconf-service - virt-manager - libxml2-utils - xsltproc @@ -44,5 +45,4 @@ install_dependencies: - librados-dev - librbd-dev - libsasl2-dev - - libsystemd-dev - - dnsmasq + - dnsmasq-base diff --git a/roles/vm/manage_bridges/templates/dhcp-bridge.xml.j2 b/roles/vm/manage_bridges/templates/dhcp-bridge.xml.j2 index 8df103cf..779f5d65 100644 --- a/roles/vm/manage_bridges/templates/dhcp-bridge.xml.j2 +++ b/roles/vm/manage_bridges/templates/dhcp-bridge.xml.j2 @@ -5,8 +5,9 @@ - + + diff --git a/roles/vm/manage_vms/tasks/start_vm.yml 
b/roles/vm/manage_vms/tasks/start_vm.yml index 4ea41825..7c49be2f 100644 --- a/roles/vm/manage_vms/tasks/start_vm.yml +++ b/roles/vm/manage_vms/tasks/start_vm.yml @@ -22,18 +22,41 @@ --connect qemu:///system --name {{ vm.name }} --cpu host + {%- if vm.type == 'work' and sgx_dp_enabled | default(false) %} + --ram {{ vm.memory - sgx_memory_size }} + {%- else %} --ram {{ vm.memory }} + {%- endif %} --vcpus={{ vm.cpu_total }},sockets=1,cores={{ (vm.cpu_total / 2) | int }},threads=2 --cpuset={{ vm.cpus }} --os-variant {{ vm_os_variant }} --disk path={{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.qcow2,format=qcow2 --disk {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.iso,device=cdrom --network network=vm-default,model=virtio + --iommu model=intel,driver.intremap=on,driver.caching_mode=on + --features apic=on,ioapic.driver=qemu {%- if vm.type == "work" %} {%- for pci in vm.pci %} --hostdev {{ pci }},address.type=pci {%- endfor -%} {%- endif %} + {%- if vm.type == 'work' and sgx_dp_enabled | default(false) %} + --xml xpath.create=./devices/memory + --xml ./devices/memory/@model="sgx-epc" + --xml xpath.create=./devices/memory/target/size + --xml ./devices/memory/target/size/@unit="MiB" + --xml ./devices/memory/target/size={{ sgx_memory_size }} + --xml xpath.create=./maxMemory + --xml ./maxMemory/@slots={%- if (vm.memory) > 2048 %}{{ ((vm.memory) / 1024) | int }}{%- else %}"2"{%- endif %} + --xml ./maxMemory/@unit="MiB" + --xml ./maxMemory={{ vm.memory }} + --xml xpath.create=./cpu/numa + --xml xpath.create=./cpu/numa/cell + --xml ./cpu/numa/cell/@id="0" + --xml ./cpu/numa/cell/@cpus="0 - {{ vm.cpu_total - 1 }}" + --xml ./cpu/numa/cell/@memory="{{ vm.memory - sgx_memory_size }}" + --xml ./cpu/numa/cell/@unit="MiB" + {%- endif %} --network network=vxlanbr{{ vm.vxlan }},model=virtio --import --noautoconsole changed_when: true diff --git a/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml b/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml 
index 811e020a..d70ef5cc 100644 --- a/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml +++ b/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml @@ -14,6 +14,17 @@ ## limitations under the License. ## --- +- name: Remove old records for the same VXLAN IP in ~/.ssh/config + replace: + path: "{{ local_login_user_dir }}/.ssh/config" + regexp: "^Host {{ item.value }}\n ProxyCommand ssh (.*)$\n" + replace: "" + with_items: "{{ vm_vxlan_ips | dict2items }}" + delegate_to: localhost + become: false + when: + - "vm_cluster_name | default('') | length == 0" + - name: Prepare bastion host configuration for VXLAN in ~/.ssh/config blockinfile: path: "{{ local_login_user_dir }}/.ssh/config" diff --git a/roles/vm/prepare_cek_vxlan/tasks/main.yml b/roles/vm/prepare_cek_vxlan/tasks/main.yml index beb03ea5..df4f5ed1 100644 --- a/roles/vm/prepare_cek_vxlan/tasks/main.yml +++ b/roles/vm/prepare_cek_vxlan/tasks/main.yml @@ -117,6 +117,9 @@ delegate_to: "{{ groups['vm_host'][0] }}" become: false register: ssh_result + retries: 5 + delay: 1 + until: ssh_result.stdout | length > 0 changed_when: '"Warning: Permanently added " in ssh_result.stderr' when: - (not item.key in current_vms.stdout) or diff --git a/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml b/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml deleted file mode 100644 index 1475a039..00000000 --- a/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml +++ /dev/null @@ -1,72 +0,0 @@ -## -## Copyright (c) 2020-2023 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: Handle SGX configuration for VM - block: - - name: Dump domain XML - {{ vm.name }} - shell: virsh dumpxml "{{ vm.name }}" > "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - changed_when: true - - - name: Adding memory node to domain XML - community.general.xml: - path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - xpath: /domain/devices - add_children: - - memory: - model: 'sgx-epc' - - - name: Adding target node to domain XML - community.general.xml: - path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - xpath: /domain/devices/memory - add_children: - - target: - - - name: Adding size node to domain XML - community.general.xml: - path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - xpath: /domain/devices/memory/target - add_children: - - size: - unit: 'MiB' - - - name: Setting memory size to domain XML - community.general.xml: - path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - xpath: /domain/devices/memory/target/size - value: "{{ sgx_memory_size | string }}" - - - name: VM destroy for sgx modifications - command: virsh destroy "{{ vm.name }}" - changed_when: true - - - name: VM undefine for sgx modifications - command: virsh undefine "{{ vm.name }}" - changed_when: true - - - name: VM create with sgx modifications - command: virsh create "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" - changed_when: true - when: - - (not vm.name in current_vms.stdout) or - vm_recreate_existing | default(true) - -- name: Current VM SGX - {{ vm.name }} - debug: - msg: "Current VM - {{ vm.name }} was already running. 
Nothing was changed" - when: - - (vm.name in current_vms.stdout) - - not vm_recreate_existing | default(true) diff --git a/roles/wait_for_kubernetes_ready/tasks/main.yml b/roles/wait_for_kubernetes_ready/tasks/main.yml index 66f6143a..3cf69732 100644 --- a/roles/wait_for_kubernetes_ready/tasks/main.yml +++ b/roles/wait_for_kubernetes_ready/tasks/main.yml @@ -16,16 +16,20 @@ --- - delegate_to: "{{ groups['kube_control_plane'][0] }}" run_once: true + when: not scale | default(false) or force_check | default(false) block: - name: Wait for kube api to be up ansible.builtin.uri: url: "https://127.0.0.1:6443/healthz" + client_cert: "{{ kube_apiserver_cert }}" + client_key: "{{ kube_apiserver_key }}" validate_certs: no use_proxy: no register: kube_api until: kube_api.status == 200 retries: 10 delay: 2 + when: not (on_cloud | default(false)) - name: show all nodes on kubernetes cluster command: kubectl get nodes diff --git a/validation/sylva-validation/stack-validation/image/robot/tests/clean.sh b/validation/sylva-validation/stack-validation/image/robot/tests/clean.sh new file mode 100755 index 00000000..d3a7b003 --- /dev/null +++ b/validation/sylva-validation/stack-validation/image/robot/tests/clean.sh @@ -0,0 +1,4 @@ +#! 
/bin/bash -v + +rm -f log.html multus-cni-net.yml multus-cni-test-pod-1.yml\ + multus-cni-test-pod-2.yml output.xml report.html sriov-dpdk-pod.yaml diff --git a/validation/sylva-validation/stack-validation/image/robot/tests/rc2.robot b/validation/sylva-validation/stack-validation/image/robot/tests/rc2.robot new file mode 100644 index 00000000..04eb8863 --- /dev/null +++ b/validation/sylva-validation/stack-validation/image/robot/tests/rc2.robot @@ -0,0 +1,14 @@ +*** Settings *** +Documentation RC2 Robot Framework selection +Suite Setup Get Machine Info 2 +Library Collections +Resource ../keywords/common.robot +Resource ../keywords/bmra_onprem.robot + +*** Test Cases *** + +BMRA NP Device Plugins SRIOV DPDK + [Timeout] 1 minutes + [Tags] SRIOV + Run Keyword Create And Check SRIOV-dpdk-pod + [Teardown] NP SRIOV DPDK Test Teardown diff --git a/validation/verification-manual/cadvisor/README.md b/validation/verification-manual/cadvisor/README.md new file mode 100644 index 00000000..3960a2f9 --- /dev/null +++ b/validation/verification-manual/cadvisor/README.md @@ -0,0 +1,54 @@ +# Check cAdvisor + +cAdvisor () is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics. This data is exported by container and machine-wide. + +cAdvisor is deployed when `cadvisor_enabled: true` in `group_vars/all.yml`. + +Collected data is exposed through the cAdvisor REST API (). +In an RA deployment, the cAdvisor API is not exposed to the external network. To access it, use port forwarding: + +```bash +kubectl port-forward -n cadvisor pod/cadvisor-xxxx --address localhost 8080 +``` + +__NOTE__: cAdvisor is deployed as a daemonset, so the check should be done for the pod on each node to ensure all pods are configured properly.
+ +To verify that cAdvisor collects container data, check the stats collected for a specific container: + +1. Request container stats via cAdvisor REST API: + + ```bash + CONTAINER_NAME="" + curl "http://localhost:8080/api/v1.3/containers/${CONTAINER_NAME}" + ``` + + alternatively (docker only): + + ```bash + CONTAINER_ID="" + curl "http://localhost:8080/api/v2.0/stats/${CONTAINER_ID}?type=docker" + ``` + + alternatively, to check all containers: + + ```bash + curl "http://localhost:8080/api/v2.0/stats/?recursive=true" + ``` + + The response should not be empty. + +## Perf Events configuration + +cAdvisor can be configured to measure perf events. RA deployment supports setting up the sample perf events configuration by setting the parameter `cadvisor_sample_perf_events_enabled` to `true`. + +A sample perf events configuration is supplied as a JSON file [here](/roles/cadvisor_install/files/sample-perf-event.json). + +To verify that the perf events configuration is applied correctly: + +1. Request base container stats via cAdvisor REST API and check `LLC-load-misses` is being measured: + + ```bash + curl "http://localhost:8080/api/v1.3/containers/" | grep "LLC-load-misses" + ``` + + The grep output should not be empty. diff --git a/validation/verification-manual/power_manager/README.md b/validation/verification-manual/power_manager/README.md index e89eb214..50f00338 100644 --- a/validation/verification-manual/power_manager/README.md +++ b/validation/verification-manual/power_manager/README.md @@ -1,143 +1,136 @@ -# Check Intel Power Manager (Balance Performance Power-Profile & Sample Power-Pods) -Sample pods can be deployed by setting `deploy_example_pods: true` in group vars.
Following are the results that can be obtained from Power Manager work -``` -# kubectl get pods -n intel-power -NAME READY STATUS RESTARTS AGE -balance-performance-power-pod 1/1 Running 0 33m -balance-power-power-pod 1/1 Running 0 33m -controller-manager-f584c9458-p5llp 1/1 Running 0 34m -performance-power-pod 1/1 Running 0 33m -power-node-agent-9dkch 2/2 Running 0 34m -``` -**Note:** each profile was deployed in a separate pod +# This Power Manager guide should describe what exactly can be configured in group vars and host vars -Check the power profiles: -``` -# kubectl get powerprofiles -n intel-power -NAME AGE -balance-performance 30m -balance-performance-node1 30m -balance-power 30m -balance-power-node1 30m -performance 30m -performance-node1 30m -``` +# group vars +In group vars, you can enable/disable the whole power manager feature. If the feature is enabled, you need to fill power_nodes list. You can also choose to build power manager locally, to deploy sample pods, or to turn on the cluster-wide shared profile. + +# host vars +In host vars, there are more options for power manager. You can configure your desired power profiles here, which are needed for power-config or sample pods. There is also configuration for node-specific shared profile and shared workload uncore frequency and c-states configuration. -You can check the frequencies that will be set by balance-performance Power Profile +# sample pods deployment +Sample pods are requesting cpu cores, which will be part of exclusive pool. To function properly, you will also need to have shared pool configured in your cluster (read more below). 
For this example, we are using balance-performance profile: ``` -# kubectl get PowerProfiles -n intel-power balance-performance-node1 -o yaml -apiVersion: power.intel.com/v1alpha1 -kind: PowerProfile -metadata: - creationTimestamp: "2022-02-07T20:50:44Z" - generation: 1 - name: balance-performance-node1 - namespace: intel-power - resourceVersion: "4790" - uid: 3bc5d223-f31e-4fdc-8c49-8a87148a014d -spec: - epp: balance_performance - max: 2700 - min: 2500 - name: balance-performance-node1 +# kubectl get pods -n intel-power +NAME READY STATUS RESTARTS AGE +balance-performance-power-pod-node1 1/1 Running 0 71s +balance-performance-power-pod-node2 1/1 Running 0 97s +controller-manager-6f95578567-g74lw 1/1 Running 0 114s +power-node-agent-2xnd5 1/1 Running 0 106s +power-node-agent-cptbd 1/1 Running 0 106s ``` +**Note:** You can also request different profile for each node -To obtain balance-performance cores, apply the Power Profile +You need to enable global/local shared profile and shared workload to enable the Shared Pool, ``` -# kubectl get PowerWorkloads -n intel-power balance-performance-node1-workload -o yaml -apiVersion: power.intel.com/v1alpha1 -kind: PowerWorkload -metadata: - creationTimestamp: "2022-02-07T20:51:43Z" - generation: 1 - name: balance-performance-node1-workload - namespace: intel-power - resourceVersion: "5090" - uid: 19de2932-6ab6-4863-b664-764cc555e23d -spec: - name: balance-performance-node1-workload - nodeInfo: - containers: - - exclusiveCpus: - - 4 - - 68 - id: 870e1d2eb4f971328d5030f97a647b8ee5fb7dae52daebec4714588e9a563667 - name: balance-performance-container - pod: balance-performance-power-pod - powerProfile: balance-performance-node1 - cpuIds: - - 4 - - 68 - name: node1 - powerProfile: balance-performance-node1 +# default in group_vars +global_shared_profile_enabled: true # default in group_vars + +# default in host_vars +local_shared_profile: + enabled: true +shared_workload: + enabled: true + ``` -If you want to check all the cores 
in your Power Nodes, you can use the following command +If you want to check all the cores in your Power Nodes, or frequencies, which will be set by your desired profile, you can use the following command ``` # kubectl get PowerNodes -A -o yaml apiVersion: v1 items: -- apiVersion: power.intel.com/v1alpha1 +- apiVersion: power.intel.com/v1 kind: PowerNode metadata: - creationTimestamp: "2022-02-07T20:50:40Z" - generation: 1018 + creationTimestamp: "2023-07-12T12:06:46Z" + generation: 4 name: node1 namespace: intel-power - resourceVersion: "44835" - uid: 2aa0f908-2f18-473f-989e-12c46ad2811a + resourceVersion: "3514623" + uid: 6990a520-4ea6-4cde-91e0-eef6ac058fda spec: - activeProfiles: - balance-performance-node1: true - balance-power-node1: true - performance-node1: true - activeWorkloads: - - cores: - - 2 - - 66 - name: performance-node1-workload - - cores: - - 4 - - 68 - name: balance-performance-node1-workload - - cores: - - 3 - - 67 - name: balance-power-node1-workload nodeName: node1 - powerContainers: - - exclusiveCpus: - - 2 - - 66 - id: c152e29f49db457417beca958133e7d8d995ea7302f76073b96c5797fd20d770 - name: performance-container - pod: performance-power-pod - powerProfile: performance-node1 - workload: performance-node1-workload - - exclusiveCpus: - - 4 - - 68 - id: 870e1d2eb4f971328d5030f97a647b8ee5fb7dae52daebec4714588e9a563667 - name: balance-performance-container - pod: balance-performance-power-pod - powerProfile: balance-performance-node1 - workload: balance-performance-node1-workload - - exclusiveCpus: - - 3 - - 67 - id: 3ea83bf1369946fbe625e7fec4355de4760a1b8a1528959cd7eacb87c3e046a9 - name: balance-power-container - pod: balance-power-power-pod - powerProfile: balance-power-node1 - workload: balance-power-node1-workload - sharedPools: - - name: Default - sharedPoolCpuIds: - - 0 - - 1 - - 2 - - 3 - - 4 - - 5 -... 
+ powerProfiles: + - 'balance-performance: 2825000 || 2625000 || ' + sharedPool: shared-global || 1500000 || 1000000 || 0,2-72,74-143 + unaffectedCores: 0-143 +- apiVersion: power.intel.com/v1 + kind: PowerNode + metadata: + creationTimestamp: "2023-07-12T12:06:46Z" + generation: 1 + name: ar09-28-cyp + namespace: intel-power + resourceVersion: "3514271" + uid: e76eb4ae-be6a-4dea-b6d1-5c1ae2a12ad0 + spec: + nodeName: ar09-28-cyp +- apiVersion: power.intel.com/v1 + kind: PowerNode + metadata: + creationTimestamp: "2023-07-12T12:06:46Z" + generation: 4 + name: node2 + namespace: intel-power + resourceVersion: "3514603" + uid: a70c4b99-e27e-4811-b85e-7aad6ad8dab6 + spec: + nodeName: node2 + powerProfiles: + - 'balance-performance: 2825000 || 2625000 || ' + sharedPool: shared-global || 1500000 || 1000000 || 0,2-64,66-127 + unaffectedCores: 0-127 +kind: List +metadata: + resourceVersion: "" +``` +**Note:** As there is only one power config in whole cluster, all PowerProfiles from each node are applied cluster-wide +**Note:** Exclusive pool cores aren't displayed by power manager in time of making this guide, but you can see them absent in shared pool. In this case cores 1,73 for node1 and cores 1,65 for node2. + +To setup Uncore Frequency, you can choose from two options: +``` +# You can set up system-wide uncore frequency with: + system_max_frequency: 2300000 + system_min_frequency: 1300000 + +# Or you can use die/package specific settings: + die_selector: + - package: 0 + die: 0 + min: 1500000 + max: 2400000 +``` +To check Uncore Frequency, you can use following path: +``` +/sys/devices/system/cpu/intel_uncore_frequency/package_XY_die_XY +``` +You can then check files 'max_freq_khz' or 'min_freq_khz' which should store your desired Uncore Frequency values. +**Note:** Valid min and max values are determined by hardware. Die config will precede Package config, which will precede system-wide config. 
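The uncore frequency check described above can be scripted. The following is a minimal sketch, assuming each `package_XY_die_XY` directory contains the `min_freq_khz` and `max_freq_khz` files mentioned in the guide. The function name `show_uncore_freq` and its optional base-directory argument are hypothetical and are not part of this repository.

```bash
#!/usr/bin/env bash
# Print the min/max uncore frequency for every package/die directory.
# The base directory defaults to the sysfs path named in this guide; the
# optional first argument (a hypothetical convenience) overrides it.
show_uncore_freq() {
    local base="${1:-/sys/devices/system/cpu/intel_uncore_frequency}"
    local dir found=0
    for dir in "$base"/package_*_die_*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$dir" ] || continue
        found=1
        printf '%s min=%s max=%s\n' \
            "$(basename "$dir")" \
            "$(cat "$dir/min_freq_khz")" \
            "$(cat "$dir/max_freq_khz")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no package_*_die_* directories under $base" >&2
        return 1
    fi
}
```

Running `show_uncore_freq` on a power node prints one line per package/die pair, which can be compared against the `min` and `max` values configured in host vars.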
+ +To set up C-States, you can choose from three different options: +``` +# First option will enable/disable desired_state for all cores in shared pool + shared: + desired_state: true + +# Second option will enable/disable desired_state for balance-performance exclusive pool + profile_exclusive: + balance-performance: + desired_state: false + +# Third option will enable/disable desired_state for specific_core + core: + "specific_core": + desired_state: true +``` + +To check C-States, you can use following path: +``` +# /sys/devices/system/cpu/cpuX/cpuidle/stateY +``` +You can check all C-State information there. For example: +``` +# cat /sys/devices/system/cpu/cpu3/cpuidle/state3/name +C6 + +# cat /sys/devices/system/cpu/cpu3/cpuidle/state3/disable +1 ``` diff --git a/validation/verification-manual/tas/README.md b/validation/verification-manual/tas/README.md index 89bcdbd0..1c5b5b39 100644 --- a/validation/verification-manual/tas/README.md +++ b/validation/verification-manual/tas/README.md @@ -7,7 +7,7 @@ tas_namespace: monitoring # create and enable TAS demonstration policy: [true, false] tas_enable_demo_policy: true ``` -The Health Metric Demo Policy requires a Prometheus metric file to exist on the node and be read by Prometheus. For security reasons, BMRA does not deploy it in the /tmp directory, where every user has access. Instead, it is deployed in the `/opt/intel/tas-demo-policy/` directory with root-only access. +The Health Metric Demo Policy requires a Prometheus metric file to exist on the node and be read by Prometheus. For security reasons, BMRA does not deploy it in the /tmp directory, where every user has access. Instead, it is deployed in the `/opt/cek/tas-demo-policy/` directory. 
To verify that the policy has been deployed, use the command: ``` @@ -21,7 +21,7 @@ Details of this policy, including the rules and associated metrics, can be descr ``` To verify that the proper files exist on the worker node, use the following command: ``` -# cat /opt/intel/tas-demo-policy/test.prom +# cat /opt/cek/tas-demo-policy/test.prom node_health_metric 0 ``` The node health metric value indicates the following: @@ -39,7 +39,7 @@ Start by checking the logs to verify that `node_health_metric = 0` on worker nod ``` If it is not 0, set the `node_health_metric` to 0 (scheduleonmetric) on all worker nodes as follows: ``` -# echo 'node_health_metric 0' > /opt/intel/tas-demo-policy/test.prom +# echo 'node_health_metric 0' > /opt/cek/tas-demo-policy/test.prom ``` The provided deployment manifest [tas-test.yml](tas-test.yml) can be used to deploy a pod that is susceptible to the demo scheduling policy. The content of the file is: ``` @@ -101,7 +101,7 @@ Delete the pod before continuing with the next test: ## Check Dontschedule Policy Set the `node_health_metric` to 1 (dontschedule) on all worker nodes as follows: ``` -# echo 'node_health_metric 1' > /opt/intel/tas-demo-policy/test.prom +# echo 'node_health_metric 1' > /opt/cek/tas-demo-policy/test.prom ``` After a few seconds, check the logs to verify that the status has changed: @@ -129,10 +129,11 @@ Delete the pod before continuing with the next test: ## Check Deschedule Policy To see the impact of the descheduling policy, use a component called descheduler. For more details, visit (https://github.com/intel/platform-aware-scheduling/blob/master/telemetry-aware-scheduling/docs/health-metric-example.md#seeing-the-impact). 
+**The descheduler requires a minimum of 2 worker nodes.** Start by setting `node_health_metric` to 0 (scheduleonmetric) on all worker nodes as follows: ``` -# echo 'node_health_metric 0' > /opt/intel/tas-demo-policy/test.prom +# echo 'node_health_metric 0' > /opt/cek/tas-demo-policy/test.prom ``` Check the logs to verify that `node_health_metric = 0` on worker nodes: ``` @@ -140,6 +141,7 @@ Check the logs to verify that `node_health_metric = 0` on worker nodes: "Evaluating demo-policy" component="controller" "controller1 health_metric = 0" component="controller" "worker1 health_metric = 0" component="controller" + "worker2 health_metric = 0" component="controller" ``` Deploy the [tas-test.yml](tas-test.yml) pod again: ``` @@ -147,15 +149,15 @@ Deploy the [tas-test.yml](tas-test.yml) pod again: ``` The pod should deploy successfully and end up in state “running” as shown below: ``` -# kubectl get pods -n kube-system | grep tas-test - NAME READY STATUS RESTARTS AGE - tas-test-xxxx-yyyy 1/1 Running 0 3m +# kubectl get pods -n kube-system -o wide | grep tas-test + NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES + tas-test 1/1 Running 0 44s 10.244.194.101 worker1 ``` -Set the `node_health_metric` to 2 (deschedule) on all worker nodes as follows: +Set the `node_health_metric` to 2 (deschedule) on the worker node that is currently running tas-test as follows: ``` -# echo 'node_health_metric 2' > /opt/intel/tas-demo-policy/test.prom +# echo 'node_health_metric 2' > /opt/cek/tas-demo-policy/test.prom ``` -After a few seconds, check the logs to verify that the status has changed: +After a few seconds, check the logs to verify that the status has changed for worker1; the worker2 status should remain the same: ``` # kubectl logs pod/tas-telemetry-aware-scheduling-xxxx-yyyy -n kube-system --tail=20 "Evaluating demo-policy" component="controller" @@ -164,6 +166,7 @@ After a few seconds, check the logs to verify that the status has changed: health_metric violated in node
worker1 "worker1 violating demo-policy: health_metric Equals 2" component="controller" "Node worker1 violating demo-policy, " component="controller" + "worker2 health_metric = 0" component="controller" ``` Use the provided descheduler policy [descheduler-policy.yml](descheduler-policy.yml) as a configuration file for the descheduler. The content of the file is: @@ -186,9 +189,9 @@ Then run the descheduler from the same controller node with following command: Now check the status of the pod deployed previously: ``` -# kubectl get pods -n kube-system | grep tas-test - NAME READY STATUS RESTARTS AGE - tas-test-xxxx-yyyy 1/1 Terminating 0 4m - tas-test-xxxx-zzzz 0/1 Pending 0 10s +# kubectl get pods -n kube-system -o wide | grep tas-test + NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES + tas-test 1/1 Terminating 0 44s 10.244.194.101 worker1 + tas-test 1/1 Running 0 44s 10.244.194.101 worker2 ``` -The pod will be rescheduled onto a healthier node based on its TAS policy. If no other suitable nodes are available, the new pod fails to schedule as shown above. Depending on how fast you check you might see the previous pod in state "Terminating" +The pod will be rescheduled onto a healthier node based on its TAS policy. Depending on how fast you check, you might see the previous pod in state "Terminating".
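The three metric values used in the TAS checks (0 for scheduleonmetric, 1 for dontschedule, 2 for deschedule) are all written to the same file with `echo`. A minimal sketch of a helper that validates the value before writing it is shown here; the function name `set_node_health_metric` and its optional file argument are hypothetical and are not part of this repository.

```bash
#!/usr/bin/env bash
# Write the demo health metric in the same format as the echo commands above.
# Usage: set_node_health_metric <0|1|2> [file]
# The default file is the path used in this guide; the override argument is a
# hypothetical convenience for trying the helper outside a worker node.
set_node_health_metric() {
    local value="${1:-}"
    local file="${2:-/opt/cek/tas-demo-policy/test.prom}"
    case "$value" in
        0|1|2) ;;  # scheduleonmetric, dontschedule, deschedule
        *)
            echo "usage: set_node_health_metric <0|1|2> [file]" >&2
            return 1
            ;;
    esac
    printf 'node_health_metric %s\n' "$value" > "$file"
}
```

For example, `set_node_health_metric 2` on the worker node that runs tas-test has the same effect as the `echo 'node_health_metric 2'` command above, while any other value is rejected without touching the file.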