diff --git a/README.md b/README.md
index 6e190c78..3e5be1d8 100644
--- a/README.md
+++ b/README.md
@@ -13,8 +13,7 @@ The software provided here is for reference only and not intended for production
    ```
 2. Decide which configuration profile you want to use and export environmental variable.
-   For VM case is PROFILE environment variable mandatory.
-   > **_NOTE:_** For non-VM case it will be used only to ease execution of the steps listed below.
+   > **_NOTE:_** It will be used only to ease execution of the steps listed below.
    - For **Kubernetes Basic Infrastructure** deployment:
      ```bash
@@ -118,12 +117,13 @@ The software provided here is for reference only and not intended for production
    For VM case:
      - update details relevant for vm_host (e.g.: datalane_interfaces, ...)
-     - update VMs definition in host_vars/host-for-vms-1.yml
+     - update VMs definition in host_vars/host-for-vms-1.yml - use that template for the first vm_host
+     - update VMs definition in host_vars/host-for-vms-2.yml - use that template for the second and all other vm_hosts
      - update/create host_vars for all defined VMs (e.g.: host_vars/vm-ctrl-1.yml and host_vars/vm-work-1.yml)
        Needed details are at least dataplane_interfaces
        For more details see [VM case configuration guide](docs/vm_config_guide.md)
-9. **Recommended:** Apply bug fix patch for Kubespray submodule (Required for RHEL 8+ and Ubuntu 22.04).
+9. **Required:** Apply bug fix patch for Kubespray submodule (for RHEL 8+, or for Rocky 9 if WireGuard is enabled).
    ```bash
    ansible-playbook -i inventory.ini playbooks/k8s/patch_kubespray.yml
@@ -149,11 +149,14 @@ Refer to the documentation linked below to see configuration details for selected
 - [SRIOV Network Device Plugin and SRIOV CNI plugin](docs/sriov.md)
 - [MinIO Operator](docs/storage.md)
+- [Adding and removing worker node(s)](docs/add_remove_nodes.md)
 - [VM case configuration guide](docs/vm_config_guide.md)
+- [VM multinode setup guide](docs/vm_multinode_setup_guide.md)
+- [VM cluster expansion guide](docs/vm_cluster_expansion_guide.md)
 
 ## Prerequisites and Requirements
 
 - Required packages on the target servers: **Python3**.
-- Required packages on the ansible host (where ansible playbooks are run): **Python3 and Pip3**.
+- Required packages on the ansible host (where ansible playbooks are run): **Python3.8-3.10 and Pip3**.
 - Required python packages on the ansible host. **See requirements.txt**.
 - SSH keys copied to all Kubernetes cluster nodes (`ssh-copy-id @` command can be used for that).
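+
+  For example, a minimal sketch of that last step (the user name and node addresses are placeholders; substitute the hosts from your `inventory.ini`):
+
+  ```bash
+  # copy the ansible host's public SSH key to every cluster node
+  for node in 192.168.1.11 192.168.1.12 192.168.1.13; do
+      ssh-copy-id "root@${node}"
+  done
+  ```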
diff --git a/ansible.cfg b/ansible.cfg
index cf72535c..3773be12 100644
--- a/ansible.cfg
+++ b/ansible.cfg
@@ -1,5 +1,5 @@
 [ssh_connection]
-pipelining=True
+pipelining = True
 ssh_args = -o ServerAliveInterval=60 -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null
 
 [defaults]
@@ -8,10 +8,11 @@ display_skipped_hosts = no
 host_key_checking = False
 gathering = smart
 stdout_callback = debug
+callbacks_enabled = timer, profile_tasks, profile_roles
 fact_caching = jsonfile
 fact_caching_connection = /tmp
 fact_caching_timeout = 7200
-action_plugins=./action_plugins:~/.ansible/plugins/action:/usr/share/ansible/plugins/action
-library=./library
+action_plugins = ./action_plugins:~/.ansible/plugins/action:/usr/share/ansible/plugins/action
+library = ./library
diff --git a/cloud/README.md b/cloud/README.md
new file mode 100644
index 00000000..d143ebbe
--- /dev/null
+++ b/cloud/README.md
@@ -0,0 +1,126 @@
+# Cloud RA
+
+## Prerequisites
+
+- Python 3.8+
+- AWS CLI 2+
+- Terraform 1.2+
+- Docker 20.10.17+
+- `pip install -r requirements.txt`
+- `aws configure`
+
+## Managed Kubernetes deployment
+
+### Automatic
+
+Create a deployment directory with the `cwdf.yaml` (hardware) and `sw.yaml` (software) configuration files:
+
+```commandline
+mkdir deployment
+vim cwdf.yaml
+vim sw.yaml
+```
+
+Example `cwdf.yaml` file:
+```yaml
+cloudProvider: aws
+awsConfig:
+  profile: default
+  region: eu-central-1
+  vpc_cidr_block: "10.21.0.0/16"
+  # These tags will be applied to all created resources
+  extra_tags:
+    Owner: "some_user"
+    Project: "CWDF"
+  subnets:
+    - name: "subnet_a"
+      az: eu-central-1a
+      cidr_block: "10.21.1.0/24"
+    - name: "subnet_b"
+      az: eu-central-1b
+      cidr_block: "10.21.2.0/24"
+  sg_whitelist_cidr_blocks:
+    - "0.0.0.0/0"
+  eks:
+    kubernetes_version: "1.22"
+    # AWS EKS requires at least 2 subnets
+    subnets: ["subnet_a", "subnet_b"]
+    node_groups:
+      - name: "default"
+        instance_type: "t3.medium"
+        vm_count: 3
+```
+
+Then create `sw.yaml` for the software configuration.
+[Link to the sw_deployment tool README file.](sw_deployment/README.md)
+
+Example `sw.yaml` file:
+```yaml
+cloud_settings:
+  provider: aws
+  region: eu-central-1
+controller_ips:
+- 127.0.0.1
+# exec_containers can be used to deploy additional containers or workloads.
+# It defaults to an empty list, but can be changed as shown in the commented lines
+exec_containers: []
+#exec_containers:
+#- ubuntu/kafka
+git_tag: None
+git_url: https://github.com/intel/container-experience-kits
+github_personal_token: None
+ra_config_file: data/node1.yaml
+ra_ignore_assert_errors: true
+ra_machine_architecture: skl
+ra_profile: build_your_own
+replicate_from_container_registry: https://registry.hub.docker.com
+```
+
+Then run `deployer.py deploy` and pass the deployment directory as an argument:
+```commandline
+python deployer.py deploy --deployment_dir=deployment
+```
+
+Along with the EKS cluster, an additional Ansible instance and an ECR container registry will be created.
+
+The Ansible instance comes with the AWS CLI, Ansible, and kubectl pre-installed. kubectl is also pre-configured and authorized against the created EKS cluster. The default user on the Ansible instance is `ubuntu`, and the `cwdf_deployment` folder in its home directory contains SSH keys for the EKS worker nodes as well as connection info for the worker nodes and the ECR registry.
+
+After the deployment, discovery will run on each EKS worker node. Output will be written to the `discovery_results` directory on the local machine where `deployer.py` is running and then copied to the Ansible host's `cwdf_deployment` directory.
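+
+As a quick access sketch, you can SSH to the Ansible instance and verify cluster access (the public IP is printed by `deployer.py` as `ansible_host_public_ip`, and the key path assumes the `deployment` directory used above):
+
+```commandline
+ssh -i deployment/ssh/id_rsa ubuntu@<ansible_host_public_ip>
+kubectl get nodes
+```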
+
+Clean up the created resources:
+```commandline
+python deployer.py cleanup --deployment_dir=deployment
+```
+
+### Manual
+
+Start by creating a directory for the deployment and generating an SSH key for the instances:
+```commandline
+mkdir deployment
+mkdir deployment/ssh
+ssh-keygen -f deployment/ssh/id_rsa
+```
+
+1. Create a `cwdf` hardware definition YAML file, e.g. `cwdf.yaml`:
+```commandline
+cp cwdf_example.yaml deployment/cwdf.yaml
+```
+
+2. Then generate the Terraform manifest using `cwdf.py`:
+```commandline
+python cwdf.py generate-terraform \
+  --cwdf_config=deployment/cwdf.yaml \
+  --ssh_public_key=deployment/ssh/id_rsa.pub \
+  --job_id=manual \
+  --create_ansible_host=True \
+  --create_container_registry=True \
+  > deployment/main.tf
+```
+
+3. Initialize Terraform and deploy resources in the deployment directory:
+```commandline
+terraform init
+terraform apply
+```
diff --git a/cloud/cwdf.py b/cloud/cwdf.py
new file mode 100644
index 00000000..5793e18d
--- /dev/null
+++ b/cloud/cwdf.py
@@ -0,0 +1,31 @@
+import click
+from cwdf import compose_terraform
+
+
+@click.group()
+def cli():
+    pass
+
+
+@click.command()
+@click.option('--cwdf_config', help='Path to CWDF yaml config file', required=True)
+@click.option('--ssh_public_key', help='Path to SSH public key', required=True)
+@click.option('--job_id', help='Unique identifier that will be included in resource tags and names', default="manual")
+@click.option('--create_ansible_host', help='Will include ansible host in the Terraform manifest', default=True)
+@click.option('--create_container_registry', help='Will include managed container registry in the Terraform manifest', default=True)
+def generate_terraform(cwdf_config, ssh_public_key, job_id, create_ansible_host, create_container_registry):
+    with open(cwdf_config, 'r') as f:
+        cwdf_config = f.read()
+
+    with open(ssh_public_key, 'r') as f:
+        ssh_public_key = f.read().strip()
+
+    tf_manifest = compose_terraform(cwdf_config, job_id, ssh_public_key, create_ansible_host, create_container_registry)
+    click.echo(tf_manifest)
+
+
+cli.add_command(generate_terraform)
+
+
+if __name__ == "__main__":
+    cli()
diff --git a/cloud/cwdf/__init__.py b/cloud/cwdf/__init__.py
new file mode 100644
index 00000000..114aaa1d
--- /dev/null
+++ b/cloud/cwdf/__init__.py
@@ -0,0 +1 @@
+from .main import compose_terraform
diff --git a/cloud/cwdf/config.py b/cloud/cwdf/config.py
new file mode 100644
index 00000000..3126d6e8
--- /dev/null
+++ b/cloud/cwdf/config.py
@@ -0,0 +1,36 @@
+from schema import Schema, Or, Optional
+
+
+config_schema = Schema({
+    "cloudProvider": Or("aws"),
+    Optional("awsConfig"): {
+        Optional("region", default='eu-central-1'): str,
+        Optional("profile", default='default'): str,
+        Optional("vpc_cidr_block", default='10.0.0.0/16'): str,
+        Optional("sg_whitelist_cidr_blocks", default=['0.0.0.0/0']): [str],
+        Optional("extra_tags", default={}): {str: str},
+        "subnets": [{
+            "name": str,
+            "cidr_block": str,
+            "az": str
+        }],
+        Optional("instance_profiles"): [{
+            "name": str,
+            Optional("instance_type", default='t3.medium'): str,
+            "ami_id": str,
+            "subnet": str,
+            Optional("vm_count", default=1): int,
+            Optional("root_volume_size", default=16): int,
+            Optional("root_volume_type", default='gp2'): str
+        }],
+        Optional("eks"): {
+            Optional("kubernetes_version", default='1.22'): str,
+            "subnets": [str],
+            "node_groups": [{
+                "name": str,
+                Optional("instance_type", default='t3.medium'): str,
+                Optional("vm_count", default=1): int
+            }]
+        }
+    },
+})
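As an illustrative sketch, the schema above can be exercised directly (the inline YAML is an invented minimal config; `validate()` fills in the `Optional(...)` defaults such as `region` and `profile`):

```python
import yaml
from schema import SchemaError

from cwdf.config import config_schema

# Invented minimal config: only the required keys are present.
minimal_cfg = yaml.safe_load("""
cloudProvider: aws
awsConfig:
  subnets:
  - name: subnet_a
    cidr_block: 10.21.1.0/24
    az: eu-central-1a
""")

try:
    validated = config_schema.validate(minimal_cfg)
    # Defaults declared with Optional(..., default=...) are filled in:
    print(validated["awsConfig"]["region"])   # eu-central-1
    print(validated["awsConfig"]["profile"])  # default
except SchemaError as err:
    print(f"Invalid CWDF config: {err}")
```

`verify_cwdf_config()` in `cwdf/main.py` below performs the same call on the user-supplied `cwdf.yaml`.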
diff --git a/cloud/cwdf/main.py b/cloud/cwdf/main.py
new file mode 100644
index 00000000..020d74f4
--- /dev/null
+++ b/cloud/cwdf/main.py
@@ -0,0 +1,95 @@
+from .config import config_schema
+from schema import SchemaError
+import yaml
+from jinja2 import Template
+from os import path
+import json
+
+
+def verify_cwdf_config(config):
+    # Verify config file has correct schema
+    configuration = yaml.safe_load(config)
+    try:
+        pop = config_schema.validate(configuration)
+        return pop
+    except SchemaError as se:
+        raise se
+
+
+def compose_terraform(
+        config, job_id, ssh_public_key,
+        create_ansible_instance=True,
+        create_container_registry=True):
+    cwdf_configuration = verify_cwdf_config(config)
+    aws_config = cwdf_configuration['awsConfig']
+
+    extra_tags_json = json.dumps(aws_config["extra_tags"])
+    aws_config["extra_tags_json"] = extra_tags_json.replace('"', '\\"')
+
+    aws_config['job_id'] = job_id
+    aws_config['ssh_pub_key'] = ssh_public_key
+
+    aws_config["will_create_ansible_instance"] = create_ansible_instance
+    aws_config["will_create_container_registry"] = create_container_registry
+
+    tf_manifest = ""
+
+    provider_template_path = path.join(
+        path.dirname(__file__),
+        'templates/terraform/aws/provider.tf.jinja')
+    with open(provider_template_path, 'r') as f:
+        provider_template = Template(f.read())
+    tf_manifest += "### Provider ###\n"
+    tf_manifest += provider_template.render(aws_config)
+    tf_manifest += "### End of Provider ###\n\n"
+
+    common_template_path = path.join(
+        path.dirname(__file__),
+        'templates/terraform/aws/common.tf.jinja')
+    with open(common_template_path, 'r') as f:
+        common_template = Template(f.read())
+    tf_manifest += "### Common ###\n"
+    tf_manifest += common_template.render(aws_config)
+    tf_manifest += "### End of Common ###\n\n"
+
+    if "instance_profiles" in aws_config:
+        compute_template_path = path.join(
+            path.dirname(__file__),
+            'templates/terraform/aws/compute.tf.jinja')
+        with open(compute_template_path, 'r') as f:
+            compute_template = Template(f.read())
+        tf_manifest += "### Bare Metal Compute ###\n"
+        tf_manifest += compute_template.render(aws_config)
+        tf_manifest += "### End of Bare Metal Compute ###\n\n"
+
+    if "eks" in aws_config:
+        eks_template_path = path.join(
+            path.dirname(__file__),
+            'templates/terraform/aws/eks.tf.jinja')
+        with open(eks_template_path, 'r') as f:
+            eks_template = Template(f.read())
+        tf_manifest += "### Elastic Kubernetes Service ###\n"
+        tf_manifest += eks_template.render(aws_config)
+        tf_manifest += "### End of Elastic Kubernetes Service ###\n\n"
+
+    if create_ansible_instance:
+        ansible_host_template_path = path.join(
+            path.dirname(__file__),
+            'templates/terraform/aws/ansible_host.tf.jinja')
+        with open(ansible_host_template_path, 'r') as f:
+            ansible_host_template = Template(f.read())
+        tf_manifest += "### Ansible Host ###\n"
+        tf_manifest += ansible_host_template.render(aws_config)
+        tf_manifest += "### End of Ansible Host ###\n\n"
+
+    if create_container_registry:
+        ecr_template_path = path.join(
+            path.dirname(__file__),
+            'templates/terraform/aws/ecr.tf.jinja')
+        with open(ecr_template_path, 'r') as f:
+            ecr_template = Template(f.read())
+        tf_manifest += "### Elastic Container Registry ###\n"
+        tf_manifest += ecr_template.render(aws_config)
+        tf_manifest += "### End of Elastic Container Registry ###\n\n"
+
+    return tf_manifest
diff --git a/cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja
b/cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja new file mode 100644 index 00000000..c323222a --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/ansible_host.tf.jinja @@ -0,0 +1,170 @@ +resource "aws_iam_role" "ansible-instance-role" { + name = "cwdf-infra-{{ job_id }}-ansible-instance-role" + + assume_role_policy = jsonencode({ + Statement = [{ + Action = "sts:AssumeRole" + Effect = "Allow" + Principal = { + Service = "ec2.amazonaws.com" + } + }] + Version = "2012-10-17" + }) + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-ansible-instance-role" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_iam_policy" "eks-cluster-access-policy" { + policy = jsonencode({ + Statement = [{ + Action = [ + "eks:*" + ] + Effect = "Allow" + Resource = aws_eks_cluster.default.arn + }] + Version = "2012-10-17" + }) + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-cluster-access-policy" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_iam_role_policy_attachment" "ansible-instance-role-eks-cluster-access-policy" { + policy_arn = aws_iam_policy.eks-cluster-access-policy.arn + role = aws_iam_role.ansible-instance-role.name +} + +resource "aws_iam_instance_profile" "ansible-instance-profile" { + name = "cwdf-infra-{{ job_id }}-ansible-instance-iam-profile" + role = aws_iam_role.ansible-instance-role.name + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-ansible-instance-iam-profile" + JobId = "{{ job_id }}" + } + ) +} + +data "aws_eks_cluster_auth" "default" { + name = aws_eks_cluster.default.name +} + +provider "kubernetes" { + host = aws_eks_cluster.default.endpoint + cluster_ca_certificate = base64decode(aws_eks_cluster.default.certificate_authority[0].data) + token = data.aws_eks_cluster_auth.default.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + args = ["eks", "get-token", "--cluster-name", aws_eks_cluster.default.name] + command = "aws" + } +} + +resource "kubernetes_config_map" "aws-auth" { + data = { + "mapRoles" = yamlencode([ + { + rolearn = aws_iam_role.eks-cluster-nodegroup-role.arn + username = "system:node:{% raw %}{{EC2PrivateDNSName}}{% endraw %}" + groups = [ + "system:bootstrappers", + "system:nodes" + ] + }, + { + rolearn = aws_iam_role.ansible-instance-role.arn + username = "ansible" + groups = [ + "system:masters" + ] + } + ]) + } + + metadata { + name = "aws-auth" + namespace = "kube-system" + } +} + +data "aws_ami" "ubuntu2004" { + most_recent = true + + filter { + name = "name" + values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"] + } + + filter { + name = "virtualization-type" + values = ["hvm"] + } + + owners = ["099720109477"] # Canonical +} + +resource "aws_instance" "ansible" { + ami = data.aws_ami.ubuntu2004.id + instance_type = "t3.medium" + + vpc_security_group_ids = [aws_security_group.default.id] + subnet_id = aws_subnet.{{ subnets[0].name }}.id + key_name = aws_key_pair.default.key_name + iam_instance_profile = aws_iam_instance_profile.ansible-instance-profile.name + + root_block_device { + volume_size = 64 + volume_type = "gp3" + } + + user_data = <> /home/ubuntu/cwdf_deployment/ssh/id_rsa.pub +chown ubuntu /home/ubuntu/cwdf_deployment -R +sudo -H -u ubuntu bash -c 'pip install --user paramiko' +EOF + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-ansible-instance" + JobId = "{{ job_id }}" + } + ) + + depends_on = 
[aws_eks_cluster.default] +} + +output "ansible_host_public_ip" { + value = aws_instance.ansible.public_ip +} diff --git a/cloud/cwdf/templates/terraform/aws/common.tf.jinja b/cloud/cwdf/templates/terraform/aws/common.tf.jinja new file mode 100644 index 00000000..247c2617 --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/common.tf.jinja @@ -0,0 +1,116 @@ +resource "aws_vpc" "default" { + cidr_block = "{{ vpc_cidr_block }}" + enable_dns_hostnames = true + enable_dns_support = true + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-default-vpc" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_internet_gateway" "default" { + vpc_id = aws_vpc.default.id + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-default-igw" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_route_table" "default" { + vpc_id = aws_vpc.default.id + + route { + cidr_block = "0.0.0.0/0" + gateway_id = aws_internet_gateway.default.id + } + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-default-rt" + JobId = "{{ job_id }}" + } + ) +} + +{% for subnet in subnets %} +resource "aws_subnet" "{{ subnet.name }}" { + vpc_id = aws_vpc.default.id + map_public_ip_on_launch = true + + cidr_block = "{{ subnet.cidr_block }}" + availability_zone = "{{ subnet.az }}" + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-subnet-{{ subnet.name }}" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_route_table_association" "{{ subnet.name }}" { + subnet_id = aws_subnet.{{ subnet.name }}.id + route_table_id = aws_route_table.default.id +} + +{% endfor %} + +resource "aws_security_group" "default" { + name = "cwdf-infra-{{ job_id }}-default-sg" + vpc_id = aws_vpc.default.id + + ingress { + description = "SSH" + from_port = 22 + to_port = 22 + protocol = "tcp" + cidr_blocks = [{% for cidr_block in sg_whitelist_cidr_blocks %}"{{cidr_block}}",{% endfor %}] + } + + ingress { + description = "PING" + from_port = 8 + to_port = 0 + protocol = "icmp" + cidr_blocks = [{% for cidr_block in sg_whitelist_cidr_blocks %}"{{cidr_block}}",{% endfor %}] + } + + egress { + from_port = 0 + to_port = 0 + protocol = "-1" + cidr_blocks = ["0.0.0.0/0"] + ipv6_cidr_blocks = ["::/0"] + } + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-default-sg" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_key_pair" "default" { + key_name = "cwdf-infra-{{ job_id }}-default-keypair" + public_key = "{{ ssh_pub_key }}" + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-default-keypair" + JobId = "{{ job_id }}" + } + ) +} diff --git a/cloud/cwdf/templates/terraform/aws/compute.tf.jinja b/cloud/cwdf/templates/terraform/aws/compute.tf.jinja new file mode 100644 index 00000000..7eb860b7 --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/compute.tf.jinja @@ -0,0 +1,25 @@ +{% for profile in instance_profiles %} +{% for i in range(profile.vm_count) %} +resource "aws_instance" "{{ profile.name }}_{{ i }}" { + ami = "{{ profile.ami_id }}" + instance_type = "{{ profile.instance_type }}" + + vpc_security_group_ids = [aws_security_group.default.id] + subnet_id = aws_subnet.{{ profile.subnet }}.id + key_name = aws_key_pair.default.key_name + + root_block_device { + volume_size = {{ profile.root_volume_size }} + volume_type = "{{ profile.root_volume_type }}" + } + + tags = merge( + jsondecode("{{ extra_tags_json 
}}"), + { + Name = "cwdf-infra-{{ job_id }}-instance-{{ profile.name }}-{{ i }}" + JobId = "{{ job_id }}" + } + ) +} +{% endfor %} +{% endfor %} diff --git a/cloud/cwdf/templates/terraform/aws/ecr.tf.jinja b/cloud/cwdf/templates/terraform/aws/ecr.tf.jinja new file mode 100644 index 00000000..1959a3b9 --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/ecr.tf.jinja @@ -0,0 +1,20 @@ +resource "aws_ecr_repository" "default" { + name = "cwdf-infra-{{ job_id }}-ecr-repository" + image_tag_mutability = "MUTABLE" + force_delete = true + image_scanning_configuration { + scan_on_push = false + } + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-ecr-repository" + JobId = "{{ job_id }}" + } + ) +} + +output "ecr_url" { + value = aws_ecr_repository.default.repository_url +} diff --git a/cloud/cwdf/templates/terraform/aws/eks.tf.jinja b/cloud/cwdf/templates/terraform/aws/eks.tf.jinja new file mode 100644 index 00000000..b5ec09aa --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/eks.tf.jinja @@ -0,0 +1,153 @@ +resource "aws_iam_role" "eks-cluster-role" { + name = "cwdf-infra-{{ job_id }}-eks-cluster-role" + assume_role_policy = jsonencode({ + Statement = [{ + Action = "sts:AssumeRole" + Effect = "Allow" + Principal = { + Service = "eks.amazonaws.com" + } + }] + Version = "2012-10-17" + }) + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-cluster-role" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_iam_role_policy_attachment" "eks-cluster-role-AmazonEKSClusterPolicy" { + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy" + role = aws_iam_role.eks-cluster-role.name +} + +resource "aws_iam_role" "eks-cluster-nodegroup-role" { + name = "cwdf-infra-{{ job_id }}-eks-cluster-nodegroup-role" + + assume_role_policy = jsonencode({ + Statement = [{ + Action = "sts:AssumeRole" + Effect = "Allow" + Principal = { + Service = "ec2.amazonaws.com" + } + }] + Version = "2012-10-17" + }) + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-cluster-nodegroup-role" + JobId = "{{ job_id }}" + } + ) +} + +resource "aws_iam_role_policy_attachment" "eks-cluster-nodegroup-role-AmazonEKSWorkerNodePolicy" { + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy" + role = aws_iam_role.eks-cluster-nodegroup-role.name +} + +resource "aws_iam_role_policy_attachment" "eks-cluster-nodegroup-role-AmazonEKS_CNI_Policy" { + policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy" + role = aws_iam_role.eks-cluster-nodegroup-role.name +} + +resource "aws_iam_role_policy_attachment" "eks-cluster-nodegroup-role-AmazonEC2ContainerRegistryReadOnly" { + policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly" + role = aws_iam_role.eks-cluster-nodegroup-role.name +} + +resource "aws_eks_cluster" "default" { + role_arn = aws_iam_role.eks-cluster-role.arn + + name = "cwdf-infra-{{ job_id }}-eks-cluster" + version = "{{ eks.kubernetes_version }}" + + vpc_config { + subnet_ids = [{% for subnet in eks.subnets %}aws_subnet.{{ subnet }}.id,{% endfor %}] + } + + # Ensure that IAM Role permissions are created before and deleted after EKS Cluster handling. + # Otherwise, EKS will not be able to properly delete EKS managed EC2 infrastructure such as Security Groups. 
+ depends_on = [ + aws_iam_role_policy_attachment.eks-cluster-role-AmazonEKSClusterPolicy + ] + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-cluster" + JobId = "{{ job_id }}" + } + ) +} + +{% for node_group in eks.node_groups %} +resource "aws_eks_node_group" "{{ node_group.name }}" { + cluster_name = aws_eks_cluster.default.name + node_group_name = "{{ node_group.name }}" + node_role_arn = aws_iam_role.eks-cluster-nodegroup-role.arn + subnet_ids = [{% for subnet in eks.subnets %}aws_subnet.{{ subnet }}.id,{% endfor %}] + + scaling_config { + desired_size = {{ node_group.vm_count }} + max_size = {{ node_group.vm_count }} + min_size = {{ node_group.vm_count }} + } + + remote_access { + ec2_ssh_key = aws_key_pair.default.key_name + source_security_group_ids = [aws_security_group.default.id] + } + + instance_types = ["{{ node_group.instance_type }}"] + + # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling. + # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces. + depends_on = [ + aws_iam_role_policy_attachment.eks-cluster-nodegroup-role-AmazonEKSWorkerNodePolicy, + aws_iam_role_policy_attachment.eks-cluster-nodegroup-role-AmazonEKS_CNI_Policy, + aws_iam_role_policy_attachment.eks-cluster-nodegroup-role-AmazonEC2ContainerRegistryReadOnly, + {% if will_create_ansible_instance %} + kubernetes_config_map.aws-auth + {% endif %} + ] + + tags = merge( + jsondecode("{{ extra_tags_json }}"), + { + Name = "cwdf-infra-{{ job_id }}-eks-nodegroup-{{ node_group.name }}" + JobId = "{{ job_id }}" + } + ) +} +{% endfor %} + +data "aws_instances" "eks-instances" { + filter { + name = "tag:eks:cluster-name" + values = [aws_eks_cluster.default.name] + } + + depends_on = [{% for node_group in eks.node_groups %} aws_eks_node_group.{{node_group.name}}, {% endfor %}] +} + +locals { + eks_worker_instances = [ + for index, id in data.aws_instances.eks-instances.ids : { + id: id + public_ip: data.aws_instances.eks-instances.public_ips[index] + private_ip: data.aws_instances.eks-instances.private_ips[index] + } + ] +} + +output "eks_worker_instances" { + value = local.eks_worker_instances +} diff --git a/cloud/cwdf/templates/terraform/aws/provider.tf.jinja b/cloud/cwdf/templates/terraform/aws/provider.tf.jinja new file mode 100644 index 00000000..04459796 --- /dev/null +++ b/cloud/cwdf/templates/terraform/aws/provider.tf.jinja @@ -0,0 +1,17 @@ +terraform { + required_providers { + aws = { + source = "hashicorp/aws" + version = "4.25.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = "2.12.1" + } + } +} + +provider "aws" { + region = "{{ region }}" + profile = "{{ profile }}" +} diff --git a/cloud/cwdf_example.yaml b/cloud/cwdf_example.yaml new file mode 100644 index 00000000..0a2210e1 --- /dev/null +++ b/cloud/cwdf_example.yaml @@ -0,0 +1,23 @@ +cloudProvider: aws +awsConfig: + profile: default + region: eu-central-1 + vpc_cidr_block: "10.21.0.0/16" + extra_tags: + Owner: "some_user" + subnets: + - name: "subnet_a" + az: eu-central-1a + cidr_block: "10.21.1.0/24" + - name: "subnet_b" + az: eu-central-1b + cidr_block: "10.21.2.0/24" + sg_whitelist_cidr_blocks: + - "95.67.93.23/32" + eks: + kubernetes_version: "1.22" + subnets: ["subnet_a", "subnet_b"] + node_groups: + - name: "default" + instance_type: "t3.medium" + vm_count: 1 \ No newline at end of file diff --git a/cloud/deployer.py b/cloud/deployer.py new file mode 100644 index 00000000..7a4a011d --- /dev/null 
+++ b/cloud/deployer.py @@ -0,0 +1,262 @@ +import os +import io +from Crypto.PublicKey import RSA +import click +import json +import paramiko +import subprocess +from paramiko import SSHClient, SSHException +from scp import SCPClient +import string +import random +import socket +from time import sleep +import cwdf +import yaml +import shutil +import sw_deployment.sw_deployment_tool as sw_deployment + +def generate_ssh_keys(ssh_dir, public_key_path, private_key_path): + if not os.path.exists(ssh_dir): + os.makedirs(ssh_dir) + private_key = RSA.generate(2048) + with open(private_key_path, 'wb') as f: + f.write(private_key.exportKey('PEM')) + public_key = private_key.publickey() + with open(public_key_path, 'wb') as f: + f.write(public_key.exportKey('OpenSSH')) + os.chmod(private_key_path, 0o600) + + +@click.group() +def cli(): + pass + + +@click.command() +@click.option('--deployment_dir', help='Path to deployment directory', required=True) +def deploy(deployment_dir): + config_path = os.path.join(deployment_dir, "cwdf.yaml") + sw_config_path = os.path.join(deployment_dir, "sw.yaml") + + # Verify config file exists + if not os.path.exists(config_path): + click.echo("Config file does not exist.", err=True) + return None + if not os.path.exists(sw_config_path): + click.echo("Software config file does not exist.", err=True) + return None + + click.echo("Beginning deployment...") + + with open(config_path, 'r') as f: + cwdf_configuration = f.read() + + # Generate SSH keys for instances + ssh_dir = os.path.join(os.path.abspath(deployment_dir), "ssh") + public_key_path = os.path.join(ssh_dir, "id_rsa.pub") + private_key_path = os.path.join(ssh_dir, "id_rsa") + generate_ssh_keys(ssh_dir, public_key_path, private_key_path) + with open(public_key_path, 'r') as f: + ssh_public_key = f.read() + with open(private_key_path, 'r') as f: + ssh_private_key = f.read() + + # Create lock file if not exists + # Lock file is intended to contain info not to break previous deployment in future + # For now only job id is stored there to preserve previous deployment + lock_path = os.path.join(deployment_dir, "tbd.lock") + with open(lock_path, 'a+') as f: + f.seek(0) + lock_str = f.read() + try: + lock = json.loads(lock_str) + except ValueError as e: + lock = None + + if lock is not None and "job_id" in lock: + job_id = lock["job_id"] + else: + # Random 8 digit identifier + job_id = ''.join(random.choices(string.digits, k=8)) + lock = json.dumps({"job_id": job_id}) + f.write(lock) + + click.echo("Job ID: " + job_id) + + manifest = cwdf.compose_terraform(cwdf_configuration, job_id, ssh_public_key) + manifest_path = os.path.join(deployment_dir, 'deploy.tf') + with open(manifest_path, 'w') as f: + f.write(manifest) + + click.echo("Initializing Terraform...") + proc = subprocess.run(["terraform", "init"], cwd=deployment_dir, universal_newlines=True) + if proc.returncode != 0: + click.secho("Error while initializing Terraform", err=True, bold=True, fg="red") + return + + click.echo("Building Terraform plan...") + proc = subprocess.run([ + "terraform", "plan", "-out=outfile", "-detailed-exitcode"], + cwd=deployment_dir, universal_newlines=True + ) + if proc.returncode == 1: + click.echo("Error while planning deployment", err=True) + return + elif proc.returncode == 0: + click.echo("No changes needed.") + #return + + if click.confirm("Continue with above modifications?"): + proc = subprocess.run(["terraform", "apply", "outfile"], cwd=deployment_dir, universal_newlines=True) + if proc.returncode == 1: + click.echo("Error 
while running deployment", err=True) + return + else: + click.echo("Deployment finished.") + else: + return + + proc = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir, capture_output=True) + json_output = proc.stdout + terraform_output = json.loads(json_output) + ansible_host_ip = terraform_output["ansible_host_public_ip"]["value"] + click.echo("Ansible Host is accessible on: " + ansible_host_ip) + click.echo("-------------------") + ecr_url = terraform_output["ecr_url"]["value"] + click.echo("ECR Registry URL:") + click.echo(ecr_url) + click.echo("-------------------") + click.echo("Worker nodes:") + click.echo("-------------------") + workers = terraform_output["eks_worker_instances"]["value"] + workers_ip = [] + for worker in workers: + workers_ip.append(worker["private_ip"]) + click.echo("Worker " + worker["id"]) + click.echo("Private ip: " + worker["private_ip"]) + click.echo("Public ip: " + worker["public_ip"]) + click.echo("-------------------") + click.echo("Opening SSH connection to Ansible host...") + ssh = SSHClient() + ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()) + cfg = { + 'hostname': ansible_host_ip, + 'timeout': 200, + 'username': 'ubuntu', + 'key_filename': private_key_path + } + if os.path.exists(os.path.expanduser("~/.ssh/config")): + ssh_config = paramiko.SSHConfig() + user_config_file = os.path.expanduser("~/.ssh/config") + with io.open(user_config_file, 'rt', encoding='utf-8') as f: + ssh_config.parse(f) + host_conf = ssh_config.lookup(ansible_host_ip) + if host_conf: + if 'proxycommand' in host_conf: + cfg['sock'] = paramiko.ProxyCommand(host_conf['proxycommand']) + if 'user' in host_conf: + cfg['username'] = host_conf['user'] + if 'identityfile' in host_conf: + cfg['key_filename'] = host_conf['identityfile'] + if 'hostname' in host_conf: + cfg['hostname'] = host_conf['hostname'] + ssh_connected = False + while not ssh_connected: + try: + ssh.connect(**cfg) + ssh_connected = True + except (SSHException, socket.error) as e: + click.echo("SSH not available yet. 
Retrying in 10 seconds.") + sleep(10) + click.echo("Opened SSH connection.") + click.echo("Waiting for cloud init to complete on Ansible host...") + scp = SCPClient(ssh.get_transport()) + stdin, stdout, stderr = ssh.exec_command("cloud-init status --wait") + stdout.channel.recv_exit_status() + click.echo("Cloud init done.") + + # Make deployment output yaml file + cwdf_output = { + "ecr_url": ecr_url, + "eks_worker_instances": workers + } + cwdf_output_filename = os.path.join(deployment_dir, "cwdf_output.yaml") + with open(cwdf_output_filename, 'w') as f: + yaml.dump(cwdf_output, f) + scp.put(cwdf_output_filename, remote_path="/home/ubuntu/cwdf_deployment/") + + click.echo("Transferring SSH keys to Ansible machine...") + scp.put(private_key_path, remote_path='/tmp/id_rsa',) + ssh.exec_command("sudo mv /tmp/id_rsa /home/ubuntu/cwdf_deployment/ssh/id_rsa") + ssh.exec_command("sudo chmod 600 /home/ubuntu/cwdf_deployment/ssh/id_rsa") + click.echo("Successfully transferred SSH key to ~/cwdf_deployment/ssh/id_rsa") + click.echo("Transferring discovery to Ansible instance...") + scp.put("discovery", remote_path="/home/ubuntu/cwdf_deployment/", recursive=True) + click.echo("Successfully transferred discovery to ~/cwdf_deployment/discovery") + click.echo("Running discovery on EKS workers...") + discovery_results_path = os.path.join(deployment_dir, "discovery_results") + if not os.path.exists(discovery_results_path): + os.makedirs(discovery_results_path) + for worker in workers: + stdin, stdout, stderr = ssh.exec_command( + f"python3 /home/ubuntu/cwdf_deployment/discovery/discover.py {worker['private_ip']} ec2-user /home/ubuntu/cwdf_deployment/ssh/id_rsa" + ) + if stdout.channel.recv_exit_status() != 0: + click.echo(f"Error while running discovery on {worker['private_ip']}:", err=True) + click.echo(stderr.read(), err=True) + else: + filename = os.path.join(discovery_results_path, worker['private_ip'].replace(".", "-") + ".json") + with open(filename, 'w') as f: + f.write(stdout.read().decode("utf-8")) + click.echo("Wrote to discovery_results directory.") + click.echo("Copying to Ansible instance...") + scp.put(discovery_results_path, remote_path="/home/ubuntu/cwdf_deployment/", recursive=True) + click.echo("Copied discovery results to Ansible host.") + ssh.close() + + click.echo("-------------------") + click.echo('Running SW deployment') + + with open(file=sw_config_path, mode='r', encoding='utf-8') as file: + sw_configuration = yaml.load(file, Loader=yaml.FullLoader) + sw_configuration['ansible_host_ip'] = ansible_host_ip + sw_configuration['worker_ips'] = workers_ip + sw_configuration['ssh_key'] = os.path.join('..', private_key_path) + sw_configuration['replicate_to_container_registry'] = ecr_url + with open(file=sw_config_path, mode="w", encoding='utf-8') as file: + yaml.dump(sw_configuration, file) + + sw_deployment.start_deploy(config=sw_config_path) + +@click.command() +@click.option('--deployment_dir', help='Path to deployment directory', required=True) +def cleanup(deployment_dir): + sw_deployment.cleanup(config=os.path.join(deployment_dir, "sw.yaml")) + subprocess.run(["terraform", "destroy"], cwd=deployment_dir, universal_newlines=True) + click.echo("Removing temporary files...") + + discovery_results_path = os.path.join(deployment_dir, "discovery_results") + ssh_dir = os.path.join(deployment_dir, "ssh") + directories = [discovery_results_path, ssh_dir] + + cwdf_output_filename = os.path.join(deployment_dir, "cwdf_output.yaml") + lock_path = os.path.join(deployment_dir, "tbd.lock") + 
manifest_path = os.path.join(deployment_dir, 'deploy.tf') + outfile_path = os.path.join(deployment_dir, 'outfile') + files = [cwdf_output_filename, lock_path, manifest_path, outfile_path] + + for directory in directories: + if os.path.exists(directory): + shutil.rmtree(directory) + for file in files: + if os.path.exists(file): + os.remove(file) + + +cli.add_command(deploy) +cli.add_command(cleanup) + + +if __name__ == "__main__": + cli() diff --git a/cloud/discovery/__init__.py b/cloud/discovery/__init__.py new file mode 100644 index 00000000..579e0887 --- /dev/null +++ b/cloud/discovery/__init__.py @@ -0,0 +1 @@ +from .discover import main diff --git a/cloud/discovery/ddp_devs b/cloud/discovery/ddp_devs new file mode 100644 index 00000000..ca69219e --- /dev/null +++ b/cloud/discovery/ddp_devs @@ -0,0 +1,2 @@ +# List of DDP capable NICs - Source: https://github.com/intel/ddp-tool +['8086:10A6', '8086:1590', '8086:1591', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1592', '8086:1593', '8086:159B', '8086:1593', '8086:1593', '8086:1592', '8086:1593', '8086:1593', '8086:1593', '8086:1593', '8086:1593', '8086:1593', '8086:1592', '8086:1593', '8086:1598', '8086:1599', '8086:159A', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159B', '8086:159C', '8086:159D', '8086:1889', '8086:124C', '8086:124D', '8086:124E', '8086:124F', '8086:151D', '8086:1890', '8086:1891', '8086:1892', '8086:1893', '8086:1894', '8086:1897', '8086:1898', '8086:1899', '8086:189A', '8086:0CF8', '8086:0CF8', '8086:0D58', '8086:0D58', '8086:101F', '8086:104E', '8086:104F', '8086:154B', '8086:154C', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1572', '8086:1573', '8086:1574', '8086:1580', '8086:1581', '8086:1581', '8086:1581', '8086:1581', '8086:1581', '8086:1581', '8086:1582', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1583', '8086:1584', '8086:1584', '8086:1584', '8086:1584', '8086:1584', '8086:1584', '8086:1585', '8086:1586', '8086:1586', '8086:1586', '8086:1587', '8086:1587', '8086:1588', '8086:1588', '8086:1588', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:1589', '8086:158A', '8086:158A', '8086:158A', '8086:158A', '8086:158A', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:158B', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:15FF', '8086:FAFA', '8086:FBFB'] diff --git a/cloud/discovery/discover.py b/cloud/discovery/discover.py new file mode 100644 index 00000000..cb2017da --- /dev/null +++ b/cloud/discovery/discover.py @@ -0,0 +1,447 @@ +from asyncio.subprocess import DEVNULL +import json +import os +import pprint +import subprocess +import 
paramiko
+import sys
+import fnmatch
+
+qat_pf_ids = ['0435', '37c8', '19e2', '18ee', '6f54', '18a0', '4940', '4942']
+qat_vf_ids = ['0443', '37c9', '19e3', '18ef', '6f55', '18a1', '4941', '4943']
+feature_flag_summary = ["sgx", "avx"]
+
+class Remote:
+    def __init__(self, ip_addr, username, key_filename):
+        self.ip_addr = ip_addr
+        self.username = username
+        self.key_filename = key_filename
+        self.session = None
+
+    def connect(self):
+        try:
+            self.session = paramiko.SSHClient()
+            self.session.load_system_host_keys()
+            self.session.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+            #if self.pkey is not None:
+            #    pkey_file=os.path.expanduser(self.pkey)
+            #    print("PKEY: ",pkey_file)
+            #else:
+            #    pkey_file=os.path.expanduser("~/.ssh/id_rsa")
+            #priv_key = paramiko.RSAKey.from_private_key_file(pkey_file)
+            #self.session.connect(self.ip_addr, pkey=priv_key)
+            self.session.connect(self.ip_addr, username=self.username, key_filename=self.key_filename)
+        except paramiko.AuthenticationException as e:
+            print("Auth failed: ", e)
+            sys.exit()
+        except paramiko.SSHException as e:
+            print("SSH Connection failed: ", e)
+            sys.exit()
+
+    def exec(self, cmd, split=False):
+        try:
+            _stdin, output, stderr = self.session.exec_command(cmd)
+            parse_out = output.read().decode("UTF-8").rstrip('\n')
+        except paramiko.SSHException as e:
+            print("Command exec failed: ", e)
+            return None
+        if stderr.read(1):
+            return None
+        if split:
+            return parse_out.splitlines()
+        else:
+            return parse_out
+
+    def close(self):
+        self.session.close()
+
+
+def check_output(cmd, split=False):
+    # Run the command over the remote SSH session when one exists,
+    # otherwise fall back to running it locally.
+    if remote is not None:
+        output = remote.exec(cmd, split=split)
+        return output
+    try:
+        if split:
+            output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8").splitlines()
+        else:
+            output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8")
+    except subprocess.CalledProcessError as e:
+        return None
+    return output
+
+def socket_update(orig: dict, update: dict):
+    for i in set(update["Socket"].keys()):
+        if i not in orig["Socket"]:
+            orig["Socket"].update({i: {}})
+        if "Device" in list(set(orig["Socket"][i].keys())&set(update["Socket"][i].keys())):
+            orig["Socket"][i]["Device"].update(update["Socket"][i]["Device"])
+        else:
+            orig["Socket"][i].update(update["Socket"][i])
+    return orig
+
+
+def get_pci_net():
+    socket_out = {"Socket": {}}
+    global nic_sriov
+    global nic_ddp
+    global nic_types
+    try:
+        with open(os.path.join(sys.path[0],"ddp_devs"), 'r') as file:
+            for line in file:
+                if not line.strip().startswith("#"):
+                    ddp_list = line.strip()
+                    break
+    except IOError as e:
+        print("Error loading ddp_devs - Exiting")
+        sys.exit()
+    net_devices = check_output("ls -1 /sys/class/net/*/device/numa_node", split=True)
+    if net_devices is None: return None
+    net_numa = check_output("cat /sys/class/net/*/device/numa_node", split=True)
+    for (i, h) in zip(net_devices, net_numa):
+        h = int(h)
+        dev_name = i.split("/")[4]
+        device = {dev_name: {}}
+        dev_path = os.path.split(i)
+        uevent_dump = check_output("cat %s/uevent" % dev_path[0])
+        for line in uevent_dump.splitlines():
+            linevals = list(map(str.strip, line.split('=', 1)))
+            device[dev_name].update({linevals[0].title(): linevals[1]})
+        pci_slot = device[dev_name]["Pci_Slot_Name"].split(':', 1)[1]
+        del device[dev_name]["Pci_Slot_Name"]
+        device[dev_name].update({"Interface": dev_name})
+        pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pci_slot)
+        if pci_subsystem:
+            pci_subsystem = pci_subsystem.split(':')[1].strip()
+        else:
+            try:
+                pci_subsystem
= check_output("lspci -s %s" % pci_slot).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + except AttributeError: + pci_subsystem = "Unknown" + device[dev_name].update({"Device": pci_subsystem}) + device[pci_slot] = device[dev_name] + del device[dev_name] + if "Pci_Id" in device[pci_slot].keys(): + if device[pci_slot]["Pci_Id"] in ddp_list: + device[pci_slot].update({"Ddp_Support": True}) + if not nic_ddp: + nic_ddp = True + if "Driver" in device[pci_slot].keys(): + if device[pci_slot]["Driver"] == "ice": + if "cvl" not in nic_types: + nic_types.append("cvl") + elif device[pci_slot]["Driver"] == "i40e": + if "fvl" not in nic_types: + nic_types.append("fvl") + + ## Get information about PF/VF and SR-IOV + ## Check for SR-IOV Capabilities + ########## CONTINUE WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO + totalvfs = check_output("cat %s/sriov_totalvfs" % dev_path[0]) + if totalvfs is not None and int(totalvfs) > 0: + # PF with SR-IOV enabled + device[pci_slot].update({"Sriov_Enabled": True}) + nic_sriov = True + device[pci_slot].update({"Sriov_Maxvfs": int(totalvfs)}) + device[pci_slot].update({"Type": "PF"}) + vf_list = check_output("cat %s/virtfn*/uevent | grep PCI_SLOT_NAME" % dev_path[0], split=True) + if vf_list is not None: + # PF with SR-IOV enabled and VFs configured + device[pci_slot].update({"Sriov_Vf_Count": len(vf_list)}) + vf_pcis = [] + for vf in vf_list: + pci = vf.split('=', 1)[1].strip() + vf_pcis.append(pci.split(':', 1)[1]) + device[pci_slot].update({"Vf_Pci_Ids": vf_pcis}) + else: + pf_id = check_output("cat %s/physfn/uevent | grep PCI_SLOT_NAME" % dev_path[0]) + if pf_id is None: + # PF without SR-IOV + device[pci_slot].update({"Type": "Pf"}) + device[pci_slot].update({"Sriov_Enabled": False}) + else: + # VF + short_id = pf_id.split('=', 1)[1].strip().split(':', 1)[1] + device[pci_slot].update({"Pf_Pci_Id": short_id}) + device[pci_slot].update({"Type": "Vf"}) + if h not in socket_out["Socket"]: + socket_out["Socket"].update({h: {}}) + socket_out["Socket"][h].update({"Device": {"Nic": {}}}) + socket_out["Socket"][h]["Device"]["Nic"].update(device) + return socket_out + +def get_pci_qat(): + pf_ids = [] + vf_ids = [] + socket_out = {"Socket": {}} + global qat_sriov + dev_path = "/sys/bus/pci/devices/0000:" + pci_devices = check_output("lspci -nmm", split=True) + if not pci_devices: + return None + for device in pci_devices: + for pf_id in qat_pf_ids: + if pf_id in device: + pf_ids.append(device.split()[0]) + for vf_id in qat_vf_ids: + if vf_id in device: + vf_ids.append(device.split()[0]) + if len(pf_ids) == 0 and len(vf_ids) == 0: return None + for pf_id in pf_ids: + device = {pf_id: {}} + qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, pf_id))) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) + pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pf_id) + if pci_subsystem: + pci_subsystem = pci_subsystem.split(':')[1].strip() + else: + try: + pci_subsystem = check_output("lspci -s %s" % pf_id).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + except AttributeError: + pci_subsystem = "Unknown" + for line in uevent_dump.splitlines(): + linevals = list(map(str.strip, line.split('=', 1))) + device[pf_id].update({linevals[0].title(): linevals[1]}) + del device[pf_id]["Pci_Slot_Name"] + device[pf_id].update({"Device": pci_subsystem}) + device[pf_id].update({"Type": "PF"}) + totalvfs = 
check_output("cat %s%s/sriov_totalvfs" % (dev_path, pf_id)) + if totalvfs is not None and int(totalvfs) > 0: + # PF with SR-IOV enabled + device[pf_id].update({"Sriov_Enabled": True}) + qat_sriov = True + device[pf_id].update({"Sriov_Maxvfs": int(totalvfs)}) + vf_list = check_output("cat %s%s/virtfn*/uevent | grep PCI_SLOT_NAME" % (dev_path, pf_id), split=True) + if vf_list is not None: + # PF with SR-IOV enabled and VFs configured + device[pf_id].update({"Sriov_Vf_Count": len(vf_list)}) + vf_pcis = [] + for vf in vf_list: + pci = vf.split('=', 1)[1].strip() + vf_pcis.append(pci.split(':', 1)[1]) + device[pf_id].update({"Vf_Pci_Ids": vf_pcis}) + if qat_numa not in socket_out["Socket"]: + socket_out["Socket"].update({qat_numa: {}}) + socket_out["Socket"][qat_numa].update({"Device": {"Qat": {}}}) + socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) + + for vf_id in vf_ids: + device = {vf_id: {}} + qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, vf_id))) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) + pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % vf_id) + if pci_subsystem: + pci_subsystem = pci_subsystem.split(':')[1].strip() + else: + pci_subsystem = check_output("lspci -s %s" % vf_id).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + pf_sub_id = check_output("cat %s%s/physfn/uevent | grep PCI_SLOT_NAME" % (dev_path, vf_id)) + for line in uevent_dump.splitlines(): + linevals = list(map(str.strip, line.split('=', 1))) + device[vf_id].update({linevals[0].title(): linevals[1]}) + del device[vf_id]["Pci_Slot_Name"] + device[vf_id].update({"Device": pci_subsystem}) + device[vf_id].update({"Type": "Vf"}) + if pf_sub_id is not None: + # VF + short_id = pf_sub_id.split('=', 1)[1].strip().split(':', 1)[1] + device[vf_id].update({"Pf_Pci_Id": short_id}) + device[vf_id].update({"Type": "Vf"}) + if qat_numa not in socket_out["Socket"]: + socket_out["Socket"].update({qat_numa: {}}) + socket_out["Socket"][qat_numa].update({"Device": {"Qat": {}}}) + socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) + return socket_out + +def get_lscpu(): + lscpu_out = {} + cpu_info_json = check_output("lscpu -J") + if cpu_info_json is None: return None + json_object = json.loads(cpu_info_json) + for i in json_object['lscpu']: + lscpu_out[i['field'].replace(":","")] = i['data'] + return {"lscpu": lscpu_out} + +def get_core_info(): + socket_out = {"Socket": {}} + core_info_csv = check_output("lscpu -p=cpu,core,socket,node,cache") + if core_info_csv is None: return None + for i in core_info_csv.splitlines(): + # CPU, Core, Socket, Node, Cache + if i and not i.startswith("#"): + cpustats = i.split(",") + cpu_id = int(cpustats[0]) + core_id = int(cpustats[1]) + socket_id = int(cpustats[2]) + if cpustats[3]: + node_id = int(cpustats[3]) + else: + node_id = None + cache = str(cpustats[4]) + + if socket_id not in socket_out["Socket"]: + socket_out["Socket"].update({socket_id: {"Cores": {}}}) + if core_id not in socket_out["Socket"][socket_id]["Cores"]: + socket_out["Socket"][socket_id]["Cores"].update({core_id: {"Cpus": []}}) + #print(socket_out["Socket"][socket_id]["Core"][core_id]) + socket_out["Socket"][socket_id]["Cores"][core_id]["Cpus"].append(cpu_id) + if node_id is not None and "Node" not in socket_out["Socket"][socket_id]["Cores"][core_id].keys(): + socket_out["Socket"][socket_id]["Cores"][core_id].update({"Node": node_id}) + if "Cache" not in 
socket_out["Socket"][socket_id]["Cores"][core_id].keys(): + socket_out["Socket"][socket_id]["Cores"][core_id].update({"Cache": cache}) + return socket_out + +def get_socket_mem_info(): + socket_out = {"Socket": {}} + mem_nodes = check_output("ls -1 /sys/devices/system/node/node*/meminfo", split=True) + if mem_nodes is None: return None + for i in mem_nodes: + socket = int(i.split("/")[5].lstrip('node')) + socket_out["Socket"].update({socket: {"Memory": {}}}) + memdump = check_output("cat %s" % i) + for h in memdump.splitlines(): + valpair = h.split()[2:4] + socket_out["Socket"][socket]["Memory"].update({valpair[0].lstrip(':'): valpair[1]}) + return socket_out + +def get_mem_info(): + # Add to full output + meminfo_out = {"Memory": {}} + mem_info = check_output("cat /proc/meminfo", split=True) + if mem_info is None: return None + for i in mem_info: + valpair = i.split()[0:2] + meminfo_out["Memory"].update({valpair[0].rstrip(':'): valpair[1]}) + return meminfo_out + +def get_host_info(): + hostinfo_out = {"Host": {}} + # consider changing to /etc/os-release if hostnamectl is not common + host_info = check_output("hostnamectl", split=True) + if host_info: + for i in host_info: + value = i.split(':', 1)[1].strip() + if "Static hostname" in i: + hostinfo_out["Host"].update({"Hostname": value}) + elif "Operating System" in i: + hostinfo_out["Host"].update({"OS": value}) + elif "Kernel" in i: + hostinfo_out["Host"].update({"Kernel": value}) + elif "Architecture" in i: + hostinfo_out["Host"].update({"Arch": value}) + codename = check_output("cat /sys/devices/cpu/caps/pmu_name") + if codename: + codename = codename.strip() + hostinfo_out["Host"].update({"Codename": codename.title()}) + if not hostinfo_out["Host"].keys(): return None + return hostinfo_out + +def get_summary(info: dict): + summary = {} + # summarize existing object + if "Memory" in info.keys(): + if "HugePages_Total" in info["Memory"]: + if int(info["Memory"]["HugePages_Total"]) != 0: + summary["Hugepages_Total"] = info["Memory"]["HugePages_Total"] + summary["Hugepages_Free"] = info["Memory"]["HugePages_Free"] + if info["Memory"]["Hugepagesize"] == "1048576": + summary["Hugepage_Size"] = "1G" + elif info["Memory"]["Hugepagesize"] == "2048": + summary["Hugepage_Size"] = "2M" + else: + summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"]+"K" + if "lscpu" in info.keys(): + if "Model name" in info["lscpu"]: + summary["Cpu_Model"] = info["lscpu"]["Model name"] + if "CPU(s)" in info["lscpu"]: + summary["Cpu_Count"] = info["lscpu"]["CPU(s)"] + if "Socket(s)" in info["lscpu"]: + summary["Sockets"] = info["lscpu"]["Socket(s)"] + if "Core(s) per socket" in info["lscpu"]: + summary["Cores_Per_Socket"] = info["lscpu"]["Core(s) per socket"] + if "Thread(s) per core" in info["lscpu"]: + summary["Threads_Per_Core"] = info["lscpu"]["Thread(s) per core"] + if "NUMA node(s)" in info["lscpu"]: + summary["Numa_Nodes"] = info["lscpu"]["NUMA node(s)"] + if int(summary["Numa_Nodes"]) != 0: + for i in range(int(summary["Numa_Nodes"])): + summary["Numa_Node"+str(i)+"_Cpus"] = info["lscpu"]["NUMA node"+str(i)+" CPU(s)"] + if "Flags" in info["lscpu"]: + flags = info["lscpu"]["Flags"].split() + for i in feature_flag_summary: + matches = fnmatch.filter(flags,i+"*") + if matches: + summary[i.title()] = matches + if "Virtualization" in info["lscpu"]: + if "VT-x" in info["lscpu"]["Virtualization"]: + if "vmx" in flags: + summary["Virtualization"] = True + if nic_sriov: + summary["Nic_Sriov"] = True + if qat_sriov: + summary["Qat_Sriov"] = True + if nic_ddp: + 
summary["Nic_Ddp"] = True + if nic_types: + summary["Nic_Types"] = nic_types + + if not summary: + return None + summary_out = {"Summary": summary} + return summary_out + +def main(ip_addr, username, key_filename): + global remote + remote = Remote(ip_addr, username, key_filename) + remote.connect() + output = {"Socket": {}} + global nic_sriov + nic_sriov = False + global nic_types + nic_types = [] + global qat_sriov + qat_sriov = False + global nic_ddp + nic_ddp = False + pci_net = get_pci_net() + pci_qat = get_pci_qat() + core_info = get_core_info() + if pci_net is not None: + socket_update(output, pci_net) + if pci_qat is not None: + socket_update(output, pci_qat) + if core_info is not None: + socket_update(output, core_info) + output.update(get_lscpu()) + socket_mem_info = get_socket_mem_info() + mem_info = get_mem_info() + host_info = get_host_info() + if mem_info is not None: + output.update(mem_info) + if socket_mem_info is not None: + socket_update(output, socket_mem_info) + if host_info is not None: + output.update(host_info) + summary_info = get_summary(output) + if summary_info is not None: + output.update(summary_info) + + print(json.dumps(output)) + if remote is not None: + remote.close() + return output + + +if __name__ == "__main__": + if len(sys.argv) > 1: + main(sys.argv[1], sys.argv[2], sys.argv[3]) diff --git a/cloud/discovery/discover_local.py b/cloud/discovery/discover_local.py new file mode 100644 index 00000000..98f848a2 --- /dev/null +++ b/cloud/discovery/discover_local.py @@ -0,0 +1,391 @@ +from asyncio.subprocess import DEVNULL +import json +import os +import pprint +import subprocess +import sys +import fnmatch + +qat_pf_ids = ['0435', '37c8', '19e2', '18ee', '6f54', '18a0', '4940', '4942'] +qat_vf_ids = ['0443', '37c9', '19e3', '18ef', '6f55', '18a1', '4941', '4943'] +feature_flag_summary = ["sgx", "avx"] + +def check_output(cmd, split=False): + try: + if split: + output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8").splitlines() + else: + output = subprocess.check_output(cmd, shell=True, stderr=DEVNULL).decode("UTF-8") + except subprocess.CalledProcessError as e: + return None + return output + +def socket_update(orig: dict, update: dict): + for i in set(update["Socket"].keys()): + if i not in orig["Socket"]: + orig["Socket"].update({i: {}}) + if "Device" in list(set(orig["Socket"][i].keys())&set(update["Socket"][i].keys())): + orig["Socket"][i]["Device"].update(update["Socket"][i]["Device"]) + else: + orig["Socket"][i].update(update["Socket"][i]) + return orig + + +def get_pci_net(): + socket_out = {"Socket": {}} + global nic_sriov + global nic_ddp + global nic_types + try: + with open(os.path.join(sys.path[0],"ddp_devs"), 'r') as file: + for line in file: + if not line.strip().startswith("#"): + ddp_list = line.strip() + break + except IOError as e: + print("Error loading ddp_devs - Exiting") + sys.exit() + net_devices = check_output("ls -1 /sys/class/net/*/device/numa_node", split=True) + if net_devices is None: return None + net_numa = check_output("cat /sys/class/net/*/device/numa_node", split=True) + for (i, h) in zip(net_devices, net_numa): + h = int(h) + dev_name = i.split("/")[4] + device = {dev_name: {}} + dev_path = os.path.split(i) + uevent_dump = check_output("cat %s/uevent" % dev_path[0]) + for line in uevent_dump.splitlines(): + linevals = list(map(str.strip, line.split('=', 1))) + device[dev_name].update({linevals[0].title(): linevals[1]}) + pci_slot = device[dev_name]["Pci_Slot_Name"].split(':', 1)[1] + del 
device[dev_name]["Pci_Slot_Name"] + device[dev_name].update({"Interface": dev_name}) + pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pci_slot) + if pci_subsystem: + pci_subsystem = pci_subsystem.split(':')[1].strip() + else: + try: + pci_subsystem = check_output("lspci -s %s" % pci_slot).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + except AttributeError: + pci_subsystem = "Unknown" + device[dev_name].update({"Device": pci_subsystem}) + device[pci_slot] = device[dev_name] + del device[dev_name] + if "Pci_Id" in device[pci_slot].keys(): + if device[pci_slot]["Pci_Id"] in ddp_list: + device[pci_slot].update({"Ddp_Support": True}) + if not nic_ddp: + nic_ddp = True + if "Driver" in device[pci_slot].keys(): + if device[pci_slot]["Driver"] == "ice": + if "cvl" not in nic_types: + nic_types.append("cvl") + elif device[pci_slot]["Driver"] == "i40e": + if "fvl" not in nic_types: + nic_types.append("fvl") + + ## Get information about PF/VF and SR-IOV + ## Check for SR-IOV Capabilities + ########## CONTINUE WORKING ON THIS, MAKE SURE ALL INTERFACES HAVE RELEVANT INFO + totalvfs = check_output("cat %s/sriov_totalvfs" % dev_path[0]) + if totalvfs is not None and int(totalvfs) > 0: + # PF with SR-IOV enabled + device[pci_slot].update({"Sriov_Enabled": True}) + nic_sriov = True + device[pci_slot].update({"Sriov_Maxvfs": int(totalvfs)}) + device[pci_slot].update({"Type": "PF"}) + vf_list = check_output("cat %s/virtfn*/uevent | grep PCI_SLOT_NAME" % dev_path[0], split=True) + if vf_list is not None: + # PF with SR-IOV enabled and VFs configured + device[pci_slot].update({"Sriov_Vf_Count": len(vf_list)}) + vf_pcis = [] + for vf in vf_list: + pci = vf.split('=', 1)[1].strip() + vf_pcis.append(pci.split(':', 1)[1]) + device[pci_slot].update({"Vf_Pci_Ids": vf_pcis}) + else: + pf_id = check_output("cat %s/physfn/uevent | grep PCI_SLOT_NAME" % dev_path[0]) + if pf_id is None: + # PF without SR-IOV + device[pci_slot].update({"Type": "Pf"}) + device[pci_slot].update({"Sriov_Enabled": False}) + else: + # VF + short_id = pf_id.split('=', 1)[1].strip().split(':', 1)[1] + device[pci_slot].update({"Pf_Pci_Id": short_id}) + device[pci_slot].update({"Type": "Vf"}) + if h not in socket_out["Socket"]: + socket_out["Socket"].update({h: {}}) + socket_out["Socket"][h].update({"Device": {"Nic": {}}}) + socket_out["Socket"][h]["Device"]["Nic"].update(device) + return socket_out + +def get_pci_qat(): + pf_ids = [] + vf_ids = [] + socket_out = {"Socket": {}} + global qat_sriov + dev_path = "/sys/bus/pci/devices/0000:" + pci_devices = check_output("lspci -nmm", split=True) + if not pci_devices: + return None + for device in pci_devices: + for pf_id in qat_pf_ids: + if pf_id in device: + pf_ids.append(device.split()[0]) + for vf_id in qat_vf_ids: + if vf_id in device: + vf_ids.append(device.split()[0]) + if len(pf_ids) == 0 and len(vf_ids) == 0: return None + for pf_id in pf_ids: + device = {pf_id: {}} + qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, pf_id))) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, pf_id)) + pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % pf_id) + if pci_subsystem: + pci_subsystem = pci_subsystem.split(':')[1].strip() + else: + try: + pci_subsystem = check_output("lspci -s %s" % pf_id).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + except AttributeError: + pci_subsystem = "Unknown" + for line in 
uevent_dump.splitlines(): + linevals = list(map(str.strip, line.split('=', 1))) + device[pf_id].update({linevals[0].title(): linevals[1]}) + del device[pf_id]["Pci_Slot_Name"] + device[pf_id].update({"Device": pci_subsystem}) + device[pf_id].update({"Type": "PF"}) + totalvfs = check_output("cat %s%s/sriov_totalvfs" % (dev_path, pf_id)) + if totalvfs is not None and int(totalvfs) > 0: + # PF with SR-IOV enabled + device[pf_id].update({"Sriov_Enabled": True}) + qat_sriov = True + device[pf_id].update({"Sriov_Maxvfs": int(totalvfs)}) + vf_list = check_output("cat %s%s/virtfn*/uevent | grep PCI_SLOT_NAME" % (dev_path, pf_id), split=True) + if vf_list is not None: + # PF with SR-IOV enabled and VFs configured + device[pf_id].update({"Sriov_Vf_Count": len(vf_list)}) + vf_pcis = [] + for vf in vf_list: + pci = vf.split('=', 1)[1].strip() + vf_pcis.append(pci.split(':', 1)[1]) + device[pf_id].update({"Vf_Pci_Ids": vf_pcis}) + if qat_numa not in socket_out["Socket"]: + socket_out["Socket"].update({qat_numa: {}}) + socket_out["Socket"][qat_numa].update({"Device": {"Qat": {}}}) + socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) + + for vf_id in vf_ids: + device = {vf_id: {}} + qat_numa = int(check_output("cat %s%s/numa_node" % (dev_path, vf_id))) + uevent_dump = check_output("cat %s%s/uevent" % (dev_path, vf_id)) + pci_subsystem = check_output("lspci -s %s -v | grep Subsystem" % vf_id) + if pci_subsystem: + pci_subsystem = pci_subsystem.split(':')[1].strip() + else: + pci_subsystem = check_output("lspci -s %s" % vf_id).split(':')[2] + if pci_subsystem: + pci_subsystem = pci_subsystem.strip() + else: + pci_subsystem = "Unknown" + pf_sub_id = check_output("cat %s%s/physfn/uevent | grep PCI_SLOT_NAME" % (dev_path, vf_id)) + for line in uevent_dump.splitlines(): + linevals = list(map(str.strip, line.split('=', 1))) + device[vf_id].update({linevals[0].title(): linevals[1]}) + del device[vf_id]["Pci_Slot_Name"] + device[vf_id].update({"Device": pci_subsystem}) + device[vf_id].update({"Type": "Vf"}) + if pf_sub_id is not None: + # VF + short_id = pf_sub_id.split('=', 1)[1].strip().split(':', 1)[1] + device[vf_id].update({"Pf_Pci_Id": short_id}) + device[vf_id].update({"Type": "Vf"}) + if qat_numa not in socket_out["Socket"]: + socket_out["Socket"].update({qat_numa: {}}) + socket_out["Socket"][qat_numa].update({"Device": {"Qat": {}}}) + socket_out["Socket"][qat_numa]["Device"]["Qat"].update(device) + return socket_out + +def get_lscpu(): + lscpu_out = {} + cpu_info_json = check_output("lscpu -J") + if cpu_info_json is None: return None + json_object = json.loads(cpu_info_json) + for i in json_object['lscpu']: + lscpu_out[i['field'].replace(":","")] = i['data'] + return {"lscpu": lscpu_out} + +def get_core_info(): + socket_out = {"Socket": {}} + core_info_csv = check_output("lscpu -p=cpu,core,socket,node,cache") + if core_info_csv is None: return None + for i in core_info_csv.splitlines(): + # CPU, Core, Socket, Node, Cache + if i and not i.startswith("#"): + cpustats = i.split(",") + cpu_id = int(cpustats[0]) + core_id = int(cpustats[1]) + socket_id = int(cpustats[2]) + if cpustats[3]: + node_id = int(cpustats[3]) + else: + node_id = None + cache = str(cpustats[4]) + + if socket_id not in socket_out["Socket"]: + socket_out["Socket"].update({socket_id: {"Cores": {}}}) + if core_id not in socket_out["Socket"][socket_id]["Cores"]: + socket_out["Socket"][socket_id]["Cores"].update({core_id: {"Cpus": []}}) + #print(socket_out["Socket"][socket_id]["Core"][core_id]) + 
socket_out["Socket"][socket_id]["Cores"][core_id]["Cpus"].append(cpu_id) + if node_id is not None and "Node" not in socket_out["Socket"][socket_id]["Cores"][core_id].keys(): + socket_out["Socket"][socket_id]["Cores"][core_id].update({"Node": node_id}) + if "Cache" not in socket_out["Socket"][socket_id]["Cores"][core_id].keys(): + socket_out["Socket"][socket_id]["Cores"][core_id].update({"Cache": cache}) + return socket_out + +def get_socket_mem_info(): + socket_out = {"Socket": {}} + mem_nodes = check_output("ls -1 /sys/devices/system/node/node*/meminfo", split=True) + if mem_nodes is None: return None + for i in mem_nodes: + socket = int(i.split("/")[5].lstrip('node')) + socket_out["Socket"].update({socket: {"Memory": {}}}) + memdump = check_output("cat %s" % i) + for h in memdump.splitlines(): + valpair = h.split()[2:4] + socket_out["Socket"][socket]["Memory"].update({valpair[0].lstrip(':'): valpair[1]}) + return socket_out + +def get_mem_info(): + # Add to full output + meminfo_out = {"Memory": {}} + mem_info = check_output("cat /proc/meminfo", split=True) + if mem_info is None: return None + for i in mem_info: + valpair = i.split()[0:2] + meminfo_out["Memory"].update({valpair[0].rstrip(':'): valpair[1]}) + return meminfo_out + +def get_host_info(): + hostinfo_out = {"Host": {}} + # consider changing to /etc/os-release if hostnamectl is not common + host_info = check_output("hostnamectl", split=True) + if host_info: + for i in host_info: + value = i.split(':', 1)[1].strip() + if "Static hostname" in i: + hostinfo_out["Host"].update({"Hostname": value}) + elif "Operating System" in i: + hostinfo_out["Host"].update({"OS": value}) + elif "Kernel" in i: + hostinfo_out["Host"].update({"Kernel": value}) + elif "Architecture" in i: + hostinfo_out["Host"].update({"Arch": value}) + codename = check_output("cat /sys/devices/cpu/caps/pmu_name") + if codename: + codename = codename.strip() + hostinfo_out["Host"].update({"Codename": codename.title()}) + if not hostinfo_out["Host"].keys(): return None + return hostinfo_out + +def get_summary(info: dict): + summary = {} + # summarize existing object + if "Memory" in info.keys(): + if "HugePages_Total" in info["Memory"]: + if int(info["Memory"]["HugePages_Total"]) != 0: + summary["Hugepages_Total"] = info["Memory"]["HugePages_Total"] + summary["Hugepages_Free"] = info["Memory"]["HugePages_Free"] + if info["Memory"]["Hugepagesize"] == "1048576": + summary["Hugepage_Size"] = "1G" + elif info["Memory"]["Hugepagesize"] == "2048": + summary["Hugepage_Size"] = "2M" + else: + summary["Hugepage_Size"] = info["Memory"]["Hugepagesize"]+"K" + if "lscpu" in info.keys(): + if "Model name" in info["lscpu"]: + summary["Cpu_Model"] = info["lscpu"]["Model name"] + if "CPU(s)" in info["lscpu"]: + summary["Cpu_Count"] = info["lscpu"]["CPU(s)"] + if "Socket(s)" in info["lscpu"]: + summary["Sockets"] = info["lscpu"]["Socket(s)"] + if "Core(s) per socket" in info["lscpu"]: + summary["Cores_Per_Socket"] = info["lscpu"]["Core(s) per socket"] + if "Thread(s) per core" in info["lscpu"]: + summary["Threads_Per_Core"] = info["lscpu"]["Thread(s) per core"] + if "NUMA node(s)" in info["lscpu"]: + summary["Numa_Nodes"] = info["lscpu"]["NUMA node(s)"] + if int(summary["Numa_Nodes"]) != 0: + for i in range(int(summary["Numa_Nodes"])): + summary["Numa_Node"+str(i)+"_Cpus"] = info["lscpu"]["NUMA node"+str(i)+" CPU(s)"] + if "Flags" in info["lscpu"]: + flags = info["lscpu"]["Flags"].split() + for i in feature_flag_summary: + matches = fnmatch.filter(flags,i+"*") + if matches: + 
summary[i.title()] = matches
+        if "Virtualization" in info["lscpu"]:
+            if "VT-x" in info["lscpu"]["Virtualization"]:
+                # guard the flags lookup so a missing "Flags" field cannot raise NameError
+                if "Flags" in info["lscpu"] and "vmx" in flags:
+                    summary["Virtualization"] = True
+    if nic_sriov:
+        summary["Nic_Sriov"] = True
+    if qat_sriov:
+        summary["Qat_Sriov"] = True
+    if nic_ddp:
+        summary["Nic_Ddp"] = True
+    if nic_types:
+        summary["Nic_Types"] = nic_types
+
+    if not summary:
+        return None
+    summary_out = {"Summary": summary}
+    return summary_out
+
+def main():
+    output = {"Socket": {}}
+    global nic_sriov
+    nic_sriov = False
+    global nic_types
+    nic_types = []
+    global qat_sriov
+    qat_sriov = False
+    global nic_ddp
+    nic_ddp = False
+    pci_net = get_pci_net()
+    pci_qat = get_pci_qat()
+    core_info = get_core_info()
+    if pci_net is not None:
+        socket_update(output, pci_net)
+    if pci_qat is not None:
+        socket_update(output, pci_qat)
+    if core_info is not None:
+        socket_update(output, core_info)
+    output.update(get_lscpu())
+    socket_mem_info = get_socket_mem_info()
+    mem_info = get_mem_info()
+    host_info = get_host_info()
+    if mem_info is not None:
+        output.update(mem_info)
+    if socket_mem_info is not None:
+        socket_update(output, socket_mem_info)
+    if host_info is not None:
+        output.update(host_info)
+    summary_info = get_summary(output)
+    if summary_info is not None:
+        output.update(summary_info)
+
+    pprint.pprint(output)
+    return output
+
+if __name__ == "__main__":
+    main()
diff --git a/cloud/discovery/feature_reqs.yml b/cloud/discovery/feature_reqs.yml
new file mode 100644
index 00000000..daa1ea36
--- /dev/null
+++ b/cloud/discovery/feature_reqs.yml
@@ -0,0 +1,46 @@
+features:
+  sgx:
+    arch:
+      - icelake
+      - sapphirerapids
+  sgx_dp:
+    arch:
+      - icelake
+      - sapphirerapids
+  kmra:
+    arch:
+      - icelake
+      - sapphirerapids
+  tcs:
+    arch:
+      - icelake
+      - sapphirerapids
+  tca:
+    arch:
+      - icelake
+      - sapphirerapids
+  pstate:
+    arch:
+      - cascadelake
+      - icelake
+      - sapphirerapids
+  sst:
+    arch:
+      - cascadelake
+      - icelake
+      - sapphirerapids
+  power_manager:
+    arch:
+      - cascadelake
+      - icelake
+      - sapphirerapids
+  intel_ethernet_operator:
+    nic:
+      - cvl
+
+sub_features:
+  service_mesh:
+    sgx_signer:
+      arch:
+        - icelake
+        - sapphirerapids
\ No newline at end of file
diff --git a/cloud/discovery/profiler.py b/cloud/discovery/profiler.py
new file mode 100644
index 00000000..8169d8e5
--- /dev/null
+++ b/cloud/discovery/profiler.py
@@ -0,0 +1,231 @@
+import discover
+import yaml
+import pprint
+import os
+import sys
+
+dists = ["RedHat", "Rocky", "Ubuntu"]
+dist_vers = ['8.5', '20.04', '21.10', '22.04']
+# Verify pmu_name for SPR below
+archs = ["skylake", "cascadelake", "icelake", "sapphirerapids"]
+
+class Features:
+    def __init__(self, plat: dict):
+        self.plat = plat
+        self.dist_support = self._check_distro()
+        self.codename = self._get_codename()
+        self.nics = self._get_nic_types()
+        feature_reqs = self._load_yaml("feature_reqs.yml")
+        self.feat_reqs = feature_reqs["features"]
+        self.sub_feat_reqs = feature_reqs["sub_features"]
+        self.profiles = self._load_yaml("profiles.yml")
+
+    def _load_yaml(self, featfile: str):
+        try:
+            with open(os.path.join(sys.path[0], featfile), 'r') as file:
+                try:
+                    output = yaml.safe_load(file)
+                except yaml.YAMLError:
+                    print("Error parsing %s - Exiting" % featfile)
+                    sys.exit()
+                return output
+        except IOError:
+            print("Error loading %s - Exiting" % featfile)
+            sys.exit()
+
+    def _get_codename(self):
+        if "Host" not in self.plat.keys():
+            print("No host information available")
+            return None
+        if "Codename" not in self.plat["Host"].keys():
print("No Codename information available") + return None + codename = self.plat["Host"]["Codename"] + if not codename: return None + if codename.lower() not in archs: return None + return codename.lower() + + def _get_nic_types(self): + if "Summary" not in self.plat.keys(): + print("No summary information available") + return None + if "Nic_Types" not in self.plat["Summary"].keys(): + return None + nics = self.plat["Summary"]["Nic_Types"] + if not nics: return None + return nics + + def _check_distro(self): + match = False + if "Host" not in self.plat.keys(): + print("No host information available") + return None + if "OS" not in self.plat["Host"].keys(): + print("No OS information available") + return None + for d in dists: + if d in self.plat["Host"]["OS"]: + for dv in dist_vers: + if dv in self.plat["Host"]["OS"]: + match = True + break + if match: break + if not match: + return None + return match + +def check_feat_support(key, feats): + reqs = feats.feat_reqs + if key in reqs.keys(): + for lim_type in reqs[key].keys(): + if lim_type == "arch": + if feats.codename not in reqs[key][lim_type]: + return False + elif lim_type == "nic": + if not any(i in feats.nics for i in reqs[key][lim_type]): + return False + return True + +def check_sub_feat_support(key, byo_sub_dict, feats): + output_dict = {} + reqs = feats.sub_feat_reqs + if key in reqs.keys(): + for subfeat in byo_sub_dict.keys(): + if subfeat in reqs[key].keys(): + for lim_type in reqs[key][subfeat].keys(): + if lim_type == "arch": + if feats.codename not in reqs[key][subfeat][lim_type]: + output_dict.update({subfeat: "Unsupported"}) + break + elif lim_type == "nic": + if not any(i in feats.nics for i in reqs[key][subfeat][lim_type]): + output_dict.update({subfeat: "Unsupported"}) + break + else: + output_dict.update({subfeat: True}) + else: + for subfeat in byo_sub_dict.keys(): + output_dict.update({subfeat: True}) + return output_dict + +def byo_check(plat: dict, feats: object): + output = {} + if "build_your_own" not in feats.profiles.keys(): + return None + byo_list = feats.profiles["build_your_own"].keys() + byo_dict = feats.profiles["build_your_own"] + for key in byo_list: + if type(byo_dict[key]) == dict: + feat_support = check_feat_support(key, feats) + if feat_support is False: + support = "Unsupported" + else: + support = check_sub_feat_support(key, byo_dict[key], feats) + output.update({key: support}) + else: + feat_support = check_feat_support(key, feats) + if feat_support is False: + support = "Unsupported" + else: + support = check_feat(key, feats) + output.update({key: support}) + return output + +def set_sub_static(subfeats, state): + feat_dict = {} + for feat in subfeats.keys(): + feat_dict.update({feat: state}) + return feat_dict + +def check_feat(key, feats): + features = ["sriov_operator", "sriov_network_dp", "qat", "qat_dp", "ddp"] # sgx features covered in arch_features + unchecked = ["gpu", "gpu_dp", "name", "on_vms", "vm_mode"] # Consider minio (when not test-mode) and physical storage + if key in unchecked: + return None + elif key not in features: + return True + if key == "sriov_operator" or key == "sriov_network_dp": + if "Summary" in feats.plat.keys(): + if "Nic_Sriov" in feats.plat["Summary"]: + if feats.plat["Summary"]["Nic_Sriov"]: + return True + return False + if key == "qat" or key == "qat_dp": + if "Summary" in feats.plat.keys(): + if "Qat_Sriov" in feats.plat["Summary"]: + if feats.plat["Summary"]["Qat_Sriov"]: + return True + return False + if key == "ddp": + if "Summary" in feats.plat.keys(): 
+ if "Nic_Ddp" in feats.plat["Summary"]: + if feats.plat["Summary"]["Nic_Ddp"]: + return True + return False + +def check_profiles(profiles: object, byo_feats: dict): + summary = {} + for prof in profiles.keys(): + if prof == "build_your_own": + continue + prof_support = True + summary.update({prof: {"Features": {}}}) + for feat in profiles[prof].keys(): + try: + if profiles[prof][feat] is True: + if byo_feats[feat] is True: + summary[prof]["Features"].update({feat: True}) + elif byo_feats[feat] is False: + summary[prof]["Features"].update({feat: False}) + prof_support = False + elif byo_feats[feat] == "Unsupported": + summary[prof]["Features"].update({feat: "Unsupported (CPU/NIC)"}) + elif byo_feats[feat] is None: + summary[prof]["Features"].update({feat: "Unchecked (TODO)"}) + elif type(profiles[prof][feat]) is dict: + subfeat_set = {} + if byo_feats[feat] == "Unsupported": + summary[prof]["Features"].update({feat: "Unsupported"}) + continue + elif byo_feats[feat] is None: + summary[prof]["Features"].update({feat: "Unchecked (TODO)"}) + continue + for subfeat in profiles[prof][feat].keys(): + if byo_feats[feat][subfeat] is True: + subfeat_set.update({subfeat: True}) + elif byo_feats[feat][subfeat] is False: + subfeat_set.update({subfeat: False}) + prof_support = False + elif byo_feats[feat][subfeat] == "Unsupported": + continue + elif byo_feats[feat][subfeat] is None: + subfeat_set.update({subfeat: "Unchecked (TODO)"}) + if subfeat_set: + summary[prof]["Features"].update({feat: subfeat_set}) + except KeyError: + print("KeyError (expected): ",feat) + summary[prof]["Features"].update({feat: "Special feature (not in BYO)"}) + summary[prof].update({"Supported": prof_support}) + if not summary: + return None + return summary + +def main(): + platform_info = discover.main() + feats = Features(platform_info) + if not feats.dist_support: + print("Unsupported OS distribution and/or version - exiting") + sys.exit() + if not feats.codename: + print("Unsupported CPU codename - exiting") + sys.exit() + byo_feats = byo_check(platform_info, feats) + pprint.pprint(byo_feats) + full_summary = check_profiles(feats.profiles, byo_feats) + pprint.pprint(full_summary) + print("Printing support summary:") + for profile in full_summary.keys(): + print(" Profile: %s, Supported: %s" % (profile, full_summary[profile]["Supported"])) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/cloud/discovery/profiles.yml b/cloud/discovery/profiles.yml new file mode 100644 index 00000000..daf8dfb4 --- /dev/null +++ b/cloud/discovery/profiles.yml @@ -0,0 +1,489 @@ +# How to use this file: +# -------------------- +# can be: +# - on (included and enabled) +# - optional (included in vars but disabled) +# - off (not included, might as well drop it from the profile section) +# +# features: +# - vm_mode - is 'optional(false)' on k8s and is 'on(true)' on vm_host and on VMs +# - on_vms - is 'optional(false)' on k8s and on vm_host and is 'on(true)' on VMs +# - nfd +# - kube_dashboard +# - isolcpu +# - cpusets +# - native_cpu_manager +# - bond_cni +# - topology_manager +# - sriov_operator +# - sriov_network_dp +# - nic_drivers +# - sgx +# - sgx_dp +# - kmra: +# pccs +# apphsm +# ctk_demo +# - tcs +# - tca +# - qat +# - gpu +# - gpu_dp +# - openssl +# - tas +# - gas +# - ddp +# - network_userspace +# - dpdk +# - ovs_dpdk +# - pstate +# - cstate +# - ufs - uncore frequency scaling +# - sst +# - power_manager +# - telemetry: +# prometheus +# collectd +# telegraf +# - wireguard +# - multus +# - cndp +# - cndp_dp 
+# - psp +# - minio +# - cert_manager +# - registry +# - hugepages +# - service_mesh +# enabled +# tcpip_bypass_ebpf +# tls_splicing +# sgx_signer +# - intel_ethernet_operator +# enabled +# flow_config +# ddp +# fw_update +# - intel_sriov_fec_operator + +--- +access: + name: access + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + native_cpu_manager: on + topology_manager: on + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on + bond_cni: off + qat: optional + qat_dp: optional + openssl: on + dsa: on + dsa_dp: on + dlb: optional + dlb_dp: optional + gpu: off + gpu_dp: off + sgx: off + sgx_dp: off + kmra: + pccs: off + apphsm: off + ctk_demo: off + tcs: off + tca: off + tas: off + gas: off + ddp: off + network_userspace: off + dpdk: on + ovs_dpdk: off + pstate: off + cstate: on + ufs: off + sst: off + power_manager: on + telemetry: + prometheus: on + collectd: optional + telegraf: on + service_mesh: + enabled: off + tcpip_bypass_ebpf: off + tls_splicing: off + sgx_signer: off + wireguard: on + multus: on + firewall: optional + cndp: off + cndp_dp: off + psp: on + minio: off + cert_manager: on + registry: on + hugepages: on + tadk: off + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + intel_sriov_fec_operator: on + +basic: + name: basic + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + topology_manager: on + sriov_operator: optional + sriov_network_dp: optional + nic_drivers: on + dpdk: optional + cstate: optional + ufs: optional + telemetry: + prometheus: on + collectd: optional + telegraf: on + wireguard: on + multus: on + firewall: optional + cndp: optional + cndp_dp: optional + psp: on + cert_manager: on + registry: on + hugepages: optional + intel_ethernet_operator: + enabled: optional + flow_config: optional + fw_update: optional + +full_nfv: + name: full_nfv + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + native_cpu_manager: on + topology_manager: on + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on + bond_cni: on + qat: on + qat_dp: on + openssl: on + gpu: optional + gpu_dp: optional + sgx: on + sgx_dp: on + kmra: + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tca: on + tas: on + gas: optional + ddp: on + network_userspace: on + dpdk: on + ovs_dpdk: on + pstate: optional + cstate: optional + ufs: optional + sst: optional + power_manager: on + telemetry: + prometheus: on + collectd: optional + telegraf: on + service_mesh: + enabled: on + tcpip_bypass_ebpf: on + tls_splicing: on + sgx_signer: on + wireguard: on + multus: on + firewall: optional + cndp: on + cndp_dp: on + psp: on + minio: optional + cert_manager: on + registry: on + hugepages: on + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + intel_sriov_fec_operator: optional + +on_prem: + name: on_prem + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + native_cpu_manager: on + topology_manager: on + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on + sgx: on + sgx_dp: on + kmra: + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tca: on + qat: on + qat_dp: on + openssl: on + tas: on + dpdk: on + bond_cni: optional + pstate: optional + cstate: optional + ufs: optional + sst: optional + power_manager: optional + 
telemetry: + prometheus: on + collectd: optional + telegraf: on + service_mesh: + enabled: on + tcpip_bypass_ebpf: on + tls_splicing: on + sgx_signer: on + wireguard: on + multus: on + firewall: optional + cndp: optional + cndp_dp: optional + psp: on + cert_manager: on + registry: on + hugepages: on + intel_ethernet_operator: + enabled: optional + flow_config: optional + fw_update: optional + +regional_dc: + name: regional_dc + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + topology_manager: on + sriov_operator: optional + sriov_network_dp: optional + nic_drivers: on + native_cpu_manager: on + gpu: on + gpu_dp: on + tas: on + gas: on + dpdk: optional + cstate: optional + ufs: optional + telemetry: + prometheus: on + collectd: optional + telegraf: on + service_mesh: + enabled: on + tcpip_bypass_ebpf: on + tls_splicing: on + wireguard: on + multus: on + firewall: optional + cndp: optional + cndp_dp: optional + psp: on + cert_manager: on + registry: on + hugepages: optional + intel_ethernet_operator: + enabled: optional + flow_config: optional + fw_update: optional + +remote_fp: + name: remote_fp + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + isolcpu: optional + cpusets: optional + native_cpu_manager: on + topology_manager: on + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on + sgx: on + sgx_dp: on + kmra: + pccs: optional + apphsm: optional + ctk_demo: optional + tcs: optional + tca: optional + qat: on + qat_dp: optional + openssl: on + tas: on + ddp: on + bond_cni: optional + network_userspace: optional + dpdk: on + pstate: on + cstate: optional + ufs: optional + sst: optional + power_manager: optional + telemetry: + prometheus: on + collectd: on + telegraf: optional + service_mesh: + enabled: optional + tcpip_bypass_ebpf: optional + tls_splicing: optional + sgx_signer: optional + wireguard: on + multus: on + firewall: optional + cndp: optional + cndp_dp: optional + psp: on + cert_manager: on + registry: on + hugepages: on + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + +storage: + name: storage + vm_mode: optional + on_vms: optional + nfd: on + kube_dashboard: on + native_cpu_manager: on + topology_manager: on + sriov_operator: on + sriov_network_dp: optional + nic_drivers: on + qat: optional + qat_dp: optional + tas: on + ddp: optional + dpdk: on + cstate: optional + ufs: optional + telemetry: + prometheus: on + collectd: optional + telegraf: on + wireguard: on + multus: on + firewall: optional + psp: on + minio: on + cert_manager: on + registry: on + hugepages: on + intel_ethernet_operator: + enabled: optional + flow_config: optional + ddp: optional + fw_update: optional + +build_your_own: + name: build_your_own + vm_mode: optional + on_vms: optional + nfd: optional + kube_dashboard: optional + isolcpu: optional + cpusets: optional + native_cpu_manager: optional + topology_manager: optional + sriov_operator: optional + sriov_network_dp: optional + nic_drivers: optional + bond_cni: optional + qat: optional + qat_dp: optional + openssl: optional + gpu: optional + gpu_dp: optional + sgx: optional + sgx_dp: optional + kmra: + pccs: optional + apphsm: optional + ctk_demo: optional + tcs: optional + tca: optional + tas: optional + gas: optional + ddp: optional + network_userspace: optional + dpdk: optional + ovs_dpdk: optional + pstate: optional + cstate: optional + ufs: optional + sst: optional + power_manager: optional + 
telemetry:
+    prometheus: optional
+    collectd: optional
+    telegraf: optional
+  service_mesh:
+    enabled: optional
+    tcpip_bypass_ebpf: optional
+    tls_splicing: optional
+    sgx_signer: optional
+  wireguard: optional
+  multus: optional
+  firewall: optional
+  cndp: optional
+  cndp_dp: optional
+  psp: optional
+  minio: optional
+  cert_manager: optional
+  registry: optional
+  hugepages: optional
+  intel_ethernet_operator:
+    enabled: optional
+    flow_config: optional
+    ddp: optional
+    fw_update: optional
+  intel_sriov_fec_operator: optional
\ No newline at end of file
diff --git a/cloud/requirements.txt b/cloud/requirements.txt
new file mode 100644
index 00000000..fbca4b96
--- /dev/null
+++ b/cloud/requirements.txt
@@ -0,0 +1,11 @@
+click~=8.1.3
+PyYAML~=6.0
+schema~=0.7.5
+Jinja2~=3.1.2
+paramiko~=2.11.0
+scp~=0.14.4
+pycryptodome~=3.15.0
+validators~=0.20.0
+docker~=6.0.0
+boto3~=1.24.60
+GitPython~=3.1.27
\ No newline at end of file
diff --git a/cloud/sw_deployment/README.md b/cloud/sw_deployment/README.md
new file mode 100644
index 00000000..80e80c0d
--- /dev/null
+++ b/cloud/sw_deployment/README.md
@@ -0,0 +1,43 @@
+# SW Deployment
+
+This folder contains an example configuration file (sw_deployment/configure.yaml).
+
+Example `configure.yaml` file:
+```yaml
+ansible_host_ip: xxx.xxx.xxx.xxx
+cloud_settings:
+  provider: aws
+  region: eu-central-1
+controller_ips:
+- 127.0.0.1
+# exec_containers can be used to deploy additional containers or workloads.
+# It defaults to an empty list, but can be changed as shown in the commented lines
+exec_containers: []
+#exec_containers:
+#- ubuntu/kafka
+git_tag: None
+git_url: https://@github.com/intel/container-experience-kits
+github_personal_token: xxxxxxxxxxxxxxxxxxx
+ra_config_file: data/node1.yaml
+ra_ignore_assert_errors: true
+ra_machine_architecture: skl
+ra_profile: build_your_own
+replicate_from_container_registry: https://registry.hub.docker.com
+replicate_to_container_registry:
+ssh_key: ../deployment/ssh/id_rsa
+worker_ips:
+- xxx.xxx.xxx.xxx
+- xxx.xxx.xxx.xxx
+- xxx.xxx.xxx.xxx
+```
+
+For proper functionality, set `github_personal_token` to a personal GitHub token with access to the RA repository.
+To deploy to a different cloud region, set the `region` setting to the target region.
+
+The following settings will be adjusted automatically:
+- `ansible_host_ip`
+- `worker_ips`
+- `replicate_to_container_registry`
+
+After the main deployment finishes, the containers listed in `exec_containers` are deployed. Place the selected containers in the `exec_containers` list and set the source registry in the `replicate_from_container_registry` setting.
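+
+With the configuration file prepared, the deployment can be started by pointing the deployment tool in this folder at it. A minimal sketch, assuming the defaults (`--config` falls back to `../deployment/sw.yaml` when omitted; the tool's remaining command-line options can override individual settings):
+```commandline
+python sw_deployment_tool.py --config configure.yaml
+```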
diff --git a/cloud/sw_deployment/__init__.py b/cloud/sw_deployment/__init__.py new file mode 100644 index 00000000..1f2d063c --- /dev/null +++ b/cloud/sw_deployment/__init__.py @@ -0,0 +1,2 @@ +import sys +sys.path.append('sw_deployment') diff --git a/cloud/sw_deployment/configure.yaml b/cloud/sw_deployment/configure.yaml new file mode 100644 index 00000000..9df410f4 --- /dev/null +++ b/cloud/sw_deployment/configure.yaml @@ -0,0 +1,21 @@ +ansible_host_ip: xxx.xxx.xxx.xxx +cloud_settings: + provider: aws + region: eu-central-1 +controller_ips: +- 127.0.0.1 +exec_containers: [] +git_tag: None +git_url: https://@github.com/intel/container-experience-kits +github_personal_token: xxxxxxxxxxxxxxxxxxx +ra_config_file: data/node1.yaml +ra_ignore_assert_errors: true +ra_machine_architecture: skl +ra_profile: build_your_own +replicate_from_container_registry: https://registry.hub.docker.com +replicate_to_container_registry: +ssh_key: ../deployment/ssh/id_rsa +worker_ips: +- xxx.xxx.xxx.xxx +- xxx.xxx.xxx.xxx +- xxx.xxx.xxx.xxx diff --git a/cloud/sw_deployment/data/inventory.ini.j2 b/cloud/sw_deployment/data/inventory.ini.j2 new file mode 100644 index 00000000..2b1a4b19 --- /dev/null +++ b/cloud/sw_deployment/data/inventory.ini.j2 @@ -0,0 +1,38 @@ +[all] +{% for host in hosts -%} + {{host.host_name}} ansible_host={{host.internal_ip}} ip={{host.internal_ip}} ansible_user={{host.root_user_name}} ansible_ssh_private_key_file={{host.ansible_ssh_key_path}} +{% endfor -%} +localhost ansible_connection=local ansible_python_interpreter=/usr/bin/python3 + +[vm_host] + +[kube_control_plane] +{%- for host in hosts -%} + {% if 'ra_host' in host.ansible_type %} + {{- '\n' -}} + {{- host.host_name -}} + {% endif %} +{%- endfor %} + +[etcd] +{%- for host in hosts -%} + {% if 'ra_host' in host.ansible_type %} + {{- '\n' -}} + {{- host.host_name -}} + {% endif %} +{%- endfor %} + +[kube_node] +{%- for host in hosts -%} + {% if 'ra_worker' in host.ansible_type %} + {{- '\n' -}} + {{- host.host_name -}} + {% endif %} +{%- endfor %} + +[k8s_cluster:children] +kube_control_plane +kube_node + +[all:vars] +ansible_python_interpreter=/usr/bin/python3 diff --git a/cloud/sw_deployment/data/node1.yml b/cloud/sw_deployment/data/node1.yml new file mode 100644 index 00000000..9aed37ea --- /dev/null +++ b/cloud/sw_deployment/data/node1.yml @@ -0,0 +1,259 @@ +--- +# Kubernetes node configuration +# Do not change profile_name, configured_nic and configured_arch here !!! +# To generate vars for different profile/architecture use make command +# generated for profile and arch: +profile_name: build_your_own +configured_arch: skl +configured_nic: cvl + +# Enable IOMMU (required for SR-IOV networking and QAT) +iommu_enabled: false + +# dataplane interface configuration list +dataplane_interfaces: [] +# - bus_info: "18:00.0" # pci bus info +# pf_driver: ice # PF driver, "i40e", "ice" +# ddp_profile: "ice_comms-1.3.37.0.pkg" # DDP package name to be loaded into the NIC + # For i40e(XV710-*) allowable ddp values are: "ecpri.pkg", "esp-ah.pkg", "ppp-oe-ol2tpv2.pkgo", "mplsogreudp.pkg" and "gtp.pkgo", replace as required + # For ice(E810-*) allowable ddp values are: ice_comms-1.3.[17,20,22,24,28,30,31,35].0.pkg such as "ice_comms-1.3.37.0.pkg", replace as required + # ddp_profile must be defined for first port of each network device. bifurcated cards will appear as unique devices. 
+ +# flow_configuration: false # Flow Configuration # NOTE: this option is for Intel E810 Series NICs and requires Intel Ethernet Operator and Flow Config to be enabled in group vars. + # with Flow Configuration enabled the first VF (VF0) will be reserved for Flow Configuration and the rest of VFs will be indexed starting from 1. + +# default_vf_driver: "iavf" # default driver to be used with VFs if specific driver is not defined in the "sriov_vfs" section +# sriov_numvfs: 6 # total number of VFs to create including VFs listed in the "sriov_vfs" section. + # If total number of VFs listed in the "sriov_vfs" section is greater than "sriov_numvfs" then excessive entities will be ignored. + # VF's name should follow scheme: _ + # If index in the VF's name is greater than "sriov_numfs - 1" such VF will be ignored. +# minio_vf: true + +# sriov_vfs: # list of VFs to create on this PF with specific driver +# vf_00: "vfio-pci" # VF driver to be attached to this VF under this PF. Options: "iavf", "vfio-pci", "igb_uio" +# vf_05: "vfio-pci" + +# - bus_info: "18:00.1" +# pf_driver: ice +# ddp_profile: "ice_comms-1.3.37.0.pkg" +# default_vf_driver: "vfio-pci" +# flow_configuration: false + +# sriov_numvfs: 4 +# minio_vf: true + +# sriov_vfs: {} # no VFs with specific driver on this PF or "sriov_vfs" can be omitted for convenience + +# Set to 'true' to update i40e, ice and iavf kernel modules +update_nic_drivers: false +#i40e_driver_version: "2.20.12" # Downgrading i40e drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. +#i40e_driver_checksum: "sha1:a24f0c5512af31c68cd90667d5822121780d5487" # update checksum per required i40e drivers version +#ice_driver_version: "1.9.11" # Downgrading ice drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. +#ice_driver_checksum: "sha1:f05e2322a66de5d4019e7aa6141a109bb419dda4" # update checksum per required ice drivers version +#iavf_driver_version: "4.5.3" # Downgrading iavf drivers is not recommended due to the possible consequences. Users should update and proceed at their own risk. +#iavf_driver_checksum: "sha1:76b3a7dec392e559dea6112fa55f5614857cff2a" # update checksum per required iavf drivers version + +# Set 'true' to upgrade / downgrade NIC firmware. FW upgrade / downgrade will be executed on all NICs listed in "dataplane_interfaces[*].bus_info". +update_nic_firmware: false # Note: downgrading FW is not recommended, users should proceed at their own risk. +#nvmupdate: [] # remove '[]' in case of downgrading FW such as 'nvmupdate:' +# ice: [] # remove '[]' in case of downgrading FW to get required version of NVM 'ICE' 800 Series such as 'ice:' +# nvmupdate_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# nvmupdate_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# required_fw_version: "4.0" +# # https://builders.intel.com/docs/networkbuilders/intel-ethernet-controller-800-series-device-personalization-ddp-for-telecommunications-workloads-technology-guide.pdf +# # document above does not specify any min fw version needed for ddp feature. So, min_ddp_loadable_fw is the same as min_updatable_fw +# min_ddp_loadable_fw_version: "0.70" +# min_updatable_fw_version: "0.70" + # when downgrading only, the recommended below version is required to download the supported NVMupdate64E tool. Users should replace the tool at their own risk. 
+# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# supported_nvmupdate_tool_fw_version: "4.0" + +# install Intel x700 & x800 series NICs DDP packages +install_ddp_packages: false +# If following error appears: "Flashing failed: Operation not permitted" +# run deployment with update_nic_firmware: true +# or +# Disable ddp installation via install_ddp_packages: false + +# set 'true' to enable custom ddp package to be loaded after reboot +enable_ice_systemd_service: false + +sriov_cni_enabled: false + +# Custom SriovNetworkNodePolicy manifests local path +# custom_sriov_network_policies_dir: /tmp/sriov +# Bond CNI +bond_cni_enabled: false + +# Install DPDK (required for SR-IOV networking) +install_dpdk: false +# DPDK version (will be in action if install_dpdk: true) +dpdk_version: "22.07" +# Custom DPDK patches local path +#dpdk_local_patches_dir: "/tmp/patches/dpdk" +# It might be necessary to adjust the patch strip parameter, update as required. +#dpdk_local_patches_strip: 0 + +# Userspace networking +userspace_cni_enabled: false +ovs_dpdk_enabled: false # Should be enabled with Userspace CNI, when VPP is set to "false"; 1G hugepages required +ovs_version: "v2.17.2" +# CPU mask for OVS-DPDK PMD threads +ovs_dpdk_lcore_mask: 0x1 +# Huge memory pages allocated by OVS-DPDK per NUMA node in megabytes +# example 1: "256,512" will allocate 256MB from node 0 and 512MB from node 1 +# example 2: "1024" will allocate 1GB from node 0 on a single socket board, e.g. in a VM +ovs_dpdk_socket_mem: "256,0" +vpp_enabled: false # Should be enabled with Userspace CNI, when ovs_dpdk is set to "false"; 2M hugepages required + +# Enables hugepages support +hugepages_enabled: false +# Hugepage sizes available: 2M, 1G +default_hugepage_size: 1G +# Sets how many hugepages should be created +number_of_hugepages_1G: 4 +number_of_hugepages_2M: 1024 + +# Intel Ethernet Operator for Intel E810 Series network interface cards +intel_ethernet_operator: + ddp_update: false # perform DDP update on PFs listed in dataplane_interfaces using selected DDP profile + fw_update: false # perform firmware update on PFs listed in dataplane_interfaces + # NodeFlowConfig manifests local path + # For more information refer to: + # https://github.com/intel/intel-ethernet-operator/blob/main/docs/flowconfig-daemon/creating-rules.md + # node_flow_config_dir: /tmp/node_flow_config + +# Wireless FEC H/W Accelerator Device (e.g. ACC100) PCI ID +fec_acc: "0000:27:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format + +# Intel FlexRAN +intel_flexran_enabled: false # if true, deploy FlexRAN + +# Enabling this feature will install QAT drivers + services +update_qat_drivers: false + +# qat interface configuration list +qat_devices: [] +# - qat_id: "0000:ab:00.0" # QAT device id one using DPDK compatible driver for VF devices to be used by vfio-pci kernel driver, replace as required +# qat_sriov_numvfs: 12 # Number of VFs per PF to create - cannot exceed the maximum number of VFs available for the device. Set to 0 to not create any VFs. 
+ # Note: Currently when trying to create fewer virtual functions than the maximum, the maximum number always gets created +# - qat_id: "0000:xy:00.0" +# qat_sriov_numvfs: 10 + +# - qat_id: "0000:yz:00.0" +# qat_sriov_numvfs: 10 + +# Install and configure OpenSSL cryptography +openssl_install: false # This requires update_qat_drivers set to 'true' in host vars + +# CPU isolation from Linux scheduler +isolcpus_enabled: false +isolcpus: "4-11" + +# CPU shielding +cpusets_enabled: false +cpusets: "4-11" + +# Native CPU Manager (Kubernetes built-in) +# These settings are relevant only if in group_vars native_cpu_manager_enabled: true +# Amount of CPU cores that will be reserved for the housekeeping (2000m = 2000 millicores = 2 cores) +native_cpu_manager_system_reserved_cpus: 2000m +# Amount of CPU cores that will be reserved for Kubelet +native_cpu_manager_kube_reserved_cpus: 1000m +# Explicit list of the CPUs reserved for the host level system threads and Kubernetes related threads +#native_cpu_manager_reserved_cpus: "0,1,2" +# Note: All remaining unreserved CPU cores will be consumed by the workloads. + +cstate_enabled: false +cstates: + C1: # default values: C6 for access, C1 for other profiles + cpu_range: '0-9' # change as needed, cpus to modify cstates on + enable: true # true - enable given cstate, false - disable given cstate + +ufs_enabled: false +ufs: # uncore frequency scaling + min: 1000 # minimal uncore frequency + max: 2000 # maximal uncore frequency + +# Intel Speed Select Base-Frequency configuration. + +# Intel custom GPU kernel - this is required to be true in order to +# deploy Intel GPU Device Plugin on that node +configure_gpu: false + +# Telemetry configuration +# intel_pmu plugin collects information provided by Linux perf interface. +enable_intel_pmu_plugin: false + +# CPU Threads to be monitored by Intel PMU Plugin. +# If the field is empty, all available cores will be monitored. +# Please refer to https://collectd.org/wiki/index.php/Plugin:Intel_PMU for configuration details. +intel_pmu_plugin_monitored_cores: "" + +# CPU Threads to be monitored by Intel RDT Plugin. +# If the field is empty, all available cores will be monitored. +# Please refer to https://collectd.org/wiki/index.php/Plugin:IntelRDT for configuration details. +intel_rdt_plugin_monitored_cores: "" + +# Additional list of plugins that will be excluded from collectd deployment. +exclude_collectd_plugins: [] + +# Intel Cloud Native Data Plane. +cndp_enabled: false +cndp_dp_pools: + - name: "e2e" + drivers: "{{ dataplane_interfaces | map(attribute='pf_driver') | list | unique }}" # List of NIC driver to be included in CNDP device plugin ConfigMap. + + +# MinIO storage configuration +minio_pv: [] +# - name: "mnt-data-1" # PV identifier will be used for PVs names followed by node name(e.g., mnt-data-1-hostname) +# storageClassName: "local-storage" # Storage class name to match with PVC +# accessMode: "ReadWriteOnce" # Access mode when mounting a volume, e.g., ReadWriteOnce/ReadOnlyMany/ReadWriteMany/ReadWriteOncePod +# persistentVolumeReclaimPolicy: "Retain" # Reclaim policy when a volume is released once it's bound, e.g., Retain/Recycle/Delete +# capacity: 1GiB # Size of the PV. support only GiB/TiB +# mountPath: /mnt/data0 # Mount path of a volume +# device: /dev/nvme0n1 # Target storage device name when creating a volume. + # When group_vars: minio_deploy_test_mode == true, use a file as a loop device for storage + # otherwise, an actual NVME or SSD device for storage on the device name. 
+
+# - name: "mnt-data-2"
+#   storageClassName: "local-storage"
+#   accessMode: "ReadWriteOnce"
+#   persistentVolumeReclaimPolicy: "Retain"
+#   capacity: 1GiB
+#   mountPath: /mnt/data1
+#   device: /dev/nvme1n1
+
+# - name: "mnt-data-3"
+#   storageClassName: "local-storage"
+#   accessMode: "ReadWriteOnce"
+#   persistentVolumeReclaimPolicy: "Retain"
+#   capacity: 1GiB
+#   mountPath: /mnt/data2
+#   device: /dev/nvme2n1
+
+# - name: "mnt-data-4"
+#   storageClassName: "local-storage"
+#   accessMode: "ReadWriteOnce"
+#   persistentVolumeReclaimPolicy: "Retain"
+#   capacity: 1GiB
+#   mountPath: /mnt/data3
+#   device: /dev/nvme3n1
+
+# - name: "mnt-data-5"
+#   storageClassName: "local-storage"
+#   accessMode: "ReadWriteOnce"
+#   persistentVolumeReclaimPolicy: "Retain"
+#   capacity: 1GiB
+#   mountPath: /mnt/data4
+#   device: /dev/nvme4n1
+
+# - name: "mnt-data-6"
+#   storageClassName: "local-storage"
+#   accessMode: "ReadWriteOnce"
+#   persistentVolumeReclaimPolicy: "Retain"
+#   capacity: 1GiB
+#   mountPath: /mnt/data5
+#   device: /dev/nvme5n1
\ No newline at end of file
diff --git a/cloud/sw_deployment/docker_management.py b/cloud/sw_deployment/docker_management.py
new file mode 100644
index 00000000..59f73466
--- /dev/null
+++ b/cloud/sw_deployment/docker_management.py
@@ -0,0 +1,199 @@
+"""Class for Docker image management"""
+import os
+from pathlib import Path
+import configparser
+import base64
+import validators
+import click
+import docker
+import boto3
+
+
+class DockerManagement:
+    """
+    Class contains methods for copying Docker images between registries.
+    """
+    docker_client = None
+    CLOUD = None
+    to_registry = None
+    from_registry = None
+    show_log = False
+    images_to_replicate = None
+    tagged_images = []
+
+    AWS_ACCESS_KEY_ID = None
+    AWS_ACCESS_SECRET_KEY = None
+    AWS_REGION = None
+    ECR_PASSWORD = None
+    ECR_USERNAME = 'AWS'
+    ECR_URL = None
+
+    def __init__(self, from_registry, to_registry, images_to_replicate, region, cloud=None, show_log=False):
+        """
+        Init method for the class.
+
+        Parameters:
+            from_registry (string): URL address of the source registry
+            to_registry (string): URL address of the target registry
+            images_to_replicate (list): List of images to copy between registries
+            region (string): Cloud region hosting the target registry
+            cloud (string): [Not required] Type of cloud with the target registry. Currently supported: ['aws']
+            show_log (bool): [Not required] Show the log of the image push
+
+        Return:
+            None
+
+        """
+        self.docker_client = docker.from_env()
+        self.CLOUD = cloud
+        self.AWS_REGION = region
+        self.show_log = show_log
+        self.to_registry = to_registry
+
+        if not validators.url(from_registry):
+            click.secho('The source registry does not have a valid URL!', fg='red')
+            return
+        else:
+            self.from_registry = from_registry.replace('https://', '')
+
+        self.images_to_replicate = images_to_replicate
+        click.echo(f"Images to replicate: {self.images_to_replicate}")
+
+        if self.CLOUD == "aws":
+            self.initialize_ecr()
+
+    def copy_images(self):
+        """
+        Copy images between registries.
+        When a cloud is set, the method uses the class attributes holding the cloud credentials.
+
+        Parameters:
+            None
+
+        Return:
+            None
+
+        """
+        for image in self.images_to_replicate:
+            self.pull_image(registry_url=self.from_registry,
+                            image_name=image)
+            new_image = self.tag_image(image_name=image,
+                                       registry_old=self.from_registry,
+                                       registry_new=self.to_registry)
+            self.tagged_images.append(new_image)
+            if self.CLOUD == 'aws':
+                self.push_image(image=new_image['repository'],
+                                tag=new_image['tag'],
+                                registry=self.ECR_URL,
+                                username=self.ECR_USERNAME,
+                                password=self.ECR_PASSWORD)
+            else:
+                self.push_image(image=new_image['repository'],
+                                tag=new_image['tag'],
+                                registry=self.to_registry)
+
+    def initialize_ecr(self):
+        """
+        Initialize ECR and get the AWS credentials for authentication with ECR.
+        The method uses the local AWS credentials and config files for authentication
+        and sets the class attributes used by the other methods.
+
+        Parameters:
+            None
+
+        Return:
+            None
+
+        """
+        aws_credentials = os.path.join(Path.home(), '.aws', 'credentials')
+        config = configparser.RawConfigParser()
+        try:
+            config.read(aws_credentials)
+            credentials = config['default']
+            self.AWS_ACCESS_KEY_ID = credentials['aws_access_key_id']
+            self.AWS_ACCESS_SECRET_KEY = credentials['aws_secret_access_key']
+        except configparser.ParsingError as parser_error:
+            click.secho(parser_error, fg='red')
+
+        aws_session = boto3.Session(region_name=self.AWS_REGION)
+        ecr_client = aws_session.client('ecr', aws_access_key_id=self.AWS_ACCESS_KEY_ID,
+                                        aws_secret_access_key=self.AWS_ACCESS_SECRET_KEY,
+                                        region_name=self.AWS_REGION)
+
+        ecr_credentials = (ecr_client.get_authorization_token()['authorizationData'][0])
+        self.ECR_PASSWORD = (base64.b64decode(ecr_credentials['authorizationToken'])
+                             .replace(b'AWS:', b'').decode('utf-8'))
+        self.ECR_URL = self.to_registry
+
+    def pull_image(self, registry_url, image_name, username=None, password=None):
+        """
+        Download an image from the source registry to the local Docker host.
+
+        Parameters:
+            registry_url (string): URL address of the source registry
+            image_name (string): Name of the downloaded image
+            username (string): User name for the source registry
+            password (string): Password for the source registry
+
+        Return:
+            None
+
+        """
+        if not (username is None and password is None):
+            self.docker_client.login(username=username,
+                                     password=password,
+                                     registry=registry_url)
+        output = self.docker_client.images.pull(f"{registry_url}/{image_name}")
+        click.echo(output)
+
+    def tag_image(self, image_name, registry_old, registry_new):
+        """
+        Tag an image with the new registry.
+
+        Parameters:
+            image_name (string): Name of the image
+            registry_old (string): URL address of the source registry
+            registry_new (string): URL address of the target registry
+
+        Return:
+            dict: Repository and tag of the tagged image
+
+        """
+        image = self.docker_client.images.get(f"{registry_old}/{image_name}")
+        if self.CLOUD == 'aws':
+            target_image = registry_new
+            tag = image_name.replace('/', '-').replace(':', '-')
+        else:
+            target_image = f"{registry_new}/{image_name}"
+            tag = 'latest'
+        result = image.tag(target_image, tag)
+        if result:
+            return {'repository': target_image, 'tag': tag}
+
+    def push_image(self, image, tag, registry=None, username=None, password=None):
+        """
+        Push an image to the target registry.
+
+        Parameters:
+            image (string): Name of the image
+            tag (string): Tag of the image
+            registry (string): URL address of the target registry
+            username (string): User name for the target registry
+            password (string): Password for the target registry
+
+        Return:
+            None
+
+        """
+        click.echo("Pushing image:")
+        auth_config = None
+        if registry is not None and username is not None and password is not None:
+            self.docker_client.login(username=username,
+                                     password=password,
+                                     registry=registry)
+            auth_config = {'username': username, 'password': password}
+        push_log = self.docker_client.images.push(image, tag=tag, auth_config=auth_config)
+        if self.show_log:
+            click.echo(push_log)
diff --git a/cloud/sw_deployment/git_clone.py b/cloud/sw_deployment/git_clone.py
new file mode 100644
index 00000000..1dc4b758
--- /dev/null
+++ b/cloud/sw_deployment/git_clone.py
@@ -0,0 +1,77 @@
+import git
+import click
+from git import RemoteProgress
+
+
+class CloneProgress(RemoteProgress):
+    """
+    Class for logging the output from Git clone.
+    """
+    def update(self, op_code, cur_count, max_count=None, message=''):
+        """
+        Takes the output from Git clone and writes it to the console.
+
+        op_code (-): Not used
+        cur_count (-): Not used
+        max_count (-): Not used
+        message (string): String with information to print
+
+        Return:
+            None
+
+        """
+        if message:
+            click.echo(message)
+
+
+def clone_repository(clone_dir, repo_url, branch, token):
+    """
+    Clone a Git repository, with the ability to check out a specific branch.
+
+    Parameters:
+        clone_dir (string): Path where to clone the Git repository
+        repo_url (string): URL address of the Git repository
+        branch (string): Name of the branch to checkout
+        token (string): Personal access token for a non-public repository
+
+    Return:
+        git.Repo: The cloned repository
+
+    """
+    if (token is not None) and ("@" in repo_url):
+        # insert the personal access token in front of the "@" in the repository URL
+        repo_url = repo_url.replace("@", f"{token}@")
+    else:
+        repo_url = repo_url.replace("@", "")
+    click.echo('GIT clone:')
+    click.echo(f"Repository URL: {repo_url}")
+    click.echo(f"Repository local path: {clone_dir}")
+    click.echo(f"Selected repository branch: {branch}")
+    if branch is not None:
+        return git.Repo.clone_from(url=repo_url,
+                                   to_path=clone_dir,
+                                   single_branch=True,
+                                   branch=branch,
+                                   progress=CloneProgress())
+    else:
+        return git.Repo.clone_from(url=repo_url,
+                                   to_path=clone_dir,
+                                   single_branch=True,
+                                   progress=CloneProgress())
+
+def switch_repository_to_tag(repo, tag):
+    """
+    Change the repository version to a specific tag.
+
+    Parameters:
+        repo (git obj): GitPython object of the cloned repository
+        tag (string): Name of the tag to checkout
+
+    Return:
+        None
+
+    """
+    for list_tag in repo.tags:
+        if tag in str(list_tag):
+            repo.git.checkout(tag)
+            click.echo(f"Cloned repository was checked out to tag {tag}")
diff --git a/cloud/sw_deployment/ssh_connector.py b/cloud/sw_deployment/ssh_connector.py
new file mode 100644
index 00000000..d9238f95
--- /dev/null
+++ b/cloud/sw_deployment/ssh_connector.py
@@ -0,0 +1,150 @@
+"""Class for SSH connection"""
+import os
+import io
+import click
+from paramiko import SSHClient, SSHConfig, ProxyCommand, AutoAddPolicy, SSHException
+from scp import SCPClient, SCPException
+
+
+class SSHConnector:
+    """
+    SSHConnector class for managing SSH connections to remote instances.
+    The class supports proxy-jump connections for cloud instances without public access.
+    """
+
+    def __init__(self, ip_address, username, port=22, priv_key=None, gateway=None):
+        """
+        Initialize the class and connect to the client.
+        The method supports gateway proxy hopping.
+        The gateway uses an already open SSH connection using the same SSHConnector class.
+
+        Parameters:
+            ip_address (string): IP address of the remote instance
+            username (string): User name for authentication on the remote instance
+            port (int): SSH port
+            priv_key (string): Path to the private RSA key for authentication on the remote instance
+            gateway (SSHConnector obj): [optional] SSHConnector object with an active SSH connection
+                                        to the gateway, used to create the proxy jump
+
+        Return:
+            None
+
+        """
+        self.client = SSHClient()
+        self.client.set_missing_host_key_policy(AutoAddPolicy())
+
+        sock = None
+        if gateway:
+            dest_addr = (ip_address, port)
+            local_addr = ('127.0.0.1', 1234)
+            # open the proxy channel through the gateway's underlying SSHClient
+            sock = gateway.client.get_transport().open_channel(
+                'direct-tcpip', dest_addr, local_addr
+            )
+
+        cfg = {
+            'hostname': ip_address,
+            'port': port,
+            'timeout': 200,
+            'banner_timeout': 15,
+            'key_filename': priv_key,
+            'username': username,
+            'sock': sock
+        }
+
+        if os.path.exists(os.path.expanduser("~/.ssh/config")):
+            ssh_config = SSHConfig()
+            user_config_file = os.path.expanduser("~/.ssh/config")
+            with io.open(user_config_file, 'rt', encoding='utf-8') as conf_file:
+                ssh_config.parse(conf_file)
+
+            host_conf = ssh_config.lookup(ip_address)
+            if host_conf:
+                if ('proxycommand' in host_conf) and (gateway is None):
+                    cfg['sock'] = ProxyCommand(host_conf['proxycommand'])
+                if 'user' in host_conf:
+                    cfg['username'] = host_conf['user']
+                if 'identityfile' in host_conf:
+                    cfg['key_filename'] = host_conf['identityfile']
+                if 'hostname' in host_conf:
+                    cfg['hostname'] = host_conf['hostname']
+
+        try:
+            self.client.connect(**cfg)
+        except SSHException as ssh_excep:
+            click.echo("Cannot connect to instance via SSH", err=True)
+            click.echo(f"Error message: {ssh_excep}", err=True)
+
+    def exec_command(self, command, print_output=False):
+        """
+        Executes a command on the connected client.
+
+        Parameters:
+            command (string): Command to execute on the remote instance
+            print_output (bool): Whether to print the output to the console
+
+        Return:
+            string: Command output
+
+        """
+        stdin = None
+        stdout = None
+        stderr = None
+        try:
+            stdin, stdout, stderr = self.client.exec_command(command)
+        except SSHException:
+            click.echo(f"During command: {command}")
+            click.echo(f"Error occurred: {stderr}")
+        if stdout is None:
+            # the command did not run
+            return None
+        if print_output:
+            for line in iter(lambda: stdout.readline(2048), ""):
+                click.echo(line, nl=False)
+        return stdout.read().decode('ascii').strip('\n')
+
+    def progress(self, filename, size, sent):
+        """
+        Progress callback that prints the current
+        percentage completed for the file.
+
+        Parameters:
+            filename (string): Name of the uploaded file
+            size (int): Size of the file
+            sent (int): Count of bytes already sent
+
+        Return:
+            None
+
+        """
+        with click.progressbar(length=100,
+                               label=f"Uploading {filename} progress") as prog_bar:
+            prog_bar.update(float(sent)/float(size)*100)
+
+    def copy_file(self, file_path, destination_path):
+        """
+        Upload a file to the remote client via the SCP protocol.
+
+        Parameters:
+            file_path (string): Path to the file to upload
+            destination_path (string): Path where to upload the file
+
+        Return:
+            None
+
+        """
+        scp = SCPClient(self.client.get_transport(), progress=self.progress)
+        try:
+            scp.put(file_path, destination_path)
+        except SCPException as error:
+            click.echo(f"Error during uploading host_var file: {error}", err=True)
+        scp.close()
+
+    def close_connection(self):
+        """
+        Close the SSH connection.
+
+        Return:
+            None
+
+        """
+        self.client.close()
diff --git a/cloud/sw_deployment/sw_deployment_tool.py b/cloud/sw_deployment/sw_deployment_tool.py
new file mode 100644
index 00000000..cc3bd1d4
--- /dev/null
+++ b/cloud/sw_deployment/sw_deployment_tool.py
@@ -0,0 +1,652 @@
+"""Script for deploying Reference Architecture (RA) on Cloud solutions"""
+import os
+import tarfile
+import shutil
+import pathlib
+import click
+import yaml
+import jinja2
+from ssh_connector import SSHConnector
+from git_clone import clone_repository, switch_repository_to_tag
+from docker_management import DockerManagement
+
+configuration = {
+    'cloud_settings': {
+        'provider': None,
+        'region': None
+    },
+    'ansible_host_ip': None,
+    'controller_ips': [],
+    'worker_ips': [],
+    'ssh_key': None,
+    'git_url': None,
+    'git_tag': None,
+    'github_personal_token': None,
+    'git_branch': None,
+    'ra_config_file': None,
+    'ra_profile': None,
+    'ra_machine_architecture': None,
+    'ra_ignore_assert_errors': None,
+    'replicate_from_container_registry': None,
+    'replicate_to_container_registry': None,
+    'exec_containers': [],
+    'skip_git_clonning': False
+}
+
+ROOT_DIR = pathlib.Path(__file__).absolute().parent.resolve()
+DATA_DIR = os.path.join(ROOT_DIR, "data")
+RA_CLONED_REPO = os.path.join(DATA_DIR,
+                              "container-experience-kits")
+RA_REMOTE_PATH = "/home/ubuntu/container-experience-kits"
+INVENTORY_FILE = os.path.join(DATA_DIR, "inventory.ini")
+
+TAR_NAME = "git_repo.tar.gz"
+TAR_PATH = os.path.join(DATA_DIR, TAR_NAME)
+
+DISCOVERY_TOOL_PATH = "~/cwdf_deployment/discovery/discover.py"
+
+DEFAULT_CONFIG = os.path.join(ROOT_DIR, '../deployment/sw.yaml')
+
+nodes_list = []
+
+
+@click.command()
+@click.option('-p', '--provider',
+              type=click.Choice(['aws', 'azure', 'gcp', 'ali', 'tencent']),
+              help='Select cloud provider where RA will be deployed. [aws, azure, gcp, ali, tencent]')
+@click.option('--ansible-host-ip', help='IP address of instance where Ansible will be running')
+@click.option('--controller-ips', help='Array of K8s controller IPs')
+@click.option('--worker-ips', help='Array of K8s worker IPs')
+@click.option('--ssh-key', help='SSH key for accessing the cloud instances')
+@click.option('--git-url', help='The URL address of the Git project that will be cloned into the Cloud instance')
+@click.option('--ra-config-file', help='Path to configuration file for RA')
+@click.option('--ra-profile',
+              type=click.Choice(['access', 'basic', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'storage', 'build_your_own']),
+              help='Selection of RA profile. At the moment, '
+                   'Container Experience Kits supports the following profiles: '
+                   'access, basic, full_nfv, on_prem, regional_dc, remote_fp, '
+                   'storage, build_your_own')
+@click.option('--ra-machine-architecture',
+              type=click.Choice(['spr', 'icx', 'clx', 'skl']),
+              help='CPU architecture of cloud instance. '
+                   'Supported architectures are:\n'
+                   'spr - Sapphire Rapids - 4th Generation Intel(R) Xeon(R) Scalable Processor\n'
+                   'icx - IceLake (default) - 3rd Generation Intel(R) Xeon(R) Scalable Processor\n'
+                   'clx - CascadeLake - 2nd Generation Intel(R) Xeon(R) Scalable Processor\n'
+                   'skl - SkyLake - 1st Generation Intel(R) Xeon(R) Scalable Processor')
+@click.option('--github-personal-token', help='Git token with permission to clone selected repository')
+@click.option('--git-tag', help='Clone Git repository with specified tag')
+@click.option('--git-branch', default='master', help='Clone Git repository with specified branch')
+@click.option('--ra-ignore-assert-errors', default=False, help='Ignore assert errors in RA deployment')
+@click.option('-c', '--config',
+              type=click.Path(dir_okay=False),
+              default=DEFAULT_CONFIG,
+              help="Path to configuration file in yaml format.")
+@click.option('--skip-git-clonning', is_flag=True, default=False, help='Skip cloning the repository from Git and use an already cloned directory.')
+@click.option('--replicate-from-container-registry', help='URL address of source docker registry')
+@click.option('--replicate-to-container-registry', help='URL address of target docker registry')
+@click.option('--exec-containers', multiple=True, help='List of containers to be executed')
+def main(provider, ansible_host_ip, controller_ips,
+         worker_ips, ssh_key, git_url, ra_config_file,
+         ra_profile, ra_machine_architecture, github_personal_token,
+         git_tag, git_branch, ra_ignore_assert_errors, config,
+         skip_git_clonning, replicate_from_container_registry,
+         replicate_to_container_registry, exec_containers):
+    """
+    Main function for configuring the whole cluster and deploying benchmark pods.
+
+    Parameters:
+        provider (string): Cloud provider where RA will be deployed. [aws, azure, gcp, ali, tencent]
+        ansible_host_ip (string): IP address of Ansible host
+        controller_ips (list): List of Ansible controller instances
+        worker_ips (list): List of Ansible worker instances
+        ssh_key (string): Path to private SSH key
+        git_url (string): URL of Git repository to be cloned to Ansible instance
+        ra_config_file (string): Path to configuration file for RA
+        ra_profile (string): Selection of RA profile
+        ra_machine_architecture (string): CPU architecture of cloud instance
+        github_personal_token (string): Personal GitHub token for cloning private repositories
+        git_tag (string): To clone Git repository with specified tag
+        git_branch (string): Clone Git repository with specified branch
+        ra_ignore_assert_errors (bool): Ignore assert errors in RA deployment
+        config (string): Path to configuration file for sw_deployment_tool
+        skip_git_clonning (bool): Skip cloning the repository from Git and use an already cloned directory
+        replicate_from_container_registry (string): URL address of source docker registry
+        replicate_to_container_registry (string): URL address of target docker registry
+        exec_containers (list): List of containers to be executed
+
+    Return:
+        None
+
+    """
+    arguments = locals()
+    for argument in arguments:
+        if arguments[argument] is not None:
+            configuration[argument] = arguments[argument]
+
+    start_deploy(config=config)
+
+
+def start_deploy(config):
+    """
+    Start the SW deployment.
+
+    Parameters:
+        config (string): Path to configuration file.
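+            Values found in the file are merged into the module-level
+            configuration dict on top of any CLI options
+            (see _parse_configuration_file).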
+
+    Return:
+        None
+
+    """
+    if os.path.exists(config):
+        _parse_configuration_file(config)
+    if not configuration['skip_git_clonning']:
+        if os.path.exists(RA_CLONED_REPO):
+            shutil.rmtree(RA_CLONED_REPO)
+        repository = clone_repository(clone_dir=RA_CLONED_REPO,
+                                      repo_url=configuration['git_url'],
+                                      token=configuration['github_personal_token'],
+                                      branch=configuration['git_branch'])
+
+        if configuration['git_tag'] is not None:
+            switch_repository_to_tag(repo=repository, tag=configuration['git_tag'])
+
+    _tar_repository(output_filename=TAR_PATH, source_dir=RA_CLONED_REPO)
+
+    _deploy(provider=configuration['cloud_settings']['provider'],
+            ansible_host_ip=configuration['ansible_host_ip'],
+            ssh_key=configuration['ssh_key'])
+
+
+def _tar_repository(output_filename, source_dir):
+    '''
+    Makes a tar.gz file that contains the cloned repository.
+    A single tar file is more convenient for uploading the
+    cloned repository to a cloud instance.
+
+    Parameters:
+        output_filename (string): Name of the tar.gz file
+        source_dir (string): The path to the folder to be packed
+
+    Return:
+        None
+
+    '''
+    if os.path.exists(output_filename):
+        os.remove(output_filename)
+    with tarfile.open(output_filename, "w:gz") as tar:
+        tar.add(source_dir, arcname=os.path.basename(source_dir))
+
+
+def _parse_configuration_file(config):
+    '''
+    Get configuration from the YAML configuration file.
+    Values set in the file overwrite the corresponding entries
+    already collected from the CLI options.
+
+    Parameters:
+        config (string): Path to the configuration file
+
+    Return:
+        dict: Configuration dictionary
+
+    '''
+    if not os.path.exists(config):
+        return None
+    file_configuration = None
+    with open(config, 'r', encoding="UTF-8") as stream:
+        try:
+            file_configuration = yaml.safe_load(stream)
+        except yaml.YAMLError as error:
+            click.echo(error)
+    if file_configuration is not None:
+        for item in file_configuration:
+            if file_configuration[item] is not None:
+                configuration[item] = file_configuration[item]
+    return configuration
+
+
+def _remove_ssh_banner(ssh_client, node_ips_array, user):
+    '''
+    Removes the SSH banner entry from root's authorized_keys to enable
+    root login via SSH. Using root is necessary for the Ansible playbooks.
+
+    Parameters:
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        node_ips_array (list): List of node IPs
+        user (string): Regular remote user with enabled login
+
+    Return:
+        None
+
+    '''
+    for node_ip in node_ips_array:
+        ssh_client.exec_command(f"ssh-keyscan -H {node_ip} >> /home/ubuntu/.ssh/known_hosts")
+        if node_ip != "127.0.0.1":
+            click.echo(f"{node_ip}, {user}")
+            ssh_node = SSHConnector(node_ip, user, 22, configuration['ssh_key'], ssh_client.client)
+            ssh_node.exec_command('sudo rm /root/.ssh/authorized_keys')
+            ssh_node.exec_command(f"sudo cp /home/{user}/.ssh/authorized_keys /root/.ssh/")
+            ssh_node.close_connection()
+        else:
+            ssh_client.exec_command('sudo rm /root/.ssh/authorized_keys')
+            ssh_client.exec_command(f"sudo cp /home/{user}/.ssh/authorized_keys /root/.ssh/")
+
+
+def _install_dependencies_on_nodes(ssh_client, node_ips_array):
+    '''
+    Installs pciutils (lspci) and golang as RA dependencies.
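+    Packages are installed with yum, so the node images are assumed to be
+    RHEL-family distributions (e.g. the default EKS Amazon Linux workers).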
+
+    Parameters:
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        node_ips_array (list): List of node IPs
+
+    Return:
+        None
+
+    '''
+    for node_ip in node_ips_array:
+        if node_ip != "127.0.0.1":
+            ssh_node = SSHConnector(ip_address=node_ip,
+                                    username="root",
+                                    port=22,
+                                    priv_key=configuration['ssh_key'],
+                                    gateway=ssh_client.client)
+            ssh_node.exec_command('yum makecache && yum -y install pciutils.x86_64 golang',
+                                  print_output=True)
+            ssh_node.close_connection()
+        else:
+            ssh_client.exec_command('yum makecache && yum -y install pciutils.x86_64 golang',
+                                    print_output=True)
+
+
+def _discovery_nodes(ssh_client, root_user, node_ips, node_type):
+    '''
+    Collects information about the Ansible nodes and appends it
+    to the global nodes_list.
+
+    Parameters:
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        root_user (string): User name with root privileges on the nodes
+        node_ips (list): List of node IPs
+        node_type (string): Ansible node type, supported are: ['ra_host', 'ra_worker']
+
+    Return:
+        None
+
+    '''
+    for node_ip in node_ips:
+        ssh_client.exec_command(f"ssh-keyscan -H {node_ip} >> /home/ubuntu/.ssh/known_hosts")
+        if node_ip != "127.0.0.1":
+            ssh_node = SSHConnector(node_ip, root_user, 22, configuration['ssh_key'], ssh_client.client)
+            node_hostname = ssh_node.exec_command('sudo cat /etc/hostname')
+            ssh_node.close_connection()
+        else:
+            node_hostname = ssh_client.exec_command('sudo cat /etc/hostname')
+
+        node = {
+            "host_name": node_hostname,
+            "internal_ip": node_ip,
+            "root_user_name": root_user,
+            "ansible_ssh_key_path": "/home/ubuntu/cwdf_deployment/ssh/id_rsa",
+            "ansible_type": node_type
+        }
+
+        nodes_list.append(node)
+
+
+def _create_inventory_file(ssh_client, nodes):
+    """
+    Creates the inventory file for RA Ansible with information
+    about all Ansible nodes.
+
+    Parameters:
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        nodes (list): List of node dictionaries created in the _discovery_nodes function
+
+    Return:
+        None
+
+    """
+    template_loader = jinja2.FileSystemLoader(searchpath=DATA_DIR)
+    environment = jinja2.Environment(loader=template_loader)
+    template = environment.get_template("inventory.ini.j2")
+    with open(INVENTORY_FILE, mode="w", encoding="utf-8") as inventory:
+        inventory.write(template.render(hosts=nodes))
+
+    ssh_client.copy_file(file_path=INVENTORY_FILE, destination_path=RA_REMOTE_PATH)
+    os.remove(INVENTORY_FILE)
+
+
+def _create_host_var_files(ssh_client, hosts):
+    """
+    Creates a host_vars file for every cloud instance.
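+    Ethernet devices are discovered with lspci on each host and written
+    into the dataplane_interfaces section of the generated file.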
+
+    Parameters:
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        hosts (list): List of host dictionaries created in _discovery_nodes function
+
+    Return:
+        None
+
+    """
+    for host in hosts:
+        if host['internal_ip'] != '127.0.0.1':
+            ssh_worker = SSHConnector(ip_address=host['internal_ip'],
+                                      username=host['root_user_name'],
+                                      port=22,
+                                      priv_key=configuration['ssh_key'],
+                                      gateway=ssh_client.client)
+            # grep 'ther' matches both 'Ethernet' and 'ether' lines in lspci output
+            ethernet_devices = ssh_worker.exec_command("lspci | grep 'ther' | awk '{print $1}'")
+            ssh_worker.close_connection()
+        else:
+            ethernet_devices = ssh_client.exec_command("lspci | grep 'ther' | awk '{print $1}'")
+
+        with open(file=os.path.join(DATA_DIR, "node1.yml"), encoding="utf-8") as file:
+            list_doc = yaml.safe_load(file)
+
+        list_doc["profile_name"] = configuration['ra_profile']
+        list_doc["configured_arch"] = configuration['ra_machine_architecture']
+
+        dataplane_interfaces = []
+
+        for address in ethernet_devices.split('\n'):
+            dataplane_interface = {
+                'bus_info': address,
+                'default_vf_driver': 'ena',
+                'name': 'eth0',
+                'pf_driver': 'ena',
+                'sriov_numvfs': 6,
+                'sriov_vfs': {
+                    'vf_00': 'vfio-pci',
+                    'vf_05': 'vfio-pci'
+                }}
+            dataplane_interfaces.append(dataplane_interface)
+
+        list_doc["dataplane_interfaces"] = dataplane_interfaces
+
+        if configuration['ra_profile'] == "access":
+            list_doc = _set_access_profile(list_doc)
+        elif configuration['ra_profile'] == "basic":
+            list_doc = _set_basic_profile(list_doc)
+        elif configuration['ra_profile'] == "full_nfv":
+            list_doc = _set_full_nfv_profile(list_doc)
+        elif configuration['ra_profile'] == "on_prem":
+            list_doc = _set_on_prem_profile(list_doc)
+        elif configuration['ra_profile'] == "regional_dc":
+            list_doc = _set_regional_dc_profile(list_doc)
+        elif configuration['ra_profile'] == "remote_fp":
+            list_doc = _set_remote_fp_profile(list_doc)
+        elif configuration['ra_profile'] == "storage":
+            list_doc = _set_storage_profile(list_doc)
+
+        with open(file=os.path.join(DATA_DIR, f"{host['host_name']}.yml"),
+                  mode="w",
+                  encoding="utf-8") as file:
+            yaml.dump(list_doc, file)
+
+        ssh_client.copy_file(file_path=os.path.join(DATA_DIR, f"{host['host_name']}.yml"),
+                             destination_path=f"{RA_REMOTE_PATH}/host_vars/{host['host_name']}.yml")
+        os.remove(os.path.join(DATA_DIR, f"{host['host_name']}.yml"))
+
+
+def _set_access_profile(settings):
+    """
+    Function for additional settings of the RA access profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    return settings
+
+
+def _set_regional_dc_profile(settings):
+    """
+    Function for additional settings of the RA regional_dc profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+        "configure_gpu": False,
+        "gpu_dp_enabled": False
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _set_basic_profile(settings):
+    """
+    Function for additional settings of the RA basic profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _set_full_nfv_profile(settings):
+    """
+    Function for additional settings of the RA full_nfv profile.
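+    Currently returns the settings unchanged and is kept as an extension
+    point for profile-specific overrides.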
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _set_on_prem_profile(settings):
+    """
+    Function for additional settings of the RA on_prem profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+        "update_qat_drivers": False
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _set_remote_fp_profile(settings):
+    """
+    Function for additional settings of the RA remote_fp profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _set_storage_profile(settings):
+    """
+    Function for additional settings of the RA storage profile.
+
+    Parameters:
+        settings (dict): Dictionary with the RA profile settings
+
+    Return:
+        dict: Dictionary with the updated RA profile settings
+
+    """
+    additional_settings = {
+    }
+    settings.update(additional_settings)
+    return settings
+
+
+def _docker_login(node_ips, ssh_client, user, registry, registry_username, password):
+    """
+    Login to private AWS ECR.
+
+    Parameters:
+        node_ips (list): List of K8s nodes
+        ssh_client (SSHConnector obj): SSHConnector object with active connection
+        user (string): Host OS username
+        registry (string): URL address of private registry
+        registry_username (string): Registry username
+        password (string): Registry password
+
+    Return:
+        None
+
+    """
+    for node_ip in node_ips:
+        ssh_node = SSHConnector(node_ip, user, 22, configuration['ssh_key'], ssh_client.client)
+        ssh_node.exec_command(f"docker login {registry} --username {registry_username} --password {password}", print_output=True)
+        ssh_node.close_connection()
+
+
+def cleanup(config):
+    """
+    Cleanup function.
+
+    Parameters:
+        config (string): Path to the configuration file
+
+    Return:
+        None
+
+    """
+    click.echo("-------------------")
+    click.secho("Starting cleanup", fg="yellow")
+
+    _parse_configuration_file(config=config)
+
+    client = SSHConnector(ip_address=configuration['ansible_host_ip'], username='ubuntu', priv_key=configuration['ssh_key'])
+
+    for image in configuration['exec_containers']:
+        image_name = image.replace('/', '-')
+        click.echo(f"Deleting pod: {image_name}")
+        client.exec_command(f"kubectl delete pod {image_name}", print_output=True)
+
+    client.exec_command(f"cd {RA_REMOTE_PATH} && ansible-playbook -i inventory.ini ./playbooks/redeploy_cleanup.yml")
+    client.exec_command(f"rm {RA_REMOTE_PATH} -rf")
+
+
+def _deploy(provider, ansible_host_ip, ssh_key):
+    """
+    Function for the RA deploy process.
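+    The sequence is: connect to the Ansible host, stage the repository
+    tarball, enable root login and install dependencies on all nodes,
+    generate the inventory and host_vars files, then run the RA playbooks.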
+
+    Parameters:
+        provider (string): Cloud provider ['aws', 'azure', 'gcp', 'ali', 'tencent']
+        ansible_host_ip (string): The IP address of the instance where Ansible will run
+        ssh_key (string): Path to private RSA key for authentication in Ansible instance
+
+    Return:
+        None
+
+    """
+    click.echo("-------------------")
+    click.secho(f"Connecting to Ansible instance with IP: {configuration['ansible_host_ip']}", fg="yellow")
+    client = None
+    if provider == 'aws':
+        client = SSHConnector(ip_address=ansible_host_ip, username='ubuntu', priv_key=ssh_key)
+
+    click.echo("-------------------")
+    click.secho("Copy private SSH key to Ansible instance", fg="yellow")
+    client.copy_file(file_path=ssh_key, destination_path="/home/ubuntu/cwdf_deployment/ssh/id_rsa")
+
+    client.exec_command("sudo chmod 600 /home/ubuntu/cwdf_deployment/ssh/id_rsa")
+
+    click.echo("-------------------")
+    click.secho("Copy cloned git repo as tar.gz file to Ansible instance", fg="yellow")
+    client.copy_file(file_path=TAR_PATH, destination_path=f"/home/ubuntu/{TAR_NAME}")
+    os.remove(TAR_PATH)
+
+    client.exec_command(f"tar -xvf {TAR_NAME}", print_output=False)
+    client.exec_command(f"rm /home/ubuntu/{TAR_NAME}")
+
+    click.secho("\nEnabling root login", fg="yellow")
+    _remove_ssh_banner(client, configuration['worker_ips'], 'ec2-user')
+    _remove_ssh_banner(client, configuration['controller_ips'], 'ubuntu')
+
+    click.secho("\nInstalling lspci on Ansible workers", fg="yellow")
+    _install_dependencies_on_nodes(client, configuration['worker_ips'])
+    _install_dependencies_on_nodes(client, configuration['controller_ips'])
+
+    click.secho("\nDiscovering Ansible nodes", fg="yellow")
+    _discovery_nodes(client, 'root', configuration['worker_ips'], "ra_worker")
+    _discovery_nodes(client, 'root', configuration['controller_ips'], "ra_host")
+
+    click.echo("-------------------")
+    click.secho("Creating inventory file", fg="yellow")
+    _create_inventory_file(client, nodes_list)
+
+    click.secho("\nInitializing RA repository", fg="yellow")
+    commands = f"""cd {RA_REMOTE_PATH} && \
+        git submodule update --init && \
+        sudo python3 -m pip install -r requirements.txt && \
+        export PROFILE={configuration['ra_profile']} && \
+        make k8s-profile PROFILE={configuration['ra_profile']} ARCH={configuration['ra_machine_architecture']}
+        """
+
+    client.exec_command(commands, print_output=True)
+
+    click.secho("\nCreating host_var files", fg="yellow")
+    _create_host_var_files(client, nodes_list)
+
+    # gather facts from all nodes into a file inside the repository
+    client.exec_command(f"cd {RA_REMOTE_PATH} && ansible -i inventory.ini -m setup all > all_system_facts.txt")
+
+    click.echo("-------------------")
+    click.secho("Running RA Ansible playbooks", fg="yellow")
+    click.secho("Selected profile:", fg="yellow")
+    click.secho(configuration['ra_profile'], fg="green")
+
+    ansible_playbook_commands = f"""
+        ansible-playbook -i {RA_REMOTE_PATH}/inventory.ini {RA_REMOTE_PATH}/playbooks/intel/{configuration['ra_profile']}.yml && \
+        ansible-playbook -i {RA_REMOTE_PATH}/inventory.ini {RA_REMOTE_PATH}/playbooks/k8s/post_deployment_hooks.yml
+        """
+    client.exec_command(command=ansible_playbook_commands, print_output=True)
+    client.close_connection()
+
+    if (configuration['replicate_from_container_registry'] is not None and
+            configuration['replicate_to_container_registry'] is not None and
+            configuration['exec_containers'] is not None):
+        click.echo("-------------------")
+        click.secho("Copy Docker images to cloud registry")
+        if provider == 'aws':
+            ssh_client = SSHConnector(ip_address=ansible_host_ip,
                                      username='ubuntu', priv_key=ssh_key)
+        click.echo(configuration['exec_containers'])
+        click.echo(f"From registry: {configuration['replicate_from_container_registry']}")
+        docker_mgmt = DockerManagement(from_registry=configuration['replicate_from_container_registry'],
+                                       to_registry=configuration['replicate_to_container_registry'],
+                                       images_to_replicate=configuration['exec_containers'],
+                                       region=configuration['cloud_settings']['region'],
+                                       cloud=provider,
+                                       show_log=True)
+        docker_mgmt.copy_images()
+
+        _docker_login(node_ips=configuration['worker_ips'],
+                      ssh_client=ssh_client,
+                      user='root',
+                      registry=configuration['replicate_to_container_registry'],
+                      registry_username=docker_mgmt.ECR_USERNAME,
+                      password=docker_mgmt.ECR_PASSWORD)
+
+        for index, image in enumerate(configuration['exec_containers']):
+            image_name = docker_mgmt.tagged_images[index]['repository']
+            pod_name = docker_mgmt.tagged_images[index]['tag']
+            click.echo(f"Starting pod: {pod_name}")
+            ssh_client.exec_command(f"kubectl run {pod_name} --image={image_name} -n default", print_output=True)
+        ssh_client.close_connection()
+
+
+if __name__ == '__main__':
+    main()
diff --git a/docs/add_remove_nodes.md b/docs/add_remove_nodes.md
new file mode 100644
index 00000000..9dee0676
--- /dev/null
+++ b/docs/add_remove_nodes.md
@@ -0,0 +1,20 @@
+# Adding and removing worker nodes
+
+Note: adding new nodes currently disturbs the existing nodes in the cluster,
+as the ansible-playbook limit option is not supported yet.
+
+## Add new node(s)
+
+Modify the inventory with the newly added node(s), then run:
+
+`ansible-playbook -i inventory.ini playbooks/${PROFILE}.yml -e scale=true`
+
+The `scale` variable accepts any of the following values, case-insensitive: `{ yes, on, 1, true }`.
+
+## Remove node(s)
+
+With the node(s) being removed still in the inventory, run:
+
+`ansible-playbook -i inventory.ini playbooks/remove_node.yml -e node=NODE_NAME`
+
+You may pass `-e node=NODE_NAME` or `-e node=NODE_NAME1,NODE_NAME2,...,NODE_NAMEn` or `-e node=NODES_GROUP_NAME` to the playbook to limit the execution to the node(s) being removed.
\ No newline at end of file
diff --git a/docs/get_sw_versions.md b/docs/get_sw_versions.md
new file mode 100644
index 00000000..4b5984f8
--- /dev/null
+++ b/docs/get_sw_versions.md
@@ -0,0 +1,24 @@
+# Tool to extract SW versions from CEK code
+
+## Output of this tool to be used as input for BOM generation
+
+In order to extract SW versions, the system should be configured for deployment:
+group_vars and host_vars need to be generated for the expected profile.
+
+Follow README.md up to point 10. The deployment itself is not needed, so target machines do not need to be available.
+
+To start the tool, execute `ansible-playbook`:
+
+  ```bash
+  ansible-playbook -i inventory.ini playbooks/versions.yml
+  or
+  ansible-playbook playbooks/versions.yml
+  ```
+
+Software component versions are written to the CSV file versions_output.csv.
+Possible parsing errors are written to the file versions_parsing_errors.
+Both files are located in the project dir:
+  ```bash
+  versions_output_file: "{{ playbook_dir }}/../versions_output.csv"
+  versions_parsing_errors_file: "{{ playbook_dir }}/../versions_parsing_errors"
+  ```
diff --git a/docs/power_operator.md b/docs/power_manager.md
similarity index 100%
rename from docs/power_operator.md
rename to docs/power_manager.md
diff --git a/docs/vm_cluster_expansion_guide.md b/docs/vm_cluster_expansion_guide.md
new file mode 100644
index 00000000..19df33b4
--- /dev/null
+++ b/docs/vm_cluster_expansion_guide.md
@@ -0,0 +1,112 @@
+# VM cluster expansion guide
+
+
+VM cluster expansion means that we can add additional vm-work node(s) to an existing VM cluster. They can be added on any existing vm_host machine, or new vm_host machine(s) can be added as well. By default the VM cluster expansion feature won't re-create existing VMs. They remain untouched during the VM creation phase; nevertheless, the standard ansible deployment tasks will run on them as well. The corresponding ansible playbooks should be idempotent to ensure that running the tasks again won't corrupt the target system. Group vars for the VM case contain the new variable `vm_recreate_existing` with its value set to `false`.
+In order to expand the VM cluster, the original VM cluster has to be up and running.
+
+```
+vm_recreate_existing: false
+```
+
+For the VM cluster expansion deployment we should use the same ansible host that was used for the original VM cluster deployment.
+The deployment configuration can't be changed except for adding definitions for new VMs. The configuration of existing VMs has to remain the same.
+
+**_NOTE:_** If you want to add a new vm-work node to an existing vm_host, then the vm_host needs to have enough free resources available for it.
+
+**_NOTE:_** VM cluster expansion deployment is not intended for any kind of configuration update on existing VMs.
+
+**_NOTE:_** VM cluster reduction - removing vm-work node(s) from the VM cluster - is not supported at all.
+
+
+## Adding new vm-work to existing vm_host
+
+To add a new vm-work node to an existing vm_host you need to update the vms definition in the host_vars file of the corresponding vm_host.
+In the example below we've added a new vm-work node definition with the name `vm-work-4`:
+
+```
+vms:
+  ...
+  ...
+  ...
+  - type: "work"
+    name: "vm-work-4"
+    cpu_total: 16
+    memory: 20480
+    vxlan: 128
+    pci:
+      - "18:02.0"
+      - "18:02.1"
+      - "18:02.6"
+      - "18:02.7"
+      - "b1:01.3"
+      - "b3:01.3"
+```
+
+A new host_vars file needs to be created for the added vm-work node - in our case host_vars/vm-work-4.yml.
+The existing host_vars/vm-work-1.yml can be used as a template.
+
+
+## Adding new vm_host machine with new vm-work
+
+To add a new vm_host machine you need to follow point 6 and point 8 in [README](README.md).
+To be more precise, we need to do three steps:
+1) add the new vm_host to the inventory
+   - Add the vm_host info to section [all]
+   - Add the vm_host name to the end of section [vm_host]
+
+2) create a host_vars file for that vm_host from the template host_vars/host-for-vms-2.yml
+   - Update host specific params like `dataplane_interfaces` and `qat_devices`
+   - Fill the vms section with the new vm-work node(s)
+
+3) create a new host_vars file for the added vm-work node - in our case host_vars/vm-work-5.yml.
+   The existing host_vars/vm-work-1.yml can be used as a template.
+
+
+```
+vms:
+  - type: "work"
+    name: "vm-work-5"
+    cpu_total: 16
+    memory: 20480
+    vxlan: 128
+    pci:
+      - "18:02.0"
+      - "18:02.1"
+      - "18:02.6"
+      - "18:02.7"
+      - "b1:01.3"
+      - "b3:01.3"
+```
+
+
+
+## Run VM cluster expansion deployment
+
+Once we have prepared the configuration for the new VMs, we can run the deployment via the following command:
+
+```
+ansible-playbook -i inventory.ini playbooks/vm.yml -e scale=true
+or
+ansible-playbook -i inventory.ini playbooks/vm.yml -e scale=true --flush-cache -vv 2>&1 | tee vm_cluster_expansion.log
+```
+
+
+
+## Other options
+
+If you want to update the VM cluster including the current VMs, then the `vm_recreate_existing` parameter can be switched to `true`.
+In that case all existing VMs will be destroyed and re-created.
+
+**_NOTE:_** All data, configurations and logs stored on the VMs will be lost. Re-created VMs will get new IPs.
+
+```
+vm_recreate_existing: true
+```
+
+To run this option use the following command:
+
+```
+ansible-playbook -i inventory.ini playbooks/vm.yml
+or
+ansible-playbook -i inventory.ini playbooks/vm.yml --flush-cache -vv 2>&1 |tee vm_cluster_recreate.log
+```
diff --git a/docs/vm_config_guide.md b/docs/vm_config_guide.md
index d09af445..178a72c7 100644
--- a/docs/vm_config_guide.md
+++ b/docs/vm_config_guide.md
@@ -54,14 +54,14 @@ The first option defines VM image distribution of cloud image, which will be use
   Currently supported distributions are: "ubuntu" and "rocky". Default is "ubuntu"
   Following two options define VM image version for Ubuntu and for Rocky.
   Currently supported ubuntu versions are: "20.04" and "22.04". Default is "20.04"
-  Currently supported rocky version is: "8.5". Default is "8.5"
+  Currently supported rocky versions are: "8.5" and "9.0". Default is "8.5"
   Default VM image distribution is "ubuntu" and default version is "20.04"
   Setting for VM image can be done just on the first VM host. It is common for all VMs across all VM hosts.

 ```
 vm_image_distribution: "ubuntu"
 vm_image_version_ubuntu: "22.04"
-vm_image_version_rocky: "8.5"
+vm_image_version_rocky: "9.0"
 ```

 The next options defines VM networking
@@ -404,6 +404,8 @@ qat_devices:

 ```

+**For SGX:** support is currently experimental - libvirt is compiled from a custom repository. Because of that, it is not supported on all operating systems, only Ubuntu 22.04 on the host and Ubuntu 20.04 in the VMs.
+
 ### Once the deployment is finished we can access VMs from ansible_host via VM name:
 ```
 ssh vm-ctrl-1
diff --git a/docs/vm_multinode_setup_guide.md b/docs/vm_multinode_setup_guide.md
new file mode 100644
index 00000000..6007f097
--- /dev/null
+++ b/docs/vm_multinode_setup_guide.md
@@ -0,0 +1,110 @@
+# VM multinode setup guide
+
+VM multinode setup means that we can configure more vm_host machines and spread VMs across all of them.
+All requested vm_hosts have to be added to the `all` section of inventory.ini with all relevant info, and
+to the `vm_host` section by hostname only.
+To configure a multinode setup for the VM case we need to use two host_vars file templates.
+
+
+## The first vm_host
+
+The first template host_vars/host-for-vms-1.yml is used for the first vm_host inside the vm_host group.
+For the first vm_host we need to configure the common parameters for the whole multinode deployment.
+
+### VM image
+Only a single common VM image for all VMs inside the deployment is supported at the moment.
+The default VM image version is Ubuntu 20.04 - focal. That version is used when the following params are not configured inside the host_vars file.
+
+Supported VM image distributions are ['ubuntu', 'rocky'].
+The VM image distribution can be configured via the following parameter:
+
+```
+vm_image_distribution: "rocky"
+```
+
+Supported Ubuntu VM image versions are ['20.04', '22.04']. Default is '20.04'.
+The Ubuntu VM image version can be changed via the following parameter:
+
+```
+vm_image_version_ubuntu: "22.04"
+```
+
+Supported Rocky VM image versions are ['8.5', '9.0']. Default is '8.5'.
+The Rocky VM image version can be changed via the following parameter:
+
+```
+vm_image_version_rocky: "9.0"
+```
+
+
+### DHCP configuration
+DHCP for vxlan has to be enabled just on the first vm_host. A "VXLAN tag" inside the dhcp list means that DHCP will be configured for that VXLAN.
+The same VXLAN tag has to be used inside the `vms` definitions on all vm_hosts, for all VMs. The param to be set there is `vxlan: 128`.
+
+```
+dhcp:
+  - 128
+```
+
+```
+vms:
+  - type: ...
+    ...
+    vxlan: 128
+```
+
+DHCP will use the following IP range to assign IPs to all VMs. A unique IP range should be used for each additional deployment on the same physical network.
+
+```
+vxlan_gw_ip: "40.8.0.1/24"
+```
+
+
+## Other vm_hosts except the first one
+The second template host_vars/host-for-vms-2.yml is used for all other vm_hosts inside the vm_host group.
+
+### DHCP configuration
+On a secondary vm_host do not change the dhcp settings here -
+the dhcp list has to remain empty:
+
+```
+dhcp: []
+```
+
+The same VXLAN tag that was configured for the first vm_host has to be used inside the `vms` definitions on all vm_hosts, for all VMs.
+The param to be set there is `vxlan: 128`.
+
+```
+vms:
+  - type: ...
+    ...
+    vxlan: 128
+```
+
+
+## Common configuration for all vm_hosts
+
+### VXLAN device
+The vxlan_device parameter has to contain the physical network interface which is connected to the network.
+All vm_hosts have to be connected to the same network, and the corresponding network interfaces have to have an IP address from the same subnet.
+
+e.g.:
+```
+vxlan_device: ens786f0
+```
+
+### VM password
+Set the hashed password for the root user inside the VMs. The current value is just a placeholder.
+To create a hashed password use e.g.: openssl passwd -6 -salt SaltSalt
+The placeholder has to be replaced with the real hashed password value.
+
+```
+vm_hashed_passwd: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
+```
+
+### Reserved number of CPUs for host OS
+cpu_host_os changes the number of CPUs reserved for the host OS. The default value is 16.
+It is intended for experts only who do performance benchmarking. Leave it commented out.
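+The commented-out example below would reserve 8 CPUs for the host OS instead of the default 16: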
+ +``` +#cpu_host_os: 8 +``` diff --git a/generate/playbook_templates/infra_playbook.j2 b/generate/playbook_templates/infra_playbook.j2 index 476fd20a..f0e3cf36 100644 --- a/generate/playbook_templates/infra_playbook.j2 +++ b/generate/playbook_templates/infra_playbook.j2 @@ -59,7 +59,19 @@ when: ufs_enabled | default(false) | bool - role: bootstrap/set_sriov_kernel_flags tags: setup-sriov - when: iommu_enabled | default(true) | bool or on_vms | default(false) | bool + when: + - iommu_enabled | default(true) | bool or on_vms | default(false) | bool + - not ((configure_dlb_devices is defined and configure_dlb_devices) or + (configure_dsa_devices is defined and configure_dsa_devices)) + - role: bootstrap/set_siov_kernel_flags + tags: setup-siov + when: + - iommu_enabled | default(true) | bool + - ((configure_dsa_devices is defined and configure_dsa_devices) or + (configure_dlb_devices is defined and configure_dlb_devices)) and + ((ansible_distribution == "Ubuntu" and ansible_distribution_version == '20.04' and update_kernel) or + (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') or + (ansible_os_family == "RedHat" and ansible_distribution_version >= '9.0')) - role: bootstrap/set_rdt_kernel_flags when: telegraf_enabled | default(true) | bool - role: bootstrap/set_intel_flexran_kernel_flags @@ -110,6 +122,16 @@ when: - update_qat_drivers | default(false) | bool - qat_devices | default([]) | length > 0 + - role: bootstrap/configure_dlb + tags: dlb-dp + when: + - configure_dlb_devices is defined and configure_dlb_devices + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == '20.04' and update_kernel) or + (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') or + (ansible_os_family == "RedHat" and ansible_distribution_version >= '9.0') + - role: bootstrap/configure_dsa + tags: dsa-dp + when: configure_dsa_devices | default(false) environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true {%- endif %} @@ -126,6 +148,9 @@ roles: - role: cluster_defaults - role: bootstrap/determine_dataplane_interfaces + tags: + - determine-dataplane-interfaces + - setup-sriov-nic when: - dataplane_interfaces | default([]) | length > 0 - role: bootstrap/update_nic_drivers @@ -181,6 +206,7 @@ or on_vms | default(false) | bool - update_qat_drivers | default(false) | bool - role: bootstrap/configure_openssl + tags: configure-openssl when: - qat_devices | default([]) | length > 0 - iommu_enabled | default(true) | bool @@ -188,7 +214,7 @@ - update_qat_drivers | default(false) | bool - openssl_install | default(false) | bool {%- endif %} -{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'build_your_own'] %} +{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'regional_dc', 'build_your_own'] %} - role: bootstrap/configure_sgx tags: sgx when: @@ -196,3 +222,54 @@ {%- endif %} environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true + +- hosts: kube_control_plane[0] + vars: + proxy_env: + {%- raw %} + http_proxy: "{{ http_proxy | d('') }}" + https_proxy: "{{ https_proxy | d ('') }}" + no_proxy: "{{ no_proxy | d('') }}" + {%- endraw %} + tasks: + {%- raw %} + - name: register mgmt driver + shell: "set -o pipefail && ethtool -i {{ hostvars[inventory_hostname]['ansible_default_ipv4']['interface'] }} | grep driver | sed 's/^driver: //'" + args: + executable: /bin/bash + register: mgmt_interface_driver + changed_when: false + when: adq_dp.enabled |d(false) | bool + {%- endraw %} + - include_role: 
+      name: cluster_defaults
+      when: adq_dp.enabled |d(false) | bool
+    - name: install dependencies
+      package:
+        name:
+          - build-essential
+          - linux-headers-{% raw %}{{ ansible_kernel }}{% endraw %}
+          - libelf-dev
+          - ethtool
+        state: present
+      when: adq_dp.enabled |d(false) | bool
+    - meta: flush_handlers
+      when: adq_dp.enabled |d(false) | bool
+    - name: regather facts in case mgmt interface changed since start of play
+      setup:
+        gather_subset:
+          - network
+      when: adq_dp.enabled |d(false) | bool
+    - name: ADQ - update ICE driver on control plane
+      include_role:
+        name: bootstrap/update_nic_drivers
+        tasks_from: ice.yml
+      when: adq_dp.enabled |d(false) | bool
+    - name: ADQ - update ICE firmware on control plane
+      include_role:
+        name: bootstrap/update_nic_firmware
+        tasks_from: adq_update.yml
+      when: adq_dp.enabled |d(false) | bool
+  roles: []
+  environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}"
+  any_errors_fatal: true
diff --git a/generate/playbook_templates/intel_playbook.j2 b/generate/playbook_templates/intel_playbook.j2
index a81e2140..bdb1c0b9 100644
--- a/generate/playbook_templates/intel_playbook.j2
+++ b/generate/playbook_templates/intel_playbook.j2
@@ -8,6 +8,9 @@
       tags: remove-kubespray-host-dns-settings
       when:
         - remove_kubespray_host_dns_settings | default(false) | bool
+    - role: adq_dp_install
+      tags: adq_dp
+      when: adq_dp.enabled |d(false) | bool
     - role: nfd_install
       tags: nfd
       when: nfd_enabled | default(true) | bool
@@ -36,14 +39,46 @@
       tags: dp-operator
      when: sgx_dp_enabled | default(false) or
            gpu_dp_enabled | default(false) or
-           qat_dp_enabled | default(false)
+           qat_dp_enabled | default(false) or
+           dsa_dp_enabled | default(false) or
+           dlb_dp_enabled | default(false)
{%- endif %}
-{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'build_your_own'] %}
+{%- if playbook_name in ['full_nfv', 'regional_dc', 'build_your_own'] %}
+    - role: gpu_dp_install
+      tags: gpu-dp
+      when: gpu_dp_enabled | default(false) | bool
+{%- endif %}
+{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'remote_fp', 'storage', 'build_your_own'] %}
+    - role: qat_dp_install
+      tags: qat-dp
+      when: qat_dp_enabled | default(false) | bool
+{%- endif %}
+{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'regional_dc', 'build_your_own'] %}
     - role: sgx_dp_install
       tags: sgx-dp
      when:
        - sgx_dp_enabled | default(false)
        - ansible_os_family == "Debian" or (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.3')
+    - role: dlb_dp_install
+      tags: dlb-dp
+      when:
+        - dlb_dp_enabled is defined and dlb_dp_enabled | default(false) | bool
+        - (ansible_distribution == "Ubuntu" and ansible_distribution_version == '20.04' and update_kernel) or
+          (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') or
+          (ansible_os_family == "RedHat" and ansible_distribution_version >= '9.0')
+    - role: dsa_dp_install
+      tags: dsa-dp
+      when: dsa_dp_enabled is defined and dsa_dp_enabled | default(false) | bool
     - role: kmra_install
       tags: kmra
      when:
@@ -60,23 +95,17 @@
       tags: tca
      when:
        - tca.enabled | default(false) | bool
+{%- endif %}
+{%- if playbook_name in ['full_nfv', 'on_prem', 'remote_fp', 'regional_dc',
'build_your_own'] %} - role: intel_power_manager tags: power-manager when: intel_power_manager is defined and intel_power_manager.enabled | default(false) | bool {%- endif %} {%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'remote_fp', 'storage', 'build_your_own'] %} - - role: qat_dp_install - tags: qat-dp - when: qat_dp_enabled | default(false) | bool - role: openssl_engine_install tags: openssl-engine when: openssl_engine_enabled | default(false) | bool {%- endif %} -{%- if playbook_name in ['full_nfv', 'regional_dc', 'build_your_own'] %} - - role: gpu_dp_install - tags: gpu-dp - when: gpu_dp_enabled | default(false) | bool -{%- endif %} {%- if playbook_name in ['full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: platform_aware_scheduling_install tags: platform-aware-scheduling @@ -99,23 +128,21 @@ tags: monitoring vars: telegraf_profile: {{ playbook_name }} + - role: opentelemetry_install + when: + - opentelemetry_enabled | default(false) | bool + tags: opentelemetry {%- if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} - role: intel_sriov_fec_operator tags: intel-sriov-fec-operator when: - intel_sriov_fec_operator_enabled | default(false) | bool {%- endif %} -{%- if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} - - role: intel_flexran - tags: intel-flexran - when: - - intel_flexran_enabled | default(false) | bool -{%- endif %} {%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} - - role: service_mesh_install - tags: service-mesh + - role: istio_service_mesh + tags: istio-service-mesh when: - - service_mesh.enabled | default(true) | bool + - istio_service_mesh.enabled | default(true) | bool {%- endif %} {%- if playbook_name in ['basic', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} - role: cndp_install @@ -137,6 +164,16 @@ when: - minio_enabled | default(false) | bool {%- endif %} +{%- if playbook_name in ['full_nfv', 'build_your_own'] %} + - role: tadk_install + tags: tadk + when: + - tadk_install | default(false) | bool +{%- endif %} + - role: cadvisor_install + tags: cadvisor + when: + - cadvisor_enabled | default(false) | bool environment: - "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" - REGISTRY_AUTH_FILE: "{{ '{{' }} registry_containerd {{ '}}' }}" @@ -170,5 +207,26 @@ roles: - role: net_attach_defs_create tags: net-attach-defs + - role: jaeger_install + tags: jaeger + when: + - jaeger_operator | default(false) | bool +{%- if playbook_name in ['access', 'full_nfv', 'on_prem', 'regional_dc', 'remote_fp', 'build_your_own'] %} + - role: linkerd_service_mesh + tags: linkerd-service-mesh + when: + - linkerd_service_mesh.enabled | default(false) | bool +{%- endif %} environment: "{{ '{{' }} proxy_env | d({}) {{ '}}' }}" any_errors_fatal: true + +- hosts: oru, kube_node[0] + tasks: [] + roles: +{%- if playbook_name in ['access', 'full_nfv', 'build_your_own'] %} + - role: intel_flexran + tags: intel-flexran + when: + - intel_flexran_enabled | default(false) | bool +{%- endif %} + any_errors_fatal: true diff --git a/generate/playbook_templates/main_playbook.j2 b/generate/playbook_templates/main_playbook.j2 index b7587ae6..6501dd25 100644 --- a/generate/playbook_templates/main_playbook.j2 +++ b/generate/playbook_templates/main_playbook.j2 @@ -13,3 +13,6 @@ - name: install Intel Container Experience Kit features import_playbook: intel/{{ playbook_name }}.yml when: kubernetes | default(true) +- name: run post deployment hooks + import_playbook: 
    k8s/post_deployment_hooks.yml
+  when: post_deployment_hook_enabled | default(false)
diff --git a/generate/profiles_templates/common/group_vars.j2 b/generate/profiles_templates/common/group_vars.j2
index e1fac6ca..76d09c4f 100644
--- a/generate/profiles_templates/common/group_vars.j2
+++ b/generate/profiles_templates/common/group_vars.j2
@@ -6,6 +6,13 @@
 profile_name: {{ name }}
 configured_arch: {{ arch }}

+# Extends the list of CPUs that can be used for installation.
+# You can get the model of your CPU using the `lscpu` command.
+# The CPU models in the unconfirmed_cpu_models list can be used for the CEK installation;
+# nevertheless, they haven't been tested, so the installation process may fail
+# or some features may not work properly.
+unconfirmed_cpu_models: [] # update the list if required, e.g. unconfirmed_cpu_models: ['$0000%@'] or unconfirmed_cpu_models: ['$0000%@', '8490H']
+
 # CEK project directory on all nodes
 project_root_dir: /opt/cek/

@@ -14,20 +21,38 @@ vm_enabled: {% if vm_mode == 'on' %}true{% else %}false{% endif %}
 # vm_mode can't be enabled manually here
 # To enable it, vm specific configuration from examples/vm need to be taken
{%- endif %}
+{%- if vm_mode == 'on' %}
+# When vm_recreate_existing is false, existing VMs are not touched during cluster update/scaling
+# When vm_recreate_existing is true, existing VMs are destroyed and created again during cluster update/scaling
+vm_recreate_existing: false
+{%- endif %}
+
+# POST DEPLOYMENT HOOKS: .py, .sh and .yaml files found inside the hooks_local dir will run on the ansible host
+# .py and .sh scripts found inside the hooks_remote dir will run on kube_control_plane
+post_deployment_hook_enabled: false
+hooks_local: /root/hooks/local
+hooks_remote: /root/hooks/remote

 # Kubernetes version
 kubernetes: true
-kube_version: v1.23.4
+kube_version: v1.24.3
+#kube_version: v1.23.4
 #kube_version: v1.22.3
-#kube_version: v1.21.5

 # To deploy only container runtime set this variable as "true", and kubernetes as "false"
 # Set both variables as "false" to perform only host configuration
 container_runtime_only_deployment: false

+# Kubernetes Audit policy custom rules
+# https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes/control-plane/templates/apiserver-audit-policy.yaml.j2
+audit_policy_custom_rules: ""

 # Kubernetes container runtime: docker, containerd, crio
 # When "crio" is set, please enable "crio_registries" section
 container_runtime: docker

+# cAdvisor
+cadvisor_enabled: false
+cadvisor_custom_events_config_on: false
+
 # Preflight will check vars configuration
 # It is NOT recommended to disable preflight, unless it is a conscious decision
 preflight_enabled: true
@@ -47,7 +72,6 @@ selinux_state: current
{% if nfd in ['on', 'optional'] %}
 # Node Feature Discovery
 nfd_enabled: {% if nfd == 'on' %}true{% else %}false{% endif %}
-nfd_build_image_locally: false
 nfd_namespace: kube-system
 nfd_sleep_interval: 60s
{% endif %}
@@ -148,13 +172,30 @@ intel_power_manager:
   min_shared_frequency: 1000 # min frequency that will be applied for cores by Shared Workload
{% endif %}

-{%- if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr'] or
+{%- if sgx_dp in ['on', 'optional'] and arch in ['icx'] or
     gpu_dp in ['on', 'optional'] or
-    qat_dp in ['on', 'optional'] %}
+    qat_dp in ['on', 'optional'] or
+    dsa_dp in ['on', 'optional'] and arch in ['spr'] or
+    dlb_dp in ['on', 'optional'] and arch in ['spr'] %}
 # Intel Device Plugin Operator
 intel_dp_namespace: kube-system # namespace will be applied for SGX DP, GPU
DP and QAT DP {% endif %} +{%- if dlb_dp in ['on', 'optional'] and arch in ['spr'] %} +# Intel Dynamic Load Balancing Device Plugin (Intel DLB DP) for Kubernetes +dlb_dp_enabled: {% if dlb_dp == 'on' %}true{% else %}false{% endif %} # if true set configure_dlb_devices to true in host vars +dlb_dp_build_image_locally: false +dlb_dp_verbosity: 4 +{% endif %} + +{%- if dsa_dp in ['on', 'optional'] and arch in ['spr'] %} +# Intel Data Streaming Accelerator Device Plugin (Intel DSA DP) for Kubernetes +dsa_dp_enabled: {% if dsa_dp == 'on' %}true{% else %}false{% endif %} # if true set configure_dsa_devices to true in host vars +dsa_dp_build_image_locally: false +dsa_dp_verbosity: 4 +dsa_shared_devices: 10 # number of containers that can share the same DSA device. +{% endif %} + {%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} # Intel Ethernet Operator for Intel E810 Series network interface cards intel_ethernet_operator_enabled: {% if intel_ethernet_operator.enabled == 'on' and nic == 'cvl' %}true{% else %}false{% endif %} @@ -175,8 +216,11 @@ qat_dp_verbosity: 4 # e.g node1 - 48VFs, node2 - 32VFs, qat_dp_max_devices: 48 # It is possible to use a subset of QAT devices in QAT DP. E.g by putting 10 here, QAT DP will use just 10VFs on each node qat_dp_max_num_devices: 32 -qat_dp_build_image_locally: {% if vm_mode == 'on' %}true{% else %}false{% endif %} - +qat_dp_build_image_locally: false +# Allocation policy - 2 possible values: balanced and packed. +# Balanced mode spreads allocated QAT VF resources balanced among QAT PF devices, and packed mode packs one QAT PF device +# full of QAT VF resources before allocating resources from the next QAT PF.(There is no default.) +# allocation_policy: balanced qat_supported_pf_dev_ids: - "435" - "37c8" @@ -211,7 +255,7 @@ gpu_dp_verbosity: 4 gpu_dp_build_image_locally: false # Configuration-options -# To fully discover the below settings usage, please refer to: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/v0.23.0/cmd/gpu_plugin +# To fully discover the below settings usage, please refer to: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/v0.24.0/cmd/gpu_plugin gpu_dp_shared_devices: 10 # number of containers (min. 1) that can share the same GPU device gpu_dp_monitor_resources: false # enable monitoring all GPU resources on the node gpu_dp_fractional_manager: false # enable handling of fractional resources for multi-GPU nodes @@ -224,7 +268,7 @@ gpu_dp_prefered_allocation: 'none' # available policies are: ['balanced', 'pack gpu_dp_max_memory: "8 GB" # max memory per card - for Intel SG1 single card has 8 GB of memory {% endif %} -{%- if sgx_dp in ['on', 'optional'] and arch in ['icx', 'spr'] %} +{%- if sgx_dp in ['on', 'optional'] and arch in ['icx'] %} # Intel SGX Device Plugin for Kubernetes sgx_dp_enabled: {% if sgx_dp == 'on' %}true{% else %}false{% endif %} sgx_dp_verbosity: 4 @@ -236,13 +280,18 @@ sgx_aesmd_demo_enable: false sgx_dp_provision_limit: 20 # EnclaveLimit is a number of containers that can share the same SGX enclave device. 
sgx_dp_enclave_limit: 20 +{%- if vm_mode == 'on' %} +# Memory size for SGX enclave in MB +sgx_memory_size: 16 +{%- endif %} {% endif %} {%- if (kmra and (kmra.pccs in ['on', 'optional'] or kmra.apphsm in ['on', 'optional'] or kmra.ctk_demo in ['on', 'optional'])) and - arch in ['icx', 'spr'] %} + arch in ['icx'] %} # KMRA (Key Management Reference Application) +# Please, refer to the roles/kmra_install/defaults/main.yml for the full list of configuration options available. kmra: {%- if kmra.pccs in ['on', 'optional'] %} pccs: @@ -262,50 +311,58 @@ kmra: {%- endif %} {% endif %} -{%- if service_mesh and service_mesh.enabled in ['on', 'optional'] %} +{%- if istio_service_mesh and istio_service_mesh.enabled in ['on', 'optional'] %} # Service mesh deployment # https://istio.io/latest/docs/setup/install/istioctl/ # Intel Istio # https://github.com/intel/istio -# for all available options, please, refer to the 'roles/service_mesh_install/vars/main.yml; +# for all available options, please, refer to the 'roles/istio_service_mesh/vars/main.yml; # for the options dependencies and compatibility, please, refer to the official CEK documentation; -service_mesh: - enabled: {% if service_mesh.enabled == 'on' %}true{% else %}false{% endif %} # enable Service Mesh +istio_service_mesh: + enabled: {% if istio_service_mesh.enabled == 'on' %}true{% else %}false{% endif %} # enable Service Mesh # available profiles are: 'default', 'demo', 'minimal', 'external', 'empty', 'preview', # 'sgx-mtls', 'intel-qat-hw', 'intel-qat-sw', 'intel-cryptomb' # if custom profile needs to be deployed, please, place the file named '.yaml' - # into the directory 'roles/service_mesh_install/files/profiles/' + # into the directory 'roles/istio_service_mesh/files/profiles/' # 'custom-ca' profile name is reserved for usage by sgx_signer if sgx_signer option is enabled # any profile name provided will be overwritten in this case - profile: {% if service_mesh.sgx_signer == 'on' and arch in ['icx', 'spr'] %}custom-ca{% else %}default{% endif %} + profile: {% if istio_service_mesh.sgx_signer == 'on' and arch in ['icx'] %}custom-ca{% else %}default{% endif %} intel_preview: - enabled: {% if service_mesh.intel_preview == 'on' %}true{% else %}false{% endif %} # enable intel istio preview - {%- if service_mesh.tcpip_bypass_ebpf in ['on', 'optional'] %} + enabled: {% if istio_service_mesh.intel_preview == 'on' %}true{% else %}false{% endif %} # enable intel istio preview + {%- if istio_service_mesh.tcpip_bypass_ebpf in ['on', 'optional'] %} tcpip_bypass_ebpf: - enabled: {% if service_mesh.tcpip_bypass_ebpf == 'on' %}true{% else %}false{% endif %} # enable tcp/ip ebpf bypass demo + enabled: {% if istio_service_mesh.tcpip_bypass_ebpf == 'on' %}true{% else %}false{% endif %} # enable tcp/ip ebpf bypass demo {%- endif %} - {%- if service_mesh.tls_splicing in ['on', 'optional'] %} + {%- if istio_service_mesh.tls_splicing in ['on', 'optional'] %} tls_splicing: - enabled: {% if service_mesh.tls_splicing == 'on' %}true{% else %}false{% endif %} # enable TLS splicing demo + enabled: {% if istio_service_mesh.tls_splicing == 'on' %}true{% else %}false{% endif %} # enable TLS splicing demo {%- endif %} - {%- if service_mesh.sgx_signer in ['on', 'optional'] and arch in ['icx', 'spr'] %} + {%- if istio_service_mesh.sgx_signer in ['on', 'optional'] and arch in ['icx'] %} sgx_signer: - enabled: {% if service_mesh.sgx_signer == 'on' %}true{% else %}false{% endif %} # enable automated key management integration + enabled: {% if istio_service_mesh.sgx_signer == 
'on' %}true{% else %}false{% endif %} # enable automated key management integration name: sgx-signer {%- endif %} - {%- if service_mesh.intel_preview in ['on', 'optional'] %} + {%- if istio_service_mesh.intel_preview in ['on', 'optional'] and arch not in ['spr']%} # uncomment following section and enable intel_preview if sgx-mtls profile is selected - {% if service_mesh.intel_preview == 'optional' %}#{% endif %}set: - {% if service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.enabled=true - {% if service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.certExtensionValidationEnabled=true - {% if service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.enabled=true - {% if service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.certExtensionValidationEnabled=true + {% if istio_service_mesh.intel_preview == 'optional' %}#{% endif %}set: + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.enabled=true + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.global.proxy.sgx.certExtensionValidationEnabled=true + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.enabled=true + {% if istio_service_mesh.intel_preview == 'optional' %}# {% endif %}- values.gateways.sgx.certExtensionValidationEnabled=true {%- endif %} {% endif %} +{%- if linkerd_service_mesh and linkerd_service_mesh.enabled in ['on', 'optional'] %} +# LinkerD service mesh +# https://linkerd.io/ +# +linkerd_service_mesh: + enabled: {% if linkerd_service_mesh.enabled == 'on' %}true{% else %}false{% endif %} +{% endif %} + {%- if tcs in ['on', 'optional'] and - arch in ['icx', 'spr'] %} + arch in ['icx'] %} # Trusted Certificate Service deployment # https://github.com/intel/trusted-certificate-issuer tcs: @@ -314,7 +371,7 @@ tcs: {% endif %} {%- if tca in ['on', 'optional'] and - arch in ['icx', 'spr'] %} + arch in ['icx'] %} # Trusted Certificate Attestation controller deployment # https://github.com/intel/trusted-certificate-attestation tca: @@ -351,9 +408,16 @@ collectd_enabled: {% if telemetry.collectd == 'on'%}true{% else %}false{% endif {%- if telemetry.telegraf in ['on', 'optional'] %} telegraf_enabled: {% if telemetry.telegraf == 'on'%}true{% else %}false{% endif %} {%- endif %} +{%- if telemetry.opentelemetry in ['on', 'optional'] %} +opentelemetry_enabled: {% if telemetry.opentelemetry == 'on'%}true{% else %}false{% endif %} +{%- endif %} collectd_scrap_interval: 30 telegraf_scrap_interval: 30 +{%- if jaeger in ['on', 'optional'] %} +jaeger_operator: {% if jaeger == 'on'%}true{% else %}false{% endif %} +{%- endif %} + {% if sriov_network_dp in ["on", "optional"] or network_userspace in ["on", "optional"] -%} # Create reference net-attach-def objects example_net_attach_defs: @@ -384,7 +448,10 @@ remove_kubespray_host_dns_settings: true cluster_name: cluster.local ## Kubespray variables ## - +{% if cert_manager in ['on', 'optional']%} +# Cert manager deployment +cert_manager_enabled: {% if cert_manager == "on"%}true{% else %}false{% endif%} +{%- endif %} # Supported network plugins(calico, flannel) and kube-proxy configuration kube_controller_manager_bind_address: 127.0.0.1 kube_proxy_metrics_bind_address: 127.0.0.1 @@ -395,7 +462,7 @@ kube_network_plugin: calico calico_network_backend: vxlan # For VM mode calico_backend has to be vxlan, otherwise deployment will fail {%- else %} -calico_network_backend: bird 
+calico_network_backend: vxlan {%- endif %} # Advanced calico options # https://github.com/kubernetes-sigs/kubespray/blob/master/docs/calico.md @@ -419,9 +486,6 @@ kube_proxy_mode: iptables # Set on true if you want to enable the eBPF dataplane support calico_bpf_enabled: false -# Set this var to true if you want to expose calico metrics endpoint -calico_metrics_enabled: false - # Comment this line out if you want to expose k8s services of type nodePort externally. kube_proxy_nodeport_addresses_cidr: 127.0.0.0/8 @@ -454,12 +518,6 @@ registry_enable: {% if registry == 'on' %}true{% else %}false{% endif %} registry_nodeport: "30500" registry_local_address: "localhost:{{ '{{' }} registry_nodeport {{ '}}' }}" {%- endif %} -{%- if cert_manager in ['on', 'optional'] %} -cert_manager_enable: {% if cert_manager == 'on' %}true{% else %}false{% endif %} -{%- endif %} - -# Enable Pod Security Policy. This option enables PSP admission controller and creates minimal set of rules. -psp_enabled: {% if psp == 'on' %}true{% else %}false{% endif %} # Set image pull policy to Always. Pull images prior to starting containers. Valid credentials must be configured. always_pull_enabled: false @@ -472,9 +530,12 @@ minio_enabled: {% if minio == 'on' %}true{% else %}false{% endif %} minio_tenant_enabled: true # Specifies whether to install MinIO Sample Tenant minio_tenant_servers: 4 # The number of MinIO Tenant nodes minio_tenant_volumes_per_server: 4 # The number of volumes per servers +minio_tenant_volume_size: 5 + # The size of each volume (unit: GiB) minio_deploy_test_mode: true # true (Test Mode) - use a file as loop device when creating storage # called "virtual block device" which is useful for test or automation purpose # false (Performance Mode) - use an actual NVME or SSD device when creating storage +minio_build_image_locally: true {%- endif %} {%- if cndp in ['on', 'optional'] or cndp_dp in ['on', 'optional'] %} @@ -495,3 +556,33 @@ cndp_net_attach_def_enabled: false {%- endif %} {%- endif %} {%- endif %} +{%- if tadk in ['on', 'optional'] %} + +## Traffic Analytics Development Kit (TADK) ## +# Install Web Application Firewall (WAF) using TADK +tadk_install: {% if tadk == 'on' %}true{% else %}false{% endif %} +{%- endif %} + +{%- if intel_flexran in ['on', 'optional'] %} +# Intel FlexRAN +intel_flexran_mode: timer # supported values are 'timer' and 'xran' +intel_flexran_bbu_front_haul: "0000:43:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +intel_flexran_bbu_ptp_sync: "0000:43:00.1" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +intel_flexran_oru_front_haul: "0000:4b:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +intel_flexran_oru_ptp_sync: "0000:4b:00.1" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +{% endif %} + +{%- if adq_dp in ['on', 'optional'] %} +# Note: ADQ is an experimental feature and enabling it may lead to unexpected results. +# ADQ requires a back-to-back connection between the control plane and worker node on CVL interfaces. +# The name of the CVL interface must be the same on both nodes, and an IP address must be present. +# In inventory.ini set "ip=" to the IP address of the CVL interface. +# Set kube_network_plugin to "cni", container_runtime to "containerd" and registry_enable to "true"; a rendered sketch follows below. 
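For orientation, the note above translates into a specific combination of rendered group_vars values. A minimal sketch follows, reusing the example addresses from the template just below; the interface name and addresses are illustrative, not validated defaults:

```yaml
# Hypothetical rendered group_vars excerpt for an ADQ-enabled cluster
kube_network_plugin: cni          # ADQ swaps in Cilium; see the k8s.yml change later in this diff
container_runtime: containerd     # required per the note above
registry_enable: true             # local registry must be enabled
adq_dp:
  enabled: true
  # IP address of the CVL interface located on the control plane
  interface_address: "192.168.0.10"
  interface_name: "ens107"
  # bus_info of the interface located on the control plane
  control_plane_interface: "a8:00.0"
```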
+adq_dp: + enabled: false + # IP address of CVL interface located on the control plane + interface_address: "192.168.0.10" + interface_name: "ens107" + # bus_info of the interface located on the control plane + control_plane_interface: "a8:00.0" +{% endif %} diff --git a/generate/profiles_templates/common/host_vars.j2 b/generate/profiles_templates/common/host_vars.j2 index 68c763ea..b480fe55 100644 --- a/generate/profiles_templates/common/host_vars.j2 +++ b/generate/profiles_templates/common/host_vars.j2 @@ -7,17 +7,10 @@ profile_name: {{ name }} configured_arch: {{ arch }} configured_nic: {{ nic }} -{% if sriov_operator in ['on', 'optional'] or sriov_network_dp in ['on', 'optional'] or qat in ['on', 'optional'] -%} +{% if sriov_operator in ['on', 'optional'] or sriov_network_dp in ['on', 'optional'] or qat in ['on', 'optional'] or dsa in ['on', 'optional'] -%} # Enable IOMMU (required for SR-IOV networking and QAT) -iommu_enabled: {% if (sriov_operator == 'on' or sriov_network_dp == 'on' or qat == 'on') and on_vms != 'on' %}true{% else %}false{% endif %} +iommu_enabled: {% if (sriov_operator == 'on' or sriov_network_dp == 'on' or qat == 'on' or dsa == 'on' or dlb == 'on') and on_vms != 'on' %}true{% else %}false{% endif %} {% endif %} - -{%- if nic_drivers in ['on', 'optional'] %} -# Set to 'true' to update i40e, ice and iavf kernel modules -update_nic_drivers: {% if nic_drivers == 'on' %}true{% else %}false{% endif %} -# Set 'true' to update NIC firmware -update_nic_firmware: false # FW update will be executed on all NICs listed in "dataplane_interfaces[*].name" - # dataplane interface configuration list dataplane_interfaces: [] {%- if on_vms == 'on' %} @@ -41,9 +34,9 @@ dataplane_interfaces: [] # - bus_info: "18:00.0" # pci bus info # pf_driver: {% if nic == 'cvl' %}ice{% else %}i40e{% endif %} # PF driver, "i40e", "ice" {%- if ddp in ['on', 'optional'] %} -# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.35.0.pkg"{% else %}gtp.pkgo{% endif %} # DDP package name to be loaded into the NIC +# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.37.0.pkg"{% else %}gtp.pkgo{% endif %} # DDP package name to be loaded into the NIC # For i40e(XV710-*) allowable ddp values are: "ecpri.pkg", "esp-ah.pkg", "ppp-oe-ol2tpv2.pkgo", "mplsogreudp.pkg" and "gtp.pkgo", replace as required - # For ice(E810-*) allowable ddp values are: ice_comms-1.3.[17,20,22,24,28,30,31,35].0.pkg such as "ice_comms-1.3.35.0.pkg", replace as required + # For ice(E810-*) allowable ddp values are: ice_comms-1.3.[17,20,22,24,28,30,31,35,37].0.pkg such as "ice_comms-1.3.37.0.pkg", replace as required # ddp_profile must be defined for first port of each network device. bifurcated cards will appear as unique devices. 
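To make the DDP comments above concrete, here is a hedged sketch of what an uncommented dataplane_interfaces entry might look like for an E810 (cvl) host; the bus address, VF driver, and package name are the illustrative values from the template's own comments, not defaults:

```yaml
# Hypothetical host_vars excerpt for one E810 port; the DDP profile is
# defined on the first port of each physical device only
dataplane_interfaces:
  - bus_info: "18:00.0"                    # PCI bus info of the PF
    pf_driver: ice                         # PF driver for E810 (cvl) NICs
    ddp_profile: "ice_comms-1.3.37.0.pkg"  # DDP package loaded into the NIC
    default_vf_driver: "vfio-pci"
    sriov_vfs: {}                          # no per-VF driver overrides
```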
{% endif %} {%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} @@ -65,7 +58,7 @@ dataplane_interfaces: [] # - bus_info: "18:00.1" # pf_driver: {% if nic == 'cvl' %}ice{% else %}i40e{% endif %} {%- if ddp in ['on', 'optional'] %} -# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.35.0.pkg"{% else %}gtp.pkgo{% endif %} +# ddp_profile: {% if nic == 'cvl' %}"ice_comms-1.3.37.0.pkg"{% else %}gtp.pkgo{% endif %} {%- endif %} # default_vf_driver: "vfio-pci" {%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} @@ -77,8 +70,49 @@ dataplane_interfaces: [] {% endif %} # sriov_vfs: {} # no VFs with specific driver on this PF or "sriov_vfs" can be omitted for convenience {% endif %} - -{%- if ddp in ['on', 'optional'] %} +{%- if nic_drivers in ['on', 'optional'] %} +# Set to 'true' to update i40e, ice and iavf kernel modules +update_nic_drivers: {% if nic_drivers == 'on' %}true{% else %}false{% endif %} +#i40e_driver_version: "2.20.12" # Downgrading i40e drivers is not recommended; users who do so proceed at their own risk. +#i40e_driver_checksum: "sha1:a24f0c5512af31c68cd90667d5822121780d5487" # update checksum per required i40e drivers version +#ice_driver_version: "1.9.11" # Downgrading ice drivers is not recommended; users who do so proceed at their own risk. +#ice_driver_checksum: "sha1:f05e2322a66de5d4019e7aa6141a109bb419dda4" # update checksum per required ice drivers version +#iavf_driver_version: "4.5.3" # Downgrading iavf drivers is not recommended; users who do so proceed at their own risk. +#iavf_driver_checksum: "sha1:76b3a7dec392e559dea6112fa55f5614857cff2a" # update checksum per required iavf drivers version +{% endif %} +# Set 'true' to upgrade / downgrade NIC firmware. FW upgrade / downgrade will be executed on all NICs listed in "dataplane_interfaces[*].bus_info". +update_nic_firmware: false # Note: downgrading FW is not recommended, users should proceed at their own risk. +{%- if nic == 'fvl' %} +#nvmupdate: [] # to downgrade FW, remove the '[]' so the line reads 'nvmupdate:' +# i40e: [] # to downgrade 700 Series 'i40e' NVM FW to the required version, remove the '[]' so the line reads 'i40e:' +# nvmupdate_pkg_url: "https://downloadmirror.intel.com/739639/700Series_NVMUpdatePackage_v9_00_Linux.tar.gz" +# nvmupdate_pkg_checksum: "sha1:B2B183ADD3B6EF8BCB2DA77A6E6A68F482F4BFD1" +# required_fw_version: "9.0" +# # min fw version for ddp was taken from: +# # https://www.intel.com/content/www/us/en/developer/articles/technical/dynamic-device-personalization-for-intel-ethernet-700-series.html +# min_ddp_loadable_fw_version: "6.01" +# min_updatable_fw_version: "5.02" +# # when downgrading only, the version below is required so the supported NVMupdate64E tool can be downloaded. Users should replace the tool at their own risk. 
+# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# supported_nvmupdate_tool_fw_version: "4.0" +{%- endif %} +{%- if nic == 'cvl' %} +#nvmupdate: [] # to downgrade FW, remove the '[]' so the line reads 'nvmupdate:' +# ice: [] # to downgrade 800 Series 'ICE' NVM FW to the required version, remove the '[]' so the line reads 'ice:' +# nvmupdate_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# nvmupdate_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# required_fw_version: "4.0" +# # https://builders.intel.com/docs/networkbuilders/intel-ethernet-controller-800-series-device-personalization-ddp-for-telecommunications-workloads-technology-guide.pdf +# # the document above does not specify a min fw version needed for the ddp feature, so min_ddp_loadable_fw is the same as min_updatable_fw +# min_ddp_loadable_fw_version: "0.70" +# min_updatable_fw_version: "0.70" +# # when downgrading only, the version below is required so the supported NVMupdate64E tool can be downloaded. Users should replace the tool at their own risk. +# supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +# supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +# supported_nvmupdate_tool_fw_version: "4.0" +{%- endif %} +{% if ddp in ['on', 'optional'] %} # install Intel x700 & x800 series NICs DDP packages install_ddp_packages: {% if ddp == 'on' and nic == 'fvl'%}true{% else %}false{% endif %} # If following error appears: "Flashing failed: Operation not permitted" @@ -93,7 +127,6 @@ enable_ice_systemd_service: {% if ddp == "on" %}true{% else %}false{% endif %} {%- if sriov_network_dp in ['on', 'optional'] %} sriov_cni_enabled: {% if sriov_network_dp == 'on' %}true{% else %}false{% endif %} {% endif %} -{%- endif %} {%- if sriov_operator in ['on', 'optional'] %} # Custom SriovNetworkNodePolicy manifests local path @@ -109,17 +142,17 @@ bond_cni_enabled: {% if bond_cni == 'on' %}true{% else %}false{% endif %} # Install DPDK (required for SR-IOV networking) install_dpdk: {% if dpdk == 'on' %}true{% else %}false{% endif %} # DPDK version (will be in action if install_dpdk: true) -dpdk_version: "22.03" +dpdk_version: {% if intel_flexran == 'on' %}"21.11"{% else %}"22.07"{% endif %} # Custom DPDK patches local path -#dpdk_local_patches_dir: "/tmp/patches/dpdk" +{% if intel_flexran == 'on' %}dpdk_local_patches_dir: "/tmp/flexran"{% else %}#dpdk_local_patches_dir: "/tmp/patches/dpdk"{% endif %} # It might be necessary to adjust the patch strip parameter, update as required. 
-#dpdk_local_patches_strip: 0 +{% if intel_flexran == 'on' %}dpdk_local_patches_strip: 1{% else %}#dpdk_local_patches_strip: 0{% endif %} {%- endif %} {% if network_userspace in ['on', 'optional'] %} # Userspace networking userspace_cni_enabled: {% if network_userspace == 'on' %}true{% else %}false{% endif %} ovs_dpdk_enabled: {% if ovs_dpdk == 'on' %}true{% else %}false{% endif %} # Should be enabled with Userspace CNI, when VPP is set to "false"; 1G hugepages required -ovs_version: "v2.17.1" +ovs_version: "v2.17.2" # CPU mask for OVS-DPDK PMD threads ovs_dpdk_lcore_mask: 0x1 # Huge memory pages allocated by OVS-DPDK per NUMA node in megabytes @@ -138,6 +171,44 @@ default_hugepage_size: {% if vpp == 'on' %}2M{% else %}1G{% endif %} number_of_hugepages_1G: 4 number_of_hugepages_2M: 1024 {% endif %} +{%- if dlb in ['on', 'optional'] and arch in ['spr'] %} +# Configure SIOV and Intel DLB devices - required for Intel DLB Device Plugin support +configure_dlb_devices: {% if dlb == "on" %}true{% else %}false{% endif %} +{% endif %} + +{%- if dsa in ['on', 'optional'] and arch in ['spr'] %} +# Configure SIOV and Intel DSA devices - required for Intel DSA Device Plugin support +configure_dsa_devices: {% if dsa == "on" %}true{% else %}false{% endif %} + +# Example DSA devices configuration list. If left empty and configure_dsa_devices is set to true, then the default configuration will be applied. +# It is possible to configure more DSA devices by extending the dsa_devices list based on the example config. +dsa_devices: [] + # - name: dsa0 # name of DSA device from /sys/bus/dsa/devices/ + # groups: 1 # number of groups to configure. The maximum number of groups per device can be found on /sys/bus/dsa/devices/dsaX/max_groups + # engines: 1 # number of engines to configure - one engine per group will be configured. + # # The maximum number of engines can be found on /sys/bus/dsa/devices/dsa0/max_engines + # wqs: # work queues will be named as wq<dev_id>.<wq_id>, for example wq0.0 - WQ with id 0 owned by dsa0 device + # - id: 0 # work queue id + # mode: "dedicated" # [shared, dedicated] + # type: "user" # [kernel, user] + # size: 8 # sum of all configured WQ sizes must be less than /sys/bus/dsa/devices/dsa0/max_workqueue_size + # prio: 4 # must be set between 1 and 15 + # group_id: 0 # work queue will be assigned to specific group + # max_batch_size: 1024 # specify the max batch size used by a work queue - powers of 2 are acceptable + # max_transfer_size: 2147483648 # specify the max transfer size used by a work queue - powers of 2 are acceptable + # block_on_fault: 0 # [0, 1] If block on fault is disabled, + # # a page fault on a source or destination memory access stops the operation and the page fault is reported to the software + # - id: 1 + # mode: "shared" + # type: "user" + # size: 8 + # prio: 5 + # threshold: 7 # only for a shared WQ; must be at least one less than the WQ size + # group_id: 0 + # max_batch_size: 1024 + # max_transfer_size: 2147483648 + # block_on_fault: 0 +{% endif %} {%- if intel_ethernet_operator.enabled in ['on', 'optional'] %} # Intel Ethernet Operator for Intel E810 Series network interface cards @@ -154,7 +225,7 @@ intel_ethernet_operator: {%- if intel_sriov_fec_operator in ['on', 'optional'] %} # Wireless FEC H/W Accelerator Device (e.g. 
ACC100) PCI ID -fec_acc: "0000:27:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format +fec_acc: "0000:6f:00.0" # must be string in [a-fA-F0-9]{4}:[a-fA-F0-9]{2}:[01][a-fA-F0-9].[0-7] format {% endif %} {%- if intel_flexran in ['on', 'optional'] %} @@ -166,6 +237,15 @@ intel_flexran_enabled: {% if intel_flexran == 'on' %}true{% else %}false{% endif # Enabling this feature will install QAT drivers + services update_qat_drivers: {% if qat == "on" %}true{% else %}false{% endif %} +{%- if arch in ['spr'] %} +# SPR platform QAT drivers requirements +# the folder must exist on the machine that will run Ansible playbooks +qat_drivers_folder: "/tmp/qat/" # QAT "QAT20.L.0.9.6-00024.tar.gz" driver package is expected to be present in this folder +qat_drivers_version: "QAT20.L.0.9.6-00024" # CEK has been validated with QAT drivers version "QAT20.L.0.9.6-00024.tar.gz" +# This package provides user space libraries that allow access to Intel(R) QuickAssist devices and expose the Intel(R) QuickAssist APIs and sample codes. +enable_intel_qatlibs: {% if qat == "on" %}true{% else %}false{% endif %} # Make sure "openssl_install" is set to "true" else this feature will be skipped in deployment. +enable_qat_svm: {% if qat == "on" %}true{% else %}false{% endif %} # Enable QAT Shared Virtual Memory (SVM). +{% endif %} # qat interface configuration list qat_devices: [] {%- if on_vms == 'on' %} @@ -175,32 +255,45 @@ qat_devices: [] # - qat_id: "0000:0b:00.0" # qat_sriov_numvfs: 0 # Have to be set to 0 here to not create any VFs inside VM. {%- else %} -# - qat_id: "0000:ab:00.0" # QAT device id one using DPDK compatible driver for VF devices to be used by vfio-pci kernel driver, replace as required -# qat_sriov_numvfs: 12 # Number of VFs per PF to create - cannot exceed the maximum number of VFs available for the device. Set to 0 to not create any VFs. - # Note: Currently when trying to create fewer virtual functions than the maximum, the maximum number always gets created +# - qat_id: "0000:ab:00.0" # QAT device id one using DPDK compatible driver for VF devices to be used by vfio-pci kernel driver, replace as required +# qat_sriov_numvfs: 12 # Number of VFs per PF to create - cannot exceed the maximum number of VFs available for the device. Set to 0 to not create any VFs. +# # Note: Currently when trying to create fewer virtual functions than the maximum, the maximum number always gets created. +# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} +# qat_vfs: # VFs drivers settings will be overridden by QAT device plugin. +# vf_00: "vfio-pci" +# vf_05: "vfio-pci" + # - qat_id: "0000:xy:00.0" # qat_sriov_numvfs: 10 +# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} +# qat_vfs: {} # - qat_id: "0000:yz:00.0" # qat_sriov_numvfs: 10 +# qat_default_vf_driver: {% if arch == "spr" %}"4xxxvf"{% else %}"c6xxvf"{% endif %} +# qat_vfs: {} {%- endif %} {% endif %} {%- if openssl in ['on', 'optional'] %} # Install and configure OpenSSL cryptography openssl_install: {% if openssl == 'on' and qat == "on" %}true{% else %}false{% endif %} # This requires update_qat_drivers set to 'true' in host vars -{% endif %} - +{% endif -%} {%- if isolcpu in ["on", "optional"] %} # CPU isolation from Linux scheduler isolcpus_enabled: {% if isolcpu == 'on' %}true{% else %}false{% endif %} {%- if on_vms == 'on' %} isolcpus: "4-15" +{%- else -%} +{% if vm_mode == 'on' %} +# isolcpus variable can't be enabled in case of VMRA deployment. 
+# Its content is generated automatically. +# isolcpus: "" {%- else %} isolcpus: "4-11" -{%- endif %} {% endif %} - +{%- endif %} +{%- endif %} {%- if cpusets in ["on", "optional"] %} # CPU shielding cpusets_enabled: {% if cpusets == 'on' %}true{% else %}false{% endif %} @@ -323,7 +416,7 @@ sst_tf_configuration_enabled: {% if sst == "on" %}true{% else %}false{% endif %} {%- endif %} {% endif %} -{%- if sgx in ['on', 'optional'] and arch in ['icx', 'spr'] %} +{%- if sgx in ['on', 'optional'] and arch in ['icx'] %} # Intel Software Guard Extensions (SGX) configure_sgx: {% if sgx == 'on' %}true{% else %}false{% endif %} {% endif %} @@ -369,6 +462,17 @@ cndp_dp_pools: {% endif %} {% endif %} +{% if adq_dp in ['on', 'optional'] %} +# Note: ADQ is an experimental feature and enabling it may lead to unexpected results. +# ADQ requires a back-to-back connection between the control plane and worker node on CVL interfaces. +# The name of the CVL interface must be the same on both nodes, and an IP address must be present. +# In inventory.ini set "ip=" to the IP address of the CVL interface. +adq_dp: + enabled: false + # IP address of CVL interface located on the worker node + interface_address: "192.168.0.11" +{% endif %} + {%- if vm_mode in ['on'] and on_vms != 'on' %} # The only common VM image for all VMs inside deployment is supported at the moment # @@ -378,12 +482,12 @@ cndp_dp_pools: dhcp: [] {% else %} # Default VM image version is Ubuntu 20.04 - focal -# Supported VM image distributions ['ubuntu', 'rocky'] -#vm_image_distribution: "ubuntu" -# Supported VM image ubuntu versions ['20.04', '22.04'] +# Supported VM image distributions ['ubuntu', 'rocky']. Default is 'ubuntu'. +#vm_image_distribution: "rocky" +# Supported VM image ubuntu versions ['20.04', '22.04']. Default version is '20.04'. +#vm_image_version_ubuntu: "22.04" -# Supported VM image rocky versions ['8.5'] -#vm_image_version_rocky: "8.5" +# Supported VM image rocky versions ['8.5', '9.0']. Default version is '8.5'. +#vm_image_version_rocky: "9.0" # dhcp for vxlan have to be enabled just on the first vm_host dhcp: - 120 @@ -487,10 +591,10 @@ vms: # vxlan: 120 {%- if name not in ['build_your_own'] %} # pci: -# - "18:02.6" -# - "18:02.6" # - "18:02.0" +# - "18:02.1" # - "18:02.6" +# - "18:02.7" {%- if qat == "on" %} ## - "3d:01.2" ## - "3f:01.2" @@ -500,7 +604,7 @@ vms: {%- endif %} {% endif -%} {%- if power_manager in ['on', 'optional'] and arch in ['icx', 'clx', 'spr'] -%} -# Power Operator Shared Profile/Workload settings. +# Power Manager Shared Profile/Workload settings. # It is possible to create node-specific Power Profile local_shared_profile: enabled: false @@ -521,7 +625,6 @@ minio_pv: [] # storageClassName: "local-storage" # Storage class name to match with PVC # accessMode: "ReadWriteOnce" # Access mode when mounting a volume, e.g., ReadWriteOnce/ReadOnlyMany/ReadWriteMany/ReadWriteOncePod # persistentVolumeReclaimPolicy: "Retain" # Reclaim policy when a volume is released once it's bound, e.g., Retain/Recycle/Delete -# capacity: 1GiB # Size of the PV. support only GiB/TiB # mountPath: /mnt/data0 # Mount path of a volume # device: /dev/nvme0n1 # Target storage device name when creating a volume. 
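Since the minio_pv example above now drops the capacity field, a filled-in sketch of a single entry may help; the name, mount path, and device mirror the commented sample and are illustrative only:

```yaml
# Hypothetical host_vars excerpt: one MinIO persistent volume
minio_pv:
  - name: "mnt-data-1"
    storageClassName: "local-storage"        # must match the PVC's storage class
    accessMode: "ReadWriteOnce"
    persistentVolumeReclaimPolicy: "Retain"
    mountPath: /mnt/data0
    device: /dev/nvme0n1                     # real NVMe/SSD device, or a backing file in test mode
```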
# When group_vars: minio_deploy_test_mode == true, use a file as a loop device for storage @@ -531,7 +634,6 @@ minio_pv: [] # storageClassName: "local-storage" # accessMode: "ReadWriteOnce" # persistentVolumeReclaimPolicy: "Retain" -# capacity: 1GiB # mountPath: /mnt/data1 # device: /dev/nvme1n1 @@ -539,7 +641,6 @@ minio_pv: [] # storageClassName: "local-storage" # accessMode: "ReadWriteOnce" # persistentVolumeReclaimPolicy: "Retain" -# capacity: 1GiB # mountPath: /mnt/data2 # device: /dev/nvme2n1 @@ -547,7 +648,20 @@ minio_pv: [] # storageClassName: "local-storage" # accessMode: "ReadWriteOnce" # persistentVolumeReclaimPolicy: "Retain" -# capacity: 1GiB # mountPath: /mnt/data3 # device: /dev/nvme3n1 + +# - name: "mnt-data-5" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data4 +# device: /dev/nvme4n1 + +# - name: "mnt-data-6" +# storageClassName: "local-storage" +# accessMode: "ReadWriteOnce" +# persistentVolumeReclaimPolicy: "Retain" +# mountPath: /mnt/data5 +# device: /dev/nvme5n1 {% endif -%} diff --git a/generate/profiles_templates/k8s/profiles.yml b/generate/profiles_templates/k8s/profiles.yml index 8e12ebce..e7aa90fa 100644 --- a/generate/profiles_templates/k8s/profiles.yml +++ b/generate/profiles_templates/k8s/profiles.yml @@ -27,6 +27,7 @@ # - tcs # - tca # - qat +# - dsa # - gpu # - gpu_dp # - openssl @@ -45,21 +46,24 @@ # prometheus # collectd # telegraf +# opentelemetry +# - jaeger # - wireguard # - multus # - cndp # - cndp_dp -# - psp # - minio # - cert_manager # - registry # - hugepages -# - service_mesh +# - istio_service_mesh # enabled # tcpip_bypass_ebpf # tls_splicing # sgx_signer # intel_preview +# - linkerd_service_mesh +# enabled # - intel_ethernet_operator # enabled # flow_config @@ -67,6 +71,7 @@ # fw_update # - intel_sriov_fec_operator # - intel_flexran +# - tadk --- access: @@ -107,7 +112,7 @@ access: dpdk: on ovs_dpdk: off pstate: off - cstate: on + cstate: off ufs: off sst: off power_manager: on @@ -115,23 +120,25 @@ access: prometheus: on collectd: optional telegraf: on - service_mesh: + opentelemetry: on + jaeger: optional + istio_service_mesh: enabled: off tcpip_bypass_ebpf: off tls_splicing: off sgx_signer: off intel_preview: off - wireguard: on + linkerd_service_mesh: + enabled: off + wireguard: optional multus: on firewall: optional cndp: off cndp_dp: off - psp: on minio: off cert_manager: on registry: on hugepages: on - tadk: off intel_ethernet_operator: enabled: optional flow_config: optional @@ -139,6 +146,7 @@ access: fw_update: optional intel_sriov_fec_operator: on intel_flexran: on + adq_dp: optional basic: name: basic @@ -159,12 +167,13 @@ basic: prometheus: on collectd: optional telegraf: on - wireguard: on + opentelemetry: on + jaeger: optional + wireguard: optional multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -172,6 +181,7 @@ basic: enabled: optional flow_config: optional fw_update: optional + adq_dp: optional full_nfv: name: full_nfv @@ -190,6 +200,10 @@ full_nfv: qat: on qat_dp: on openssl: on + dsa: on + dsa_dp: on + dlb: on + dlb_dp: on gpu: optional gpu_dp: optional sgx: on @@ -215,18 +229,21 @@ full_nfv: prometheus: on collectd: optional telegraf: on - service_mesh: + opentelemetry: on + jaeger: on + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: on intel_preview: optional - wireguard: on + linkerd_service_mesh: + enabled: optional + wireguard: 
optional multus: on firewall: optional cndp: on cndp_dp: on - psp: on minio: optional cert_manager: on registry: on @@ -238,6 +255,8 @@ full_nfv: fw_update: optional intel_sriov_fec_operator: optional intel_flexran: optional + tadk: on + adq_dp: optional on_prem: name: on_prem @@ -262,6 +281,10 @@ on_prem: tca: on qat: on qat_dp: on + dsa: optional + dsa_dp: optional + dlb: optional + dlb_dp: optional openssl: on tas: on dpdk: on @@ -275,18 +298,21 @@ on_prem: prometheus: on collectd: optional telegraf: on - service_mesh: + opentelemetry: on + jaeger: optional + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: on intel_preview: optional - wireguard: on + linkerd_service_mesh: + enabled: optional + wireguard: optional multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -294,6 +320,7 @@ on_prem: enabled: optional flow_config: optional fw_update: optional + adq_dp: optional regional_dc: name: regional_dc @@ -310,6 +337,14 @@ regional_dc: native_cpu_manager: on gpu: on gpu_dp: on + sgx: on + sgx_dp: on + kmra: + pccs: on + apphsm: on + ctk_demo: on + tcs: on + tca: on tas: on gas: on dpdk: optional @@ -319,17 +354,21 @@ regional_dc: prometheus: on collectd: optional telegraf: on - service_mesh: + opentelemetry: on + jaeger: optional + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on + sgx_signer: on intel_preview: optional - wireguard: on + linkerd_service_mesh: + enabled: optional + wireguard: optional multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -337,6 +376,7 @@ regional_dc: enabled: optional flow_config: optional fw_update: optional + adq_dp: optional remote_fp: name: remote_fp @@ -361,6 +401,10 @@ remote_fp: tca: optional qat: on qat_dp: optional + dsa: optional + dsa_dp: optional + dlb: optional + dlb_dp: optional openssl: on tas: on ddp: on @@ -376,18 +420,21 @@ remote_fp: prometheus: on collectd: on telegraf: optional - service_mesh: + opentelemetry: on + jaeger: optional + istio_service_mesh: enabled: optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional - wireguard: on + linkerd_service_mesh: + enabled: optional + wireguard: optional multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -396,6 +443,7 @@ remote_fp: flow_config: optional ddp: on fw_update: optional + adq_dp: optional storage: name: storage @@ -419,10 +467,11 @@ storage: prometheus: on collectd: optional telegraf: on - wireguard: on + opentelemetry: on + jaeger: optional + wireguard: optional multus: on firewall: optional - psp: on minio: on cert_manager: on registry: on @@ -432,6 +481,7 @@ storage: flow_config: optional ddp: optional fw_update: optional + adq_dp: optional build_your_own: name: build_your_own @@ -450,6 +500,8 @@ build_your_own: qat: optional qat_dp: optional openssl: optional + dsa: optional + dsa_dp: optional gpu: optional gpu_dp: optional sgx: optional @@ -475,18 +527,21 @@ build_your_own: prometheus: optional collectd: optional telegraf: optional - service_mesh: + opentelemetry: optional + jaeger: optional + istio_service_mesh: enabled: optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: optional multus: optional firewall: optional cndp: optional cndp_dp: optional - psp: optional 
minio: optional cert_manager: optional registry: optional @@ -498,3 +553,5 @@ build_your_own: fw_update: optional intel_sriov_fec_operator: optional intel_flexran: optional + tadk: optional + adq_dp: optional diff --git a/generate/profiles_templates/vm/vm_host_profiles.yml b/generate/profiles_templates/vm/vm_host_profiles.yml index 112d5703..475173e7 100644 --- a/generate/profiles_templates/vm/vm_host_profiles.yml +++ b/generate/profiles_templates/vm/vm_host_profiles.yml @@ -48,16 +48,17 @@ # - multus # - cndp # - cndp_dp -# - psp # - cert_manager # - registry # - hugepages -# - service_mesh +# - istio_service_mesh # enabled # tcpip_bypass_ebpf # tls_splicing # sgx_signer # intel_preview +# - linkerd_service_mesh +# enabled # - intel_ethernet_operator # enabled # flow_config @@ -96,15 +97,17 @@ access: prometheus: on collectd: optional telegraf: on - service_mesh: - enabled: on - tcpip_bypass_ebpf: on - tls_splicing: on - intel_preview: optional + istio_service_mesh: + enabled: off + tcpip_bypass_ebpf: off + tls_splicing: off + sgx_signer: off + intel_preview: off + linkerd_service_mesh: + enabled: off wireguard: on multus: on firewall: optional - psp: on cert_manager: on registry: on hugepages: on @@ -137,7 +140,6 @@ basic: firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -165,8 +167,8 @@ full_nfv: openssl: on gpu: optional gpu_dp: optional - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -187,18 +189,19 @@ full_nfv: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: on cndp_dp: on - psp: on cert_manager: on registry: on hugepages: on @@ -221,8 +224,8 @@ on_prem: sriov_operator: optional sriov_network_dp: on nic_drivers: on - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -243,18 +246,19 @@ on_prem: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -287,17 +291,19 @@ regional_dc: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on + sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -319,8 +325,8 @@ remote_fp: sriov_operator: optional sriov_network_dp: on nic_drivers: on - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -343,18 +349,19 @@ remote_fp: prometheus: on collectd: on telegraf: optional - service_mesh: + istio_service_mesh: enabled: optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -405,18 +412,19 @@ build_your_own: prometheus: optional collectd: optional 
telegraf: optional - service_mesh: + istio_service_mesh: enabled: optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: optional multus: optional firewall: optional cndp: optional cndp_dp: optional - psp: optional cert_manager: optional registry: optional hugepages: optional diff --git a/generate/profiles_templates/vm/vms_profiles.yml b/generate/profiles_templates/vm/vms_profiles.yml index 9e6e9b04..0ef0c217 100644 --- a/generate/profiles_templates/vm/vms_profiles.yml +++ b/generate/profiles_templates/vm/vms_profiles.yml @@ -48,15 +48,17 @@ # - multus # - cndp # - cndp_dp -# - psp # - cert_manager # - registry # - hugepages +# - istio_service_mesh # enabled # tcpip_bypass_ebpf # tls_splicing # sgx_signer # intel_preview +# - linkerd_service_mesh +# enabled # - intel_ethernet_operator # enabled # flow_config @@ -74,8 +76,6 @@ # sriov_network_dp is enabled on vms # bond_cni is disabled on vms # ddp is disabled on vms -# -# sgx - disabled on vms due to incompatible kernel version # intel_ethernet_operator is disabled on vms --- @@ -103,15 +103,17 @@ access: prometheus: on collectd: optional telegraf: on - service_mesh: - enabled: on - tcpip_bypass_ebpf: on - tls_splicing: on - intel_preview: optional + istio_service_mesh: + enabled: off + tcpip_bypass_ebpf: off + tls_splicing: off + sgx_signer: off + intel_preview: off + linkerd_service_mesh: + enabled: off wireguard: on multus: on firewall: optional - psp: on cert_manager: on registry: on hugepages: on @@ -144,7 +146,6 @@ basic: firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -172,8 +173,8 @@ full_nfv: openssl: on gpu: optional gpu_dp: optional - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -194,18 +195,19 @@ full_nfv: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: on cndp_dp: on - psp: on cert_manager: on registry: on hugepages: on @@ -228,8 +230,8 @@ on_prem: sriov_operator: optional sriov_network_dp: on nic_drivers: on - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -250,18 +252,19 @@ on_prem: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -294,17 +297,19 @@ regional_dc: prometheus: on collectd: optional telegraf: on - service_mesh: + istio_service_mesh: enabled: on tcpip_bypass_ebpf: on tls_splicing: on + sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: optional @@ -326,8 +331,8 @@ remote_fp: sriov_operator: optional sriov_network_dp: on nic_drivers: on - sgx: optional - sgx_dp: optional + sgx: on + sgx_dp: on kmra: pccs: optional apphsm: optional @@ -350,18 +355,19 @@ remote_fp: prometheus: on collectd: on telegraf: optional - service_mesh: + istio_service_mesh: enabled: 
optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: on multus: on firewall: optional cndp: optional cndp_dp: optional - psp: on cert_manager: on registry: on hugepages: on @@ -412,18 +418,19 @@ build_your_own: prometheus: optional collectd: optional telegraf: optional - service_mesh: + istio_service_mesh: enabled: optional tcpip_bypass_ebpf: optional tls_splicing: optional sgx_signer: optional intel_preview: optional + linkerd_service_mesh: + enabled: optional wireguard: optional multus: optional firewall: optional cndp: optional cndp_dp: optional - psp: optional cert_manager: optional registry: optional hugepages: optional diff --git a/playbooks/infra/prepare_vms.yml b/playbooks/infra/prepare_vms.yml index cccfd050..98b91481 100644 --- a/playbooks/infra/prepare_vms.yml +++ b/playbooks/infra/prepare_vms.yml @@ -16,15 +16,22 @@ --- - hosts: vm_host roles: - - { role: "vm/conf_libvirt" } + - role: vm/compile_libvirt + when: + - ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04" + - sgx_dp_enabled | default(false) + - role: vm/conf_libvirt - hosts: vm_host gather_facts: false roles: - - "vm/manage_imgs" - - "vm/manage_bridges" - - "vm/manage_vms" - - "vm/prepare_cek" + - role: vm/manage_imgs + - role: vm/manage_bridges + - role: vm/manage_vms + - role: vm/vm_sgx_enable + when: + - sgx_dp_enabled | default(false) + - role: vm/prepare_cek - hosts: vm_host gather_facts: false @@ -35,7 +42,7 @@ - hosts: vm_host gather_facts: false roles: - - "vm/prepare_cek_vxlan" + - vm/prepare_cek_vxlan - hosts: vm_host gather_facts: false diff --git a/playbooks/infra/redeploy_cleanup.yml b/playbooks/infra/redeploy_cleanup.yml index fc411d68..945f4fdb 100755 --- a/playbooks/infra/redeploy_cleanup.yml +++ b/playbooks/infra/redeploy_cleanup.yml @@ -65,24 +65,7 @@ debug: msg: "group_vars profile_name is: '{{ group_vars_profile }}'" - - name: set env_profile variable - set_fact: - env_profile: "{{ lookup('env','PROFILE') | default('not defined', True) }}" - - - name: show environment PROFILE variable - debug: - msg: "environment variable PROFILE is: '{{ env_profile }}'" - - - name: assert that environment variable PROFILE and group_vars profile_name matches - assert: - that: "env_profile == group_vars_profile" - fail_msg: > - "Content of environment variable 'PROFILE'='{{ env_profile }}' doesn't match - group_vars variable 'profile_name'='{{ group_vars_profile }}', please fix this discrepancy" - when: - - env_profile != 'not defined' - -- hosts: k8s_cluster +- hosts: "{{ node | default('k8s_cluster') }}" roles: - role: redeploy_cleanup tag: cleanup diff --git a/playbooks/k8s/k8s.yml b/playbooks/k8s/k8s.yml index 3375b26e..0272dbdb 100644 --- a/playbooks/k8s/k8s.yml +++ b/playbooks/k8s/k8s.yml @@ -39,16 +39,9 @@ [EventRateLimit, DefaultStorageClass, {% if always_pull_enabled %}AlwaysPullImages,{% endif %} - NodeRestriction{% if psp_enabled %}, PodSecurityPolicy{% endif %}] - cek_docker_version: >- - {% if ansible_distribution_version >= '21.04' %}latest{% else %}19.03{%endif %} + NodeRestriction, + PodSecurity] kube_config_dir: /etc/kubernetes - - name: prepare facts for Ubuntu >= 21.10 - set_fact: - docker_containerd_version: latest - when: - - ansible_distribution == "Ubuntu" and ansible_distribution_version >= "21.10" - - container_runtime == "docker" - name: set kube_cert_dir set_fact: kube_cert_dir: "{{ kube_config_dir }}/ssl" @@ -62,6 +55,7 @@ set_fact: calico_vxlan_mode: 
'CrossSubnet' calico_ipip_mode: 'Never' + calico_feature_detect_override: "ChecksumOffloadBroken=true" when: - kube_network_plugin == "calico" - calico_network_backend == "vxlan" @@ -70,13 +64,18 @@ set_fact: calico_ipip_mode: 'Always' calico_endpoint_to_host_action: "ACCEPT" - calico_wireguard_enabled: "{{ wireguard_enabled | default(true) | bool }}" + calico_vxlan_mode: 'Never' + calico_wireguard_enabled: "{{ wireguard_enabled | default(false) | bool }}" epel_enabled: >- - {% if ansible_distribution == 'Rocky' %}true{% else %}false{% endif %} + {% if ansible_distribution == 'Rocky' and ansible_distribution_version < '9' and wireguard_enabled | d(false) %}true{% else %}false{% endif %} when: - kube_network_plugin == "calico" - calico_network_backend == "bird" - not calico_advanced_options + - name: prepare ADQ facts + set_fact: + kube_proxy_remove: true + when: adq_dp.enabled | d(false) | bool environment: "{{ proxy_env | d({}) }}" any_errors_fatal: true @@ -87,7 +86,6 @@ container_manager: docker docker_iptables_enabled: true docker_dns_servers_strict: false - docker_version: "{{ cek_docker_version }}" when: container_runtime == "docker" - name: add containerd runtime vars set_fact: @@ -107,13 +105,15 @@ skip_downloads: false etcd_deployment_type: host when: container_runtime == "crio" + - name: run kubespray - import_playbook: kubespray/cluster.yml + import_playbook: "{% if scale | default(false) | bool %}kubespray/scale.yml{% else %}kubespray/cluster.yml{% endif %}" vars: kubeadm_enabled: true helm_enabled: true + krew_enabled: true multus_conf_file: /host/etc/cni/net.d/templates/00-multus.conf - nginx_image_tag: 1.21.3 + nginx_image_tag: 1.23.0-alpine calico_node_livenessprobe_timeout: 15 calico_node_readinessprobe_timeout: 15 override_system_hostname: false @@ -142,10 +142,8 @@ service-account-key-file: "{{ kube_cert_dir }}/sa.key" admission-control-config-file: "{{ kube_config_dir }}/admission-control/config.yaml" kube_kubeadm_scheduler_extra_args: - address: 127.0.0.1 profiling: false kube_kubeadm_controller_extra_args: - address: 127.0.0.1 service-account-private-key-file: "{{ kube_cert_dir }}/sa.key" kubelet_config_extra_args: protectKernelDefaults: true @@ -154,7 +152,6 @@ eventRecordQPS: 0 kube_apiserver_request_timeout: 60s kube_apiserver_enable_admission_plugins: "{{ enable_admission_plugins_prepare | from_yaml }}" - podsecuritypolicy_enabled: "{{ psp_enabled }}" kube_encrypt_secret_data: true apiserver_extra_volumes: - name: admission-control-config @@ -172,6 +169,12 @@ - hosts: k8s_cluster tasks: + - name: deploy Cilium + include_role: + name: cilium + when: + - inventory_hostname == groups['kube_control_plane'][0] + - adq_dp.enabled | d(false) | bool - name: restart docker daemon to recreate iptables rules systemd: name=docker state=restarted become: yes @@ -208,6 +211,7 @@ until: results.status == 200 retries: 30 delay: 5 + - name: allow traffic on wireguard interface block: - name: allow traffic on wireguard interface on Ubuntu @@ -230,14 +234,6 @@ - kube_network_plugin == "calico" and calico_network_backend == "bird" - firewall_enabled | default(false) | bool - - name: patch default calico controller configuration to not expose metrics port - command: "/usr/local/bin/calicoctl patch kubeControllersConfiguration default --patch='{ \"spec\": { \"prometheusMetricsPort\": 0 }}'" - when: - - ansible_hostname == groups['kube_control_plane'][0] - - kube_network_plugin == "calico" - - not calico_metrics_enabled - changed_when: true - - name: fix podman dependencies apt: name: 
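One addition above that is easy to miss: k8s.yml now imports kubespray/scale.yml instead of cluster.yml whenever a `scale` variable evaluates true. A hedged sketch of an extra-vars file for that path, reflecting the constraints the preflight checks in this diff enforce (the file name and invocation are assumptions, not part of the patch):

```yaml
# Hypothetical extra-vars for scaling out an existing cluster,
# e.g. passed as: ansible-playbook -i inventory.ini playbooks/k8s/k8s.yml -e @scale-vars.yml
scale: true                  # any truthy form accepted by the preflight check (yes, on, 1, true)
vm_recreate_existing: false  # must be false when scaling a VM-based cluster
```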
@@ -298,12 +294,8 @@ - role: cluster_defaults tags: defaults when: - - cert_manager_enable | default(false) or + - cert_manager_enabled | default(false) or registry_enable | default(false) - - role: cert_manager_install - tags: cert-manager - when: - - cert_manager_enable | default(false) - role: container_registry tags: registry when: diff --git a/playbooks/k8s/kubespray b/playbooks/k8s/kubespray index 5a49ac52..e6976a54 160000 --- a/playbooks/k8s/kubespray +++ b/playbooks/k8s/kubespray @@ -1 +1 @@ -Subproject commit 5a49ac52f96269d7225a16e05fdb5419a53e3c72 +Subproject commit e6976a54e151b43483c89a5054f87a60007f4485 diff --git a/playbooks/k8s/post_deployment_hooks.yml b/playbooks/k8s/post_deployment_hooks.yml new file mode 100644 index 00000000..aeb1aeee --- /dev/null +++ b/playbooks/k8s/post_deployment_hooks.yml @@ -0,0 +1,90 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: copy remote hooks dir and execute .sh and .py scripts on k8s_cluster + hosts: kube_control_plane + tasks: + - name: check if remote hooks dir exists + stat: + path: "{{ hooks_remote }}" + register: remote_hooks_dir + - name: delete hooks dir on remote + file: + state: absent + path: "{{ hooks_remote }}" + when: remote_hooks_dir.stat.exists + - name: copy remote hooks dir to remote + become: true + copy: + src: "{{ hooks_remote }}/" + dest: "{{ hooks_remote }}" + owner: root + group: root + mode: 0644 + ignore_errors: yes + - name: read .sh files in remote hooks dir + find: + paths: "{{ hooks_remote }}" + patterns: "*.sh" + register: sh_remote_files_found + - debug: msg="{{ sh_remote_files_found.files }}" + - name: execute .sh scripts from hooks dir + command: "sh {{ item.path }}" + with_items: "{{ sh_remote_files_found.files }}" + when: sh_remote_files_found.files | length > 0 + - name: read .py files in remote hooks dir + find: + paths: "{{ hooks_remote }}" + patterns: "*.py" + register: py_remote_files_found + - debug: msg="{{ py_remote_files_found.files }}" + - name: execute .py scripts from hooks dir + command: "python3 {{ item.path }}" + with_items: "{{ py_remote_files_found.files }}" + when: py_remote_files_found.files | length > 0 +- name: execute .sh, .py scripts and ansible playbooks on localhost + hosts: localhost + tasks: + - name: read .sh files in local hooks dir + find: + paths: "{{ hooks_local }}" + patterns: "*.sh" + register: sh_local_files_found + - debug: msg="{{ sh_local_files_found.files }}" + - name: execute .sh scripts from local hooks dir + command: "sh {{ item.path }}" + with_items: "{{ sh_local_files_found.files }}" + when: sh_local_files_found.files | length > 0 + - name: read .py files in local hooks dir + find: + paths: "{{ hooks_local }}" + patterns: "*.py" + register: py_local_files_found + - debug: msg="{{ py_local_files_found.files }}" + - name: execute .py scripts from hooks dir + command: "python3 {{ item.path }}" + with_items: "{{ py_local_files_found.files }}" + when: py_local_files_found.files | length 
> 0 + - name: read .yml and .yaml files in local hooks dir + find: + paths: "{{ hooks_local }}" + patterns: "*.yaml,*.yml" + register: playbooks_local_files_found + - debug: msg="{{ playbooks_local_files_found.files }}" + - name: execute ansible playbooks from hooks dir + command: ansible-playbook -i ../../inventory.ini "{{ item.path }}" + with_items: "{{ playbooks_local_files_found.files }}" + when: playbooks_local_files_found.files | length > 0 diff --git a/playbooks/preflight.yml b/playbooks/preflight.yml index 9ab0a810..dbe6d653 100644 --- a/playbooks/preflight.yml +++ b/playbooks/preflight.yml @@ -130,29 +130,12 @@ debug: msg: "group_vars profile_name is: '{{ group_vars_profile }}'" - - name: set env_profile variable - set_fact: - env_profile: "{{ lookup('env','PROFILE') | default('not defined', True) }}" - - - name: show environment PROFILE variable - debug: - msg: "environment variable PROFILE is: '{{ env_profile }}'" - - - name: assert that environment variable PROFILE and group_vars profile_name matches - assert: - that: "env_profile == group_vars_profile" - fail_msg: > - "Content of environment variable 'PROFILE'='{{ env_profile }}' doesn't match - group_vars variable 'profile_name'='{{ group_vars_profile }}', please fix this discrepancy" - when: - - (env_profile != 'not defined') or - ( env_profile == 'not defined' and group_vars_profile != 'full_nfv' and vm_enabled | default(false) ) - - name: handle the error for check password less access to VM hosts block: - name: check password less access to VM hosts command: "ssh -o PasswordAuthentication=no {{ hostvars[item]['ip'] }} /bin/true" with_items: "{{ groups['vm_host'] }}" + changed_when: false rescue: - name: Print when password less authentication failed debug: @@ -163,18 +146,36 @@ when: - vm_enabled and (not on_vms | default(false)) + - name: check scale variable value + assert: + that: "{{ scale | bool }}" + fail_msg: "scale variable must be set to one of the following values { yes, on, 1, true }, case insensitive" + success_msg: "scale variable is set to {{ scale }} \ncluster scaling is enabled" + when: scale is defined + + - name: check vm_recreate_existing variable value + assert: + that: "not {{ vm_recreate_existing | bool }}" + fail_msg: "vm_recreate_existing has to be false for cluster scaling case" + success_msg: "vm_recreate_existing variable is set to {{ vm_recreate_existing }} for cluster scaling" + when: + - scale is defined + - vm_enabled | default(false) + ############################################## # Prerequisites for Control and Worker Nodes # ############################################## - hosts: k8s_cluster,vm_host any_errors_fatal: true + gather_facts: true vars: cek_supported_distros: [RedHat, Rocky, Ubuntu] - cek_supported_distros_versions: ['8.5', '20.04', '22.04'] + cek_supported_distros_versions: ['8.5', '8.6', '9.0', '20.04', '22.04'] cpusets_ranges: [] cpusets_discretes: [] isolcpus_ranges: [] isolcpus_discretes: [] + isolcpus_list: [] tasks: - name: end play for VM host @@ -183,6 +184,16 @@ - "'vm_host' in group_names" - on_vms is defined and on_vms + - name: fail if deployment is VMRA and isolcpus is enabled + assert: + that: + - isolcpus is not defined + fail_msg: + - "isolcpus variable can't be used on VMRA deployment" + when: + - vm_enabled and (not on_vms | default(false)) + - isolcpus_enabled + - name: read Host Vars for VMs stat: path: "{{ inventory_dir }}/host_vars/{{ item.name }}.yml" @@ -290,11 +301,12 @@ when: - ansible_os_family == "RedHat" - - name: Check if CRI-O runtime is not 
used on Ubuntu 21.04 + - name: Check if CRI-O runtime is not used on RedHat 9.0 assert: that: container_runtime != "crio" - msg: "CRI-O runtime is not supported on Ubuntu 21.04" - when: ansible_distribution == "Ubuntu" and ansible_distribution_version == "21.04" + msg: "CRI-O runtime is not supported on RedHat 9.0" + when: + - (ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0") - name: check kubernetes and container runtime variables assert: @@ -309,8 +321,8 @@ - name: check k8s version assert: - that: "{{ kube_version is version('v1.21', '>=') }}" - msg: "Minimum supported k8s version is 1.21, please update kube_version variable with correct version" + that: "{{ kube_version is version('v1.22', '>=') }}" + msg: "Minimum supported k8s version is 1.22, please update kube_version variable with correct version" when: kubernetes and not container_runtime_only_deployment - name: assert that Multus is enabled in the config @@ -487,6 +499,7 @@ when: - isolcpus_enabled is defined and isolcpus_enabled | bool + - isolcpus is defined and isolcpus #################################### # Prerequisites for Worker Node(s) # @@ -500,7 +513,8 @@ gpu_pciids: - name: DG1 pciids: ["4907"] - + vars_files: + - "roles/check_machine_type/vars/main.yml" tasks: - name: end play for VM host @@ -630,8 +644,43 @@ when: - dataplane_interfaces is defined and dataplane_interfaces | length > 0 + - name: Print processor info + debug: + msg: "ansible_processor model: {{ ansible_processor[2] }}" + when: (not vm_enabled) or (vm_enabled and (not on_vms | default(false))) + + - name: set CPU ID in preflight + set_fact: + cpu_id: "{{ ansible_processor[2] | regex_search('\\$?\\d\\d\\d\\d\\%?\\@?\\w?|\\d\\d/\\d\\w') }}" + when: (not vm_enabled) or (vm_enabled and (not on_vms | default(false))) + + - name: print CPU ID + debug: + msg: "CPU ID: {{ cpu_id }}" + when: (not vm_enabled) or (vm_enabled and (not on_vms | default(false))) + + - name: check if CPU has confirmed support (preflight) + assert: + that: "cpu_id in {{ lookup('ansible.builtin.vars', 'confirmed_' + configured_arch + '_cpus') }} \ + {% if configured_arch == 'clx' %} or cpu_id in {{ confirmed_clx_ncpus }} {% endif %} \ + or cpu_id in {{ unconfirmed_cpu_models }}" + fail_msg: + "CPU model '{{ cpu_id }}' present on target is not in the confirmed CPUs list.\n + To proceed, please add '{{ cpu_id }}' to the list of unconfirmed CPUs in variable 'unconfirmed_cpu_models' in group_vars.\n + Please be aware that by using CPU model that is not confirmed, some features may not work properly." + when: (not vm_enabled) or (vm_enabled and (not on_vms | default(false))) + - name: check QAT Devices list is configured properly block: + - name: check QAT requirements + include_role: + name: ../roles/bootstrap/install_qat_drivers_services + tasks_from: qat_drivers_preflight + when: + - configured_arch == "spr" + - update_qat_drivers is defined and update_qat_drivers + - qat_devices is defined and qat_devices != [] + - debug: msg: "QAT device(s) defined in host_vars = {{ qat_devices }}" @@ -640,6 +689,16 @@ args: executable: /bin/bash register: lshw_qat_host + changed_when: false + failed_when: false + when: + - on_vms is not defined or not on_vms + + - name: assert QAT PCIIDs + assert: + that: "lshw_qat_host.rc == 0" + fail_msg: "No QAT devices were found in system. 
Please properly configure the QAT PCIIDs in group_vars or disable this feature" + success_msg: "QAT PCIIDs verification completed" when: - on_vms is not defined or not on_vms @@ -648,6 +707,7 @@ args: executable: /bin/bash register: lshw_qat_vms + changed_when: false when: - on_vms is defined and on_vms @@ -782,7 +842,7 @@ - nfd_enabled fail_msg: "Deployment of Intel Device Plugins requires nfd_enabled set to 'true' in group_vars" success_msg: "NFD configuration verified" - when: + when: - (qat_dp_enabled | default(false)) or (sgx_dp_enabled | default(false)) or (gpu_dp_enabled | default(false)) @@ -1011,15 +1071,15 @@ - name: fail if istio version is not compatible with current k8s version assert: that: - - "{{ service_mesh.version is version('1.10', '>=') }}" + - "{{ istio_service_mesh.version is version('1.10', '>=') }}" msg: | - "Selected Istio service mesh version: '{{ service_mesh.version }}' is not compatible with selected k8s version: '{{ kube_version }}'" + "Selected Istio service mesh version: '{{ istio_service_mesh.version }}' is not compatible with selected k8s version: '{{ kube_version }}'" "Please, refer to the compatibility table at https://istio.io/latest/docs/releases/supported-releases/" when: - kubernetes - not container_runtime_only_deployment - - service_mesh is defined - - service_mesh.version is defined + - istio_service_mesh is defined + - istio_service_mesh.version is defined # STORY: "TCS depends on KMRA AppHSM and KMRA PCCS" - name: check if KMRA Apps are enabled when TCS is enabled assert: @@ -1029,7 +1089,7 @@ msg: "KMRA AppHSM and PCCS applications should be enabled in order to have TCS functional." when: - tcs.enabled | default(false) or tca.enabled | default(false) - - configured_arch in ['icx', 'spr'] + - configured_arch in ['icx'] # STORY: "TCA depends on TCS" - name: check if TCS is enabled when TCA enabled @@ -1039,37 +1099,46 @@ msg: "TCS should be enabled in order to have TCA functional." when: - tca.enabled | default(false) - - configured_arch in ['icx', 'spr'] + - configured_arch in ['icx'] -# STORY: "service_mesh.sgx_signer' option is available only for icx and spr platforms" +# STORY: "istio_service_mesh.sgx_signer' option is available only for icx platforms" - name: particular service mesh options are available only for specific platforms assert: that: - - "{{ not service_mesh.sgx_signer.enabled | default(false) }}" - msg: "'service_mesh.sgx_signer' option is not available for the configured platform architecture." + - "{{ not istio_service_mesh.sgx_signer.enabled | default(false) }}" + msg: "'istio_service_mesh.sgx_signer' option is not available for the configured platform architecture." when: - - service_mesh.enabled | default(false) - - configured_arch not in ['icx', 'spr'] + - istio_service_mesh.enabled | default(false) + - configured_arch not in ['icx'] -# STORY: TCS is available only for icx and spr platforms" +# STORY: TCS is available only for icx platforms" - name: TCS is available only for specific platforms assert: that: - "{{ not tcs.enabled | default(false) }}" msg: "TCS is not available for the configured platform architecture." when: - - configured_arch not in ['icx', 'spr'] + - configured_arch not in ['icx'] -# STORY: TCA is available only for icx and spr platforms" + - name: Make sure istio and linkerd are not enabled at the same time + assert: + that: + - "{{ not linkerd_service_mesh.enabled | default(false) }}" + fail_msg: "You should not have both the Istio and LinkerD service meshes enabled at the same time. 
+          Please choose and enable only one service mesh."
+      when:
+        - istio_service_mesh.enabled | default(false)
+
+# STORY: TCA is available only for icx platforms
     - name: TCA is available only for specific platforms
       assert:
         that:
           - "{{ not tca.enabled | default(false) }}"
         msg: "TCA is not available for the configured platform architecture."
       when:
-        - configured_arch not in ['icx', 'spr']
+        - configured_arch not in ['icx']

-# STORY: "service_mesh.sgx_signer' option depends on KMRA AppHSM, KMRA PCCS, TCS, TCA"
+# STORY: "istio_service_mesh.sgx_signer" option depends on KMRA AppHSM, KMRA PCCS, TCS, TCA
     - name: check if KMRA Apps, TCS and TCA are enabled when service mesh sgx_signer option is enabled
       assert:
         that:
@@ -1079,8 +1148,8 @@
           - "{{ tca.enabled | default(false) }}"
         msg: "In order to use the service mesh sgx-signer option, please enable KMRA AppHSM, KMRA PCCS, TCS, TCA."
       when:
-        - service_mesh.sgx_signer.enabled | default(false)
-        - configured_arch in ['icx', 'spr']
+        - istio_service_mesh.sgx_signer.enabled | default(false)
+        - configured_arch in ['icx']

# STORY: TEMPORARY: "ovs dpdk version requirements"
     - debug:
@@ -1098,7 +1167,7 @@
     - name: check OVS DPDK compatibility
       assert:
         that:
-          (ovs_version >= 'v2.17.0' and ovs_version <= 'v2.17.1') and (dpdk_version >= '21.11' and dpdk_version <= '22.03')
+          (ovs_version >= 'v2.17.0' and ovs_version <= 'v2.17.2') and (dpdk_version >= '21.11' and dpdk_version <= '22.07')
           or (ovs_version < 'v2.16.2' and ovs_version >= 'v2.16.0') and dpdk_version == '21.08'
           or ovs_version == 'v2.15.0' and dpdk_version == '20.11'
           or ovs_version == 'v2.14.2' and dpdk_version == '19.11.6'
@@ -1114,23 +1183,23 @@
         - ovs_version is defined #host_vars
         - ovs_dpdk_enabled is defined and ovs_dpdk_enabled #host_vars

-    - name: check settings for Intel Power Operator
+    - name: check settings for Intel Power Manager
       assert:
         that:
           - intel_power_manager.power_profiles | length > 0
           - intel_power_manager.power_nodes | length > 0
-        fail_msg: "Intel Power Operator is enabled, but either Power Profiles or Power Nodes are not specified in group vars."
+        fail_msg: "Intel Power Manager is enabled, but either Power Profiles or Power Nodes are not specified in group vars."
       when: intel_power_manager is defined and intel_power_manager.enabled

     - name: check if power_nodes are available in inventory
       assert:
         that:
           - item in groups['kube_node']
-        fail_msg: "Intel Power Operator power_nodes have to be present in inventory. '{{ item }}' is not there: {{ groups['kube_node'] }}"
+        fail_msg: "Intel Power Manager power_nodes have to be present in inventory. '{{ item }}' is not there: {{ groups['kube_node'] }}"
       loop: "{{ intel_power_manager.power_nodes }}"
       when: intel_power_manager is defined and intel_power_manager.enabled

-    - name: check if Intel Power Operator is enabled, the ISST features should be disabled
+    - name: check that ISST features are disabled when Intel Power Manager is enabled
       assert:
         that:
           - not (sst_bf_configuration_enabled is defined and sst_bf_configuration_enabled or
@@ -1138,16 +1207,16 @@
                  sst_tf_configuration_enabled is defined and sst_tf_configuration_enabled or
                  sst_pp_configuration_enabled is defined and sst_pp_configuration_enabled)
         fail_msg:
-          - "Currently Intel Power Operator and Intel SST features are mutually exclusive."
+          - "Currently Intel Power Manager and Intel SST features are mutually exclusive."
           - "Please disable ISST (SST-BF, SST-CP, SST-TF and SST-PP) in host vars."
      when: intel_power_manager is defined and intel_power_manager.enabled

-    - name: check if Intel Power Operator is build locally on containerd/cri-o runtime
+    - name: check if Intel Power Manager is built locally on containerd/cri-o runtime
       assert:
         that: intel_power_manager.build_image_locally
         fail_msg:
-          - "Currently Intel Power Operator must be build locally on containerd and cri-o runtime"
-          - "Please set build_image_locally as true in Intel Power Operator settings"
+          - "Currently Intel Power Manager must be built locally on containerd and cri-o runtimes"
+          - "Please set build_image_locally to true in the Intel Power Manager settings in group_vars"
       when: intel_power_manager is defined and intel_power_manager.enabled and container_runtime in ["crio", "containerd"]

     - name: check Intel Ethernet Operator configuration
@@ -1178,10 +1247,10 @@
           "Deploying the Intel SR-IOV FEC Operator is supported only in the 'access', 'full_nfv', or 'byo' profiles. Please correct the group_vars configuration"

-    - name: FEC Operator - check distro is Ubuntu 22.04 (generic or realtime)
+    - name: FEC Operator - check distro is Ubuntu 22.04 or RHEL 8.6 (generic or realtime)
       assert:
-        that: ansible_distribution_version == "22.04"
-        msg: "Deploying the Intel SR-IOV FEC Operator is supported only on Ubuntu 22.04. Please change the o/s or correct group_vars configuration"
+        that: ansible_distribution_version == "22.04" or ansible_distribution_version == "8.6"
+        msg: "Deploying the Intel SR-IOV FEC Operator is supported only on Ubuntu 22.04 or RHEL 8.6. Please change the o/s or correct the group_vars configuration"

     - debug: msg="fec_acc is {{ fec_acc }}"
@@ -1212,7 +1281,6 @@
#      assert:
#        that: fec_acc PCIID is present in the host and its DevID in supported list
#        msg: "Deploying the Intel SR-IOV FEC Operator requires the ACC card to be present in the host. Please correct the host h/w configuration"
-      when: intel_sriov_fec_operator_enabled | default(false) | bool

     - name: check Intel FlexRAN requirements
@@ -1221,6 +1289,40 @@
         tasks_from: flexran_preflight
       when: intel_flexran_enabled | default(false)

+    - name: check OS when DLB or DSA is enabled
+      assert:
+        that: (ansible_distribution == "Ubuntu" and ansible_distribution_version == '20.04' and (update_kernel or ansible_kernel[0:4] is version('5.14', '>='))) or
+              (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '22.04') or
+              (ansible_os_family == "RedHat" and ansible_distribution_version >= '8.6')
+        success_msg: "DLB or DSA can successfully be enabled on {{ ansible_distribution }} {{ ansible_distribution_version }}"
+        msg: |
+          DLB and DSA features are not supported on Ubuntu 20.04 (with the stock kernel) or on RHEL/Rocky older than 8.6.
+          {% if ansible_distribution == "Ubuntu" and ansible_kernel[0:4] is version('5.14', '<') %}
+          Found kernel {{ ansible_kernel[0:4] }} on Ubuntu 20.04, but kernel version 5.14 or higher is required.
+          If you wish to use the DLB or DSA feature, set 'update_kernel' to true.
+          {% endif %}
+      when: configure_dsa_devices | d(false) or configure_dlb_devices | d(false)

+# SGX on VMs requires an Ubuntu 22.04 VM host and an Ubuntu 20.04 VM image
+    - name: Check requirements to enable Intel SGX on VMs
+      block:
+        - name: Intel SGX - check if ansible_host distro is Ubuntu 22.04
+          assert:
+            that:
+              - ansible_distribution == "Ubuntu"
+              - ansible_distribution_version == "22.04"
+            msg: "Deploying SGX on VMRA is supported only on Ubuntu 22.04 VM host. Please change the o/s for the VM host"
+
+        - name: Intel SGX - check if vm_image distro is Ubuntu 20.04
+          assert:
+            that:
+              - vm_image_distribution | default("ubuntu") == "ubuntu"
+              - vm_image_version_ubuntu | default("20.04") == "20.04"
+            msg: "Deploying SGX on VMRA is supported only on Ubuntu 20.04 VM image. Please change the o/s for the VM image"
+      when:
+        - vm_enabled | default(false)
+        - sgx_dp_enabled | default(false)
+        - inventory_hostname in groups['vm_host']
+
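For reference, a minimal sketch of the variables these two SGX asserts consume (where exactly they are set, group_vars vs. host_vars, is an assumption; only the variable names and accepted values come from the checks above):

```yaml
# Hypothetical configuration for SGX on VMs:
vm_enabled: true
sgx_dp_enabled: true
vm_image_distribution: "ubuntu"    # the only distribution the assert accepts
vm_image_version_ubuntu: "20.04"   # VM image stays on 20.04 while the VM host runs 22.04
```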
####################################
# Prerequisites for Role specific  #
####################################
@@ -1228,6 +1330,23 @@
   tasks:
# STORY: "MinIO requires number of nodes should be more than the minimum number of nodes defined in group_vars/all/minio_tenant_servers"
+    - name: display MinIO requirement with multus-service
+      fail:
+        msg: |
+          MinIO deployment for k8s services on additional interfaces with multiple interfaces/Multus CNI (multus-service) requires:
+          - group_vars/kube_network_plugin: flannel
+          - group_vars/container_runtime: crio
+          - group_vars/minio_build_image_locally: true
+          - host_vars/dataplane_interfaces should be defined
+          - host_vars/dataplane_interfaces[*].minio_vf: true on interfaces
+          - host_vars/dataplane_interfaces[*].sriov_vfs[*].vf_0: 'iaxf' on interfaces
+          - SRIOV network ports should be connected with VEB/VEPA enabled
+      run_once: yes
+      when:
+        - kubernetes
+        - minio_enabled is defined and minio_enabled
+      ignore_errors: true
+
     - name: check MinIO configuration
       include_role:
         name: minio_install
diff --git a/playbooks/remove_node.yml b/playbooks/remove_node.yml
new file mode 100644
index 00000000..88d94663
--- /dev/null
+++ b/playbooks/remove_node.yml
@@ -0,0 +1,22 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: prepare for removing worker node(s)
+  import_playbook: k8s/kubespray/remove-node.yml
+
+- name: prepare for Intel cleanup
+  import_playbook: infra/redeploy_cleanup.yml
+  when: kubernetes | default(true)
diff --git a/playbooks/versions.yml b/playbooks/versions.yml
new file mode 100644
index 00000000..8f6c9d9d
--- /dev/null
+++ b/playbooks/versions.yml
@@ -0,0 +1,355 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+## +--- +# This playbook gathers and displays component versions spread across different roles +- hosts: localhost + vars: + - versions_output_file: "{{ playbook_dir }}/../versions_output.csv" + - versions_parsing_errors_file: "{{ playbook_dir }}/../versions_parsing_errors" + tasks: + - name: Show versions_output_file name + debug: + msg: "versions_output_file is: {{ versions_output_file }}" + - name: Show versions_parsing_errors_file name + debug: + msg: "versions_parsing_errors_file is: {{ versions_parsing_errors_file }}" + + - name: Show variable values + block: + - shell: "echo -n '{{ item.description }}', && scripts/yaml_version_reader {{ item.var_file_path }} {{ item.shortname }}" + args: + chdir: ".." + register: item_value + loop: + - { 'description' : 'Telegraf', + 'shortname' : 'telegraf_image_tag', + 'var_file_path' : 'roles/telegraf_install/defaults/main.yml' + } + - { 'description' : 'k8s node-exporter', + 'var_file_path' : 'playbooks/infra/roles/kube_prometheus/files/kube-prometheus/node-exporter-clusterRoleBinding.yaml', + 'shortname' : "metadata\\'\\]\\[\\'labels\\'\\]\\[\\'app.kubernetes.io/version" + } + - { 'description' : 'k8s prometheus-operator', + 'var_file_path' : 'playbooks/infra/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-clusterRole.yaml', + 'shortname' : "metadata\\'\\]\\[\\'labels\\'\\]\\[\\'app.kubernetes.io/version" + } + - { 'description' : 'k8s kube-rbac-proxy', + 'var_file_path' : 'playbooks/infra/roles/collectd_install/defaults/main.yml', + 'shortname' : "image_rbac_proxy\\'\\]\\[\\'version" + } + - { 'description' : 'Node Feature Discovery', + 'var_file_path' : 'roles/nfd_install/charts/node-feature-discovery/Chart.yaml', + 'shortname' : 'appVersion' + } + - { 'description' : 'Vector Packet Processing', + 'var_file_path' : 'playbooks/infra/roles/userspace_cni_install/defaults/main.yml', + 'shortname' : 'vpp_version' + } + - { 'description' : 'SR-IOV CNI', + 'var_file_path' : 'roles/sriov_cni_install/defaults/main.yml', + 'shortname' : 'sriov_cni_version', + } + - { 'description' : 'SR-IOV network device plugin', + 'var_file_path' : 'roles/sriov_dp_install/defaults/main.yml', + 'shortname' : 'sriov_net_dp_tag' + } + - { 'description' : 'sriov network operator', + 'var_file_path' : 'roles/sriov_network_operator_install/defaults/main.yml', + 'shortname' : "sriov_network_operator_images\\'\\]\\[\\'operator" + } + - { 'description' : 'whereabouts service', + 'var_file_path' : 'roles/whereabouts_install/defaults/main.yml', + 'shortname' : 'whereabouts_commit_hash' + } + - { 'description' : 'intel dp operator', + 'var_file_path' : 'roles/intel_dp_operator/defaults/main.yml', + 'shortname' : 'intel_dp_operator_version' + } + - { 'description' : 'QAT device plugin', + 'var_file_path' : 'roles/qat_dp_install/defaults/main.yml', + 'shortname' : 'intel_qat_dp_version' + } + - { 'description' : 'GPU device plugin', + 'var_file_path' : 'roles/gpu_dp_install/defaults/main.yml', + 'shortname' : 'intel_gpu_dp_version' + } + - { 'description' : 'SGX device plugin', + 'var_file_path' : 'roles/sgx_dp_install/defaults/main.yaml', + 'shortname' : 'intel_sgx_dp_version' + } + - { 'description' : 'DLB device plugin (internal for RA 22.05)', + 'var_file_path' : 'roles/dlb_dp_install/defaults/main.yml', + 'shortname' : 'intel_dlb_dp_version' + } + - { 'description' : 'DSA device plugin (internal for RA 22.05)', + 'var_file_path' : 'roles/dsa_dp_install/defaults/main.yml', + 'shortname' : 'intel_dsa_dp_version' + } + - { 'description' : 'Userspace CNI', 
+ 'var_file_path' : 'roles/userspace_cni_install/defaults/main.yml', + 'shortname' : 'userspace_cni_version' + } + - { 'description' : 'Bond CNI plugin', + 'var_file_path' : 'roles/bond_cni_install/defaults/main.yml', + 'shortname' : 'bond_cni_version' + } + - { 'description' : 'Intel® Ethernet Drivers i40e', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/update_nic_drivers/defaults/main.yml', + 'shortname' : 'i40e_driver_version' + } + - { 'description' : 'Intel® Ethernet Drivers ice', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/update_nic_drivers/defaults/main.yml', + 'shortname' : 'ice_driver_version' + } + - { 'description' : 'Intel® Ethernet Drivers iavf', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/update_nic_drivers/defaults/main.yml', + 'shortname' : 'iavf_driver_version' + } + - { 'description' : 'Intel Ethernet Operator', + 'var_file_path' : 'playbooks/infra/roles/intel_ethernet_operator/defaults/main.yml', + 'shortname' : 'intel_ethernet_operator_git_ref' + } + - { 'description' : 'Intel Ethernet UFT', + 'var_file_path' : 'playbooks/infra/roles/intel_ethernet_operator/defaults/main.yml', + 'shortname' : 'uft_git_ref' + } + - { 'description' : 'OpenSSL QAT Engine', + 'var_file_path' : 'playbooks/infra/roles/openssl_engine_install/defaults/main.yml', + 'shortname' : 'openssl_engine_version' + } + - { 'description' : 'Intel(R) ipsec-mb', + 'var_file_path' : 'playbooks/infra/roles/openssl_engine_install/defaults/main.yml', + 'shortname' : 'intel_ipsec_version' + } + - { 'description' : 'Intel® SGX DCAP Drivers (ubuntu)', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/configure_sgx/defaults/main.yml', + 'shortname' : 'dcap_driver_series_ubuntu_20' + } + - { 'description' : 'Intel® SGX DCAP Drivers (rhel)', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/configure_sgx/defaults/main.yml', + 'shortname' : 'dcap_driver_series_rhel' + } + - { 'description' : 'Intel® SGX SDK (ubuntu)', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/configure_sgx/defaults/main.yml', + 'shortname' : 'sgx_sdk_version_ubuntu_20' + } + - { 'description' : 'Intel® SGX SDK (rhel)', + 'var_file_path' : 'playbooks/infra/roles/bootstrap/configure_sgx/defaults/main.yml', + 'shortname' : 'sgx_sdk_version_rhel' + } + - { 'description' : 'Intel® KMRA AppHSM', + 'var_file_path' : 'playbooks/intel/roles/kmra_install/defaults/main.yml', + 'shortname' : "kmra_defaults\\'\\]\\[\\'apphsm\\'\\]\\[\\'image_tag" + } + - { 'description' : 'Intel® KMRA CTK', + 'var_file_path' : 'playbooks/intel/roles/kmra_install/defaults/main.yml', + 'shortname' : "kmra_defaults\\'\\]\\[\\'ctk_loadkey_demo\\'\\]\\[\\'image_tag" + } + - { 'description' : 'Intel® KMRA PCCS', + 'var_file_path' : 'playbooks/intel/roles/kmra_install/defaults/main.yml', + 'shortname' : "kmra_defaults\\'\\]\\[\\'pccs\\'\\]\\[\\'image_tag" + } + - { 'description' : 'istio operator', + 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/charts/istioctl/values.yaml', + 'shortname' : "image\\'\\]\\[\\'tag" + } +# - { 'description' : 'istio-intel/pilot-cryptomb (internal)', +# 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', +# 'shortname' : "spec\\'\\]\\[\\'values\\'\\]\\[\\'pilot\\'\\]\\[\\'image" +# } + - { 'description' : 'istio-intel/pilot-cryptomb (internal)', + 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', + 'shortname' : "spec\\'\\]\\[\\'tag" + } +# - { 'description' : 'istio-intel/pilot-cryptomb (internal)', +# 
'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', +# 'shortname' : "spec\\'\\]\\[\\'values\\'\\]\\[\\'global\\'\\]\\[\\'proxy\\'\\]\\[\\'image" +# } + - { 'description' : 'istio-intel/proxyv2-cryptomb (internal)', + 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml', + 'shortname' : "spec\\'\\]\\[\\'tag" + } +# - { 'description' : 'istio-intel/proxyv2-openssl (internal)', +# 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-qat-sw.yaml', +# 'shortname' : "spec\\'\\]\\[\\'values\\'\\]\\[\\'global\\'\\]\\[\\'proxy\\'\\]\\[\\'image" +# } + - { 'description' : 'istio-intel/proxyv2-openssl (internal)', + 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/files/profiles/intel-qat-sw.yaml', + 'shortname' : "spec\\'\\]\\[\\'tag" + } + - { 'description' : 'istio-intel/tcpip-bypass-ebpf', + 'var_file_path' : 'playbooks/infra/roles/istio_service_mesh/vars/main.yml', + 'shortname' : "istio_service_mesh_defaults\\'\\]\\[\\'tcpip_bypass_ebpf\\'\\]\\[\\'version" + } + - { 'description' : 'Intel Trusted Attestation Controller', + 'var_file_path' : 'playbooks/infra/roles/tca_install/defaults/main.yml', + 'shortname' : 'tca_git_version' + } + - { 'description' : 'Intel Trusted Certificate Issuer', + 'var_file_path' : 'playbooks/intel/roles/tcs_install/defaults/main.yml', + 'shortname' : 'tcs_git_version' + } + - { 'description' : 'CNDP DP', + 'var_file_path' : 'playbooks/infra/roles/cndp_dp_install/defaults/main.yml', + 'shortname' : 'intel_cndp_dp_version' + } + - { 'description' : 'CNDP CNI', + 'var_file_path' : 'playbooks/infra/roles/cndp_install/defaults/main.yml', + 'shortname' : 'intel_cndp_version' + } + - { 'description' : 'MinIO operator', + 'var_file_path' : 'playbooks/infra/roles/minio_install/charts/operator/values.yaml', + 'shortname' : "operator\\'\\]\\[\\'image\\'\\]\\[\\'tag" + } + - { 'description' : 'MinIO console', + 'var_file_path' : 'playbooks/infra/roles/minio_install/charts/operator/values.yaml', + 'shortname' : "console\\'\\]\\[\\'image\\'\\]\\[\\'tag" + } + - { 'description' : 'Power Manager Operator', + 'var_file_path' : 'playbooks/infra/roles/intel_power_manager/defaults/main.yml', + 'shortname' : 'intel_power_manager_git_ref' + } + - { 'description' : 'Intel® RDT telemetry plugin', + 'var_file_path' : 'playbooks/infra/roles/intel_power_manager/defaults/main.yml', + 'shortname' : 'intel_appqos_git_ref' + } + - { 'description' : 'FEC Operator', + 'var_file_path' : 'playbooks/infra/roles/intel_sriov_fec_operator/defaults/main.yml', + 'shortname' : 'intel_sriov_fec_operator_img_ver' + } + - { 'description' : 'FEC Operator SDK', + 'var_file_path' : 'playbooks/infra/roles/operator_framework/defaults/main.yml', + 'shortname' : 'operator_sdk_git_ref' + } + - { 'description' : 'Operator Package Manager', + 'var_file_path' : 'playbooks/infra/roles/intel_sriov_fec_operator/defaults/main.yml', + 'shortname' : 'opm_ver' + } + - { 'description' : 'Data Plane Development Kit', + 'var_file_path' : 'host_vars/host-for-vms-1.yml', + 'shortname' : 'dpdk_version' + } + - { 'description' : 'Open vSwitch with DPDK', + 'var_file_path' : 'host_vars/host-for-vms-1.yml', + 'shortname' : 'ovs_version' + } + - { 'description' : 'Comms DDP Profiles', + 'var_file_path' : 'host_vars/host-for-vms-1.yml', + 'shortname' : "dataplane_interfaces\\'\\]\\[0\\]\\[\\'ddp_profile" + } + - { 'description' : 'Intel® QAT Drivers', + 'var_file_path' : 
'playbooks/infra/roles/bootstrap/install_qat_drivers_services/defaults/main.yml', + 'shortname' : 'qat_drivers_version' + } + - { 'description' : 'OpenSSL', + 'var_file_path' : 'roles/bootstrap/configure_openssl/defaults/main.yml', + 'shortname' : 'openssl_version' + } + - { 'description' : 'kube_version', + 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/kubespray-defaults/defaults/main.yaml', + 'shortname' : 'kube_version' + } + - { + 'description' : 'linkerd_version', + 'var_file_path' : 'roles/linkerd_service_mesh/defaults/main.yml', + 'shortname' : 'linkerd_version' + } + - { + 'description' : 'cadvisor_helm_charts', + 'var_file_path' : 'roles/cadvisor_install/defaults/main.yaml', + 'shortname' : 'cadvisor_helm_charts_version' + } + - { + 'description' : 'intel_adq_dp_version', + 'var_file_path' : 'roles/adq_dp_install/defaults/main.yml', + 'shortname' : 'intel_adq_dp_version' + } + - { + 'description' : 'adq_ice_fw_required_version', + 'var_file_path' : 'roles/bootstrap/update_nic_firmware/defaults/main.yml', + 'shortname' : 'adq_ice_fw_required_version' + } + - { + 'description' : 'cilium_version', + 'var_file_path' : 'roles/cilium/defaults/main.yml', + 'shortname' : 'cilium_version' + } + - { + 'description' : 'cert_manager_version', + 'var_file_path' : 'playbooks/k8s/kubespray/extra_playbooks/roles/download/defaults/main.yml', + 'shortname' : 'cert_manager_version' + } + - { + 'description' : 'telemetry aware scheduling', + 'var_file_path' : 'roles/platform_aware_scheduling_install/defaults/main.yml', + 'shortname' : 'tas_extender_image_tag_default' + } + - { + 'description' : 'gpu aware scheduling', + 'var_file_path' : 'roles/platform_aware_scheduling_install/defaults/main.yml', + 'shortname' : 'gas_extender_image_tag_default' + } + - { + 'description' : 'jaeger version', + 'var_file_path' : 'roles/jaeger_install/defaults/main.yml', + 'shortname' : "jaeger_defaults\\'\\]\\[\\'image" + } + - { 'description' : 'crio_version', + 'var_file_path' : 'playbooks/infra/roles/container_engine/crio/defaults/main.yml', + 'shortname' : 'crio_version' + } + - { 'description' : 'cluster_name', + 'var_file_path' : 'playbooks/k8s/kubespray/roles/kubespray-defaults/defaults/main.yaml', + 'shortname' : 'cluster_name' + } + - { 'description' : 'containerd_version', + 'var_file_path' : 'roles/container_engine/containerd_common/defaults/main.yml', + 'shortname' : 'containerd_version' + } + - { 'description' : 'multus_version', + 'var_file_path' : 'playbooks/k8s/kubespray/roles/download/defaults/main.yml', + 'shortname' : 'multus_version' + } + - { 'description' : 'nfd_version', + 'var_file_path' : 'playbooks/infra/roles/nfd_install/defaults/main.yml', + 'shortname' : 'nfd_image_tag' + } + - { 'description' : 'cndp_version', + 'var_file_path' : 'playbooks/intel/roles/cndp_install/defaults/main.yml', + 'shortname' : 'intel_cndp_version' + } + - { 'description' : 'nginx_version', + 'var_file_path' : 'playbooks/infra/roles/kmra_install/defaults/main.yml', + 'shortname' : "kmra_defaults\\'\\]\\[\\'ctk_loadkey_demo\\'\\]\\[\\'nginx_image_tag" + } + - name: remove old version parsing results + file: + path: "{{ item }}" + state: absent + failed_when: false + with_items: + - "{{ versions_output_file }}" + - "{{ versions_parsing_errors_file }}" + - lineinfile: + path: "{{ versions_output_file }}" + line: "{{ item.stdout }}" + create: yes + loop: "{{ item_value.results }}" + - lineinfile: + path: '{{ versions_parsing_errors_file }}' + line: "{{ item.stderr }}" + create: yes + loop: "{{ item_value.results }}"
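For reference, a sketch of how this playbook might be invoked and what the CSV rows look like (the row values shown are placeholders, not real output):

```bash
# Run from the repository root; versions_output.csv is written one level above
# playbooks/, with parsing errors collected in versions_parsing_errors.
ansible-playbook playbooks/versions.yml

# Each CSV row is "<description>,<value parsed by scripts/yaml_version_reader>", e.g.:
#   Telegraf,<telegraf_image_tag value>
#   Node Feature Discovery,<appVersion value>
```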
diff --git a/playbooks/vm.yml b/playbooks/vm.yml
index cd274fe9..f0bb8835 100644
--- a/playbooks/vm.yml
+++ b/playbooks/vm.yml
@@ -16,14 +16,16 @@
 ---
# VM enabled
# If VM is enabled then Virtual Machines are created and CEK is deployed into those VMs
-
 - name: preflight checks
   import_playbook: preflight.yml

 - name: configure target hosts OS layer
-  import_playbook: infra/{{ lookup('env', 'PROFILE') | default('full_nfv', True) }}.yml
+  vars:
+    group_vars_content: "{{ lookup('file', '../group_vars/all.yml') | from_yaml }}"
+  import_playbook: "infra/{{ group_vars_content['profile_name'] }}.yml"

 - name: prepare VMs for VM deployment
   import_playbook: infra/prepare_vms.yml

 - name: deploy CEK on VMs
   vars:
     on_vms: True
-  import_playbook: "{{ lookup('env', 'PROFILE') | default('full_nfv', True) }}.yml"
+    group_vars_content: "{{ lookup('file', '../group_vars/all.yml') | from_yaml }}"
+  import_playbook: "{{ group_vars_content['profile_name'] }}.yml"
diff --git a/requirements.txt b/requirements.txt
index 3b6e1ce6..e508631b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,10 +1,11 @@
-ansible==4.10.0
+ansible==5.7.1
+ansible-core==2.12.5
 cryptography==3.3.2
 jinja2==2.11.3
 netaddr==0.7.19
 pbr==5.4.4
 jmespath==0.9.5
 ruamel.yaml==0.16.10
-ruamel.yaml.clib==0.2.4
+ruamel.yaml.clib==0.2.6
 MarkupSafe==1.1.1
 ipaddr
diff --git a/roles/service_mesh_install/defaults/main.yml b/roles/adq_dp_install/defaults/main.yml
similarity index 61%
rename from roles/service_mesh_install/defaults/main.yml
rename to roles/adq_dp_install/defaults/main.yml
index 6060e60c..3e68131e 100644
--- a/roles/service_mesh_install/defaults/main.yml
+++ b/roles/adq_dp_install/defaults/main.yml
@@ -14,7 +14,7 @@
 ## limitations under the License.
 ##
 ---
-service_mesh_download_url: "https://github.com/istio/istio/releases/download/{{ service_mesh.version }}/istio-{{ service_mesh.version }}-linux-amd64.tar.gz"
-service_mesh_release_dir: "{{ (project_root_dir, 'istio') | path_join }}"
-service_mesh_charts_dir: "{{ (project_root_dir, 'charts', 'istio') | path_join }}"
-service_mesh_profiles_dir: "{{ (service_mesh_charts_dir, 'profiles') | path_join }}"
+intel_adq_dp_git_url: "https://github.com/intel/adq-k8s-plugins.git"
+intel_adq_dp_dir: "{{ (project_root_dir, 'intel-adq-dp') | path_join }}"
+intel_adq_dp_version: "22.06-1"
+adq_dp_namespace: kube-system
diff --git a/roles/adq_dp_install/files/adq-cluster-role.yml b/roles/adq_dp_install/files/adq-cluster-role.yml
new file mode 100644
index 00000000..14ac8f57
--- /dev/null
+++ b/roles/adq_dp_install/files/adq-cluster-role.yml
@@ -0,0 +1,16 @@
+---
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: adq
+rules:
+  - apiGroups: [""]
+    resources:
+      - nodes/proxy
+    verbs:
+      - get
+  - apiGroups: [""]
+    resources:
+      - nodes
+    verbs:
+      - list
diff --git a/roles/adq_dp_install/tasks/main.yml b/roles/adq_dp_install/tasks/main.yml
new file mode 100644
index 00000000..b72e5496
--- /dev/null
+++ b/roles/adq_dp_install/tasks/main.yml
@@ -0,0 +1,126 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- block: + - name: set_fact for Cilium + set_fact: + adq_cilium_deploy: true + + - name: create Intel ADQ Device Plugin directory + file: + path: "{{ intel_adq_dp_dir }}" + state: directory + mode: '0755' + + - name: clone Intel ADQ Device Plugin repository + git: + repo: "{{ intel_adq_dp_git_url }}" + dest: "{{ (intel_adq_dp_dir, 'adq-k8s-plugins') | path_join }}" + version: "{{ intel_adq_dp_version }}" + + - name: copy Intel ADQ cluster role + copy: + src: adq-cluster-role.yml + dest: "{{ (intel_adq_dp_dir, 'adq-cluster-role.yml') | path_join }}" + owner: root + group: root + mode: '0644' + + - name: template Intel ADQ files + template: + src: "{{ item.src }}" + dest: "{{ (intel_adq_dp_dir, item.dst) | path_join }}" + force: yes + mode: '0644' + loop: + - {src: 'adq-cluster-config.yml.j2', dst: 'adq-cluster-config.yml'} + - {src: 'adq-cluster-role-binding.yml.j2', dst: 'adq-cluster-role-binding.yml'} + - {src: 'adq-service-account.yml.j2', dst: 'adq-service-account.yml'} + - {src: 'adq-cni-dp-ds.yml.j2', dst: 'adq-cni-dp-ds.yml'} + + - name: create Intel ADQ cluster config + command: "kubectl apply -f {{ (intel_adq_dp_dir, 'adq-cluster-config.yml') | path_join }}" + changed_when: true + + - name: build Intel ADQ Device Plugin images + command: "podman build --build-arg=BUILD_VERSION={{ intel_adq_dp_version }} -f {{ item.file }} -t {{ item.tag }} ." 
+ changed_when: true + args: + chdir: "{{ (intel_adq_dp_dir, 'adq-k8s-plugins') | path_join }}" + loop: + - {file: 'Dockerfile.adqsetup', tag: "{{ registry_local_address }}/adqsetup:{{ intel_adq_dp_version }}"} + - {file: 'monitoring/Dockerfile.adqexporter', tag: "{{ registry_local_address }}/adqexporter:{{ intel_adq_dp_version }}"} + - {file: 'Dockerfile', tag: "{{ registry_local_address }}/adq-cni-dp:{{ intel_adq_dp_version }}"} + + - name: push Intel ADQ Device Plugin images + command: "podman push {{ registry_local_address }}/{{ item }}:{{ intel_adq_dp_version }}" + changed_when: true + loop: + - "adqsetup" + - "adqexporter" + - "adq-cni-dp" + + - name: deploy Cilium + include_role: + name: cilium + + - name: create Intel ADQ Device Plugin resources + command: kubectl apply -f ./ + changed_when: true + args: + chdir: "{{ intel_adq_dp_dir }}" + + - name: check if ADQ pods are running + shell: set -o pipefail && kubectl get pods -n kube-system | grep -i adq | awk '{ print $3 }' + args: + executable: /bin/bash + register: adq_pods_status + retries: 30 + delay: 15 + until: + - "'Error' not in adq_pods_status.stdout" + - "'CrashLoopBackOff' not in adq_pods_status.stdout" + - "'Terminating' not in adq_pods_status.stdout" + - "'ContainerCreating' not in adq_pods_status.stdout" + - "'Pending' not in adq_pods_status.stdout" + - "'Init' not in adq_pods_status.stdout" + changed_when: false + + - name: restart unmanaged pods + shell: >- + set -o pipefail && kubectl get pods --all-namespaces + -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true + | grep '' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod + args: + executable: /bin/bash + changed_when: true + + - name: check if all pods are running + shell: set -o pipefail && kubectl get pods -A | awk 'NR != 1 { print $4 }' + args: + executable: /bin/bash + register: cilium_pods_status + retries: 30 + delay: 15 + until: + - "'Error' not in cilium_pods_status.stdout" + - "'CrashLoopBackOff' not in cilium_pods_status.stdout" + - "'Terminating' not in cilium_pods_status.stdout" + - "'ContainerCreating' not in cilium_pods_status.stdout" + - "'Pending' not in cilium_pods_status.stdout" + changed_when: false + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/adq_dp_install/templates/adq-cluster-config.yml.j2 b/roles/adq_dp_install/templates/adq-cluster-config.yml.j2 new file mode 100644 index 00000000..7c27238a --- /dev/null +++ b/roles/adq_dp_install/templates/adq-cluster-config.yml.j2 @@ -0,0 +1,51 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: adq-cluster-config + namespace: {{ adq_dp_namespace }} +data: + adq-cluster-config.json: |- + { + "NodeConfigs": [ + { + "Labels": { + "kubernetes.io/os": "linux" + }, + "EgressMode": "skbedit", + "FilterPrio": 1, + "Globals": { + "Dev": "{{ adq_dp.interface_name }}", + "Queues": 16, + "Busypoll": 50000, + "Busyread": 50000, + "Txadapt": false, + "Txusecs": 50, + "Rxadapt": false, + "Rxusecs": 50 + }, + "TrafficClass": [ + { + "Queues": 4, + "Pollers": 4 + }, + { + "Queues": 4, + "Pollers": 4 + }, + { + "Queues": 4, + "Pollers": 4 + }, + { + "Queues": 4, + "Pollers": 4 + }, + { + "Queues": 32, + "Mode": "shared" + } + ] + } + ] + } \ No newline at end of file diff --git a/roles/adq_dp_install/templates/adq-cluster-role-binding.yml.j2 b/roles/adq_dp_install/templates/adq-cluster-role-binding.yml.j2 new file mode 100644 index 00000000..d1fd7ed3 --- /dev/null +++ 
b/roles/adq_dp_install/templates/adq-cluster-role-binding.yml.j2 @@ -0,0 +1,13 @@ +--- +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: adq +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: adq +subjects: +- kind: ServiceAccount + name: adq + namespace: {{ adq_dp_namespace }} diff --git a/roles/adq_dp_install/templates/adq-cni-dp-ds.yml.j2 b/roles/adq_dp_install/templates/adq-cni-dp-ds.yml.j2 new file mode 100644 index 00000000..4da76b66 --- /dev/null +++ b/roles/adq_dp_install/templates/adq-cni-dp-ds.yml.j2 @@ -0,0 +1,141 @@ +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: adq-cni-dp + namespace: {{ adq_dp_namespace }} +spec: + selector: + matchLabels: + name: adq-cni-dp + template: + metadata: + labels: + name: adq-cni-dp + spec: + tolerations: + - operator: Exists + hostNetwork: true + hostPID: true + serviceAccountName: adq + initContainers: + - name: install-cni + image: {{ registry_local_address }}/adq-cni-dp:{{ intel_adq_dp_version }} + command: [ "sh", "-c", "cp -f /adq-cni /opt/cni/bin/adq-cni" ] + securityContext: + runAsUser: 0 + readOnlyRootFilesystem: true + volumeMounts: + - name: cni + mountPath: /opt/cni/bin + - name: configs + image: {{ registry_local_address }}/adq-cni-dp:{{ intel_adq_dp_version }} + command: ["sh", "-c", "/entrypoint.sh"] + securityContext: + runAsUser: 0 + readOnlyRootFilesystem: true + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + volumeMounts: + - name: cni-cfg + mountPath: /host/etc/cni/net.d/ + - name: adq-cluster-config + mountPath: /etc/adq + - name: adqsetup-config + mountPath: /adqsetup-config + - name: configs-tmp + mountPath: /configs-tmp + - name: adqsetup + image: {{ registry_local_address }}/adqsetup:{{ intel_adq_dp_version }} + command: ["adqsetup", "apply", "/adqsetup-config/adqsetup.conf"] + securityContext: + runAsUser: 0 + readOnlyRootFilesystem: true + privileged: true + volumeMounts: + - name: adqsetup-config + mountPath: /adqsetup-config + readOnly: true + containers: + - name: deviceplugin + image: {{ registry_local_address }}/adq-cni-dp:{{ intel_adq_dp_version }} + args: + - -reconcile-period=35s + securityContext: + runAsUser: 0 + readOnlyRootFilesystem: true + volumeMounts: + - name: device-plugins + mountPath: /var/lib/kubelet/device-plugins + - name: sys-devices + mountPath: /sys/devices/ + readOnly: true + - name: sys-class-net + mountPath: /sys/class/net/ + readOnly: true + - name: cni-cfg + mountPath: /etc/cni/net.d/ + readOnly: true + - name: adq-netprio + image: {{ registry_local_address }}/adq-cni-dp:{{ intel_adq_dp_version }} + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + securityContext: + runAsUser: 0 + readOnlyRootFilesystem: true + command: ["/adq-netprio"] + args: + - -cni-config-path=/etc/cni/net.d/05-cilium.conflist + - -reconcile-period=2s + volumeMounts: + - mountPath: /sys/fs/cgroup + name: cgroupfs + - mountPath: /var/lib/kubelet/pod-resources/kubelet.sock + name: pod-resources + readOnly: true + - name: cni-cfg + mountPath: /etc/cni/net.d/ + readOnly: true + - name: kubelet-pki + mountPath: /var/lib/kubelet/pki/kubelet.crt + readOnly: true + volumes: + - name: device-plugins + hostPath: + path: /var/lib/kubelet/device-plugins + - name: pod-resources + hostPath: + path: /var/lib/kubelet/pod-resources/kubelet.sock + type: Socket + - name: cni + hostPath: + path: /opt/cni/bin + - name: cni-cfg + hostPath: + path: /etc/cni/net.d/ + - name: sys-devices + hostPath: + path: 
/sys/devices/ + - name: sys-class-net + hostPath: + path: /sys/class/net/ + - name: cgroupfs + hostPath: + path: /sys/fs/cgroup + - name: kubelet-pki + hostPath: + path: /var/lib/kubelet/pki/kubelet.crt + type: File + - name: adq-cluster-config + configMap: + name: adq-cluster-config + - name: adqsetup-config + emptyDir: {} + - name: configs-tmp + emptyDir: {} diff --git a/roles/adq_dp_install/templates/adq-service-account.yml.j2 b/roles/adq_dp_install/templates/adq-service-account.yml.j2 new file mode 100644 index 00000000..38e04aea --- /dev/null +++ b/roles/adq_dp_install/templates/adq-service-account.yml.j2 @@ -0,0 +1,6 @@ +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: adq + namespace: {{ adq_dp_namespace }} diff --git a/roles/bond_cni_install/defaults/main.yml b/roles/bond_cni_install/defaults/main.yml index be576e51..ee2d3804 100644 --- a/roles/bond_cni_install/defaults/main.yml +++ b/roles/bond_cni_install/defaults/main.yml @@ -17,4 +17,3 @@ bond_cni_git_url: "https://github.com/intel/bond-cni.git" bond_cni_dir: "{{ (project_root_dir, 'bond-cni') | path_join }}" bond_cni_version: "eca3b06bec744444ee6e95dcf1e304f50a56f85e" - diff --git a/roles/bootstrap/allocate_cpus/tasks/main.yml b/roles/bootstrap/allocate_cpus/tasks/main.yml new file mode 100644 index 00000000..af4e890f --- /dev/null +++ b/roles/bootstrap/allocate_cpus/tasks/main.yml @@ -0,0 +1,44 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+##
+---
+- name: Allocate requested number of CPUs
+  cpupin:
+    name: "{{ item.name }}"
+    number: "{{ item.cpu_total if item.cpu_total is defined else omit }}"
+    cpus: "{{ item.cpus if item.cpus is defined else omit }}"
+    numa: "{{ item.numa if item.numa is defined else omit }}"
+    number_host_os: "{{ cpu_host_os if cpu_host_os is defined else omit }}"
+    alloc_all: "{{ item.alloc_all if item.alloc_all is defined else omit }}"
+    pinning: false
+  loop: "{{ vms }}"
+  changed_when: true
+  register: allocated_cpus
+  throttle: 1
+
+- name: Initialize cpupin_vms variable
+  set_fact:
+    cpupin_vms: []
+  changed_when: true
+
+- name: Merge data structures
+  include: merge_dicts.yml
+  loop: "{{ vms }}"
+  loop_control:
+    loop_var: vm
+
+- name: Debug allocated_cpus
+  debug:
+    var: allocated_cpus
diff --git a/roles/vm/manage_vms/tasks/merge_dicts.yml b/roles/bootstrap/allocate_cpus/tasks/merge_dicts.yml
similarity index 100%
rename from roles/vm/manage_vms/tasks/merge_dicts.yml
rename to roles/bootstrap/allocate_cpus/tasks/merge_dicts.yml
diff --git a/roles/bootstrap/apply_intel_pstate/tasks/main.yml b/roles/bootstrap/apply_intel_pstate/tasks/main.yml
index ca0c8d57..1a3f1beb 100644
--- a/roles/bootstrap/apply_intel_pstate/tasks/main.yml
+++ b/roles/bootstrap/apply_intel_pstate/tasks/main.yml
@@ -17,9 +17,14 @@
 - name: determine machine type
   include_role:
     name: check_machine_type
+  when:
+    - inventory_hostname in groups['kube_node'] or
+      inventory_hostname in groups['vm_host']
+    - not on_vms | default (false)

 - name: setup turbo boost
   include_tasks: setup_turbo.yml
   when:
     - turbo_boost_enabled is defined
-    - is_clx_nsku or is_icx or is_spr
+    - not on_vms | default (false)
+    - is_clx_ncpu or is_icx or is_spr
diff --git a/roles/bootstrap/apply_intel_pstate/tasks/setup_turbo.yml b/roles/bootstrap/apply_intel_pstate/tasks/setup_turbo.yml
index 714b025e..4fb098bb 100644
--- a/roles/bootstrap/apply_intel_pstate/tasks/setup_turbo.yml
+++ b/roles/bootstrap/apply_intel_pstate/tasks/setup_turbo.yml
@@ -19,8 +19,9 @@
     name: install_dependencies

 - name: check CPU/BIOS status for Intel Turbo Boost Technology
-  shell:
-    cmd: cpuid | grep -i turbo | grep 'Intel Turbo Boost Technology'
+  shell: "set -o pipefail && cpuid | grep 'Intel Turbo Boost Technology'"
+  args:
+    executable: /bin/bash
   register: turbo_details
   changed_when: false

@@ -37,12 +38,12 @@
 - name: configure Intel Turbo Boost Technology
   block:
     # returned 1 indicates turbo is now off and returned 0 indicates turbo is now on
-    - name: set turbo boost is enabled
+    - name: set turbo boost is disabled
       set_fact:
         turbo_value: 1
       when: not turbo_boost_enabled

-    - name: set turbo boost is disabled
+    - name: set turbo boost is enabled
       set_fact:
         turbo_value: 0
       when: turbo_boost_enabled

@@ -51,6 +52,7 @@
       shell: "echo {{ turbo_value }} > {{ intel_turbo_path }}"
       args:
         executable: /bin/bash
+      changed_when: false
       when: turbo_bios_enabled

     - name: configuration cannot be continued
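For context, the 0/1 values written above follow the intel_pstate convention, assuming `intel_turbo_path` points at the `no_turbo` sysfs knob (an assumption based on the variable name; the role only carries the path as a variable):

```bash
# no_turbo is inverted relative to "turbo enabled":
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo   # Turbo Boost on
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo   # Turbo Boost off
```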
diff --git a/roles/bootstrap/apply_kubernetes_reqs/tasks/main.yml b/roles/bootstrap/apply_kubernetes_reqs/tasks/main.yml
index deedc78c..bf869970 100644
--- a/roles/bootstrap/apply_kubernetes_reqs/tasks/main.yml
+++ b/roles/bootstrap/apply_kubernetes_reqs/tasks/main.yml
@@ -17,7 +17,7 @@
 - name: comment swap lines in /etc/fstab
   replace:
     path: /etc/fstab
-    regexp: '(.*swap.*)$'
+    regexp: '^([^#].*swap.*)$'
     replace: '# \1'
     mode: 0600
   become: yes
@@ -65,27 +65,11 @@
   become: yes
   failed_when: false

-- name: check if IP forwarding already enabled
-  command: sysctl net.ipv4.ip_forward -b
-  register: is_ip_forward_enabled
-  changed_when: false
-
-- name: enable IP forwarding on the fly
-  command: sysctl -w net.ipv4.ip_forward=1
-  become: yes
-  changed_when: true
-  when: is_ip_forward_enabled.stdout != "1"
-
 - name: add IP forwarding to sysctl.conf
-  lineinfile:
-    path: /etc/sysctl.conf
-    line: 'net.ipv4.ip_forward = 1'
-    regexp: '^net\.ipv4\.ip_forward'
+  sysctl:
+    name: net.ipv4.ip_forward
+    value: "1"
+    sysctl_set: yes
+    sysctl_file: "/etc/sysctl.d/99-sysctl.conf"
     state: present
-    mode: 0600
-  become: yes
-
-- name: apply sysctl.conf
-  command: sysctl -p /etc/sysctl.conf
-  become: yes
-  changed_when: true
+    reload: yes
diff --git a/roles/bootstrap/configure_cpu_isolation/tasks/get_required_cpus.yml b/roles/bootstrap/configure_cpu_isolation/tasks/get_required_cpus.yml
new file mode 100644
index 00000000..e4c9e8df
--- /dev/null
+++ b/roles/bootstrap/configure_cpu_isolation/tasks/get_required_cpus.yml
@@ -0,0 +1,33 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: Set total cpus_required
+  set_fact:
+    isolcpus_cpus_total: "{{ isolcpus_cpus_total | default(0) | int + item.cpu_total | int }}"
+  loop: "{{ vms }}"
+
+- name: Debug total
+  debug:
+    var: isolcpus_cpus_total
+
+- name: Set content of temporary isolcpus
+  set_fact:
+    tmp_isolcpus: "{{ tmp_isolcpus | default() + ',' + item.cpus }}"
+  loop: "{{ allocated_cpus.results }}"
+
+- name: Set content of new isolcpus
+  set_fact:
+    isolcpus: "{{ tmp_isolcpus | regex_replace('^,(.*)$', '\\1') }}"
diff --git a/roles/bootstrap/configure_cpu_isolation/tasks/main.yml b/roles/bootstrap/configure_cpu_isolation/tasks/main.yml
index b57a439e..f3195fad 100644
--- a/roles/bootstrap/configure_cpu_isolation/tasks/main.yml
+++ b/roles/bootstrap/configure_cpu_isolation/tasks/main.yml
@@ -14,5 +14,24 @@
 ## limitations under the License.
 ##
 ---
-- name: setup CPU isolation
+- name: Get CPUs for CPU isolation in case of VMRA
+  block:
+    - name: Set isolcpus_cpus_total variable
+      set_fact:
+        isolcpus_cpus_total: ""
+
+    - name: Allocate cpus
+      include_role:
+        name: bootstrap/allocate_cpus
+
+    - name: Get vms cpus
+      include_tasks: get_required_cpus.yml
+
+    - name: Debug cpupin_vms
+      debug:
+        var: cpupin_vms
+  when:
+    - vm_enabled and (not on_vms | default(false))
+
+- name: Setup CPU isolation
   include_tasks: setup_isolcpus.yml
diff --git a/roles/bootstrap/configure_cpu_isolation/tasks/setup_isolcpus.yml b/roles/bootstrap/configure_cpu_isolation/tasks/setup_isolcpus.yml
index 28d2d52e..ac900226 100644
--- a/roles/bootstrap/configure_cpu_isolation/tasks/setup_isolcpus.yml
+++ b/roles/bootstrap/configure_cpu_isolation/tasks/setup_isolcpus.yml
@@ -14,33 +14,36 @@
 ## limitations under the License.
## --- -- name: validate isolcpus settings - assert: - that: isolcpus | length > 0 - fail_msg: "CPU isolation enabled, but list of CPUs to isolate is not defined, make sure that 'isolcpus' variable is set" +- block: + - name: Validate isolcpus settings + assert: + that: isolcpus | length > 0 + fail_msg: "CPU isolation enabled, but list of CPUs to isolate is not defined, make sure that 'isolcpus' variable is set" -- name: load present CPUs range - command: cat /sys/devices/system/cpu/present - become: yes - changed_when: false - register: cpus_present_file + - name: Load present CPUs range + command: cat /sys/devices/system/cpu/present + become: true + changed_when: false + register: cpus_present_file -- name: get range of all available CPUs - set_fact: - cpus_present: "{{ cpus_present_file.stdout }}" + - name: Get range of all available CPUs + set_fact: + cpus_present: "{{ cpus_present_file.stdout }}" -- name: validate provided isolcpus value - action: validate_isolcpus + - name: Validate provided isolcpus value + action: validate_isolcpus + when: + - not vm_enabled -- name: set isolcpus flag +- name: Set isolcpus flag set_fact: isolcpus_flags: "isolcpus={{ isolcpus }} rcu_nocbs={{ isolcpus }} nohz_full={{ isolcpus }} nr_cpus={{ ansible_processor_vcpus }}" -- name: prepare CPU isolation grub commandline string +- name: Prepare CPU isolation grub commandline string set_fact: cpu_isolation_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} {{ isolcpus_flags }}" {{ isolcpus_marker }}' -- name: set CPU isolation flags in /etc/default/grub +- name: Set CPU isolation flags in /etc/default/grub lineinfile: dest: /etc/default/grub regexp: '^GRUB_CMDLINE_LINUX="\${GRUB_CMDLINE_LINUX}(.*?)" {{ isolcpus_marker }}$' diff --git a/roles/bootstrap/configure_dlb/defaults/main.yml b/roles/bootstrap/configure_dlb/defaults/main.yml new file mode 100644 index 00000000..a9d97cf3 --- /dev/null +++ b/roles/bootstrap/configure_dlb/defaults/main.yml @@ -0,0 +1,19 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +intel_dlb_driver_ver: "dlb_linux_src_release_7.7.0_2022_06_17" +intel_dlb_driver_url: "https://downloadmirror.intel.com/734482/{{ intel_dlb_driver_ver }}.txz" +intel_dlb_driver_checksum: "sha1:253984E464597F6DCCA8DBBA439FA980B93B8180" diff --git a/roles/bootstrap/configure_dlb/tasks/main.yml b/roles/bootstrap/configure_dlb/tasks/main.yml new file mode 100644 index 00000000..b80cdc7f --- /dev/null +++ b/roles/bootstrap/configure_dlb/tasks/main.yml @@ -0,0 +1,113 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: install dependencies for Intel DLB driver
+  include_role:
+    name: install_dependencies
+
+# dependencies are not limited to packages
+- name: insert mdev module
+  modprobe:
+    name: mdev
+    state: present
+
+- name: load mdev module on boot
+  lineinfile:
+    path: /etc/modules-load.d/mdev.conf
+    line: mdev
+    create: yes
+    mode: 0644
+  become: yes
+
+# build and install Intel DLB driver
+- name: download DLB driver
+  become: yes
+  get_url:
+    url: "{{ intel_dlb_driver_url }}"
+    dest: "{{ project_root_dir }}"
+    checksum: "{{ intel_dlb_driver_checksum }}"
+    timeout: 60
+    mode: 0644
+  register: dlb_download
+  until: dlb_download is not failed
+  retries: 5
+
+- name: untar DLB driver on Ubuntu
+  unarchive:
+    src: "{{ dlb_download.dest }}"
+    dest: "{{ project_root_dir }}"
+    list_files: yes
+    remote_src: yes
+    mode: 0774
+  become: yes
+  when: ansible_os_family == "Debian"
+
+# Ansible's built-in unarchive does not work as expected on RHEL / Rocky, so shell is used as an alternative
+- name: extract DLB driver package on RHEL / Rocky
+  shell: "tar --xz -xf {{ intel_dlb_driver_ver }}.txz" # noqa command-instead-of-module 305
+  args:
+    chdir: "{{ project_root_dir }}"
+    executable: /bin/bash
+  when: ansible_os_family == "RedHat"
+
+- name: build Intel DLB driver
+  make:
+    chdir: "{{ project_root_dir }}/dlb/driver/dlb2"
+
+- name: check if DLB module is loaded
+  command: lsmod
+  register: dlb_module
+  failed_when: false
+  changed_when: false
+
+- name: insert DLB module
+  command: insmod dlb2.ko
+  args:
+    chdir: "{{ project_root_dir }}/dlb/driver/dlb2"
+  when: "'dlb' not in dlb_module.stdout"
+
+- name: link dlb2 module to kernel drivers
+  file:
+    state: link
+    src: "{{ project_root_dir }}/dlb/driver/dlb2"
+    dest: "/usr/lib/modules/{{ ansible_kernel }}/kernel/drivers/dlb2"
+    force: yes
+    mode: 0644
+
+- name: setup DLB module loading on boot
+  lineinfile:
+    path: /etc/modules-load.d/dlb2.conf
+    line: dlb2
+    create: yes
+    mode: 0644
+  become: yes
+
+- name: check if DLB devices are present on the system
+  find:
+    path: /dev
+    file_type: any
+    use_regex: yes
+    pattern: "^(dlb)[0-9]*$" # devices have to start with "dlb" followed by the ID at the end
+  register: dlb_devices
+
+- name: assert DLB devices presence
+  assert:
+    that:
+      - dlb_devices.matched > 0
+    fail_msg:
+      - "Intel DLB devices cannot be configured."
+      - "If the failure persists, please consider updating your kernel to version 5.12 or 5.13."
+      - "If the above solutions are not working for you, please contact the owner of the code via GitHub issues."
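A quick manual spot-check of the end state this role drives toward (a sketch; the names follow the tasks above):

```bash
lsmod | grep dlb2    # the module built from the downloaded sources and inserted above
ls /dev/dlb*         # device nodes matching the "^(dlb)[0-9]*$" pattern the assert checks
```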
diff --git a/roles/cert_manager_install/defaults/main.yml b/roles/bootstrap/configure_dlb/vars/main.yml
similarity index 84%
rename from roles/cert_manager_install/defaults/main.yml
rename to roles/bootstrap/configure_dlb/vars/main.yml
index 27383a33..2036fe02 100644
--- a/roles/cert_manager_install/defaults/main.yml
+++ b/roles/bootstrap/configure_dlb/vars/main.yml
@@ -14,6 +14,10 @@
 ## limitations under the License.
 ##
 ---
-cert_manager_namespace: "cert-manager"
-cert_manager_version: "v1.5.1"
-cert_manager_repo: "https://charts.jetstack.io"
+install_dependencies:
+  Debian:
+    - make
+    - gcc
+  RedHat:
+    - make
+    - gcc
diff --git a/roles/bootstrap/configure_dsa/defaults/main.yml b/roles/bootstrap/configure_dsa/defaults/main.yml
new file mode 100644
index 00000000..c323e5cf
--- /dev/null
+++ b/roles/bootstrap/configure_dsa/defaults/main.yml
@@ -0,0 +1,21 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+idxd_config_git_url: "https://github.com/intel/idxd-config.git"
+idxd_config_dir: "/usr/src/idxd-config"
+idxd_config_git_ref: "accel-config-v3.4.6.4"
+
+dsa_devices_dir: "/sys/bus/dsa/devices/"
diff --git a/roles/bootstrap/configure_dsa/files/dsa_config.service b/roles/bootstrap/configure_dsa/files/dsa_config.service
new file mode 100644
index 00000000..6a7629e1
--- /dev/null
+++ b/roles/bootstrap/configure_dsa/files/dsa_config.service
@@ -0,0 +1,11 @@
+[Unit]
+Description=Intel Container Experience Kits accel-config configuration loading for DSA devices
+AssertPathExists=/etc/accel-config/accel-config.conf
+
+[Service]
+Type=oneshot
+ExecStartPre=/bin/sleep 40
+ExecStart=accel-config load-config -e -f
+
+[Install]
+WantedBy=multi-user.target
diff --git a/roles/bootstrap/configure_dsa/tasks/dsa_custom_config.yml b/roles/bootstrap/configure_dsa/tasks/dsa_custom_config.yml
new file mode 100644
index 00000000..aa1c788c
--- /dev/null
+++ b/roles/bootstrap/configure_dsa/tasks/dsa_custom_config.yml
@@ -0,0 +1,133 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+# make sure DSA device is disabled before configuring
+- name: reset {{ dsa_device.name }} device
+  command: accel-config disable-device {{ dsa_device.name }}
+  failed_when: false
+  changed_when: false
+
+# amount of groups to which WQs can be assigned
+- name: get max groups
+  command: cat {{ dsa_devices_dir }}/{{ dsa_device.name }}/max_groups
+  changed_when: false
+  register: max_groups
+
+- name: fail when more groups were requested than available on device
+  fail:
+    msg: "Requested {{ dsa_device.groups }} groups, but only {{ max_groups.stdout }} are available on the device.
+          Please update the dsa_devices list for {{ dsa_device.name }} in host_vars."
+  when: max_groups.stdout | int < dsa_device.groups
+
+# amount of engines on DSA device - each group has unique engine
+- name: get max engines
+  command: cat {{ dsa_devices_dir }}/{{ dsa_device.name }}/max_engines
+  changed_when: false
+  register: max_engines
+
+- name: fail when more engines were requested than available on device
+  fail:
+    msg: "Requested {{ dsa_device.engines }} engines, but only {{ max_engines.stdout }} are available on the device.
+          Please update the dsa_devices list for {{ dsa_device.name }} in host_vars."
+  when: max_engines.stdout | int < dsa_device.engines
+
+- name: get dsa device id
+  set_fact:
+    dsa_dev_id: "{{ dsa_device.name[-1] }}"
+
+# make sure engines are not assigned to groups
+- name: reset all engines
+  command: accel-config config-engine {{ dsa_device.name }}/engine{{ dsa_dev_id }}.{{ engine_id }} --group-id=-1
+  failed_when: false
+  changed_when: false
+  with_sequence: start=0 end="{{ max_engines.stdout | int - 1 }}"
+  loop_control:
+    loop_var: engine_id
+
+# amount of WQs that can be configured on DSA device
+- name: get max work queues
+  command: cat {{ dsa_devices_dir }}/{{ dsa_device.name }}/max_work_queues
+  changed_when: false
+  register: max_work_queues
+
+- name: fail when more WQs were requested than available on device
+  fail:
+    msg: "Requested {{ dsa_device.wqs | length }} work queues, but {{ max_work_queues.stdout }} are available on the device.
+          Please update the dsa_devices list for {{ dsa_device.name }} in host_vars."
+  when: max_work_queues.stdout | int < dsa_device.wqs | length
+
+# used to calculate max size of single WQ
+- name: get max work queues size
+  command: cat {{ dsa_devices_dir }}/{{ dsa_device.name }}/max_work_queues_size
+  changed_when: false
+  register: max_work_queues_size
+
+- name: calculate max size of single wq on {{ dsa_device.name }} device
+  set_fact:
+    max_wq_size: "{{ (max_work_queues_size.stdout | int / dsa_device.wqs | length | int) }}"
+
+- name: get pasid value
+  command: cat {{ dsa_devices_dir }}/{{ dsa_device.name }}/pasid_enabled
+  changed_when: false
+  register: pasid_enabled
+
+- name: fail when PASID is disabled, but Shared WQs were requested
+  fail:
+    msg: "Could not configure Shared WQ with ID {{ wq.id }} on device {{ dsa_device.name }} - PASID is disabled.
+          Please make sure the IOMMU is configured with intel_iommu=on,sm_on in GRUB_CMDLINE_LINUX"
+  loop: "{{ dsa_device.wqs }}"
+  loop_control:
+    loop_var: wq
+  when:
+    - wq.mode == 'shared'
+    - pasid_enabled.stdout == '0'
+
+# check the max number of groups that can be configured
+- name: determine number of groups
+  set_fact:
+    number_of_groups: "{{ [dsa_device.engines, dsa_device.groups, dsa_device.wqs | length] | min }}"
+
+# NOTE(pklimowx): can more than one engine be assigned to same group?
+# if yes, then it's worth considering creating a list (like wqs) inside +# dsa_devices in host_vars to hold the engine-related config +# configure one engine per group +- name: configure {{ dsa_device.name }} engines + command: accel-config config-engine {{ dsa_device.name }}/engine{{ dsa_dev_id }}.{{ engine_id }} --group-id={{ engine_id }} + changed_when: false + with_sequence: start=0 end="{{ number_of_groups | int - 1 }}" + loop_control: + loop_var: engine_id + +- name: configure Work Queues + include_tasks: wqs_custom_config.yml + vars: + WQ: "{{ work_queue }}" # noqa var-naming + dsa_id: "{{ dsa_dev_id }}" + max_single_wq_size: "{{ max_wq_size | int }}" + loop: "{{ dsa_device.wqs }}" + loop_control: + loop_var: work_queue + +- name: enable device {{ dsa_device.name }} + command: accel-config enable-device {{ dsa_device.name }} + changed_when: false + +- name: enable all configured WQs + command: accel-config enable-wq {{ dsa_device.name }}/wq{{ dsa_dev_id }}.{{ enabled_wq_id }} + changed_when: false + with_sequence: start=0 end="{{ dsa_device.wqs | length - 1 }}" + loop_control: + loop_var: enabled_wq_id diff --git a/roles/bootstrap/configure_dsa/tasks/dsa_default_config.yml b/roles/bootstrap/configure_dsa/tasks/dsa_default_config.yml new file mode 100644 index 00000000..1d102386 --- /dev/null +++ b/roles/bootstrap/configure_dsa/tasks/dsa_default_config.yml @@ -0,0 +1,96 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License.
+## +--- +# make sure DSA device is disabled before configuring +- name: reset dsa{{ dsa_id }} device + command: accel-config disable-device dsa{{ dsa_id }} + failed_when: false + changed_when: false + +# amount of groups to which WQs can be assigned +- name: get max groups + command: cat {{ dsa_devices_dir }}/dsa{{ dsa_id }}/max_groups + changed_when: false + register: max_groups + +# amount of engines on DSA device - each group has unique engine +- name: get max engines + command: cat {{ dsa_devices_dir }}/dsa{{ dsa_id }}/max_engines + changed_when: false + register: max_engines + +# make sure engines are not assigned to groups +- name: reset all engines + command: accel-config config-engine dsa{{ dsa_id }}/engine{{ dsa_id }}.{{ engine_id }} --group-id=-1 + failed_when: false + changed_when: false + with_sequence: start=0 end="{{ max_engines.stdout | int - 1 }}" + loop_control: + loop_var: engine_id + +# amount of WQs that can be configured on DSA device +- name: get max work queues + command: cat {{ dsa_devices_dir }}/dsa{{ dsa_id }}/max_work_queues + changed_when: false + register: max_work_queues + +# sum of sizes of all WQs should be at most equal to max_work_queues_size - get that max value +- name: get max work queues size + command: cat {{ dsa_devices_dir }}/dsa{{ dsa_id }}/max_work_queues_size + changed_when: false + register: max_work_queues_size + +- name: determine number of groups + set_fact: + number_of_groups: "{{ [max_engines.stdout | int, max_groups.stdout | int, max_work_queues.stdout | int] | min }}" + +# configure one engine per group, and no more engines than queues +- name: configure dsa{{ dsa_id }} engines + command: accel-config config-engine dsa{{ dsa_id }}/engine{{ dsa_id }}.{{ engine_id }} --group-id={{ engine_id }} + changed_when: false + with_sequence: start=0 end="{{ number_of_groups | int - 1 }}" + loop_control: + loop_var: engine_id + +- name: calculate single Work Queue size + set_fact: + single_wq_size: "{{ (max_work_queues_size.stdout | int / max_work_queues.stdout | int) }}" # all WQs are the same size + +# WQ priorities are staggered across queues; max priority is 15, which is why the 'else' value is provided +- name: configure Dedicated Work Queues + command: >- + accel-config config-wq dsa{{ dsa_id }}/wq{{ dsa_id }}.{{ dwq_id }} + --group-id="{{ dwq_id | int % number_of_groups | int }}" + --mode=dedicated + --priority={{ dwq_id | int * 2 + 1 if dwq_id | int <= 7 else 5 }} + --wq-size={{ single_wq_size | int }} + --type=user + --name=dedicated-queue-{{ dsa_id }}.{{ dwq_id }} + with_sequence: start=0 end="{{ max_work_queues.stdout | int - 1 }}" + changed_when: true + loop_control: + loop_var: dwq_id + +- name: enable device dsa{{ dsa_id }} + command: accel-config enable-device dsa{{ dsa_id }} + changed_when: false + +- name: enable all configured WQs + command: accel-config enable-wq dsa{{ dsa_id }}/wq{{ dsa_id }}.{{ enabled_wq_id }} + changed_when: false + with_sequence: start=0 end="{{ max_work_queues.stdout | int - 1 }}" + loop_control: + loop_var: enabled_wq_id diff --git a/roles/bootstrap/configure_dsa/tasks/install_accel_config.yml b/roles/bootstrap/configure_dsa/tasks/install_accel_config.yml new file mode 100644 index 00000000..f4a6cce2 --- /dev/null +++ b/roles/bootstrap/configure_dsa/tasks/install_accel_config.yml @@ -0,0 +1,43 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: clone accel-config tool repository + git: + repo: "{{ idxd_config_git_url }}" + dest: "{{ idxd_config_dir }}" + version: "{{ idxd_config_git_ref }}" + force: yes + +- name: prepare files for building accel-config tool + command: "{{ item }}" + changed_when: false + args: + chdir: "{{ idxd_config_dir }}" + with_items: + - ./autogen.sh + - ./configure CFLAGS='-g -O2' --prefix=/usr --sysconfdir=/etc --libdir=/usr/lib64 + +- name: build accel-config tool (1/2) + make: + chdir: "{{ idxd_config_dir }}" + +- name: build accel-config tool (2/2) + make: + target: "{{ item }}" + chdir: "{{ idxd_config_dir }}" + with_items: + - check + - install diff --git a/roles/bootstrap/configure_dsa/tasks/main.yml b/roles/bootstrap/configure_dsa/tasks/main.yml new file mode 100644 index 00000000..eb6e5212 --- /dev/null +++ b/roles/bootstrap/configure_dsa/tasks/main.yml @@ -0,0 +1,73 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install dependencies for Intel DSA devices + include_role: + name: install_dependencies + +- name: install accel-config tool + include_tasks: install_accel_config.yml + +- name: get number of DSA devices + find: + paths: "{{ dsa_devices_dir }}" + file_type: any + use_regex: yes + patterns: + - '^(dsa)\w' + register: found_dsa_devices + +- name: apply default configuration for DSA devices + include_tasks: dsa_default_config.yml + vars: + dsa_id: "{{ item.path[-1] }}" + with_items: "{{ found_dsa_devices.files }}" + when: + - configure_dsa_devices | default(false) | bool + - dsa_devices | default([]) | length | int == 0 + +- name: fail if configured number of DSA devices is greater than actual number of DSA devices on the node + fail: + msg: "Max supported DSA devices by node is {{ found_dsa_devices.matched }}, but configuration for {{ dsa_devices | length }} was provided. + Please update dsa_devices list in host_vars." 
+ when: dsa_devices | default([]) | length > found_dsa_devices.matched + +- name: apply custom configuration for DSA devices + include_tasks: dsa_custom_config.yml + vars: + dsa_device: "{{ item }}" + loop: "{{ dsa_devices }}" + when: + - configure_dsa_devices | default(false) | bool + - dsa_devices | default([]) | length > 0 + +- name: save accel-config configuration + command: accel-config save-config + changed_when: true + +- name: create systemd unit file + copy: + src: "{{ (role_path, 'files', 'dsa_config.service') | path_join }}" + dest: /lib/systemd/system/dsa_config.service + owner: root + group: root + mode: '0644' + +- name: ensure that systemd service is enabled + systemd: + name: dsa_config + enabled: yes + daemon_reload: yes diff --git a/roles/bootstrap/configure_dsa/tasks/wqs_custom_config.yml b/roles/bootstrap/configure_dsa/tasks/wqs_custom_config.yml new file mode 100644 index 00000000..fc4ed4af --- /dev/null +++ b/roles/bootstrap/configure_dsa/tasks/wqs_custom_config.yml @@ -0,0 +1,81 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: check WQ size value + fail: + msg: "The max size of single WQ is {{ max_single_wq_size }}, but for WQ{{ dsa_id }}.{{ WQ.id }} value {{ WQ.size }} was provided." + when: WQ.size | int > max_single_wq_size | int + +- name: check WQ threshold value + fail: + msg: "Wrong threshold value. Possible reasons are: threshold is defined and WQ mode is not shared, threshold is >= WQ size. + Please check these settings for WQ{{ dsa_id }}.{{ WQ.id }}" + when: + - (WQ.threshold is defined and WQ.mode == 'dedicated') or (WQ.threshold is defined and WQ.threshold >= WQ.size) + +- name: check WQ priority value + fail: + msg: "Valid range for priority is from 1 to 15, but got {{ WQ.prio }} for WQ{{ dsa_id }}.{{ WQ.id }}. Please update the config list." + when: WQ.prio < 1 or WQ.prio > 15 + +- name: check WQ type value + fail: + msg: "Valid types are: kernel, user, but '{{ WQ.type }}' provided for WQ{{ dsa_id }}.{{ WQ.id }}. Please update the config list." + when: WQ.type not in ['kernel', 'user'] + +- name: check WQ group id value + fail: + msg: "Valid group IDs are from 0 to {{ max_groups.stdout | int - 1 }}, but {{ WQ.group_id }} was provided. + Please update config for WQ{{ dsa_id }}.{{ WQ.id }}." + when: WQ.group_id < 0 or WQ.group_id > (max_groups.stdout | int - 1) + +- name: check WQ block_on_fault value + fail: + msg: "block_on_fault should be either 0 or 1, but {{ WQ.block_on_fault }} was provided. + Please update config for WQ{{ dsa_id }}.{{ WQ.id }}." + when: WQ.block_on_fault not in [0, 1] + +# NOTE(pklimowx): consider unification of wq configuration tasks. For now the accel-config tool +# will fail when trying to write 'shared' into the mode param explicitly (all WQs are shared by default). +# Once that is fixed, the two tasks can be merged into one.
+# (threshold can be set to -1 for Dedicated WQ using python style if-else) +- name: configure Dedicated Work Queues + command: >- + accel-config config-wq {{ dsa_device.name }}/wq{{ dsa_id }}.{{ WQ.id }} + --group-id={{ WQ.group_id }} + --mode={{ WQ.mode }} + --priority={{ WQ.prio }} + --wq-size={{ WQ.size }} + --max-batch-size={{ WQ.max_batch_size }} + --max-transfer-size={{ WQ.max_transfer_size }} + --block-on-fault={{ WQ.block_on_fault }} + --type={{ WQ.type }} + --name={{ WQ.mode }}-queue-{{ dsa_id }}.{{ WQ.id }} + when: WQ.mode == 'dedicated' + +- name: configure Shared Work Queues + command: >- + accel-config config-wq {{ dsa_device.name }}/wq{{ dsa_id }}.{{ WQ.id }} + --group-id={{ WQ.group_id }} + --threshold={{ WQ.threshold }} + --priority={{ WQ.prio }} + --wq-size={{ WQ.size }} + --max-batch-size={{ WQ.max_batch_size }} + --max-transfer-size={{ WQ.max_transfer_size }} + --block-on-fault={{ WQ.block_on_fault }} + --type={{ WQ.type }} + --name={{ WQ.mode }}-queue-{{ dsa_id }}.{{ WQ.id }} + when: WQ.mode == 'shared' diff --git a/roles/bootstrap/configure_dsa/vars/main.yml b/roles/bootstrap/configure_dsa/vars/main.yml new file mode 100644 index 00000000..17e9ec40 --- /dev/null +++ b/roles/bootstrap/configure_dsa/vars/main.yml @@ -0,0 +1,58 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - asciidoc + - asciidoctor + - autoconf + - automake + - autotools-dev + - build-essential + - debhelper + - debmake + - devscripts + - fakeroot + - file + - git + - gnupg + - libjson-c-dev + - libkeyutils-dev + - libkmod-dev + - libtool + - lintian + - make + - patch + - patchutils + - pkgconf + - quilt + - uuid-dev + - xmlto + RedHat: + - "@Development Tools" + - asciidoc + - autoconf + - automake + - git + - json-c-devel + - kmod-devel + - libtool + - libuuid-devel + - make + - pkgconf + - rpm-build + - rpmdevtools + - xmlto diff --git a/roles/bootstrap/configure_hugepages/tasks/setup_hugepages.yml b/roles/bootstrap/configure_hugepages/tasks/setup_hugepages.yml index 85a2fca4..1a612e76 100644 --- a/roles/bootstrap/configure_hugepages/tasks/setup_hugepages.yml +++ b/roles/bootstrap/configure_hugepages/tasks/setup_hugepages.yml @@ -16,13 +16,13 @@ --- - name: calculate total number of requested hugepages set_fact: - mem_huge_2M: "{{ 2 * number_of_hugepages_2M }}" - mem_huge_1G: "{{ 1024 * number_of_hugepages_1G }}" + mem_huge_2M: "{{ 2 * number_of_hugepages_2M }}" # noqa var-naming + mem_huge_1G: "{{ 1024 * number_of_hugepages_1G }}" # noqa var-naming - name: check if there is enough memory on the target system for the requested hugepages assert: - that: (mem_huge_2M|int + mem_huge_1G|int) < (ansible_memtotal_mb|int - mem_reserved|int) - msg: "Requested {{ mem_huge_2M|int + mem_huge_1G|int }}MB of hugepages, while {{ ansible_memtotal_mb|int - mem_reserved|int }}MB is available." 
+ that: (mem_huge_2M | int + mem_huge_1G | int) < (ansible_memtotal_mb | int - mem_reserved | int) + msg: "Requested {{ mem_huge_2M | int + mem_huge_1G | int }}MB of hugepages, while {{ ansible_memtotal_mb | int - mem_reserved | int }}MB is available." - name: prepare kernel boot flags set_fact: diff --git a/roles/bootstrap/configure_openssl/defaults/main.yml b/roles/bootstrap/configure_openssl/defaults/main.yml index c84b92e2..24f18919 100644 --- a/roles/bootstrap/configure_openssl/defaults/main.yml +++ b/roles/bootstrap/configure_openssl/defaults/main.yml @@ -15,5 +15,14 @@ ## --- openssl_url: "https://github.com/openssl/openssl.git" -openssl_version: "openssl-3.0.3" +openssl_version: "openssl-3.0.5" openssl_dir: "{{ (project_root_dir, 'openssl') | path_join }}" +openssl_pkg_subdir: "{{ openssl_dir }}/{{ openssl_version }}" + +# QATLibs +intel_qatlib_download_url: "https://github.com/intel/qatlib.git" +intel_qatlib_download_url_version: "22.07.0" +intel_qatlib_download_url_dir: "{{ (project_root_dir, 'intel_qatlibs') | path_join }}" + +# Note: the variable name & folder location below must match "roles/bootstrap/install_qat_drivers_services/defaults/main.yml" +qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" diff --git a/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml b/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml new file mode 100644 index 00000000..8cd82428 --- /dev/null +++ b/roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml @@ -0,0 +1,80 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License.
+## +--- +# Intel QATLibs +- name: Install Intel QATLibs + block: + - name: create directory {{ intel_qatlib_download_url_dir }} for Intel QATLibs configuration + file: + path: "{{ intel_qatlib_download_url_dir }}" + state: directory + mode: '0700' + + - name: download Intel QATLib + git: + repo: "{{ intel_qatlib_download_url }}" + dest: "{{ intel_qatlib_download_url_dir }}" + version: "{{ intel_qatlib_download_url_version }}" + force: true + + # using the shell module instead of command, as command emitted an 'aclocal' warning that caused a playbook failure + - name: run autogen before configure QATLibs + shell: './autogen.sh' # noqa 305 + args: + chdir: "{{ intel_qatlib_download_url_dir }}" + executable: /bin/bash + changed_when: true + + - name: check all packages are present for QATLibs installation + command: './configure --enable-service' + args: + chdir: "{{ intel_qatlib_download_url_dir }}" + changed_when: true + + - name: make install QATLibs + make: + chdir: "{{ intel_qatlib_download_url_dir }}" + target: install + become: yes + + - name: reload the dynamic linker cache + command: "ldconfig" + changed_when: true + +# The block below is also present in "roles/bootstrap/install_qat_drivers_services/tasks/main.yml", where it runs only +# when "enable_intel_qatlibs" is "false" in host_vars. It is duplicated here because, in order to compile the QAT +# configuration, QATLibs must be installed before the SVM feature is configured +- name: configuration for QAT Shared Virtual Memory (SVM) + block: + - name: set QAT SVM is enabled + set_fact: + svm_value: 1 + + - name: enable address translation services for QAT Shared Virtual Memory (SVM) + replace: + path: "{{ item }}" + regexp: '(^ATEnabled\s)(.*)$' + replace: 'ATEnabled = {{ svm_value }}' + mode: 0600 + with_items: + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.sym.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.asym.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.sym.vm" + - "{{ qat_drivers_dir }}/build/4xxxvf_dev0.conf.vm" + failed_when: false + when: enable_qat_svm is defined and enable_qat_svm diff --git a/roles/bootstrap/configure_openssl/tasks/main.yml b/roles/bootstrap/configure_openssl/tasks/main.yml index ff0665d2..06c4fce2 100644 --- a/roles/bootstrap/configure_openssl/tasks/main.yml +++ b/roles/bootstrap/configure_openssl/tasks/main.yml @@ -29,24 +29,24 @@ - name: Module not found, playbook terminated fail: - msg: "No QAT module found. Please set update_qat_drivers to true in host vars to resolve issue." + msg: "No QAT module found. Please set update_qat_drivers to true in host vars to resolve issue."
when: '"intel_qat" not in confirm_mod.stdout' - debug: - var: confirm_mod.stdout_lines + var: confirm_mod.stdout_lines # ansible_facts.services is not supported currently on Ubuntu 20.04, once sorted will remove and use ansible service module -- name: check status of service for OpenSSL*Engine - shell: "set -o pipefail && service qat status | grep qat_dev" +- name: check status of service for qat_service + shell: "set -o pipefail && service qat_service status | grep qat_dev" args: executable: /bin/bash register: service_check - changed_when: true + changed_when: false ignore_errors: true - name: Service not found, playbook terminated fail: - msg: "Failed to start qat service on system. Please check if QAT configuration in host vars is correct." + msg: "Failed to start qat service on system. Please check if QAT configuration in host vars is correct." when: - "'up' not in service_check.stdout" @@ -60,7 +60,7 @@ - name: clone OpenSSL repository git: repo: "{{ openssl_url }}" - version: " {{ openssl_version }}" + version: "{{ openssl_version }}" dest: "{{ openssl_dir }}" force: yes @@ -107,3 +107,9 @@ - name: reload the dynamic linker cache command: "ldconfig" changed_when: true + +- name: QATLibs and SVM configuration + include_tasks: intel_qatlibs_and_qatsvm_configuration.yml + when: + - configured_arch == "spr" + - enable_intel_qatlibs is defined and enable_intel_qatlibs diff --git a/roles/bootstrap/configure_qat/files/cek_sriov_qat_init b/roles/bootstrap/configure_qat/files/cek_sriov_qat_init index 9ae6ad39..211f2396 100644 --- a/roles/bootstrap/configure_qat/files/cek_sriov_qat_init +++ b/roles/bootstrap/configure_qat/files/cek_sriov_qat_init @@ -15,7 +15,12 @@ # limitations under the License. # +DEVBIND_TOOL=${DEVBIND_TOOL:-"/usr/local/bin/dpdk-devbind.py"} QAT_SRIOV_NUMVFS_MAPPINGS=${QAT_SRIOV_NUMVFS_MAPPINGS:-"/etc/cek/cek_sriov_qat_numvfs"} +QAT_DEVICE_DRIVER_MAPPINGS=${QAT_DEVICE_DRIVER_MAPPINGS:-"/etc/cek/cek_qat_vfs"} +QAT_FORCE_DRIVER_BINDING=${QAT_FORCE_DRIVER_BINDING:-"/etc/cek/cek_force_qat_driver_binding"} +DO_DRIVER_BINDING="" +WA_PREVIOUS_QAT_ID="" setup_vfs() { echo "Setting up VFs" @@ -38,14 +43,91 @@ setup_vfs() { read -r current_vfs < "${numvfs_path}" + total_numvfs_path="/sys/bus/pci/devices/${pci_address}/sriov_totalvfs" + if [[ ! -e "${total_numvfs_path}" ]]; then + echo "Could not find sriov_totalvfs for device ${pci_address}, skipping..." + continue + fi + + read -r total_vfs < "${total_numvfs_path}" + echo "Setting up ${numvfs} Virtual Functions on ${pci_address}" if [[ ${current_vfs} -ne ${numvfs} ]]; then - # if change is needed we must reset it first - echo 0 > "${numvfs_path}" - echo "${numvfs}" > "${numvfs_path}" + if [[ ${current_vfs} -ne ${total_vfs} ]] || [[ ${numvfs} -eq 0 ]] || [[ "${WA_PREVIOUS_QAT_ID}" == "${pci_address}" ]]; then + # if change is needed we must reset it first + echo 0 > "${numvfs_path}" + echo "${numvfs}" > "${numvfs_path}" + if [[ "${WA_PREVIOUS_QAT_ID}" != "${pci_address}" ]]; then + DO_DRIVER_BINDING="${DO_DRIVER_BINDING} ${pci_address}" + fi + if [[ ${current_vfs} -eq 0 ]] && [[ "${WA_PREVIOUS_QAT_ID}" == "" ]]; then + WA_PREVIOUS_QAT_ID=${pci_address} + echo "pci_address stored for WA check: ${WA_PREVIOUS_QAT_ID}" + fi + else + echo "There is max number ${total_vfs} of Virtual Functions on ${pci_address} available. Do nothing." + fi + else + echo "${numvfs} Virtual Functions are already present on ${pci_address} Do nothing." 
fi done < "${QAT_SRIOV_NUMVFS_MAPPINGS}" } +bind_all() { + if [[ -r "${QAT_FORCE_DRIVER_BINDING}" ]]; then + echo "Force QAT driver binding" + while read -r pci_address; do + DO_DRIVER_BINDING="${DO_DRIVER_BINDING} ${pci_address}" + done < "${QAT_FORCE_DRIVER_BINDING}" + rm -f ${QAT_FORCE_DRIVER_BINDING} + fi + + for physfn in ${DO_DRIVER_BINDING}; do + echo "Driver binding to VFs from PF ${physfn}" + if [[ ! -r "${QAT_DEVICE_DRIVER_MAPPINGS}_${physfn}" ]]; then + echo "File ${QAT_DEVICE_DRIVER_MAPPINGS}_${physfn} doesn't exist, driver bindings won't be changed" + return 0 + fi + + while read -r pci_address driver; do + if [[ ${pci_address} == "" ]] || [[ ${driver} == "" ]]; then + echo "Empty PCI address or driver, skipping..." + continue + fi + + echo "Binding ${pci_address} to ${driver}" + + device_path="/sys/bus/pci/devices/${pci_address}" + + # skip if device doesn't exist + if [[ ! -e "${device_path}" ]]; then + echo "Could not find device ${pci_address}, skipping..." + continue + fi + + # get current driver + if [[ -L "${device_path}/driver" ]]; then + current_driver=$(readlink "${device_path}/driver") + current_driver=$(basename "$current_driver") + echo "Current driver of ${pci_address} is ${current_driver}" + else + current_driver="" + fi + + # don't bind if not needed + if [[ "${driver}" != "${current_driver}" ]]; then + modprobe -q "${driver}" || true + if [[ -e "/sys/bus/pci/drivers/${driver}" ]]; then + $DEVBIND_TOOL -b "${driver}" --force "${pci_address}" + else + echo "Failed to bind ${pci_address}: target driver ${driver} doesn't exist" + fi + fi + + done < "${QAT_DEVICE_DRIVER_MAPPINGS}_${physfn}" + done +} + setup_vfs +bind_all diff --git a/roles/bootstrap/configure_qat/tasks/bind_qat_vfs.yml b/roles/bootstrap/configure_qat/tasks/bind_qat_vfs.yml new file mode 100644 index 00000000..be7dbbcd --- /dev/null +++ b/roles/bootstrap/configure_qat/tasks/bind_qat_vfs.yml @@ -0,0 +1,69 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: pre-create empty dict for VFs + set_fact: + vfs_acc: {} + +- name: populate VFs dict with default drivers + set_fact: + vfs_acc: "{{ vfs_acc | combine({idx : item.qat_default_vf_driver }) }}" + loop: "{{ range(item.qat_sriov_numvfs | default(0) | int) | list }}" + loop_control: + index_var: idx + loop_var: vf_default + +- name: update VFs dict with custom per-VF drivers + set_fact: + vfs_acc: "{{ vfs_acc | combine({vf.key | regex_replace('.*_(\\d*)', '\\1') | int : vf.value}) }}" + loop: "{{ item.qat_vfs | default({}) | dict2items | sort(attribute='key') }}" + loop_control: + loop_var: vf + extended: yes + when: ansible_loop.index < (item.qat_sriov_numvfs | default(0) | int ) + +- name: clean up existing configuration file cek_qat_vfs_{{ item.qat_id }} + file: + path: "{{ sriov_config_path }}/cek_qat_vfs_{{ item.qat_id }}" + state: absent + become: yes + +# get a list of VFs PCI addresses and save the configuration +- name: attach VFs driver + block: + - name: fetch VFs pci addresses for a PF + shell: 'for vf in /sys/bus/pci/devices/{{ item.qat_id }}/virtfn*;do basename $(readlink -f $vf);done | sort' + register: vf_pciids + args: + executable: /bin/bash + changed_when: false + + - name: save VF driver binding + lineinfile: + path: "{{ sriov_config_path }}/cek_qat_vfs_{{ item.qat_id }}" + line: "{{ this_item[0] }} {{ this_item[1].value }}" + regexp: "^{{ this_item[0] }}" + create: yes + owner: root + group: root + mode: '0600' + become: yes + loop: "{{ vf_pciids.stdout_lines | zip(vfs_acc | dict2items) | list }}" + loop_control: + loop_var: this_item + when: + - vf_pciids.stderr|length == 0 + - vf_pciids.stdout_lines|length > 0 diff --git a/roles/bootstrap/configure_qat/tasks/check_qat_status.yml b/roles/bootstrap/configure_qat/tasks/check_qat_status.yml index f0a22122..7bdfd03d 100644 --- a/roles/bootstrap/configure_qat/tasks/check_qat_status.yml +++ b/roles/bootstrap/configure_qat/tasks/check_qat_status.yml @@ -27,15 +27,15 @@ msg: "No QAT module found. Please set update_qat_drivers to true in host vars to resolve the issue." when: '"intel_qat" not in qat_confirm_mod.stdout' -- name: make sure QAT service is started and enabled +- name: make sure qat_service is started and enabled service: - name: qat + name: qat_service state: started enabled: yes # ansible_facts.services is not supported currently on Ubuntu 20.04, once sorted will remove and use ansible service module -- name: check status of QAT service - shell: "set -o pipefail && service qat status | grep qat_dev" +- name: check status of qat_service + shell: "set -o pipefail && service qat_service status | grep qat_dev" args: executable: /bin/bash register: qat_status_check diff --git a/roles/bootstrap/configure_qat/tasks/create_qat_vfs.yml b/roles/bootstrap/configure_qat/tasks/create_qat_vfs.yml new file mode 100644 index 00000000..84784bd7 --- /dev/null +++ b/roles/bootstrap/configure_qat/tasks/create_qat_vfs.yml @@ -0,0 +1,68 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: get maximum possible number of VFs + command: cat /sys/bus/pci/devices/{{ item.qat_id }}/sriov_totalvfs + register: total_vfs + changed_when: false + +- name: get current number of VFs + command: cat /sys/bus/pci/devices/{{ item.qat_id }}/sriov_numvfs + register: existing_vfs + changed_when: false + +- name: fail if requested number of VFs is higher than supported + assert: + that: item.qat_sriov_numvfs | default(0) | int <= total_vfs.stdout | int + fail_msg: "Requested qat_sriov_numvfs for {{ item.qat_id }} must not exceed {{ total_vfs.stdout | int }}" + +- name: create QAT VFs and save SR-IOV numvfs configuration + block: + # in case when QAT SR-IOV VFs have been already configured we reset it first to avoid "device or resource busy" error + - name: reset QAT SR-IOV Virtual Functions + shell: echo 0 > /sys/bus/pci/devices/{{ item.qat_id }}/sriov_numvfs + when: existing_vfs.stdout|int != 0 and existing_vfs.stdout|int != item.qat_sriov_numvfs + and (existing_vfs.stdout|int != total_vfs.stdout|int or item.qat_sriov_numvfs|int == 0) + + - name: enable QAT SR-IOV Virtual Functions + shell: echo {{ item.qat_sriov_numvfs }} > /sys/bus/pci/devices/{{ item.qat_id }}/sriov_numvfs + when: existing_vfs.stdout|int != item.qat_sriov_numvfs + and (existing_vfs.stdout|int != total_vfs.stdout|int or item.qat_sriov_numvfs|int == 0) + + - name: force driver binding when QAT VFs are created + lineinfile: + path: "{{ sriov_config_path }}/cek_force_qat_driver_binding" + line: "{{ item.qat_id }}" + regexp: "^{{ item.qat_id }}" + create: yes + owner: root + group: root + mode: '0600' + when: existing_vfs.stdout|int != item.qat_sriov_numvfs + and (existing_vfs.stdout|int != total_vfs.stdout|int or item.qat_sriov_numvfs|int == 0) + + - name: save number of QAT VFs per bus location + lineinfile: + path: "{{ sriov_config_path }}/cek_sriov_qat_numvfs" + line: "{{ item.qat_id }} {{ item.qat_sriov_numvfs | default(0) }}" +# regexp: "^{{ item.qat_id }}" # It was removed intentionally to enable WA for Rocky + create: yes + owner: root + group: root + mode: '0600' + become: yes + when: + - item.qat_sriov_numvfs | default(0) | int != 0 diff --git a/roles/bootstrap/configure_qat/tasks/main.yml b/roles/bootstrap/configure_qat/tasks/main.yml index 144c1a34..206a1121 100644 --- a/roles/bootstrap/configure_qat/tasks/main.yml +++ b/roles/bootstrap/configure_qat/tasks/main.yml @@ -28,41 +28,48 @@ mode: '0700' become: yes -- name: remove existing configuration file if it exists +- name: clean up existing QAT configuration file file: - path: "{{ sriov_config_path }}/cek_sriov_qat_numvfs" + path: "{{ sriov_config_path }}/{{ item }}" state: absent + with_items: + - cek_sriov_qat_numvfs + - cek_force_qat_driver_binding become: yes -- name: create file for QAT ids to create defined VFs - file: - path: "{{ sriov_config_path }}/cek_sriov_qat_numvfs" - state: touch - owner: root - group: root - mode: '0700' - become: yes +- name: Workaround for long QAT VFs initialization on Rocky + block: + - name: get current numvfs for qat_devices[0] on Rocky + command: cat "/sys/bus/pci/devices/{{ qat_devices[0].qat_id }}/sriov_numvfs" + register: existing_qat_vfs + changed_when: false -- name: populate QAT vf template with vfs per bus location - lineinfile: - path: "{{ sriov_config_path }}/cek_sriov_qat_numvfs" - line: "{{ item.qat_id }} {{ item.qat_sriov_numvfs | default(0) }}" - owner: root - group: root - mode: '0700' -
become: yes - with_items: "{{ qat_devices }}" + - name: create the first dummy record to warmup QAT device on Rocky + include_tasks: create_qat_vfs.yml + with_items: + - {'qat_id': "{{ qat_devices[0].qat_id }}", 'qat_sriov_numvfs': 2} -- name: create the first dummy record to warmup QAT device on Rocky - lineinfile: - path: "{{ sriov_config_path }}/cek_sriov_qat_numvfs" - insertbefore: "{{ qat_devices[0].qat_id }} {{ qat_devices[0].qat_sriov_numvfs | default(0) }}" - line: "{{ qat_devices[0].qat_id }} 2" - become: yes + - name: clean up VFs from the first dummy record to warmup QAT device on Rocky + include_tasks: create_qat_vfs.yml + with_items: + - {'qat_id': "{{ qat_devices[0].qat_id }}", 'qat_sriov_numvfs': 0} + when: existing_qat_vfs.stdout|int == 0 when: - ansible_distribution == "Rocky" - qat_devices | length > 0 - - qat_devices[0].qat_sriov_numvfs > 0 + - not on_vms | default (false) + +- name: create QAT vfs + include_tasks: create_qat_vfs.yml + when: + - item.qat_sriov_numvfs | default(0) > 0 + - not on_vms | default (false) + with_items: "{{ qat_devices }}" + +- name: set QAT VFs driver + include_tasks: bind_qat_vfs.yml + when: item.qat_sriov_numvfs | default(0) > 0 + with_items: "{{ qat_devices }}" - name: copy QAT SRIOV setup script to /usr/local/bin copy: diff --git a/roles/bootstrap/configure_qat/templates/cek_sriov_qat_init.service.j2 b/roles/bootstrap/configure_qat/templates/cek_sriov_qat_init.service.j2 index fb942a07..609f4b1d 100644 --- a/roles/bootstrap/configure_qat/templates/cek_sriov_qat_init.service.j2 +++ b/roles/bootstrap/configure_qat/templates/cek_sriov_qat_init.service.j2 @@ -5,7 +5,9 @@ After=qat.service Requires=qat.service [Service] -Environment=SRIOV_NUMVFS_MAPPINGS={{ sriov_config_path }}/cek_sriov_qat_numvfs +Environment=QAT_SRIOV_NUMVFS_MAPPINGS={{ sriov_config_path }}/cek_sriov_qat_numvfs +Environment=QAT_DEVICE_DRIVER_MAPPINGS={{ sriov_config_path }}/cek_qat_vfs +Environment=DEVBIND_TOOL=/usr/local/bin/dpdk-devbind.py Type=oneshot ExecStartPre=/bin/sleep 10 ExecStart=/usr/local/bin/cek_sriov_qat_init diff --git a/roles/bootstrap/configure_security/tasks/fw_debian.yaml b/roles/bootstrap/configure_security/tasks/fw_debian.yaml index ffc7adfe..c55eee89 100644 --- a/roles/bootstrap/configure_security/tasks/fw_debian.yaml +++ b/roles/bootstrap/configure_security/tasks/fw_debian.yaml @@ -21,12 +21,28 @@ when: inventory_hostname in groups['kube_control_plane'] or ( 'vm_host' in groups and inventory_hostname in groups['vm_host']) +- name: open required ports in the firewall configuration on the controller nodes + command: ufw allow {{ item }} + with_items: "{{ adq_open_ports['controller'] }}" + become: yes + when: + - inventory_hostname in groups['kube_control_plane'] + - adq_dp.enabled |d(false) | bool + - name: open required ports in the firewall configuration on the worker nodes command: ufw allow {{ item }} with_items: "{{ fw_open_ports['node'] }}" become: yes when: inventory_hostname in groups['kube_node'] +- name: open required ports in the firewall configuration on the worker nodes + command: ufw allow {{ item }} + with_items: "{{ adq_open_ports['node'] }}" + become: yes + when: + - inventory_hostname in groups['kube_node'] + - adq_dp.enabled |d(false) | bool + - name: allow traffic from Kubernetes subnets command: ufw allow from {{ item }} become: yes @@ -46,6 +62,7 @@ shell: set -o pipefail && route | grep default | awk '{print $8}' # interface name is at the very end of line args: executable: /bin/bash + changed_when: false register: default_if -
name: allow incoming traffic on default interface diff --git a/roles/bootstrap/configure_security/tasks/fw_redhat.yaml b/roles/bootstrap/configure_security/tasks/fw_redhat.yaml index bc114648..66f50660 100644 --- a/roles/bootstrap/configure_security/tasks/fw_redhat.yaml +++ b/roles/bootstrap/configure_security/tasks/fw_redhat.yaml @@ -21,12 +21,28 @@ when: inventory_hostname in groups['kube_control_plane'] or ( 'vm_host' in groups and inventory_hostname in groups['vm_host']) +- name: open required ports in the firewall configuration on the controller + command: firewall-cmd --zone=public --add-port={{ item | regex_replace(':', '-') }} --permanent + with_items: "{{ adq_open_ports['controller'] }}" + become: yes + when: + - inventory_hostname in groups['kube_control_plane'] + - adq_dp.enabled |d(false) | bool + - name: open required ports in the firewall configuration on the node command: firewall-cmd --zone=public --add-port={{ item | regex_replace(':', '-') }} --permanent with_items: "{{ fw_open_ports['node'] }}" become: yes when: inventory_hostname in groups['kube_node'] +- name: open required ports in the firewall configuration on the node + command: firewall-cmd --zone=public --add-port={{ item | regex_replace(':', '-') }} --permanent + with_items: "{{ adq_open_ports['node'] }}" + become: yes + when: + - inventory_hostname in groups['kube_node'] + - adq_dp.enabled |d(false) | bool + - name: add Kubernetes pods and services subnets to the "trusted" zone in firewalld command: firewall-cmd --zone=trusted --permanent --add-source={{ item }} changed_when: true @@ -43,6 +59,7 @@ shell: set -o pipefail && route | grep default | awk '{print $8}' # interface name is at the very end of line args: executable: /bin/bash + changed_when: false register: default_if - name: allow incoming traffic on default interface diff --git a/roles/bootstrap/configure_security/tasks/main.yml b/roles/bootstrap/configure_security/tasks/main.yml index 92e7d917..6fdeddd3 100644 --- a/roles/bootstrap/configure_security/tasks/main.yml +++ b/roles/bootstrap/configure_security/tasks/main.yml @@ -54,7 +54,7 @@ Rocky: firewalld Ubuntu: ufw become: yes - ignore_errors: true + failed_when: false when: not firewall_enabled | default(false) | bool - name: configure SELinux @@ -68,7 +68,7 @@ - name: increase file size limit settings blockinfile: dest: /etc/security/limits.conf - marker: "" + marker: "# {mark} ANSIBLE MANAGED BLOCK - CEK values" block: |2 root soft fsize unlimited root hard fsize unlimited diff --git a/roles/bootstrap/configure_security/vars/main.yml b/roles/bootstrap/configure_security/vars/main.yml index 46b4b4b1..7a42e11f 100644 --- a/roles/bootstrap/configure_security/vars/main.yml +++ b/roles/bootstrap/configure_security/vars/main.yml @@ -61,6 +61,81 @@ fw_open_ports: - 8285/udp - 8472/udp +adq_open_ports: + controller: + # etcd access + - 2379:2380/tcp + # VXLAN overlay + - 8472/udp + # cluster health checks (cilium-health) + - 4240/tcp + # Hubble server + - 4244/tcp + # Hubble Relay + - 4245/tcp + # cilium-agent pprof server (listening on 127.0.0.1) + - 6060/tcp + # cilium-operator pprof server (listening on 127.0.0.1) + - 6061/tcp + # Hubble Relay pprof server (listening on 127.0.0.1) + - 6062/tcp + # cilium-agent health status API (listening on 127.0.0.1 and/or ::1) + - 9879/tcp + # cilium-agent gops server (listening on 127.0.0.1) + - 9890/tcp + # operator gops server (listening on 127.0.0.1) + - 9891/tcp + # clustermesh-apiserver gops server (listening on 127.0.0.1) + - 9892/tcp + # Hubble Relay gops
server (listening on 127.0.0.1) + - 9893/tcp + # cilium-agent Prometheus metrics + - 9962/tcp + # cilium-operator Prometheus metrics + - 9963/tcp + # cilium-proxy Prometheus metrics + - 9964/tcp + # WireGuard encryption tunnel endpoint + - 51871/udp + # health checks + - ICMP 8/0 + node: + # etcd access + - 2379:2380/tcp + # VXLAN overlay + - 8472/udp + # cluster health checks (cilium-health) + - 4240/tcp + # Hubble server + - 4244/tcp + # Hubble Relay + - 4245/tcp + # cilium-agent pprof server (listening on 127.0.0.1) + - 6060/tcp + # cilium-operator pprof server (listening on 127.0.0.1) + - 6061/tcp + # Hubble Relay pprof server (listening on 127.0.0.1) + - 6062/tcp + # cilium-agent health status API (listening on 127.0.0.1 and/or ::1) + - 9879/tcp + # cilium-agent gops server (listening on 127.0.0.1) + - 9890/tcp + # operator gops server (listening on 127.0.0.1) + - 9891/tcp + # clustermesh-apiserver gops server (listening on 127.0.0.1) + - 9892/tcp + # Hubble Relay gops server (listening on 127.0.0.1) + - 9893/tcp + # cilium-agent Prometheus metrics + - 9962/tcp + # cilium-operator Prometheus metrics + - 9963/tcp + # cilium-proxy Prometheus metrics + - 9964/tcp + # WireGuard encryption tunnel endpoint + - 51871/udp + # health checks + - ICMP 8/0 fw_open_subnets: - "{{ kube_pods_subnet }}" diff --git a/roles/bootstrap/configure_sgx/defaults/main.yml b/roles/bootstrap/configure_sgx/defaults/main.yml index ec860645..95561720 100644 --- a/roles/bootstrap/configure_sgx/defaults/main.yml +++ b/roles/bootstrap/configure_sgx/defaults/main.yml @@ -17,13 +17,13 @@ # Intel SGX-DCAP drivers module for Ubuntu 20.04 dcap_driver_series_ubuntu_20: "1.41" dcap_driver_version_ubuntu_20: "sgx_linux_x64_driver_{{ dcap_driver_series_ubuntu_20 }}.bin" -dcap_driver_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.13/linux/distro/ubuntu20.04-server/{{ dcap_driver_version_ubuntu_20 }}" -dcap_driver_checksum_ubuntu_20: "sha256:9779638e5ac13288d47cfb543df1f36c43c51743043cd4ee324bc3ddbe4c9068" +dcap_driver_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/distro/ubuntu20.04-server/{{ dcap_driver_version_ubuntu_20 }}" +dcap_driver_checksum_ubuntu_20: "sha256:c62a94cf3eb6d8e46d47a87481484ccf50bb6083680e9d809b65699083b17e20" sgx_folder_check_ubuntu_20: "{{ project_root_dir }}/sgx-{{ dcap_driver_series_ubuntu_20 }}" -sgx_sdk_version_ubuntu_20: "sgx_linux_x64_sdk_2.16.100.4.bin" -sgx_sdk_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.13/linux/distro/ubuntu20.04-server/{{ sgx_sdk_version_ubuntu_20 }}" -sgx_sdk_checksum_ubuntu_20: "sha256:db5f36a77960595ee7216c6beb4da0fda3293b25c0f9af14989f468181a158c0" +sgx_sdk_version_ubuntu_20: "sgx_linux_x64_sdk_2.17.100.3.bin" +sgx_sdk_url_ubuntu_20: "https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/distro/ubuntu20.04-server/{{ sgx_sdk_version_ubuntu_20 }}" +sgx_sdk_checksum_ubuntu_20: "sha256:408a38b1b2ee0065016035eb25a652a658eac73b9cb2319867a2143e08f7d758" # Intel SGX-SGX Key configuration for Ubuntu >= 18.04.4 sgx_apt_source_list: "intel-sgx" @@ -33,21 +33,28 @@ sgx_apt_repo_key: "{{ sgx_apt_repo_url }}/intel-sgx-deb.key" # Intel SGX-DCAP drivers module for <= RHEL 8.4 dcap_driver_series_rhel: "1.41" dcap_driver_version_rhel: "sgx_linux_x64_driver_{{ dcap_driver_series_rhel }}.bin" -dcap_driver_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.13/linux/distro/rhel8.4-server/{{ dcap_driver_version_rhel }}" -dcap_driver_checksum_rhel: "sha256:f91edc30caa11df7c22b7e31d0df5fe8030e0117102d3885e8eadbf9cd611f46" 
+dcap_driver_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/distro/rhel8.4-server/{{ dcap_driver_version_rhel }}" +dcap_driver_checksum_rhel: "sha256:7a6b38034f2d244bc082896c7d2e110e9532267f1cf6d4716cd001243baebea7" sgx_folder_check_rhel: "{{ project_root_dir }}/sgx-{{ dcap_driver_series_rhel }}" -sgx_sdk_version_rhel: "sgx_linux_x64_sdk_2.16.100.4.bin" -sgx_sdk_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.13/linux/distro/rhel8.4-server/{{ sgx_sdk_version_rhel }}" -sgx_sdk_checksum_rhel: "sha256:f270ba791eda76732f38f8b38f0cbe525bd42c3a173177d1c0e1b3dbdcb415d1" +sgx_sdk_version_rhel: "sgx_linux_x64_sdk_2.17.100.3.bin" +sgx_sdk_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/distro/rhel8.4-server/{{ sgx_sdk_version_rhel }}" +sgx_sdk_checksum_rhel: "sha256:a0cad3c904f41935b67cbdf6d77981ba76e53e2ecfba1c6de19008fc76a53127" # Intel SGX RPM local repository for RHEL <= 8.4 sgx_rpm_local_repo_version_rhel: "sgx_rpm_local_repo.tgz" -sgx_rpm_local_repo_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.13/linux/distro/rhel8.4-server/{{ sgx_rpm_local_repo_version_rhel }}" -sgx_rpm_local_repo_checksum_rhel: "sha256:8003de041de5065f44b8c515fe21056654209431c6ec995637746183cd2e2ac1" +sgx_rpm_local_repo_url_rhel: "https://download.01.org/intel-sgx/sgx-dcap/1.14/linux/distro/rhel8.4-server/{{ sgx_rpm_local_repo_version_rhel }}" +sgx_rpm_local_repo_checksum_rhel: "sha256:c96e4f3c8db70c345d94b3164e42eafdede3e62e4ea72b3c3fa37fffeca81938" sgx_config_dir: "{{ project_root_dir }}" -sgx_rpm_directory: "{{ project_root_dir }}/sgx_rpm_local_repo" +sgx_rpm_directory: "{{ (project_root_dir, 'sgx_rpm_local_repo') | path_join }}" -sgx_pkg_version: "2.16.100.4" -sgx_pkg_dcap_version: "1.13.100.4" +sgx_pkg_version: "2.17.100.3" +sgx_pkg_dcap_version: "1.14.100.3" + +protobuf_version: protobuf-3.5.0-13.el8.x86_64.rpm +protobuf_repository: http://dl.rockylinux.org/pub/rocky/8.6/AppStream/x86_64/os/Packages/p +protobuf_library_version: libprotobuf.so.15 +protobuf_library_version_long: "{{ protobuf_library_version }}.0.0" +protobuf_library_dir: usr/lib64 +protobuf_dir: protobuf diff --git a/roles/bootstrap/configure_sgx/tasks/main.yml b/roles/bootstrap/configure_sgx/tasks/main.yml index aa0fba61..2c0a51a2 100644 --- a/roles/bootstrap/configure_sgx/tasks/main.yml +++ b/roles/bootstrap/configure_sgx/tasks/main.yml @@ -17,19 +17,21 @@ - name: determine machine type include_role: name: check_machine_type + when: + - inventory_hostname in groups['kube_node'] or + inventory_hostname in groups['vm_host'] + - not on_vms | default (false) - name: install dependencies - cpuid package: name: cpuid state: present - when: is_icx or is_spr - name: check CPU/BIOS is enabled for SGX shell: set -o pipefail && cpuid | grep -i sgx | grep -v ENCL args: executable: /bin/bash register: cpuid_output - when: is_icx or is_spr changed_when: false - name: SGX is not enabled in BIOS @@ -38,20 +40,18 @@ - "Please enable all required options for Intel SGX in BIOS." - "If failure persists, check with your system vendor." when: - - is_icx or is_spr - "'false' in cpuid_output.stdout" + - not on_vms | default(false) - name: configure SGX on Ubuntu distribution include_tasks: ubuntu.yml when: - ansible_distribution == 'Ubuntu' - - is_icx or is_spr - name: configure SGX on RHEL distribution include_tasks: rhel.yml when: - ansible_os_family == "RedHat" - - is_icx or is_spr - name: SGX configuration is successful debug: @@ -59,4 +59,3 @@ - "The BIOS check passed..." - "The system is properly configured..."
- "Intel SGX Device Plugin may be deployed now!" - when: is_icx or is_spr diff --git a/roles/bootstrap/configure_sgx/tasks/rhel.yml b/roles/bootstrap/configure_sgx/tasks/rhel.yml index d8de86a8..838a182c 100644 --- a/roles/bootstrap/configure_sgx/tasks/rhel.yml +++ b/roles/bootstrap/configure_sgx/tasks/rhel.yml @@ -90,17 +90,69 @@ line: 'gpgcheck=0' mode: '0644' -- name: install sgx platform sw +- name: install software specific to rocky 8 for sgx platform package: disable_gpg_check: yes name: - libsgx-launch - libsgx-epid - - libsgx-urts - libsgx-quote-ex + - sgx-aesm-service + state: present + when: ansible_distribution_version < '9.0' + +- name: Setting packages for Rocky / RHEL >= 9.0 for sgx platform + block: + - name: install software specific to Rocky / RHEL >= 9.0 for sgx platform + shell: "set -o pipefail && rpm --reinstall --nodeps '{{ sgx_rpm_directory }}/{{ item }}-{{ sgx_pkg_version }}-1.el8.x86_64.rpm'" + loop: + - libsgx-launch + - libsgx-epid + - libsgx-quote-ex + - sgx-aesm-service + changed_when: true + + - name: install additional software for Rocky / RHEL >= 9.0 + package: + state: present + name: + - compat-openssl11 + + - name: making directory for unpacking rpm + file: + path: "{{ (project_root_dir, 'protobuf') | path_join }}" + state: directory + mode: '0644' + + - name: downloading protobuf from rocky 8 repository + get_url: + url: "{{ protobuf_repository }}/{{ protobuf_version }}" + dest: "{{ (project_root_dir, protobuf_dir, protobuf_version) | path_join }}" + mode: '0640' + + - name: unpack protobuf rpm + shell: 'rpm2cpio {{ protobuf_version }} | cpio -idmv' + args: + chdir: "{{ (project_root_dir, protobuf_dir) | path_join }}" + changed_when: true + + - name: copy protobuf library + copy: + src: "{{ (project_root_dir, protobuf_dir, protobuf_library_dir, protobuf_library_version) | path_join }}" + remote_src: true + owner: root + group: root + mode: "0755" + dest: "{{ ('/', protobuf_library_dir, protobuf_library_version_long) | path_join }}" + when: ansible_distribution_version >= '9.0' + +- name: install common software for sgx platform + package: + disable_gpg_check: yes + name: + - libsgx-urts - libsgx-enclave-common - libsgx-uae-service - - sgx-aesm-service - libsgx-dcap-ql - libsgx-ae-qe3 - libsgx-ae-qve @@ -121,6 +173,11 @@ state: started name: aesmd +- name: wait for aesmd service to start + pause: + minutes: 1 + when: ansible_distribution_version >= '9.0' + - name: get aesmd service facts service_facts: register: service_info diff --git a/roles/bootstrap/configure_sst/defaults/main.yml b/roles/bootstrap/configure_sst/defaults/main.yml index 1e69dace..6a3615f4 100644 --- a/roles/bootstrap/configure_sst/defaults/main.yml +++ b/roles/bootstrap/configure_sst/defaults/main.yml @@ -19,6 +19,6 @@ clx_sst_bf_commit_hash: "a3a1869fd88eff5b2b872f447ca69b866e3d318e" clx_sst_bf_dir: "{{ project_root_dir }}/CommsPowerManagement" clx_sst_bf_exec: "/usr/local/bin/sst_bf.py" -isst_tool_git_url: https://github.com/torvalds/linux.git -isst_tool_git_version: v5.17 +isst_tool_git_url: "https://github.com/torvalds/linux.git" +isst_tool_git_version: "v5.19" isst_tool_src_dir: "{{ (project_root_dir, 'speedselect') | path_join }}" diff --git a/roles/bootstrap/configure_sst/tasks/clx_setup_sst_bf.yml b/roles/bootstrap/configure_sst/tasks/clx_setup_sst_bf.yml index 5976a4ab..5e52ea34 100644 --- a/roles/bootstrap/configure_sst/tasks/clx_setup_sst_bf.yml +++ b/roles/bootstrap/configure_sst/tasks/clx_setup_sst_bf.yml @@ -16,7 +16,7 @@ --- - name: verify that intel_pstate driver is 
enabled fail: - msg: intel_pstate must be enabled for SST-BF to work on CLX platform + msg: intel_pstate must be enabled for SST-BF to work on CLX platform when: intel_pstate is defined and intel_pstate == "disable" - name: validate sst mode diff --git a/roles/bootstrap/configure_sst/tasks/main.yml b/roles/bootstrap/configure_sst/tasks/main.yml index 2f690a6a..d27c47b8 100644 --- a/roles/bootstrap/configure_sst/tasks/main.yml +++ b/roles/bootstrap/configure_sst/tasks/main.yml @@ -17,6 +17,10 @@ - name: determine machine type include_role: name: check_machine_type + when: + - inventory_hostname in groups['kube_node'] or + inventory_hostname in groups['vm_host'] + - not on_vms | default (false) # Common part for both ICX and CLX platform - name: install libraries utility required for CentOS 8.3+ @@ -28,7 +32,9 @@ # Configuartion for Intel(R) Speed Select Technology "SST-BF,SST-CP,SST-TF and SST-PP" - name: configure Intel Speed Select Technology (ISST) include_tasks: sst_bf_cp_tf_pp_setup.yml - when: is_icx + when: + - is_icx or + is_spr # CLX specific - name: configure Intel SST BF on CLX Platform diff --git a/roles/bootstrap/configure_sst/tasks/sst_bf_cp_tf_pp_setup.yml b/roles/bootstrap/configure_sst/tasks/sst_bf_cp_tf_pp_setup.yml index 6bfc2b8c..b17e1019 100644 --- a/roles/bootstrap/configure_sst/tasks/sst_bf_cp_tf_pp_setup.yml +++ b/roles/bootstrap/configure_sst/tasks/sst_bf_cp_tf_pp_setup.yml @@ -14,9 +14,9 @@ ## limitations under the License. ## --- -- name: install 'intel-speed-select' tool on Ubuntu 20.04+ on ICX Platform +- name: install Intel-Speed-Select-Technology (ISST) tool on Ubuntu include_tasks: ubuntu_install_sst_tool.yml - when: ansible_distribution == 'Ubuntu' and ansible_distribution_version > '18.04' + when: ansible_distribution == 'Ubuntu' and ansible_distribution_version >= '20.04' - name: Intel(R)-Speed-Select-Technology (ISST) verification command: "intel-speed-select --info" @@ -32,21 +32,21 @@ register: sst_pp_verify changed_when: true -- name: SST-BF verification +- name: SST-BF verification command: "intel-speed-select base-freq enable -a" register: sst_bf_verify when: - sst_bf_configuration_enabled is defined and sst_bf_configuration_enabled - '"get-config-levels:0" in sst_pp_verify.stderr' -- name: SST-CP verification +- name: SST-CP verification command: "intel-speed-select core-power enable -a" register: sst_cp_verify when: - sst_cp_configuration_enabled is defined and sst_cp_configuration_enabled - '"get-config-levels:0" in sst_pp_verify.stderr' -- name: SST-TF verification +- name: SST-TF verification command: "intel-speed-select turbo-freq enable -a" register: sst_tf_verify when: diff --git a/roles/bootstrap/configure_sst/tasks/sst_pp.yml b/roles/bootstrap/configure_sst/tasks/sst_pp.yml index dd845f1e..513aa5a1 100644 --- a/roles/bootstrap/configure_sst/tasks/sst_pp.yml +++ b/roles/bootstrap/configure_sst/tasks/sst_pp.yml @@ -149,14 +149,14 @@ - name: SST-PP turbostat output when turbo-freq setup is auto configuration debug: - msg: "{{ read_turbostat_output.stdout|replace('\\t',' ') }}" + msg: "{{ read_turbostat_output.stdout | replace('\\t',' ') }}" when: - '"enable" in sst_tf_config' - '"auto" in sst_tf_online_cpus' - name: save turbostat output for auto config to SST-PP dir path shell: - cmd: "turbostat -c {{ online_cpus_range.stdout }} --show Package,Core,CPU,Bzy_MHz -i 1 | head -n 59 > sst_pp_turbostat_output_when_auto.txt" + cmd: "turbostat -c {{ online_cpus_range.stdout }} --show Package,Core,CPU,Bzy_MHz -i 1 | head -n 59 > 
sst_pp_turbostat_output_when_auto.txt" args: executable: /bin/bash chdir: "{{ project_root_dir }}/sst_pp_config" @@ -186,7 +186,7 @@ - name: SST-PP turbostat output when all SST-BF,SST-CP and SST-TF are disabled debug: - msg: "{{ turbostat_output_for_disabled.stdout|replace('\\t',' ') }}" + msg: "{{ turbostat_output_for_disabled.stdout | replace('\\t',' ') }}" when: - '"disable" in sst_bf_config' - '"disable" in sst_cp_config' @@ -194,7 +194,7 @@ - name: save turbostat output to SST-PP dir when SST-BF,SST-CP and SST-TF are disabled shell: - cmd: "turbostat -c {{ online_cpus_range.stdout }} --show Package,Core,CPU,Bzy_MHz -i 1 | head -n 59 > sst_pp_turbostat_output_when_disabled.txt" + cmd: "turbostat -c {{ online_cpus_range.stdout }} --show Package,Core,CPU,Bzy_MHz -i 1 | head -n 59 > sst_pp_turbostat_output_when_disabled.txt" args: executable: /bin/bash chdir: "{{ project_root_dir }}/sst_pp_config" @@ -223,7 +223,7 @@ intel-speed-select \ -d perf-profile set-config-level \ -l {{ sst_pp_level_set.stdout }} -o | \ grep -E '(set_tdp_level|online|offline|logical)' > sst_pp_config_details.txt args: executable: /bin/bash chdir: "{{ project_root_dir }}/sst_pp_config" diff --git a/roles/bootstrap/configure_sst/tasks/sst_pp_user_defined_setup.yml b/roles/bootstrap/configure_sst/tasks/sst_pp_user_defined_setup.yml index 04e0082a..6acf9581 100644 --- a/roles/bootstrap/configure_sst/tasks/sst_pp_user_defined_setup.yml +++ b/roles/bootstrap/configure_sst/tasks/sst_pp_user_defined_setup.yml @@ -55,7 +55,7 @@ - name: SST-PP turbostat output when online CPUs is not set to auto configuration debug: - msg: "{{ read_turbostat_values.stdout|replace('\\t',' ') }}" + msg: "{{ read_turbostat_values.stdout | replace('\\t',' ') }}" - name: create directory sst_pp_config to save details file: diff --git a/roles/bootstrap/configure_sst/tasks/ubuntu_install_sst_tool.yml b/roles/bootstrap/configure_sst/tasks/ubuntu_install_sst_tool.yml index 8bb623a8..099cf621 100644 --- a/roles/bootstrap/configure_sst/tasks/ubuntu_install_sst_tool.yml +++ b/roles/bootstrap/configure_sst/tasks/ubuntu_install_sst_tool.yml @@ -14,6 +14,10 @@ ## limitations under the License. ## --- +- name: install dependencies + include_role: + name: install_dependencies + - name: clone git repository to compile ISST git: repo: "{{ isst_tool_git_url }}" diff --git a/roles/bootstrap/configure_sst/vars/main.yml b/roles/bootstrap/configure_sst/vars/main.yml new file mode 100644 index 00000000..cc147a80 --- /dev/null +++ b/roles/bootstrap/configure_sst/vars/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License.
+## +--- +install_dependencies: + Debian: + - libnl-3-dev + - libnl-genl-3-dev diff --git a/roles/bootstrap/determine_dataplane_interfaces/tasks/dataplane-interfaces.yml b/roles/bootstrap/determine_dataplane_interfaces/tasks/dataplane-interfaces.yml index a6f5cf2b..276f72b8 100644 --- a/roles/bootstrap/determine_dataplane_interfaces/tasks/dataplane-interfaces.yml +++ b/roles/bootstrap/determine_dataplane_interfaces/tasks/dataplane-interfaces.yml @@ -35,7 +35,7 @@ - name: adding nic with name added to new_dataplane_interfaces set_fact: - new_dataplane_interfaces: "{{ new_dataplane_interfaces + [ new_nic ] }}" + new_dataplane_interfaces: "{{ new_dataplane_interfaces + [ new_nic ] }}" when: ansible_distribution == "Ubuntu" - name: block for dataplane interface list in RHEL / Rocky diff --git a/roles/bootstrap/golang_install/tasks/main.yml b/roles/bootstrap/golang_install/tasks/main.yml index 7d3b580f..ec08952b 100644 --- a/roles/bootstrap/golang_install/tasks/main.yml +++ b/roles/bootstrap/golang_install/tasks/main.yml @@ -33,48 +33,49 @@ - name: start procedure to install golang in required version block: - - name: uninstall existing golang - file: - path: "{{ item }}" - state: absent - with_items: - - "/usr/local/go" - - "$HOME/go" + - name: uninstall existing golang + file: + path: "{{ item }}" + state: absent + with_items: + - "/usr/local/go" + - "$HOME/go" - - name: download golang tarball - get_url: - url: "{{ golang_download_url }}" - checksum: "{{ golang_download_checksum }}" - dest: "{{ project_root_dir }}" - register: golang_download - until: golang_download is not failed - retries: 5 + - name: download golang tarball + get_url: + url: "{{ golang_download_url }}" + checksum: "{{ golang_download_checksum }}" + dest: "{{ project_root_dir }}" + mode: 0755 + register: golang_download + until: golang_download is not failed + retries: 5 - - name: untar downloaded golang tarball - unarchive: - src: "{{ golang_download.dest }}" - dest: /usr/local - copy: no - mode: 0755 + - name: untar downloaded golang tarball + unarchive: + src: "{{ golang_download.dest }}" + dest: /usr/local + copy: no + mode: 0755 - - name: set GOPATH env and add golang bin to PATH for all users - copy: - content: | - export GOROOT=/usr/local/go - export GOPATH=$HOME/go - export PATH=$GOPATH/bin:$GOROOT/bin:$PATH - dest: /etc/profile.d/golang.sh - mode: 0755 + - name: set GOPATH env and add golang bin to PATH for all users + copy: + content: | + export GOROOT=/usr/local/go + export GOPATH=$HOME/go + export PATH=$GOPATH/bin:$GOROOT/bin:$PATH + dest: /etc/profile.d/golang.sh + mode: 0755 - - name: create symlinks to golang binaries - file: - state: link - src: /usr/local/go/bin/{{ item }} - dest: /usr/bin/{{ item }} - mode: 0755 - with_items: - - go - - gofmt + - name: create symlinks to golang binaries + file: + state: link + src: /usr/local/go/bin/{{ item }} + dest: /usr/bin/{{ item }} + mode: 0755 + with_items: + - go + - gofmt when: golang_version != go_version.stdout # golang is successfully installed in required version @@ -88,22 +89,26 @@ block: - name: check current cfssl version shell: go version -m $(which cfssl) | grep mod | awk '{print $3}' + changed_when: false failed_when: false register: cfssl_current_version - name: check latest cfssl version shell: go list -m -versions github.com/cloudflare/cfssl | awk '{print $7}' # $7 should be the latest version + changed_when: false failed_when: false register: cfssl_latest_version - name: install cfssl in latest version command: go install 
github.com/cloudflare/cfssl/cmd/cfssl@latest + changed_when: true when: cfssl_current_version.stdout != cfssl_latest_version.stdout # NOTE(pklimowx): cfssljson doesn't return useful version information # we have to lose 1s here - name: install cfssljson in latest version command: go install github.com/cloudflare/cfssl/cmd/cfssljson@latest + changed_when: true when: - groups['kube_control_plane'] | length > 0 - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/bootstrap/install_gpu_kernel/tasks/main.yml b/roles/bootstrap/install_gpu_kernel/tasks/main.yml index faab32d8..70bf6bc3 100644 --- a/roles/bootstrap/install_gpu_kernel/tasks/main.yml +++ b/roles/bootstrap/install_gpu_kernel/tasks/main.yml @@ -84,6 +84,7 @@ get_url: url: "{{ linux_dg1_firmware_url }}" dest: "{{ gpu_kernel_src_dst }}" + mode: 0644 changed_when: false - name: ensure destination directory exists diff --git a/roles/bootstrap/install_gpu_kernel/tasks/prepare_grub.yml b/roles/bootstrap/install_gpu_kernel/tasks/prepare_grub.yml index 47120a15..af03497d 100644 --- a/roles/bootstrap/install_gpu_kernel/tasks/prepare_grub.yml +++ b/roles/bootstrap/install_gpu_kernel/tasks/prepare_grub.yml @@ -55,4 +55,5 @@ - name: set default kernel with grub command command: grub-set-default "Advanced options for Ubuntu>Ubuntu, with Linux {{ gpu_dp_kernel_version }}" + changed_when: true when: ansible_distribution == "Ubuntu" and ansible_distribution_version == '21.04' diff --git a/roles/bootstrap/install_packages/tasks/debian.yml b/roles/bootstrap/install_packages/tasks/debian.yml index 3f7a9059..651e823f 100644 --- a/roles/bootstrap/install_packages/tasks/debian.yml +++ b/roles/bootstrap/install_packages/tasks/debian.yml @@ -23,12 +23,12 @@ create: yes mode: 0664 with_items: - - { type: 'http', value: "{{ http_proxy | default('') }}" } - - { type: 'https', value: "{{ https_proxy | default('') }}" } + - {type: 'http', value: "{{ http_proxy | default('') }}"} + - {type: 'https', value: "{{ https_proxy | default('') }}"} when: http_proxy is defined or https_proxy is defined - name: reconfigure unattended-upgrades package - command: dpkg-reconfigure --priority=low unattended-upgrades + command: dpkg-reconfigure --priority=low -f noninteractive unattended-upgrades args: creates: "/etc/apt/apt.conf.d/20auto-upgrades" @@ -41,7 +41,7 @@ with_items: - "/etc/apt/apt.conf.d/20auto-upgrades" - "/etc/apt/apt.conf.d/10periodic" - ignore_errors: true + failed_when: false - name: install build-essential package apt: @@ -137,20 +137,20 @@ - name: install command line tools to collect hardware details apt: name: - - hwinfo - - inxi - - jq + - hwinfo + - inxi + - jq when: ansible_distribution == "Ubuntu" # hirsute (21.04) package for (image & headers) is 20.04. (Note: ansible_distribution_version will not be returned as the correct version) # Depending on the needs, we can split tasks for future Ubuntu releases if necessary. 
# Ref: https://launchpad.net/ubuntu/hirsute/+package/linux-image-generic-hwe-20.04 -# https://launchpad.net/ubuntu/hirsute/+package/linux-headers-generic-hwe-20.04 +# https://launchpad.net/ubuntu/hirsute/+package/linux-headers-generic-hwe-20.04 - name: Update Ubuntu to the latest kernel and kernel headers apt: name: - - linux-image-generic-hwe-20.04 - - linux-headers-generic-hwe-20.04 + - linux-image-generic-hwe-20.04 + - linux-headers-generic-hwe-20.04 state: latest # noqa 403 notify: - reboot server diff --git a/roles/bootstrap/install_packages/tasks/main.yml b/roles/bootstrap/install_packages/tasks/main.yml index e057aee3..c746e352 100644 --- a/roles/bootstrap/install_packages/tasks/main.yml +++ b/roles/bootstrap/install_packages/tasks/main.yml @@ -22,7 +22,7 @@ include_tasks: debian.yml when: ansible_os_family == "Debian" -#net_attach_def, docker registry +# net_attach_def, docker registry reqs - name: upgrade Python wheel and setuptools pip: name: @@ -30,7 +30,7 @@ - setuptools<=44 extra_args: --upgrade -#pinned python packages versions +# pinned python packages versions - name: install Python packages pip: name: diff --git a/roles/bootstrap/install_packages/tasks/rhel.yml b/roles/bootstrap/install_packages/tasks/rhel.yml index 504e4e5b..c98772f6 100644 --- a/roles/bootstrap/install_packages/tasks/rhel.yml +++ b/roles/bootstrap/install_packages/tasks/rhel.yml @@ -14,27 +14,50 @@ ## limitations under the License. ## --- -- name: enable powertools repository on Rocky >= 8.3 +- name: verify system subscription status on RHEL + command: "subscription-manager list --available --all" + register: check_subscription_status + failed_when: false + when: ansible_distribution == 'RedHat' + +- debug: + msg: "Detected a non-subscribed RHEL image. Deployment will proceed assuming private repos are properly configured and available" + when: + - ansible_distribution == "RedHat" + - "'This system is not yet registered' in check_subscription_status.stderr" + +- name: enable powertools repository on Rocky < 9.0 # noqa 303 - yum is called intentionally here command: yum config-manager --set-enabled powertools - when: - - ansible_distribution in ['CentOS', 'Rocky'] - - ansible_distribution_version >= '8.3' + when: ansible_distribution == 'Rocky' and ansible_distribution_version < '9.0' - name: enable CodeReady Linux Builder repository on RHEL 8 rhsm_repository: name: codeready-builder-for-rhel-8-x86_64-rpms when: - - ansible_distribution in ['RedHat'] - - ansible_distribution_version >= '8' + - ansible_distribution == "RedHat" and ansible_distribution_version < "9.0" + - "'This system is not yet registered' not in check_subscription_status.stderr" + failed_when: false # allow failure if the OS is not subscribed, but warn the user -- name: install epel-release on CentOS +- name: enable CodeReady Linux Builder repository on RHEL 9 + rhsm_repository: + name: codeready-builder-for-rhel-9-x86_64-rpms + when: + - ansible_distribution == "RedHat" and ansible_distribution_version >= "9.0" + - "'This system is not yet registered' not in check_subscription_status.stderr" + failed_when: false # allow failure if the OS is not subscribed, but warn the user + +# On Rocky >= 9.0, --set-enabled crb is required; it is the equivalent of --set-enabled powertools on Rocky < 9.0 +- name: enable CRB repository to provide required dependency packages + command: "dnf config-manager --set-enabled crb -y" + when: ansible_distribution == "Rocky" and ansible_distribution_version >= "9.0" + +- name: install epel-release on Rocky >= 9.0 package: 
name: epel-release - when: - - ansible_distribution == "CentOS" + when: ansible_distribution == "Rocky" and ansible_distribution_version >= "9.0" -- name: obtain EPEL GPG key on RHEL8 +- name: obtain RPM-GPG-KEY-EPEL-8 rpm_key: state: present key: https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-8 @@ -42,36 +65,54 @@ - ansible_distribution in ['RedHat', 'Rocky'] - ansible_distribution_version >= '8' -- name: install epel-release on RHEL8 +- name: install epel-release-latest-8 package package: name: https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm when: - ansible_distribution in ['RedHat', 'Rocky'] - ansible_distribution_version >= '8' +# CPUID package is missing on RHEL 9.0 / Rocky 9.0 +- name: block for downloading CPUID on RHEL / Rocky >= 9.0 + block: + - name: download CPUID on RHEL / Rocky >= 9.0 + get_url: + url: "https://download-ib01.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/c/cpuid-20200427-1.el8.x86_64.rpm" + dest: "{{ project_root_dir }}" + mode: 0755 + + - name: install downloaded CPUID package on RHEL / Rocky >= 9.0 + package: + name: "{{ project_root_dir }}/cpuid-20200427-1.el8.x86_64.rpm" + state: present + when: + - ansible_distribution in ['RedHat', 'Rocky'] + - ansible_distribution_version >= '9' + +- name: obtain RPM-GPG-KEY-EPEL-9 + rpm_key: + state: present + key: https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9 + when: ansible_distribution == "RedHat" and ansible_distribution_version >= "9" + +- name: install epel-release-latest-9 package + package: + name: https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm + when: ansible_distribution == "RedHat" and ansible_distribution_version >= "9" + - name: get full distribution versions command: cat /etc/redhat-release register: release - changed_when: true + changed_when: false - name: set full distribution version set_fact: full_dist_version: "{{ release.stdout | regex_replace('.*(\\d+.\\d+.\\d\\d\\d\\d).*', '\\1') }}" -#- name: update CentOS Vault yum repository on CentOS 8 -# yum_repository: -# name: C{{ full_dist_version }}-base -# description: CentOS-{{ full_dist_version }} - Base -# file: CentOS-Vault -# baseurl: http://vault.centos.org/{{ full_dist_version }}/BaseOS/$basearch/os/ -# baseurl: http://vault.centos.org/{{ full_dist_version }}/BaseOS/Source/ -# gpgcheck: yes -# gpgkey: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial -# enabled: yes -# when: -# - ansible_distribution == "CentOS" -# - ansible_distribution_version >= '8' and ansible_distribution_version < '8.3' -# - not update_kernel +- name: get current kernel version + command: uname -r + register: current_kernel_version + changed_when: false # CentOS-Vault repo not working for CentOS 8, so install kernel headers directly - name: pull matching kernel headers on CentOS 8.3 @@ -142,6 +183,28 @@ - ansible_distribution_version == '8.5' - not update_kernel +- name: pull matching kernel headers on Rocky 9.0 + package: + name: "{{ item }}" + state: present + retries: 5 + delay: 10 + register: source_status + until: source_status is not failed + with_items: + - "https://dl.rockylinux.org/pub/rocky/9.0/AppStream/x86_64/kickstart/Packages/k/kernel-headers-5.14.0-70.13.1.el9_0.x86_64.rpm" + - "https://dl.rockylinux.org/pub/rocky/9.0/AppStream/x86_64/kickstart/Packages/k/kernel-devel-5.14.0-70.13.1.el9_0.x86_64.rpm" + when: + - ansible_distribution == "Rocky" + - ansible_distribution_version == '9.0' + - not update_kernel + - current_kernel_version.stdout == '5.14.0-70.13.1.el9_0.x86_64' + +- name: install epel-next-release on 
Rocky >= 9.0 + package: + name: epel-next-release + when: ansible_distribution == "Rocky" and ansible_distribution_version >= "9.0" + # pull the matching kernel headers if kernel is not updated - name: pull matching kernel headers from configured repos # noqa 503 - more than one condition, can't be a handler @@ -154,9 +217,12 @@ until: kernel_source is success when: - not source_status.changed - - (ansible_distribution in ['RedHat', 'Rocky']) or - (ansible_distribution == "CentOS" and ansible_distribution_version != '8.4') + - ansible_os_family == "RedHat" - not update_kernel + - not 'rt' in ansible_kernel +# with RHEL 8.6 RT not-subscribed: +# "No package kernel-headers-4.18.0-372.9.1.rt7.166.el8.x86_64 available.", +# "No package kernel-devel-4.18.0-372.9.1.rt7.166.el8.x86_64 available." - name: install the 'Development tools' package group package: @@ -171,6 +237,15 @@ - ansible_os_family == "RedHat" - ansible_distribution_version >= '8' +# no harm in removing the package; it will be reinstalled / updated during dnf update +- name: remove network-scripts package when package update is required on Rocky / RHEL >= 9.0 + package: + name: network-scripts + state: absent + when: + - ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0" + - update_all_packages | default(false) + - name: update all packages package: name: '*' @@ -183,8 +258,8 @@ - name: update to the latest kernel and kernel headers on the Red Hat OS family package: name: - - kernel - - kernel-devel + - kernel + - kernel-devel state: latest # noqa 403 notify: - reboot server @@ -204,8 +279,8 @@ - name: install command line tools to collect hardware details package: name: - - inxi - - jq + - inxi + - jq state: present when: ansible_os_family == "RedHat"
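The repository wiring above (PowerTools/CRB, EPEL, epel-next) is a precondition for the driver and library builds later in the bootstrap. A quick way to confirm it landed is an ad-hoc check along these lines; this is a hypothetical verification playbook, not part of the patch, and the repo id `crb` simply mirrors the dnf config-manager call above.

```yaml
# verify_repos.yml - hypothetical sanity check, not shipped with this patch
- hosts: all
  gather_facts: true
  tasks:
    - name: list enabled dnf repositories
      command: dnf repolist --enabled
      register: repolist
      changed_when: false

    - name: assert CRB is enabled on Rocky >= 9.0
      assert:
        that: "'crb' in repolist.stdout"
        fail_msg: "CRB repository is not enabled; builds needing -devel packages may fail"
      when:
        - ansible_distribution == "Rocky"
        - ansible_distribution_version is version('9.0', '>=')
```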
diff --git a/roles/bootstrap/install_qat_drivers_services/defaults/main.yml b/roles/bootstrap/install_qat_drivers_services/defaults/main.yml index 6bc23339..5c39d549 100644 --- a/roles/bootstrap/install_qat_drivers_services/defaults/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/defaults/main.yml @@ -14,7 +14,11 @@ ## limitations under the License. ## --- -qat_drivers_version: 'QAT.L.4.18.0-00008' -qat_drivers_download_url: 'https://downloadmirror.intel.com/729932/{{qat_drivers_version }}.tar.gz' -qat_drivers_pkg_checksum: 'sha1:8BB265F64EC845DDB2B5580638F9FC4293881A1B' +qat_drivers_version: 'QAT.L.4.18.1-00001' +qat_drivers_download_url: 'https://downloadmirror.intel.com/738667/{{ qat_drivers_version }}.tar.gz' +qat_drivers_pkg_checksum: 'sha1:5D967D676963F3D81FB22A60F389D654F231C08C' +# If you change the folder location below, update it accordingly in roles/redeploy_cleanup/defaults/main.yml qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" + +# sha256 checksum of the QAT20.L.0.9.6-00024.tar.gz package +qat_drivers_version_checksum: "b7759c2c7b50077e2840c6cc1001ed4c6c54145a7aa89ccc941ac931c53100b5"
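Since get_url validates the tarball against qat_drivers_pkg_checksum before anything is unpacked, a version bump always needs a matching digest. A hypothetical helper for computing the value from a locally mirrored package (the /tmp path is an illustrative assumption):

```yaml
- name: compute sha1 of a locally mirrored QAT package (hypothetical helper)
  delegate_to: localhost
  stat:
    path: "/tmp/{{ qat_drivers_version }}.tar.gz"  # assumed mirror location
    checksum_algorithm: sha1
  register: qat_pkg

- name: print the value to paste into qat_drivers_pkg_checksum
  debug:
    msg: "qat_drivers_pkg_checksum: 'sha1:{{ qat_pkg.stat.checksum }}'"
```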
diff --git a/roles/bootstrap/install_qat_drivers_services/tasks/main.yml b/roles/bootstrap/install_qat_drivers_services/tasks/main.yml index 14f5a538..9d28097b 100644 --- a/roles/bootstrap/install_qat_drivers_services/tasks/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/tasks/main.yml @@ -18,11 +18,11 @@ include_role: name: install_dependencies -- name: WA for libudev-dev version issue on Ubuntu 22.04 +- name: WA for libudev-dev version issue on Ubuntu apt: name: 'udev' state: latest # noqa 403 package-latest - when: ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04" + when: ansible_distribution == "Ubuntu" - name: get current udev package version shell: "set -o pipefail && apt list --installed 2>/dev/null |grep '^udev' | awk 'NR==1{ print $2 }'" @@ -54,9 +54,20 @@ url: "{{ qat_drivers_download_url }}" checksum: "{{ qat_drivers_pkg_checksum }}" dest: "{{ qat_drivers_dir }}" + mode: 0755 register: qat_driver until: qat_driver is not failed retries: 5 + when: configured_arch != "spr" + +- name: copy QAT driver package + copy: + dest: "{{ qat_drivers_dir }}/{{ qat_drivers_version }}.tar.gz" + src: "{{ qat_drivers_folder }}/{{ qat_drivers_version }}.tar.gz" + owner: root + group: root + mode: 0644 + when: configured_arch == "spr" - name: unarchive QAT drivers package unarchive: @@ -88,15 +99,16 @@ become: yes notify: - reboot server + when: configured_arch != "spr" -- name: playbook terminated, QAT module load blocking on CentOS > 8 - fail: - msg: - - "Purpose for failure might be an already intel_qat module set up on server. Recommended is to have clean image of OS without pre-installed QAT module." - - "make uninstall && make clean && make distclean commands can roll-back QAT drivers package {{ qat_drivers_version }} already carried out this point" - when: - - ansible_distribution == "CentOS" and ansible_distribution_version >= '8.3' - - "'ERROR:' in qat_make_install.stderr" +# Reboot with driver ver. QAT20.L.0.8.0-00071 was causing issues; there is no need to reboot. +- name: make install QAT drivers + make: + chdir: "{{ qat_drivers_dir }}" + target: install + register: qat_make_install + become: yes + when: configured_arch == "spr" - name: confirm QAT module installed shell: "set -o pipefail && lsmod | grep qat" @@ -118,14 +130,41 @@ when: - on_vms is defined and on_vms -- name: make sure old qat_service is stopped and disabled +- name: make sure qat service is stopped and disabled service: state: stopped - name: qat_service + name: qat enabled: no -- name: make sure QAT service is started and enabled +- name: make sure qat_service service is started and enabled service: state: restarted - name: qat + name: qat_service enabled: yes + +# The block below is also present in "roles/bootstrap/configure_openssl/tasks/intel_qatlibs_and_qatsvm_configuration.yml"; it runs only if +# "enable_intel_qatlibs" is "true" in host_vars because, in order to compile the QAT configuration, +# QAT libs must be installed before the SVM feature is configured +- name: configuration for QAT Shared Virtual Memory (SVM) + block: + - name: set QAT SVM to enabled + set_fact: + svm_value: 1 + + - name: enable address translation services for QAT Shared Virtual Memory (SVM) + replace: + path: "{{ item }}" + regexp: '(^ATEnabled\s)(.*)$' + replace: 'ATEnabled = {{ svm_value }}' + mode: 0600 + with_items: + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.sym.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.asym.vm" + - "{{ qat_drivers_dir }}/quickassist/utilities/adf_ctl/conf_files/4xxxvf_dev0.conf.dc.sym.vm" + - "{{ qat_drivers_dir }}/build/4xxxvf_dev0.conf.vm" + failed_when: false + when: + - configured_arch == "spr" + - enable_intel_qatlibs is defined and not enable_intel_qatlibs
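For the SPR branch above, the QAT 2.0 package is not downloaded but copied from a folder staged on the ansible host, and the preflight in the next file validates that staging. A hypothetical host_vars sketch wiring this up (paths are illustrative assumptions; the variable names are the ones the tasks consume):

```yaml
# host_vars/node1.yml - hypothetical staging for the offline SPR QAT package
configured_arch: "spr"
# directory on the ansible host that must contain <qat_drivers_version>.tar.gz
qat_drivers_folder: "/opt/cek_packages/qat"
# keep false so the SVM block above handles ATEnabled in the .vm conf files
enable_intel_qatlibs: false
```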
diff --git a/roles/bootstrap/install_qat_drivers_services/tasks/qat_drivers_preflight.yml b/roles/bootstrap/install_qat_drivers_services/tasks/qat_drivers_preflight.yml new file mode 100644 index 00000000..f9d80f2f --- /dev/null +++ b/roles/bootstrap/install_qat_drivers_services/tasks/qat_drivers_preflight.yml @@ -0,0 +1,51 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: check folder availability on localhost + delegate_to: localhost + stat: + path: "{{ qat_drivers_folder }}" + register: check_qat_folder + +- fail: + msg: + - "{{ qat_drivers_folder }} folder does not exist." + - "Please make sure that {{ qat_drivers_folder }} folder is present." + delegate_to: localhost + when: not check_qat_folder.stat.exists + +- name: check if QAT package is present + delegate_to: localhost + stat: + path: "{{ (qat_drivers_folder, qat_drivers_version + '.tar.gz') | path_join }}" + checksum_algorithm: sha256 + register: check_qat_package + +- fail: + msg: + - "{{ (qat_drivers_folder, qat_drivers_version + '.tar.gz') | path_join }} does not exist." + - "Please make sure that QAT package is present in {{ qat_drivers_folder }}" + delegate_to: localhost + when: not check_qat_package.stat.exists + +- name: check checksum of QAT package + delegate_to: localhost + assert: + that: "check_qat_package.stat.checksum == qat_drivers_version_checksum" + fail_msg: + - "checksum of QAT drivers package ({{ check_qat_package.stat.checksum }}) is different" + - "from CEK validated package checksum {{ qat_drivers_version_checksum }}" + success_msg: "Provided package checksum: {{ check_qat_package.stat.checksum }} = CEK validated package checksum: {{ qat_drivers_version_checksum }}" diff --git a/roles/bootstrap/install_qat_drivers_services/vars/main.yml b/roles/bootstrap/install_qat_drivers_services/vars/main.yml index fe476190..831b6b2d 100644 --- a/roles/bootstrap/install_qat_drivers_services/vars/main.yml +++ b/roles/bootstrap/install_qat_drivers_services/vars/main.yml @@ -16,11 +16,16 @@ --- install_dependencies: Debian: + - cmake - g++ - pkg-config - wget - make - yasm + - libboost-all-dev + - libnl-genl-3-dev + - zlib1g + - zlib1g-dev RedHat: - "@Development Tools" - pciutils @@ -35,3 +40,4 @@ install_dependencies: - perl - usbutils - yasm + - boost-devel diff --git a/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml b/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml index 26647917..f0cd65d8 100644 --- a/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml +++ b/roles/bootstrap/set_intel_flexran_kernel_flags/tasks/main.yml @@ -15,7 +15,7 @@ ## --- # probe CPU -- debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa 204 line-length +- debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa yaml[line-length] - name: include Intel FlexRAN role vars include_vars: ../../intel_flexran/defaults/main.yml @@ -45,39 +45,65 @@ set_fact: # intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} intel_iommu=on iommu=pt" {{ intel_flexran_marker }}' intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="{{ generated_cmdline.stdout }}" {{ intel_flexran_marker }}' +# intel_flexran_isol_cores: "{{ generated_cmdline.stdout | regex_search('isolcpus=*', '\\1') }}" # in ENV is $isolcpus -- name: override Intel FlexRAN kernel flags specific for 1x32 cores CPU +- debug: msg="generic kernel cmdline is {{ intel_flexran_cmdline }}" + +- name: set Intel FlexRAN kernel flags for Host-16c-single + set_fact: + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=30 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=1-15,17-31 irqaffinity=0,16 isolcpus=managed_irq,domain,1-15,17-31 kthread_cpus=0,16 nohz_full=1-15,17-31 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat 
skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "1-15,17-31" + intel_flexran_cpu_supported: true + when: + - ansible_processor_count == 1 + - ansible_processor_cores == 16 + +- name: set Intel FlexRAN kernel flags for Host-20c-single set_fact: - # Sam: BM: Quanta MCC SPR single socket (1x32 cores) for L2 testfile=icelake-sp: - intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="crashkernel=auto intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 audit=0 cgroup_disable=memory mce=off hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0,30-32,62-63 irqaffinity=0,30-32,62-63 nosoftlockup skew_tick=1 skew_tick=1 isolcpus=1-29,33-61 nohz_full=1-29,33-61 rcu_nocbs=1-29,33-61" {{ intel_flexran_marker }}' # noqa 204 line-length - intel_flexran_isol_cores: "1-29,33-61" + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=40 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=1-19,21-39 irqaffinity=0,20 isolcpus=managed_irq,domain,1-19,21-39 kthread_cpus=0,20 nohz_full=1-19,21-39 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "1-19,21-39" + intel_flexran_cpu_supported: true + when: + - ansible_processor_count == 1 + - ansible_processor_cores == 20 + +- name: set Intel FlexRAN kernel flags for Host-32c-single + set_fact: + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=60 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=4-31,36-63 irqaffinity=0-3,32-35 isolcpus=managed_irq,domain,4-31,36-63 kthread_cpus=0-3,32-35 nohz_full=4-31,36-63 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "4-31,36-63" intel_flexran_cpu_supported: true when: - ansible_processor_count == 1 - ansible_processor_cores == 32 -- name: override Intel FlexRAN kernel flags specific for 2x40 cores CPUs +- name: set Intel FlexRAN kernel flags for Host-32c-dual set_fact: - # Baoqian script: BM: FCP (2x40 cores): - intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 audit=0 cgroup_disable=memory tsc=nowatchdog intel_pstate=disable mce=off hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0,80,40-79,120-159 irqaffinity=0,80,40-79,120-159 nohz=on nosoftlockup nohz_full=1-39,81-119 rcu_nocbs=1-39,81-119 rcu_nocb_poll skew_tick=1 isolcpus=1-39,81-119" {{ intel_flexran_marker }}' # noqa 204 line-length + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=60 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=4-59,68-123 irqaffinity=0-3,60-63,64-67,124-127 isolcpus=managed_irq,domain,4-59,68-123 kthread_cpus=0-3,60-63,64-67,124-127 nohz_full=4-59,68-123 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "4-59,68-123" 
intel_flexran_cpu_supported: true when: - ansible_processor_count == 2 - - ansible_processor_cores == 40 + - ansible_processor_cores == 32 -- name: override Intel FlexRAN kernel flags specific for 2x52 cores CPUs +- name: set Intel FlexRAN kernel flags for Host-52c-dual set_fact: - # Sam: BM: Quanta XCC dual socket (2x52 cores) for L2 testfile=icelake-sp: - intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=60 irqaffinity=0,50-52,103-104,154-156,206-207 mce=off nmi_watchdog=0 softlockup_panic=0 selinux=0 enforcing=0 audit=0 kthread_cpus=0,50-52,103-104,154-156,206-207 clock=pit no_timer_check clocksource=tsc tsc=perfect usbcore.autosuspend=-1 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 isolcpus=managed_irq,domain,1-49,53-101,105-153,157-205 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=1-49,53-101,105-153,157-205 rcu_nocbs=1-49,53-101,105-153,157-205" {{ intel_flexran_marker }}' # noqa 204 line-length - intel_flexran_isol_cores: "1-49,53-101,105-153,157-205" + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=60 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=4-99,108-203 irqaffinity=0-3,100-103,104-107,204-207 isolcpus=managed_irq,domain,4-99,108-203 kthread_cpus=0-3,100-103,104-107,204-207 nohz_full=4-99,108-203 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "4-99,108-203" intel_flexran_cpu_supported: true - # Jing: XCC dual socket (2x52 cores): -# intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=60 irqaffinity=0-3,100-103,104-107,204-207 mce=off nmi_watchdog=0 softlockup_panic=0 selinux=0 enforcing=0 audit=0 kthread_cpus=0-3,100-103,104-107,204-207 clock=pit no_timer_check clocksource=tsc tsc=perfect usbcore.autosuspend=-1 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 isolcpus=managed_irq,domain,4-99,108-203 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=4-99,108-203 rcu_nocbs=4-99,108-203" {{ intel_flexran_marker }}' # noqa 204 line-length when: - ansible_processor_count == 2 - ansible_processor_cores == 52 -- debug: msg="{{ intel_flexran_cmdline }}" +- name: set Intel FlexRAN kernel flags for Host-56c-dual + set_fact: + intel_flexran_cmdline: 'GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepages=60 hugepagesz=1G nmi_watchdog=0 softlockup_panic=0 intel_iommu=on iommu=pt rcu_nocbs=4-107,116-219 irqaffinity=0-3,108-111,112-115,220-223 isolcpus=managed_irq,domain,4-107,116-219 kthread_cpus=0-3,108-111,112-115,220-223 nohz_full=4-107,116-219 crashkernel=auto enforcing=0 quiet rcu_nocb_poll rhgb selinux=0 mce=off audit=0 pci=realloc pci=assign-busses rdt=l3cat skew_tick=1 nosoftlockup nohz=on" {{ intel_flexran_marker }}' # noqa yaml[line-length] + intel_flexran_isol_cores: "4-107,116-219" + intel_flexran_cpu_supported: true + when: + - ansible_processor_count == 2 + - ansible_processor_cores == 56 + +- debug: msg="final kernel cmdline is {{ intel_flexran_cmdline }}" - name: set Intel FlexRAN kernel flags in /etc/default/grub lineinfile: @@ -89,14 +115,15 @@ notify: - reboot server
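Once a node reboots with one of these profiles, the applied isolation can be cross-checked against intel_flexran_isol_cores. A hypothetical verification task (not part of the role), relying on the isolcpus=managed_irq,domain,&lt;cores&gt; shape that every profile above uses:

```yaml
- name: read the live kernel cmdline
  command: cat /proc/cmdline
  register: live_cmdline
  changed_when: false

- name: assert isolation matches the selected FlexRAN profile
  assert:
    that: "('isolcpus=managed_irq,domain,' ~ intel_flexran_isol_cores) in live_cmdline.stdout"
    fail_msg: "isolcpus does not match intel_flexran_isol_cores={{ intel_flexran_isol_cores }}"
```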
-- name: set Intel FlexRAN cores isolation for RHEL - lineinfile: - dest: /etc/tuned/realtime-variables.conf # or /etc/tuned/realtime-virtual-host.conf - line: 'isolated_cores={{ intel_flexran_isol_cores }}' - state: present - create: yes - mode: '0664' - when: - - ansible_os_family == "RedHat" - notify: - - reboot server +# No need. The kernel flags include cores isolation +# - name: set Intel FlexRAN cores isolation for RHEL +# lineinfile: +# dest: /etc/tuned/realtime-variables.conf # or /etc/tuned/realtime-virtual-host.conf +# line: 'isolated_cores={{ intel_flexran_isol_cores }}' +# state: present +# create: yes +# mode: '0664' +# when: +# - ansible_os_family == "RedHat" +# notify: +# - reboot server diff --git a/roles/bootstrap/set_siov_kernel_flags/defaults/main.yml b/roles/bootstrap/set_siov_kernel_flags/defaults/main.yml new file mode 100644 index 00000000..cbe7be97 --- /dev/null +++ b/roles/bootstrap/set_siov_kernel_flags/defaults/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +siov_marker: "# siov" diff --git a/roles/bootstrap/set_siov_kernel_flags/tasks/main.yml b/roles/bootstrap/set_siov_kernel_flags/tasks/main.yml new file mode 100644 index 00000000..df69a597 --- /dev/null +++ b/roles/bootstrap/set_siov_kernel_flags/tasks/main.yml @@ -0,0 +1,33 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: set default iommu extra flag + set_fact: + iommu_extra_flag: "" + +- name: set SIOV kernel flags + set_fact: + iommu_cmdline: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} intel_iommu=on,sm_on iommu=pt{{ iommu_extra_flag }}" {{ siov_marker }}' + +- name: set SIOV kernel flags in /etc/default/grub + lineinfile: + dest: /etc/default/grub + regexp: '^GRUB_CMDLINE_LINUX="\${GRUB_CMDLINE_LINUX}(.*?)" {{ siov_marker }}$' + line: '{{ iommu_cmdline }}' + state: present + mode: 0664 + notify: + - reboot server diff --git a/roles/bootstrap/set_sriov_kernel_flags/tasks/main.yml b/roles/bootstrap/set_sriov_kernel_flags/tasks/main.yml index 0ce60461..588280cd 100644 --- a/roles/bootstrap/set_sriov_kernel_flags/tasks/main.yml +++ b/roles/bootstrap/set_sriov_kernel_flags/tasks/main.yml @@ -22,7 +22,7 @@ register: grub notify: - reboot server - when: not (iommu_enabled | default(false) | bool) + when: not (iommu_enabled | default(false) | bool) and not (on_vms | default(false) | bool) - name: setup sriov grub commandline parameters include_tasks: setup_sriov_kernel_flags.yml diff --git a/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml b/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml index a7877988..f6ad3bc9 100644 --- a/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml +++ b/roles/bootstrap/set_sriov_kernel_flags/tasks/setup_sriov_kernel_flags.yml @@ -22,12 +22,12 @@ set_fact: vfio_cmdline: " vfio-pci.disable_denylist=1" when: - - qat_devices is defined and (qat_devices|length>0) - - install_dpdk | default(false) - - (ansible_distribution == "Ubuntu" and ansible_distribution_version == '20.04' and update_kernel) or - (ansible_distribution == "Ubuntu" and ansible_distribution_version >= '21.04') or - (ansible_distribution in ['RedHat', 'Rocky'] and ansible_distribution_version >= '8.4') or - (ansible_distribution == "CentOS" and ansible_distribution_version >= '8.5') + - qat_devices is defined and (qat_devices|length>0) + - install_dpdk | default(false) + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == "20.04" and update_kernel) or + (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "21.04") or + (ansible_distribution in ['RedHat', 'Rocky'] and ansible_distribution_version >= '8.4') or + (ansible_distribution == "CentOS" and ansible_distribution_version >= "8.5") - name: set noiommu default kernel flags set_fact: diff --git a/roles/bootstrap/update_grub/tasks/main.yml b/roles/bootstrap/update_grub/tasks/main.yml index e7520951..b33681ff 100644 --- a/roles/bootstrap/update_grub/tasks/main.yml +++ b/roles/bootstrap/update_grub/tasks/main.yml @@ -19,31 +19,69 @@ when: - ansible_os_family == "Debian" -- name: check if grub2.cfg config file exists (MBR variant) - stat: - path: /etc/grub2.cfg - register: grub_mbr_cfg +- name: block for Rocky / RHEL < 9.0 + block: + - name: check if grub2.cfg config file exists (MBR variant) + stat: + path: /etc/grub2.cfg + register: grub_mbr_cfg + + - name: update MBR grub2.cfg + command: grub2-mkconfig -o /etc/grub2.cfg + when: grub_mbr_cfg.stat.exists + + - name: check if grub2-efi.cfg config file exists (EFI variant) + stat: + path: /etc/grub2-efi.cfg + register: grub_efi_cfg + + - name: update MBR grub2-efi.cfg on Rocky / RHEL < 9.0 + command: "grub2-mkconfig -o /etc/grub2-efi.cfg" + when: grub_efi_cfg.stat.exists + changed_when: true when: - ansible_os_family == "RedHat" + - ansible_distribution_version < "9.0" -- name: update MBR grub2.cfg 
- command: grub2-mkconfig -o /etc/grub2.cfg +- name: block for Rocky / RHEL < 9.0 + block: + - name: check if grub2.cfg config file exists (MBR variant) + stat: + path: /etc/grub2.cfg + register: grub_mbr_cfg + + - name: update MBR grub2.cfg + command: grub2-mkconfig -o /etc/grub2.cfg + when: grub_mbr_cfg.stat.exists + + - name: check if grub2-efi.cfg config file exists (EFI variant) + stat: + path: /etc/grub2-efi.cfg + register: grub_efi_cfg + + - name: update EFI grub2-efi.cfg on Rocky / RHEL < 9.0 + command: "grub2-mkconfig -o /etc/grub2-efi.cfg" + when: grub_efi_cfg.stat.exists + changed_when: true when: - ansible_os_family == "RedHat" + - ansible_distribution_version < "9.0" -- name: update MBR grub2.cfg - command: grub2-mkconfig -o /etc/grub2.cfg +- name: disable blscfg to enable grub configuration changes (Rocky >= 9.0) + lineinfile: + dest: /etc/default/grub + regexp: '^GRUB_ENABLE_BLSCFG=(.*?)$' + line: 'GRUB_ENABLE_BLSCFG=false' when: - - ansible_os_family == "RedHat" - - grub_mbr_cfg.stat.exists + - ansible_distribution == "Rocky" + - ansible_distribution_version >= "9.0" -- name: check if grub2-efi.cfg config file exists (EFI variant) - stat: - path: /etc/grub2-efi.cfg - register: grub_efi_cfg +- name: block for Rocky/RHEL 9.0 or greater + block: + - name: check if grub.cfg config file exists (Rocky / RHEL >= 9.0) + stat: + path: "/boot/efi/EFI/{{ ansible_distribution | lower }}/grub.cfg" + register: grub_rhel_rocky_cfg + + - name: update EFI grub.cfg on (Rocky / RHEL >= 9.0) + command: "grub2-mkconfig -o /boot/efi/EFI/{{ ansible_distribution | lower }}/grub.cfg" + when: grub_rhel_rocky_cfg.stat.exists + changed_when: true when: - ansible_os_family == "RedHat" + - ansible_distribution_version >= "9.0" -- name: update MBR grub2-efi.cfg - command: grub2-mkconfig -o /etc/grub2-efi.cfg - when: - - ansible_os_family == "RedHat" - - grub_efi_cfg.stat.exists +- block: + - name: template the dhclient systemd service + template: + src: dhclient.service.j2 + dest: /lib/systemd/system/cek_dhclient.service + owner: root + group: root + mode: 0644 + + - name: run cek dhclient systemd service on boot + systemd: + daemon_reload: yes + state: restarted + name: cek_dhclient + enabled: yes - name: create empty machine_id list from the worker nodes set_fact: @@ -56,18 +94,26 @@ - "{{ groups['kube_node'] }}" - block: - - name: detect that machine-id duplicates over multiple nodes - debug: - msg: "Detected there are /etc/machine-id duplicates {{ machine_id_list }}, will generate a new machine-id for groups['kube_node'] nodes" - - - name: remove /etc/machine-id - file: - state: absent - path: /etc/machine-id - force: yes - - - name: create new /etc/machine-id - command: dbus-uuidgen --ensure=/etc/machine-id - changed_when: true + - name: detect that machine-id duplicates over multiple nodes + debug: + msg: "Detected there are /etc/machine-id duplicates {{ machine_id_list }}, will generate a new machine-id for groups['kube_node'] nodes" + + - name: remove /etc/machine-id + file: + state: absent + path: /etc/machine-id + force: yes + + - name: create new /etc/machine-id (debian) + command: dbus-uuidgen --ensure=/etc/machine-id + changed_when: true + when: + - ansible_os_family == "Debian" + + - name: create new /etc/machine-id (redhat) + command: systemd-machine-id-setup + changed_when: true + when: + - ansible_os_family == "RedHat" when: ( machine_id_list | unique | length < groups['kube_node'] | length ) diff --git a/roles/bootstrap/update_grub/templates/dhclient.service.j2 b/roles/bootstrap/update_grub/templates/dhclient.service.j2 new file mode 100644 index 00000000..df287782 --- /dev/null +++ b/roles/bootstrap/update_grub/templates/dhclient.service.j2 @@ -0,0 +1,11 @@ +[Unit] +Description=cek dhclient start on boot +AssertPathExists=/usr/sbin/dhclient + +[Service] +Type=oneshot +ExecStart=/usr/sbin/dhclient -r {{ ansible_default_ipv4.interface }} +ExecStart=/usr/sbin/dhclient {{ ansible_default_ipv4.interface }} + +[Install] +WantedBy=multi-user.target
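The regenerated machine-id and the cek_dhclient oneshot unit above work together: releasing and re-requesting the lease at boot keeps nodes cloned from a common template from reusing an address tied to the old identity. A hypothetical cluster-wide uniqueness check (not part of the role), mirroring the role's own machine_id_list comparison:

```yaml
- name: collect /etc/machine-id from all worker nodes (hypothetical check)
  command: cat /etc/machine-id
  register: machine_id
  changed_when: false

- name: assert machine-ids are unique across kube_node
  run_once: true
  assert:
    that: >-
      groups['kube_node'] | map('extract', hostvars, ['machine_id', 'stdout'])
      | list | unique | length == groups['kube_node'] | length
    fail_msg: "duplicate /etc/machine-id values are still present"
```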
diff --git a/roles/bootstrap/update_nic_drivers/defaults/main.yml b/roles/bootstrap/update_nic_drivers/defaults/main.yml index 45e29f4a..2ecd11df 100644 --- a/roles/bootstrap/update_nic_drivers/defaults/main.yml +++ b/roles/bootstrap/update_nic_drivers/defaults/main.yml @@ -16,18 +16,18 @@ --- # i40e i40e_driver_name: i40e -i40e_driver_version: 2.19.3 +i40e_driver_version: 2.20.12 i40e_driver_url: https://sourceforge.net/projects/e1000/files/i40e%20stable/{{ i40e_driver_version }}/i40e-{{ i40e_driver_version }}.tar.gz -i40e_driver_checksum: sha1:f05a8f5206ab01246638826620dc42126e57a854 +i40e_driver_checksum: sha1:a24f0c5512af31c68cd90667d5822121780d5487 # ice ice_driver_name: ice -ice_driver_version: 1.8.8 +ice_driver_version: 1.9.11 ice_driver_url: https://sourceforge.net/projects/e1000/files/ice%20stable/{{ ice_driver_version }}/ice-{{ ice_driver_version }}.tar.gz -ice_driver_checksum: sha1:0b239447ce2316f540b83707229e98a164517cd2 +ice_driver_checksum: sha1:f05e2322a66de5d4019e7aa6141a109bb419dda4 # iavf iavf_driver_name: iavf -iavf_driver_version: 4.4.2.1 +iavf_driver_version: 4.5.3 iavf_driver_url: https://sourceforge.net/projects/e1000/files/iavf%20stable/{{ iavf_driver_version }}/iavf-{{ iavf_driver_version }}.tar.gz -iavf_driver_checksum: sha1:d1e2f9d5cc278bf824bbbc484e17e56c06d9a97f +iavf_driver_checksum: sha1:76b3a7dec392e559dea6112fa55f5614857cff2a diff --git a/roles/bootstrap/update_nic_drivers/tasks/i40e.yml b/roles/bootstrap/update_nic_drivers/tasks/i40e.yml index 60964ecf..874ee882 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/i40e.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/i40e.yml @@ -31,7 +31,7 @@ - i40e_installed_version.stdout != i40e_driver_version - mgmt_interface_driver.stdout != i40e_driver_name - not update_kernel - - ansible_os_family == "RedHat" or + - (ansible_os_family == "RedHat" and ansible_distribution_version < "9.0") or (ansible_distribution == "Ubuntu" and ansible_distribution_version < "22.04") - name: update i40e driver @@ -73,7 +73,8 @@ reboot_timeout: 1200 when: - (i40e_installed_version.stdout != i40e_driver_version and mgmt_interface_driver.stdout == i40e_driver_name) or - (i40e_installed_version.stdout != i40e_driver_version and (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04" or update_kernel)) + (i40e_installed_version.stdout != i40e_driver_version and ((ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0") or + (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or update_kernel)) - name: ensure that i40e module is loaded modprobe: diff --git a/roles/bootstrap/update_nic_drivers/tasks/iavf.yml b/roles/bootstrap/update_nic_drivers/tasks/iavf.yml index 9292c9df..972971e6 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/iavf.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/iavf.yml @@ -39,6 +39,7 @@ dest: "{{ project_root_dir }}" checksum: "{{ iavf_driver_checksum }}" timeout: 60 + mode: 0644 register: iavf_download until: iavf_download is not failed retries: 5 diff --git a/roles/bootstrap/update_nic_drivers/tasks/ice.yml b/roles/bootstrap/update_nic_drivers/tasks/ice.yml index ef42c5a4..adb004c3 100644 --- a/roles/bootstrap/update_nic_drivers/tasks/ice.yml +++ b/roles/bootstrap/update_nic_drivers/tasks/ice.yml @@ -23,16 +23,17 @@ - debug: msg: "Currently installed ice version: {{ ice_installed_version.stdout }}" -- name: unload ice module - modprobe: - name: ice - state: absent - when: - - ice_installed_version.stdout != ice_driver_version - - mgmt_interface_driver.stdout != ice_driver_name - - not update_kernel - - ansible_os_family == "RedHat" or - (ansible_distribution == "Ubuntu" and ansible_distribution_version < "22.04") +# unloading before update is 
probably not necessary and does not work anyway when irdma is using ice +# - name: unload ice module +# modprobe: +# name: ice +# state: absent +# when: +# - ice_installed_version.stdout != ice_driver_version +# - mgmt_interface_driver.stdout != ice_driver_name +# - not update_kernel +# - (ansible_os_family == "RedHat" and ansible_distribution_version < "9.0") or +# (ansible_distribution == "Ubuntu" and ansible_distribution_version < "22.04") - name: update ice driver block: @@ -43,6 +44,7 @@ dest: "{{ project_root_dir }}" checksum: "{{ ice_driver_checksum }}" timeout: 60 + mode: 0644 register: ice_download until: ice_download is not failed retries: 5 @@ -65,6 +67,19 @@ loop: - clean - install + when: not adq_dp.enabled |d(false) | bool + + - name: build and install ice driver + make: + chdir: "{{ (ice_untar.dest, ice_untar.files[0], 'src') | path_join }}" + target: "{{ item }}" + params: + CFLAGS_EXTRA: '-DADQ_PERF_COUNTERS' + become: yes + loop: + - clean + - install + when: adq_dp.enabled |d(false) | bool when: ice_installed_version.stdout != ice_driver_version - name: reboot node after driver update @@ -73,7 +88,8 @@ reboot_timeout: 1200 when: - (ice_installed_version.stdout != ice_driver_version and mgmt_interface_driver.stdout == ice_driver_name) or - (ice_installed_version.stdout != ice_driver_version and (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04" or update_kernel)) + (ice_installed_version.stdout != ice_driver_version and ((ansible_os_family == "RedHat" and ansible_distribution_version >= "9.0") or + (ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or update_kernel)) - name: ensure that ice module is loaded modprobe: diff --git a/roles/bootstrap/update_nic_drivers/vars/main.yml b/roles/bootstrap/update_nic_drivers/vars/main.yml index 3e3c25e5..77176edd 100644 --- a/roles/bootstrap/update_nic_drivers/vars/main.yml +++ b/roles/bootstrap/update_nic_drivers/vars/main.yml @@ -25,3 +25,5 @@ install_dependencies: - kernel-devel - elfutils-libelf-devel - ethtool + - tar + - unzip
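Each driver bump above pairs a pinned version with the sha1 of its tarball, and the update tasks compare that pin against what the node reports. A hypothetical standalone check of the same comparison (variable names taken from the defaults above):

```yaml
- name: read the installed ice driver version (hypothetical check)
  command: modinfo -F version ice
  register: installed_ice
  changed_when: false

- name: report whether an ice driver update would trigger
  debug:
    msg: "installed {{ installed_ice.stdout }} vs pinned {{ ice_driver_version }}; update={{ installed_ice.stdout != ice_driver_version }}"
```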
diff --git a/roles/bootstrap/update_nic_firmware/defaults/main.yml b/roles/bootstrap/update_nic_firmware/defaults/main.yml index efba2156..c86d8d5e 100644 --- a/roles/bootstrap/update_nic_firmware/defaults/main.yml +++ b/roles/bootstrap/update_nic_firmware/defaults/main.yml @@ -18,22 +18,32 @@ nvmupdate: # 700 Series i40e: - nvmupdate_pkg_url: https://downloadmirror.intel.com/732165/700Series_NVMUpdatePackage_v8_70_Linux.tar.gz - nvmupdate_pkg_checksum: sha1:1FD0BDB04D1FCFAD5543219DFE93FD11480C950E - min_required_fw_version: 8.70 + nvmupdate_pkg_url: "https://downloadmirror.intel.com/739639/700Series_NVMUpdatePackage_v9_00_Linux.tar.gz" + nvmupdate_pkg_checksum: "sha1:B2B183ADD3B6EF8BCB2DA77A6E6A68F482F4BFD1" + required_fw_version: "9.0" # min fw version for ddp was taken from: # https://www.intel.com/content/www/us/en/developer/articles/technical/dynamic-device-personalization-for-intel-ethernet-700-series.html - min_ddp_loadable_fw_version: 6.01 - min_updatable_fw_version: 5.02 + min_ddp_loadable_fw_version: "6.01" + min_updatable_fw_version: "5.02" # 800 Series (CVL) ice: - nvmupdate_pkg_url: https://downloadmirror.intel.com/727313/E810_NVMUpdatePackage_v3_20_Linux.tar.gz - nvmupdate_pkg_checksum: sha1:1831DDF95F164969B5BD275DA2A74B80A65EEC9E - min_required_fw_version: 3.20 + nvmupdate_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" + nvmupdate_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" + required_fw_version: "4.0" # https://builders.intel.com/docs/networkbuilders/intel-ethernet-controller-800-series-device-personalization-ddp-for-telecommunications-workloads-technology-guide.pdf # document above does not specify any min fw version needed for ddp feature. So, min_ddp_loadable_fw is the same as min_updatable_fw - min_ddp_loadable_fw_version: 0.70 - min_updatable_fw_version: 0.70 + min_ddp_loadable_fw_version: "0.70" + min_updatable_fw_version: "0.70" + # For a downgrade only: download the supported nvmupdate64e tool from the URL below and use it in place of the tool bundled with the older FW package. + supported_nvmupdate_tool_pkg_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" + supported_nvmupdate_tool_pkg_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" + supported_nvmupdate_tool_fw_version: "4.0" + nvmupdate_result: stdout: "" + +adq_ice_fw_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +adq_ice_fw_checksum: "sha1:7C168880082653B579FDF225A2E6E9301C154DD1" +adq_ice_fw_dest: "{{ (project_root_dir, 'nvmupdate.tar.gz') | path_join }}" +adq_ice_fw_required_version: "4.00" diff --git a/roles/bootstrap/update_nic_firmware/tasks/adq_update.yml b/roles/bootstrap/update_nic_firmware/tasks/adq_update.yml new file mode 100644 index 00000000..5576b61f --- /dev/null +++ b/roles/bootstrap/update_nic_firmware/tasks/adq_update.yml @@ -0,0 +1,136 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: get current FW version + shell: "set -o pipefail && ethtool -i {{ adq_dp.interface_name }} | grep -i firmware-version | awk '{ print $2 }'" + changed_when: false + args: + executable: /bin/bash + register: adq_interface_fw_version + +- name: check if current firmware version meets required version requirements + set_fact: + continue_nvmupdate: "{{ adq_interface_fw_version.stdout is version_compare(adq_ice_fw_required_version, '!=') }}" + +- debug: + msg: "The current firmware release on the card is {{ adq_interface_fw_version.stdout }}. 
Required is {{ adq_ice_fw_required_version }}" + +- name: assert that current FW is not too old + assert: + that: adq_interface_fw_version.stdout is version_compare(nvmupdate.ice.min_updatable_fw_version, '>=') + msg: >- + "Current firmware version {{ adq_interface_fw_version.stdout }} is too old to upgrade; + it must be {{ nvmupdate.ice.min_updatable_fw_version }} or newer" + when: continue_nvmupdate | bool + +- block: + - name: download nvmupdate package + get_url: + url: "{{ adq_ice_fw_url }}" + checksum: "{{ adq_ice_fw_checksum }}" + dest: "{{ adq_ice_fw_dest }}" + mode: 0644 + register: nvmupdate_download + until: nvmupdate_download is not failed + retries: 5 + + - name: unarchive nvmupdate package + unarchive: + src: "{{ adq_ice_fw_dest }}" + dest: "{{ (project_root_dir) | path_join }}" + remote_src: yes + extra_opts: [--strip-components=1] # to unpack Linux_x64 directly, skipping top-level model specific dir (e.g. 700Series) + mode: 0755 + when: continue_nvmupdate | bool + +- block: + # get device MAC address - this allows to run nvmupdate for the requested device only + - debug: var=ansible_facts[adq_dp.interface_name]['macaddress'] + when: continue_nvmupdate | bool + + - name: check ip routing table for interface {{ adq_dp.interface_name }} + command: ip route list + register: ip_route + changed_when: true + when: continue_nvmupdate | bool + +- debug: + msg: "interface {{ adq_dp.interface_name }} is not active in ip route list" + when: + - continue_nvmupdate | bool + - adq_dp.interface_name not in ip_route.stdout + +- block: + - debug: + msg: "interface {{ adq_dp.interface_name }} is active in ip route list, will remove interface {{ adq_dp.interface_name }} from list" + + - name: interface {{ adq_dp.interface_name }} was found active, grep interface before removing it + shell: "set -o pipefail && ip route list | grep {{ adq_dp.interface_name }}" + args: + executable: /bin/bash + register: grep_interface + changed_when: true + + - debug: + var: grep_interface.stdout_lines + + - name: remove interface {{ adq_dp.interface_name }} from routing table and proceed to firmware upgrade / downgrade + command: "ip route del {{ grep_interface.stdout }}" + changed_when: true + when: + - continue_nvmupdate | bool + - adq_dp.interface_name in ip_route.stdout + +- block: + - debug: + msg: >- + updating firmware... This operation may take several minutes. + To avoid damage to your device, do not stop the update or + reboot or power off the system during this process! + + - name: update firmware + # noqa 305 - shell is used intentionally here + shell: + cmd: "./nvmupdate64e -u -l -o update.xml -b -c nvmupdate.cfg -m '{{ ansible_facts[adq_dp.interface_name]['macaddress'] | replace(':','') }}'" + args: + chdir: "{{ (project_root_dir, 'Linux_x64') | path_join }}" + executable: /bin/bash + register: nvmupdate_result + failed_when: false + changed_when: true + when: continue_nvmupdate | bool + +- name: show additional message on unsupported platforms + debug: + msg: > + Failed to update firmware on interface {{ adq_dp.interface_name }} . + This is probably caused by unsupported platform - contact your vendor for more information. + when: + - continue_nvmupdate | bool + - (nvmupdate_result.stdout is search('No devices to update.')) + +- name: fail if fw update failed + fail: + msg: "Failed to update firmware on interface {{ adq_dp.interface_name }}. 
Error: {{ nvmupdate_result.stdout }}" + when: + - continue_nvmupdate | bool + - (nvmupdate_result.stdout is not search('update successful')) + +- name: reboot after update +# noqa 503 - more than one condition here, so can't be a handler + reboot: + reboot_timeout: 1200 # wait up to 20 minutes - if reboot takes longer after NVM update + when: continue_nvmupdate | bool diff --git a/roles/bootstrap/update_nic_firmware/tasks/main.yml b/roles/bootstrap/update_nic_firmware/tasks/main.yml index 2e1d678f..9f43a0d0 100644 --- a/roles/bootstrap/update_nic_firmware/tasks/main.yml +++ b/roles/bootstrap/update_nic_firmware/tasks/main.yml @@ -22,8 +22,10 @@ - name: validate host vars interface names with system interface names before FW update assert: that: "item.name in check_nics_in_system.stdout" - fail_msg: "host vars defined dataplane interface name != interface name found in system. Kindly, select correct interface name" - success_msg: "host vars defined dataplane interface name = interface name found in system, verification completed" + fail_msg: + - "Interface name (bus_id) defined in host vars (dataplane interfaces) does not match interface name (bus_id) found in system." + - "Please select correct interface name (bus_id) in (dataplane interfaces)" + success_msg: "In host vars (dataplane interfaces) defined interface name (bus_id) = interface name (bus_id) found in system, verification completed" with_items: "{{ dataplane_interfaces }}" - name: update NIC firmware diff --git a/roles/bootstrap/update_nic_firmware/tasks/update.yml b/roles/bootstrap/update_nic_firmware/tasks/update.yml index 5037ed93..a9ae13e9 100644 --- a/roles/bootstrap/update_nic_firmware/tasks/update.yml +++ b/roles/bootstrap/update_nic_firmware/tasks/update.yml @@ -27,6 +27,7 @@ block: - name: get interface info command: ethtool -i {{ interface_name }} + changed_when: false register: ethtool - name: set ethtool_drvinfo set_fact: @@ -37,37 +38,70 @@ - name: set fw_short_version set_fact: fw_short_version: "{{ (fw_full_version.split(' ') | first) }}" - # Only major.minor "mm" part are relevant because e.g. on E800 series FW shows as 2.00 on C0 (rev 02) and 2.02 on B0 (rev 01) + # Only major.minor "mm" part are relevant because e.g. on E800 series FW shows as 2.00 on C0 (rev 02) and 2.02 on B0 (rev 01) - name: set fw_current_mm set_fact: fw_current_mm: "{{ fw_short_version[0:3] }}" - name: set fw_min_req set_fact: - fw_min_req: "{{ nvmupdate[nic_module].min_required_fw_version }}" + fw_min_req: "{{ nvmupdate[nic_module].required_fw_version }}" - name: set fw_min_mm set_fact: fw_min_mm: "{{ fw_min_req[0:3] }}" - debug: msg: "On {{ interface_name }} (driver {{ nic_module }}) the firmware version is {{ fw_full_version }}" -- name: check if current firmware version meets minimal requirements +- name: check if current firmware version meets required version requirements set_fact: - continue_nvmupdate: "{{ fw_current_mm is version_compare(fw_min_mm, '<') }}" + continue_nvmupdate: "{{ fw_current_mm is version_compare(fw_min_mm, '!=') }}" + continue_nvmupdate_upgrade: "{{ fw_current_mm is version_compare(fw_min_mm, '<') }}" + continue_nvmupdate_downgrade: "{{ fw_current_mm is version_compare(fw_min_mm, '>') }}" - debug: - msg: "The current firmware release on the card is {{ fw_current_mm }}. Min required is {{ fw_min_mm }}" + msg: "The current firmware release on the card is {{ fw_current_mm }}. 
Required is {{ fw_min_mm }}"
 
 - name: assert that current FW is not too old
   assert:
     that: fw_short_version is version_compare(nvmupdate[nic_module].min_updatable_fw_version, '>=')
-    fail_msg: "Current firmware version {{ fw_short_version }} is too old to update; it must be {{ nvmupdate[nic_module].min_updatable_fw_version }} or newer"
+    fail_msg: "Current firmware version {{ fw_short_version }} is too old to upgrade; it must be {{ nvmupdate[nic_module].min_updatable_fw_version }} or newer"
   when: continue_nvmupdate | bool
 
+# download the supported nvmupdate64e tool for the required fw version
+- name: download supported nvmupdate64e tool
+  block:
+    - name: download nvmupdate package for supported nvmupdate64e tool when downgrading fw
+      get_url:
+        url: "{{ nvmupdate[nic_module].supported_nvmupdate_tool_pkg_url }}"
+        checksum: "{{ nvmupdate[nic_module].supported_nvmupdate_tool_pkg_checksum }}"
+        dest: "{{ ansible_env.HOME }}/nvmupdate.tar.gz"
+        mode: 0644
+      register: ice_nvmupdate_tool
+      until: ice_nvmupdate_tool is not failed
+      retries: 5
+
+    - name: create temporary directory when downgrading fw
+      tempfile:
+        state: directory
+        prefix: nvmupdate_tool
+      register: tempdir_nvmupdate_tool
+
+    - name: unarchive nvmupdate package for supported nvmupdate64e tool when downgrading fw
+      unarchive:
+        src: "{{ ansible_env.HOME }}/nvmupdate.tar.gz"
+        dest: "{{ tempdir_nvmupdate_tool.path }}"
+        remote_src: yes
+        extra_opts: [--strip-components=1] # to unpack Linux_x64 directly, skipping top-level model specific dir (e.g. 700Series)
+        mode: 0755
+  when:
+    - continue_nvmupdate | bool
+    - continue_nvmupdate_downgrade | bool
+
 - name: download nvmupdate package
   get_url:
     url: "{{ nvmupdate[nic_module].nvmupdate_pkg_url }}"
     checksum: "{{ nvmupdate[nic_module].nvmupdate_pkg_checksum }}"
     dest: "{{ ansible_env.HOME }}/nvmupdate.tar.gz"
+    mode: 0644
   register: nvmupdate_download
   until: nvmupdate_download is not failed
   retries: 5
@@ -89,28 +123,29 @@
     mode: 0755
   when: continue_nvmupdate | bool
 
-- name: get downloaded firmware version
-  set_fact:
-    downloaded_version: "{{ nvmupdate[nic_module].min_required_fw_version }}"
-  when: continue_nvmupdate | bool
-
-- debug:
-    msg: "downloaded firmware version is {{ nvmupdate[nic_module].min_required_fw_version }}"
-  when: continue_nvmupdate | bool
-
-- debug: var=downloaded_version
-  when: continue_nvmupdate | bool
+# downgrade fw block to remove and add nvmupdate64e tool
+- name: delete nvmupdate64e tool from existing package when downgrading fw
+  block:
+    - name: remove files
+      file:
+        path: "{{ tempdir_nvmupdate.path }}/Linux_x64/nvmupdate64e"
+        state: absent
+      changed_when: true
+
+    - name: copy nvmupdate64e tool from supported package when downgrading fw
+      copy:
+        src: "{{ tempdir_nvmupdate_tool.path }}/Linux_x64/nvmupdate64e"
+        dest: "{{ tempdir_nvmupdate.path }}/Linux_x64/nvmupdate64e"
+        mode: 0755
+        remote_src: yes
+  when:
+    - continue_nvmupdate | bool
+    - continue_nvmupdate_downgrade | bool
 
 # get device MAC address - this allows to run nvmupdate for the requested device only
 - debug: var=ansible_facts[interface_name]['macaddress']
   when: continue_nvmupdate | bool
 
-- name: assert that downloaded FW package isn't older than required
-  assert:
-    that: downloaded_version is version_compare(nvmupdate[nic_module].min_required_fw_version, '>=')
-    fail_msg: "downloaded FW version is lower than required, please update 'nvmupdate_pkg_url' and 'nvmupdate_pkg_checksum' vars!"
-  when: continue_nvmupdate | bool
-
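With the downloaded-version assert removed above, the flow is gated entirely by the three facts set earlier in this file (`continue_nvmupdate`, `continue_nvmupdate_upgrade`, `continue_nvmupdate_downgrade`). A standalone sketch of how those `version_compare` tests resolve, using hypothetical versions 8.3 (current) and 8.0 (required):

```yaml
# Standalone sketch, not part of the role: how the gating facts evaluate
# for a hypothetical current FW 8.3 against a required FW 8.0.
- name: illustrate nvmupdate gating facts
  hosts: localhost
  gather_facts: false
  vars:
    fw_current_mm: "8.3"   # hypothetical current major.minor
    fw_min_mm: "8.0"       # hypothetical required major.minor
  tasks:
    - debug:
        msg:
          - "continue_nvmupdate: {{ fw_current_mm is version_compare(fw_min_mm, '!=') }}"          # true - versions differ
          - "continue_nvmupdate_upgrade: {{ fw_current_mm is version_compare(fw_min_mm, '<') }}"   # false
          - "continue_nvmupdate_downgrade: {{ fw_current_mm is version_compare(fw_min_mm, '>') }}" # true - downgrade path
```

At most one of the upgrade/downgrade facts can be true at a time, which is what lets the downgrade-only tool-swap tasks above key off `continue_nvmupdate_downgrade`.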
 - name: check ip routing table for interface {{ interface_name }}
   command: ip route list
   register: ip_route
@@ -145,7 +180,7 @@
     - continue_nvmupdate | bool
     - 'interface_name in ip_route.stdout'
 
-- name: remove interface {{ interface_name }} from routing table and proceed to firmware update
+- name: remove interface {{ interface_name }} from routing table and proceed to firmware upgrade / downgrade
   command: "ip route del {{ grep_interface.stdout }}"
   when:
     - continue_nvmupdate | bool
@@ -154,50 +189,57 @@
 
 - debug:
     msg: >-
-      updating firmware... This operation may take several minutes.
+      upgrading / downgrading firmware... This operation may take several minutes.
       To avoid damage to your device, do not stop the update or
-      reboot or power off the system during this update!
+      reboot or power off the system during this process!
   when: continue_nvmupdate | bool
 
-- name: update firmware
+- name: upgrade / downgrade firmware
   # noqa 305 - shell is used intentionally here
   shell:
-    chdir: "{{ tempdir_nvmupdate.path }}/Linux_x64"
     cmd: "./nvmupdate64e -u -l -o update.xml -b -c nvmupdate.cfg -m '{{ ansible_facts[interface_name]['macaddress'] | replace(':','') }}'"
+  args:
+    chdir: "{{ tempdir_nvmupdate.path }}/Linux_x64"
+    executable: /bin/bash
   register: nvmupdate_result
   when: continue_nvmupdate | bool
-  changed_when: nvmupdate_result.stdout is search('Reboot is required to complete the update process.')
+  failed_when: false
 
 - name: show additional message on unsupported platforms
   debug:
     msg: >
-      Failed to update firmware on interface {{ interface_name }} (driver {{ nic_module }}).
+      Failed to upgrade / downgrade firmware on interface {{ interface_name }} (driver {{ nic_module }}).
       This is probably caused by unsupported platform - contact your vendor for more information.
   when:
     - continue_nvmupdate | bool
-    - nvmupdate_result.stdout is search('No devices to update.')
+    - (nvmupdate_result.stdout is search('No devices to update.'))
 
-- name: fail if fw update failed
+- name: fail if fw upgrade / downgrade failed
   fail:
-    msg: "Failed to update firmware on interface {{ interface_name }} (driver {{ nic_module }}). Error: {{ nvmupdate_result.stdout }}"
+    msg: "Failed to upgrade / downgrade firmware on interface {{ interface_name }} (driver {{ nic_module }}).
Error: {{ nvmupdate_result.stdout }}" when: - continue_nvmupdate | bool - - nvmupdate_result.stdout is not search('update successful.') + - continue_nvmupdate_upgrade | bool + - (nvmupdate_result.stdout is not search('update successful')) -- name: ensure that temporary files are deleted +- name: ensure that temporary files are deleted when downgrading fw file: state: absent - path: "{{ tempdir_nvmupdate.path }}" - when: continue_nvmupdate | bool + path: "{{ tempdir_nvmupdate_tool.path }}" + when: + - continue_nvmupdate_downgrade | bool + - continue_nvmupdate | bool -- name: output update log - debug: var=nvmupdate_result.stdout - when: continue_nvmupdate | bool +- name: ensure that temporary files are deleted when upgrading fw + file: + state: absent + path: "{{ tempdir_nvmupdate.path }}" + when: + - continue_nvmupdate_upgrade | bool + - continue_nvmupdate | bool -- name: reboot after update +- name: reboot after upgrade / downgrade # noqa 503 - more than one condition here, so can't be a handler reboot: - reboot_timeout: 1200 # wait up to 20 minutes - if reboot takes longer after NVM update - when: - - nvmupdate_result.changed - - continue_nvmupdate | bool + reboot_timeout: 1200 # wait up to 20 minutes - if reboot takes longer after NVM upgrade / downgrade + when: continue_nvmupdate | bool diff --git a/roles/cadvisor_install/defaults/main.yaml b/roles/cadvisor_install/defaults/main.yaml new file mode 100644 index 00000000..59c0d280 --- /dev/null +++ b/roles/cadvisor_install/defaults/main.yaml @@ -0,0 +1,25 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +cadvisor_application_name: "cadvisor" # cAdvisor Main application name +cadvisor_release_name: "cadvisor" # cAdvisor Helm Charts release name +perf_events_config_filename: "sample-perf-event.json" + +cadvisor_helm_repo_url: "https://ckotzbauer.github.io/helm-charts" # cAdvisor Helm Repo URL +cadvisor_helm_chart_repo_name: "ckotzbauer" # cAdvisor Repo Name +cadvisor_helm_charts_ref: "ckotzbauer/cadvisor" # cAdvisor Helm Chart Reference +cadvisor_helm_charts_version: "2.2.1" # cAdvisor Version +cadvisor_helm_release_namespace: "kube-system" # cAdvisor Namespace diff --git a/roles/cadvisor_install/files/sample-perf-event.json b/roles/cadvisor_install/files/sample-perf-event.json new file mode 100644 index 00000000..aad2f1a8 --- /dev/null +++ b/roles/cadvisor_install/files/sample-perf-event.json @@ -0,0 +1,17 @@ +{ + "core": { + "events": [ + "LLC-load-misses" + ], + "custom_events": [ + { + "type": 3, + "config": [ + "0x10002" + ], + "name": "LLC-load-misses" + } + ] + } + } + \ No newline at end of file diff --git a/roles/cadvisor_install/tasks/cadvisor_install.yml b/roles/cadvisor_install/tasks/cadvisor_install.yml new file mode 100644 index 00000000..6640a871 --- /dev/null +++ b/roles/cadvisor_install/tasks/cadvisor_install.yml @@ -0,0 +1,70 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: check cAdvisor Helm charts directory + stat: + path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" + register: cadvisor_dir + +- name: create cAdvisor Helm charts directory if needed + file: + path: "{{ (project_root_dir, 'charts', 'cadvisor') | path_join }}" + state: directory + mode: 0755 + when: + - cadvisor_dir.stat.exists is defined and not cadvisor_dir.stat.exists + +- name: check cAdvisor Helm charts temp directory. + stat: + path: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" + register: cadvisor_temp_dir + +- name: create the temp folder for cAdvisor custom values + file: + path: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" + state: directory + mode: 0755 + when: + - cadvisor_temp_dir.stat.exists is defined and not cadvisor_temp_dir.stat.exists + +- name: copy {{ perf_events_config_filename }} + copy: + src: "{{ (role_path, 'files', perf_events_config_filename) | path_join }}" + dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp') | path_join }}" + mode: preserve + +- name: populate cAdvisor Helm charts values template and push to controller node + template: + src: "cadvisor_custom_values.yml.j2" + dest: "{{ (project_root_dir, 'charts', 'cadvisor', 'temp', 'cadvisor-custom-values.yml') | path_join }}" + force: yes + mode: preserve + +- name: Add "{{ cadvisor_application_name }}" Helm Chart Repository + command: >- + helm repo add "{{ cadvisor_helm_chart_repo_name }}" "{{ cadvisor_helm_repo_url }}" + changed_when: true + +- name: Deploy {{ cadvisor_helm_charts_version }} of {{ cadvisor_application_name }} + command: >- + helm install + {{ cadvisor_release_name }} + {{ cadvisor_helm_charts_ref }} + --version={{ cadvisor_helm_charts_version }} + --namespace {{ cadvisor_helm_release_namespace }} + --create-namespace + -f {{ (project_root_dir, 'charts', 'cadvisor', 'temp', 'cadvisor-custom-values.yml') | path_join }} + changed_when: true diff --git a/roles/cadvisor_install/tasks/cleanup_cadvisor.yml b/roles/cadvisor_install/tasks/cleanup_cadvisor.yml new file mode 100644 index 00000000..6b1b4d51 --- /dev/null +++ b/roles/cadvisor_install/tasks/cleanup_cadvisor.yml @@ -0,0 +1,33 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- block: + - name: delete cAdvisor Helm Charts + command: >- + helm delete {{ cadvisor_release_name }} --namespace {{ cadvisor_helm_release_namespace }} + when: + - inventory_hostname == groups['kube_control_plane'][0] + changed_when: false + failed_when: false + - name: delete cAdvisor Helm Repo + command: >- + helm repo remove {{ cadvisor_helm_chart_repo_name }} + when: + - inventory_hostname == groups['kube_control_plane'][0] + changed_when: false + failed_when: false + tags: + - cadvisor diff --git a/roles/cadvisor_install/tasks/main.yml b/roles/cadvisor_install/tasks/main.yml new file mode 100644 index 00000000..e44c91a2 --- /dev/null +++ b/roles/cadvisor_install/tasks/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install cAdvisor Helm charts + import_tasks: cadvisor_install.yml + when: + - cadvisor_enabled is defined and cadvisor_enabled + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/cadvisor_install/tasks/preflight_cadvisor.yml b/roles/cadvisor_install/tasks/preflight_cadvisor.yml new file mode 100644 index 00000000..5a8ffff2 --- /dev/null +++ b/roles/cadvisor_install/tasks/preflight_cadvisor.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +# - block: + # - name: preflight cAdvisor installation + # include_role: + # name: cadvisor_install + # tasks_from: preflight_cadvisor + # any_errors_fatal: true diff --git a/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 b/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 new file mode 100644 index 00000000..8f692f8f --- /dev/null +++ b/roles/cadvisor_install/templates/cadvisor_custom_values.yml.j2 @@ -0,0 +1,97 @@ +image: + repository: gcr.io/cadvisor/cadvisor + tag: v0.44.0 + pullPolicy: IfNotPresent + + ## Reference to one or more secrets to be used when pulling images + ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ + ## + pullSecrets: [] + +container: + port: 8080 + additionalArgs: + - --housekeeping_interval=10s # kubernetes default args + - --max_housekeeping_interval=15s + - --event_storage_event_limit=default=0 + - --event_storage_age_limit=default=0 + - --disable_metrics=percpu,process,sched,tcp,udp # enable only diskIO, cpu, memory, network, disk + {% if cadvisor_custom_events_config_on | default(false) -%} + - --perf_events_config={{ (project_root_dir, 'charts', 'cadvisor', 'temp', perf_events_config_filename) | path_join }} + {% endif -%} + - --docker_only + hostPaths: + - name: varrun + path: "/var/run" + - name: sys + path: "/sys" + - name: docker + path: "/var/lib/docker" + - name: disk + path: "/dev/disk" + +resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + +podAnnotations: {} + +# priorityClassName: system-cluster-critical +priorityClassName: {} + +# sometimes errors are encountered when using the cpu load reader without being on the host network +hostNetwork: false + +serviceAccount: + # Specifies whether a service account should be created + create: true + # The name of the service account to use. + # If not set and create is true, a name is generated using the fullname template + name: + +podSecurityPolicy: + create: false + privileged: false + +# Specifies whether a securityContext should be created. Required for privileged operations. +podSecurityContext: + create: false + privileged: false + +nodeSelector: {} + +tolerations: [] + +affinity: {} + +# This will create a ServiceMonitor Custom Resource indicating the prometheus operator what to scrape. +metrics: + enabled: false + # This will allow you to specify relabelings on the metrics before ingestion. E.g. to use the kubernetes monitoring + # mixin with this chart set metrics.enabled above to true and use: + # relabelings: + # - sourceLabels: + # - name + # targetLabel: container + # - sourceLabels: + # - container_label_io_kubernetes_pod_namespace + # targetLabel: namespace + # - sourceLabels: + # - container_label_io_kubernetes_pod_name + # targetLabel: pod + metricRelabelings: [] + # This will allow you to specify relabelings on the metrics before scraping. 
+ # relabelings: + # - action: replace + # sourceLabels: + # - __meta_kubernetes_pod_node_name + # targetLabel: node + relabelings: [] diff --git a/roles/cert_manager_install/tasks/main.yml b/roles/cert_manager_install/tasks/main.yml deleted file mode 100644 index c6561116..00000000 --- a/roles/cert_manager_install/tasks/main.yml +++ /dev/null @@ -1,45 +0,0 @@ -## -## Copyright (c) 2020-2022 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: prepare and deploy cert-manager - block: - - name: check if cert-manager namespace exists - command: kubectl get namespace {{ cert_manager_namespace }} - register: ns_exists - failed_when: no - - - name: create a namespace for cert-manager - command: kubectl create namespace {{ cert_manager_namespace }} - when: ns_exists.stderr is defined and "NotFound" in ns_exists.stderr - - - name: add the Jetstack helm repository - command: helm repo add jetstack {{ cert_manager_repo }} - - - name: update local Helm chart repository cache - command: helm repo update - - - name: install the cert-manager Helm chart - command: >- - helm upgrade -i cert-manager jetstack/cert-manager - --namespace {{ cert_manager_namespace }} - --version {{ cert_manager_version }} - --set installCRDs=true - when: - - inventory_hostname == groups['kube_control_plane'][0] - -- name: wait for cert-manager to become fully available - pause: - seconds: 60 diff --git a/roles/check_machine_type/tasks/main.yml b/roles/check_machine_type/tasks/main.yml index 6b51fb08..e3cc8f01 100644 --- a/roles/check_machine_type/tasks/main.yml +++ b/roles/check_machine_type/tasks/main.yml @@ -14,48 +14,53 @@ ## limitations under the License. ## --- -- name: read CPU type - shell: - cmd: lscpu | grep "Model name:" - register: cpu_model - changed_when: false - -- name: set is_clx to false +- name: set CPU ID set_fact: - is_clx: False + cpu_id: "{{ ansible_processor[2] | regex_search('\\$?\\d\\d\\d\\d\\%?\\@?\\w?|\\d\\d/\\d\\w') }}" # noqa var-spacing -- name: check if clx mode - set_fact: - is_clx: True - when: item in cpu_model.stdout - loop: "{{ supported_clx_skus }}" +- name: print CPU ID + debug: + msg: "CPU ID: {{ cpu_id }}" + +- name: check if CPU has confirmed support + assert: + that: "cpu_id in {{ lookup('ansible.builtin.vars', 'confirmed_' + configured_arch + '_cpus') }} \ + {% if configured_arch == 'clx' %} or cpu_id in {{ confirmed_clx_ncpus }} {% endif %} \ + or cpu_id in {{ unconfirmed_cpu_models }}" + fail_msg: + "CPU model '{{ cpu_id }}' present on target is not in the confirmed CPUs list.\n + To proceed, please add '{{ cpu_id }}' to the list of unconfirmed CPUs in variable 'unconfirmed_cpu_models' in group_vars.\n + Please be aware that by using CPU model that is not confirmed, some features may not work properly." 
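The fail_msg above points at the intended escape hatch: a SKU missing from the `confirmed_*_cpus` lists can still be deployed by listing it in `unconfirmed_cpu_models`. A minimal sketch of that override in group_vars, with a hypothetical SKU:

```yaml
# group_vars/all.yml - hypothetical user override, not part of this patch.
# Allows deployment on a CPU model that is not yet on the confirmed lists;
# as the fail_msg warns, some features may not work properly on such SKUs.
unconfirmed_cpu_models:
  - "6338N"
```

Models already on a confirmed list need no override; the assert passes for them directly.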
-- name: set is_clx_nsku to false +- name: set skl, icx, clx, spr to false set_fact: - is_clx_nsku: False + is_skl: false + is_clx: false + is_clx_ncpu: false + is_icx: false + is_spr: false -- name: check if clx_nsku mode +- name: set is_skl architecture variable set_fact: - is_clx_nsku: True - when: item in cpu_model.stdout - loop: "{{ supported_clx_nskus }}" + is_skl: true + when: cpu_id in confirmed_skl_cpus -- name: set is_icx to false +- name: set is_clx architecture variable set_fact: - is_icx: False + is_clx: true + when: cpu_id in confirmed_clx_cpus -- name: check if icx mode +- name: set is_icx architecture variable set_fact: - is_icx: True - when: item in cpu_model.stdout - loop: "{{ supported_icx_skus }}" + is_icx: true + when: cpu_id in confirmed_icx_cpus -- name: set is_spr to false +- name: set is_spr architecture variable set_fact: - is_spr: False + is_spr: true + when: cpu_id in confirmed_spr_cpus -- name: check if spr mode +- name: check if clx_ncpu mode set_fact: - is_spr: True - when: item in cpu_model.stdout - loop: "{{ supported_spr_skus }}" + is_clx_ncpu: true + when: cpu_id in confirmed_clx_ncpus diff --git a/roles/check_machine_type/vars/main.yml b/roles/check_machine_type/vars/main.yml index 662c622c..b1809f98 100644 --- a/roles/check_machine_type/vars/main.yml +++ b/roles/check_machine_type/vars/main.yml @@ -14,18 +14,19 @@ ## limitations under the License. ## --- -# NOTE(pklimowx): these lists should be updated with complete set of machine SKUs. Please add alphabetically -supported_clx_nskus: +confirmed_skl_cpus: + # Sky Lake Xeon Gold (quad) + - "6152" + # Sky Lake Xeon Platinum (octa) + - "8176" +confirmed_clx_ncpus: - "5218N" - "6252N" - "6230N" - -supported_clx_skus: +confirmed_clx_cpus: - "6252" -supported_icx_skus: - - "$0000%@" - - "0000%@" +confirmed_icx_cpus: - "06/6c" - "5318N" - "6338" @@ -33,7 +34,12 @@ supported_icx_skus: - "6346" - "6348" - "8358" + - "8360Y" - "8380" + - "$0000%@" -supported_spr_skus: +confirmed_spr_cpus: + - "8470N" + - "8471N" + - "8490H" - "0000%@" diff --git a/roles/cilium/defaults/main.yml b/roles/cilium/defaults/main.yml new file mode 100644 index 00000000..1e7d44ff --- /dev/null +++ b/roles/cilium/defaults/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +cilium_helm_repo: "https://helm.cilium.io/" +cilium_helm_repo_name: "cilium" +cilium_namespace: "kube-system" +cilium_version: "v1.12.0" diff --git a/roles/cilium/files/cilium-cm.yml b/roles/cilium/files/cilium-cm.yml new file mode 100644 index 00000000..d9e64fde --- /dev/null +++ b/roles/cilium/files/cilium-cm.yml @@ -0,0 +1,21 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: cni-configuration + namespace: kube-system +data: + cni-config: |- + { + "name": "chained", + "cniVersion": "0.3.1", + "plugins": [ + { + "type": "cilium-cni" + }, + { + "type": "adq-cni", + "tunneling": "disabled", + "tunneling-interface": "" + } + ] + } diff --git a/roles/cilium/tasks/main.yml b/roles/cilium/tasks/main.yml new file mode 100644 index 00000000..f01cae98 --- /dev/null +++ b/roles/cilium/tasks/main.yml @@ -0,0 +1,139 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- block: + - name: remove cilium + command: helm delete cilium -n kube-system + changed_when: true + when: adq_cilium_deploy | d(false) + + - name: create Cilium directory + file: + path: "{{ (project_root_dir, 'cilium') | path_join }}" + state: directory + mode: '0755' + + - name: copy Cilium config map + copy: + src: cilium-cm.yml + dest: "{{ (project_root_dir, 'cilium', 'cilium-cm.yml') | path_join }}" + owner: root + group: root + mode: '0644' + + # - name: populate Cilium chart values + # template: + # src: "values.yml.j2" + # dest: "{{ (project_root_dir, 'cilium', 'cilium-values.yml') | path_join }}" + # force: yes + # mode: preserve + # changed_when: true + + - name: create Cilium config map + command: "kubectl apply -f {{ (project_root_dir, 'cilium', 'cilium-cm.yml') | path_join }}" + changed_when: true + when: adq_cilium_deploy | d(false) + + - name: add Cilium stable chart repo + kubernetes.core.helm_repository: + name: "{{ cilium_helm_repo_name }}" + repo_url: "{{ cilium_helm_repo }}" + + - name: deploy generic Cilium + command: >- + helm install cilium cilium/cilium + --version "{{ cilium_version }}" + --namespace kube-system + --set kubeProxyReplacement=strict + --set k8sServiceHost="{{ adq_dp.interface_address }}" + --set k8sServicePort=6443 + --set devices="{{ adq_dp.interface_name }}" + when: not adq_cilium_deploy | d(false) + + - name: deploy Cilium with ADQ flags + command: >- + helm install cilium cilium/cilium + --version "{{ cilium_version }}" + --namespace kube-system + --set kubeProxyReplacement=strict + --set k8sServiceHost="{{ adq_dp.interface_address }}" + --set k8sServicePort=6443 + --set devices="{{ adq_dp.interface_name }}" + --set l7Proxy=false + --set sockops.enabled=true + --set tunnel=disabled + --set ipv4NativeRoutingCIDR="{{ kube_pods_subnet }}" + --set enableipv4masquerade=true + --set autoDirectNodeRoutes=true + --set endpointRoutes.enabled=true + --set bpf.masquerade=true + --set ipv4.enabled=true + --set disable-envoy-version-check=true + --set ipam.mode=kubernetes + --set cni.customConf=true + --set 
cni.configMap=cni-configuration
+        --set prometheus.enabled=false
+        --set operator.prometheus.enabled=false
+        --set hubble.enabled=true
+        --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
+        --set extraArgs='{--bpf-filter-priority=99}'
+      changed_when: true
+      when: adq_cilium_deploy | d(false)
+
+    # - name: deploy Cilium
+    #   kubernetes.core.helm:
+    #     name: cilium
+    #     state: present
+    #     chart_ref: cilium/cilium
+    #     chart_version: v1.12.0
+    #     namespace: "{{ cilium_namespace }}"
+    #     values_files: "{{ (project_root_dir, 'cilium', 'cilium-values.yml') | path_join }}"
+    #     wait: yes
+
+    - name: wait for Cilium to be ready
+      pause:
+        minutes: 2
+
+    - name: restart unmanaged pods
+      shell: >-
+        set -o pipefail && kubectl get pods --all-namespaces
+        -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true
+        | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod
+      args:
+        executable: /bin/bash
+      when: not adq_cilium_deploy | d(false)
+
+    - name: check if all pods are running
+      shell: set -o pipefail && kubectl get pods -A | awk 'NR != 1 { print $4 }'
+      args:
+        executable: /bin/bash
+      register: cilium_pods_status
+      retries: 30
+      delay: 15
+      until:
+        - "'Error' not in cilium_pods_status.stdout"
+        - "'CrashLoopBackOff' not in cilium_pods_status.stdout"
+        - "'Terminating' not in cilium_pods_status.stdout"
+        - "'ContainerCreating' not in cilium_pods_status.stdout"
+        - "'Pending' not in cilium_pods_status.stdout"
+      changed_when: false
+      when: not adq_cilium_deploy | d(false)
+
+  when: inventory_hostname == groups['kube_control_plane'][0]
+  environment:
+    http_proxy: "{{ http_proxy | d('') }}"
+    https_proxy: "{{ https_proxy | d('') }}"
+    no_proxy: "{{ no_proxy | d('') }}"
diff --git a/roles/cilium/templates/values.yml.j2 b/roles/cilium/templates/values.yml.j2
new file mode 100644
index 00000000..cfe7efd8
--- /dev/null
+++ b/roles/cilium/templates/values.yml.j2
@@ -0,0 +1,23 @@
+---
+kubeProxyReplacement: strict
+k8sServiceHost: {{ adq_dp.interface_address }}
+k8sServicePort: 6443
+devices: {{ adq_dp.interface_name }}
+l7Proxy: false
+sockops.enabled: true
+tunnel: disabled
+ipv4NativeRoutingCIDR: {{ kube_pods_subnet }}
+enableipv4masquerade: true
+autoDirectNodeRoutes: true
+endpointRoutes.enabled: true
+bpf.masquerade: true
+ipv4.enabled: true
+disable-envoy-version-check: true
+ipam.mode: kubernetes
+cni.customConf: true
+cni.configMap: cni-configuration
+prometheus.enabled: false
+operator.prometheus.enabled: false
+hubble.enabled: true
+hubble.metrics.enabled: {dns,drop,tcp,flow,port-distribution,icmp,http}
+extraArgs: [--bpf-filter-priority=99]
diff --git a/roles/cndp_dp_install/tasks/add_cndp_labels.yml b/roles/cndp_dp_install/tasks/add_cndp_labels.yml
index 6a983843..f0a4fa4f 100644
--- a/roles/cndp_dp_install/tasks/add_cndp_labels.yml
+++ b/roles/cndp_dp_install/tasks/add_cndp_labels.yml
@@ -15,7 +15,7 @@
 ##
 ---
 - name: add labels for nodes with CNDP
-  command: kubectl label nodes {{ node_name }} cndp=true --overwrite
+  command: kubectl label nodes {{ hostvars[node_name]['ansible_hostname'] }} cndp=true --overwrite
   when:
     - cndp_dp_enabled | default(false)
     - hostvars[node_name]['cndp_enabled'] | default(false)
diff --git a/roles/cndp_dp_install/tasks/main.yml b/roles/cndp_dp_install/tasks/main.yml
index c9147acb..80c9a733 100644
--- a/roles/cndp_dp_install/tasks/main.yml
+++ b/roles/cndp_dp_install/tasks/main.yml
@@ -73,9 +73,11 @@
 
   - name: tag Intel CNDP Device Plugin image when docker is used as
container runtime command: docker tag afxdp-device-plugin {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} + changed_when: true - name: push Intel CNDP Device Plugin image to local registry when docker is used as container runtime command: docker push {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} + changed_when: true when: - inventory_hostname == groups['kube_control_plane'][0] - container_runtime == "docker" @@ -84,14 +86,18 @@ block: - name: build Intel CNDP Device Plugin image when containerd/cri-o is used as container runtime command: podman build -t afxdp-device-plugin -f images/amd64.dockerfile . + changed_when: true args: chdir: "{{ intel_cndp_dp_dir }}" - name: tag Intel CNDP Device Plugin image when containerd/cri-o is used as container runtime command: podman tag afxdp-device-plugin {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} + changed_when: true - name: push Intel CNDP Device Plugin image to local registry when containerd/cri-o is used as container runtime command: podman push {{ intel_cndp_dp_image }}:{{ intel_cndp_dp_image_version }} + changed_when: true + when: - inventory_hostname == groups['kube_control_plane'][0] - '"docker" not in container_runtime' diff --git a/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 b/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 index 36a3564f..3add0334 100644 --- a/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 +++ b/roles/cndp_dp_install/templates/intel-cndp-plugin-daemonset.yml.j2 @@ -25,6 +25,9 @@ spec: - key: node-role.kubernetes.io/master operator: Exists effect: NoSchedule + - key: node-role.kubernetes.io/control-plane + operator: Exists + effect: NoSchedule serviceAccountName: afxdp-device-plugin containers: - name: kube-cndp diff --git a/roles/cndp_install/defaults/main.yml b/roles/cndp_install/defaults/main.yml index 1bd2ca6b..b3ab9c0f 100644 --- a/roles/cndp_install/defaults/main.yml +++ b/roles/cndp_install/defaults/main.yml @@ -15,7 +15,7 @@ ## --- intel_cndp_git_url: "https://github.com/CloudNativeDataPlane/cndp.git" -intel_cndp_version: "v22.04.0" +intel_cndp_version: "v22.08.0" intel_cndp_dir: "{{ (project_root_dir, 'intel-cndp') | path_join }}" docker_bin_dir: "/usr/bin" diff --git a/roles/cndp_install/handlers/main.yml b/roles/cndp_install/handlers/main.yml index 700c5e3e..4ad3d78b 100644 --- a/roles/cndp_install/handlers/main.yml +++ b/roles/cndp_install/handlers/main.yml @@ -61,7 +61,7 @@ daemon-reload: yes - name: containerd | wait for containerd - command: "{{ containerd_bin_dir }}/ctr images ls -q" + command: "ctr images ls -q" register: containerd_ready retries: 8 delay: 4 diff --git a/roles/cndp_install/tasks/install_libbpf_ubuntu.yml b/roles/cndp_install/tasks/install_libbpf_ubuntu.yml index 6121a10b..14f0414b 100644 --- a/roles/cndp_install/tasks/install_libbpf_ubuntu.yml +++ b/roles/cndp_install/tasks/install_libbpf_ubuntu.yml @@ -53,6 +53,7 @@ - name: Add /usr/lib64 to ldconfig command: ldconfig + changed_when: true - name: Set cndp build environment set_fact: diff --git a/roles/cndp_install/tasks/main.yml b/roles/cndp_install/tasks/main.yml index b4205317..d06d086f 100644 --- a/roles/cndp_install/tasks/main.yml +++ b/roles/cndp_install/tasks/main.yml @@ -31,14 +31,44 @@ dest: "{{ intel_cndp_dir }}" force: yes - - name: build cndp - make: - chdir: "{{ intel_cndp_dir }}" - - - name: install cndp - make: - target: install + - name: Set fact with correct dir + set_fact: + cndp_lib_dest_dir: "{{ project_root_dir | 
regex_replace('\\/$', '') }}"
+
+    - name: Replace /tmp dir with {{ cndp_lib_dest_dir }}
+      replace:
+        path: "{{ item }}"
+        regexp: '^(.*)\/tmp(.*)$'
+        replace: '\1{{ cndp_lib_dest_dir }}\2'
+      with_items:
+        - "{{ (intel_cndp_dir, 'tools', 'mklib') | path_join }}.sh"
+        - "{{ (intel_cndp_dir, 'meson') | path_join }}.build"
+
+    # Suggested by the CNDP team for an SPR feature on RHEL / Rocky >= 9.0:
+    # the compiler reports the `uintr` flag even when `uintr` is not supported,
+    # and the CNDP team advised removing this flag on Rocky / RHEL.
+    # make install does not work on RHEL / Rocky >= 9.0, so CNDP is built with meson (ninja install).
+    # The Ansible builtin replace / lineinfile modules did not behave as expected, so sed is used as an alternative.
+    - name: delete lines from meson.build on RHEL / ROCKY
+      command: "sed -i '201d;202d;203d;204d;205d' meson.build" # noqa command-instead-of-module 305
+      args:
+        chdir: "{{ intel_cndp_dir }}"
+      changed_when: true
+      register: test
+      when:
+        - ansible_os_family == "RedHat"
+        - ansible_distribution_version >= "9.0"
+
+    - name: build and install CNDP
+      block:
+        - name: build cndp
+          make:
+            chdir: "{{ intel_cndp_dir }}"
+
+        - name: install cndp
+          make:
+            target: install
+            chdir: "{{ intel_cndp_dir }}"
   when:
     - cndp_enabled | default(false)
diff --git a/roles/collectd_install/defaults/main.yml b/roles/collectd_install/defaults/main.yml
index 2432a997..16250789 100644
--- a/roles/collectd_install/defaults/main.yml
+++ b/roles/collectd_install/defaults/main.yml
@@ -14,9 +14,9 @@
 ## limitations under the License.
 ##
 ---
-collectd_configuration_files_dir: "{{ host_collectd_folder }}/collectd.conf.d"
-barometer_collectd_dir: "{{ project_root_dir }}/barometer"
-collectd_deployment_dir: "{{ project_root_dir }}/k8s/collectd/"
+collectd_configuration_files_dir: "{{ (host_collectd_folder, 'collectd.conf.d') | path_join }}"
+barometer_collectd_dir: "{{ (project_root_dir, 'barometer') | path_join }}"
+collectd_deployment_dir: "{{ (project_root_dir, 'k8s', 'collectd') | path_join }}"
 
 collectd_scrap_interval: 30
 collectd_write_threads: 25
@@ -47,8 +47,8 @@
 unixsock_host_socket_dir: /var/run/collectd/
 enable_custom_types_db: false
 image_collectd:
-  repository: "opnfv/barometer-collectd"
-  digest: sha256:a7cea43d9d2f67c38fbf0407786edbe660ee9072945f7bb272b55fd255e8eaca
+  repository: intel/observability-collectd
+  digest: sha256:ece869707363959223135d777148f3b97db0477990ef23b2ca4e0644d92ecb09
   pullPolicy: Always
 image_collectd_exporter:
   repository: prom/collectd-exporter
@@ -58,12 +58,11 @@
 image_rbac_proxy:
   repository: quay.io/coreos/kube-rbac-proxy
   version: v0.5.0
   pullPolicy: Always
-psp_enabled: true
 
 collectd_namespace: monitoring
 host_collectd_folder: /opt/collect.d
 pkgpower_repo_url: "https://github.com/intel/CommsPowerManagement.git"
-pkgpower_dir: "{{ project_root_dir }}/commspowermanagement"
+pkgpower_dir: "{{ (project_root_dir, 'commspowermanagement') | path_join }}"
 
 # currently excluded plugins were not delivered with latest stable
 # opnfv/barometer-collectd image (digest sha256:ed5c574f653e)
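The profile lists under `collectd_plugins` are filtered at deploy time with `difference(exclude_collectd_plugins)` (see copy-configs.yml further down), so dropping a plugin is a group_vars override rather than a role edit. A short sketch of such an override; the plugin names shown are assumed to exist in the role's profile lists:

```yaml
# group_vars/all.yml - hypothetical override, not part of this patch.
# Plugins named here are removed from the selected profile before deployment.
collectd_profile: basic
exclude_collectd_plugins:
  - ipmi    # e.g. no BMC on the target nodes
  - smart   # e.g. no SMART-capable disks exposed to collectd
```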
@@ -203,7 +202,7 @@ collectd_plugins:
 
 # List of plugins that will be excluded from collectd deployment.
 exclude_collectd_plugins: []
 
-rbac_proxy_ssl_mount_path: /etc/ssl/rbac-proxy/
+rbac_proxy_ssl_mount_path: /etc/ssl/rbac-proxy
 rbac_proxy_ssl_secret_name: rbac-proxy-ssl
 
 rbac_proxy_tls_cipher_suites:
diff --git a/roles/collectd_install/tasks/collectd.yml b/roles/collectd_install/tasks/collectd.yml
index 8c243b19..4b5d5551 100644
--- a/roles/collectd_install/tasks/collectd.yml
+++ b/roles/collectd_install/tasks/collectd.yml
@@ -45,7 +45,7 @@
 - name: fill in k8s deployment files
   template:
     src: "{{ item }}"
-    dest: "{{ collectd_deployment_dir }}/{{ item }}"
+    dest: "{{ (collectd_deployment_dir, item) | path_join }}"
     owner: root
     group: root
     force: yes
@@ -57,17 +57,6 @@
     - "collectd-rbac-cluster-role-binding.yml"
     - "collectd-rbac-service-account.yml"
 
-- name: copy psp file
-  template:
-    src: "psp.yml.j2"
-    dest: "{{ collectd_deployment_dir }}/psp.yml"
-    force: yes
-    mode: preserve
-    owner: root
-    group: root
-  when:
-    - psp_enabled | default(true)
-
 - name: install collectd from k8s deployment files
   command: kubectl apply -f {{ collectd_deployment_dir }} --namespace {{ collectd_namespace }}
   changed_when: true
diff --git a/roles/collectd_install/tasks/copy-configs.yml b/roles/collectd_install/tasks/copy-configs.yml
index f071ccc3..7c9701db 100644
--- a/roles/collectd_install/tasks/copy-configs.yml
+++ b/roles/collectd_install/tasks/copy-configs.yml
@@ -14,6 +14,18 @@
 ## limitations under the License.
 ##
 ---
+- name: check if intel-rapl dir exists
+  stat: path=/sys/devices/virtual/powercap/intel-rapl
+  register: intel_rapl_status
+
+- name: disable pkgpower_plugin if intel-rapl dir is not available
+  set_fact:
+    enable_pkgpower_plugin: false
+  changed_when: true
+  when:
+    - not intel_rapl_status.stat.exists
+    - enable_pkgpower_plugin
+
 - name: disable intel_pmu plugin
   set_fact:
     exclude_collectd_plugins: "{{ exclude_collectd_plugins + [ 'intel_pmu' ] }}"
@@ -31,7 +43,7 @@
 
 - name: prepare list of plugins to be deployed
   set_fact:
-    plugins: "{{ collectd_plugins[collectd_profile|default('basic')] | difference(exclude_collectd_plugins) }}"
+    plugins: "{{ collectd_plugins[collectd_profile | default('basic')] | difference(exclude_collectd_plugins) }}"
 
 - name: rename ipmi to 0_ipmi to ensure that ipmi plugin will be loaded first (https://sourceforge.net/p/openipmi/bugs/86/)
   set_fact:
@@ -72,7 +84,7 @@
 - name: get network interfaces for ethstat plugin
   shell: set -o pipefail && hwinfo --short --netcard | awk -F ' ' '{if ($2!="") print $2}' # noqa 602
   args:
-    executable: /bin/bash
+      executable: /bin/bash
   register: ethstat_interfaces_in_ubuntu
   changed_when: ethstat_interfaces_in_ubuntu is defined
   when:
@@ -83,7 +95,7 @@
 - name: get network interfaces for ethstat plugin
   shell: set -o pipefail && lshw -c network -businfo | grep "pci@" | awk -F ' ' '{if ($2!="") print $2}' # noqa 602
   args:
-    executable: /bin/bash
+      executable: /bin/bash
   register: ethstat_interfaces
   changed_when: ethstat_interfaces is defined
   when:
diff --git a/roles/collectd_install/templates/collectd-rbac-cluster-role.yml b/roles/collectd_install/templates/collectd-rbac-cluster-role.yml
index f03cbf91..394d5c00 100644
--- a/roles/collectd_install/templates/collectd-rbac-cluster-role.yml
+++ b/roles/collectd_install/templates/collectd-rbac-cluster-role.yml
@@ -12,10 +12,3 @@ rules:
     resources:
       - subjectaccessreviews
     verbs: ["create"]
-{% if psp_enabled %}
-  - apiGroups: ['policy']
-    resources: ['podsecuritypolicies']
-    verbs: ['use']
-    resourceNames:
-      - collectd
-{% endif %}
diff --git a/roles/collectd_install/templates/daemonset.yml
b/roles/collectd_install/templates/daemonset.yml index 98e4a719..d3157066 100644 --- a/roles/collectd_install/templates/daemonset.yml +++ b/roles/collectd_install/templates/daemonset.yml @@ -20,9 +20,6 @@ spec: - name: collectd image: "{{ image_collectd.repository }}@{{ image_collectd.digest }}" imagePullPolicy: "{{ image_collectd.pullPolicy }}" - command: ["/bin/bash", "-c"] - args: - - "yum -y remove kernel-devel kernel-headers ; /run_collectd.sh" securityContext: privileged: true volumeMounts: @@ -71,8 +68,8 @@ spec: - name: https containerPort: 9104 args: - - "--tls-cert-file={{ rbac_proxy_ssl_mount_path }}{{ rbac_proxy_ssl_secret_name }}.cert" - - "--tls-private-key-file={{ rbac_proxy_ssl_mount_path }}{{ rbac_proxy_ssl_secret_name }}.key" + - "--tls-cert-file={{ rbac_proxy_ssl_mount_path }}/{{ rbac_proxy_ssl_secret_name }}.cert" + - "--tls-private-key-file={{ rbac_proxy_ssl_mount_path }}/{{ rbac_proxy_ssl_secret_name }}.key" - "--tls-cipher-suites={{ rbac_proxy_tls_cipher_suites | join(',') }}" - "--secure-listen-address=0.0.0.0:9104" - "--upstream=http://127.0.0.1:9103/" diff --git a/roles/collectd_install/templates/psp.yml.j2 b/roles/collectd_install/templates/psp.yml.j2 deleted file mode 100644 index c6ccad2a..00000000 --- a/roles/collectd_install/templates/psp.yml.j2 +++ /dev/null @@ -1,44 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: collectd -spec: - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - hostNetwork: true - hostPorts: - - max: 9104 - min: 9103 - privileged: true - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - allowedHostPaths: - - pathPrefix: "{{ host_collectd_folder }}" - - pathPrefix: "/proc" - - pathPrefix: "/usr/local/var/run/openvswitch/" - - pathPrefix: "/var/run/.client" - - pathPrefix: "/var/run/dpdk/rte/telemetry" -{% if enable_pkgpower_plugin %} - - pathPrefix: "/sys/devices/virtual/powercap/intel-rapl/" -{% endif %} - - pathPrefix: "/sys/kernel/mm/hugepages" - - pathPrefix: "/sys/devices/system/cpu" - - pathPrefix: "{{ unixsock_host_socket_dir }}" - volumes: - - "configMap" - - "downwardAPI" - - "emptyDir" - - "persistentVolumeClaim" - - "secret" - - "projected" - - "hostPath" diff --git a/roles/container_engine/crictl/defaults/main.yml b/roles/container_engine/crictl/defaults/main.yml index 11315cfc..87f94ffd 100644 --- a/roles/container_engine/crictl/defaults/main.yml +++ b/roles/container_engine/crictl/defaults/main.yml @@ -17,7 +17,8 @@ crictl_version: "v1.21.0" image_arch: "amd64" -crictl_download_url: "https://github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz" +crictl_download_url: "{{ crictl_repo_url }}{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz" +crictl_repo_url: "https://github.com/kubernetes-sigs/cri-tools/releases/download/" crictl_checksums: amd64: diff --git a/roles/container_engine/docker/defaults/main.yml b/roles/container_engine/docker/defaults/main.yml index 19469b80..95db9cab 100644 --- a/roles/container_engine/docker/defaults/main.yml +++ b/roles/container_engine/docker/defaults/main.yml @@ -52,4 +52,3 @@ docker_log_opts: "--log-opt max-size=50m --log-opt max-file=5" # docker_rpm_keepcache: 1 yum_repo_dir: /etc/yum.repos.d - diff --git a/roles/container_registry/defaults/main.yml 
b/roles/container_registry/defaults/main.yml index 628a18b1..0f96b066 100644 --- a/roles/container_registry/defaults/main.yml +++ b/roles/container_registry/defaults/main.yml @@ -23,7 +23,9 @@ registry_addr: 127.0.0.1 registry_image: "docker.io/library/registry:2.7.0" nginx_image: "docker.io/library/nginx:1.20.1-alpine" -nginx_ssl_ciphers: "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" +nginx_ssl_ciphers: + "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE\ + -ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" nginx_ssl_protocols: "TLSv1.2 TLSv1.3" release_name: container-registry diff --git a/roles/container_registry/tasks/main.yml b/roles/container_registry/tasks/main.yml index f8b81251..40df9be3 100644 --- a/roles/container_registry/tasks/main.yml +++ b/roles/container_registry/tasks/main.yml @@ -154,6 +154,14 @@ validate_certs: yes when: container_runtime == "docker" +- name: copy auth file + copy: + src: "{{ ansible_env.HOME }}/.docker/config.json" + dest: /var/lib/kubelet/config.json + remote_src: yes + mode: '0755' + when: container_runtime == "docker" + - name: grant access to the registry command: podman login --authfile="{{ registry_auth }}" -u docker -p "{{ password }}" "{{ registry_local_address }}" changed_when: false diff --git a/roles/container_registry/tasks/tls.yml b/roles/container_registry/tasks/tls.yml index 4e9b0166..651e5b57 100644 --- a/roles/container_registry/tasks/tls.yml +++ b/roles/container_registry/tasks/tls.yml @@ -33,10 +33,12 @@ - name: delete any preexisting certs/key/CSR from Kubernetes command: kubectl delete csr registry.{{ registry_namespace }} + changed_when: true failed_when: false - name: delete any preexisting secrets from Kubernetes command: kubectl delete secret -n {{ registry_namespace }} {{ registry_secret_name }} + changed_when: true failed_when: false - name: populate registry CSR template @@ -58,12 +60,14 @@ args: chdir: "/etc/ssl/registry/" executable: /bin/bash + changed_when: true become: yes - name: read generated key command: cat registry-key.pem args: chdir: "/etc/ssl/registry/" + changed_when: false register: key - name: load generated key @@ -74,6 +78,7 @@ command: cat registry.csr args: chdir: "/etc/ssl/registry/" + changed_when: false register: csr - name: load generated csr @@ -89,14 +94,17 @@ - name: send CSR to the Kubernetes API Server command: kubectl apply -f /etc/ssl/registry/kube-registry-csr.yml + changed_when: true - name: approve request command: kubectl certificate approve registry.kube-system + changed_when: true - name: get approved certificate shell: kubectl get csr registry.kube-system -o jsonpath='{.status.certificate}' args: chdir: "/etc/ssl/registry" + changed_when: false register: cert retries: 30 delay: 1 @@ -111,6 +119,7 @@ kubectl create -n {{ registry_namespace }} secret generic {{ registry_secret_name }} --from-literal=tls.crt='{{ registry_cert }}' --from-literal=tls.key='{{ registry_key }}' + changed_when: true - name: clean up file: path=/etc/ssl/registry state=absent @@ -122,44 +131,44 @@ - name: copy Kubernetes CA so that registry client can validate registry's certificate become: yes block: - - name: remove existing certs and keys - file: path="/etc/docker/certs.d/{{ registry_local_address }}" state=absent - - name: ensure that path exists - file: - path: "/etc/docker/certs.d/{{ registry_local_address }}" - mode: '0700' - owner: 
root
-    group: root
-    state: directory
-  - name: place Kubernetes CA in the /etc/docker/certs.d
-    copy:
-      src: /etc/kubernetes/ssl/ca.crt
-      dest: "/etc/docker/certs.d/{{ registry_local_address }}/ca.crt"
-      remote_src: yes
-      mode: '0600'
-      owner: root
-      group: root
+    - name: remove existing certs and keys
+      file: path="/etc/docker/certs.d/{{ registry_local_address }}" state=absent
+    - name: ensure that path exists
+      file:
+        path: "/etc/docker/certs.d/{{ registry_local_address }}"
+        mode: '0700'
+        owner: root
+        group: root
+        state: directory
+    - name: place Kubernetes CA in the /etc/docker/certs.d
+      copy:
+        src: /etc/kubernetes/ssl/ca.crt
+        dest: "/etc/docker/certs.d/{{ registry_local_address }}/ca.crt"
+        remote_src: yes
+        mode: '0600'
+        owner: root
+        group: root
   when: container_runtime == "docker"
 
 # copy CA file so that registry clients can validate its certificate
 - name: copy Kubernetes CA so that registry client can validate registry's certificate
   become: yes
   block:
-  - name: remove existing certs and keys
-    file: path="/etc/containers/certs.d/{{ registry_local_address }}ca.crt" state=absent
-  - name: ensure that path exists
-    file:
-      path: "/etc/containers/certs.d/{{ registry_local_address }}"
-      mode: '0700'
-      owner: root
-      group: root
-      state: directory
-  - name: place Kubernetes CA in the /etc/containers/certs.d/
-    copy:
-      src: /etc/kubernetes/ssl/ca.crt
-      dest: "/etc/containers/certs.d/{{ registry_local_address }}/ca.crt"
-      remote_src: yes
-      mode: '0600'
-      owner: root
-      group: root
+    - name: remove existing certs and keys
+      file: path="/etc/containers/certs.d/{{ registry_local_address }}/ca.crt" state=absent
+    - name: ensure that path exists
+      file:
+        path: "/etc/containers/certs.d/{{ registry_local_address }}"
+        mode: '0700'
+        owner: root
+        group: root
+        state: directory
+    - name: place Kubernetes CA in the /etc/containers/certs.d/
+      copy:
+        src: /etc/kubernetes/ssl/ca.crt
+        dest: "/etc/containers/certs.d/{{ registry_local_address }}/ca.crt"
+        remote_src: yes
+        mode: '0600'
+        owner: root
+        group: root
   when: '"docker" not in container_runtime'
diff --git a/roles/container_registry/templates/container-registry/deployment.yaml.j2 b/roles/container_registry/templates/container-registry/deployment.yaml.j2
index 5911de7e..03c63229 100644
--- a/roles/container_registry/templates/container-registry/deployment.yaml.j2
+++ b/roles/container_registry/templates/container-registry/deployment.yaml.j2
@@ -81,6 +81,9 @@
         - effect: NoSchedule
           key: node-role.kubernetes.io/master
           operator: Exists
+        - effect: NoSchedule
+          key: node-role.kubernetes.io/control-plane
+          operator: Exists
       volumes:
         - name: data
           persistentVolumeClaim:
diff --git a/roles/dlb_dp_install/defaults/main.yml b/roles/dlb_dp_install/defaults/main.yml
new file mode 100644
index 00000000..14111ebe
--- /dev/null
+++ b/roles/dlb_dp_install/defaults/main.yml
@@ -0,0 +1,23 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+##    http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+## +--- +intel_dlb_dp_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" +intel_dlb_dp_git_ref: "v0.24.0" +intel_dlb_dp_version: "0.24.0" +intel_dlb_dp_dir: "{{ (project_root_dir, 'intel-dlb-dp') | path_join }}" + +dlb_dp_apparmor_profile: "unconfined" +dlb_dp_verbosity: 4 diff --git a/roles/dlb_dp_install/tasks/main.yml b/roles/dlb_dp_install/tasks/main.yml new file mode 100644 index 00000000..2053e1e7 --- /dev/null +++ b/roles/dlb_dp_install/tasks/main.yml @@ -0,0 +1,84 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install dependencies for Intel DLB DP + include_role: + name: install_dependencies + +- name: clone Intel Device Plugins repository + git: + repo: "{{ intel_dlb_dp_git_url }}" + version: "{{ intel_dlb_dp_git_ref }}" + dest: "{{ intel_dlb_dp_dir }}" + force: yes + when: inventory_hostname == groups['kube_node'][0] + +# docker is in use as container runtime +- name: prepare containers images + block: + - name: build Intel DLB Device Plugin image + make: + target: intel-dlb-plugin + chdir: "{{ intel_dlb_dp_dir }}" + + - name: tag Intel DLB Device Plugin image + command: docker tag intel/intel-dlb-plugin:devel {{ registry_local_address }}/intel-dlb-plugin:{{ intel_dlb_dp_version }} + changed_when: true + + - name: push Intel DLB Device Plugin image to local registry + command: docker push {{ registry_local_address }}/intel-dlb-plugin:{{ intel_dlb_dp_version }} + changed_when: true + when: + - inventory_hostname == groups['kube_node'][0] + - dlb_dp_build_image_locally + - container_runtime == "docker" + +# crio/containerd is in use as container runtime +- name: prepare containers images + block: + - name: build and tag Intel DLB Device Plugin image + command: buildah bud -f build/docker/intel-dlb-plugin.Dockerfile -t {{ registry_local_address }}/intel-dlb-plugin:{{ intel_dlb_dp_version }} + changed_when: true + + - name: push Intel DLB Device Plugin image to local registry + command: buildah push {{ registry_local_address }}/intel-dlb-plugin:{{ intel_dlb_dp_version }} + changed_when: true + when: + - inventory_hostname == groups['kube_node'][0] + - dlb_dp_build_image_locally + - '"docker" not in container_runtime' + +# deploy Intel DLB Device Plugin +- name: prepare and deploy Intel DLB Device Plugin + block: + - name: set values + set_fact: + dlb_dp_image: "{{ registry_local_address }}/intel-dlb-plugin" + dlb_dp_version: "{{ intel_dlb_dp_version }}" + when: dlb_dp_build_image_locally + + - name: populate Intel DLB Plugin yaml file and push to controller node + template: + src: "intel-dlb-dp.yml.j2" + dest: "{{ (project_root_dir, 'intel-dlb-dp.yml') | path_join }}" + force: yes + mode: preserve + + - name: deploy Intel DLB Device Plugin with the Intel Device Plugin Operator + k8s: + state: present + src: "{{ (project_root_dir, 'intel-dlb-dp.yml') | path_join }}" + when: inventory_hostname == groups['kube_control_plane'][0] diff --git 
a/roles/dlb_dp_install/templates/intel-dlb-dp.yml.j2 b/roles/dlb_dp_install/templates/intel-dlb-dp.yml.j2 new file mode 100644 index 00000000..f7f0c3ef --- /dev/null +++ b/roles/dlb_dp_install/templates/intel-dlb-dp.yml.j2 @@ -0,0 +1,18 @@ +--- +apiVersion: deviceplugin.intel.com/v1 +kind: DlbDevicePlugin +metadata: + name: intel-dlb-device-plugin + # example apparmor annotation + # see more details here: + # - https://kubernetes.io/docs/tutorials/clusters/apparmor/#securing-a-pod + # - https://github.com/intel/intel-device-plugins-for-kubernetes/issues/381 +{% if ansible_distribution == "Ubuntu" %} + annotations: + container.apparmor.security.beta.kubernetes.io/intel-dlb-plugin: {{ dlb_dp_apparmor_profile | default("unconfined") }} +{% endif %} +spec: + image: {{ dlb_dp_image | default("docker.io/intel/intel-dlb-plugin") }}:{{ dlb_dp_version | default("0.24.0") }} + logLevel: {{ dlb_dp_verbosity | default(4) }} + nodeSelector: + intel.feature.node.kubernetes.io/dlb: "true" diff --git a/roles/dlb_dp_install/vars/main.yml b/roles/dlb_dp_install/vars/main.yml new file mode 100644 index 00000000..87cfc212 --- /dev/null +++ b/roles/dlb_dp_install/vars/main.yml @@ -0,0 +1,23 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - make + - git + RedHat: + - make + - git diff --git a/roles/dsa_dp_install/defaults/main.yml b/roles/dsa_dp_install/defaults/main.yml new file mode 100644 index 00000000..481e3522 --- /dev/null +++ b/roles/dsa_dp_install/defaults/main.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +intel_dsa_dp_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" +intel_dsa_dp_git_ref: "v0.24.0" +intel_dsa_dp_version: "0.24.0" +intel_dsa_dp_dir: "{{ (project_root_dir, 'intel-dsa-dp') | path_join }}" +dsa_shared_devices: 10 # SharedDevNum is a number of containers that can share the same DSA device. +dsa_log_level: 4 diff --git a/roles/dsa_dp_install/tasks/main.yml b/roles/dsa_dp_install/tasks/main.yml new file mode 100644 index 00000000..f1e34c96 --- /dev/null +++ b/roles/dsa_dp_install/tasks/main.yml @@ -0,0 +1,91 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install dependencies for Intel DSA DP + include_role: + name: install_dependencies + +- name: clone Intel Device Plugins repository + git: + repo: "{{ intel_dsa_dp_git_url }}" + version: "{{ intel_dsa_dp_git_ref }}" + dest: "{{ intel_dsa_dp_dir }}" + force: yes + when: inventory_hostname == groups['kube_node'][0] + +- name: create Intel DSA Device Plugin directory on controller node + file: + path: "{{ intel_dsa_dp_dir }}" + state: directory + mode: 0755 + when: inventory_hostname == groups['kube_control_plane'][0] + +# docker is used as container runtime: +- name: prepare containers images + block: + - name: build Intel DSA Device Plugin image + make: + target: intel-dsa-plugin + chdir: "{{ intel_dsa_dp_dir }}" + + - name: tag Intel DSA Device Plugin image + command: docker tag intel/intel-dsa-plugin:{{ intel_dsa_dp_version }} {{ registry_local_address }}/intel-dsa-plugin:{{ intel_dsa_dp_version }} + changed_when: true + + - name: push Intel DSA Device Plugin image to local registry + command: docker push {{ registry_local_address }}/intel-dsa-plugin:{{ intel_dsa_dp_version }} + changed_when: true + when: + - inventory_hostname == groups['kube_node'][0] + - dsa_dp_build_image_locally + - container_runtime == "docker" + +# containerd/cri-o is used as container runtime: +- name: prepare containers images + block: + - name: build and tag Intel DSA Device Plugin image + command: podman build -f build/docker/intel-dsa-plugin.Dockerfile . 
-t {{ registry_local_address }}/intel-dsa-plugin:{{ intel_dsa_dp_version }} + args: + chdir: "{{ intel_dsa_dp_dir }}" + changed_when: true + + - name: push Intel DSA Device Plugin image to local registry + command: podman push {{ registry_local_address }}/intel-dsa-plugin:{{ intel_dsa_dp_version }} + changed_when: true + when: + - inventory_hostname == groups['kube_node'][0] + - dsa_dp_build_image_locally + - '"docker" not in container_runtime' + +- name: prepare and deploy Intel DSA Device Plugin + block: + - name: set values + set_fact: + dsa_dp_image: "{{ registry_local_address }}/intel-dsa-plugin" + when: dsa_dp_build_image_locally + + - name: populate Intel DSA Plugin yaml file and push to controller node + template: + src: "intel-dsa-plugin.yml.j2" + dest: "{{ intel_dsa_dp_dir }}/intel-dsa-plugin.yml" + force: yes + mode: preserve + + - name: deploy Intel DSA Device Plugin with the Intel Device Plugin Operator + k8s: + state: present + src: "{{ intel_dsa_dp_dir }}/intel-dsa-plugin.yml" + when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/dsa_dp_install/templates/intel-dsa-plugin.yml.j2 b/roles/dsa_dp_install/templates/intel-dsa-plugin.yml.j2 new file mode 100644 index 00000000..1063dc35 --- /dev/null +++ b/roles/dsa_dp_install/templates/intel-dsa-plugin.yml.j2 @@ -0,0 +1,12 @@ +--- +apiVersion: deviceplugin.intel.com/v1 +kind: DsaDevicePlugin +metadata: + name: intel-dsa-plugin +spec: + image: {{ dsa_dp_image | default("docker.io/intel/intel-dsa-plugin") }}:{{ intel_dsa_dp_version | default("0.24.0") }} + sharedDevNum: {{ dsa_shared_devices | default(10) }} + logLevel: {{ dsa_dp_verbosity | default(dsa_log_level) }} + nodeSelector: + intel.feature.node.kubernetes.io/dsa: "true" + dsa.configured: 'true' diff --git a/roles/dsa_dp_install/vars/main.yml b/roles/dsa_dp_install/vars/main.yml new file mode 100644 index 00000000..170d32e5 --- /dev/null +++ b/roles/dsa_dp_install/vars/main.yml @@ -0,0 +1,23 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +install_dependencies: + Debian: + - git + - make + RedHat: + - git + - make diff --git a/roles/gpu_dp_install/defaults/main.yml b/roles/gpu_dp_install/defaults/main.yml index fd6fa5f7..922b02a2 100644 --- a/roles/gpu_dp_install/defaults/main.yml +++ b/roles/gpu_dp_install/defaults/main.yml @@ -15,8 +15,8 @@ ## --- intel_gpu_dp_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" -intel_gpu_dp_git_ref: "v0.23.0" -intel_gpu_dp_version: "0.23.0" +intel_gpu_dp_git_ref: "v0.24.0" +intel_gpu_dp_version: "0.24.0" intel_gpu_dp_dir: "{{ (project_root_dir, 'intel-gpu-dp') | path_join }}" gpu_dp_shared_devices: 10 diff --git a/roles/gpu_dp_install/tasks/main.yml b/roles/gpu_dp_install/tasks/main.yml index e93121e8..e456c84c 100644 --- a/roles/gpu_dp_install/tasks/main.yml +++ b/roles/gpu_dp_install/tasks/main.yml @@ -41,12 +41,14 @@ - name: tag Intel GPU Device Plugin images command: docker tag intel/{{ item }}:{{ intel_gpu_dp_version }} {{ registry_local_address }}/{{ item }}:{{ intel_gpu_dp_version }} + changed_when: true loop: - intel-gpu-plugin - intel-gpu-initcontainer - name: push Intel GPU Device Plugin image to local registry command: docker push {{ registry_local_address }}/{{ item }}:{{ intel_gpu_dp_version }} + changed_when: true loop: - intel-gpu-plugin - intel-gpu-initcontainer @@ -64,15 +66,15 @@ chdir: "{{ intel_gpu_dp_dir }}" changed_when: true with_items: - - { file: intel-gpu-initcontainer.Dockerfile, name: intel-gpu-initcontainer } - - { file: intel-gpu-plugin.Dockerfile, name: intel-gpu-plugin } + - {file: intel-gpu-initcontainer.Dockerfile, name: intel-gpu-initcontainer} + - {file: intel-gpu-plugin.Dockerfile, name: intel-gpu-plugin} - name: push Intel GPU Device Plugin image to local registry command: podman push {{ registry_local_address }}/{{ item.name }}:{{ intel_gpu_dp_version }} changed_when: true with_items: - - { name: intel-gpu-initcontainer } - - { name: intel-gpu-plugin } + - {name: intel-gpu-initcontainer} + - {name: intel-gpu-plugin} when: - inventory_hostname == groups['kube_node'][0] - gpu_dp_build_image_locally diff --git a/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 b/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 index 5db63a2d..d2801b6b 100644 --- a/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 +++ b/roles/gpu_dp_install/templates/intel-gpu-plugin.yml.j2 @@ -4,14 +4,15 @@ kind: GpuDevicePlugin metadata: name: intel-gpu-plugin spec: - image: {{ gpu_dp_image | default("docker.io/intel/intel-gpu-plugin") }}:{{ gpu_dp_version | default("0.23.0") }} - initImage: {{ gpu_dp_init_image | default("docker.io/intel/intel-gpu-initcontainer") }}:{{ gpu_dp_version | default("0.23.0") }} + image: {{ gpu_dp_image | default("docker.io/intel/intel-gpu-plugin") }}:{{ gpu_dp_version | default("0.24.0") }} + initImage: {{ gpu_dp_init_image | default("docker.io/intel/intel-gpu-initcontainer") }}:{{ gpu_dp_version | default("0.24.0") }} sharedDevNum: {{ gpu_dp_shared_devices | default(10) }} logLevel: {{ gpu_dp_verbosity | default(4) }} enableMonitoring: {{ gpu_dp_monitor_resources | default(false) }} resourceManager: {{ gpu_dp_fractional_manager | default(false) }} preferredAllocationPolicy: {{ gpu_dp_prefered_allocation | default('none') }} nodeSelector: + intel.feature.node.kubernetes.io/gpu: "true" # check if node has required PCI IDs feature.node.kubernetes.io/pci-0300_8086.present: 'true' # check if node custom gpu kernel installed diff --git a/roles/install_ddp_pkgs/defaults/main.yml 
b/roles/install_ddp_pkgs/defaults/main.yml index 2291022a..3510c8ef 100644 --- a/roles/install_ddp_pkgs/defaults/main.yml +++ b/roles/install_ddp_pkgs/defaults/main.yml @@ -35,3 +35,4 @@ ddp_pkgs: - "https://downloadmirror.intel.com/29889/eng/800%20series%20comms%20binary%20package%201.3.30.0_rev1.1.zip" - "https://downloadmirror.intel.com/713853/800%20Series%20DDP%20Comms%20Package%201.3.31.0.zip" - "https://downloadmirror.intel.com/727568/ice_comms-1.3.35.0.zip" + - "https://downloadmirror.intel.com/738733/800%20Series%20DDP%20Comms%20Package%201.3.37.0.zip" diff --git a/roles/install_ddp_pkgs/tasks/install_a_pkg.yml b/roles/install_ddp_pkgs/tasks/install_a_pkg.yml index 636e17c7..8f0aa637 100644 --- a/roles/install_ddp_pkgs/tasks/install_a_pkg.yml +++ b/roles/install_ddp_pkgs/tasks/install_a_pkg.yml @@ -61,6 +61,14 @@ mode: 0644 when: '"1.3.31.0" in pkgurl' +- name: unarchive 1.3.37.0 DDP package that extracts into a subfolder + unarchive: + src: "{{ temp_ddp_path }}/ice_comms-1.3.37.0.zip" + dest: "{{ temp_ddp_path }}" + remote_src: yes + mode: 0644 + when: '"1.3.37.0" in pkgurl' + - name: find PKG files find: paths: "{{ temp_ddp_path }}" diff --git a/roles/install_ddp_pkgs/tasks/install_pkgs.yml b/roles/install_ddp_pkgs/tasks/install_pkgs.yml index df16f573..e9db5e92 100644 --- a/roles/install_ddp_pkgs/tasks/install_pkgs.yml +++ b/roles/install_ddp_pkgs/tasks/install_pkgs.yml @@ -23,7 +23,7 @@ owner: root group: root -- name : install ddp package from a URL +- name: install ddp package from a URL include: install_a_pkg.yml loop: "{{ pkgurls }}" loop_control: diff --git a/roles/install_dependencies/tasks/main.yml b/roles/install_dependencies/tasks/main.yml index 265b904b..04cb364a 100644 --- a/roles/install_dependencies/tasks/main.yml +++ b/roles/install_dependencies/tasks/main.yml @@ -27,5 +27,5 @@ retries: 3 until: pkg_mgr_results is success environment: - http_proxy: "{{ http_proxy }}" - https_proxy: "{{ https_proxy }}" + http_proxy: "{{ http_proxy | d('') }}" + https_proxy: "{{ https_proxy | d('') }}" diff --git a/roles/install_dpdk/tasks/install_dpdk_meson.yml b/roles/install_dpdk/tasks/install_dpdk_meson.yml index d93c75e2..6ff04dbf 100644 --- a/roles/install_dpdk/tasks/install_dpdk_meson.yml +++ b/roles/install_dpdk/tasks/install_dpdk_meson.yml @@ -14,7 +14,7 @@ ## limitations under the License. 
## --- -- name: install dpdk-devel required for libraries enablement in RHEL / CentOS >= 8.2 +- name: install dpdk-devel required for libraries enablement in RHEL / Rocky dnf: name: dpdk-devel when: ansible_os_family == "RedHat" and ansible_distribution_version >= '8.2' @@ -36,21 +36,25 @@ - name: meson build for ease of compiling and linking libraries enablement command: "meson build" + changed_when: true args: chdir: "{{ dpdk_dir }}" - name: configure DPDK with ninja command: "ninja" + changed_when: true args: chdir: "{{ dpdk_dir }}/build" - name: install DPDK with ninja command: "ninja install" + changed_when: true args: chdir: "{{ dpdk_dir }}/build" - name: update the dynamic linker cache command: "ldconfig" + changed_when: true args: chdir: "{{ dpdk_dir }}/build" vars: diff --git a/roles/install_dpdk/tasks/main.yml b/roles/install_dpdk/tasks/main.yml index 6fdb775f..7eecd930 100644 --- a/roles/install_dpdk/tasks/main.yml +++ b/roles/install_dpdk/tasks/main.yml @@ -17,7 +17,7 @@ - name: Check if dpdk_version is defined assert: that: - - dpdk_version is defined + - dpdk_version is defined fail_msg: "Required variable 'dpdk_version' is not defined" - name: install dependencies @@ -78,7 +78,7 @@ msg: "[WARNING] DPDK patches were not found, no patches been applied." when: - patches_found.skipped | default(false) or patches_found.matched == 0 - ignore_errors: true + failed_when: false when: dpdk_local_patches_dir is defined diff --git a/roles/intel_dp_operator/defaults/main.yml b/roles/intel_dp_operator/defaults/main.yml index e50a5c18..d859f751 100644 --- a/roles/intel_dp_operator/defaults/main.yml +++ b/roles/intel_dp_operator/defaults/main.yml @@ -15,7 +15,7 @@ ## --- intel_dp_operator_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" -intel_dp_operator_git_ref: "v0.23.0" -intel_dp_operator_version: "0.23.0" +intel_dp_operator_git_ref: "v0.24.0" +intel_dp_operator_version: "0.24.0" intel_dp_operator_dir: "{{ (project_root_dir, 'intel-dp-operator') | path_join }}" intel_dp_namespace: kube-system diff --git a/roles/intel_dp_operator/tasks/add_dp_labels.yml b/roles/intel_dp_operator/tasks/add_dp_labels.yml index e36994c6..8d11fd74 100644 --- a/roles/intel_dp_operator/tasks/add_dp_labels.yml +++ b/roles/intel_dp_operator/tasks/add_dp_labels.yml @@ -26,3 +26,9 @@ when: - sgx_dp_enabled | default(false) - hostvars[node_name]['configure_sgx'] | default(false) + +- name: add labels for nodes with configured DSA + command: kubectl label nodes {{ hostvars[node_name]['ansible_hostname'] }} dsa.configured=true --overwrite + when: + - dsa_dp_enabled | default(false) + - hostvars[node_name]['configure_dsa_devices'] | default(false) diff --git a/roles/intel_dp_operator/tasks/main.yml b/roles/intel_dp_operator/tasks/main.yml index d32e137e..7d5a9b0c 100644 --- a/roles/intel_dp_operator/tasks/main.yml +++ b/roles/intel_dp_operator/tasks/main.yml @@ -43,6 +43,7 @@ - name: install Intel Device Plugins Operator command: kubectl apply -k {{ intel_dp_operator_dir }}/deployments/operator/default + changed_when: true register: result retries: 10 delay: 5 diff --git a/roles/intel_ethernet_operator/defaults/main.yml b/roles/intel_ethernet_operator/defaults/main.yml index 6d070b08..c5ae469f 100644 --- a/roles/intel_ethernet_operator/defaults/main.yml +++ b/roles/intel_ethernet_operator/defaults/main.yml @@ -42,8 +42,8 @@ intel_ethernet_operator_node_flow_config_files_dir: "{{ (intel_ethernet_operator intel_ethernet_operator_catalog_image: "{{ registry_local_address 
}}/intel-ethernet-operator-catalog:v{{ intel_ethernet_operator_img_ver }}" -intel_ethernet_operator_fw_url: "https://downloadmirror.intel.com/709692/E810_NVMUpdatePackage_v3_10_Linux.tar.gz" -intel_ethernet_operator_fw_sum: "031a4db40f14a04d5986a0cae53ea226" +intel_ethernet_operator_fw_url: "https://downloadmirror.intel.com/738715/E810_NVMUpdatePackage_v4_00_Linux.tar.gz" +intel_ethernet_operator_fw_sum: "95cadf0842eb97cd29c3083362db0a35" intel_ethernet_operator_ddp_urls: 'ice_comms-1.3.17.0.pkg': https://downloadmirror.intel.com/29892/eng/ice_comms-1.3.17.0.zip @@ -55,6 +55,7 @@ intel_ethernet_operator_ddp_urls: 'ice_comms-1.3.30.0_rev1.1.pkg': https://downloadmirror.intel.com/29889/eng/800%20series%20comms%20binary%20package%201.3.30.0_rev1.1.zip 'ice_comms-1.3.31.0.pkg': https://downloadmirror.intel.com/713853/800%20Series%20DDP%20Comms%20Package%201.3.31.0.zip 'ice_comms-1.3.35.0.pkg': "https://downloadmirror.intel.com/727568/ice_comms-1.3.35.0.zip" + 'ice_comms-1.3.37.0.pkg': https://downloadmirror.intel.com/738733/800%20Series%20DDP%20Comms%20Package%201.3.37.0.zip # MD5 sums of DDP packages intel_ethernet_operator_ddp_sums: @@ -67,3 +68,4 @@ intel_ethernet_operator_ddp_sums: 'ice_comms-1.3.30.0_rev1.1.pkg': 91ae9e51497cb6ab35d70d0f502c6be4 'ice_comms-1.3.31.0.pkg': d0d838120db7784f0419cd73d481aab3 'ice_comms-1.3.35.0.pkg': ee79feecf555fa50d26dfc7c07879e41 + 'ice_comms-1.3.37.0.pkg': ba5febc828e1789d6c81bae5f1ab5d71 diff --git a/roles/intel_ethernet_operator/files/intel-ethernet-operator-cluster-role.yml b/roles/intel_ethernet_operator/files/intel-ethernet-operator-cluster-role.yml deleted file mode 100644 index 7305a61d..00000000 --- a/roles/intel_ethernet_operator/files/intel-ethernet-operator-cluster-role.yml +++ /dev/null @@ -1,11 +0,0 @@ ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: intel-ethernet-operator -rules: -- apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - intel-ethernet-operator diff --git a/roles/intel_ethernet_operator/files/intel-ethernet-operator-psp.yml b/roles/intel_ethernet_operator/files/intel-ethernet-operator-psp.yml deleted file mode 100644 index f65c543d..00000000 --- a/roles/intel_ethernet_operator/files/intel-ethernet-operator-psp.yml +++ /dev/null @@ -1,24 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: intel-ethernet-operator -spec: - privileged: true - hostPID: true - hostNetwork: true - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - '*' diff --git a/roles/intel_ethernet_operator/tasks/ddp.yml b/roles/intel_ethernet_operator/tasks/ddp.yml index b8ce9ffb..51d520cc 100644 --- a/roles/intel_ethernet_operator/tasks/ddp.yml +++ b/roles/intel_ethernet_operator/tasks/ddp.yml @@ -95,14 +95,13 @@ until: "'PostUpdateReboot' not in after_ddp_update_info.stdout" retries: 30 delay: 10 - failed_when: "'Failed' in after_ddp_update_info.stdout or 'PostUpdateReboot' in after_ddp_update_info.stdout or 'InProgress' in after_ddp_update_info.stdout" # noqa 204 line-length + failed_when: "'Failed' in after_ddp_update_info.stdout or 'PostUpdateReboot' in after_ddp_update_info.stdout or 'InProgress' in after_ddp_update_info.stdout" # noqa yaml[line-length] changed_when: false when: "'PostUpdateReboot' in ddp_update_info.stdout" - name: remove DDP CR after update - k8s: - 
state: absent - src: "{{ (intel_ethernet_operator_ddp_files_dir, node_name + '-ddp-update.yml') | path_join }}" + command: "kubectl delete -f {{ (intel_ethernet_operator_ddp_files_dir, node_name + '-ddp-update.yml') | path_join }}" + changed_when: true - name: remove EthernetNodeConfig after update command: kubectl delete enc {{ hostvars[node_name]['ansible_hostname'] }} -n {{ intel_ethernet_operator_namespace }} @@ -130,7 +129,7 @@ state: absent delegate_to: "{{ node_name }}" when: - - not hostvars[node_name]['enable_ice_systemd_service'] + - not hostvars[node_name]['enable_ice_systemd_service'] |d(false) - mgmt_interface_driver_ieo.stdout != "ice" - "'irdma' not in ieo_lsmod.stdout" @@ -140,7 +139,7 @@ state: present delegate_to: "{{ node_name }}" when: - - not hostvars[node_name]['enable_ice_systemd_service'] + - not hostvars[node_name]['enable_ice_systemd_service'] |d(false) - mgmt_interface_driver_ieo.stdout != "ice" - "'irdma' not in ieo_lsmod.stdout" @@ -162,5 +161,5 @@ name: ddp enabled: yes delegate_to: "{{ node_name }}" - when: hostvars[node_name]['enable_ice_systemd_service'] - when: hostvars[node_name]['intel_ethernet_operator']['ddp_update'] + when: hostvars[node_name]['enable_ice_systemd_service'] |d(false) + when: hostvars[node_name]['intel_ethernet_operator']['ddp_update'] |d(false) diff --git a/roles/intel_ethernet_operator/tasks/ethernet_operator.yml b/roles/intel_ethernet_operator/tasks/ethernet_operator.yml index dad37415..a7c9ed12 100644 --- a/roles/intel_ethernet_operator/tasks/ethernet_operator.yml +++ b/roles/intel_ethernet_operator/tasks/ethernet_operator.yml @@ -57,15 +57,6 @@ kind: Namespace state: present -- name: copy PSP and ClusterRole - copy: - src: "{{ (role_path , 'files', item) | path_join }}" - dest: "{{ (intel_ethernet_operator_files_dir, item) | path_join }}" - mode: 0644 - loop: - - intel-ethernet-operator-psp.yml - - intel-ethernet-operator-cluster-role.yml - - name: populate Intel Ethernet Operator yaml files and push to controller node template: src: "{{ item.src }}" @@ -73,19 +64,15 @@ force: yes mode: preserve loop: - - { src: 'catalog.yml.j2', dst: 'catalog.yml' } - - { src: 'operator-group.yml.j2', dst: 'operator-group.yml' } - - { src: 'subscription.yml.j2', dst: 'subscription.yml' } - - { src: 'intel-ethernet-operator-role-binding.yml.j2', dst: 'intel-ethernet-operator-role-binding.yml' } + - {src: 'catalog.yml.j2', dst: 'catalog.yml'} + - {src: 'operator-group.yml.j2', dst: 'operator-group.yml'} + - {src: 'subscription.yml.j2', dst: 'subscription.yml'} -- name: create PSP, RBAC and OperatorGroup +- name: create Catalog and OperatorGroup k8s: state: present src: "{{ (intel_ethernet_operator_files_dir, item) | path_join }}" loop: - - intel-ethernet-operator-psp.yml - - intel-ethernet-operator-cluster-role.yml - - intel-ethernet-operator-role-binding.yml - catalog.yml - operator-group.yml diff --git a/roles/intel_ethernet_operator/tasks/flow_config_deployment.yml b/roles/intel_ethernet_operator/tasks/flow_config_deployment.yml index c7e18ef3..b468d95e 100644 --- a/roles/intel_ethernet_operator/tasks/flow_config_deployment.yml +++ b/roles/intel_ethernet_operator/tasks/flow_config_deployment.yml @@ -29,8 +29,8 @@ force: yes mode: preserve loop: - - { src: 'flow-config-sriov-network.yml.j2', dst: 'flow-config-sriov-network.yml' } - - { src: 'flow-config-node-agent.yml.j2', dst: 'flow-config-node-agent.yml' } + - {src: 'flow-config-sriov-network.yml.j2', dst: 'flow-config-sriov-network.yml'} + - {src: 'flow-config-node-agent.yml.j2', dst: 
'flow-config-node-agent.yml'} - name: create SRIOV network attachment definition for the DCF VF pool k8s: diff --git a/roles/intel_ethernet_operator/tasks/flow_config_files.yml b/roles/intel_ethernet_operator/tasks/flow_config_files.yml index 380e2be5..c7c732f7 100644 --- a/roles/intel_ethernet_operator/tasks/flow_config_files.yml +++ b/roles/intel_ethernet_operator/tasks/flow_config_files.yml @@ -22,8 +22,8 @@ force: yes mode: preserve loop: - - { src: 'flow-config-sriov-policy.yml.j2', dst: 'flow-config-sriov-policy.yml' } - - { src: 'flow-config-node-flow.yml.j2', dst: 'flow-config-node-flow.yml' } + - {src: 'flow-config-sriov-policy.yml.j2', dst: 'flow-config-sriov-policy.yml'} + - {src: 'flow-config-node-flow.yml.j2', dst: 'flow-config-node-flow.yml'} - name: apply SRIOV Network Node Policy for Flow Config k8s: diff --git a/roles/intel_ethernet_operator/tasks/fw.yml b/roles/intel_ethernet_operator/tasks/fw.yml index 7598144a..50581a52 100644 --- a/roles/intel_ethernet_operator/tasks/fw.yml +++ b/roles/intel_ethernet_operator/tasks/fw.yml @@ -39,6 +39,7 @@ retries: 60 delay: 10 failed_when: "'Failed' in fw_update_info.stdout" + changed_when: false - name: check node after reboot block: @@ -96,7 +97,7 @@ - "'InProgress' not in after_fw_update_info.stdout" retries: 30 delay: 10 - failed_when: "'Failed' in after_fw_update_info.stdout or 'PostUpdateReboot' in after_fw_update_info.stdout or 'InProgress' in after_fw_update_info.stdout" # noqa 204 line-length + failed_when: "'Failed' in after_fw_update_info.stdout or 'PostUpdateReboot' in after_fw_update_info.stdout or 'InProgress' in after_fw_update_info.stdout" # noqa yaml[line-length] changed_when: false when: "'PostUpdateReboot' in fw_update_info.stdout" @@ -109,4 +110,4 @@ command: kubectl delete enc {{ hostvars[node_name]['ansible_hostname'] }} -n {{ intel_ethernet_operator_namespace }} changed_when: true - when: hostvars[node_name]['intel_ethernet_operator']['fw_update'] + when: hostvars[node_name]['intel_ethernet_operator']['fw_update'] |d(false) diff --git a/roles/intel_ethernet_operator/tasks/preflight_ethernet_operator.yml b/roles/intel_ethernet_operator/tasks/preflight_ethernet_operator.yml index 4526ea27..416fd0a5 100644 --- a/roles/intel_ethernet_operator/tasks/preflight_ethernet_operator.yml +++ b/roles/intel_ethernet_operator/tasks/preflight_ethernet_operator.yml @@ -59,14 +59,16 @@ - "Intel Ethernet Operator is mutually exclusive with legacy DDP/FW update role." 
- "Please set 'install_ddp_packages' and 'update_nic_firmware' as false" when: - - (intel_ethernet_operator.ddp_update is defined and intel_ethernet_operator.ddp_update) or - (intel_ethernet_operator.fw_update is defined and intel_ethernet_operator.fw_update) + - intel_ethernet_operator.ddp_update | d(false) or intel_ethernet_operator.fw_update | d(false) - - name: check if ice driver will be updated + - name: check bus_info and ddp profile of dataplane_interfaces assert: - that: update_nic_drivers - msg: "Firmware update requires update_nic_drivers set as true" - when: intel_ethernet_operator.fw_update is defined and intel_ethernet_operator.fw_update + that: + - dataplane_interfaces | json_query("[?ends_with(bus_info, ':00.0')]") + - dataplane_interfaces | json_query('[?ddp_profile]') + msg: "When DDP update is true, bus_info of one of the interfaces must end with ':00.0', also ddp_profile must be defined" + when: intel_ethernet_operator.ddp_update | d(false) + - name: check Hugepages settings for Flow Configuration assert: diff --git a/roles/intel_ethernet_operator/tasks/uft.yml b/roles/intel_ethernet_operator/tasks/uft.yml index 1896312f..6e28356e 100644 --- a/roles/intel_ethernet_operator/tasks/uft.yml +++ b/roles/intel_ethernet_operator/tasks/uft.yml @@ -25,20 +25,24 @@ block: - name: build UFT image command: podman build -f images/Dockerfile.uft . -t {{ registry_local_address }}/{{ uft_image }}:{{ uft_image_ver }} --build-arg DPDK_TAG={{ dpdk_tag }} + changed_when: true args: chdir: "{{ uft_dir }}" - name: push UFT image command: podman push {{ registry_local_address }}/{{ uft_image }}:{{ uft_image_ver }} + changed_when: true when: container_runtime != "docker" - name: prepare UFT image block: - name: build UFT image command: docker build --build-arg DPDK_TAG={{ dpdk_tag }} -f images/Dockerfile.uft . 
-t {{ registry_local_address }}/{{ uft_image }}:{{ uft_image_ver }} + changed_when: true args: chdir: "{{ uft_dir }}" - name: push UFT image command: docker push {{ registry_local_address }}/{{ uft_image }}:{{ uft_image_ver }} + changed_when: true when: container_runtime == "docker" diff --git a/roles/intel_ethernet_operator/templates/ddp-service.j2 b/roles/intel_ethernet_operator/templates/ddp-service.j2 index 75b7cb41..a3d70b44 100644 --- a/roles/intel_ethernet_operator/templates/ddp-service.j2 +++ b/roles/intel_ethernet_operator/templates/ddp-service.j2 @@ -5,8 +5,15 @@ Before=kubelet.service [Service] Type=oneshot -ExecStart=/sbin/modprobe -r {% if (hostvars[node_name]['ansible_distribution'] == "Ubuntu" and hostvars[node_name]['ansible_distribution_version'] >= "22.04") %}irdma {% endif %}ice -ExecStart=/sbin/modprobe ice {% if (hostvars[node_name]['ansible_distribution'] == "Ubuntu" and hostvars[node_name]['ansible_distribution_version'] >= "22.04") %}irdma{% endif %} +{% if (not hostvars[node_name]['update_nic_drivers'] and +((hostvars[node_name]['ansible_distribution'] == "Ubuntu" and hostvars[node_name]['ansible_distribution_version'] >= "22.04") or +(hostvars[node_name]['ansible_os_family'] == "RedHat" and hostvars[node_name]['ansible_distribution_version'] >= "8.6"))) %} +ExecStart=/sbin/modprobe -r irdma ice +ExecStart=/sbin/modprobe -a ice irdma +{% else %} +ExecStart=/sbin/modprobe -r ice +ExecStart=/sbin/modprobe -a ice +{% endif %} [Install] WantedBy=multi-user.target diff --git a/roles/intel_ethernet_operator/templates/intel-ethernet-operator-role-binding.yml.j2 b/roles/intel_ethernet_operator/templates/intel-ethernet-operator-role-binding.yml.j2 deleted file mode 100644 index d50448c5..00000000 --- a/roles/intel_ethernet_operator/templates/intel-ethernet-operator-role-binding.yml.j2 +++ /dev/null @@ -1,13 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: intel-ethernet-operator - namespace: {{ intel_ethernet_operator_namespace }} -roleRef: - kind: ClusterRole - name: intel-ethernet-operator - apiGroup: rbac.authorization.k8s.io -subjects: -- kind: Group - apiGroup: rbac.authorization.k8s.io - name: system:authenticated diff --git a/roles/intel_ethernet_operator/templates/subscription.yml.j2 b/roles/intel_ethernet_operator/templates/subscription.yml.j2 index a9caf718..9d672caf 100644 --- a/roles/intel_ethernet_operator/templates/subscription.yml.j2 +++ b/roles/intel_ethernet_operator/templates/subscription.yml.j2 @@ -4,13 +4,13 @@ metadata: name: intel-ethernet-subscription namespace: {{ intel_ethernet_operator_namespace }} spec: -{% if http_proxy or https_proxy is defined %} +{% if http_proxy is defined or https_proxy is defined %} config: env: - name: HTTP_PROXY - value: {{ http_proxy }} + value: {{ http_proxy | d('') }} - name: HTTPS_PROXY - value: {{ https_proxy }} + value: {{ https_proxy | d('') }} - name: NO_PROXY value: {{ kube_service_addresses }},{{ kube_pods_subnet }} {% endif %} diff --git a/roles/intel_flexran/defaults/main.yml b/roles/intel_flexran/defaults/main.yml index e49e98fb..25060507 100644 --- a/roles/intel_flexran/defaults/main.yml +++ b/roles/intel_flexran/defaults/main.yml @@ -26,30 +26,31 @@ # intel_flexran_repo: "not public" # intel_flexran_token: "pkg requires private access from Intel’s Developer Zone Portal" # intel_flexran_staging_location: "/tmp/flexran/" # a directory on localhost (ansible host) -intel_flexran_ver: "22.03" +intel_flexran_ver: "22.07" # "22.03" (since RA22.06) # intel_flexran_tarball: 
"FlexRAN-22.03.tar.gz" # intel_flexran_tar_chk: "65e59ac1295ef392f54b80047db2efe458962fc78e5d84c5d54703439a364cda" # SHA256 intel_flexran_dir: "{{ (project_root_dir, 'intel-flexran') | path_join }}" intel_flexran_files_dir: "{{ (project_root_dir, 'intel-flexran-files') | path_join }}" # for ACC100 CRs, kernel cmdline, etc -intel_flexran_dpdk_ver: "21.11" +intel_flexran_dpdk_ver: "21.11" # for both, FlexRAN 22.07 and 22.03 # intel_flexran_dpdk_dir: "{{ dpdk_dir }}" # as defined in host_vars intel_flexran_dpdk_dir: "{{ (project_root_dir, 'dpdk-' + intel_flexran_dpdk_ver) | path_join }}" # intel_flexran_dpdk_zip: "dpdk_patch-{{ intel_flexran_ver }}.patch.zip" # intel_flexran_dpdk_zip_chk: "8870b139a3f7fbbd2f0bee1aeaeeb5e0a08fb4745b4e183bf4c9119e5d2dcdaa" # SHA256 -intel_flexran_dpdk_patch: "dpdk_patch-22.03.patch" -intel_flexran_dpdk_patch_chk: "4556ba6e5ac32d0360c0e5c71ba7fa22a6065f9e608a35e1c945691f7dfd7fe4" # SHA256 +intel_flexran_dpdk_patch: "dpdk_patch-22.07.patch" # "dpdk_patch-22.03.patch" +# intel_flexran_dpdk_patch_chk: "4556ba6e5ac32d0360c0e5c71ba7fa22a6065f9e608a35e1c945691f7dfd7fe4" # SHA256 dpdk_patch-22.03.patch +intel_flexran_dpdk_patch_chk: "d50b513a0e2a018937af643b06e55c569627614d06a5ec298ce504a68b943be8" # SHA256 dpdk_patch-22.07.patch # Intel oneAPI Base Toolkit # Reference: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html -intel_oneapi_ver: "2022.1.2.146" -intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146_offline.sh" -intel_oneapi_chk: "91682e4410c17a82147ce574c30e57271cc12adfab198c8547612f13d4dd21c8d77ce12153d29b3774bc27f0c6b604cd" # SHA384 intel_oneapi_dir: "{{ (project_root_dir, 'intel-oneapi') | path_join }}" -# Newer Release Date: April 05, 2022 -# intel_oneapi_ver: "2022.2" -# intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18673/l_BaseKit_p_2022.2.0.262_offline.sh" -# intel_oneapi_chk: "e508b0a64f048d9518cc3706e1fa3f400dbb0a07fdc0f91e02b371b18a35715fa0fad7a960dbb7fc04595f77ae65a333" # SHA384 +# intel_oneapi_ver: "2022.1.2.146" +# intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146_offline.sh" +# intel_oneapi_chk: "91682e4410c17a82147ce574c30e57271cc12adfab198c8547612f13d4dd21c8d77ce12153d29b3774bc27f0c6b604cd" # SHA384 + +intel_oneapi_ver: "2022.2" +intel_oneapi_url: "https://registrationcenter-download.intel.com/akdlm/irc_nas/18673/l_BaseKit_p_2022.2.0.262_offline.sh" +intel_oneapi_chk: "e508b0a64f048d9518cc3706e1fa3f400dbb0a07fdc0f91e02b371b18a35715fa0fad7a960dbb7fc04595f77ae65a333" # SHA384 # Intel ACC100 FEC CR (Mount Bryce) # intel_acc100_cr: "acc100-cr.yaml" diff --git a/roles/intel_flexran/tasks/flexran.yml b/roles/intel_flexran/tasks/flexran.yml index 574856cd..0205fb1b 100644 --- a/roles/intel_flexran/tasks/flexran.yml +++ b/roles/intel_flexran/tasks/flexran.yml @@ -14,13 +14,14 @@ ## limitations under the License. ## --- +# no need. 
per POR, customer must manually pre-extract FlexRAN package on target(s) # - name: create Intel FlexRAN directory on worker node # file: # path: "{{ intel_flexran_dir }}" # state: directory # mode: '0755' -# - name: unpack Intel FlexRAN tarball on worker node +# - name: unpack Intel FlexRAN tarball on target(s) # unarchive: # src: "{{ (intel_flexran_staging_location, intel_flexran_tarball) | path_join }}" # dest: "{{ intel_flexran_dir }}" @@ -43,22 +44,30 @@ content: "{{ intel_oneapi_dir }}" mode: '0755' +- debug: msg="Intel FlexRAN mode is '{{ intel_flexran_mode }}'" + +- name: set Intel FlexRAN mode + lineinfile: + path: "{{ (intel_flexran_dir, 'xran/build.sh') | path_join }}" + regexp: '^SAMPLEAPP=0' + line: SAMPLEAPP=1 + when: intel_flexran_mode == "xran" + +- name: set Intel FlexRAN target isa + set_fact: + target_isa: "-i spr" + when: configured_arch == "spr" + - name: build Intel FlexRAN SDK - shell: "source set_env_var.sh -d && ./flexran_build.sh -e -r 5gnr -m sdk" # noqa 305 + shell: "source set_env_var.sh -d && ./flexran_build.sh -e -r 5gnr {{ target_isa | default('') }} -m sdk" args: executable: /bin/bash chdir: "{{ intel_flexran_dir }}" - changed_when: false - -# - name: build DPDK # all this presumably already done by DPDK role. need to re-run after building sdk above?! -# shell: "source set_env_var.sh -d && cd $RTE_SDK && meson build && cd build && meson configure && ninja && ninja install" # noqa 305 -# args: -# executable: /bin/bash -# chdir: "{{ intel_flexran_dir }}" + changed_when: true - name: build FlexRAN ALL for 5GNR - shell: "source set_env_var.sh -d && ./flexran_build.sh -e -r 5gnr" # noqa 305 + shell: "ldconfig && export RTE_SDK={{ intel_flexran_dpdk_dir }} && source set_env_var.sh -d && export PKG_CONFIG_PATH=$RTE_SDK/build/meson-uninstalled && ./flexran_build.sh -e -r 5gnr {{ target_isa | default('') }}" # noqa yaml[line-length] args: executable: /bin/bash chdir: "{{ intel_flexran_dir }}" - changed_when: false + changed_when: true diff --git a/roles/intel_flexran/tasks/flexran_preflight.yml b/roles/intel_flexran/tasks/flexran_preflight.yml index 0df80dbd..5b88c77a 100644 --- a/roles/intel_flexran/tasks/flexran_preflight.yml +++ b/roles/intel_flexran/tasks/flexran_preflight.yml @@ -14,92 +14,159 @@ ## limitations under the License. ## --- -- block: # - name: load FlexRAN vars # include_vars: "../roles/intel_flexran/defaults/main.yml" - # check CPU for FlexRAN - - debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa 204 line-length - - name: check CPU for FlexRAN - assert: - that: "ansible_processor_count == 1 and ansible_processor_cores == 32" - msg: "Intel FlexRAN proper configuration requires worker with single 32-cores SPR CPU. Deployment may proceed but is unsupported" - failed_when: false - - # check o/s for FlexRAN - - debug: msg="Linux distribution on target is {{ ansible_distribution }} {{ ansible_distribution_version }} ({{ ansible_distribution_release }}) with {{ ansible_kernel }} kernel" # noqa 204 line-length - - name: check linux distro version and kernel for FlexRAN - assert: - that: "ansible_distribution == 'Ubuntu' and ansible_distribution_version == '22.04' and 'realtime' in ansible_kernel" - msg: - - Deploying Intel FlexRAN is supported only on Ubuntu 22.04 with realtime kernel. - - Please prepare accordingly the o/s image on target or disable FlexRAN. 
See docs/flexran_guide.md - - # check package for FlexRAN - - debug: msg="Expecting subfolders inside '{{ intel_flexran_dir }}' on worker node" - - - name: probe for FlexRAN extraction - stat: - path: "{{ item }}" - register: flexran_dir_stats - with_items: - - "{{ (intel_flexran_dir, 'bin') | path_join }}" - - "{{ (intel_flexran_dir, 'framework') | path_join }}" - - "{{ (intel_flexran_dir, 'sdk') | path_join }}" - - "{{ (intel_flexran_dir, 'source') | path_join }}" - - "{{ (intel_flexran_dir, 'tests') | path_join }}" - - "{{ (intel_flexran_dir, 'xran') | path_join }}" - - - name: check FlexRAN folders - assert: - that: "item.stat.exists and item.stat.isdir" - msg: - - Directory '{{ item.item }}' is missing on target '{{ inventory_hostname }}' - - Deploying Intel FlexRAN requires the tarball package to be pre-extracted on the worker node. See docs/flexran_guide.md - with_items: "{{ flexran_dir_stats.results }}" - - # check DPDK patch for FlexRAN - - debug: msg="Expecting file {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} on local ansible host" - - - name: probe for FlexRAN DPDK patch - delegate_to: localhost - stat: - path: "{{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }}" - checksum_algorithm: sha256 - register: provided_flexran_dpdk_patch - - - debug: msg="{{ intel_flexran_dpdk_patch }} exists is {{ provided_flexran_dpdk_patch.stat.exists }}" - - - name: check the FlexRAN DPDK patch name - assert: - that: "provided_flexran_dpdk_patch.stat.exists" - msg: - - Mandatory file {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} does NOT exist on localhost. - - Please acquire the DPDK patch.zip and unzip it in the location indicated above in order to deploy FlexRAN. See docs/flexran_guide.md - - - debug: msg="{{ intel_flexran_dpdk_patch }} checksum is {{ provided_flexran_dpdk_patch.stat.checksum }}" - - - name: check the FlexRAN DPDK patch integrity - assert: - that: "provided_flexran_dpdk_patch.stat.checksum == '{{ intel_flexran_dpdk_patch_chk }}'" - msg: - - File {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} on localhost is NOT the expected one. - - Please provide the correct file. See docs/flexran_guide.md - - # check DPDK for FlexRAN - - name: check DPDK is enabled for FlexRAN - assert: - that: install_dpdk - msg: "DPDK installation is required for FlexRAN. Please make sure install_dpdk is true in the worker node host_vars file" - - - debug: msg="DPDK version is set to '{{ dpdk_version }}'" - - - name: check DPDK version for FlexRAN - assert: - that: dpdk_version == intel_flexran_dpdk_ver - msg: - - DPDK version '{{ dpdk_version }}' set in the worker node host_vars file does NOT match the DPDK version required for FlexRAN. - - Must be '{{ intel_flexran_dpdk_ver }}' +- name: verify gNB (BBU) node + block: # only on gNB (node) + # check mode for FlexRAN + - debug: msg="Intel FlexRAN mode is '{{ intel_flexran_mode }}'" + - name: check mode for FlexRAN + assert: + that: intel_flexran_mode in ['timer', 'xran'] + msg: "Intel FlexRAN mode must be either 'timer' or 'xran'. Please correct the intel_flexran_mode value in roles/intel_flexran/defaults/main.yml" + # check FEC acc for FlexRAN + - name: check acc h/w + assert: + that: fec_acc is defined # and PCIID is present in the host and its DevID in supported list + msg: "Intel FlexRAN requires the FEC Accelerator Device '{{ fec_acc }}' to be present in the host. 
Please correct the host h/w configuration" + + - name: check FEC Acc devices on worker node (expected 1 PF + 16 VFs) + shell: "set -o pipefail && lspci | grep -i acc" + args: + executable: /bin/bash + register: intel_flexran_fec_devs + changed_when: false + failed_when: false + + - debug: msg="lspci probing returned '{{ intel_flexran_fec_devs.stdout }}'" + + # check oRU for FlexRAN + - name: check oRU for FlexRAN + assert: + that: groups['oru'][0] is defined | default(false) | bool + msg: "Intel FlexRAN mode xRAN requires an oRU target defined in inventory. Please add 'oru' group in the inventory file" + when: intel_flexran_mode == 'xran' + + # check network for FlexRAN + - debug: msg="BBU_FH={{ intel_flexran_bbu_front_haul }} BBU_PS={{ intel_flexran_bbu_ptp_sync }} oRU_FH={{ intel_flexran_oru_front_haul }} oRU_PS={{ intel_flexran_oru_ptp_sync }}" # noqa yaml[line-length] + - name: check network for FlexRAN + assert: + that: "intel_flexran_bbu_front_haul is defined and intel_flexran_bbu_ptp_sync is defined and intel_flexran_oru_front_haul is defined and intel_flexran_oru_ptp_sync is defined" # noqa yaml[line-length] + msg: "Intel FlexRAN mode xRAN requires defining the network devices for 'Front Haul' and 'PTP Sync'. See docs/flexran_guide.md" + when: intel_flexran_mode == 'xran' + + # check NIC for FlexRAN + - name: read Physical NICs PCIIDs + set_fact: + phy_nics_pciids: "{{ phy_nics_pciids + [ ansible_facts[item]['pciid'] ] }}" + with_items: "{{ ansible_interfaces }}" + when: ansible_facts[item]['pciid'] is defined and ansible_facts[item]['type'] == "ether" + + - debug: msg="PCI Slots for the NICs on target '{{ ansible_hostname }}' = {{ phy_nics_pciids }}" + when: intel_flexran_mode == 'xran' + + # {{ hostvars[inventory_hostname]['ansible_default_ipv4']['interface'] }} | grep driver | sed 's/^driver: //'" + + # check CPU for FlexRAN + - debug: msg="CPU={{ ansible_processor[2] }} cores={{ ansible_processor_cores }} count={{ ansible_processor_count }} nproc={{ ansible_processor_nproc }} tpc={{ ansible_processor_threads_per_core }} vcpus={{ ansible_processor_vcpus }}" # noqa yaml[line-length] + - name: check CPU for FlexRAN + assert: + that: "ansible_processor_count == 1 and ansible_processor_cores == 32" + msg: "Intel FlexRAN proper configuration requires worker with single 32-cores SPR CPU. Deployment may proceed but is unsupported" + failed_when: false + + # check DPDK patch for FlexRAN + - debug: msg="Expecting file {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} on local ansible host" + + - name: probe for FlexRAN DPDK patch + delegate_to: localhost + stat: + path: "{{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }}" + checksum_algorithm: sha256 + register: provided_flexran_dpdk_patch + + - debug: msg="{{ intel_flexran_dpdk_patch }} exists is {{ provided_flexran_dpdk_patch.stat.exists }}" + + - name: check the FlexRAN DPDK patch name + assert: + that: "provided_flexran_dpdk_patch.stat.exists" + msg: + - Mandatory file {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} does NOT exist on localhost. + - Please acquire the DPDK patch.zip and unzip it in the location indicated above in order to deploy FlexRAN. 
See docs/flexran_guide.md + + - debug: msg="{{ intel_flexran_dpdk_patch }} checksum is {{ provided_flexran_dpdk_patch.stat.checksum }}" + + - name: check the FlexRAN DPDK patch integrity + assert: + that: "provided_flexran_dpdk_patch.stat.checksum == '{{ intel_flexran_dpdk_patch_chk }}'" + msg: + - File {{ (dpdk_local_patches_dir, 'dpdk-' + dpdk_version, intel_flexran_dpdk_patch) | path_join }} on localhost is NOT the expected one. + - Please provide the correct file. See docs/flexran_guide.md + + # check DPDK for FlexRAN + - name: check DPDK is enabled for FlexRAN + assert: + that: install_dpdk + msg: "DPDK is required for FlexRAN. Please make sure install_dpdk is true in the node host_vars file and the correct DPDK version is set" + + - debug: msg="DPDK version is set to '{{ dpdk_version }}'" + + - name: check DPDK version for FlexRAN + assert: + that: dpdk_version == intel_flexran_dpdk_ver + msg: + - DPDK version '{{ dpdk_version }}' set in the worker node host_vars file does NOT match the DPDK version required for FlexRAN. + - Must be '{{ intel_flexran_dpdk_ver }}' when: - - intel_flexran_enabled | default(false) | bool + - intel_flexran_enabled | default(false) | bool # skip for oRU + +- name: verify gNB (BBU) node and oRU + block: # repeat for gNB (node) and oRU + # check o/s for FlexRAN + - debug: msg="Linux distribution on target is {{ ansible_distribution }} {{ ansible_distribution_version }} ({{ ansible_distribution_release }}) with {{ ansible_kernel }} kernel" # noqa yaml[line-length] + - name: check linux distro version and kernel for FlexRAN + assert: + that: > + (ansible_distribution == 'Ubuntu' and ansible_distribution_version == '22.04' and 'realtime' in ansible_kernel) or + (ansible_distribution == 'RedHat' and ansible_distribution_version == '8.6' and 'rt' in ansible_kernel) + msg: + - Deploying Intel FlexRAN is supported only on Ubuntu 22.04 or RHEL 8.6 and with real-time kernel. + - Please prepare the o/s image on target(s) accordingly or disable FlexRAN. See docs/flexran_guide.md + + # check package for FlexRAN + - debug: msg="Expecting subfolders inside '{{ intel_flexran_dir }}' on target(s) gNB and oRU" + + - name: probe for FlexRAN pre-extraction + stat: + path: "{{ item }}" + register: flexran_dir_stats + with_items: + - "{{ (intel_flexran_dir, 'bin') | path_join }}" + - "{{ (intel_flexran_dir, 'framework') | path_join }}" + - "{{ (intel_flexran_dir, 'sdk') | path_join }}" + - "{{ (intel_flexran_dir, 'source') | path_join }}" + - "{{ (intel_flexran_dir, 'tests') | path_join }}" + - "{{ (intel_flexran_dir, 'xran') | path_join }}" + + - name: check FlexRAN folders + assert: + that: "item.stat.exists and item.stat.isdir" + msg: + - Directory '{{ item.item }}' is missing on target '{{ inventory_hostname }}' + - Deploying Intel FlexRAN requires the tarball package to be pre-extracted on the worker node. 
See docs/flexran_guide.md + with_items: "{{ flexran_dir_stats.results }}" + + # check NICs for xRAN mode + - debug: + msg: "Network interfaces present on target '{{ ansible_hostname }}' = {{ ansible_interfaces }}" + when: intel_flexran_mode == 'xran' +# +# - name: probe NICs PCIIDs +# set_fact: +# phy_nics_pciids: "{{ phy_nics_pciids + [ ansible_facts[item]['pciid'] ] }}" +# with_items: "{{ ansible_interfaces }}" +# when: ansible_facts[item]['pciid'] is defined and ansible_facts[item]['type'] == "ether" +# +# - debug: msg="PCI Slots for the NICs on target '{{ ansible_hostname }}' = {{ phy_nics_pciids }}" diff --git a/roles/intel_flexran/tasks/main.yml b/roles/intel_flexran/tasks/main.yml index 83e62bb9..471ac666 100644 --- a/roles/intel_flexran/tasks/main.yml +++ b/roles/intel_flexran/tasks/main.yml @@ -16,19 +16,27 @@ --- - name: preflight check for Intel FlexRAN include_tasks: flexran_preflight.yml - when: inventory_hostname == groups['kube_node'][0] + +- name: bring up oRU for Intel FlexRAN # DPDK and anything else that is already on gNB node + include_tasks: oru.yml + when: + - groups['oru'][0] is defined and inventory_hostname == groups['oru'][0] + - intel_flexran_mode == "xran" - name: install dependencies for Intel FlexRAN include_role: name: install_dependencies +- name: install pyelftools for FlexRAN + command: "pip3 install pyelftools" + changed_when: false + when: ansible_distribution == 'RedHat' + - name: deploy Intel oneAPI include_tasks: oneapi.yml - when: inventory_hostname == groups['kube_node'][0] - name: deploy Intel FlexRAN include_tasks: flexran.yml - when: inventory_hostname == groups['kube_node'][0] - name: deploy Intel pf_bb (Physical Function Baseband) device config app include_tasks: pf_bb.yml @@ -46,4 +54,12 @@ - name: test Timer Mode include_tasks: timer_mode.yml - when: inventory_hostname == groups['kube_node'][0] + when: + - inventory_hostname == groups['kube_node'][0] + - intel_flexran_mode == "timer" + +- name: test xRAN Mode + include_tasks: xran_mode.yml + when: + - inventory_hostname == groups['kube_node'][0] + - intel_flexran_mode == "xran" diff --git a/roles/intel_flexran/tasks/oneapi.yml b/roles/intel_flexran/tasks/oneapi.yml index 10369ecd..3961dc38 100644 --- a/roles/intel_flexran/tasks/oneapi.yml +++ b/roles/intel_flexran/tasks/oneapi.yml @@ -28,16 +28,19 @@ mode: '0755' use_proxy: yes +# ln -s /usr/lib/x86_64-linux-gnu/libnuma.so /usr/lib64/libnuma.so +# RHEL 8.6 RT ERR: src file does not exist, use "force=yes" if you really want to create the link: /usr/lib/x86_64-linux-gnu/libnuma.so - name: create libnuma symlink file: src: "/usr/lib/x86_64-linux-gnu/libnuma.so" dest: "/usr/lib64/libnuma.so" state: link -# ln -s /usr/lib/x86_64-linux-gnu/libnuma.so /usr/lib64/libnuma.so + when: ansible_distribution == 'Ubuntu' - name: install Intel oneAPI -  command: "sh {{ intel_oneapi_dir }}/intel-oneapi-basekit-offline.sh -a --silent --eula accept --install-dir {{ intel_oneapi_dir }}" - changed_when: false +# command: "sh {{ intel_oneapi_dir }}/intel-oneapi-basekit-offline.sh -a --silent --eula accept --install-dir {{ intel_oneapi_dir }}" + command: "sh {{ intel_oneapi_dir }}/intel-oneapi-basekit-offline.sh -a --silent --eula accept --components intel.oneapi.lin.dpcpp-cpp-compiler:intel.oneapi.lin.ipp.devel:intel.oneapi.lin.ippcp.devel:intel.oneapi.lin.mkl.devel:intel.oneapi.lin.dpcpp-ct:intel.oneapi.lin.dpl:intel.oneapi.lin.dpcpp_dbg --install-dir {{ intel_oneapi_dir }}" # noqa yaml[line-length] + changed_when: true + failed_when: false # to allow re-run install 
without uninstall # environment: # PATH: "{{ gopath.stdout }}/bin:/usr/local/go/bin:/usr/sbin:/usr/bin:/sbin:/bin:{{ intel_oneapi_dir }}" diff --git a/roles/intel_flexran/tasks/oru.yml b/roles/intel_flexran/tasks/oru.yml new file mode 100644 index 00000000..05f91928 --- /dev/null +++ b/roles/intel_flexran/tasks/oru.yml @@ -0,0 +1,29 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- debug: msg="Building oRU on '{{ ansible_hostname }}' for xRAN mode. See docs/flexran_guide.md" + +- name: clone DPDK variables from FlexRAN gNB (node) + set_fact: + dpdk_version: "{{ hostvars[groups['kube_node'][0]]['dpdk_version'] }}" + dpdk_local_patches_dir: "{{ hostvars[groups['kube_node'][0]]['dpdk_local_patches_dir'] }}" + dpdk_local_patches_strip: "{{ hostvars[groups['kube_node'][0]]['dpdk_local_patches_strip'] }}" + +- debug: msg="DPDK is set to {{ dpdk_version }} and local patches are taken from {{ dpdk_local_patches_dir }}" + +- name: install DPDK on oRU + include_role: + name: install_dpdk diff --git a/roles/intel_flexran/tasks/timer_mode.yml b/roles/intel_flexran/tasks/timer_mode.yml index 7fd03e4c..34c56572 100644 --- a/roles/intel_flexran/tasks/timer_mode.yml +++ b/roles/intel_flexran/tasks/timer_mode.yml @@ -47,7 +47,7 @@ - debug: msg: - "Intel FlexRAN deployment is complete and Timer Mode configuration is done." - - "The worker node is ready for L1/L2 Tests to be executed and verified according to the Guide" + - "The worker node is ready for L1/L2 Tests to be executed and verified. See docs/flexran_guide.md" # - name: run L1 # shell: "source set_env_var.sh -d && cd {{ (intel_flexran_dir, 'bin/nr5g/gnb/l1') | path_join }} && ./l1.sh -e" # noqa 305 @@ -63,7 +63,7 @@ # seconds: 30 # - name: run L2 -# shell: "source set_env_var.sh -d && cd {{ (intel_flexran_dir, 'bin/nr5g/gnb/testmac') | path_join }} && ./l2.sh --testfile=icelake-sp/icxsp_mu0_10mhz_4x4_hton.cfg" # noqa 204 line-length +# shell: "source set_env_var.sh -d && cd {{ (intel_flexran_dir, 'bin/nr5g/gnb/testmac') | path_join }} && ./l2.sh --testfile=icelake-sp/icxsp_mu0_10mhz_4x4_hton.cfg" # noqa yaml[line-length] # args: # executable: /bin/bash # chdir: "{{ intel_flexran_dir }}" diff --git a/roles/intel_flexran/tasks/xran_mode.yml b/roles/intel_flexran/tasks/xran_mode.yml new file mode 100644 index 00000000..7afcd551 --- /dev/null +++ b/roles/intel_flexran/tasks/xran_mode.yml @@ -0,0 +1,50 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: enable h/w FEC mode + lineinfile: + path: "{{ (intel_flexran_dir, 'bin/nr5g/gnb/l1/phycfg_timer.xml') | path_join }}" + search_string: 'dpdkBasebandFecMode' + line: "  <dpdkBasebandFecMode>1</dpdkBasebandFecMode>" + +- name: set h/w FEC device + lineinfile: + path: "{{ (intel_flexran_dir, 'bin/nr5g/gnb/l1/phycfg_timer.xml') | path_join }}" + search_string: 'dpdkBasebandDevice' + line: "{{ '  <dpdkBasebandDevice>' + fec_acc + '</dpdkBasebandDevice>' }}" + +- name: check dpdkBaseband config + shell: "grep dpdkBaseband phycfg_timer.xml" # noqa 305 + args: + chdir: "{{ (intel_flexran_dir, 'bin/nr5g/gnb/l1') | path_join }}" + register: phycfg_timer_mode + changed_when: false + +- debug: msg="{{ phycfg_timer_mode.stdout }}" + +- name: check FEC Acc devices + shell: "set -o pipefail && lspci | grep -i acc" + args: + executable: /bin/bash + register: fec_acc_devs + changed_when: false + +- debug: msg={{ fec_acc_devs.stdout }} + +- debug: + msg: + - "Intel FlexRAN deployment is complete and xRAN Mode configuration is done." + - "Both gNB (BBU) and oRU nodes are ready for xRAN tests to be executed and verified. See docs/flexran_guide.md" diff --git a/roles/intel_flexran/vars/main.yml b/roles/intel_flexran/vars/main.yml index 31d7fda9..42a243d5 100644 --- a/roles/intel_flexran/vars/main.yml +++ b/roles/intel_flexran/vars/main.yml @@ -33,3 +33,26 @@ install_dependencies: RedHat: - git - make + - numactl-devel + - elfutils-libelf-devel + - cmake + - gcc-c++ + - libhugetlbfs* + - libstdc++* + - kernel-devel + - numactl* + - gcc + - mlocate + - expect + - gdb + - dstat + - libvirt-devel + - libgcrypt + - meson + - libvirt + - qemu-kvm + - pkgconf + - pciutils + - libzstd-devel.x86_64 + - iproute-devel.x86_64 +# - pyelftools # RH8.6RT: "failures": "No package pyelftools available." 
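For reference, a minimal sketch of what the two `lineinfile` tasks in `xran_mode.yml` above should leave in `bin/nr5g/gnb/l1/phycfg_timer.xml`; the PCI address shown is a hypothetical `fec_acc` value (the real one comes from host_vars), not a value taken from this patch:

```xml
<!-- h/w FEC enabled (mode 1); the device address is supplied by fec_acc, 0000:f7:00.0 is a placeholder -->
  <dpdkBasebandFecMode>1</dpdkBasebandFecMode>
  <dpdkBasebandDevice>0000:f7:00.0</dpdkBasebandDevice>
```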
diff --git a/roles/intel_power_manager/defaults/main.yml b/roles/intel_power_manager/defaults/main.yml index be79d98e..33a95b7e 100644 --- a/roles/intel_power_manager/defaults/main.yml +++ b/roles/intel_power_manager/defaults/main.yml @@ -17,9 +17,10 @@ intel_power_manager_git_url: "https://github.com/intel/kubernetes-power-manager.git" intel_power_manager_git_ref: "v1.0.2" # project is consistent with git ref and image version intel_power_manager_dir: "{{ (project_root_dir, 'intel-power-manager') | path_join }}" +intel_power_manager_namespace: "intel-power" intel_appqos_git_url: "https://github.com/intel/intel-cmt-cat.git" -intel_appqos_git_ref: "v4.3.0" -intel_appqos_version: "v4.3.0" +intel_appqos_git_ref: "v4.4.1" +intel_appqos_version: "v4.4.1" intel_appqos_dir: "{{ (project_root_dir, 'intel-appqos') | path_join }}" intel_appqos_cert_dir: "/etc/certs/public" diff --git a/roles/intel_power_manager/files/rbac.patch b/roles/intel_power_manager/files/rbac.patch deleted file mode 100644 index fd2446fc..00000000 --- a/roles/intel_power_manager/files/rbac.patch +++ /dev/null @@ -1,77 +0,0 @@ -diff --git a/config/rbac/rbac.yaml b/config/rbac/rbac.yaml -index 4b2700a..a551cee 100644 ---- a/config/rbac/rbac.yaml -+++ b/config/rbac/rbac.yaml -@@ -50,7 +50,11 @@ rules: - - apiGroups: ["", "power.intel.com", "apps"] - resources: ["nodes", "nodes/status", "configmaps", "powerconfigs", "powerconfigs/status", "powerprofiles", "powerprofiles/status", "powerworkloads", "powerworkloads/status", "powernodes", "powernodes/status", "events", "daemonsets"] - verbs: ["*"] -- -+- apiGroups: ['policy'] -+ resources: ['podsecuritypolicies'] -+ verbs: ['use'] -+ resourceNames: -+ - power-manager-psp - --- - - apiVersion: rbac.authorization.k8s.io/v1 -@@ -76,7 +80,11 @@ rules: - - apiGroups: ["", "power.intel.com"] - resources: ["nodes", "nodes/status", "pods", "pods/status", "powerprofiles", "powerprofiles/status", "powerworkloads", "powerworkloads/status", "powernodes", "powernodes/status"] - verbs: ["*"] -- -+- apiGroups: ['policy'] -+ resources: ['podsecuritypolicies'] -+ verbs: ['use'] -+ resourceNames: -+ - node-agent-psp - --- - - apiVersion: rbac.authorization.k8s.io/v1 -@@ -93,3 +101,46 @@ roleRef: - apiGroup: rbac.authorization.k8s.io - - --- -+apiVersion: policy/v1beta1 -+kind: PodSecurityPolicy -+metadata: -+ name: power-manager-psp -+spec: -+ privileged: true -+ allowPrivilegeEscalation: true -+ allowedCapabilities: -+ - '*' -+ allowedUnsafeSysctls: -+ - '*' -+ fsGroup: -+ rule: RunAsAny -+ runAsUser: -+ rule: RunAsAny -+ seLinux: -+ rule: RunAsAny -+ supplementalGroups: -+ rule: RunAsAny -+ volumes: -+ - '*' -+--- -+apiVersion: policy/v1beta1 -+kind: PodSecurityPolicy -+metadata: -+ name: node-agent-psp -+spec: -+ privileged: true -+ allowPrivilegeEscalation: true -+ allowedCapabilities: -+ - '*' -+ allowedUnsafeSysctls: -+ - '*' -+ fsGroup: -+ rule: RunAsAny -+ runAsUser: -+ rule: RunAsAny -+ seLinux: -+ rule: RunAsAny -+ supplementalGroups: -+ rule: RunAsAny -+ volumes: -+ - '*' diff --git a/roles/intel_power_manager/tasks/app_qos.yml b/roles/intel_power_manager/tasks/app_qos.yml index dc60e249..fc86c3ea 100644 --- a/roles/intel_power_manager/tasks/app_qos.yml +++ b/roles/intel_power_manager/tasks/app_qos.yml @@ -22,17 +22,8 @@ force: yes when: inventory_hostname in groups['kube_node'] -# NOTE(pklimowx): w/a for error "ModuleNotFoundError: No module named 'pqos'" -- name: copy pqos module to appqos working dir - copy: - remote_src: yes - src: "{{ (intel_appqos_dir, 'lib', 'python', 'pqos') | 
path_join }}" - dest: "{{ (intel_appqos_dir, 'appqos') | path_join }}" - mode: 0755 - when: inventory_hostname in groups['kube_node'] - # NOTE(pklimowx): since AppQoS image is not available on docker hub -# and public images of the Power Operator use `appqos:latest` image, +# and public images of the Power Manager use `appqos:latest` image, # we have to build AppQoS image on each node, and push it to localregistry # only once. # @@ -41,15 +32,18 @@ block: - name: build image of App QoS command: docker build --no-cache -t appqos -f Dockerfile ../../ + changed_when: true args: chdir: "{{ (intel_appqos_dir, 'appqos', 'docker') | path_join }}" - name: tag App QoS image command: docker tag appqos:latest {{ registry_local_address }}/appqos:{{ intel_appqos_version }} + changed_when: true when: inventory_hostname == groups['kube_node'][0] - name: push App QoS image to local registry command: docker push {{ registry_local_address }}/appqos:{{ intel_appqos_version }} + changed_when: true when: inventory_hostname == groups['kube_node'][0] when: - container_runtime == "docker" @@ -60,6 +54,7 @@ block: - name: build and tag App QoS image command: podman build -f Dockerfile -t {{ registry_local_address }}/appqos:{{ intel_appqos_version }} ../../ + changed_when: true args: chdir: "{{ (intel_appqos_dir, 'appqos', 'docker') | path_join }}" @@ -86,6 +81,7 @@ command: "{{ item }}" args: chdir: "{{ intel_appqos_cert_dir }}" + changed_when: true with_items: - openssl req -nodes -x509 -newkey rsa:4096 -keyout ca.key -out ca.crt -days 365 -subj "/O=AppQoS/OU=root/CN=localhost" - openssl req -nodes -newkey rsa:3072 -keyout appqos.key -out appqos.csr -subj "/O=AppQoS/OU=AppQoS Server/CN=localhost" diff --git a/roles/intel_power_manager/tasks/main.yml b/roles/intel_power_manager/tasks/main.yml index 0f0b5da7..a68f20e6 100644 --- a/roles/intel_power_manager/tasks/main.yml +++ b/roles/intel_power_manager/tasks/main.yml @@ -14,7 +14,7 @@ ## limitations under the License. ## --- -- name: install dependencies for Power Operator +- name: install dependencies for Power Manager include_role: name: install_dependencies @@ -29,9 +29,9 @@ - name: prepare Intel Kubernetes Power Manager include_tasks: power_manager.yml -- name: wait for Power Operator and Power Node Agent to be up and running +- name: wait for Power Manager and Power Node Agent to be up and running pause: - prompt: "Waiting for Power Operator pods to be up and running..." + prompt: "Waiting for Power Manager pods to be up and running..." 
minutes: 1 - name: deploy example power pods diff --git a/roles/intel_power_manager/tasks/power_manager.yml b/roles/intel_power_manager/tasks/power_manager.yml index 25102c24..ad48935a 100644 --- a/roles/intel_power_manager/tasks/power_manager.yml +++ b/roles/intel_power_manager/tasks/power_manager.yml @@ -23,8 +23,8 @@ dest: "{{ intel_power_manager_dir }}" force: yes when: - - inventory_hostname == groups['kube_control_plane'][0] or - (inventory_hostname == groups['kube_node'][0] and intel_power_manager.build_image_locally | default(false) | bool) + - inventory_hostname == groups['kube_control_plane'][0] or + (inventory_hostname == groups['kube_node'][0] and intel_power_manager.build_image_locally | default(false) | bool) - name: set facts for Intel Kubernetes Power Manager templates set_fact: @@ -35,7 +35,7 @@ when: - intel_power_manager.build_image_locally | default(false) | bool -# NOTE(pklimowx): node-agent DS is deployed automatically via Power Operator after providing +# NOTE(pklimowx): node-agent DS is deployed automatically via Power Manager after providing # PowerProfile. The yaml file needs to be patched before building image to provide correct source for it. # Both images depend on intel_power_manager* variable as there is no public image for AppQoS - name: patch Node Agent DaemonSet yaml @@ -52,28 +52,31 @@ regexp: "^ - image: 'appqos:latest'" line: " - image: {{ app_qos_image }}" when: - - intel_power_manager.build_image_locally | default(false) | bool - - inventory_hostname == groups['kube_node'][0] + - intel_power_manager.build_image_locally | default(false) | bool + - inventory_hostname == groups['kube_node'][0] # docker runtime is in use - name: prepare images for Intel Kubernetes Power Manager block: - name: build images for Intel Kubernetes Power Manager command: docker build -f build/{{ item.file }} -t {{ item.name }}:latest . + changed_when: true args: chdir: "{{ intel_power_manager_dir }}" with_items: - - { file: Dockerfile, name: intel-power-operator } - - { file: Dockerfile.nodeagent, name: intel-power-node-agent } + - {file: Dockerfile, name: intel-power-operator} + - {file: Dockerfile.nodeagent, name: intel-power-node-agent} - name: tag Intel Kubernetes Power Manager images command: docker tag {{ item }}:latest {{ registry_local_address }}/{{ item }}:{{ intel_power_manager_git_ref }} + changed_when: true with_items: - intel-power-operator - intel-power-node-agent - name: push Intel Kubernetes Power Manager images to local registry command: docker push {{ registry_local_address }}/{{ item }}:{{ intel_power_manager_git_ref }} + changed_when: true with_items: - intel-power-operator - intel-power-node-agent @@ -87,11 +90,12 @@ block: - name: build and tag images for Intel Kubernetes Power Manager command: podman build -f build/{{ item.file }} -t {{ registry_local_address }}/{{ item.name }}:{{ intel_power_manager_git_ref }} . 
+ changed_when: true args: chdir: "{{ intel_power_manager_dir }}" with_items: - - { file: Dockerfile, name: intel-power-operator } - - { file: Dockerfile.nodeagent, name: intel-power-node-agent } + - {file: Dockerfile, name: intel-power-operator} + - {file: Dockerfile.nodeagent, name: intel-power-node-agent} - name: push Intel Kubernetes Power Manager images to local registry command: podman push {{ registry_local_address }}/{{ item }}:{{ intel_power_manager_git_ref }} @@ -104,23 +108,24 @@ - intel_power_manager.build_image_locally | default(false) | bool - inventory_hostname == groups['kube_node'][0] -- name: prepare and deploy Intel Power Operator +- name: prepare and deploy Intel Power Manager block: - - name: apply rbac patch to allow PSP - patch: - src: "{{ (role_path, 'files', 'rbac.patch') | path_join }}" - dest: "{{ (intel_power_manager_dir, 'config', 'rbac', 'rbac.yaml') | path_join }}" - when: psp_enabled | default(true) | bool + - name: create Intel Power Manager namespace + k8s: + name: "{{ intel_power_manager_namespace }}" + kind: Namespace + state: present + definition: + metadata: + labels: + control-plane: controller-manager - name: apply k8s prerequisites k8s: state: present - src: "{{ (intel_power_manager_dir, 'config', 'rbac', item + '.yaml') | path_join }}" - with_items: - - namespace - - rbac + src: "{{ (intel_power_manager_dir, 'config', 'rbac', 'rbac.yaml') | path_join }}" - - name: create and install Intel Power Operator CRDs + - name: create and install Intel Power Manager CRDs make: chdir: "{{ intel_power_manager_dir }}" diff --git a/roles/intel_sriov_fec_operator/defaults/main.yml b/roles/intel_sriov_fec_operator/defaults/main.yml index b74fb524..b08c9f39 100644 --- a/roles/intel_sriov_fec_operator/defaults/main.yml +++ b/roles/intel_sriov_fec_operator/defaults/main.yml @@ -17,7 +17,7 @@ # Reference: https://github.com/smart-edge-open/sriov-fec-operator/tree/sriov-fec-operator-22.03.14/spec # FEC = Forward Error Correction # CR = Custom Resource -# ACC100 = Intel vRAN Dedicated H/W Accelerator Card +# ACC100 = Intel vRAN Dedicated H/W Accelerator Card # Intel Smart Edge Open (SEO) FEC Operator intel_sriov_fec_operator_git: "https://github.com/smart-edge-open/sriov-fec-operator.git" @@ -43,5 +43,3 @@ opm_ver: "v1.22.0" opm_chk: "e671f494f0944af228e9f2bc09042d04ec47b61d7094fa2129c8690ec6b6ed27" opm_dir: "/usr/local/bin/" opm_cmd: "opm" - - diff --git a/roles/intel_sriov_fec_operator/files/ACC100-sample-cr.yaml b/roles/intel_sriov_fec_operator/files/ACC100-sample-cr.yaml index 951ccb25..d50ea357 100644 --- a/roles/intel_sriov_fec_operator/files/ACC100-sample-cr.yaml +++ b/roles/intel_sriov_fec_operator/files/ACC100-sample-cr.yaml @@ -8,7 +8,7 @@ spec: kubernetes.io/hostname: node1 acceleratorSelector: pciAddress: 0000:af:00.0 - physicalFunction: + physicalFunction: pfDriver: "pci-pf-stub" vfDriver: "vfio-pci" vfAmount: 16 diff --git a/roles/intel_sriov_fec_operator/files/N3000-sample-cr.yaml b/roles/intel_sriov_fec_operator/files/N3000-sample-cr.yaml index 20d9a697..b0054c8c 100644 --- a/roles/intel_sriov_fec_operator/files/N3000-sample-cr.yaml +++ b/roles/intel_sriov_fec_operator/files/N3000-sample-cr.yaml @@ -9,7 +9,7 @@ spec: kubernetes.io/hostname: node1 acceleratorSelector: pciAddress: 0000.1d.00.0 - physicalFunction: + physicalFunction: pfDriver: pci-pf-stub vfDriver: vfio-pci vfAmount: 2 diff --git a/roles/intel_sriov_fec_operator/tasks/opm.yml b/roles/intel_sriov_fec_operator/tasks/opm.yml index e121f687..341e59b3 100644 --- 
a/roles/intel_sriov_fec_operator/tasks/opm.yml +++ b/roles/intel_sriov_fec_operator/tasks/opm.yml @@ -21,4 +21,3 @@ checksum: "sha256:{{ opm_chk }}" mode: '0755' use_proxy: yes - diff --git a/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml b/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml index fea43318..e3aa664c 100644 --- a/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml +++ b/roles/intel_sriov_fec_operator/tasks/sriov_fec_operator.yml @@ -48,10 +48,10 @@ force: yes mode: preserve loop: - - { src: 'acc100-cr.yaml.j2', dst: 'acc100-cr.yaml' } - - { src: 'catalog.yml.j2', dst: 'catalog.yml' } - - { src: 'operator-group.yml.j2', dst: 'operator-group.yml' } - - { src: 'subscription.yml.j2', dst: 'subscription.yml' } + - {src: 'acc100-cr.yaml.j2', dst: 'acc100-cr.yaml'} + - {src: 'catalog.yml.j2', dst: 'catalog.yml'} + - {src: 'operator-group.yml.j2', dst: 'operator-group.yml'} + - {src: 'subscription.yml.j2', dst: 'subscription.yml'} - name: create FEC Operator namespace k8s: diff --git a/roles/service_mesh_install/charts/istioctl/.helmignore b/roles/istio_service_mesh/charts/istioctl/.helmignore similarity index 100% rename from roles/service_mesh_install/charts/istioctl/.helmignore rename to roles/istio_service_mesh/charts/istioctl/.helmignore diff --git a/roles/service_mesh_install/charts/istioctl/Chart.yaml b/roles/istio_service_mesh/charts/istioctl/Chart.yaml similarity index 100% rename from roles/service_mesh_install/charts/istioctl/Chart.yaml rename to roles/istio_service_mesh/charts/istioctl/Chart.yaml diff --git a/roles/service_mesh_install/charts/istioctl/templates/NOTES.txt b/roles/istio_service_mesh/charts/istioctl/templates/NOTES.txt similarity index 100% rename from roles/service_mesh_install/charts/istioctl/templates/NOTES.txt rename to roles/istio_service_mesh/charts/istioctl/templates/NOTES.txt diff --git a/roles/service_mesh_install/charts/istioctl/templates/_helpers.tpl b/roles/istio_service_mesh/charts/istioctl/templates/_helpers.tpl similarity index 100% rename from roles/service_mesh_install/charts/istioctl/templates/_helpers.tpl rename to roles/istio_service_mesh/charts/istioctl/templates/_helpers.tpl diff --git a/roles/service_mesh_install/charts/istioctl/templates/istioctl-deployment.yaml b/roles/istio_service_mesh/charts/istioctl/templates/istioctl-deployment.yaml similarity index 100% rename from roles/service_mesh_install/charts/istioctl/templates/istioctl-deployment.yaml rename to roles/istio_service_mesh/charts/istioctl/templates/istioctl-deployment.yaml diff --git a/roles/service_mesh_install/charts/istioctl/templates/istioctl-rbac.yaml b/roles/istio_service_mesh/charts/istioctl/templates/istioctl-rbac.yaml similarity index 100% rename from roles/service_mesh_install/charts/istioctl/templates/istioctl-rbac.yaml rename to roles/istio_service_mesh/charts/istioctl/templates/istioctl-rbac.yaml diff --git a/roles/service_mesh_install/charts/istioctl/values.yaml b/roles/istio_service_mesh/charts/istioctl/values.yaml similarity index 86% rename from roles/service_mesh_install/charts/istioctl/values.yaml rename to roles/istio_service_mesh/charts/istioctl/values.yaml index cbfe7d0e..685f3caa 100644 --- a/roles/service_mesh_install/charts/istioctl/values.yaml +++ b/roles/istio_service_mesh/charts/istioctl/values.yaml @@ -56,6 +56,10 @@ tolerations: operator: "Equal" value: "" effect: "NoSchedule" + - key: "node-role.kubernetes.io/control-plane" + operator: "Equal" + value: "" + effect: "NoSchedule" affinity: nodeAffinity: @@ -66,6 
+70,12 @@ affinity: - key: "node-role.kubernetes.io/master" operator: In values: [""] + - weight: 1 + preference: + matchExpressions: + - key: "node-role.kubernetes.io/control-plane" + operator: In + values: [""] volumes: - name: istio-profiles diff --git a/roles/istio_service_mesh/defaults/main.yml b/roles/istio_service_mesh/defaults/main.yml new file mode 100644 index 00000000..fef25b05 --- /dev/null +++ b/roles/istio_service_mesh/defaults/main.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +istio_service_mesh_download_url: + "https://github.com/istio/istio/releases/download/{{ istio_service_mesh.version }}/\ + istio-{{ istio_service_mesh.version }}-linux-amd64.tar.gz" +istio_service_mesh_release_dir: "{{ (project_root_dir, 'istio') | path_join }}" +istio_service_mesh_charts_dir: "{{ (project_root_dir, 'charts', 'istio') | path_join }}" +istio_service_mesh_profiles_dir: "{{ (istio_service_mesh_charts_dir, 'profiles') | path_join }}" diff --git a/roles/service_mesh_install/files/profiles/.gitkeep b/roles/istio_service_mesh/files/profiles/.gitkeep similarity index 100% rename from roles/service_mesh_install/files/profiles/.gitkeep rename to roles/istio_service_mesh/files/profiles/.gitkeep diff --git a/roles/service_mesh_install/files/profiles/intel-cryptomb.yaml b/roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml similarity index 100% rename from roles/service_mesh_install/files/profiles/intel-cryptomb.yaml rename to roles/istio_service_mesh/files/profiles/intel-cryptomb.yaml diff --git a/roles/service_mesh_install/files/profiles/intel-qat-hw.yaml b/roles/istio_service_mesh/files/profiles/intel-qat-hw.yaml similarity index 100% rename from roles/service_mesh_install/files/profiles/intel-qat-hw.yaml rename to roles/istio_service_mesh/files/profiles/intel-qat-hw.yaml diff --git a/roles/service_mesh_install/files/profiles/intel-qat-sw.yaml b/roles/istio_service_mesh/files/profiles/intel-qat-sw.yaml similarity index 100% rename from roles/service_mesh_install/files/profiles/intel-qat-sw.yaml rename to roles/istio_service_mesh/files/profiles/intel-qat-sw.yaml diff --git a/roles/service_mesh_install/files/profiles/sgx-mtls.yaml b/roles/istio_service_mesh/files/profiles/sgx-mtls.yaml similarity index 100% rename from roles/service_mesh_install/files/profiles/sgx-mtls.yaml rename to roles/istio_service_mesh/files/profiles/sgx-mtls.yaml diff --git a/roles/service_mesh_install/tasks/cleanup.yml b/roles/istio_service_mesh/tasks/cleanup.yml similarity index 72% rename from roles/service_mesh_install/tasks/cleanup.yml rename to roles/istio_service_mesh/tasks/cleanup.yml index 824dc92d..f3b0823e 100644 --- a/roles/service_mesh_install/tasks/cleanup.yml +++ b/roles/istio_service_mesh/tasks/cleanup.yml @@ -17,7 +17,7 @@ - name: find existing istioctl pod command: | kubectl get pods \ - --namespace {{ service_mesh.istio_namespace }} \ + --namespace {{ istio_service_mesh.istio_namespace }} \ -l 
"app.kubernetes.io/name=istioctl,app.kubernetes.io/instance=istioctl" \ -o jsonpath="{.items[0].metadata.name}" register: istioctl_pod @@ -27,7 +27,7 @@ - name: remove existing istio resources command: | kubectl exec \ - --namespace {{ service_mesh.istio_namespace }} \ + --namespace {{ istio_service_mesh.istio_namespace }} \ {{ istioctl_pod.stdout }} -- istioctl x uninstall --purge -y failed_when: false changed_when: true @@ -35,33 +35,33 @@ - istioctl_pod.stderr | length == 0 - name: remove existing istioctl deployment - command: "helm delete istioctl --namespace {{ service_mesh.istio_namespace }}" + command: "helm delete istioctl --namespace {{ istio_service_mesh.istio_namespace }}" changed_when: true failed_when: false - name: remove existing tcpip-bypass-ebpf resources - command: "kubectl delete -f {{ (service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" + command: "kubectl delete -f {{ (istio_service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" changed_when: true failed_when: false - name: remove existing intel-tls-splicing resources - command: "kubectl delete -f {{ (service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" + command: "kubectl delete -f {{ (istio_service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" changed_when: true failed_when: false - name: remove existing tcs cluster issuer - command: "kubectl delete -f {{ (service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" + command: "kubectl delete -f {{ (istio_service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" changed_when: true failed_when: false - name: remove istio custom manifests directory file: - path: "{{ service_mesh_charts_dir }}" + path: "{{ istio_service_mesh_charts_dir }}" state: absent failed_when: false - name: remove istio release directory file: - path: "{{ service_mesh_release_dir }}" + path: "{{ istio_service_mesh_release_dir }}" state: absent failed_when: false diff --git a/roles/service_mesh_install/tasks/configure-custom-ca-signer.yml b/roles/istio_service_mesh/tasks/configure-custom-ca-signer.yml similarity index 74% rename from roles/service_mesh_install/tasks/configure-custom-ca-signer.yml rename to roles/istio_service_mesh/tasks/configure-custom-ca-signer.yml index 69189a8e..c3be9934 100644 --- a/roles/service_mesh_install/tasks/configure-custom-ca-signer.yml +++ b/roles/istio_service_mesh/tasks/configure-custom-ca-signer.yml @@ -16,19 +16,19 @@ --- - name: create Istio custom manifests directory file: - path: "{{ service_mesh_charts_dir }}" + path: "{{ istio_service_mesh_charts_dir }}" state: directory mode: 0755 - name: configure TCSCluster issuer template: src: "tcs-cluster-issuer.yaml.j2" - dest: "{{ (service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" + dest: "{{ (istio_service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" force: yes mode: preserve - name: create TCSIssuer - command: "kubectl apply -f {{ (service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" + command: "kubectl apply -f {{ (istio_service_mesh_charts_dir, 'tcs-cluster-issuer.yaml') | path_join }}" changed_when: true - name: wait for secret creation @@ -38,7 +38,7 @@ - name: read tls cert from the secret shell: |- set -o pipefail && \ - kubectl get secret -n {{ service_mesh.sgx_signer.tcs_namespace }} {{ service_mesh.sgx_signer.name }}-secret -o jsonpath='{.data.tls\.crt}' | \ + kubectl get secret -n {{ istio_service_mesh.sgx_signer.tcs_namespace }} {{ istio_service_mesh.sgx_signer.name }}-secret -o 
jsonpath='{.data.tls\.crt}' | \ base64 -d args: executable: /bin/bash @@ -57,4 +57,4 @@ - name: override profile name to use 'custom-ca' set_fact: - service_mesh: "{{ service_mesh | combine({'profile': 'custom-ca'}, recursive=True) }}" + istio_service_mesh: "{{ istio_service_mesh | combine({'profile': 'custom-ca'}, recursive=True) }}" diff --git a/roles/service_mesh_install/tasks/istio-install.yml b/roles/istio_service_mesh/tasks/istio-install.yml similarity index 74% rename from roles/service_mesh_install/tasks/istio-install.yml rename to roles/istio_service_mesh/tasks/istio-install.yml index 5cc2a0fc..b79db77c 100644 --- a/roles/service_mesh_install/tasks/istio-install.yml +++ b/roles/istio_service_mesh/tasks/istio-install.yml @@ -16,21 +16,21 @@ --- - name: set facts for upstream istio release set_fact: - istio_image: "{{ service_mesh.image }}" - istio_tag: "{{ service_mesh.version }}" + istio_image: "{{ istio_service_mesh.image }}" + istio_tag: "{{ istio_service_mesh.version }}" when: - - not service_mesh.intel_preview.enabled + - not istio_service_mesh.intel_preview.enabled - name: set facts for intel preview istio release set_fact: - istio_image: "{{ service_mesh.intel_preview.image }}" - istio_tag: "{{ service_mesh.intel_preview.version }}" + istio_image: "{{ istio_service_mesh.intel_preview.image }}" + istio_tag: "{{ istio_service_mesh.intel_preview.version }}" when: - - service_mesh.intel_preview.enabled + - istio_service_mesh.intel_preview.enabled - name: create istio custom manifests directory file: - path: "{{ service_mesh_charts_dir }}" + path: "{{ istio_service_mesh_charts_dir }}" state: directory mode: 0755 @@ -38,21 +38,21 @@ block: - name: create custom istio profiles directory file: - path: "{{ service_mesh_profiles_dir }}" + path: "{{ istio_service_mesh_profiles_dir }}" state: directory mode: 0755 - name: copy manifest copy: - src: "{{ ('profiles', service_mesh.profile) | path_join }}.yaml" - dest: "{{ service_mesh_profiles_dir }}" + src: "{{ ('profiles', istio_service_mesh.profile) | path_join }}.yaml" + dest: "{{ istio_service_mesh_profiles_dir }}" owner: root mode: preserve rescue: - name: fallback to empty profile if no requested profile manifest exists set_fact: - service_mesh: "{{ service_mesh | combine({'profile':'empty'}, recursive=True) }}" + istio_service_mesh: "{{ istio_service_mesh | combine({'profile':'empty'}, recursive=True) }}" when: - - service_mesh.profile not in ['default', 'demo', 'minimal', 'external', 'empty', 'preview'] + - istio_service_mesh.profile not in ['default', 'demo', 'minimal', 'external', 'empty', 'preview'] - name: evaluate parameters to be used for istio service mesh installation template: @@ -77,7 +77,7 @@ - name: copy istio helm chart to the controller node copy: src: "{{ (role_path, 'charts', 'istioctl') | path_join }}" - dest: "{{ service_mesh_charts_dir }}" + dest: "{{ istio_service_mesh_charts_dir }}" mode: 0755 - name: populate istio charts values templates and push to controller node @@ -90,10 +90,10 @@ - name: install intel istio helm chart command: >- helm upgrade -i istioctl - --namespace {{ service_mesh.istio_namespace }} + --namespace {{ istio_service_mesh.istio_namespace }} --create-namespace -f {{ (project_root_dir, 'charts', 'istioctl-values.yaml') | path_join }} - {{ (service_mesh_charts_dir, 'istioctl') | path_join }} + {{ (istio_service_mesh_charts_dir, 'istioctl') | path_join }} changed_when: true - name: remove temporary files diff --git a/roles/service_mesh_install/tasks/main.yml 
b/roles/istio_service_mesh/tasks/main.yml similarity index 80% rename from roles/service_mesh_install/tasks/main.yml rename to roles/istio_service_mesh/tasks/main.yml index 0502e164..7c67e21a 100644 --- a/roles/service_mesh_install/tasks/main.yml +++ b/roles/istio_service_mesh/tasks/main.yml @@ -16,7 +16,7 @@ --- - name: combine defaults and user provided vars set_fact: - service_mesh: "{{ service_mesh_defaults | combine(service_mesh | default({}), recursive=True) }}" + istio_service_mesh: "{{ istio_service_mesh_defaults | combine(istio_service_mesh | default({}), recursive=True) }}" when: - inventory_hostname == groups['kube_control_plane'][0] @@ -24,7 +24,9 @@ include_role: name: check_machine_type when: - - inventory_hostname == groups['kube_node'][0] + - inventory_hostname in groups['kube_node'] or + inventory_hostname in groups['vm_host'] + - not on_vms | default (false) - name: remove existing istio service mesh resources include_tasks: cleanup.yml @@ -35,13 +37,13 @@ include_tasks: tcpip-bypass-ebpf.yml when: - inventory_hostname == groups['kube_control_plane'][0] - - service_mesh.tcpip_bypass_ebpf.enabled | default(false) | bool + - istio_service_mesh.tcpip_bypass_ebpf.enabled | default(false) | bool - name: configure custom CA signer include_tasks: configure-custom-ca-signer.yml when: - inventory_hostname == groups['kube_control_plane'][0] - - service_mesh.sgx_signer.enabled | default(false) | bool + - istio_service_mesh.sgx_signer.enabled | default(false) | bool - hostvars[groups['kube_node'][0]]['is_icx'] or hostvars[groups['kube_node'][0]]['is_spr'] @@ -69,5 +71,5 @@ include_tasks: tls-splicing-and-bumping.yml when: - inventory_hostname == groups['kube_control_plane'][0] - - service_mesh.tls_splicing.enabled | default(false) | bool - - service_mesh.profile != 'empty' + - istio_service_mesh.tls_splicing.enabled | default(false) | bool + - istio_service_mesh.profile != 'empty' diff --git a/roles/service_mesh_install/tasks/tcpip-bypass-ebpf.yml b/roles/istio_service_mesh/tasks/tcpip-bypass-ebpf.yml similarity index 69% rename from roles/service_mesh_install/tasks/tcpip-bypass-ebpf.yml rename to roles/istio_service_mesh/tasks/tcpip-bypass-ebpf.yml index 23d09119..2ba2a813 100644 --- a/roles/service_mesh_install/tasks/tcpip-bypass-ebpf.yml +++ b/roles/istio_service_mesh/tasks/tcpip-bypass-ebpf.yml @@ -16,28 +16,28 @@ --- - name: Create istio profiles dir if does not exist file: - path: "{{ service_mesh_charts_dir }}" + path: "{{ istio_service_mesh_charts_dir }}" state: directory owner: root mode: 0755 -- name: create tcpip-bypass-ebpf namespace if does not exist - shell: "set -o pipefail && kubectl create ns {{ service_mesh.tcpip_bypass_ebpf.namespace }} -o yaml --dry-run=client | kubectl apply -f -" - args: - executable: /bin/bash - changed_when: true +- name: create tcpip-bypass-ebpf namespace + k8s: + name: "{{ istio_service_mesh.tcpip_bypass_ebpf.namespace }}" + kind: Namespace + state: present - name: populate tcpip-bypass-ebpf manifest template with values template: src: "tcpip-bypass-ebpf.yaml.j2" - dest: "{{ (service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" + dest: "{{ (istio_service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" force: yes mode: preserve - name: deploy tcpip-bypass-ebpf shell: |- set -o pipefail && \ - kubectl apply -f "{{ (service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" \ + kubectl apply -f "{{ (istio_service_mesh_charts_dir, 'tcpip-bypass-ebpf.yaml') | path_join }}" \ -o yaml --dry-run=client | kubectl 
apply -f - args: executable: /bin/bash diff --git a/roles/service_mesh_install/tasks/tls-splicing-and-bumping.yml b/roles/istio_service_mesh/tasks/tls-splicing-and-bumping.yml similarity index 79% rename from roles/service_mesh_install/tasks/tls-splicing-and-bumping.yml rename to roles/istio_service_mesh/tasks/tls-splicing-and-bumping.yml index d27a9ed2..215752dc 100644 --- a/roles/service_mesh_install/tasks/tls-splicing-and-bumping.yml +++ b/roles/istio_service_mesh/tasks/tls-splicing-and-bumping.yml @@ -16,7 +16,7 @@ --- - name: Create istio profiles dir if does not exist file: - path: "{{ service_mesh_charts_dir }}" + path: "{{ istio_service_mesh_charts_dir }}" state: directory owner: root mode: 0755 @@ -24,14 +24,14 @@ - name: populate intel-tls-splicing manifest template with values template: src: "intel-tls-splicing.yaml.j2" - dest: "{{ (service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" + dest: "{{ (istio_service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" force: yes mode: preserve - name: wait for the istio service mesh pods are in running state shell: |- set -o pipefail && \ - [ $(kubectl get pod -n {{ service_mesh.istio_namespace }} -l 'app in (istio-ingressgateway, istiod)' \ + [ $(kubectl get pod -n {{ istio_service_mesh.istio_namespace }} -l 'app in (istio-ingressgateway, istiod)' \ | grep Running | wc -l) -eq 2 ] args: executable: /bin/bash @@ -44,7 +44,7 @@ - name: deploy intel-tls-splicing shell: |- set -o pipefail && \ - kubectl apply -f "{{ (service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" \ + kubectl apply -f "{{ (istio_service_mesh_charts_dir, 'intel-tls-splicing.yaml') | path_join }}" \ -o yaml --dry-run=client | kubectl apply -f - args: executable: /bin/bash diff --git a/roles/service_mesh_install/templates/custom-ca.yaml.j2 b/roles/istio_service_mesh/templates/custom-ca.yaml.j2 similarity index 79% rename from roles/service_mesh_install/templates/custom-ca.yaml.j2 rename to roles/istio_service_mesh/templates/custom-ca.yaml.j2 index 6acc4064..39144e16 100644 --- a/roles/service_mesh_install/templates/custom-ca.yaml.j2 +++ b/roles/istio_service_mesh/templates/custom-ca.yaml.j2 @@ -10,7 +10,7 @@ spec: - name: EXTERNAL_CA value: ISTIOD_RA_KUBERNETES_API - name: PILOT_CERT_PROVIDER - value: k8s.io/tcsclusterissuer.tcs.intel.com/{{ service_mesh.sgx_signer.name }} + value: k8s.io/tcsclusterissuer.tcs.intel.com/{{ istio_service_mesh.sgx_signer.name }} overlays: - kind: ClusterRole name: istiod-clusterrole-istio-system @@ -29,9 +29,9 @@ spec: defaultConfig: proxyMetadata: PROXY_CONFIG_XDS_AGENT: "true" - ISTIO_META_CERT_SIGNER: {{ service_mesh.sgx_signer.name }} + ISTIO_META_CERT_SIGNER: {{ istio_service_mesh.sgx_signer.name }} caCertificates: - pem: | {{ tcs_issuer_secret.stdout | indent(8, false) }} certSigners: - - tcsclusterissuer.tcs.intel.com/{{ service_mesh.sgx_signer.name }} + - tcsclusterissuer.tcs.intel.com/{{ istio_service_mesh.sgx_signer.name }} diff --git a/roles/service_mesh_install/templates/intel-tls-splicing.yaml.j2 b/roles/istio_service_mesh/templates/intel-tls-splicing.yaml.j2 similarity index 73% rename from roles/service_mesh_install/templates/intel-tls-splicing.yaml.j2 rename to roles/istio_service_mesh/templates/intel-tls-splicing.yaml.j2 index 2e53c432..10a72ec7 100644 --- a/roles/service_mesh_install/templates/intel-tls-splicing.yaml.j2 +++ b/roles/istio_service_mesh/templates/intel-tls-splicing.yaml.j2 @@ -14,7 +14,7 @@ spec: tls: mode: PASSTHROUGH hosts: - - "{{ service_mesh.tls_splicing.hostname 
}}" + - "{{ istio_service_mesh.tls_splicing.hostname }}" --- apiVersion: networking.istio.io/v1alpha3 kind: ServiceEntry @@ -22,7 +22,7 @@ metadata: name: splicing-without-connect spec: hosts: - - "{{ service_mesh.tls_splicing.hostname }}" + - "{{ istio_service_mesh.tls_splicing.hostname }}" ports: - number: 443 name: tls @@ -36,7 +36,7 @@ metadata: name: splicing-without-connect spec: hosts: - - "{{ service_mesh.tls_splicing.hostname }}" + - "{{ istio_service_mesh.tls_splicing.hostname }}" gateways: - proxy tls: @@ -45,9 +45,9 @@ spec: - proxy port: 443 sniHosts: - - "{{ service_mesh.tls_splicing.hostname }}" + - "{{ istio_service_mesh.tls_splicing.hostname }}" route: - destination: - host: "{{ service_mesh.tls_splicing.hostname }}" + host: "{{ istio_service_mesh.tls_splicing.hostname }}" port: number: 443 diff --git a/roles/istio_service_mesh/templates/istioctl-options.yml.j2 b/roles/istio_service_mesh/templates/istioctl-options.yml.j2 new file mode 100644 index 00000000..b4ce98f6 --- /dev/null +++ b/roles/istio_service_mesh/templates/istioctl-options.yml.j2 @@ -0,0 +1,60 @@ +argv: + - --skip-confirmation +{% if istio_service_mesh.context is defined and istio_service_mesh.context != '' %} + - --context + - {{ istio_service_mesh.context }} +{% endif -%} +{% if istio_service_mesh.filename is defined and istio_service_mesh.filename != [] %} +{% for item in istio_service_mesh.filename %} + - --filename + - {{ item }} +{% endfor -%} +{% endif -%} +{% if istio_service_mesh.namespace is defined and istio_service_mesh.namespace != '' %} + - --namespace + - {{ istio_service_mesh.namespace }} +{% endif -%} +{% if istio_service_mesh.istio_namespace is defined and istio_service_mesh.istio_namespace != '' %} + - --istioNamespace + - {{ istio_service_mesh.istio_namespace }} +{% endif -%} +{% if istio_service_mesh.kubeconfig is defined and istio_service_mesh.kubeconfig != '' %} + - --kubeconfig + - {{ istio_service_mesh.kubeconfig }} +{% endif -%} +{% if istio_service_mesh.vklog is defined and istio_service_mesh.vklog != '' %} + - --vklog + - {{ istio_service_mesh.vklog }} +{% endif -%} +{% if istio_service_mesh.revision is defined and istio_service_mesh.revision != '' %} + - --revision + - {{ istio_service_mesh.revision }} +{% endif -%} +{% if istio_service_mesh.manifest is defined and istio_service_mesh.manifest != '' %} + - --manifests + - {{ istio_service_mesh.manifest }} +{% endif -%} +{% if istio_service_mesh.dry_run is defined and istio_service_mesh.dry_run | bool %} + - --dry-run +{% endif -%} +{% if istio_service_mesh.force is defined and istio_service_mesh.force | bool %} + - --force +{% endif -%} +{% if istio_service_mesh.readiness_timeout is defined and istio_service_mesh.readiness_timeout != '' %} + - --readiness-timeout + - {{ istio_service_mesh.readiness_timeout }} +{% endif -%} +{% if istio_service_mesh.set is defined and istio_service_mesh.set != [] %} +{% for item in istio_service_mesh.set %} + - --set + - {{ item }} +{% endfor -%} +{% endif -%} +{% if istio_service_mesh.verify is defined and istio_service_mesh.verify | bool and istio_service_mesh.profile != 'empty' %} + - --verify +{% endif -%} +{% if istio_service_mesh.profile in ['default', 'demo', 'minimal', 'external', 'empty', 'preview'] %} + - --set profile={{ istio_service_mesh.profile }} +{% else %} + - --filename={{ istio_service_mesh_profiles_dir }}/{{ istio_service_mesh.profile }}.yaml +{% endif -%} diff --git a/roles/service_mesh_install/templates/istioctl-values.yaml.j2 
b/roles/istio_service_mesh/templates/istioctl-values.yaml.j2 similarity index 100% rename from roles/service_mesh_install/templates/istioctl-values.yaml.j2 rename to roles/istio_service_mesh/templates/istioctl-values.yaml.j2 diff --git a/roles/service_mesh_install/templates/tcpip-bypass-ebpf.yaml.j2 b/roles/istio_service_mesh/templates/tcpip-bypass-ebpf.yaml.j2 similarity index 55% rename from roles/service_mesh_install/templates/tcpip-bypass-ebpf.yaml.j2 rename to roles/istio_service_mesh/templates/tcpip-bypass-ebpf.yaml.j2 index bcc5dfc8..e850963e 100644 --- a/roles/service_mesh_install/templates/tcpip-bypass-ebpf.yaml.j2 +++ b/roles/istio_service_mesh/templates/tcpip-bypass-ebpf.yaml.j2 @@ -1,27 +1,29 @@ apiVersion: apps/v1 kind: DaemonSet metadata: - name: {{ service_mesh.tcpip_bypass_ebpf.name }} - namespace: {{ service_mesh.tcpip_bypass_ebpf.namespace }} + name: {{ istio_service_mesh.tcpip_bypass_ebpf.name }} + namespace: {{ istio_service_mesh.tcpip_bypass_ebpf.namespace }} labels: - k8s-app: {{ service_mesh.tcpip_bypass_ebpf.name }} + k8s-app: {{ istio_service_mesh.tcpip_bypass_ebpf.name }} spec: selector: matchLabels: - name: {{ service_mesh.tcpip_bypass_ebpf.name }} + name: {{ istio_service_mesh.tcpip_bypass_ebpf.name }} template: metadata: labels: - name: {{ service_mesh.tcpip_bypass_ebpf.name }} + name: {{ istio_service_mesh.tcpip_bypass_ebpf.name }} spec: tolerations: # this toleration is to have the daemonset runnable on master nodes # remove it if your masters can't run pods - key: node-role.kubernetes.io/master effect: NoSchedule + - key: node-role.kubernetes.io/control-plane + effect: NoSchedule containers: - - name: {{ service_mesh.tcpip_bypass_ebpf.name }} - image: {{ service_mesh.tcpip_bypass_ebpf.image }}:{{ service_mesh.tcpip_bypass_ebpf.version }} + - name: {{ istio_service_mesh.tcpip_bypass_ebpf.name }} + image: {{ istio_service_mesh.tcpip_bypass_ebpf.image }}:{{ istio_service_mesh.tcpip_bypass_ebpf.version }} imagePullPolicy: IfNotPresent securityContext: privileged: true diff --git a/roles/istio_service_mesh/templates/tcs-cluster-issuer.yaml.j2 b/roles/istio_service_mesh/templates/tcs-cluster-issuer.yaml.j2 new file mode 100644 index 00000000..4c5a8310 --- /dev/null +++ b/roles/istio_service_mesh/templates/tcs-cluster-issuer.yaml.j2 @@ -0,0 +1,6 @@ +apiVersion: tcs.intel.com/v1alpha1 +kind: TCSClusterIssuer +metadata: + name: {{ istio_service_mesh.sgx_signer.name }} +spec: + secretName: {{ istio_service_mesh.sgx_signer.name }}-secret diff --git a/roles/service_mesh_install/vars/main.yml b/roles/istio_service_mesh/vars/main.yml similarity index 97% rename from roles/service_mesh_install/vars/main.yml rename to roles/istio_service_mesh/vars/main.yml index b892c41d..8d6609fc 100644 --- a/roles/service_mesh_install/vars/main.yml +++ b/roles/istio_service_mesh/vars/main.yml @@ -13,7 +13,7 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## -service_mesh_defaults: +istio_service_mesh_defaults: enabled: false image: istio/istioctl version: 1.14.1 @@ -49,4 +49,3 @@ service_mesh_defaults: enabled: false name: sgx-signer tcs_namespace: "{{ tcs.namespace | default('tcs') }}" - diff --git a/roles/jaeger_install/defaults/main.yml b/roles/jaeger_install/defaults/main.yml new file mode 100644 index 00000000..32a02f30 --- /dev/null +++ b/roles/jaeger_install/defaults/main.yml @@ -0,0 +1,27 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +jaeger_defaults: + apiVersion: jaegertracing.io/v1 + name: jaeger + namespace: observability + strategy: allInOne + image: jaegertracing/all-in-one:1.37 + log_level: debug + storage_type: memory + max_traces: 100000 + ingress_enabled: false + agent_strategy: DaemonSet + jaeger_crd_url: https://github.com/jaegertracing/jaeger-operator/releases/download/v1.37.0/jaeger-operator.yaml diff --git a/roles/jaeger_install/tasks/main.yml b/roles/jaeger_install/tasks/main.yml new file mode 100644 index 00000000..56359e68 --- /dev/null +++ b/roles/jaeger_install/tasks/main.yml @@ -0,0 +1,52 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +- block: + - name: Create observability namespace + k8s: + name: "observability" + kind: Namespace + state: present + + - name: Create Jaeger folder + file: + state: directory + dest: "{{ (project_root_dir, 'Jaeger') | path_join }}" + mode: 0755 + + - name: Download jaeger operator CRD + get_url: + url: "{{ jaeger_defaults.jaeger_crd_url }}" + dest: "{{ (project_root_dir, 'Jaeger', 'jaeger-operator.yaml') | path_join }}" + mode: 0755 + + - name: Deploy jaeger-operator CRD + command: "kubectl apply -f {{ (project_root_dir, 'Jaeger', 'jaeger-operator.yaml') | path_join }}" + changed_when: true + + - name: Wait for jaeger-operator to be ready + pause: + seconds: 30 + + - name: Generate Jaeger deployment file + template: + src: "{{ (role_path, 'templates', 'jaeger_deployment.yaml.j2') | path_join }}" + dest: "{{ (project_root_dir, 'Jaeger', 'jaeger_deployment.yaml') | path_join }}" + mode: 0644 + + - name: Create instance of jaeger-operator + command: "kubectl apply -f {{ (project_root_dir, 'Jaeger', 'jaeger_deployment.yaml') | path_join }}" + changed_when: true + when: inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/jaeger_install/templates/jaeger_deployment.yaml.j2 b/roles/jaeger_install/templates/jaeger_deployment.yaml.j2 new file mode 100644 index 00000000..317c44f6 --- /dev/null +++ b/roles/jaeger_install/templates/jaeger_deployment.yaml.j2 @@ -0,0 +1,22 @@ +apiVersion: "{{jaeger_defaults.apiVersion}}" +kind: Jaeger +metadata: + name: "{{jaeger_defaults.name}}" + namespace: "{{jaeger_defaults.namespace}}" +spec: + strategy: "{{jaeger_defaults.strategy}}" + "{{jaeger_defaults.strategy}}": + image: "{{jaeger_defaults.image}}" + options: + log-level: "{{jaeger_defaults.log_level}}" + storage: + type: 
"{{jaeger_defaults.storage_type}}" + options: + memory: + max-traces: "{{jaeger_defaults.max_traces}}" + ingress: + enabled: {{jaeger_defaults.ingress_enabled}} + agent: + strategy: "{{jaeger_defaults.agent_strategy}}" + annotations: + scheduler.alpha.kubernetes.io/critical-pod: "" diff --git a/roles/kmra_install/charts/kmra-apphsm/Chart.yaml b/roles/kmra_install/charts/kmra-apphsm/Chart.yaml index 45852524..af209afa 100644 --- a/roles/kmra_install/charts/kmra-apphsm/Chart.yaml +++ b/roles/kmra_install/charts/kmra-apphsm/Chart.yaml @@ -17,5 +17,5 @@ apiVersion: v1 description: Key Management Reference Application - AppHSM name: kmra -version: 2.1 -appVersion: '2.1' +version: 2.2.1 +appVersion: '2.2.1' diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml index fa59a203..ab853d87 100644 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml +++ b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-config-configmap.yaml @@ -9,6 +9,7 @@ data: { "port": {{ .Values.apphsm.main.port | int }}, "ip": {{ .Values.apphsm.main.hostname | quote }}, + "nonce_lifetime": {{ .Values.apphsm.nonce_lifetime | int }}, "clients": [ {{- if eq .Values.apphsm.ctk_loadkey_demo_enabled "true" -}} { diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml index c852c7e0..18b4bae6 100644 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml +++ b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-entrypoint-configmap.yaml @@ -37,8 +37,17 @@ data: rm -rf {{ $key.token_key }} {{ end }} - echo "Starting AppHSM..." + echo -e "\033[0;31m------------------------------------------------------------" + echo -e "KMRA (Key Management Reference Application) is a proof-of-concept" + echo -e "software not suitable for production usage. AppHSM Key Server has" + echo -e "limited functionality and provisions private keys to non-production" + echo -e "SGX enclaves. Please note that the enclave is signed with a test" + echo -e "signing key. 
A production enclave should go through the process of" + echo -e "signing an enclave as explained in the section Enclave Signing Tool" + echo -e "in the Intel(R) SGX Developer Reference for Linux* OS" + echo -e "(https://download.01.org/intel-sgx/latest/linux-latest/docs/)" + echo -e "---------------------------------------------------------------\033[0m" source /opt/intel/apphsm/env_*/bin/activate && \ - python3.9 /opt/intel/apphsm/apphsm.py && \ + python3 /opt/intel/apphsm/apphsm.py && \ deactivate diff --git a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-psp.yml b/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-psp.yml deleted file mode 100644 index e6e7a529..00000000 --- a/roles/kmra_install/charts/kmra-apphsm/templates/kmra-apphsm-psp.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: {{ .Release.Name }} -spec: - hostNetwork: true - hostPorts: - - max: {{ .Values.apphsm.main.port }} - min: {{ .Values.apphsm.main.port }} - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - "*" diff --git a/roles/kmra_install/charts/kmra-ctk/Chart.yaml b/roles/kmra_install/charts/kmra-ctk/Chart.yaml index f9426e57..7a64b67f 100644 --- a/roles/kmra_install/charts/kmra-ctk/Chart.yaml +++ b/roles/kmra_install/charts/kmra-ctk/Chart.yaml @@ -17,5 +17,5 @@ apiVersion: v1 description: Key Management Reference Application - Demo Client Application name: kmra -version: 2.1 -appVersion: '2.1' +version: 2.2.1 +appVersion: '2.2.1' diff --git a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-configmap.yml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-configmap.yml index a9e27f6d..d999663e 100644 --- a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-configmap.yml +++ b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-configmap.yml @@ -12,10 +12,11 @@ data: PCCS_HOSTNAME: {{ .Values.ctk_loadkey.pccs_hostname | quote }} APPHSM_PORT: {{ .Values.ctk_loadkey.apphsm_port | quote }} APPHSM_HOSTNAME: {{ .Values.ctk_loadkey.apphsm_hostname | quote }} - NGINX_HOSTNAME: {{ .Values.ctk_loadkey.main.hostname | quote }} - NGINX_PORT: {{ .Values.ctk_loadkey.main.port | quote }} CLIENT_TOKEN: {{ .Values.ctk_loadkey.client_token | quote }} CLIENT_KEY_LABEL: {{ .Values.ctk_loadkey.client_key_label | quote }} TEST_UNIQUE_UID: {{ .Values.ctk_loadkey.test_unique_uid | quote }} DEFAULT_USER_PIN: {{ .Values.ctk_loadkey.default_user_pin | quote }} DEFAULT_SO_PIN: {{ .Values.ctk_loadkey.default_so_pin | quote }} + DEFAULT_CLIENT_TOKEN_ID: {{ .Values.ctk_loadkey.default_client_token_id | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.ctk_loadkey.pkcs11_proxy_tls_psk_file | quote }} + PKCS11_DAEMON_SOCKET: "tls://{{ .Values.ctk_loadkey.pkcs11_daemon_socket_hostname }}:{{ .Values.ctk_loadkey.pkcs11_daemon_socket_port }}" diff --git a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml index f1549946..eed0657d 100644 --- a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml +++ b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-deployment.yml @@ -30,7 +30,7 @@ spec: imagePullPolicy: {{ .Values.ctk_loadkey.main.image.pullPolicy }} ports: - name: ctk-loadkey - containerPort: {{ 
.Values.ctk_loadkey.main.port }} + containerPort: {{ .Values.ctk_loadkey.pkcs11_daemon_socket_port }} envFrom: - configMapRef: name: {{ .Release.Name }}-env-cm @@ -48,6 +48,10 @@ spec: - name: tmpfs mountPath: /opt/intel/cryptoapitoolkit/tokens subPath: tokens + - name: p11-proxy-tls-psk + mountPath: /etc/p11_proxy_tls.psk + subPath: p11_proxy_tls.psk + readOnly: true resources: limits: cpu: 500m @@ -60,6 +64,34 @@ spec: runAsUser: 65333 runAsGroup: {{ .Values.ctk_loadkey.sgx_prv_gid }} readOnlyRootFilesystem: true + - name: {{ .Release.Name }}-nginx + image: "{{ .Values.ctk_loadkey.nginx.image.repo }}/{{ .Values.ctk_loadkey.nginx.image.name }}:{{ .Values.ctk_loadkey.nginx.image.tag }}" + imagePullPolicy: {{ .Values.ctk_loadkey.nginx.image.pullPolicy }} + ports: + - name: ctk-nginx + containerPort: {{ .Values.ctk_loadkey.nginx.port }} + envFrom: + - configMapRef: + name: {{ .Release.Name }}-nginx-env + volumeMounts: + - name: p11-proxy-tls-psk + mountPath: /etc/p11_proxy_tls.psk + subPath: p11_proxy_tls.psk + readOnly: true + - name: tmpfs + mountPath: /tmp + subPath: tmp + resources: + limits: + cpu: 200m + memory: 300Mi + requests: + cpu: 100m + memory: 200Mi + securityContext: + runAsUser: 65333 + runAsGroup: 65333 + readOnlyRootFilesystem: true affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: @@ -88,3 +120,10 @@ spec: emptyDir: medium: Memory sizeLimit: 64Mi + - name: p11-proxy-tls-psk + configMap: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + - name: nginx-env + configMap: + name: {{ .Release.Name }}-nginx-env + diff --git a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-nginx-env-configmap.yml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-nginx-env-configmap.yml new file mode 100644 index 00000000..091f5ad9 --- /dev/null +++ b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-nginx-env-configmap.yml @@ -0,0 +1,19 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-nginx-env + namespace: {{ .Release.Namespace }} +data: + http_proxy: {{ .Values.http_proxy | default "" | quote }} + https_proxy: {{ .Values.https_proxy | default "" | quote }} + no_proxy: {{ .Values.no_proxy | default "" | quote }} + CLIENT_TOKEN: {{ .Values.ctk_loadkey.client_token | quote }} + CLIENT_KEY_LABEL: {{ .Values.ctk_loadkey.client_key_label | quote }} + TEST_UNIQUE_UID: {{ .Values.ctk_loadkey.test_unique_uid | quote }} + DEFAULT_USER_PIN: {{ .Values.ctk_loadkey.default_user_pin | quote }} + DEFAULT_SO_PIN: {{ .Values.ctk_loadkey.default_so_pin | quote }} + PKCS11_PROXY_TLS_PSK_FILE: {{ .Values.ctk_loadkey.pkcs11_proxy_tls_psk_file | quote }} + PKCS11_PROXY_SOCKET: "tls://{{ .Values.ctk_loadkey.pkcs11_daemon_socket_hostname }}:{{ .Values.ctk_loadkey.pkcs11_daemon_socket_port }}" + NGINX_HOSTNAME: {{ .Values.ctk_loadkey.nginx.hostname | quote }} + NGINX_PORT: {{ .Values.ctk_loadkey.nginx.port | quote }} diff --git a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-pkcs11-proxy-tls-psk-configmap.yaml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-pkcs11-proxy-tls-psk-configmap.yaml new file mode 100644 index 00000000..4c5462f0 --- /dev/null +++ b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-pkcs11-proxy-tls-psk-configmap.yaml @@ -0,0 +1,9 @@ +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ .Release.Name }}-p11-proxy-tls-psk-conf + namespace: {{ .Release.Namespace }} +data: + p11_proxy_tls.psk: | + {{ .Values.ctk_loadkey.pkcs11_proxy_tls_psk }} diff --git 
a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-psp.yml b/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-psp.yml deleted file mode 100644 index 3f6cc72a..00000000 --- a/roles/kmra_install/charts/kmra-ctk/templates/kmra-ctk-loadkey-psp.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: {{ .Release.Name }} -spec: - hostNetwork: true - hostPorts: - - max: {{ .Values.ctk_loadkey.main.port }} - min: {{ .Values.ctk_loadkey.main.port }} - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - "*" diff --git a/roles/kmra_install/charts/kmra-pccs/Chart.yaml b/roles/kmra_install/charts/kmra-pccs/Chart.yaml index 17359d87..234fe7a6 100644 --- a/roles/kmra_install/charts/kmra-pccs/Chart.yaml +++ b/roles/kmra_install/charts/kmra-pccs/Chart.yaml @@ -17,5 +17,5 @@ apiVersion: v1 description: Key Management Reference Application - PCCS name: kmra -version: 2.1 -appVersion: '2.1' +version: 2.2.1 +appVersion: '2.2.1' diff --git a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-psp.yml b/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-psp.yml deleted file mode 100644 index 16d9d5c0..00000000 --- a/roles/kmra_install/charts/kmra-pccs/templates/kmra-pccs-psp.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: {{ .Release.Name }} -spec: - hostNetwork: true - hostPorts: - - max: {{ .Values.pccs.main.port }} - min: {{ .Values.pccs.main.port }} - allowPrivilegeEscalation: false - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - "*" diff --git a/roles/kmra_install/defaults/main.yml b/roles/kmra_install/defaults/main.yml index 259ad34d..8a4440cf 100644 --- a/roles/kmra_install/defaults/main.yml +++ b/roles/kmra_install/defaults/main.yml @@ -22,6 +22,7 @@ kmra_defaults: O: "AppHSM" OU: "root" CN: "localhost" + certs_validity_period_days: 365 apphsm: enabled: false release_name: "kmra-apphsm" @@ -29,17 +30,23 @@ kmra_defaults: chart_path: "{{ (project_root_dir, 'charts', 'kmra-apphsm') | path_join }}" image_repo: "docker.io" image_name: "intel/apphsm" - image_tag: "v2.1" + image_tag: "v2.2.1" init_image_repo: "docker.io" init_image_name: "busybox" init_image_tag: "1.35" upstream_port: 5000 - hostname: "{{ hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] }}" + hostname: | + {%- if vm_enabled %} + {{ hostvars[groups['kube_node'][0]]['ansible_all_ipv4_addresses'] | ipaddr(hostvars[groups['vm_host'][0]]['vxlan_gw_ip']) | join('') | trim }} + {%- else %} + {{ hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] }} + {%- endif %} test_ctk_loadkey_cert_user_id: "ctk_loadkey_user_id_01234" generic_client_cert_id: "generic_client_id_01234" default_user_pin: "1234" default_so_pin: "12345678" use_secure_cert: false + nonce_lifetime: 300 crt_subj: O: "AppHSM" OU: "AppHSM" @@ -63,7 +70,7 @@ kmra_defaults: crt_subj: O: "SampleOrganisation" CN: "localhost" - - id: "tcsclusterissuer.tcs.intel.com/{{ service_mesh.sgx_signer.name | default('sgx-signer') }}" + - id: "tcsclusterissuer.tcs.intel.com/{{ istio_service_mesh.sgx_signer.name | default('sgx-signer') }}" token_name: "token_3" pin: "1234" key_name: "key_3" 
@@ -81,7 +88,7 @@ kmra_defaults: sgx_provisioning_api_url: "https://api.trustedservices.intel.com/sgx/certification/v3/" image_repo: "docker.io" image_name: "intel/pccs" - image_tag: "v2.1" + image_tag: "v2.2.1" upstream_port: 8081 hostname: "localhost" crt_subj: @@ -97,18 +104,26 @@ kmra_defaults: chart_path: "{{ (project_root_dir, 'charts', 'kmra-ctk') | path_join }}" image_repo: "docker.io" image_name: "intel/ctk_loadkey" - image_tag: "v2.1" + image_tag: "v2.2.1" init_image_repo: "docker.io" init_image_name: "busybox" init_image_tag: "1.35" - upstream_port: 8082 - upstream_server_name: "127.0.0.1" default_user_pin: "4321" default_so_pin: "87654321" + default_client_token_id: "0xDEADBEEF" client_token: "client_token" client_key_label: "client_key_priv" test_unique_uid: "unique_id_1234" use_secure_cert: false + nginx_image_repo: "docker.io" + nginx_image_name: "intel/nginx" + nginx_image_tag: "v2.2.1" + nginx_demo_port: 8083 + nginx_demo_server_name: "127.0.0.1" + pkcs11_proxy_tls_psk: "test:e9622c85018998993fcc16f5ce9c15e9" + pkcs11_proxy_tls_psk_file: "/etc/p11_proxy_tls.psk" + pkcs11_daemon_socket_hostname: "127.0.0.1" + pkcs11_daemon_socket_port: 8082 crt_subj: O: "AppHSM" OU: "ctk_loadkey_user_id_01234" diff --git a/roles/kmra_install/tasks/cleanup.yml b/roles/kmra_install/tasks/cleanup.yml index afe99cbf..e03d000e 100644 --- a/roles/kmra_install/tasks/cleanup.yml +++ b/roles/kmra_install/tasks/cleanup.yml @@ -42,6 +42,3 @@ - "{{ kmra.apphsm.helm_values_file }}" - "{{ kmra.ctk_loadkey_demo.helm_values_file }}" failed_when: false - - - diff --git a/roles/kmra_install/tasks/create_tls_secrets.yml b/roles/kmra_install/tasks/create_tls_secrets.yml index 88c4beb8..43aa50ab 100644 --- a/roles/kmra_install/tasks/create_tls_secrets.yml +++ b/roles/kmra_install/tasks/create_tls_secrets.yml @@ -28,7 +28,7 @@ -subj "/O={{ kmra.ca_root_crt_subj.O }}/OU={{ kmra.ca_root_crt_subj.OU }}/CN={{ kmra.ca_root_crt_subj.CN }}" changed_when: true -- name: generate csr for the {{ item.name }} +- name: generate csr for the kmra app command: >- openssl req -nodes -newkey rsa:2048 -keyout {{ (mtls_tmp_dir.path, item.name) | path_join }}.key @@ -39,14 +39,15 @@ when: - item.deploy -- name: generate cert for the {{ item.name }} app +- name: generate cert for the kmra app - bmra shell: >- set -o pipefail && openssl x509 -req -in {{ (mtls_tmp_dir.path, item.name) | path_join }}.csr + -days {{ kmra.certs_validity_period_days }} -CA {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} -CAkey {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + ',IP:' - + hostvars[groups["kube_node"][0]]["ansible_default_ipv4"]["address"] + '")' + + hostvars[groups['kube_node'][0]]['ansible_default_ipv4']['address'] + '")' if item.subj.CN | default('') | length > 0 }} -CAcreateserial -CAserial {{ (mtls_tmp_dir.path, 'ca.srl' ) | path_join }} -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.crt @@ -58,8 +59,30 @@ changed_when: true when: - item.deploy + - not vm_enabled | default(false) -- name: create secret for the {{ item.name }} app +- name: generate cert for the kmra app - vmra + shell: >- + set -o pipefail && + openssl x509 -req -in {{ (mtls_tmp_dir.path, item.name) | path_join }}.csr + -CA {{ (mtls_tmp_dir.path, 'ca.crt') | path_join }} + -CAkey {{ (mtls_tmp_dir.path, 'ca.key') | path_join }} + {{ '-extfile <(printf "subjectAltName=DNS:' + item.subj.CN + ',IP:' + + hostvars[groups['kube_node'][0]]['ansible_all_ipv4_addresses'] | 
ipaddr(hostvars[groups['vm_host'][0]]['vxlan_gw_ip']) | join('') | trim + '")' + if item.subj.CN | default('') | length > 0 }} + -CAcreateserial -CAserial {{ (mtls_tmp_dir.path, 'ca.srl' ) | path_join }} + -out {{ (mtls_tmp_dir.path, item.name) | path_join }}.crt + args: + executable: /bin/bash + loop: "{{ kmra_apps }}" + loop_control: + extended: yes + changed_when: true + when: + - item.deploy + - vm_enabled | default(false) + +- name: create secret for the kmra app shell: >- set -o pipefail && kubectl create secret generic {{ item.name }}-tls --from-file=tls.cert={{ (mtls_tmp_dir.path, item.name) | path_join }}.crt diff --git a/roles/kmra_install/tasks/main.yml b/roles/kmra_install/tasks/main.yml index dcbde47d..e2607633 100644 --- a/roles/kmra_install/tasks/main.yml +++ b/roles/kmra_install/tasks/main.yml @@ -23,7 +23,9 @@ include_role: name: check_machine_type when: - - inventory_hostname == groups['kube_node'][0] + - inventory_hostname in groups['kube_node'] or + inventory_hostname in groups['vm_host'] + - not on_vms | default (false) - name: prepare worker node block: @@ -74,8 +76,6 @@ when: - kmra.ctk_loadkey_demo.enabled | bool - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - name: update aesmd/qcnl host settings block: @@ -120,8 +120,6 @@ when: - inventory_hostname == groups['kube_node'][0] - kmra.pccs.enabled - - is_icx | default(false) | bool or - is_spr | default(false) | bool - name: prepare and deploy kmra block: @@ -134,37 +132,38 @@ - name: label worker node with KMRA label command: kubectl label nodes {{ hostvars[groups['kube_node'][0]]['ansible_hostname'] }} app=kmra --overwrite - - - name: create kmra ns if not existing - shell: "set -o pipefail && kubectl create ns {{ kmra.namespace }} -o yaml --dry-run=client | kubectl apply -f -" - args: - executable: /bin/bash changed_when: true + - name: create KMRA namespace + k8s: + name: "{{ kmra.namespace }}" + kind: Namespace + state: present + - name: create k8s tls secrets for apphsm and ctk apps include: create_tls_secrets.yml vars: kmra_apps: - { - name: "{{ kmra.pccs.release_name }}", - subj: "{{ kmra.pccs.crt_subj }}", - deploy: "{{ kmra.pccs.enabled | default(false) }}" - } + name: "{{ kmra.pccs.release_name }}", + subj: "{{ kmra.pccs.crt_subj }}", + deploy: "{{ kmra.pccs.enabled | default(false) }}" + } - { - name: "{{ kmra.apphsm.release_name }}", - subj: "{{ kmra.apphsm.crt_subj }}", - deploy: "{{ kmra.apphsm.enabled | default(false) }}" - } + name: "{{ kmra.apphsm.release_name }}", + subj: "{{ kmra.apphsm.crt_subj }}", + deploy: "{{ kmra.apphsm.enabled | default(false) }}" + } - { - name: "generic-apphsm-client", - subj: { O: "AppHSM", OU: "{{ kmra.apphsm.generic_client_cert_id }}" }, - deploy: "{{ kmra.apphsm.enabled | default(false) }}" - } + name: "generic-apphsm-client", + subj: {O: "AppHSM", OU: "{{ kmra.apphsm.generic_client_cert_id }}"}, + deploy: "{{ kmra.apphsm.enabled | default(false) }}" + } - { - name: "{{ kmra.ctk_loadkey_demo.release_name }}", - subj: "{{ kmra.ctk_loadkey_demo.crt_subj }}", - deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" - } + name: "{{ kmra.ctk_loadkey_demo.release_name }}", + subj: "{{ kmra.ctk_loadkey_demo.crt_subj }}", + deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" + } - name: create Helm charts directory if needed file: @@ -178,9 +177,9 @@ dest: "{{ (project_root_dir, 'charts') | path_join }}" mode: 0755 loop: - - { chart: 'kmra-pccs', deploy: "{{ kmra.pccs.enabled | 
default(false) }}" } - - { chart: 'kmra-apphsm', deploy: "{{ kmra.apphsm.enabled | default(false) }}" } - - { chart: 'kmra-ctk', deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" } + - {chart: 'kmra-pccs', deploy: "{{ kmra.pccs.enabled | default(false) }}"} + - {chart: 'kmra-apphsm', deploy: "{{ kmra.apphsm.enabled | default(false) }}"} + - {chart: 'kmra-ctk', deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}"} when: - item.deploy @@ -192,35 +191,35 @@ mode: preserve loop: - { - src: "kmra-pccs-values.yaml.j2", - dest: "{{ (project_root_dir, 'charts', 'kmra-pccs-values.yml') | path_join }}", - deploy: "{{ kmra.pccs.enabled | default(false) }}" - } + src: "kmra-pccs-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-pccs-values.yml') | path_join }}", + deploy: "{{ kmra.pccs.enabled | default(false) }}" + } - { - src: "kmra-pccs-rbac-cluster-role.yml.j2", - dest: "{{ (kmra.pccs.chart_path, 'templates','kmra-pccs-rbac-cluster-role.yml') | path_join }}", - deploy: "{{ kmra.pccs.enabled | default(false) }}" - } + src: "kmra-pccs-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.pccs.chart_path, 'templates','kmra-pccs-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.pccs.enabled | default(false) }}" + } - { - src: "kmra-apphsm-values.yaml.j2", - dest: "{{ (project_root_dir, 'charts', 'kmra-apphsm-values.yml') | path_join }}", - deploy: "{{ kmra.apphsm.enabled | default(false) }}" - } + src: "kmra-apphsm-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-apphsm-values.yml') | path_join }}", + deploy: "{{ kmra.apphsm.enabled | default(false) }}" + } - { - src: "kmra-apphsm-rbac-cluster-role.yml.j2", - dest: "{{ (kmra.apphsm.chart_path, 'templates', 'kmra-apphsm-rbac-cluster-role.yml') | path_join }}", - deploy: "{{ kmra.apphsm.enabled | default(false) }}" - } + src: "kmra-apphsm-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.apphsm.chart_path, 'templates', 'kmra-apphsm-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.apphsm.enabled | default(false) }}" + } - { - src: "kmra-ctk-values.yaml.j2", - dest: "{{ (project_root_dir, 'charts', 'kmra-ctk-values.yml') | path_join }}", - deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" - } + src: "kmra-ctk-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts', 'kmra-ctk-values.yml') | path_join }}", + deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" + } - { - src: "kmra-ctk-loadkey-rbac-cluster-role.yml.j2", - dest: "{{ (kmra.ctk_loadkey_demo.chart_path, 'templates', 'kmra-ctk-loadkey-rbac-cluster-role.yml') | path_join }}", - deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" - } + src: "kmra-ctk-loadkey-rbac-cluster-role.yml.j2", + dest: "{{ (kmra.ctk_loadkey_demo.chart_path, 'templates', 'kmra-ctk-loadkey-rbac-cluster-role.yml') | path_join }}", + deploy: "{{ kmra.ctk_loadkey_demo.enabled | default(false) }}" + } when: - item.deploy @@ -242,6 +241,16 @@ when: - kmra.apphsm.enabled | default(false) + - name: Wait for apphsm to start + shell: "set -o pipefail && kubectl get pods -n kmra | grep 'apphsm.*Running'" + args: + executable: /bin/bash + register: apphsm_status + retries: 60 + delay: 10 + until: apphsm_status.stdout | length > 0 + changed_when: true + - name: install KMRA Ctk loadkey helm chart command: >- helm upgrade -i {{ kmra.ctk_loadkey_demo.release_name }} @@ -252,5 +261,3 @@ - kmra.ctk_loadkey_demo.enabled | default(false) when: - inventory_hostname == groups['kube_control_plane'][0] - - hostvars[groups['kube_node'][0]]['is_icx'] or - 
hostvars[groups['kube_node'][0]]['is_spr'] diff --git a/roles/kmra_install/templates/kmra-apphsm-rbac-cluster-role.yml.j2 b/roles/kmra_install/templates/kmra-apphsm-rbac-cluster-role.yml.j2 index c14d990b..e671faa9 100644 --- a/roles/kmra_install/templates/kmra-apphsm-rbac-cluster-role.yml.j2 +++ b/roles/kmra_install/templates/kmra-apphsm-rbac-cluster-role.yml.j2 @@ -12,10 +12,3 @@ rules: resources: - subjectaccessreviews verbs: ["create"] -{% if psp_enabled %} - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - {% raw %}{{ .Release.Name }}{% endraw %}{{''}} -{% endif %} diff --git a/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 b/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 index 2d4811be..d471ebb3 100644 --- a/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 +++ b/roles/kmra_install/templates/kmra-apphsm-values.yaml.j2 @@ -17,7 +17,7 @@ apphsm: tag: "{{ kmra.apphsm.image_tag }}" pullPolicy: IfNotPresent port: "{{ kmra.apphsm.upstream_port }}" - hostname: "{{ kmra.apphsm.hostname }}" + hostname: "{{ kmra.apphsm.hostname | trim}}" init: image: repo: "{{ kmra.apphsm.init_image_repo }}" @@ -32,4 +32,5 @@ apphsm: ctk_loadkey_demo_enabled: "{{ kmra.ctk_loadkey_demo.enabled | bool | lower }}" default_user_pin: "{{ kmra.apphsm.default_user_pin }}" default_so_pin: "{{ kmra.apphsm.default_so_pin }}" + nonce_lifetime: "{{ kmra.apphsm.nonce_lifetime }}" keys: {{ kmra.apphsm.app_keys }} \ No newline at end of file diff --git a/roles/kmra_install/templates/kmra-ctk-loadkey-rbac-cluster-role.yml.j2 b/roles/kmra_install/templates/kmra-ctk-loadkey-rbac-cluster-role.yml.j2 index c14d990b..e671faa9 100644 --- a/roles/kmra_install/templates/kmra-ctk-loadkey-rbac-cluster-role.yml.j2 +++ b/roles/kmra_install/templates/kmra-ctk-loadkey-rbac-cluster-role.yml.j2 @@ -12,10 +12,3 @@ rules: resources: - subjectaccessreviews verbs: ["create"] -{% if psp_enabled %} - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - {% raw %}{{ .Release.Name }}{% endraw %}{{''}} -{% endif %} diff --git a/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 b/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 index 9ed6a887..05fcb604 100644 --- a/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 +++ b/roles/kmra_install/templates/kmra-ctk-values.yaml.j2 @@ -16,18 +16,24 @@ ctk_loadkey: name: "{{ kmra.ctk_loadkey_demo.image_name }}" tag: "{{ kmra.ctk_loadkey_demo.image_tag }}" pullPolicy: IfNotPresent - port: "{{ kmra.ctk_loadkey_demo.upstream_port }}" - hostname: "{{ kmra.ctk_loadkey_demo.upstream_server_name | default('0.0.0.0')}}" init: image: repo: "{{ kmra.ctk_loadkey_demo.init_image_repo }}" name: "{{ kmra.ctk_loadkey_demo.init_image_name }}" tag: "{{ kmra.ctk_loadkey_demo.init_image_tag }}" pullPolicy: IfNotPresent + nginx: + image: + repo: "{{ kmra.ctk_loadkey_demo.nginx_image_repo }}" + name: "{{ kmra.ctk_loadkey_demo.nginx_image_name }}" + tag: "{{ kmra.ctk_loadkey_demo.nginx_image_tag }}" + pullPolicy: IfNotPresent + port: "{{ kmra.ctk_loadkey_demo.nginx_demo_port }}" + hostname: "{{ kmra.ctk_loadkey_demo.nginx_demo_server_name | default('0.0.0.0')}}" pccs_port: "{{ kmra.pccs.upstream_port }}" pccs_hostname: "{{ kmra.pccs.hostname }}" apphsm_port: "{{ kmra.apphsm.upstream_port }}" - apphsm_hostname: "{{ kmra.apphsm.hostname }}" + apphsm_hostname: "{{ kmra.apphsm.hostname | trim}}" sgx_prv_gid: "{{ hostvars[groups['kube_node'][0]]['getent_group']['sgx_prv'][1] | default('1002')}}" use_secure_cert: 
"{{ kmra.ctk_loadkey_demo.use_secure_cert | quote }}" client_token: "{{ kmra.ctk_loadkey_demo.client_token }}" @@ -35,3 +41,8 @@ ctk_loadkey: test_unique_uid: "{{ kmra.ctk_loadkey_demo.test_unique_uid }}" default_user_pin: "{{ kmra.ctk_loadkey_demo.default_user_pin }}" default_so_pin: "{{ kmra.ctk_loadkey_demo.default_so_pin }}" + default_client_token_id: "{{ kmra.ctk_loadkey_demo.default_client_token_id }}" + pkcs11_proxy_tls_psk: "{{ kmra.ctk_loadkey_demo.pkcs11_proxy_tls_psk }}" + pkcs11_proxy_tls_psk_file: "{{ kmra.ctk_loadkey_demo.pkcs11_proxy_tls_psk_file }}" + pkcs11_daemon_socket_hostname: "{{ kmra.ctk_loadkey_demo.pkcs11_daemon_socket_hostname }}" + pkcs11_daemon_socket_port: "{{ kmra.ctk_loadkey_demo.pkcs11_daemon_socket_port }}" \ No newline at end of file diff --git a/roles/kmra_install/templates/kmra-pccs-rbac-cluster-role.yml.j2 b/roles/kmra_install/templates/kmra-pccs-rbac-cluster-role.yml.j2 index c14d990b..e671faa9 100644 --- a/roles/kmra_install/templates/kmra-pccs-rbac-cluster-role.yml.j2 +++ b/roles/kmra_install/templates/kmra-pccs-rbac-cluster-role.yml.j2 @@ -12,10 +12,3 @@ rules: resources: - subjectaccessreviews verbs: ["create"] -{% if psp_enabled %} - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - {% raw %}{{ .Release.Name }}{% endraw %}{{''}} -{% endif %} diff --git a/roles/kube_prometheus/defaults/main.yml b/roles/kube_prometheus/defaults/main.yml index e6d8e0c7..ddeb5314 100644 --- a/roles/kube_prometheus/defaults/main.yml +++ b/roles/kube_prometheus/defaults/main.yml @@ -24,8 +24,10 @@ prometheus_srv_expose: false prometheus_srv_proxy_port: 9443 prometheus_srv_node_port: 30443 prometheus_srv_address: 127.0.0.1 -prometheus_srv_nginx_image: "docker.io/library/nginx:1.21.6-alpine" -prometheus_srv_nginx_ssl_ciphers: "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" +prometheus_srv_nginx_image: "docker.io/library/nginx:1.23.1-alpine" +prometheus_srv_nginx_ssl_ciphers: + "AES128-CCM-SHA256:CHACHA20-POLY1305-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA\ + -AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256" prometheus_srv_nginx_ssl_protocols: "TLSv1.2 TLSv1.3" prometheus_srv_location_exposed: "/prometheus/" prometheus_srv_user: prometheus diff --git a/roles/kube_prometheus/files/dashboards/grafana-dashboardDefinitions.yaml b/roles/kube_prometheus/files/dashboards/grafana-dashboardDefinitions.yaml index 5be53312..a021da68 100644 --- a/roles/kube_prometheus/files/dashboards/grafana-dashboardDefinitions.yaml +++ b/roles/kube_prometheus/files/dashboards/grafana-dashboardDefinitions.yaml @@ -1737,7 +1737,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: cluster-total.json: |- @@ -3608,7 +3608,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: controller-manager.json: |- @@ -4748,7 +4748,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-cluster.json: |- @@ -7319,7 +7319,7 @@ 
items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-namespace.json: |- @@ -9594,7 +9594,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-node.json: |- @@ -10561,7 +10561,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-pod.json: |- @@ -12322,7 +12322,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-workload.json: |- @@ -14345,7 +14345,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-workloads-namespace.json: |- @@ -16529,7 +16529,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: kubelet.json: |- @@ -19051,7 +19051,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: namespace-by-pod.json: |- @@ -20504,7 +20504,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: namespace-by-workload.json: |- @@ -22229,7 +22229,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: node-cluster-rsrc-use.json: |- @@ -23182,7 +23182,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: node-rsrc-use.json: |- @@ -24162,7 +24162,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: nodes.json: |- @@ -25152,7 +25152,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: persistentvolumesusage.json: |- @@ -25718,7 +25718,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: pod-total.json: |- @@ -26935,7 +26935,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus 
- app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: prometheus-remote-write.json: |- @@ -28594,7 +28594,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: prometheus.json: |- @@ -29810,7 +29810,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: proxy.json: |- @@ -31030,7 +31030,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: scheduler.json: |- @@ -32093,7 +32093,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: statefulset.json: |- @@ -33010,7 +33010,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: workload-total.json: |- @@ -34437,7 +34437,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-cpu.json: |- @@ -35531,7 +35531,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-disk.json: |- @@ -36621,7 +36621,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-intel.json: |- @@ -37119,7 +37119,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-ipmi.json: |- @@ -37562,7 +37562,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-netlink.json: |- @@ -39115,7 +39115,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-ovs.json: |- @@ -40335,7 +40335,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-power.json: |- @@ -40889,7 +40889,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-numa.json: |- @@ -41665,7 +41665,7 @@ items: app.kubernetes.io/component: 
grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-hugepages.json: |- @@ -42001,7 +42001,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-ethstats.json: |- @@ -42777,5 +42777,5 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.5.3 + app.kubernetes.io/version: 8.5.11 kind: ConfigMapList diff --git a/roles/kube_prometheus/files/dashboards/grafana-telegraf-dashboardDefinitions.yaml b/roles/kube_prometheus/files/dashboards/grafana-telegraf-dashboardDefinitions.yaml index 61407291..7ed9d52b 100644 --- a/roles/kube_prometheus/files/dashboards/grafana-telegraf-dashboardDefinitions.yaml +++ b/roles/kube_prometheus/files/dashboards/grafana-telegraf-dashboardDefinitions.yaml @@ -1737,7 +1737,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: cluster-total.json: |- @@ -3608,7 +3608,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: controller-manager.json: |- @@ -4748,7 +4748,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-cluster.json: |- @@ -7319,7 +7319,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-namespace.json: |- @@ -9594,7 +9594,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-node.json: |- @@ -10561,7 +10561,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-pod.json: |- @@ -12322,7 +12322,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-workload.json: |- @@ -14345,7 +14345,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: k8s-resources-workloads-namespace.json: |- @@ -16529,7 +16529,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: kubelet.json: |- @@ -19051,7 +19051,7 @@ items: app.kubernetes.io/component: grafana 
app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: namespace-by-pod.json: |- @@ -20504,7 +20504,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: namespace-by-workload.json: |- @@ -22229,7 +22229,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: node-cluster-rsrc-use.json: |- @@ -23182,7 +23182,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: node-rsrc-use.json: |- @@ -24162,7 +24162,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: nodes.json: |- @@ -25152,7 +25152,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: persistentvolumesusage.json: |- @@ -25718,7 +25718,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: pod-total.json: |- @@ -26935,7 +26935,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: prometheus-remote-write.json: |- @@ -28594,7 +28594,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: prometheus.json: |- @@ -29810,7 +29810,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: proxy.json: |- @@ -31030,7 +31030,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: scheduler.json: |- @@ -32093,7 +32093,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: statefulset.json: |- @@ -33010,7 +33010,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: workload-total.json: |- @@ -34437,7 +34437,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 
data: collectd-cpu.json: |- @@ -36386,7 +36386,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-disk.json: |- @@ -37476,7 +37476,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-intel.json: |- @@ -37974,7 +37974,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-ipmi.json: |- @@ -38417,7 +38417,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-netlink.json: |- @@ -39415,7 +39415,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-power.json: |- @@ -39858,7 +39858,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-numa.json: |- @@ -41189,7 +41189,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-hugepages.json: |- @@ -41525,7 +41525,7 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 - apiVersion: v1 data: collectd-ethstats.json: |- @@ -42301,5 +42301,5 @@ items: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 kind: ConfigMapList diff --git a/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardDatasources.yaml b/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardDatasources.yaml index cdd6e694..69404eac 100644 --- a/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardDatasources.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardDatasources.yaml @@ -13,5 +13,5 @@ metadata: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 type: Opaque diff --git a/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardSources.yaml b/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardSources.yaml index 425be93d..178ee5fc 100644 --- a/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardSources.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/grafana-dashboardSources.yaml @@ -27,4 +27,4 @@ metadata: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 diff --git 
a/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-grafana.yaml b/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-grafana.yaml index a45ca8bf..149e5b29 100644 --- a/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-grafana.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-grafana.yaml @@ -11,7 +11,7 @@ metadata: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 spec: storageClassName: kube-prometheus selector: diff --git a/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-prometheus.yaml b/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-prometheus.yaml index 81a2b1f4..b6061481 100644 --- a/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-prometheus.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/persistent-volume-claim-prometheus.yaml @@ -11,7 +11,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 spec: storageClassName: kube-prometheus selector: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRole.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRole.yaml index 23c83415..ff41ef8e 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRole.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRole.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 rules: - apiGroups: - "" diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRoleBinding.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRoleBinding.yaml index ebe17489..680efd1a 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRoleBinding.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-clusterRoleBinding.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingConfig.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingConfig.yaml index 98a09b9d..a80ee670 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingConfig.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingConfig.yaml @@ -5,7 +5,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s-config namespace: monitoring roleRef: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingSpecificNamespaces.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingSpecificNamespaces.yaml index d5af629b..b8b6ef8e 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingSpecificNamespaces.yaml +++ 
b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleBindingSpecificNamespaces.yaml @@ -7,7 +7,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: default roleRef: @@ -25,7 +25,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: kube-system roleRef: @@ -43,7 +43,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: monitoring roleRef: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleConfig.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleConfig.yaml index 45a37528..46c7f9e6 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleConfig.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleConfig.yaml @@ -5,7 +5,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s-config namespace: monitoring rules: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleSpecificNamespaces.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleSpecificNamespaces.yaml index 78b972da..f5a6793a 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-roleSpecificNamespaces.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-roleSpecificNamespaces.yaml @@ -7,7 +7,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: default rules: @@ -44,7 +44,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: kube-system rules: @@ -81,7 +81,7 @@ items: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: monitoring rules: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-service.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-service.yaml index 56e3a64e..91558ddc 100644 --- a/roles/kube_prometheus/files/kube-prometheus/prometheus-service.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-service.yaml @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: monitoring spec: diff --git a/roles/kube_prometheus/files/kube-prometheus/prometheus-serviceAccount.yaml b/roles/kube_prometheus/files/kube-prometheus/prometheus-serviceAccount.yaml index 6f72b245..ac9a8306 100644 --- 
a/roles/kube_prometheus/files/kube-prometheus/prometheus-serviceAccount.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/prometheus-serviceAccount.yaml @@ -5,6 +5,6 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-k8s namespace: monitoring diff --git a/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-deployment.yml b/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-deployment.yml index 51f34393..a36c4fce 100644 --- a/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-deployment.yml +++ b/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-deployment.yml @@ -29,6 +29,9 @@ spec: - key: node-role.kubernetes.io/master effect: NoSchedule operator: Exists + - key: node-role.kubernetes.io/control-plane + effect: NoSchedule + operator: Exists containers: - args: - --kubelet-service=kube-system/kubelet diff --git a/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-serviceAccount.yaml b/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-serviceAccount.yaml index a2e5e7bb..bc60bbeb 100644 --- a/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-serviceAccount.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/setup/prometheus-operator-serviceAccount.yaml @@ -5,6 +5,6 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: prometheus-operator namespace: monitoring diff --git a/roles/kube_prometheus/files/kube-prometheus/storage-class.yaml b/roles/kube_prometheus/files/kube-prometheus/storage-class.yaml index 75c44656..bacb62cf 100644 --- a/roles/kube_prometheus/files/kube-prometheus/storage-class.yaml +++ b/roles/kube_prometheus/files/kube-prometheus/storage-class.yaml @@ -7,7 +7,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 provisioner: kubernetes.io/no-provisioner reclaimPolicy: Retain allowVolumeExpansion: false diff --git a/roles/kube_prometheus/tasks/kube-prometheus.yml b/roles/kube_prometheus/tasks/kube-prometheus.yml index 6e15236f..42deefe2 100644 --- a/roles/kube_prometheus/tasks/kube-prometheus.yml +++ b/roles/kube_prometheus/tasks/kube-prometheus.yml @@ -84,15 +84,6 @@ - ../templates/prometheus_srv/*.j2 when: prometheus_srv_expose | default(false) -- name: copy PSP manifest - template: - src: "0psp-node-exporter.yaml" - dest: "{{ kube_prometheus_path }}/0psp-node-exporter.yaml" - owner: root - group: root - mode: u+rwx,g-rwx,o- - when: psp_enabled - - name: install kube-prometheus CRD manifests command: "kubectl apply -f {{ kube_prometheus_path }}/setup" changed_when: true diff --git a/roles/kube_prometheus/templates/0psp-node-exporter.yaml b/roles/kube_prometheus/templates/0psp-node-exporter.yaml deleted file mode 100644 index feffcc7e..00000000 --- a/roles/kube_prometheus/templates/0psp-node-exporter.yaml +++ /dev/null @@ -1,36 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: node-exporter -spec: - privileged: true - allowPrivilegeEscalation: true - hostPorts: - - max: 9100 - min: 9100 - allowedCapabilities: - - '*' - volumes: - - 
"configMap" - - "downwardAPI" - - "emptyDir" - - "persistentVolumeClaim" - - "secret" - - "projected" - - "hostPath" - hostNetwork: true - hostIPC: true - hostPID: true - runAsUser: - rule: 'RunAsAny' - seLinux: - rule: 'RunAsAny' - supplementalGroups: - rule: 'RunAsAny' - fsGroup: - rule: 'RunAsAny' - readOnlyRootFilesystem: false - # This will fail if allowed-unsafe-sysctls is not set accordingly in kubelet flags - allowedUnsafeSysctls: - - '*' diff --git a/roles/kube_prometheus/templates/node-exporter-clusterRole.yaml.j2 b/roles/kube_prometheus/templates/node-exporter-clusterRole.yaml.j2 index 22b58bb0..da2faac1 100644 --- a/roles/kube_prometheus/templates/node-exporter-clusterRole.yaml.j2 +++ b/roles/kube_prometheus/templates/node-exporter-clusterRole.yaml.j2 @@ -20,10 +20,3 @@ rules: resources: - subjectaccessreviews verbs: ["create"] -{% if psp_enabled %} -- apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - node-exporter -{% endif %} diff --git a/roles/kube_prometheus/templates/node-exporter-daemonset.yaml.j2 b/roles/kube_prometheus/templates/node-exporter-daemonset.yaml.j2 index 0c753b25..abe0343a 100644 --- a/roles/kube_prometheus/templates/node-exporter-daemonset.yaml.j2 +++ b/roles/kube_prometheus/templates/node-exporter-daemonset.yaml.j2 @@ -98,6 +98,9 @@ spec: - key: node-role.kubernetes.io/master effect: NoSchedule operator: Exists + - key: node-role.kubernetes.io/control-plane + effect: NoSchedule + operator: Exists volumes: - hostPath: path: /proc diff --git a/roles/kube_prometheus/templates/persistent-volume-grafana.yaml.j2 b/roles/kube_prometheus/templates/persistent-volume-grafana.yaml.j2 index 0b47e5d4..ea25da15 100644 --- a/roles/kube_prometheus/templates/persistent-volume-grafana.yaml.j2 +++ b/roles/kube_prometheus/templates/persistent-volume-grafana.yaml.j2 @@ -8,7 +8,7 @@ metadata: app.kubernetes.io/component: grafana app.kubernetes.io/name: grafana app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 8.3.4 + app.kubernetes.io/version: 8.5.11 spec: capacity: storage: 1Gi diff --git a/roles/kube_prometheus/templates/persistent-volume-prometheus.yaml.j2 b/roles/kube_prometheus/templates/persistent-volume-prometheus.yaml.j2 index 92f7b9f4..e37c900d 100644 --- a/roles/kube_prometheus/templates/persistent-volume-prometheus.yaml.j2 +++ b/roles/kube_prometheus/templates/persistent-volume-prometheus.yaml.j2 @@ -8,7 +8,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 spec: capacity: storage: 20Gi diff --git a/roles/kube_prometheus/templates/prometheus-prometheus.yaml.j2 b/roles/kube_prometheus/templates/prometheus-prometheus.yaml.j2 index e255e140..de4a8381 100644 --- a/roles/kube_prometheus/templates/prometheus-prometheus.yaml.j2 +++ b/roles/kube_prometheus/templates/prometheus-prometheus.yaml.j2 @@ -10,7 +10,7 @@ metadata: app.kubernetes.io/component: prometheus app.kubernetes.io/name: prometheus app.kubernetes.io/part-of: kube-prometheus - app.kubernetes.io/version: 2.35.0 + app.kubernetes.io/version: 2.37.1 name: k8s namespace: monitoring spec: @@ -72,7 +72,7 @@ spec: scheme: HTTPS timeoutSeconds: 3 - name: grafana - image: grafana/grafana:8.5.3 + image: grafana/grafana:8.5.11 env: - name: GF_SERVER_PROTOCOL value: https @@ -232,7 +232,7 @@ spec: {% endif %} alerting: alertmanagers: [] - image: quay.io/prometheus/prometheus:v2.35.0 + image: 
quay.io/prometheus/prometheus:v2.37.1 nodeSelector: kubernetes.io/os: linux podMonitorNamespaceSelector: {} @@ -252,7 +252,7 @@ spec: serviceAccountName: prometheus-k8s serviceMonitorNamespaceSelector: {} serviceMonitorSelector: {} - version: v2.35.0 + version: v2.37.1 volumes: - name: persistent-volume-grafana persistentVolumeClaim: diff --git a/roles/kubernetes_ingress_install/defaults/main.yml b/roles/kubernetes_ingress_install/defaults/main.yml new file mode 100755 index 00000000..cc4cf4bc --- /dev/null +++ b/roles/kubernetes_ingress_install/defaults/main.yml @@ -0,0 +1,24 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +kubernetes_ingress_application_name: "minio_kubernetes_ingress" +kubernetes_ingress_release_name: "minio-kubernetes-ingress" + +kubernetes_ingress_helm_repo_url: "https://helm.nginx.com/stable" +kubernetes_ingress_helm_chart_repo_name: "nginx-stable" +kubernetes_ingress_helm_chart_ref: "nginx-stable/nginx-ingress" +kubernetes_ingress_helm_chart_version: "v2.3.0" +kubernetes_ingress_helm_chart_release_namespace: "minio-tenant" diff --git a/roles/kubernetes_ingress_install/tasks/cleanup_kubernetes_ingress.yml b/roles/kubernetes_ingress_install/tasks/cleanup_kubernetes_ingress.yml new file mode 100644 index 00000000..5e6df823 --- /dev/null +++ b/roles/kubernetes_ingress_install/tasks/cleanup_kubernetes_ingress.yml @@ -0,0 +1,35 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- block: + - name: delete Kubernetes Ingress Controller Helm Charts + command: >- + helm delete {{ kubernetes_ingress_release_name }} --namespace {{ kubernetes_ingress_helm_chart_release_namespace }} + when: + - inventory_hostname == groups['kube_control_plane'][0] + changed_when: false + failed_when: false + - name: delete Kubernetes Ingress Controller Helm Repo + command: >- + helm repo remove {{ kubernetes_ingress_helm_chart_repo_name }} + when: + - inventory_hostname == groups['kube_control_plane'][0] + changed_when: false + failed_when: false + tags: + - minio + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/kubernetes_ingress_install/tasks/kubernetes_ingress_install.yml b/roles/kubernetes_ingress_install/tasks/kubernetes_ingress_install.yml new file mode 100755 index 00000000..de095687 --- /dev/null +++ b/roles/kubernetes_ingress_install/tasks/kubernetes_ingress_install.yml @@ -0,0 +1,66 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- block: + - name: check Kubernetes-Ingress Helm charts directory + stat: + path: "{{ (project_root_dir, 'charts', 'kubernetes-ingress') | path_join }}" + register: kubernetes_ingress_path + + - name: create Kubernetes-Ingress Helm charts directory if needed + file: + path: "{{ (project_root_dir, 'charts', 'kubernetes-ingress') | path_join }}" + state: directory + mode: 0755 + when: + - kubernetes_ingress_path.stat.exists is defined and not kubernetes_ingress_path.stat.exists + + - name: check Kubernetes-Ingress Helm charts temp directory
+ stat: + path: "{{ (project_root_dir, 'charts', 'kubernetes-ingress', 'temp') | path_join }}" + register: kubernetes_ingress_temp_dir + + - name: create the temp folder for Kubernetes-Ingress custom values + file: + path: "{{ (project_root_dir, 'charts', 'kubernetes-ingress', 'temp') | path_join }}" + state: directory + mode: 0755 + when: + - not kubernetes_ingress_temp_dir.stat.exists + + - name: populate Kubernetes-Ingress Helm charts values template and push to controller node + template: + src: "kubernetes_ingress_custom_values.yml.j2" + dest: "{{ (project_root_dir, 'charts', 'kubernetes-ingress', 'temp', 'kubernetes-ingress-custom-values.yml') | path_join }}" + force: yes + mode: preserve + + - name: Add Kubernetes Ingress Controller Helm Chart Repository + command: >- + helm repo add "{{ kubernetes_ingress_helm_chart_repo_name }}" "{{ kubernetes_ingress_helm_repo_url }}" + changed_when: true + + - name: Deploy {{ kubernetes_ingress_helm_chart_version }} of {{ kubernetes_ingress_helm_chart_ref }} + command: >- + helm install + {{ kubernetes_ingress_release_name }} + {{ kubernetes_ingress_helm_chart_ref }} + --namespace {{ kubernetes_ingress_helm_chart_release_namespace }} + --create-namespace + -f {{ (project_root_dir, 'charts', 'kubernetes-ingress', 'temp', 'kubernetes-ingress-custom-values.yml') | path_join }} + changed_when: true + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/kubernetes_ingress_install/tasks/main.yml b/roles/kubernetes_ingress_install/tasks/main.yml new file mode 100755 index 00000000..5a1c43e7 --- /dev/null +++ b/roles/kubernetes_ingress_install/tasks/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install kubernetes ingress controller + import_tasks: kubernetes_ingress_install.yml + when: + - minio_enabled is defined and minio_enabled diff --git a/roles/kubernetes_ingress_install/tasks/preflight_kubernetes_ingress.yml b/roles/kubernetes_ingress_install/tasks/preflight_kubernetes_ingress.yml new file mode 100644 index 00000000..f14d7bcd --- /dev/null +++ b/roles/kubernetes_ingress_install/tasks/preflight_kubernetes_ingress.yml @@ -0,0 +1,22 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## + +# - block: + # - name: preflight kubernetes ingress controller installation + # include_role: + # name: kubernetes_ingress_install + # tasks_from: preflight_kubernetes_ingress + # any_errors_fatal: true diff --git a/roles/kubernetes_ingress_install/templates/kubernetes_ingress_custom_values.yml.j2 b/roles/kubernetes_ingress_install/templates/kubernetes_ingress_custom_values.yml.j2 new file mode 100755 index 00000000..80f31f01 --- /dev/null +++ b/roles/kubernetes_ingress_install/templates/kubernetes_ingress_custom_values.yml.j2 @@ -0,0 +1,375 @@ +controller: + ## The name of the Ingress Controller daemonset or deployment. + ## Autogenerated if not set or set to "". + # name: nginx-ingress + + ## The kind of the Ingress Controller installation - deployment or daemonset. + kind: deployment + + ## Deploys the Ingress Controller for NGINX Plus. + nginxplus: false + + # Timeout in milliseconds which the Ingress Controller will wait for a successful NGINX reload after a change or at the initial start. + nginxReloadTimeout: 60000 + + ## Support for App Protect + appprotect: + ## Enable the App Protect module in the Ingress Controller. + enable: false + ## Sets log level for App Protect. Allowed values: fatal, error, warn, info, debug, trace + logLevel: debug + + ## Support for App Protect Dos + appprotectdos: + ## Enable the App Protect Dos module in the Ingress Controller. + enable: false + ## Enable debugging for App Protect Dos. + debug: false + ## Max number of nginx processes to support. + maxWorkers: 0 + ## Max number of ADMD instances. + maxDaemons: 0 + ## RAM memory size to consume in MB. + memory: 0 + + ## Enables the Ingress Controller pods to use the host's network namespace. + hostNetwork: false + + ## Enables debugging for NGINX. Uses the nginx-debug binary. Requires error-log-level: debug in the ConfigMap via `controller.config.entries`. + nginxDebug: false + + ## The log level of the Ingress Controller. + logLevel: 4 + + ## A list of custom ports to expose on the NGINX ingress controller pod. Follows the conventional Kubernetes yaml syntax for container ports. + customPorts: [] + + image: + ## The image repository of the Ingress Controller. + repository: nginx/nginx-ingress + + ## The tag of the Ingress Controller image. + tag: "2.3.1" + + ## The pull policy for the Ingress Controller image. + pullPolicy: IfNotPresent + + config: + ## The name of the ConfigMap used by the Ingress Controller. + ## Autogenerated if not set or set to "". + # name: nginx-config + + ## The annotations of the Ingress Controller configmap. + annotations: {} + + ## The entries of the ConfigMap for customizing NGINX configuration. + entries: {} + + ## It is recommended to use your own TLS certificates and keys + defaultTLS: + ## The base64-encoded TLS certificate for the default HTTPS server. If not specified, a pre-generated self-signed certificate is used. + ## Note: It is recommended that you specify your own certificate. 
+ cert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN2akNDQWFZQ0NRREFPRjl0THNhWFhEQU5CZ2txaGtpRzl3MEJBUXNGQURBaE1SOHdIUVlEVlFRRERCWk8KUjBsT1dFbHVaM0psYzNORGIyNTBjbTlzYkdWeU1CNFhEVEU0TURreE1qRTRNRE16TlZvWERUSXpNRGt4TVRFNApNRE16TlZvd0lURWZNQjBHQTFVRUF3d1dUa2RKVGxoSmJtZHlaWE56UTI5dWRISnZiR3hsY2pDQ0FTSXdEUVlKCktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQUwvN2hIUEtFWGRMdjNyaUM3QlBrMTNpWkt5eTlyQ08KR2xZUXYyK2EzUDF0azIrS3YwVGF5aGRCbDRrcnNUcTZzZm8vWUk1Y2Vhbkw4WGM3U1pyQkVRYm9EN2REbWs1Qgo4eDZLS2xHWU5IWlg0Rm5UZ0VPaStlM2ptTFFxRlBSY1kzVnNPazFFeUZBL0JnWlJVbkNHZUtGeERSN0tQdGhyCmtqSXVuektURXUyaDU4Tlp0S21ScUJHdDEwcTNRYzhZT3ExM2FnbmovUWRjc0ZYYTJnMjB1K1lYZDdoZ3krZksKWk4vVUkxQUQ0YzZyM1lma1ZWUmVHd1lxQVp1WXN2V0RKbW1GNWRwdEMzN011cDBPRUxVTExSakZJOTZXNXIwSAo1TmdPc25NWFJNV1hYVlpiNWRxT3R0SmRtS3FhZ25TZ1JQQVpQN2MwQjFQU2FqYzZjNGZRVXpNQ0F3RUFBVEFOCkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQWpLb2tRdGRPcEsrTzhibWVPc3lySmdJSXJycVFVY2ZOUitjb0hZVUoKdGhrYnhITFMzR3VBTWI5dm15VExPY2xxeC9aYzJPblEwMEJCLzlTb0swcitFZ1U2UlVrRWtWcitTTFA3NTdUWgozZWI4dmdPdEduMS9ienM3bzNBaS9kclkrcUI5Q2k1S3lPc3FHTG1US2xFaUtOYkcyR1ZyTWxjS0ZYQU80YTY3Cklnc1hzYktNbTQwV1U3cG9mcGltU1ZmaXFSdkV5YmN3N0NYODF6cFErUyt1eHRYK2VBZ3V0NHh3VlI5d2IyVXYKelhuZk9HbWhWNThDd1dIQnNKa0kxNXhaa2VUWXdSN0diaEFMSkZUUkk3dkhvQXprTWIzbjAxQjQyWjNrN3RXNQpJUDFmTlpIOFUvOWxiUHNoT21FRFZkdjF5ZytVRVJxbStGSis2R0oxeFJGcGZnPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo= + + ## The base64-encoded TLS key for the default HTTPS server. Note: If not specified, a pre-generated key is used. + ## Note: It is recommended that you specify your own key. + key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBdi91RWM4b1JkMHUvZXVJTHNFK1RYZUprckxMMnNJNGFWaEMvYjVyYy9XMlRiNHEvClJOcktGMEdYaVN1eE9ycXgrajlnamx4NXFjdnhkenRKbXNFUkJ1Z1B0ME9hVGtIekhvb3FVWmcwZGxmZ1dkT0EKUTZMNTdlT1l0Q29VOUZ4amRXdzZUVVRJVUQ4R0JsRlNjSVo0b1hFTkhzbysyR3VTTWk2Zk1wTVM3YUhudzFtMApxWkdvRWEzWFNyZEJ6eGc2clhkcUNlUDlCMXl3VmRyYURiUzc1aGQzdUdETDU4cGszOVFqVUFQaHpxdmRoK1JWClZGNGJCaW9CbTVpeTlZTW1hWVhsMm0wTGZzeTZuUTRRdFFzdEdNVWozcGJtdlFmazJBNnljeGRFeFpkZFZsdmwKMm82MjBsMllxcHFDZEtCRThCay90elFIVTlKcU56cHpoOUJUTXdJREFRQUJBb0lCQVFDZklHbXowOHhRVmorNwpLZnZJUXQwQ0YzR2MxNld6eDhVNml4MHg4Mm15d1kxUUNlL3BzWE9LZlRxT1h1SENyUlp5TnUvZ2IvUUQ4bUFOCmxOMjRZTWl0TWRJODg5TEZoTkp3QU5OODJDeTczckM5bzVvUDlkazAvYzRIbjAzSkVYNzZ5QjgzQm9rR1FvYksKMjhMNk0rdHUzUmFqNjd6Vmc2d2szaEhrU0pXSzBwV1YrSjdrUkRWYmhDYUZhNk5nMUZNRWxhTlozVDhhUUtyQgpDUDNDeEFTdjYxWTk5TEI4KzNXWVFIK3NYaTVGM01pYVNBZ1BkQUk3WEh1dXFET1lvMU5PL0JoSGt1aVg2QnRtCnorNTZud2pZMy8yUytSRmNBc3JMTnIwMDJZZi9oY0IraVlDNzVWYmcydVd6WTY3TWdOTGQ5VW9RU3BDRkYrVm4KM0cyUnhybnhBb0dCQU40U3M0ZVlPU2huMVpQQjdhTUZsY0k2RHR2S2ErTGZTTXFyY2pOZjJlSEpZNnhubmxKdgpGenpGL2RiVWVTbWxSekR0WkdlcXZXaHFISy9iTjIyeWJhOU1WMDlRQ0JFTk5jNmtWajJTVHpUWkJVbEx4QzYrCk93Z0wyZHhKendWelU0VC84ajdHalRUN05BZVpFS2FvRHFyRG5BYWkyaW5oZU1JVWZHRXFGKzJyQW9HQkFOMVAKK0tZL0lsS3RWRzRKSklQNzBjUis3RmpyeXJpY05iWCtQVzUvOXFHaWxnY2grZ3l4b25BWlBpd2NpeDN3QVpGdwpaZC96ZFB2aTBkWEppc1BSZjRMazg5b2pCUmpiRmRmc2l5UmJYbyt3TFU4NUhRU2NGMnN5aUFPaTVBRHdVU0FkCm45YWFweUNweEFkREtERHdObit3ZFhtaTZ0OHRpSFRkK3RoVDhkaVpBb0dCQUt6Wis1bG9OOTBtYlF4VVh5YUwKMjFSUm9tMGJjcndsTmVCaWNFSmlzaEhYa2xpSVVxZ3hSZklNM2hhUVRUcklKZENFaHFsV01aV0xPb2I2NTNyZgo3aFlMSXM1ZUtka3o0aFRVdnpldm9TMHVXcm9CV2xOVHlGanIrSWhKZnZUc0hpOGdsU3FkbXgySkJhZUFVWUNXCndNdlQ4NmNLclNyNkQrZG8wS05FZzFsL0FvR0FlMkFVdHVFbFNqLzBmRzgrV3hHc1RFV1JqclRNUzRSUjhRWXQKeXdjdFA4aDZxTGxKUTRCWGxQU05rMXZLTmtOUkxIb2pZT2pCQTViYjhibXNVU1BlV09NNENoaFJ4QnlHbmR2eAphYkJDRkFwY0IvbEg4d1R0alVZYlN5T294ZGt5OEp0ek90ajJhS0FiZHd6NlArWDZDODhjZmxYVFo5MWpYL3RMCjF3TmRKS2tDZ1lCbyt0UzB5TzJ2SWFmK2UwSkN5TGhzVDQ5cTN3Zis2QWVqWG
x2WDJ1VnRYejN5QTZnbXo5aCsKcDNlK2JMRUxwb3B0WFhNdUFRR0xhUkcrYlNNcjR5dERYbE5ZSndUeThXczNKY3dlSTdqZVp2b0ZpbmNvVlVIMwphdmxoTUVCRGYxSjltSDB5cDBwWUNaS2ROdHNvZEZtQktzVEtQMjJhTmtsVVhCS3gyZzR6cFE9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo= + + ## The secret with a TLS certificate and key for the default HTTPS server. + ## The value must follow the following format: `<namespace>/<name>`. + ## Used as an alternative to specifying a certificate and key using `controller.defaultTLS.cert` and `controller.defaultTLS.key` parameters. + ## Format: <namespace>/<name> + secret: + + wildcardTLS: + ## The base64-encoded TLS certificate for every Ingress/VirtualServer host that has TLS enabled but no secret specified. + ## If the parameter is not set, for such Ingress/VirtualServer hosts NGINX will break any attempt to establish a TLS connection. + cert: "" + + ## The base64-encoded TLS key for every Ingress/VirtualServer host that has TLS enabled but no secret specified. + ## If the parameter is not set, for such Ingress/VirtualServer hosts NGINX will break any attempt to establish a TLS connection. + key: "" + + ## The secret with a TLS certificate and key for every Ingress/VirtualServer host that has TLS enabled but no secret specified. + ## The value must follow the following format: `<namespace>/<name>`. + ## Used as an alternative to specifying a certificate and key using `controller.wildcardTLS.cert` and `controller.wildcardTLS.key` parameters. + ## Format: <namespace>/<name> + secret: + + ## The node selector for pod assignment for the Ingress Controller pods. + nodeSelector: {} + + ## The termination grace period of the Ingress Controller pod. + terminationGracePeriodSeconds: 30 + + ## The resources of the Ingress Controller pods. + resources: + requests: + cpu: 100m + memory: 128Mi + # limits: + # cpu: 1 + # memory: 1Gi + + + ## The tolerations of the Ingress Controller pods. + tolerations: [] + + ## The affinity of the Ingress Controller pods. + affinity: {} + + ## The topology spread constraints of the Ingress controller pods. + topologySpreadConstraints: {} + + ## The volumes of the Ingress Controller pods. + volumes: [] + # - name: extra-conf + # configMap: + # name: extra-conf + + ## The volumeMounts of the Ingress Controller pods. + volumeMounts: [] + # - name: extra-conf + # mountPath: /etc/nginx/conf.d/extra.conf + # subPath: extra.conf + + ## InitContainers for the Ingress Controller pods. + initContainers: [] + # - name: init-container + # image: busybox:1.34 + # command: ['sh', '-c', 'echo this is initial setup!'] + + ## The minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available. + minReadySeconds: 0 + + ## Strategy used to replace old Pods by new ones. .spec.strategy.type can be "Recreate" or "RollingUpdate". "RollingUpdate" is the default value. + strategy: {} + + ## Extra containers for the Ingress Controller pods. + extraContainers: [] + # - name: container + # image: busybox:1.34 + # command: ['sh', '-c', 'echo this is a sidecar!'] + + ## The number of replicas of the Ingress Controller deployment. + replicaCount: 1 + + ## A class of the Ingress Controller. + + ## IngressClass resource with the name equal to the class must be deployed. Otherwise, + ## the Ingress Controller will fail to start. + ## The Ingress Controller only processes resources that belong to its class - i.e. have the "ingressClassName" field resource equal to the class.
+ + ## The Ingress Controller processes all the resources that do not have the "ingressClassName" field for all versions of kubernetes. + ingressClass: nginx + + ## New Ingresses without an ingressClassName field specified will be assigned the class specified in `controller.ingressClass`. + setAsDefaultIngress: false + + ## Namespace to watch for Ingress resources. By default the Ingress Controller watches all namespaces. + watchNamespace: "" + + ## Enable the custom resources. + enableCustomResources: true + + ## Enable preview policies. This parameter is deprecated. To enable OIDC Policies please use controller.enableOIDC instead. + enablePreviewPolicies: false + + ## Enable OIDC policies. + enableOIDC: false + + ## Enable TLS Passthrough on port 443. Requires controller.enableCustomResources. + enableTLSPassthrough: false + + ## Enable cert manager for Virtual Server resources. Requires controller.enableCustomResources. + enableCertManager: false + + ## Enable external DNS for Virtual Server resources. Requires controller.enableCustomResources. + enableExternalDNS: false + + globalConfiguration: + ## Creates the GlobalConfiguration custom resource. Requires controller.enableCustomResources. + create: false + + ## The spec of the GlobalConfiguration for defining the global configuration parameters of the Ingress Controller. + spec: {} + # listeners: + # - name: dns-udp + # port: 5353 + # protocol: UDP + # - name: dns-tcp + # port: 5353 + # protocol: TCP + + ## Enable custom NGINX configuration snippets in Ingress, VirtualServer, VirtualServerRoute and TransportServer resources. + enableSnippets: false + + ## Add a location based on the value of health-status-uri to the default server. The location responds with the 200 status code for any request. + ## Useful for external health-checking of the Ingress Controller. + healthStatus: false + + ## Sets the URI of health status location in the default server. Requires controller.healthStatus. + healthStatusURI: "/nginx-health" + + nginxStatus: + ## Enable the NGINX stub_status, or the NGINX Plus API. + enable: true + + ## Set the port where the NGINX stub_status or the NGINX Plus API is exposed. + port: 8080 + + ## Add IPv4 IP/CIDR blocks to the allow list for NGINX stub_status or the NGINX Plus API. Separate multiple IP/CIDR by commas. + allowCidrs: "127.0.0.1" + + service: + ## Creates a service to expose the Ingress Controller pods. + create: true + + ## The type of service to create for the Ingress Controller. + type: NodePort + + ## The externalTrafficPolicy of the service. The value Local preserves the client source IP. + externalTrafficPolicy: Local + + ## The annotations of the Ingress Controller service. + annotations: {} + + ## The extra labels of the service. + extraLabels: {} + + ## The static IP address for the load balancer. Requires controller.service.type set to LoadBalancer. The cloud provider must support this feature. + loadBalancerIP: "" + + ## The list of external IPs for the Ingress Controller service. + externalIPs: [] + + ## The IP ranges (CIDR) that are allowed to access the load balancer. Requires controller.service.type set to LoadBalancer. The cloud provider must support this feature. + loadBalancerSourceRanges: [] + + ## The name of the service + ## Autogenerated if not set or set to "". + # name: nginx-ingress + + ## Whether to automatically allocate NodePorts (only for LoadBalancers). + # allocateLoadBalancerNodePorts: true + + ## Dual stack preference. 
+ ## Valid values: SingleStack, PreferDualStack, RequireDualStack + # ipFamilyPolicy: SingleStack + + ## List of IP families assigned to this service. + ## Valid values: IPv4, IPv6 + # ipFamilies: + # - IPv6 + + httpPort: + ## Enables the HTTP port for the Ingress Controller service. + enable: true + + ## The HTTP port of the Ingress Controller service. + port: 80 + + ## The custom NodePort for the HTTP port. Requires controller.service.type set to NodePort. + nodePort: "30123" + + ## The HTTP port on the POD where the Ingress Controller service is running. + targetPort: 80 + + httpsPort: + ## Enables the HTTPS port for the Ingress Controller service. + enable: true + + ## The HTTPS port of the Ingress Controller service. + port: 443 + + ## The custom NodePort for the HTTPS port. Requires controller.service.type set to NodePort. + nodePort: "30124" + + ## The HTTPS port on the POD where the Ingress Controller service is running. + targetPort: 443 + + ## A list of custom ports to expose through the Ingress Controller service. Follows the conventional Kubernetes yaml syntax for service ports. + customPorts: [] + + serviceAccount: + ## The name of the service account of the Ingress Controller pods. Used for RBAC. + ## Autogenerated if not set or set to "". + # name: nginx-ingress + + ## The name of the secret containing docker registry credentials. + ## Secret must exist in the same namespace as the helm release. + imagePullSecretName: "" + + reportIngressStatus: + ## Updates the address field in the status of Ingress resources with an external address of the Ingress Controller. + ## You must also specify the source of the external address either through an external service via controller.reportIngressStatus.externalService, + ## controller.reportIngressStatus.ingressLink or the external-status-address entry in the ConfigMap via controller.config.entries. + ## Note: controller.config.entries.external-status-address takes precedence over the others. + enable: true + + ## Specifies the name of the service with the type LoadBalancer through which the Ingress Controller is exposed externally. + ## The external address of the service is used when reporting the status of Ingress, VirtualServer and VirtualServerRoute resources. + ## controller.reportIngressStatus.enable must be set to true. + ## The default is autogenerated and matches the created service (see controller.service.create). + # externalService: nginx-ingress + + ## Specifies the name of the IngressLink resource, which exposes the Ingress Controller pods via a BIG-IP system. + ## The IP of the BIG-IP system is used when reporting the status of Ingress, VirtualServer and VirtualServerRoute resources. + ## controller.reportIngressStatus.enable must be set to true. + ingressLink: "" + + ## Enable Leader election to avoid multiple replicas of the controller reporting the status of Ingress resources. controller.reportIngressStatus.enable must be set to true. + enableLeaderElection: true + + ## Specifies the name of the ConfigMap, within the same namespace as the controller, used as the lock for leader election. controller.reportIngressStatus.enableLeaderElection must be set to true. + ## Autogenerated if not set or set to "". + # leaderElectionLockName: "nginx-ingress-leader-election" + + ## The annotations of the leader election configmap. + annotations: {} + + pod: + ## The annotations of the Ingress Controller pod. + annotations: {} + + ## The additional extra labels of the Ingress Controller pod. 
+ extraLabels: {} + + ## The PriorityClass of the ingress controller pods. + priorityClassName: + + readyStatus: + ## Enables readiness endpoint "/nginx-ready". The endpoint returns a success code when NGINX has loaded all the config after startup. + enable: true + + ## Set the port where the readiness endpoint is exposed. + port: 8081 + + ## Enable collection of latency metrics for upstreams. Requires prometheus.create. + enableLatencyMetrics: false + +rbac: + ## Configures RBAC. + create: true + +prometheus: + ## Expose NGINX or NGINX Plus metrics in the Prometheus format. + create: true + + ## Configures the port to scrape the metrics. + port: 9113 + + ## Specifies the namespace/name of a Kubernetes TLS Secret which will be used to protect the Prometheus endpoint. + secret: "" + + ## Configures the HTTP scheme used. + scheme: http + +nginxServiceMesh: + ## Enables integration with NGINX Service Mesh. + ## Requires controller.nginxplus + enable: false + + ## Enables NGINX Service Mesh workload to route egress traffic through the Ingress Controller. + ## Requires nginxServiceMesh.enable + enableEgress: false diff --git a/roles/kubespray_patch/tasks/main.yml b/roles/kubespray_patch/tasks/main.yml index 10d8bbd2..babbff01 100644 --- a/roles/kubespray_patch/tasks/main.yml +++ b/roles/kubespray_patch/tasks/main.yml @@ -34,11 +34,22 @@ with_items: "{{ host_vars_details.results }}" when: item.stat.exists -- name: WA remove missing aufs-tools package from kubernetes/preinstall required_pkgs +- name: add retries for restart of api-server + blockinfile: + path: "{{ (playbook_dir, 'kubespray', 'roles', 'kubernetes', 'preinstall', 'handlers', 'main.yml') | path_join }}" + insertafter: "crictl" + block: |2 + register: preinstall_restart_apiserver + retries: 10 + until: preinstall_restart_apiserver.rc == 0 + delay: 1 + when: adq_dp.enabled | d(false) + +- name: remove wireguard package lineinfile: - path: "{{ playbook_dir }}/kubespray/roles/kubernetes/preinstall/vars/ubuntu.yml" - regexp: ' - aufs-tools' + path: "{{ (playbook_dir, 'kubespray', 'roles', 'network_plugin', 'calico', 'vars', 'redhat.yml') | path_join }}" state: absent + regexp: ' - wireguard-dkms' when: - - (target_distribution == "Ubuntu" and target_distribution_version in ['20.04', '22.04']) or - (vm_enabled and vm_image_distribution == "ubuntu" and vm_image_version_ubuntu in ['20.04', '22.04']) + - wireguard_enabled + - target_distribution == "Rocky" and target_distribution_version >= "9.0" diff --git a/roles/kubespray_target_setup/files/eventconfig.yaml b/roles/kubespray_target_setup/files/eventconfig.yaml index 0a3cf58b..c62b9532 100644 --- a/roles/kubespray_target_setup/files/eventconfig.yaml +++ b/roles/kubespray_target_setup/files/eventconfig.yaml @@ -1,15 +1,16 @@ +--- kind: Configuration apiVersion: eventratelimit.admission.k8s.io/v1alpha1 limits: -- type: Server - qps: 10 - burst: 50 -- type: Namespace - qps: 50 - burst: 100 -- type: User - qps: 10 - burst: 50 -- type: SourceAndObject - qps: 10 - burst: 50 + - type: Server + qps: 10 + burst: 50 + - type: Namespace + qps: 50 + burst: 100 + - type: User + qps: 10 + burst: 50 + - type: SourceAndObject + qps: 10 + burst: 50 diff --git a/roles/kubespray_target_setup/files/podsecurity.yaml b/roles/kubespray_target_setup/files/podsecurity.yaml new file mode 100644 index 00000000..67a94964 --- /dev/null +++ b/roles/kubespray_target_setup/files/podsecurity.yaml @@ -0,0 +1,13 @@ +apiVersion: pod-security.admission.config.k8s.io/v1beta1 +kind: PodSecurityConfiguration +defaults: + 
enforce: "privileged" + enforce-version: "latest" + audit: "restricted" + audit-version: "latest" + warn: "restricted" + warn-version: "latest" +exemptions: + usernames: [] + runtimeClasses: [] + namespaces: [] diff --git a/roles/kubespray_target_setup/tasks/main.yml b/roles/kubespray_target_setup/tasks/main.yml index 0bf8c05b..6c1f7c83 100644 --- a/roles/kubespray_target_setup/tasks/main.yml +++ b/roles/kubespray_target_setup/tasks/main.yml @@ -65,11 +65,31 @@ force: yes when: inventory_hostname in groups['kube_control_plane'] -- name: copy EventRateLimit configuration file +- name: copy admission controller configuration files copy: - dest: "{{ kube_config_dir }}/admission-control/eventconfig.yaml" - src: eventconfig.yaml + src: "{{ item }}" + dest: "{{ kube_config_dir }}/admission-control/" owner: root group: root mode: 0644 + with_fileglob: '*.yaml' when: inventory_hostname in groups['kube_control_plane'] + +- block: + - name: networkd | disable managing of foreign routes + lineinfile: + path: "{{ item.dest }}" + line: "{{ item.line }}" + regexp: "{{ item.regex }}" + firstmatch: yes + loop: + - {dest: "/etc/systemd/networkd.conf", + regex: "#ManageForeignRoutingPolicyRules=yes", line: "ManageForeignRoutingPolicyRules=no"} + - {dest: "/etc/systemd/networkd.conf", + regex: "#ManageForeignRoutes=yes", line: "ManageForeignRoutes=no"} + + - name: Restart systemd-networkd + systemd: + name: systemd-networkd.service + state: restarted + when: kube_network_plugin == "cni" diff --git a/roles/kubespray_target_setup/templates/config.yaml.j2 b/roles/kubespray_target_setup/templates/config.yaml.j2 index 6f0819f5..9f439ca4 100644 --- a/roles/kubespray_target_setup/templates/config.yaml.j2 +++ b/roles/kubespray_target_setup/templates/config.yaml.j2 @@ -3,3 +3,5 @@ kind: AdmissionConfiguration plugins: - name: EventRateLimit path: eventconfig.yaml +- name: PodSecurity + path: podsecurity.yaml \ No newline at end of file diff --git a/roles/kubespray_target_setup/templates/multus.conf.j2 b/roles/kubespray_target_setup/templates/multus.conf.j2 index 6b05d1f7..11953927 100644 --- a/roles/kubespray_target_setup/templates/multus.conf.j2 +++ b/roles/kubespray_target_setup/templates/multus.conf.j2 @@ -2,6 +2,8 @@ "name": "multus-cni-network", "type": "multus", "cniVersion": "0.3.1", + "logFile": "/var/log/multus.log", + "logLevel": "debug", "capabilities": { "portMappings": true }, @@ -11,12 +13,23 @@ "name": "default-cni-network", "plugins": [ { +{% if kube_network_plugin == "cni" %} + "name": "k8s-pod-network", + "cniVersion": "0.3.1", + "type": "cilium-cni" +{% endif %} {% if kube_network_plugin == "calico" %} "name": "k8s-pod-network", "cniVersion": "0.3.1", "type": "calico", - "log_level": "info", + "log_level": "debug", "log_file_path": "/var/log/calico/cni/cni.log", + "logOptions": { + "maxAge": 5, + "maxSize": 100, + "maxBackups": 5, + "compress": true + }, "datastore_type": "kubernetes", "ipam": { "type": "calico-ipam" diff --git a/roles/linkerd_service_mesh/defaults/main.yml b/roles/linkerd_service_mesh/defaults/main.yml new file mode 100644 index 00000000..e86918de --- /dev/null +++ b/roles/linkerd_service_mesh/defaults/main.yml @@ -0,0 +1,25 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# defaults file for linkerd-cli +linkerd_cli_arch: "amd64" +linkerd_version: "2.12.0" +linkerd_cli_version: "{{ linkerd_version }}" +linkerd_cli_uri: + "https://github.com/linkerd/linkerd2/releases/download/stable-{{ linkerd_cli_version }}/\ + linkerd2-cli-stable-{{ linkerd_cli_version }}-linux-{{ linkerd_cli_arch }}" +linkerd_cli_bin: "/usr/local/bin/linkerd" +kubectl_cli_bin: "/usr/local/bin/kubectl" diff --git a/roles/linkerd_service_mesh/handlers/main.yml b/roles/linkerd_service_mesh/handlers/main.yml new file mode 100644 index 00000000..f911bb0b --- /dev/null +++ b/roles/linkerd_service_mesh/handlers/main.yml @@ -0,0 +1,17 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +# handlers file for linkerd-cli diff --git a/roles/linkerd_service_mesh/tasks/main.yml b/roles/linkerd_service_mesh/tasks/main.yml new file mode 100644 index 00000000..446f5337 --- /dev/null +++ b/roles/linkerd_service_mesh/tasks/main.yml @@ -0,0 +1,90 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +--- +- name: Fetch and install linkerd cli + ansible.builtin.get_url: + url: "{{ linkerd_cli_uri }}" + dest: "{{ linkerd_cli_bin }}" + owner: "root" + group: "root" + mode: "0755" + become: true + +- name: Fetch information about namespaces + ansible.builtin.command: "{{ kubectl_cli_bin }} get namespaces" + register: k8s_namespaces + changed_when: false + +- name: Debug list of namespaces + ansible.builtin.debug: + var: k8s_namespaces.stdout + +- name: Fetch list of ClusterRoleBindings + ansible.builtin.command: "{{ kubectl_cli_bin }} get ClusterRoleBindings" + register: k8s_cluster_role_bindings + changed_when: false + +- name: Debug list of ClusterRoleBindings + ansible.builtin.debug: + var: k8s_cluster_role_bindings.stdout + +- name: Install LinkerD + block: + - name: Get linkerd pre-check + ansible.builtin.command: "{{ linkerd_cli_bin }} check --pre" + register: linkerd_pre_check + changed_when: false + + - name: Debug linkerd pre-check + ansible.builtin.debug: + msg: "{{ linkerd_pre_check.stdout_lines }}" + + - name: Install linkerd crds + ansible.builtin.shell: "set -o pipefail && {{ linkerd_cli_bin }} install --crds | {{ kubectl_cli_bin }} apply -f -" + args: + executable: /bin/bash + register: linkerd_crds_installation + changed_when: true + + - name: Debug linkerd crds installation + ansible.builtin.debug: + msg: "{{ linkerd_crds_installation }}" + + - name: Install linkerd + ansible.builtin.shell: "set -o pipefail && {{ linkerd_cli_bin }} install --set proxyInit.runAsRoot=true | {{ kubectl_cli_bin }} apply -f -" + args: + executable: /bin/bash + warn: false + register: linkerd_installation + become: true + changed_when: true + + - name: Debug linkerd installation + ansible.builtin.debug: + msg: "{{ linkerd_installation.stdout_lines }}" + + - name: Delete LinkerD Heartbeat CronJob if a proxy is configured + ansible.builtin.shell: "set -o pipefail && {{ kubectl_cli_bin }} delete cronjob linkerd-heartbeat -n linkerd" + args: + executable: /bin/bash + warn: false + when: + # run whenever at least one proxy variable is set + - http_proxy is defined or https_proxy is defined + when: + - not k8s_namespaces.stdout is search('linkerd') + - not k8s_cluster_role_bindings.stdout is search('linkerd-linkerd-proxy-injector') diff --git a/roles/linkerd_service_mesh/vars/main.yml b/roles/linkerd_service_mesh/vars/main.yml new file mode 100644 index 00000000..1aa7580a --- /dev/null +++ b/roles/linkerd_service_mesh/vars/main.yml @@ -0,0 +1,16 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License.
+## +--- diff --git a/roles/load_ddp/tasks/update_network_card.yml b/roles/load_ddp/tasks/update_network_card.yml index f347d645..7519e333 100644 --- a/roles/load_ddp/tasks/update_network_card.yml +++ b/roles/load_ddp/tasks/update_network_card.yml @@ -56,7 +56,7 @@ - name: copy custom .pkg file copy: remote_src: yes - src: "{{ ddp_path }}{{ custom_path_to_profile|basename }}" + src: "{{ ddp_path }}{{ custom_path_to_profile | basename }}" dest: "{{ (ddp_path, 'ice-' + card_id.stdout + '.pkg') | path_join }}" owner: root group: root diff --git a/roles/load_ddp/templates/ddp_ice_service.j2 b/roles/load_ddp/templates/ddp_ice_service.j2 index fef4ff8a..33eb04e2 100644 --- a/roles/load_ddp/templates/ddp_ice_service.j2 +++ b/roles/load_ddp/templates/ddp_ice_service.j2 @@ -5,8 +5,14 @@ AssertPathExists=/sbin/modprobe [Service] Type=oneshot ExecStartPre=/bin/sleep 5 -ExecStart=/sbin/modprobe -r {% if (ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04") %}irdma{% endif %} ice -ExecStart=/sbin/modprobe ice {% if (ansible_distribution == "Ubuntu" and ansible_distribution_version == "22.04") %}irdma{% endif %} - +{% if (not update_nic_drivers and +((ansible_distribution == "Ubuntu" and ansible_distribution_version >= "22.04") or +(ansible_os_family == "RedHat" and ansible_distribution_version >= "8.6"))) %} +ExecStart=/sbin/modprobe -r irdma ice +ExecStart=/sbin/modprobe -a ice irdma +{% else %} +ExecStart=/sbin/modprobe -r ice +ExecStart=/sbin/modprobe -a ice +{% endif %} [Install] WantedBy=multi-user.target diff --git a/roles/minio_install/charts/LICENSE b/roles/minio_install/charts/LICENSE new file mode 100644 index 00000000..0ad25db4 --- /dev/null +++ b/roles/minio_install/charts/LICENSE @@ -0,0 +1,661 @@ + GNU AFFERO GENERAL PUBLIC LICENSE + Version 3, 19 November 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU Affero General Public License is a free, copyleft license for +software and other kinds of works, specifically designed to ensure +cooperation with the community in the case of network server software. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +our General Public Licenses are intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + Developers that use our General Public Licenses protect your rights +with two steps: (1) assert copyright on the software, and (2) offer +you this License which gives you legal permission to copy, distribute +and/or modify the software. + + A secondary benefit of defending all users' freedom is that +improvements made in alternate versions of the program, if they +receive widespread use, become available for other developers to +incorporate. Many developers of free software are heartened and +encouraged by the resulting cooperation. 
However, in the case of +software used on network servers, this result may fail to come about. +The GNU General Public License permits making a modified version and +letting the public access it on a server without ever releasing its +source code to the public. + + The GNU Affero General Public License is designed specifically to +ensure that, in such cases, the modified source code becomes available +to the community. It requires the operator of a network server to +provide the source code of the modified version running there to the +users of that server. Therefore, public use of a modified version, on +a publicly accessible server, gives the public access to the source +code of the modified version. + + An older license, called the Affero General Public License and +published by Affero, was designed to accomplish similar goals. This is +a different license, not a version of the Affero GPL, but Affero has +released a new version of the Affero GPL which permits relicensing under +this license. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU Affero General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. 
+ + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. 
+ + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. 
If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). 
To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Remote Network Interaction; Use with the GNU General Public License. 
+ + Notwithstanding any other provision of this License, if you modify the +Program, your modified version must prominently offer all users +interacting with it remotely through a computer network (if your version +supports such interaction) an opportunity to receive the Corresponding +Source of your version by providing access to the Corresponding Source +from a network server at no charge, through some standard or customary +means of facilitating copying of software. This Corresponding Source +shall include the Corresponding Source for any work covered by version 3 +of the GNU General Public License that is incorporated pursuant to the +following paragraph. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the work with which it is combined will remain governed by version +3 of the GNU General Public License. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU Affero General Public License from time to time. Such new versions +will be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU Affero General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU Affero General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU Affero General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. 
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published
+    by the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If your software can interact with users remotely through a computer
+network, you should also make sure that it provides a way for users to
+get its source.  For example, if your program is a web application, its
+interface could display a "Source" link that leads users to an archive
+of the code.  There are many ways you could offer source, and different
+solutions will be better for different programs; see section 13 for the
+specific requirements.
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU AGPL, see
+<https://www.gnu.org/licenses/>.
diff --git a/roles/minio_install/charts/operator/.helmignore b/roles/minio_install/charts/operator/.helmignore
new file mode 100644
index 00000000..50af0317
--- /dev/null
+++ b/roles/minio_install/charts/operator/.helmignore
@@ -0,0 +1,22 @@
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
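+# Example: "*.txt" would exclude every .txt file, while a later "!keep.txt"
+# would re-include keep.txt.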
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
+.vscode/
diff --git a/roles/minio_install/charts/operator/Chart.yaml b/roles/minio_install/charts/operator/Chart.yaml
index db3a00f0..9e0ed0e3 100644
--- a/roles/minio_install/charts/operator/Chart.yaml
+++ b/roles/minio_install/charts/operator/Chart.yaml
@@ -16,8 +16,8 @@
 apiVersion: v2
 description: A Helm chart for MinIO Operator
 name: operator
-version: 4.4.1
-appVersion: v4.4.1
+version: 4.4.28
+appVersion: v4.4.28
 keywords:
   - storage
   - object-storage
diff --git a/roles/minio_install/charts/operator/README.md b/roles/minio_install/charts/operator/README.md
new file mode 100644
index 00000000..816df345
--- /dev/null
+++ b/roles/minio_install/charts/operator/README.md
@@ -0,0 +1,60 @@
+
+# MinIO ![license](https://img.shields.io/badge/license-AGPL%20V3-blue)
+
+[MinIO](https://min.io) is a High Performance Object Storage released under GNU AGPLv3 or later. It is API compatible
+with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics
+and application data workloads.
+
+For more detailed documentation please visit [here](https://docs.minio.io/)
+
+Introduction
+------------
+
+This chart bootstraps MinIO Operator on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
+
+Configure MinIO Helm repo
+--------------------
+
+```bash
+helm repo add minio https://operator.min.io/
+```
+
+Installing the Chart
+--------------------
+
+Install this chart using:
+
+```bash
+helm install \
+  --namespace minio-operator \
+  --create-namespace \
+  minio-operator minio/operator
+```
+
+The command deploys MinIO Operator on the Kubernetes cluster in the default configuration.
+
+Creating a Tenant
+-----------------
+
+Once the MinIO Operator Chart is successfully installed, create a MinIO Tenant using:
+
+```bash
+helm install --namespace tenant-ns \
+  --create-namespace tenant minio/tenant
+```
+
+This creates a 4 Node MinIO Tenant (cluster). To change the default values, take a look at various [values.yaml](https://github.com/minio/operator/blob/master/helm/tenant/values.yaml).
diff --git a/roles/minio_install/charts/operator/templates/NOTES.txt b/roles/minio_install/charts/operator/templates/NOTES.txt
index d2fd6ae4..53d326f9 100644
--- a/roles/minio_install/charts/operator/templates/NOTES.txt
+++ b/roles/minio_install/charts/operator/templates/NOTES.txt
@@ -1,5 +1,15 @@
 1. Get the JWT for logging in to the console:
-kubectl get secret $(kubectl get serviceaccount console-sa --namespace {{ .Release.Namespace }} -o jsonpath="{.secrets[0].name}") --namespace {{ .Release.Namespace }} -o jsonpath="{.data.token}" | base64 --decode
+kubectl apply -f - <<EOF
+apiVersion: v1
+kind: Secret
+metadata:
+  name: console-sa-secret
+  namespace: {{ .Release.Namespace }}
+  annotations:
+    kubernetes.io/service-account.name: console-sa
+type: kubernetes.io/service-account-token
+EOF
+kubectl get secret console-sa-secret --namespace {{ .Release.Namespace }} -o jsonpath="{.data.token}" | base64 --decode
diff --git a/roles/minio_install/charts/tenant/README.md b/roles/minio_install/charts/tenant/README.md
new file mode 100644
--- /dev/null
+++ b/roles/minio_install/charts/tenant/README.md
+
+# MinIO ![license](https://img.shields.io/badge/license-AGPL%20V3-blue)
+
+[MinIO](https://min.io) is a High Performance Object Storage released under GNU AGPLv3 or later. It is API compatible
+with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics
+and application data workloads.
+
+For more detailed documentation please visit [here](https://docs.minio.io/)
+
+Introduction
+------------
+
+This chart bootstraps MinIO Tenant on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
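+
+Before creating a tenant, you can confirm the operator is up (a minimal sanity check; the `minio-operator` namespace below matches the operator install example and may differ in your environment):
+
+```bash
+# All operator pods should report Running before a tenant is created.
+kubectl get pods --namespace minio-operator
+```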
+ +Configure MinIO Helm repo +-------------------- + +```bash +helm repo add minio https://operator.min.io/ +``` + +Creating a Tenant with Helm Chart +----------------- + +Once the [MinIO Operator Chart](https://github.com/minio/operator/tree/master/helm/operator) is successfully installed, create a MinIO Tenant using: + +```bash +helm install --namespace tenant-ns \ + --create-namespace tenant minio/tenant +``` + +This creates a 4 Node MinIO Tenant (cluster). To change the default values, take a look at various [values.yaml](https://github.com/minio/operator/blob/master/helm/tenant/values.yaml). diff --git a/roles/minio_install/charts/tenant/templates/NOTES.txt b/roles/minio_install/charts/tenant/templates/NOTES.txt index 1dc4b5b4..235e67af 100644 --- a/roles/minio_install/charts/tenant/templates/NOTES.txt +++ b/roles/minio_install/charts/tenant/templates/NOTES.txt @@ -1,13 +1,13 @@ -{{ range .Values.tenants }} - To connect to the {{.name}} tenant if it doesn't have a service exposed, you can port-forward to it by running: - {{- if dig "certificate" "requestAutoCert" false .}} +{{- with .Values.tenant }} + To connect to the {{ .name }} tenant if it doesn't have a service exposed, you can port-forward to it by running: + {{- if dig "certificate" "requestAutoCert" false . }} - kubectl --namespace {{ .namespace }} port-forward svc/{{ .name }}-console 9443:9443 + kubectl --namespace {{ $.Release.Namespace }} port-forward svc/{{ .name }}-console 9443:9443 Then visit the MinIO Console at https://127.0.0.1:9443 {{ else }} - kubectl --namespace {{ .namespace }} port-forward svc/{{ .name }}-console 9090:9090 + kubectl --namespace {{ $.Release.Namespace }} port-forward svc/{{ .name }}-console 9090:9090 Then visit the MinIO Console at http://127.0.0.1:9090 {{ end }} -{{ end }} +{{- end }} diff --git a/roles/minio_install/charts/tenant/templates/api-ingress.yaml b/roles/minio_install/charts/tenant/templates/api-ingress.yaml new file mode 100644 index 00000000..15bfba31 --- /dev/null +++ b/roles/minio_install/charts/tenant/templates/api-ingress.yaml @@ -0,0 +1,41 @@ +{{- if .Values.ingress.api.enabled }} +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: {{ .Values.tenant.name }} + {{- with .Values.ingress.api.labels }} + labels: {{ toYaml . | nindent 4 }} + {{- end }} + {{- with .Values.ingress.api.annotations }} + annotations: {{ toYaml . | nindent 4 }} + {{- end }} +spec: + {{- if .Values.ingress.api.ingressClassName }} + ingressClassName: {{ .Values.ingress.api.ingressClassName }} + {{- end }} + {{- if .Values.ingress.api.tls }} + tls: + {{- range .Values.ingress.api.tls }} + - hosts: + {{- range .hosts }} + - {{ . 
| quote }} + {{- end }} + secretName: {{ .secretName }} + {{- end }} + {{- end }} + rules: + - host: {{ .Values.ingress.api.host }} + http: + paths: + - path: {{ .Values.ingress.api.path }} + pathType: {{ .Values.ingress.api.pathType }} + backend: + service: + name: minio + port: + {{- if or .Values.tenant.certificate.requestAutoCert (not (empty .Values.tenant.certificate.externalCertSecret)) }} + name: https-minio + {{- else }} + name: http-minio + {{- end }} +{{ end }} diff --git a/roles/minio_install/charts/tenant/templates/console-ingress.yaml b/roles/minio_install/charts/tenant/templates/console-ingress.yaml new file mode 100644 index 00000000..c036f6df --- /dev/null +++ b/roles/minio_install/charts/tenant/templates/console-ingress.yaml @@ -0,0 +1,41 @@ +{{- if .Values.ingress.console.enabled }} +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: {{ .Values.tenant.name }}-console + {{- with .Values.ingress.console.labels }} + labels: {{ toYaml . | nindent 4 }} + {{- end }} + {{- with .Values.ingress.console.annotations }} + annotations: {{ toYaml . | nindent 4 }} + {{- end }} +spec: + {{- if .Values.ingress.console.ingressClassName }} + ingressClassName: {{ .Values.ingress.console.ingressClassName }} + {{- end }} + {{- if .Values.ingress.console.tls }} + tls: + {{- range .Values.ingress.console.tls }} + - hosts: + {{- range .hosts }} + - {{ . | quote }} + {{- end }} + secretName: {{ .secretName }} + {{- end }} + {{- end }} + rules: + - host: {{ .Values.ingress.console.host }} + http: + paths: + - path: {{ .Values.ingress.console.path }} + pathType: {{ .Values.ingress.console.pathType }} + backend: + service: + name: {{ .Values.tenant.name }}-console + port: + {{- if or .Values.tenant.certificate.requestAutoCert (not (empty .Values.tenant.certificate.externalCertSecret)) }} + name: https-console + {{- else }} + name: http-console + {{- end }} +{{ end }} diff --git a/roles/minio_install/charts/tenant/templates/kes-configuration-secret.yaml b/roles/minio_install/charts/tenant/templates/kes-configuration-secret.yaml new file mode 100644 index 00000000..e6c67174 --- /dev/null +++ b/roles/minio_install/charts/tenant/templates/kes-configuration-secret.yaml @@ -0,0 +1,9 @@ +{{- if dig "tenant" "kes" "configuration" false (.Values | merge (dict)) }} +apiVersion: v1 +kind: Secret +metadata: + name: "kes-configuration" +type: Opaque +stringData: + server-config.yaml: {{ .Values.tenant.kes.configuration | toYaml | indent 2 }} +{{- end }} \ No newline at end of file diff --git a/roles/minio_install/charts/tenant/templates/tenant-configuration.yaml b/roles/minio_install/charts/tenant/templates/tenant-configuration.yaml new file mode 100644 index 00000000..e0983b53 --- /dev/null +++ b/roles/minio_install/charts/tenant/templates/tenant-configuration.yaml @@ -0,0 +1,11 @@ +{{- if dig "secrets" false (.Values | merge (dict)) }} +apiVersion: v1 +kind: Secret +metadata: + name: {{ dig "secrets" "name" "" (.Values | merge (dict)) }} +type: Opaque +stringData: + config.env: |- + export MINIO_ROOT_USER={{ .Values.secrets.accessKey | quote }} + export MINIO_ROOT_PASSWORD={{ .Values.secrets.secretKey | quote }} +{{- end }} diff --git a/roles/minio_install/charts/tenant/templates/tenant-secret-deprecated.yaml b/roles/minio_install/charts/tenant/templates/tenant-secret-deprecated.yaml new file mode 100644 index 00000000..dd04ed77 --- /dev/null +++ b/roles/minio_install/charts/tenant/templates/tenant-secret-deprecated.yaml @@ -0,0 +1,10 @@ +apiVersion: v1 +kind: Secret +metadata: + name: 
"tenant-secret" +type: Opaque +data: + ## Access Key for MinIO Tenant + accesskey: "" + ## Secret Key for MinIO Tenant + secretkey: "" diff --git a/roles/minio_install/charts/tenant/templates/tenant-secret.yaml b/roles/minio_install/charts/tenant/templates/tenant-secret.yaml deleted file mode 100644 index 7d7ac1ad..00000000 --- a/roles/minio_install/charts/tenant/templates/tenant-secret.yaml +++ /dev/null @@ -1,16 +0,0 @@ -{{ range .Values.tenants }} - {{- if dig "secrets" "enabled" false . }} ---- -apiVersion: v1 -kind: Secret -metadata: - name: {{ dig "secrets" "name" "" . }} - namespace: {{ .namespace }} -type: Opaque -data: - ## Access Key for MinIO Tenant - accesskey: {{ dig "secrets" "accessKey" "" . | b64enc }} - ## Secret Key for MinIO Tenant - secretkey: {{ dig "secrets" "secretKey" "" . | b64enc }} - {{- end }} - {{ end }} diff --git a/roles/minio_install/charts/tenant/templates/tenant.yaml b/roles/minio_install/charts/tenant/templates/tenant.yaml index 9a2a3d7c..d06323d6 100644 --- a/roles/minio_install/charts/tenant/templates/tenant.yaml +++ b/roles/minio_install/charts/tenant/templates/tenant.yaml @@ -1,40 +1,46 @@ -{{ range .Values.tenants }} ---- +{{- with .Values.tenant }} apiVersion: minio.min.io/v2 kind: Tenant metadata: name: {{ .name }} - namespace: {{ .namespace }} ## Optionally pass labels to be applied to the statefulset pods labels: app: minio - {{ if dig "metrics" "enabled" false . }} + {{- if dig "metrics" "enabled" false . }} ## Annotations for MinIO Tenant Pods annotations: prometheus.io/path: /minio/v2/metrics/cluster prometheus.io/port: {{ dig "metrics" "port" 9000 . | quote }} prometheus.io/scrape: "true" - {{ end }} - {{ if dig "scheduler" "name" "" . }} + prometheus.io/scheme: {{ dig "metrics" "protocol" "http" . | quote }} + {{- end }} + {{- if dig "scheduler" "name" "" . }} scheduler: name: {{ dig "scheduler" "name" "" . }} - {{ end }} + {{- end }} spec: - image: {{ dig "image" "repository" "minio/minio" . }}:{{ dig "image" "tag" "RELEASE.2022-01-04T07-41-07Z" . }} + image: {{ dig "image" "repository" "minio/minio" . }}:{{ dig "image" "tag" "RELEASE.2022-01-08T03-11-54Z" . }} imagePullPolicy: {{ dig "image" "pullPolicy" "IfNotPresent" . }} - {{ if dig "imagePullSecret" "name" "" . }} + {{- if dig "imagePullSecret" "name" "" . }} imagePullSecret: name: {{ dig "imagePullSecret" "name" "" . }} - {{ end }} + {{- end }} + {{- if dig "secrets" false ($.Values | merge (dict)) }} ## Secret with credentials to be used by MinIO Tenant. - {{ if dig "secrets" "enabled" false . }} + configuration: + name: {{ dig "secrets" "name" "" ($.Values | merge (dict)) }} + ## Deprecated credsSecret credsSecret: - name: {{ dig "secrets" "name" "" . }} - {{ end }} + name: "tenant-secret" + {{- end }} pools: - {{ range (dig "pools" (list) .) }} + {{- range (dig "pools" (list) .) }} - servers: {{ dig "servers" 4 . }} + name: {{ dig "name" "" . }} volumesPerServer: {{ dig "volumesPerServer" 4 . }} + {{- if dig "runtimeClassName" "" . }} + runtimeClassName: {{ dig "runtimeClassName" "" . }} + {{- end }} volumeClaimTemplate: metadata: name: data @@ -47,80 +53,271 @@ spec: storage: {{ dig "size" "10Gi" . }} {{- with (dig "annotations" (dict) .) }} annotations: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with (dig "labels" (dict) .) }} + labels: + {{- toYaml . | nindent 8 }} + {{- end }} {{- with (dig "tolerations" (list) .) }} tolerations: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . 
| nindent 8 }} + {{- end }} {{- with (dig "nodeSelector" (dict) .) }} nodeSelector: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} {{- with (dig "affinity" (dict) .) }} affinity: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} {{- with (dig "resources" (dict) .) }} resources: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} {{- with (dig "securityContext" (dict) .) }} securityContext: - {{ toYaml . | nindent 8 }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} {{- with (dig "topologySpreadConstraints" (list) .) }} topologySpreadConstraints: - {{ toYaml . | nindent 8 }} - {{ end }} - {{ end }} + {{- toYaml . | nindent 8 }} + {{- end }} + {{- end }} mountPath: {{ dig "mountPath" "/export" . }} subPath: {{ dig "subPath" "/data" . }} - {{- with (dig "certificate" "externalCaCertSecret" (dict) .) }} + {{- with (dig "certificate" "externalCaCertSecret" (list) .) }} externalCaCertSecret: - {{ toYaml . | nindent 6 }} - {{ end }} - {{- with (dig "certificate" "externalCertSecret" (dict) .) }} + {{- toYaml . | nindent 6 }} + {{- end }} + {{- with (dig "certificate" "externalCertSecret" (list) .) }} externalCertSecret: - {{ toYaml . | nindent 6 }} - {{ end }} + {{- toYaml . | nindent 6 }} + {{- end }} requestAutoCert: {{ dig "certificate" "requestAutoCert" false . }} - s3: - bucketDNS: {{ dig "s3" "bucketDNS" false . }} + {{- if dig "s3" "bucketDNS" false . }} + {{- fail "Value 'tenant.s3.bucketDNS' is deprecated since Operator v4.3.2, use 'tenant.features.bucketDNS' instead" }} + {{- end }} + features: + bucketDNS: {{ dig "features" "bucketDNS" false . }} + {{- with (dig "features" "domains" (dict) .) }} + domains: + {{- toYaml . | nindent 6 }} + {{- end }} + {{- with (dig "buckets" (list) .) }} + buckets: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "users" (list) .) }} + users: + {{- toYaml . | nindent 4 }} + {{- end }} {{- with (dig "certificate" "certConfig" (dict) .) }} certConfig: - {{ toYaml . | nindent 4 }} + {{- toYaml . | nindent 4 }} {{- end }} podManagementPolicy: {{ dig "podManagementPolicy" "Parallel" . }} - # {{- with (dig "readiness" (dict) .) }} - # readiness: - # {{ toYaml . | nindent 4 }} - # {{- end }} - # {{- with (dig "liveness" (dict) .) }} - # liveness: - # {{ toYaml . | nindent 4 }} - # {{- end }} - # {{- with (dig "exposeServices" (dict) .) }} - # exposeServices: - # {{ toYaml . | nindent 4 }} - # {{- end }} - # {{ if dig "serviceAccountName" "" . }} - # serviceAccountName: {{ dig "serviceAccountName" "" . }} - # {{ end }} - # prometheusOperator: {{ dig "prometheusOperator" "false" . }} - # {{- with (dig "logging" (dict) .) }} - # logging: - # {{ toYaml . | nindent 4 }} - # {{- end }} - # {{- with (dig "serviceMetadata" (dict) .) }} + {{- with (dig "readiness" (dict) .) }} + readiness: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "liveness" (dict) .) }} + liveness: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "exposeServices" (dict) .) }} + exposeServices: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- if dig "serviceAccountName" "" . }} + serviceAccountName: {{ dig "serviceAccountName" "" . }} + {{- end }} + prometheusOperator: {{ dig "prometheusOperator" "false" . }} + {{- with (dig "logging" (dict) .) }} + logging: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "serviceMetadata" (dict) .) }} serviceMetadata: - {{ toYaml . | nindent 4 }} + {{- toYaml . 
| nindent 4 }} {{- end }} - {{- with (dig "env" (dict) .) }} + {{- with (dig "env" (list) .) }} env: - {{ toYaml . | nindent 4 }} + {{- toYaml . | nindent 4 }} {{- end }} - {{ if dig "priorityClassName" "" . }} + {{- if dig "priorityClassName" "" . }} priorityClassName: {{ dig "priorityClassName" "" . }} - {{ end }} -{{ end }} + {{- end }} + {{- if dig "kes" "configuration" false . }} + kes: + image: {{ .kes.image | quote }} + {{- with (dig "kes" "env" (list) .) }} + env: + {{- toYaml . | nindent 4 }} + {{- end }} + replicas: {{ .kes.replicas | int }} + kesSecret: + name: "kes-configuration" + imagePullPolicy: {{ .kes.imagePullPolicy | quote }} + externalCertSecret: {{ .kes.externalCertSecret | quote }} + clientCertSecret: {{ .kes.clientCertSecret | quote }} + ## Key name to be created on the KMS, default is "my-minio-key" + keyName: {{ .kes.keyName | quote }} + {{- with (dig "resources" (dict) .) }} + resources: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "nodeSelector" (dict) .) }} + nodeSelector: + {{- toYaml . | nindent 4 }} + {{- end }} + affinity: + nodeAffinity: { } + podAffinity: { } + podAntiAffinity: { } + tolerations: [ ] + {{- with (dig "annotations" (dict) .) }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "labels" (dict) .) }} + labels: + {{- toYaml . | nindent 4 }} + {{- end }} + serviceAccountName: {{ .kes.serviceAccountName | quote }} + securityContext: + runAsUser: {{ .kes.securityContext.runAsUser | int }} + runAsGroup: {{ .kes.securityContext.runAsGroup | int }} + runAsNonRoot: {{ .kes.securityContext.runAsNonRoot }} + fsGroup: {{ .kes.securityContext.fsGroup | int }} + {{- end }} + + {{- if dig "prometheus" "diskCapacityGB" false . }} + ## Prometheus setup for MinIO Tenant. + prometheus: + image: {{ .prometheus.image | quote }} + {{- with (dig "prometheus" "env" (list) .) }} + env: + {{- toYaml . | nindent 4 }} + {{- end }} + sidecarimage: {{ .prometheus.sidecarimage | quote }} + initimage: {{ .prometheus.initimage | quote }} + diskCapacityGB: {{ .prometheus.diskCapacityGB | int }} + storageClassName: {{ .prometheus.storageClassName }} + {{- with (dig "annotations" (dict) .) }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "labels" (dict) .) }} + labels: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "nodeSelector" (dict) .) }} + nodeSelector: + {{- toYaml . | nindent 4 }} + {{- end }} + affinity: + nodeAffinity: { } + podAffinity: { } + podAntiAffinity: { } + {{- with (dig "resources" (dict) .) }} + resources: + {{- toYaml . | nindent 4 }} + {{- end }} + serviceAccountName: {{ .prometheus.serviceAccountName | quote }} + securityContext: + runAsUser: {{ .prometheus.securityContext.runAsUser | int }} + runAsGroup: {{ .prometheus.securityContext.runAsGroup | int }} + runAsNonRoot: {{ .prometheus.securityContext.runAsNonRoot }} + fsGroup: {{ .prometheus.securityContext.fsGroup | int }} + {{- end }} + + {{- if dig "log" "audit" "diskCapacityGB" false . }} + ## LogSearch API setup for MinIO Tenant. + log: + image: {{ .log.image | quote }} + {{- with (dig "log" "env" (list) .) }} + env: + {{- toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "resources" (dict) .) }} + resources: + {{ toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "nodeSelector" (dict) .) }} + nodeSelector: + {{ toYaml . | nindent 4 }} + {{- end }} + affinity: + nodeAffinity: { } + podAffinity: { } + podAntiAffinity: { } + tolerations: [ ] + {{- with (dig "annotations" (dict) .) }} + annotations: + {{ toYaml . 
| nindent 4 }} + {{- end }} + {{- with (dig "labels" (dict) .) }} + labels: + {{ toYaml . | nindent 4 }} + {{- end }} + audit: + diskCapacityGB: {{ .log.audit.diskCapacityGB | int }} + db: + image: {{ .log.db.image | quote }} + {{- with (dig "log" "db" "env" (list) .) }} + env: + {{- range . }} + - name: {{ .name | quote }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + initimage: {{ .log.db.initimage | quote }} + volumeClaimTemplate: + {{- with (dig "metadata" (dict) .) }} + metadata: + {{ toYaml . | nindent 4 }} + {{- end }} + spec: + storageClassName: {{ .log.db.volumeClaimTemplate.spec.storageClassName | quote }} + accessModes: + - ReadWriteOnce + resources: + requests: + storage: {{ .log.db.volumeClaimTemplate.spec.resources.requests.storage | quote }} + {{- with (dig "resources" (dict) .) }} + resources: + {{ toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "nodeSelector" (dict) .) }} + nodeSelector: + {{ toYaml . | nindent 4 }} + {{- end }} + affinity: + nodeAffinity: { } + podAffinity: { } + podAntiAffinity: { } + tolerations: [ ] + + {{- with (dig "annotations" (dict) .) }} + annotations: + {{ toYaml . | nindent 4 }} + {{- end }} + {{- with (dig "labels" (dict) .) }} + labels: + {{ toYaml . | nindent 4 }} + {{- end }} + serviceAccountName: {{ .log.db.serviceAccountName | quote }} + securityContext: + runAsUser: {{ .log.db.securityContext.runAsUser | int }} + runAsGroup: {{ .log.db.securityContext.runAsGroup | int }} + runAsNonRoot: true + fsGroup: {{ .log.db.securityContext.fsGroup | int }} + serviceAccountName: {{ .log.serviceAccountName | quote }} + securityContext: + runAsUser: {{ .log.securityContext.runAsUser | int }} + runAsGroup: {{ .log.securityContext.runAsGroup | int }} + runAsNonRoot: true + fsGroup: {{ .log.securityContext.fsGroup | int }} + {{- end }} +{{- end }} diff --git a/roles/minio_install/charts/tenant/values.yaml b/roles/minio_install/charts/tenant/values.yaml index 3247bb5e..d3e39eb4 100644 --- a/roles/minio_install/charts/tenant/values.yaml +++ b/roles/minio_install/charts/tenant/values.yaml @@ -14,116 +14,307 @@ ## limitations under the License. ## +## Secret with credentials to be used by MinIO Tenant +secrets: + # create a kubernetes configuration secret object with the accessKey and secretKey as defined here. + name: minio1-env-configuration + accessKey: minio + secretKey: minio123 + ## MinIO Tenant Definition -tenants: +tenant: # Tenant name - - name: minio1 - ## Registry location and Tag to download MinIO Server image - image: - repository: quay.io/minio/minio - tag: RELEASE.2022-01-04T07-41-07Z - pullPolicy: IfNotPresent - ## Customize namespace for tenant deployment - namespace: default - ## Customize any private registry image pull secret. - ## currently only one secret registry is supported - imagePullSecret: { } - ## If a scheduler is specified here, Tenant pods will be dispatched by specified scheduler. - ## If not specified, the Tenant pods will be dispatched by default scheduler. - scheduler: { } - ## Specification for MinIO Pool(s) in this Tenant. - pools: - ## Servers specifies the number of MinIO Tenant Pods / Servers in this pool. - ## For standalone mode, supply 1. For distributed mode, supply 4 or more. - ## Note that the operator does not support upgrading from standalone to distributed mode. - - servers: 4 - ## volumesPerServer specifies the number of volumes attached per MinIO Tenant Pod / Server. 
- volumesPerServer: 4 - ## size specifies the capacity per volume - size: 1Gi - ## storageClass specifies the storage class name to be used for this pool - storageClassName: standard - ## Used to specify annotations for pods - annotations: { } - ## Used to specify a toleration for a pod - tolerations: { } - ## nodeSelector parameters for MinIO Pods. It specifies a map of key-value pairs. For the pod to be - ## eligible to run on a node, the node must have each of the - ## indicated key-value pairs as labels. - ## Read more here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ - nodeSelector: { } - ## Affinity settings for MinIO pods. Read more about affinity - ## here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity. - affinity: { } - ## Configure resource requests and limits for MinIO containers - resources: { } - ## Configure security context - securityContext: { } - ## Configure topology constraints - topologySpreadConstraints: [ ] - ## Mount path where PV will be mounted inside container(s). - mountPath: /export - ## Sub path inside Mount path where MinIO stores data. - subPath: /data - # pool secrets - secrets: - # create a kubernetes secret object with the accessKey and secretKey as defined here. - enabled: true - name: minio1-secret - accessKey: minio - secretKey: minio123 - # pool metrics to be read by Prometheus - metrics: - enabled: false - port: 9000 - certificate: - ## Use this field to provide one or more external CA certificates. This is used by MinIO - ## to verify TLS connections with other applications: - ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret - externalCaCertSecret: { } - ## Use this field to provide a list of Secrets with external certificates. This can be used to to configure - ## TLS for MinIO Tenant pods. Create secrets as explained here: - ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret - externalCertSecret: { } - ## Enable automatic Kubernetes based certificate generation and signing as explained in - ## https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster - requestAutoCert: true - ## This field is used only when "requestAutoCert" is set to true. Use this field to set CommonName - ## for the auto-generated certificate. Internal DNS name for the pod will be used if CommonName is - ## not provided. DNS name format is *.minio.default.svc.cluster.local - certConfig: { } - ## Enable S3 specific features such as Bucket DNS which would allow `buckets` to be - ## accessible as DNS entries of form `.minio.default.svc.cluster.local` - s3: - ## This feature is turned off by default - bucketDNS: false - ## PodManagement policy for MinIO Tenant Pods. Can be "OrderedReady" or "Parallel" - ## Refer https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy - ## for details. - podManagementPolicy: Parallel - # Liveness Probe for container liveness. Container will be restarted if the probe fails. - # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes. - liveness: { } - # Readiness Probe for container readiness. Container will be removed from service endpoints if the probe fails. - # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ - readiness: { } - ## exposeServices defines the exposure of the MinIO object storage and Console services. 
- ## service is exposed as a loadbalancer in k8s service. - exposeServices: { } - # kubernetes service account associated with a specific tenant - # https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ + name: minio1 + ## Registry location and Tag to download MinIO Server image + image: + repository: quay.io/minio/minio + tag: RELEASE.2022-05-26T05-48-41Z + pullPolicy: IfNotPresent + ## Customize any private registry image pull secret. + ## currently only one secret registry is supported + imagePullSecret: {} + ## If a scheduler is specified here, Tenant pods will be dispatched by specified scheduler. + ## If not specified, the Tenant pods will be dispatched by default scheduler. + scheduler: {} + ## Specification for MinIO Pool(s) in this Tenant. + pools: + ## Servers specifies the number of MinIO Tenant Pods / Servers in this pool. + ## For standalone mode, supply 1. For distributed mode, supply 4 or more. + ## Note that the operator does not support upgrading from standalone to distributed mode. + - servers: 2 + ## custom name for the pool + name: pool-0 + ## volumesPerServer specifies the number of volumes attached per MinIO Tenant Pod / Server. + volumesPerServer: 4 + ## size specifies the capacity per volume + size: 1Gi + ## storageClass specifies the storage class name to be used for this pool + storageClassName: local-storage + ## Used to specify annotations for pods + annotations: {} + ## Used to specify labels for pods + labels: {} + ## Used to specify a toleration for a pod + tolerations: [] + ## nodeSelector parameters for MinIO Pods. It specifies a map of key-value pairs. For the pod to be + ## eligible to run on a node, the node must have each of the + ## indicated key-value pairs as labels. + ## Read more here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ + nodeSelector: {} + ## Affinity settings for MinIO pods. Read more about affinity + ## here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity. + affinity: {} + ## Configure resource requests and limits for MinIO containers + resources: {} + ## Configure security context + securityContext: {} + ## Configure topology constraints + topologySpreadConstraints: [] + ## Configure Runtime Class + # runtimeClassName: "" + ## Mount path where PV will be mounted inside container(s). + mountPath: /export + ## Sub path inside Mount path where MinIO stores data. + subPath: /data + # pool metrics to be read by Prometheus + metrics: + enabled: false + port: 9000 + protocol: http + certificate: + ## Use this field to provide one or more external CA certificates. This is used by MinIO + ## to verify TLS connections with other applications: + ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret + externalCaCertSecret: [] + ## Use this field to provide a list of Secrets with external certificates. This can be used to configure + ## TLS for MinIO Tenant pods. Create secrets as explained here: + ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret + externalCertSecret: [] + ## Enable automatic Kubernetes based certificate generation and signing as explained in + ## https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster + requestAutoCert: true + ## This field is used only when "requestAutoCert" is set to true. Use this field to set CommonName + ## for the auto-generated certificate. Internal DNS name for the pod will be used if CommonName is + ## not provided. 
DNS name format is *.minio.default.svc.cluster.local + certConfig: {} + ## MinIO features to enable or disable in the MinIO Tenant + ## https://github.com/minio/operator/blob/master/docs/crd.adoc#features + features: + bucketDNS: false + domains: {} + ## List of bucket names to create during tenant provisioning + buckets: [] + ## List of secret names to use for generating MinIO users during tenant provisioning + users: [] + ## PodManagement policy for MinIO Tenant Pods. Can be "OrderedReady" or "Parallel" + ## Refer https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy + ## for details. + podManagementPolicy: Parallel + # Liveness Probe for container liveness. Container will be restarted if the probe fails. + # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes. + liveness: {} + # Readiness Probe for container readiness. Container will be removed from service endpoints if the probe fails. + # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ + readiness: {} + ## exposeServices defines the exposure of the MinIO object storage and Console services. + ## service is exposed as a loadbalancer in k8s service. + exposeServices: {} + # kubernetes service account associated with a specific tenant + # https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ + serviceAccountName: "" + # Tenant scrape configuration will be added to prometheus managed by the prometheus-operator. + prometheusOperator: false + # Enable JSON, Anonymous logging for MinIO tenants. + # Refer https://github.com/minio/operator/blob/master/pkg/apis/minio.min.io/v2/types.go#L303 + # How logs will look: + # $ k logs minio1-pool-0-0 -n default + # {"level":"INFO","errKind":"","time":"2022-04-07T21:49:33.740058549Z","message":"All MinIO sub-systems initialized successfully"} + # Notice they are in JSON format to be consumed + logging: + anonymous: true + json: true + quiet: true + ## serviceMetadata allows passing additional labels and annotations to MinIO and Console specific + ## services created by the operator. + serviceMetadata: {} + ## Add environment variables to be set in MinIO container (https://github.com/minio/minio/tree/master/docs/config) + env: [] + ## PriorityClassName indicates the Pod priority and hence importance of a Pod relative to other Pods. + ## This is applied to MinIO pods only. + ## Refer Kubernetes documentation for details https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass/ + priorityClassName: "" + ## Define configuration for KES (stateless and distributed key-management system) + ## Refer https://github.com/minio/kes + # kes: + # image: "" # minio/kes:v0.18.0 + # env: [] + # replicas: 2 + # configuration: |- + # address: :7373 + # root: _ # Effectively disabled since no root identity necessary. 
+ # tls: + # key: /tmp/kes/server.key # Path to the TLS private key + # cert: /tmp/kes/server.crt # Path to the TLS certificate + # proxy: + # identities: [] + # header: + # cert: X-Tls-Client-Cert + # policy: + # my-policy: + # paths: + # - /v1/key/create/* + # - /v1/key/generate/* + # - /v1/key/decrypt/* + # identities: + # - ${MINIO_KES_IDENTITY} + # cache: + # expiry: + # any: 5m0s + # unused: 20s + # log: + # error: on + # audit: off + # keys: + # ## KES configured with fs (File System mode) doesnt work in Kubernetes environments and it's not recommended + # ## use a real KMS + # # fs: + # # path: "./keys" # Path to directory. Keys will be stored as files. Not Recommended for Production. + # vault: + # endpoint: "http://vault.default.svc.cluster.local:8200" # The Vault endpoint + # namespace: "" # An optional Vault namespace. See: https://www.vaultproject.io/docs/enterprise/namespaces/index.html + # prefix: "my-minio" # An optional K/V prefix. The server will store keys under this prefix. + # approle: # AppRole credentials. See: https://www.vaultproject.io/docs/auth/approle.html + # id: "" # Your AppRole Role ID + # secret: "" # Your AppRole Secret ID + # retry: 15s # Duration until the server tries to re-authenticate after connection loss. + # tls: # The Vault client TLS configuration for mTLS authentication and certificate verification + # key: "" # Path to the TLS client private key for mTLS authentication to Vault + # cert: "" # Path to the TLS client certificate for mTLS authentication to Vault + # ca: "" # Path to one or multiple PEM root CA certificates + # status: # Vault status configuration. The server will periodically reach out to Vault to check its status. + # ping: 10s # Duration until the server checks Vault's status again. + # # aws: + # # # The AWS SecretsManager key store. The server will store + # # # secret keys at the AWS SecretsManager encrypted with + # # # AWS-KMS. See: https://aws.amazon.com/secrets-manager + # # secretsmanager: + # # endpoint: "" # The AWS SecretsManager endpoint - e.g.: secretsmanager.us-east-2.amazonaws.com + # # region: "" # The AWS region of the SecretsManager - e.g.: us-east-2 + # # kmskey: "" # The AWS-KMS key ID used to en/decrypt secrets at the SecretsManager. + # # # By default (if not set) the default AWS-KMS key will be used. + # # credentials: # The AWS credentials for accessing secrets at the AWS SecretsManager. + # # accesskey: "" # Your AWS Access Key + # # secretkey: "" # Your AWS Secret Key + # # token: "" # Your AWS session token (usually optional) + # imagePullPolicy: "IfNotPresent" + # externalCertSecret: null + # clientCertSecret: null + # ## Key name to be created on the KMS, default is "my-minio-key" + # keyName: "" + # resources: {} + # nodeSelector: {} + # affinity: + # nodeAffinity: {} + # podAffinity: {} + # podAntiAffinity: {} + # tolerations: [] + # annotations: {} + # labels: {} + # serviceAccountName: "" + # securityContext: + # runAsUser: 1000 + # runAsGroup: 1000 + # runAsNonRoot: true + # fsGroup: 1000 + ## Prometheus setup for MinIO Tenant. 
+ prometheus: + image: "" # defaults to quay.io/prometheus/prometheus:latest + env: [] + sidecarimage: "" # defaults to alpine + initimage: "" # defaults to busybox:1.33.1 + diskCapacityGB: 1 + storageClassName: local-storage + annotations: {} + labels: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + resources: {} + serviceAccountName: "" + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + runAsNonRoot: true + fsGroup: 1000 + ## LogSearch API setup for MinIO Tenant. + log: + image: "" # defaults to minio/operator:v4.4.17 + env: [] + resources: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + tolerations: [] + annotations: {} + labels: {} + audit: + diskCapacityGB: 1 + ## Postgres setup for LogSearch API + db: + image: "" # defaults to library/postgres + env: [] + initimage: "" # defaults to busybox:1.33.1 + volumeClaimTemplate: + metadata: {} + spec: + storageClassName: local-storage + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi + resources: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + tolerations: [] + annotations: {} + labels: {} + serviceAccountName: "" + securityContext: + runAsUser: 999 + runAsGroup: 999 + runAsNonRoot: true + fsGroup: 999 serviceAccountName: "" - # Tenant scrape configuration will be added to prometheus managed by the prometheus-operator. - prometheusOperator: false - # Enable JSON, Anonymous logging for MinIO tenants. - # Refer https://github.com/minio/operator/blob/master/pkg/apis/minio.min.io/v2/types.go#L303 - logging: { } - ## serviceMetadata allows passing additional labels and annotations to MinIO and Console specific - ## services created by the operator. - serviceMetadata: { } - ## Add environment variables to be set in MinIO container (https://github.com/minio/minio/tree/master/docs/config) - env: { } - ## PriorityClassName indicates the Pod priority and hence importance of a Pod relative to other Pods. - ## This is applied to MinIO pods only. 
-  ## Refer Kubernetes documentation for details https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass/
-  priorityClassName: ""
+  securityContext:
+    runAsUser: 1000
+    runAsGroup: 1000
+    runAsNonRoot: true
+    fsGroup: 1000
+
+ingress:
+  api:
+    enabled: false
+    ingressClassName: ""
+    labels: {}
+    annotations: {}
+    tls: []
+    host: minio.local
+    path: /
+    pathType: Prefix
+  console:
+    enabled: false
+    ingressClassName: ""
+    labels: {}
+    annotations: {}
+    tls: []
+    host: minio-console.local
+    path: /
+    pathType: Prefix
diff --git a/roles/minio_install/defaults/main.yaml b/roles/minio_install/defaults/main.yaml
index 4749fd3d..be7141fe 100644
--- a/roles/minio_install/defaults/main.yaml
+++ b/roles/minio_install/defaults/main.yaml
@@ -18,11 +18,28 @@
 minio_application_name: "minio" # MinIO Main Application Name
 minio_operator_namespace: "minio-operator" # MinIO Operator/Console namespace
 minio_tenant_namespace: "minio-tenant" # MinIO Sample Tenant namespace
+minio_git_url: "https://github.com/minio/minio.git"
+minio_git_tag: "RELEASE.2022-08-22T23-53-06Z"
+minio_local_build_dir: "{{ (project_root_dir, 'minio') | path_join }}"
+minio_local_build_name: "minio"
+
+minio_build_image_locally: true
+
 minio_operator_release_name: "minio-operator" # MinIO Operator/Console Helm Charts release name
 minio_tenant_release_name: "minio-tenant" # MinIO Tenant Helm Charts release name
+minio_storage_controller_key: storage # MinIO controller key
+minio_storage_controller_value: minio_controller # MinIO controller value
+
 minio_storage_worker_key: storage # MinIO worker key
+minio_multus_selector_key: app # MinIO multus selector key
 minio_storage_worker_value: minio # MinIO worker value
 minio_sriov_network_name_prefix: minio-sriov # MinIO Tenants SriovNetwork name prefix
-minio_sriov_network_deviceType: netdevice # MinIO Tenants SriovNetwork device type
+minio_sriov_network_devicetype: netdevice # MinIO Tenants SriovNetwork device type
+
+minio_log_postgres_name: 'minio_log_postgres' # MinIO Log Postgres container name
+minio_log_postgres_image_download_url: "library/postgres:13" # MinIO Log DB image download URL
+minio_log_postgres_local_image_name: "postgres" # MinIO Log DB local build image name
+minio_log_postgres_local_image_tag: "minio" # MinIO Log DB local build tag
+minio_log_huge_pages: "off" # MinIO Log DB huge_pages setting: try, on, or off
diff --git a/roles/minio_install/files/main.py b/roles/minio_install/files/main.py
deleted file mode 100755
index 4d2c3e29..00000000
--- a/roles/minio_install/files/main.py
+++ /dev/null
@@ -1,62 +0,0 @@
-#!/usr/bin/env python3
-
-import ast
-from ruamel import yaml
-
-def nostr(d):
-    def tr(s):
-        s = s.strip()
-        try:
-            return int(s)
-        except ValueError:
-            return s
-
-    if isinstance(d, dict):
-        for k in d:
-            d[k] = nostr(d[k])
-        return d
-    elif isinstance(d, list):
-        for idx, k in enumerate(d):
-            d[idx] = nostr(k)
-        return d
-    return tr(d)
-
-def generate_k8s_service_patch():
-    # define text file to open
-    with open('parsed-ips-result.txt', 'r') as my_file:
-        data = my_file.read()
-
-    # display content of text file
-    ips = []
-    listdata = ast.literal_eval(data)
-
-    for lst in listdata:
-        ip = []
-        has_name = False
-        for str in lst:
-            # if str is in ['name', 'ips', 'pod-name']:
-            if str == 'name':
-                if has_name == True:
-                    ips.append(ip)
-                    ip = []
-                has_name = True
-            ip.append(str)
-        ips.append(ip)
-    # copy last 2 elements into previous lists
-    num_ips_list = len(ips)
-    for x in range(num_ips_list - 1):
-        if len(ips[x]) != len(ip):
- ips[x].append(ip[-2]) - ips[x].append(ip[-1]) - # convert to key, value - result = [] - for lst in ips: - result.append(dict(zip(lst[::2], lst[1::2]))) - - result2 = {} - result2['ep_data'] = result - - with open("gen-endpoints-data.yaml", "w") as output: - yaml.safe_dump(nostr(result2), output, indent=4, block_seq_indent=2, allow_unicode=False) -if __name__ == '__main__': - generate_k8s_service_patch() diff --git a/roles/minio_install/tasks/add_minio_tenant_endpoints.yml b/roles/minio_install/tasks/add_minio_tenant_endpoints.yml deleted file mode 100644 index 7d9b40d9..00000000 --- a/roles/minio_install/tasks/add_minio_tenant_endpoints.yml +++ /dev/null @@ -1,157 +0,0 @@ -## -## Copyright (c) 2020-2022 Intel Corporation. -## -## Licensed under the Apache License, Version 2.0 (the "License"); -## you may not use this file except in compliance with the License. -## You may obtain a copy of the License at -## -## http://www.apache.org/licenses/LICENSE-2.0 -## -## Unless required by applicable law or agreed to in writing, software -## distributed under the License is distributed on an "AS IS" BASIS, -## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -## See the License for the specific language governing permissions and -## limitations under the License. -## ---- -- name: check MinIO Tenant Helm charts temp directory - stat: - path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}" - register: tenant_temp_dir - -- name: create the temp folder for MinIO Tenant custom values - file: - path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}" - state: directory - mode: 0755 - when: - - not tenant_temp_dir.stat.exists - -- name: wait for MinIO pods to come up - command: >- - kubectl get pods -n={{ minio_tenant_namespace }} --selector app=minio -o json - register: minio_tenant_get_pods - until: minio_tenant_get_pods.stdout | from_json | json_query('items[*].status.phase') | unique == ["Running"] - retries: 120 - delay: 60 - changed_when: true - -- name: gather the deployed MinIO tenant pods definition - shell: >- - set -o pipefail && kubectl get pods -l app=minio -n {{ minio_tenant_namespace }} -o yaml > - "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'tenants_pods.yaml') | path_join }}" - args: - executable: /bin/bash - changed_when: true - -- name: collect MinIO Tenant pods storage network IP addresses and push to controller node - template: - src: "minio_tenant_pods_ips_regex.j2" - dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-pods-ips.txt') | path_join }}" - force: yes - mode: preserve - -- name: read MinIO Tenant pods definition file contents - command: cat {{ (project_root_dir, 'charts', 'tenant', 'temp', 'tenants_pods.yaml') | path_join }} - register: pod_def_contents - changed_when: true - -- name: read regular expression to parse - command: cat {{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-pods-ips.txt') | path_join }} - register: regex_expr - changed_when: true - -- name: search IPs - set_fact: - parsed_result: "{{ pod_def_contents.stdout | regex_findall(regex_expr.stdout) }}" - -- name: save parsed result into a file - copy: - content: "{{ parsed_result }}" - dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'parsed-ips-result.txt') | path_join }}" - backup: yes - mode: 0644 - -- name: copy file python script - copy: - src: "{{ role_path }}/files/main.py" - dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'main.py') | path_join }}" - mode: '0755' - -- name: generate 
MinIO Endpoints data - command: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'main.py') | path_join }}" - args: - chdir: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}" - changed_when: true - -- name: fetch gen-endpoints-data.yaml to host - fetch: - src: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'gen-endpoints-data.yaml') | path_join }}" - dest: "{{ ('/tmp', 'tenant', 'gen-endpoints-data.yaml') | path_join }}" - flat: yes - -- name: include variables for MinIO Endpoints - include_vars: "{{ ('/tmp', 'tenant', 'gen-endpoints-data.yaml') | path_join }}" - -- name: collect the deployed MinIO tenant pod names and node names - shell: >- - set -o pipefail && kubectl get pods -n {{ minio_tenant_namespace }} -o wide | grep {{ minio_tenant_namespace }} |awk -F ' ' '{ print $7 ": " $1}' - args: - executable: /bin/bash - changed_when: true - register: tenant_pod_node_info - -- name: set var - set_fact: - pod_node: "{{ tenant_pod_node_info.stdout | from_yaml }}" - -- name: populate MinIO Endpoints with additional networks - template: - src: "minio_tenant_endpoints.yml.j2" - dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints' ~ ansible_loop.index ~ '.yml') | path_join }}" - force: yes - mode: preserve - loop: "{{ groups['kube_node'] }}" - loop_control: - extended: yes - -- name: create the initial patch file - copy: - dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints-patch.yml') | path_join }}" - content: | - subsets: - - addresses: - mode: '0644' - -- name: merge all section of addresses - shell: >- - cat "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints' ~ ansible_loop.index ~ '.yml') | path_join }}" - >> "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints-patch.yml') | path_join }}" - loop: "{{ groups['kube_node'] }}" - loop_control: - extended: yes - changed_when: true - -- name: finalize patch file - lineinfile: - path: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints-patch.yml') | path_join }}" - insertafter: EOF - line: >-2 - ports: - - name: http-minio - port: 9000 - protocol: TCP - -- name: get the current endpoints json - shell: >- - set -o pipefail && kubectl get endpoints {{ minio_tenant_namespace }}-hl -n {{ minio_tenant_namespace }} -o yaml > - "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'current-endpoints.yaml') | path_join }}" - args: - executable: /bin/bash - changed_when: true - -- name: patch MinIO Tenants Endpoints - command: >- - kubectl patch endpoints {{ minio_tenant_release_name }}-hl -n {{ minio_tenant_namespace }} - --patch-file "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-endpoints-patch.yml') | path_join }}" - changed_when: true diff --git a/roles/minio_install/tasks/build_local_minio_image.yml b/roles/minio_install/tasks/build_local_minio_image.yml new file mode 100644 index 00000000..9cb45caa --- /dev/null +++ b/roles/minio_install/tasks/build_local_minio_image.yml @@ -0,0 +1,97 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: clone MinIO repository
+  git:
+    repo: "{{ minio_git_url }}"
+    version: "{{ minio_git_tag }}"
+    dest: "{{ minio_local_build_dir }}"
+    force: yes
+
+# Swap the upstream base image for Ubuntu 22.04 with common network/debugging tools installed.
+- name: modify MinIO Dockerfile content
+  replace:
+    path: "{{ (minio_local_build_dir, 'Dockerfile') | path_join }}"
+    regexp: 'FROM minio/minio:latest'
+    replace: >-
+      FROM ubuntu:22.04
+
+      RUN apt-get update && apt-get install -y --no-install-recommends \
+          apt-transport-https \
+          awscli \
+          bash \
+          bmon \
+          bwm-ng \
+          ca-certificates \
+          curl \
+          dnsutils \
+          gnupg \
+          iperf \
+          iproute2 \
+          iptables \
+          iputils-ping \
+          net-tools \
+          slurm \
+          tar \
+          tcpdump \
+          tcptrack \
+          unzip \
+          && rm -rf /var/lib/apt/lists/*
+
+# docker is used as container runtime:
+- name: prepare containers images
+  block:
+    - name: compile local MinIO
+      make:
+        chdir: "{{ minio_local_build_dir }}"
+      when: minio_build_image_locally
+
+    - name: build local MinIO custom image
+      command: >-
+        docker build -f Dockerfile -t {{ registry_local_address }}/{{ minio_local_build_name }}:{{ minio_git_tag }} .
+      args:
+        chdir: "{{ minio_local_build_dir }}"
+      changed_when: true
+      when: minio_build_image_locally
+
+    - name: push the local MinIO custom image to local registry
+      command: >-
+        docker push {{ registry_local_address }}/{{ minio_local_build_name }}:{{ minio_git_tag }}
+      when: minio_build_image_locally
+  when:
+    - container_runtime == "docker"
+
+# containerd/cri-o is used as container runtime:
+- name: prepare containers images
+  block:
+    - name: compile local MinIO
+      make:
+        chdir: "{{ minio_local_build_dir }}"
+      when: minio_build_image_locally
+
+    - name: build local MinIO custom image
+      command: >-
+        podman build -f Dockerfile -t {{ registry_local_address }}/{{ minio_local_build_name }}:{{ minio_git_tag }} .
+      args:
+        chdir: "{{ minio_local_build_dir }}"
+      changed_when: true
+      when: minio_build_image_locally
+
+    - name: push the local MinIO custom image to local registry
+      command: >-
+        podman push {{ registry_local_address }}/{{ minio_local_build_name }}:{{ minio_git_tag }}
+      when: minio_build_image_locally
+  when:
+    - container_runtime is in ['containerd', 'crio']
diff --git a/roles/minio_install/tasks/build_local_postgress_image.yml b/roles/minio_install/tasks/build_local_postgress_image.yml
new file mode 100644
index 00000000..c3c6f344
--- /dev/null
+++ b/roles/minio_install/tasks/build_local_postgress_image.yml
@@ -0,0 +1,143 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+##   http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## +--- +- name: check MinIO Tenant Helm charts temp directory + stat: + path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}" + register: tenant_temp_dir + +- name: create the temp folder for MinIO Tenant custom values + file: + path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}" + state: directory + mode: 0755 + when: + - not tenant_temp_dir.stat.exists + +- name: generator random + shell: >- + set -o pipefail && + head /dev/urandom | tr -dc A-Za-z0-9 | head -c 8 + | kubectl create secret generic postgres --dry-run=client --from-file=password=/dev/stdin -o json + | jq '.data.password' + args: + executable: /bin/bash + changed_when: true + register: minio_log_postgres_password + +- name: set password variable + set_fact: + postgres_password: "{{ minio_log_postgres_password.stdout }}" + +- name: block for podman CLI + block: + - name: run postgres container + containers.podman.podman_container: + name: "{{ minio_log_postgres_name }}" + image: "{{ minio_log_postgres_image_download_url }}" + state: started + detach: yes + env: + POSTGRES_PASSWORD: "{{ postgres_password }}" + register: container_output + + - name: disable huge_pages of the container + command: >- + podman exec -it {{ minio_log_postgres_name }} sed -i -r 's/#huge_pages.*?/huge_pages = off/g' /usr/share/postgresql/postgresql.conf.sample + changed_when: true + + - name: output huge_pages setting of the container + shell: >- + podman exec -it {{ minio_log_postgres_name }} cat /usr/share/postgresql/postgresql.conf.sample | grep huge_pages + changed_when: true + + - name: commit the container + command: >- + podman commit {{ minio_log_postgres_name }} + register: podman_commit_id + changed_when: true + + - name: tag the container with the change to the new local image + command: >- + podman tag {{ podman_commit_id.stdout }} {{ registry_local_address }}/{{ minio_log_postgres_local_image_name }}:{{ minio_log_postgres_local_image_tag }} + changed_when: true + + - name: push the new local image to the local registry + command: >- + podman push {{ registry_local_address }}/{{ minio_log_postgres_local_image_name }}:{{ minio_log_postgres_local_image_tag }} + changed_when: true + + - name: stop the container + command: >- + podman stop {{ minio_log_postgres_name }} + changed_when: true + + - name: remove the container + command: >- + podman rm -f {{ minio_log_postgres_name }} + changed_when: true + when: + - container_runtime in ["crio", "containerd"] + +- name: block for docker CLI + block: + - name: run postgres container + community.docker.docker_container: + name: "{{ minio_log_postgres_name }}" + image: "{{ minio_log_postgres_image_download_url }}" + state: started + detach: yes + env: + POSTGRES_PASSWORD: "{{ postgres_password }}" + register: container_output + + - name: disable huge_pages of the container + command: >- + docker exec -it {{ minio_log_postgres_name }} sed -i -r 's/#huge_pages.*?/huge_pages = off/g' /usr/share/postgresql/postgresql.conf.sample + changed_when: true + + - name: output huge_hages setting of the container + shell: >- + docker exec -it {{ minio_log_postgres_name }} cat /usr/share/postgresql/postgresql.conf.sample | grep huge_pages + changed_when: true + + - name: commit the container + command: >- + docker commit {{ minio_log_postgres_name }} + register: docker_commit_id + changed_when: true + + - name: tag the container with the change into the new local image + command: >- + docker tag {{ docker_commit_id.stdout }} {{ registry_local_address }}/{{ 
minio_log_postgres_local_image_name }}:{{ minio_log_postgres_local_image_tag }} + changed_when: true + + - name: push the new local image to the local registry + command: >- + docker push {{ registry_local_address }}/{{ minio_log_postgres_local_image_name }}:{{ minio_log_postgres_local_image_tag }} + changed_when: true + + - name: stop the container + command: >- + docker stop {{ minio_log_postgres_name }} + changed_when: true + + - name: remove the container + command: >- + docker rm -f {{ minio_log_postgres_name }} + changed_when: true + when: + - container_runtime in ["docker"] diff --git a/roles/minio_install/tasks/build_minio_tenant_sriovnetwork.yml b/roles/minio_install/tasks/build_minio_tenant_sriovnetwork.yml index 66b25c5d..8b48b4df 100644 --- a/roles/minio_install/tasks/build_minio_tenant_sriovnetwork.yml +++ b/roles/minio_install/tasks/build_minio_tenant_sriovnetwork.yml @@ -50,7 +50,7 @@ set -o pipefail && kubectl -n {{ sriov_network_operator_namespace }} get sriovnetworknodepolicy -o json | jq '.items[] | select(.spec.nodeSelector["kubernetes.io/hostname"] == "{{ hostvars[groups['kube_node'][0]]['minio_interfaces'][ansible_loop.index0]['nodename'] }}" - and .spec.deviceType == "{{ minio_sriov_network_deviceType }}") + and .spec.deviceType == "{{ minio_sriov_network_devicetype }}") | select(.spec.nicSelector.pfNames[0] | startswith("{{ hostvars[groups['kube_node'][0]]['minio_interfaces'][ansible_loop.index0]['minio_pf_name'] }}")) | {resourceName: .spec.resourceName}' args: diff --git a/roles/minio_install/tasks/build_minio_variables.yml b/roles/minio_install/tasks/build_minio_variables.yml index 77a4cbb4..debd4c0b 100644 --- a/roles/minio_install/tasks/build_minio_variables.yml +++ b/roles/minio_install/tasks/build_minio_variables.yml @@ -27,92 +27,92 @@ minio_tenant_sriov_vfs_list: [] - block: - - name: collect the minio_vfs of nodes - set_fact: - minio_tenant_sriov_vfs: "{{ hostvars[inventory_hostname]['dataplane_interfaces'] | map(attribute='minio_vf') | list | length }}" - when: - - inventory_hostname in groups['kube_node'] + - name: collect the minio_vfs of nodes + set_fact: + minio_tenant_sriov_vfs: "{{ hostvars[inventory_hostname]['dataplane_interfaces'] | map(attribute='minio_vf') | list | length }}" + when: + - inventory_hostname in groups['kube_node'] - - name: insert nodename into temp_minio_interfaces - set_fact: - temp_minio_interfaces: "{{ temp_minio_interfaces | default([]) + [item | combine({'nodename': minio_hostname})] }}" - vars: - minio_hostname: "{{ inventory_hostname }}" - loop: "{{ hostvars[inventory_hostname]['dataplane_interfaces'] }}" - when: - - inventory_hostname in groups['kube_node'] + - name: insert nodename into temp_minio_interfaces + set_fact: + temp_minio_interfaces: "{{ temp_minio_interfaces | default([]) + [item | combine({'nodename': minio_hostname})] }}" + vars: + minio_hostname: "{{ inventory_hostname }}" + loop: "{{ hostvars[inventory_hostname]['dataplane_interfaces'] }}" + when: + - inventory_hostname in groups['kube_node'] - - name: collect PF name per minio_interface - include_tasks: collect_minio_pf_name.yml - loop: "{{ hostvars[inventory_hostname]['temp_minio_interfaces'] }}" - when: - - inventory_hostname in groups['kube_node'] + - name: collect PF name per minio_interface + include_tasks: collect_minio_pf_name.yml + loop: "{{ hostvars[inventory_hostname]['temp_minio_interfaces'] }}" + when: + - inventory_hostname in groups['kube_node'] - - name: update PF name per minio_interfaces - set_fact: - minio_interfaces: "{{ 
minio_interfaces | default([]) + [item | combine({'minio_pf_name': minio_pf_name_list[ansible_loop.index0]})] }}"
-    loop: "{{ hostvars[inventory_hostname]['temp_minio_interfaces'] }}"
-    loop_control:
-      extended: yes
-    when:
-      - inventory_hostname in groups['kube_node']
+    - name: update PF name per minio_interfaces
+      set_fact:
+        minio_interfaces: "{{ minio_interfaces | default([]) + [item | combine({'minio_pf_name': minio_pf_name_list[ansible_loop.index0]})] }}"
+      loop: "{{ hostvars[inventory_hostname]['temp_minio_interfaces'] }}"
+      loop_control:
+        extended: yes
+      when:
+        - inventory_hostname in groups['kube_node']

-  - name: set sriov vfs over all nodes
-    set_fact:
-      minio_vfs: "{{ hostvars[inventory_hostname]['minio_interfaces'] | selectattr('minio_vf', '==', True) }}"
-    when:
-      - inventory_hostname in groups['kube_node']
+    - name: set sriov vfs over all nodes
+      set_fact:
+        minio_vfs: "{{ hostvars[inventory_hostname]['minio_interfaces'] | selectattr('minio_vf', '==', True) }}"
+      when:
+        - inventory_hostname in groups['kube_node']

-  - name: set total_num_minio_vfs
-    set_fact:
-      total_num_minio_vfs: "{{ total_num_minio_vfs | default([]) + [hostvars[item]['minio_vfs'] | length] }}"
-    loop: "{{ groups['kube_node'] }}"
-    when:
-      - inventory_hostname in groups['kube_control_plane'][0]
+    - name: set total_num_minio_vfs
+      set_fact:
+        total_num_minio_vfs: "{{ total_num_minio_vfs | default([]) + [hostvars[item]['minio_vfs'] | length] }}"
+      loop: "{{ groups['kube_node'] }}"
+      when:
+        - inventory_hostname in groups['kube_control_plane'][0]

-  - name: check minio_vf settings per node are identical
-    assert:
-      that:
-        - total_num_minio_vfs | unique | length == 1
-      msg: |
-        Incorrect configuration of 'minio_vf: true' in dataplane_interfaces for MinIO install.
-        Make sure the number of 'minio_vf: true' per node has to be same over all nodes.
-        Or, check if there is any missing 'minio_vf: true' or 'minio_vf: false' in dataplane_interfaces.
-        {{ hostvars[item]['minio_interfaces'] }}
-    loop: "{{ groups['kube_node'] }}"
-    when:
-      - inventory_hostname in groups['kube_control_plane'][0]
+    - name: check minio_vf settings per node are identical
+      assert:
+        that:
+          - total_num_minio_vfs | unique | length == 1
+        msg: |
+          Incorrect configuration of 'minio_vf: true' in dataplane_interfaces for MinIO install.
+          Make sure the number of 'minio_vf: true' interfaces per node is the same across all nodes.
+          Or, check if there is any missing 'minio_vf: true' or 'minio_vf: false' in dataplane_interfaces.
+          {{ hostvars[item]['minio_interfaces'] }}
+      loop: "{{ groups['kube_node'] }}"
+      when:
+        - inventory_hostname in groups['kube_control_plane'][0]

-  - name: filter minio_pf_name over all nodes
-    set_fact:
-      minio_pfs: "{{ hostvars[inventory_hostname]['minio_interfaces'] | json_query(the_var) }}"
-    vars:
-      the_var: "[?minio_pf_name].minio_pf_name"
-    when:
-      - inventory_hostname in groups['kube_node']
+    - name: filter minio_pf_name over all nodes
+      set_fact:
+        minio_pfs: "{{ hostvars[inventory_hostname]['minio_interfaces'] | json_query(the_var) }}"
+      vars:
+        the_var: "[?minio_pf_name].minio_pf_name"
+      when:
+        - inventory_hostname in groups['kube_node']

-  - name: combine all minio_pfs from nodes
-    set_fact:
-      total_minio_pfs: "{{ total_minio_pfs | default([]) + [hostvars[item]['minio_pfs']] }}"
-    loop: "{{ groups['kube_node'] }}"
-    when:
-      - inventory_hostname in groups['kube_control_plane'][0]
+    - name: combine all minio_pfs from nodes
+      set_fact:
+        total_minio_pfs: "{{ total_minio_pfs | default([]) + [hostvars[item]['minio_pfs']] }}"
+      loop: "{{ groups['kube_node'] }}"
+      when:
+        - inventory_hostname in groups['kube_control_plane'][0]

-  - name: zip each item from total_minio_pfs to make sure each list has same minio_pfs name
-    set_fact:
-      same_minio_vfs: "{{ (total_minio_pfs | first) | zip (*total_minio_pfs[1:]) }}"
-    when:
-      - inventory_hostname in groups['kube_control_plane'][0]
+    - name: zip each item from total_minio_pfs to make sure each list has same minio_pfs name
+      set_fact:
+        same_minio_vfs: "{{ (total_minio_pfs | first) | zip (*total_minio_pfs[1:]) }}"
+      when:
+        - inventory_hostname in groups['kube_control_plane'][0]

-  - name: check minio_vf resource name per node are identical
-    assert:
-      that:
-        - "item | unique | length == 1"
-      msg: |
-        "Inconsitent PF name over nodes in dataplane_interfaces for MinIO install."
-        "Make sure the PF names of 'minio_vf: true' over nodes should be same."
+    - name: check minio_vf resource name per node are identical
+      assert:
+        that:
+          - "item | unique | length == 1"
+        msg: |
+          "Inconsistent PF names across nodes in dataplane_interfaces for MinIO install."
+          "Make sure the PF names of 'minio_vf: true' are the same on all nodes."
+ "{{ item }}" + loop: "{{ same_minio_vfs }}" + when: + - inventory_hostname == groups['kube_control_plane'][0] any_errors_fatal: true diff --git a/roles/minio_install/tasks/cleanup_minio_filesystems.yml b/roles/minio_install/tasks/cleanup_minio_filesystems.yml index ffc7b8ca..8f96a9c9 100644 --- a/roles/minio_install/tasks/cleanup_minio_filesystems.yml +++ b/roles/minio_install/tasks/cleanup_minio_filesystems.yml @@ -19,14 +19,14 @@ - name: set iteration number set_fact: num: 1 - req_num: "{{ hostvars[inventory_hostname]['minio_pv'] |length }}" + req_num: "{{ hostvars[inventory_hostname]['minio_pv'] | length }}" - name: umount volumes command: >- umount --lazy "{{ item.mountPath }}" loop: "{{ hostvars[inventory_hostname]['minio_pv'] }}" when: - - hostvars[inventory_hostname]['minio_pv'] is defined and hostvars[inventory_hostname]['minio_pv']|length > 0 + - hostvars[inventory_hostname]['minio_pv'] is defined and hostvars[inventory_hostname]['minio_pv'] | length > 0 changed_when: false failed_when: false @@ -37,7 +37,7 @@ state: absent loop: "{{ hostvars[inventory_hostname]['minio_pv'] }}" when: - - hostvars[inventory_hostname]['minio_pv'] is defined and hostvars[inventory_hostname]['minio_pv']|length > 0 + - hostvars[inventory_hostname]['minio_pv'] is defined and hostvars[inventory_hostname]['minio_pv'] | length > 0 changed_when: false failed_when: false @@ -71,7 +71,7 @@ file: path: "{{ ('/tmp', 'diskimage' ~ ansible_loop.index) | path_join }}" state: absent - loop: "{{ range(num, req_num|int + 1) | list }}" + loop: "{{ range(num, req_num | int + 1) | list }}" loop_control: extended: yes changed_when: false diff --git a/roles/minio_install/tasks/cleanup_minio_main.yml b/roles/minio_install/tasks/cleanup_minio_main.yml index 31d709dc..1e20bf97 100644 --- a/roles/minio_install/tasks/cleanup_minio_main.yml +++ b/roles/minio_install/tasks/cleanup_minio_main.yml @@ -30,11 +30,45 @@ name: minio_install tasks_from: cleanup_minio_sriovnetwork + - name: load MinIO tenant multus service variables + include_vars: "{{ item }}" + with_first_found: + - files: + - "main.yml" + paths: + - "{{ (role_path, '..', 'multus_service', 'defaults') | path_join }}" + + - name: cleanup MinIO tenant multus helmchart + include_role: + name: minio_install + tasks_from: cleanup_minio_multus_service_helmchart + + - name: load MinIO tenant whereabouts service variables + include_vars: "{{ item }}" + with_first_found: + - files: + - "main.yml" + paths: + - "{{ (role_path, '..', 'whereabouts_install', 'defaults') | path_join }}" + - name: cleanup MinIO tenant whereabouts helmchart include_role: name: minio_install tasks_from: cleanup_minio_whereabouts_helmchart + - name: load MinIO tenant ingress controller variables + include_vars: "{{ item }}" + with_first_found: + - files: + - "main.yml" + paths: + - "{{ (role_path, '..', 'kubernetes_ingress_install', 'defaults') | path_join }}" + + - name: cleanup MinIO tenant ingress controller helmchart + include_role: + name: kubernetes_ingress_install + tasks_from: cleanup_kubernetes_ingress + - name: cleanup MinIO file systems import_tasks: cleanup_minio_filesystems.yml diff --git a/roles/minio_install/tasks/cleanup_minio_multus_service_helmchart.yml b/roles/minio_install/tasks/cleanup_minio_multus_service_helmchart.yml new file mode 100755 index 00000000..54a0902b --- /dev/null +++ b/roles/minio_install/tasks/cleanup_minio_multus_service_helmchart.yml @@ -0,0 +1,23 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. 
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: delete Multus Service Helm Charts + command: >- + helm delete {{ multus_service_release_name }} --namespace {{ multus_service_namespace }} + when: + - inventory_hostname == groups['kube_control_plane'][0] + changed_when: false + failed_when: false diff --git a/roles/minio_install/tasks/cleanup_minio_whereabouts_helmchart.yml b/roles/minio_install/tasks/cleanup_minio_whereabouts_helmchart.yml index 114b97d6..a4c665f4 100644 --- a/roles/minio_install/tasks/cleanup_minio_whereabouts_helmchart.yml +++ b/roles/minio_install/tasks/cleanup_minio_whereabouts_helmchart.yml @@ -16,7 +16,7 @@ --- - name: delete Whereabouts Helm Charts command: >- - helm delete whereabouts --namespace kube-system + helm delete {{ whereabouts_release_name }} --namespace {{ whereabouts_release_namespace }} when: - inventory_hostname == groups['kube_control_plane'][0] changed_when: false diff --git a/roles/minio_install/tasks/create_blockdevicefiles.yml b/roles/minio_install/tasks/create_blockdevicefiles.yml index eb06ebc9..5303642d 100644 --- a/roles/minio_install/tasks/create_blockdevicefiles.yml +++ b/roles/minio_install/tasks/create_blockdevicefiles.yml @@ -19,14 +19,7 @@ dd if=/dev/zero of=/tmp/diskimage{{ ansible_loop.index }} bs=1M - count= - {%- if hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity.endswith('GiB') -%} - "{{ hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity[:-3] |int * 1024 }}" - {%- elif hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity.endswith('TiB') -%} - "{{ hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity[:-3]|int * 1024 * 1024 }}" - {%- else -%} - "{{ hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity }}" - {%- endif -%} + count="{{ minio_tenant_volume_size |int * 1024 }}" changed_when: true - name: create mount point for the file block devices diff --git a/roles/minio_install/tasks/create_minio_multus_service.yml b/roles/minio_install/tasks/create_minio_multus_service.yml new file mode 100755 index 00000000..ad3fa670 --- /dev/null +++ b/roles/minio_install/tasks/create_minio_multus_service.yml @@ -0,0 +1,45 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+##
+---
+- name: check MinIO Tenant Helm charts temp directory
+  stat:
+    path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}"
+  register: tenant_temp_dir
+
+- name: create the temp folder for MinIO Tenant custom values
+  file:
+    path: "{{ (project_root_dir, 'charts', 'tenant', 'temp') | path_join }}"
+    state: directory
+    mode: 0755
+  when:
+    - not tenant_temp_dir.stat.exists
+
+- name: populate MinIO Tenant Multus Service configuration
+  template:
+    src: "minio_tenant_multus_services.yml.j2"
+    dest: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-multus-service.yml') | path_join }}"
+    force: yes
+    mode: preserve
+
+# - name: apply MinIO Tenant Multus Service
+#   k8s:
+#     state: present
+#     src: "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-multus-service.yml') | path_join }}"
+
+- name: apply MinIO Tenant Multus Service
+  command: >-
+    kubectl apply -f {{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-tenant-multus-service.yml') | path_join }}
+  changed_when: true
diff --git a/roles/minio_install/tasks/create_nvme_partition.yml b/roles/minio_install/tasks/create_nvme_partition.yml
index 2c9ee119..5a8cc8d4 100644
--- a/roles/minio_install/tasks/create_nvme_partition.yml
+++ b/roles/minio_install/tasks/create_nvme_partition.yml
@@ -21,8 +21,8 @@
     label: gpt
     number: 1
     part_type: primary
-    part_end: "{{ hostvars[inventory_hostname]['minio_pv'][ansible_loop.index0].capacity }}"
-    flags: [ lvm ]
+    part_end: "{{ minio_tenant_volume_size }}GiB"
+    flags: [lvm]
     state: present

 - name: format the partition
diff --git a/roles/minio_install/tasks/file_blockdevice.yml b/roles/minio_install/tasks/file_blockdevice.yml
index 97bd6b7f..62c82145 100644
--- a/roles/minio_install/tasks/file_blockdevice.yml
+++ b/roles/minio_install/tasks/file_blockdevice.yml
@@ -17,28 +17,28 @@
 - name: set iteration number
   set_fact:
     num: 1
-    req_num: "{{ hostvars[inventory_hostname]['minio_pv'] |length }}"
+    req_num: "{{ hostvars[inventory_hostname]['minio_pv'] | length }}"

 - name: create local file block device
   include_tasks: create_blockdevicefiles.yml
-  loop: "{{ range(num, req_num|int + 1)|list }}"
+  loop: "{{ range(num, req_num|int + 1) | list }}"
   loop_control:
     extended: yes

 - name: format with xfs file block devices
   include_tasks: format_blockdevicefiles.yml
-  loop: "{{ range(num, req_num|int + 1)|list }}"
+  loop: "{{ range(num, req_num|int + 1) | list }}"
   loop_control:
     extended: yes

 - name: setup the loop devices
   include_tasks: setup_loopdevices.yml
-  loop: "{{ range(num, req_num|int + 1)|list }}"
+  loop: "{{ range(num, req_num|int + 1) | list }}"
   loop_control:
     extended: yes

 - name: mount the loop devices
   include_tasks: mount_loopdevices.yml
-  loop: "{{ range(num, req_num|int + 1)|list }}"
+  loop: "{{ range(num, req_num|int + 1) | list }}"
   loop_control:
     extended: yes
diff --git a/roles/minio_install/tasks/main.yml b/roles/minio_install/tasks/main.yml
index d83e2eaf..4bc562e3 100644
--- a/roles/minio_install/tasks/main.yml
+++ b/roles/minio_install/tasks/main.yml
@@ -14,6 +14,12 @@
 ## limitations under the License.
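The file-backed block device flow above is spread across four included files (`create_blockdevicefiles.yml`, `format_blockdevicefiles.yml`, `setup_loopdevices.yml`, `mount_loopdevices.yml`), one loop iteration per volume. A condensed, non-idempotent single-volume sketch of the same idea, assuming only `minio_tenant_volume_size` from the role defaults (the `/tmp/diskimage1` and `/mnt/data1` paths are illustrative):

```yaml
# Sketch only - one sparse backing file, formatted xfs, attached to a loop
# device and mounted. The role's real tasks do this once per minio_pv entry.
- name: create backing file for one volume
  command: dd if=/dev/zero of=/tmp/diskimage1 bs=1M count={{ minio_tenant_volume_size | int * 1024 }}
  args:
    creates: /tmp/diskimage1

- name: format the backing file with xfs
  filesystem:
    fstype: xfs
    dev: /tmp/diskimage1

- name: attach the file to a free loop device
  command: losetup --find --show /tmp/diskimage1
  register: loop_dev
  changed_when: true

- name: mount the loop device
  mount:
    src: "{{ loop_dev.stdout }}"
    path: /mnt/data1
    fstype: xfs
    state: mounted
```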
## --- +- name: install dependencies + include_role: + name: install_dependencies + when: + - inventory_hostname == groups['kube_control_plane'][0] + - name: collect the number of nodes set_fact: minio_nodes: "{{ groups['kube_node'] | length }}" @@ -36,10 +42,14 @@ when: - minio_nodes | int >= minio_tenant_servers -- name: install dependencies +- name: install dependencies - whereabouts include_role: name: whereabouts_install +- name: install dependencies - kubernetes_ingress + include_role: + name: kubernetes_ingress_install + - name: install MinIO operator/console import_tasks: minio_operator.yml when: @@ -99,6 +109,20 @@ - inventory_hostname == groups['kube_control_plane'][0] - minio_vfs is defined and minio_vfs | length | int > 0 +- name: build MinIO tenant Postgress DB image + import_tasks: build_local_postgress_image.yml + when: + - minio_tenant_enabled + - minio_nodes | int >= minio_tenant_servers + - inventory_hostname == groups['kube_control_plane'][0] + +- name: build MinIO local image + import_tasks: build_local_minio_image.yml + when: + - minio_tenant_enabled + - minio_nodes | int >= minio_tenant_servers + - inventory_hostname == groups['kube_control_plane'][0] + - name: install MinIO tenant import_tasks: minio_tenant.yml when: @@ -106,12 +130,14 @@ - minio_nodes | int >= minio_tenant_servers - inventory_hostname == groups['kube_control_plane'][0] +- name: install Multus Service Helmchart + include_role: + name: multus_service -- name: install additional Endpoints for MinIO Sample Tenant pods - import_tasks: add_minio_tenant_endpoints.yml +- name: create MinIO storage Multus Service + import_tasks: create_minio_multus_service.yml when: - minio_tenant_enabled - minio_nodes | int >= minio_tenant_servers - inventory_hostname == groups['kube_control_plane'][0] - - minio_tenant_sriov_resources is defined and minio_tenant_sriov_resources | length > 0 - minio_vfs is defined and minio_vfs | length | int > 0 diff --git a/roles/minio_install/tasks/minio_operator.yml b/roles/minio_install/tasks/minio_operator.yml index 17aef4d7..1c437745 100644 --- a/roles/minio_install/tasks/minio_operator.yml +++ b/roles/minio_install/tasks/minio_operator.yml @@ -33,7 +33,7 @@ dest: "{{ (project_root_dir, 'charts') | path_join }}" mode: 0755 -- name: check MinIO Operator Helm charts temp directory. 
+- name: check MinIO Operator Helm charts temp directory stat: path: "{{ (project_root_dir, 'charts', 'operator', 'temp') | path_join }}" register: operator_temp_dir @@ -60,19 +60,44 @@ kind: Namespace state: present -- name: label minio on nodes +- name: label minio on controller nodes + command: >- + kubectl label --overwrite nodes {{ hostvars[item]['ansible_hostname'] }} + {{ minio_storage_controller_key }}={{ minio_storage_controller_value }} + loop: "{{ groups['kube_control_plane'] }}" + changed_when: true + +- name: label minio on worker nodes command: >- kubectl label --overwrite nodes {{ hostvars[item]['ansible_hostname'] }} {{ minio_storage_worker_key }}={{ minio_storage_worker_value }} loop: "{{ groups['kube_node'] }}" changed_when: true +# https://github.com/minio/operator/blob/master/docs/upgrade.md#upgrade-minio-operator-via-helm-charts +- name: create CRD (tenants.minio.min.io) + k8s: + src: "{{ (project_root_dir, 'charts', 'operator', 'templates', 'minio.min.io_tenants.yaml') | path_join }}" + state: present + apply: true + +- name: ensure current version of tenants.minio.min.io CRD includes the labels + command: >- + kubectl label crd tenants.minio.min.io app.kubernetes.io/managed-by=Helm --overwrite + changed_when: true + +- name: ensure current version of tenants.minio.min.io CRD includes the annotations + command: >- + kubectl annotate crd tenants.minio.min.io + meta.helm.sh/release-name={{ minio_operator_release_name }} + meta.helm.sh/release-namespace={{ minio_operator_namespace }} --overwrite + changed_when: true + - name: install MinIO Operator Helm charts command: >- - helm install + helm upgrade --install {{ minio_operator_release_name }} {{ (project_root_dir, 'charts/operator') | path_join }} --namespace {{ minio_operator_namespace }} - --set installCRDs=true -f {{ (project_root_dir, 'charts', 'operator', 'temp', 'minio-operator-custom-values.yml') | path_join }} changed_when: true diff --git a/roles/minio_install/tasks/nvme_blockdevice.yml b/roles/minio_install/tasks/nvme_blockdevice.yml index 4b659c51..9968be6a 100644 --- a/roles/minio_install/tasks/nvme_blockdevice.yml +++ b/roles/minio_install/tasks/nvme_blockdevice.yml @@ -17,10 +17,10 @@ - name: set iteration number set_fact: num: 1 - req_num: "{{ hostvars[inventory_hostname]['minio_pv'] |length }}" + req_num: "{{ hostvars[inventory_hostname]['minio_pv'] | length }}" - name: configuring nvme block device partition include_tasks: create_nvme_partition.yml - loop: "{{ range(num, req_num|int + 1)|list }}" + loop: "{{ range(num, req_num|int + 1) | list }}" loop_control: extended: yes diff --git a/roles/minio_install/tasks/preflight_minio_config.yml b/roles/minio_install/tasks/preflight_minio_config.yml index 56d09d02..594c50f9 100644 --- a/roles/minio_install/tasks/preflight_minio_config.yml +++ b/roles/minio_install/tasks/preflight_minio_config.yml @@ -21,8 +21,8 @@ that: "{{ minio_tenant_servers | int }} <= {{ groups['kube_node'] | length | int }}" msg: | "Incorrect configuration." 
-      "The number of MinIO tenant servers '{{ minio_tenant_servers |int }}' defined in group vars must be"
-      "less or equal to the number of nodes '{{ groups['kube_node']|length |int }}'"
+      "The number of MinIO tenant servers '{{ minio_tenant_servers | int }}' defined in group vars must be"
+      "less than or equal to the number of nodes '{{ groups['kube_node'] | length | int }}'"

 - name: make sure the MinIO tenant volumes per server >= the MinIO PV list
   assert:
diff --git a/roles/minio_install/tasks/prepare_minio_tenant_sriovnetwork.yml b/roles/minio_install/tasks/prepare_minio_tenant_sriovnetwork.yml
index 1acfb0c3..ebdad52a 100644
--- a/roles/minio_install/tasks/prepare_minio_tenant_sriovnetwork.yml
+++ b/roles/minio_install/tasks/prepare_minio_tenant_sriovnetwork.yml
@@ -51,7 +51,7 @@
   shell: >-
     cat "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-storage-networks' ~ ansible_loop.index ~ '.yml') | path_join }}"
     >> "{{ (project_root_dir, 'charts', 'tenant', 'temp', 'minio-storage-networks.yml') | path_join }}"
-  loop: "{{ range(num, req_num|int + 1)|list }}"
+  loop: "{{ range(num, req_num | int + 1) | list }}"
   loop_control:
     extended: yes
   changed_when: true
@@ -75,6 +75,6 @@

 - name: load minio_tenant_sriov_resources for jinja template
   set_fact:
-    minio_tenant_sriov_resources : "{{ minio_tenant_sriov_resources['minio_tenant_sriov_resources'] }}"
+    minio_tenant_sriov_resources: "{{ minio_tenant_sriov_resources['minio_tenant_sriov_resources'] }}"
   when:
     - inventory_hostname == groups['kube_control_plane'][0]
diff --git a/roles/minio_install/templates/minio_operator_custom_values.yml.j2 b/roles/minio_install/templates/minio_operator_custom_values.yml.j2
index 4893b2c8..c9cfb9c7 100644
--- a/roles/minio_install/templates/minio_operator_custom_values.yml.j2
+++ b/roles/minio_install/templates/minio_operator_custom_values.yml.j2
@@ -1,4 +1,5 @@
----
+# Default values for minio-operator.
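For context on the CRD handling in `minio_operator.yml` above: Helm 3 refuses to take over resources it did not create unless they already carry its ownership metadata, and fails with an "invalid ownership metadata" error otherwise. A hedged sketch of that adoption sequence in isolation (the release and namespace names here are assumptions, not the role's defaults):

```yaml
# Sketch only: adopt a pre-existing CRD into a Helm release so that a later
# 'helm upgrade --install' can manage it instead of erroring out.
- name: adopt tenants.minio.min.io CRD into the operator release
  command: "{{ cmd }}"
  loop_control:
    loop_var: cmd
  loop:
    - >-
      kubectl label crd tenants.minio.min.io
      app.kubernetes.io/managed-by=Helm --overwrite
    - >-
      kubectl annotate crd tenants.minio.min.io
      meta.helm.sh/release-name=minio-operator
      meta.helm.sh/release-namespace=minio-operator --overwrite
  changed_when: true
```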
+ operator: ## Setup environment variables for the Operator # env: @@ -10,9 +11,17 @@ operator: # value: "" image: repository: {{ minio_operator_image | default("minio/operator") }} - tag: {{ minio_operator_version | default("v4.4.1")}} + tag: {{ minio_operator_version | default("v4.4.28")}} pullPolicy: IfNotPresent + imagePullSecrets: [] + initcontainers: [] replicaCount: {{ minio_tenant_servers }} + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + runAsNonRoot: true + fsGroup: 1000 + nodeSelector: {} affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: @@ -22,8 +31,8 @@ operator: operator: In values: - minio - tolerations: [ ] - topologySpreadConstraints: [ ] + tolerations: [] + topologySpreadConstraints: [] resources: requests: cpu: 200m @@ -33,10 +42,12 @@ operator: console: image: repository: {{ minio_console_image | default("minio/console") }} - tag: {{ minio_console_version | default("v0.13.2")}} + tag: {{ minio_console_version | default("v0.19.4")}} pullPolicy: IfNotPresent + imagePullSecrets: [] + initcontainers: [] replicaCount: 1 - affinity: + nodeSelector: {} affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: @@ -46,11 +57,20 @@ console: operator: In values: - minio + tolerations: [] + topologySpreadConstraints: [] + resources: {} + securityContext: + runAsUser: 1000 + runAsNonRoot: true ingress: enabled: false ingressClassName: "" - labels: { } - annotations: { } - tls: [ ] + labels: {} + annotations: {} + tls: [] host: console.local path: / + pathType: Prefix + volumes: [] + volumeMounts: [] diff --git a/roles/minio_install/templates/minio_tenant_custom_values.yml.j2 b/roles/minio_install/templates/minio_tenant_custom_values.yml.j2 index 52261984..6c05f02f 100644 --- a/roles/minio_install/templates/minio_tenant_custom_values.yml.j2 +++ b/roles/minio_install/templates/minio_tenant_custom_values.yml.j2 @@ -1,88 +1,341 @@ ---- +## Secret with credentials to be used by MinIO Tenant +secrets: + # create a kubernetes configuration secret object with the accessKey and secretKey as defined here. + name: {{ minio_tenant_release_name }}-env-configuration + accessKey: minio + secretKey: minio123 + ## MinIO Tenant Definition -tenants: +tenant: # Tenant name - - name: minio-tenant - ## Registry location and Tag to download MinIO Server image - image: - repository: quay.io/minio/minio - tag: RELEASE.2022-01-04T07-41-07Z - pullPolicy: IfNotPresent - ## Customize namespace for tenant deployment - namespace: minio-tenant - ## Customize any private registry image pull secret. - ## currently only one secret registry is supported - imagePullSecret: { } - ## If a scheduler is specified here, Tenant pods will be dispatched by specified scheduler. - ## If not specified, the Tenant pods will be dispatched by default scheduler. - scheduler: { } - ## Specification for MinIO Pool(s) in this Tenant. - pools: - ## Servers specifies the number of MinIO Tenant Pods / Servers in this pool. - ## For standalone mode, supply 1. For distributed mode, supply 4 or more. - ## Note that the operator does not support upgrading from standalone to distributed mode. - - servers: {{ minio_tenant_servers }} - ## volumesPerServer specifies the number of volumes attached per MinIO Tenant Pod / Server. 
- volumesPerServer: {{ minio_tenant_volumes_per_server }} - ## size specifies the capacity per volume - size: 1Gi - ## storageClass specifies the storage class name to be used for this pool - storageClassName: local-storage - ## Used to specify annotations for pods + name: {{ minio_tenant_release_name }} + ## Registry location and Tag to download MinIO Server image + image: + repository: {{ (minio_build_image_locally) | ternary(registry_local_address ~ '/' ~ minio_local_build_name , 'quay.io/minio/minio') }} + tag: {{ minio_git_tag }} + pullPolicy: IfNotPresent + ## Customize any private registry image pull secret. + ## currently only one secret registry is supported + imagePullSecret: {} + ## If a scheduler is specified here, Tenant pods will be dispatched by specified scheduler. + ## If not specified, the Tenant pods will be dispatched by default scheduler. + scheduler: {} + ## Specification for MinIO Pool(s) in this Tenant. + pools: + ## Servers specifies the number of MinIO Tenant Pods / Servers in this pool. + ## For standalone mode, supply 1. For distributed mode, supply 4 or more. + ## Note that the operator does not support upgrading from standalone to distributed mode. + - servers: {{ minio_tenant_servers }} + ## custom name for the pool + name: pool-0 + ## volumesPerServer specifies the number of volumes attached per MinIO Tenant Pod / Server. + volumesPerServer: {{ minio_tenant_volumes_per_server }} + ## size specifies the capacity per volume + size: {{ minio_tenant_volume_size }}Gi + ## storageClass specifies the storage class name to be used for this pool + storageClassName: local-storage + ## Used to specify annotations for pods {% if minio_tenant_sriov_resources is defined %} - annotations: - k8s.v1.cni.cncf.io/networks: {{ minio_tenant_sriov_resources | map(attribute='sriov_network') | join(',') }} + annotations: + k8s.v1.cni.cncf.io/networks: {{ minio_tenant_sriov_resources | map(attribute='sriov_network') | join(',') }} {% endif %} - ## Used to specify a toleration for a pod - tolerations: { } - ## nodeSelector parameters for MinIO Pods. It specifies a map of key-value pairs. For the pod to be - ## eligible to run on a node, the node must have each of the - ## indicated key-value pairs as labels. - ## Read more here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ - nodeSelector: { } - ## Affinity settings for MinIO pods. Read more about affinity - ## here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity. - affinity: - podAntiAffinity: - requiredDuringSchedulingIgnoredDuringExecution: - - labelSelector: - matchExpressions: - - key: v1.min.io/tenant - operator: In - values: - - minio-tenant - topologyKey: kubernetes.io/hostname - ## Configure resource requests and limits for MinIO containers + ## Used to specify labels for pods + labels: {} + ## Used to specify a toleration for a pod + tolerations: {} + ## nodeSelector parameters for MinIO Pods. It specifies a map of key-value pairs. For the pod to be + ## eligible to run on a node, the node must have each of the + ## indicated key-value pairs as labels. + ## Read more here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ + nodeSelector: + storage: minio + ## Affinity settings for MinIO pods. Read more about affinity + ## here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity. 
+ affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: v1.min.io/tenant + operator: In + values: + - minio-tenant + topologyKey: kubernetes.io/hostname + ## Configure resource requests and limits for MinIO containers {% if minio_tenant_sriov_resources is defined %} - resources: - requests: + resources: + requests: {% for request in (minio_tenant_sriov_resources | map(attribute='resource') | map(attribute='requests')) %} {{ request | indent (width=12, first=True)}} {% endfor %} - limits: + limits: {% for limit in (minio_tenant_sriov_resources | map(attribute='resource') | map(attribute='limits')) %} {{ limit | indent (width=12, first=True) }} {% endfor %} {% else %} - resources: { } + resources: {} {% endif %} - ## Configure security context - securityContext: { } - ## Configure topology constraints - topologySpreadConstraints: [ ] - ## Mount path where PV will be mounted inside container(s). - mountPath: /mnt/data - ## Sub path inside Mount path where MinIO stores data. - subPath: /data + ## Configure security context +{% if minio_build_image_locally %} + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + runAsNonRoot: true + fsGroup: 1000 +{% else %} + securityContext: {} +{% endif %} + ## Configure topology constraints + topologySpreadConstraints: [] + ## Mount path where PV will be mounted inside container(s). + mountPath: /export + ## Sub path inside Mount path where MinIO stores data. + subPath: /data + # pool metrics to be read by Prometheus + metrics: + enabled: false + port: 9000 + protocol: http + certificate: + ## Use this field to provide one or more external CA certificates. This is used by MinIO + ## to verify TLS connections with other applications: + ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret + externalCaCertSecret: [] + ## Use this field to provide a list of Secrets with external certificates. This can be used to configure + ## TLS for MinIO Tenant pods. Create secrets as explained here: + ## https://github.com/minio/minio/tree/master/docs/tls/kubernetes#2-create-kubernetes-secret + externalCertSecret: [] + {# externalCertSecret: + - name: tls-minio1-general + type: kubernetes.io/tls + - name: tls-minio1-star-general + type: kubernetes.io/tls + - name: tls-console-minio1-general + type: kubernetes.io/tls #} + ## Enable automatic Kubernetes based certificate generation and signing as explained in + ## https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster requestAutoCert: true - # pool secrets - secrets: - # create a kubernetes secret object with the accessKey and secretKey as defined here. - enabled: true - name: minio1-secret - accessKey: minio - secretKey: minio123 - # pool metrics to be read by Prometheus - metrics: - enabled: false - port: 9000 + ## This field is used only when "requestAutoCert" is set to true. Use this field to set CommonName + ## for the auto-generated certificate. Internal DNS name for the pod will be used if CommonName is + ## not provided. 
DNS name format is *.minio.default.svc.cluster.local
+  certConfig: {}
+  ## Enable S3 specific features such as Bucket DNS which would allow `buckets` to be
+  ## accessible as DNS entries of form `<bucketname>.minio.default.svc.cluster.local`
+  s3:
+    ## This feature is turned off by default
+    bucketDNS: false
+  ## List of bucket names to create during tenant provisioning
+  buckets: []
+  ## List of secret names to use for generating MinIO users during tenant provisioning
+  users: []
+  ## PodManagement policy for MinIO Tenant Pods. Can be "OrderedReady" or "Parallel"
+  ## Refer https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy
+  ## for details.
+  podManagementPolicy: Parallel
+  # Liveness Probe for container liveness. Container will be restarted if the probe fails.
+  # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes.
+  liveness: {}
+  # Readiness Probe for container readiness. Container will be removed from service endpoints if the probe fails.
+  # Refer https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
+  readiness: {}
+  ## exposeServices defines the exposure of the MinIO object storage and Console services.
+  ## service is exposed as a loadbalancer in k8s service.
+  exposeServices: {}
+  # kubernetes service account associated with a specific tenant
+  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
+  serviceAccountName: ""
+  # Tenant scrape configuration will be added to prometheus managed by the prometheus-operator.
+  prometheusOperator: false
+  # Enable JSON, Anonymous logging for MinIO tenants.
+  # Refer https://github.com/minio/operator/blob/master/pkg/apis/minio.min.io/v2/types.go#L303
+  # How logs will look:
+  # $ k logs minio1-pool-0-0 -n default
+  # {"level":"INFO","errKind":"","time":"2022-04-07T21:49:33.740058549Z","message":"All MinIO sub-systems initialized successfully"}
+  # Notice they are in JSON format to be consumed
+  logging:
+    anonymous: true
+    json: true
+    quiet: true
+  ## serviceMetadata allows passing additional labels and annotations to MinIO and Console specific
+  ## services created by the operator.
+  serviceMetadata: {}
+  ## Add environment variables to be set in MinIO container (https://github.com/minio/minio/tree/master/docs/config)
+  env: []
+  ## PriorityClassName indicates the Pod priority and hence importance of a Pod relative to other Pods.
+  ## This is applied to MinIO pods only.
+  ## Refer Kubernetes documentation for details https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass/
+  priorityClassName: ""
+  ## Define configuration for KES (stateless and distributed key-management system)
+  ## Refer https://github.com/minio/kes
+  #kes:
+  #  image: ""  # minio/kes:v0.18.0
+  #  env: []
+  #  replicas: 2
+  #  configuration: |-
+  #    address: :7373
+  #    root: _ # Effectively disabled since no root identity necessary.
+ # tls: + # key: /tmp/kes/server.key # Path to the TLS private key + # cert: /tmp/kes/server.crt # Path to the TLS certificate + # proxy: + # identities: [] + # header: + # cert: X-Tls-Client-Cert + # policy: + # my-policy: + # paths: + # - /v1/key/create/* + # - /v1/key/generate/* + # - /v1/key/decrypt/* + # identities: + # - ${MINIO_KES_IDENTITY} + # cache: + # expiry: + # any: 5m0s + # unused: 20s + # log: + # error: on + # audit: off + # keys: + # ## KES configured with fs (File System mode) doesnt work in Kubernetes environments and it's not recommended + # ## use a real KMS + # # fs: + # # path: "./keys" # Path to directory. Keys will be stored as files. Not Recommended for Production. + # vault: + # endpoint: "http://vault.default.svc.cluster.local:8200" # The Vault endpoint + # namespace: "" # An optional Vault namespace. See: https://www.vaultproject.io/docs/enterprise/namespaces/index.html + # prefix: "my-minio" # An optional K/V prefix. The server will store keys under this prefix. + # approle: # AppRole credentials. See: https://www.vaultproject.io/docs/auth/approle.html + # id: "" # Your AppRole Role ID + # secret: "" # Your AppRole Secret ID + # retry: 15s # Duration until the server tries to re-authenticate after connection loss. + # tls: # The Vault client TLS configuration for mTLS authentication and certificate verification + # key: "" # Path to the TLS client private key for mTLS authentication to Vault + # cert: "" # Path to the TLS client certificate for mTLS authentication to Vault + # ca: "" # Path to one or multiple PEM root CA certificates + # status: # Vault status configuration. The server will periodically reach out to Vault to check its status. + # ping: 10s # Duration until the server checks Vault's status again. + # # aws: + # # # The AWS SecretsManager key store. The server will store + # # # secret keys at the AWS SecretsManager encrypted with + # # # AWS-KMS. See: https://aws.amazon.com/secrets-manager + # # secretsmanager: + # # endpoint: "" # The AWS SecretsManager endpoint - e.g.: secretsmanager.us-east-2.amazonaws.com + # # region: "" # The AWS region of the SecretsManager - e.g.: us-east-2 + # # kmskey: "" # The AWS-KMS key ID used to en/decrypt secrets at the SecretsManager. By default (if not set) the default AWS-KMS key will be used. + # # credentials: # The AWS credentials for accessing secrets at the AWS SecretsManager. + # # accesskey: "" # Your AWS Access Key + # # secretkey: "" # Your AWS Secret Key + # # token: "" # Your AWS session token (usually optional) + # imagePullPolicy: "IfNotPresent" + # externalCertSecret: null + # clientCertSecret: null + # ## Key name to be created on the KMS, default is "my-minio-key" + # keyName: "" + # resources: {} + # nodeSelector: {} + # affinity: + # nodeAffinity: {} + # podAffinity: {} + # podAntiAffinity: {} + # tolerations: [] + # annotations: {} + # labels: {} + # serviceAccountName: "" + # securityContext: + # runAsUser: 1000 + # runAsGroup: 1000 + # runAsNonRoot: true + # fsGroup: 1000 + ## Prometheus setup for MinIO Tenant. 
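The commented `kes:` section above is a verbatim upstream sample. If external key management were wanted, a minimal enablement would touch only a handful of those keys; the values below are hypothetical and simply mirror the sample's own structure, under the assumption that a Vault endpoint is reachable from the cluster:

```yaml
# Hypothetical minimal KES enablement; every key shown also appears in the
# commented sample above. The fs backend is not recommended in Kubernetes.
kes:
  image: minio/kes:v0.18.0
  replicas: 2
  keyName: my-minio-key
  configuration: |-
    address: :7373
    keys:
      vault:
        endpoint: "http://vault.default.svc.cluster.local:8200"
        prefix: "my-minio"
```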
+ prometheus: + image: "" # defaults to quay.io/prometheus/prometheus:latest + env: [] + sidecarimage: "" # defaults to alpine + initimage: "" # defaults to busybox:1.33.1 + diskCapacityGB: 1 + storageClassName: local-storage + annotations: {} + labels: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + resources: {} + serviceAccountName: "" + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + runAsNonRoot: true + fsGroup: 1000 + ## LogSearch API setup for MinIO Tenant. + log: + image: "" # defaults to minio/operator:v4.4.17 + env: [] + resources: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + tolerations: [] + annotations: {} + labels: {} + audit: + diskCapacityGB: 1 + ## Postgres setup for LogSearch API + db: + image: "{{ registry_local_address }}/{{ minio_log_postgres_local_image_name }}:{{ minio_log_postgres_local_image_tag }}" # defaults to library/postgres + env: [] + initimage: "" # defaults to busybox:1.33.1 + volumeClaimTemplate: + metadata: {} + spec: + storageClassName: local-storage + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi + resources: {} + nodeSelector: {} + affinity: + nodeAffinity: {} + podAffinity: {} + podAntiAffinity: {} + tolerations: [] + annotations: {} + labels: {} + serviceAccountName: "" + securityContext: + runAsUser: 999 + runAsGroup: 999 + runAsNonRoot: true + fsGroup: 999 + serviceAccountName: "" + securityContext: + runAsUser: 1000 + runAsGroup: 1000 + runAsNonRoot: true + fsGroup: 1000 + +ingress: + api: + enabled: false + ingressClassName: "" + labels: {} + annotations: {} + tls: [] + host: minio.local + path: / + pathType: Prefix + console: + enabled: false + ingressClassName: "" + labels: {} + annotations: {} + tls: [] + host: minio-console.local + path: / + pathType: Prefix diff --git a/roles/minio_install/templates/minio_tenant_endpoints.yml.j2 b/roles/minio_install/templates/minio_tenant_endpoints.yml.j2 deleted file mode 100644 index 1f6cd9b4..00000000 --- a/roles/minio_install/templates/minio_tenant_endpoints.yml.j2 +++ /dev/null @@ -1,11 +0,0 @@ -{%- for ep in ep_data %} -{%- if pod_node[item] == ep['pod-name'] %} - - hostname: {{ ep['pod-name'] }} - ip: {{ ep['ips'] }} - nodeName: {{ item }} - targetRef: - kind: Pod - name: {{ ep['pod-name'] }} - namespace: {{ minio_tenant_namespace }} -{% endif %} -{% endfor %} diff --git a/roles/minio_install/templates/minio_tenant_localpersistentvolumes.yml.j2 b/roles/minio_install/templates/minio_tenant_localpersistentvolumes.yml.j2 index 2d8c7ced..4de0c184 100644 --- a/roles/minio_install/templates/minio_tenant_localpersistentvolumes.yml.j2 +++ b/roles/minio_install/templates/minio_tenant_localpersistentvolumes.yml.j2 @@ -6,7 +6,7 @@ metadata: name: {{ pv.name }}-{{ item }} spec: capacity: - storage: {{ pv.capacity[:-1] }} + storage: {{ minio_tenant_volume_size }}Gi volumeMode: Filesystem accessModes: - {{ pv.accessMode }} diff --git a/roles/minio_install/templates/minio_tenant_multus_services.yml.j2 b/roles/minio_install/templates/minio_tenant_multus_services.yml.j2 new file mode 100644 index 00000000..1977b722 --- /dev/null +++ b/roles/minio_install/templates/minio_tenant_multus_services.yml.j2 @@ -0,0 +1,18 @@ +{%- for sriov_res in minio_tenant_sriov_resources %} +--- +kind: Service +apiVersion: v1 +metadata: + name: minio-multus-service-{{ loop.index }} + namespace: {{ minio_tenant_namespace }} + labels: + service.kubernetes.io/service-proxy-name: multus-proxy + annotations: + 
k8s.v1.cni.cncf.io/service-network: {{ sriov_res['sriov_network'] }} +spec: + selector: + {{ minio_multus_selector_key }}: {{ minio_storage_worker_value }} + ports: + - protocol: TCP + port: 9000 +{% endfor %} diff --git a/roles/minio_install/templates/minio_tenant_pods_ips_regex.j2 b/roles/minio_install/templates/minio_tenant_pods_ips_regex.j2 deleted file mode 100644 index 4936a8d0..00000000 --- a/roles/minio_install/templates/minio_tenant_pods_ips_regex.j2 +++ /dev/null @@ -1,5 +0,0 @@ -networks-status:[\s\S]*? -{%- for i in range(minio_tenant_sriov_resources | length + 1) -%} -(?:")(name)(?:":\s*")([\S]*)(?:",[\s\S]*?)(?:")(ips)(?:":\s*\[\s*")([\S]*)(?:"[\s\S]*?) -{%- endfor -%} -[\s\S]*?(pod-name)(?::\s)([\S]*) diff --git a/roles/minio_install/templates/minio_tenant_sriovnetwork_values.yml.j2 b/roles/minio_install/templates/minio_tenant_sriovnetwork_values.yml.j2 index 4f91bd62..2c3f4633 100644 --- a/roles/minio_install/templates/minio_tenant_sriovnetwork_values.yml.j2 +++ b/roles/minio_install/templates/minio_tenant_sriovnetwork_values.yml.j2 @@ -12,7 +12,7 @@ spec: "type": "whereabouts", "log_file": "/tmp/whereabouts.log", "log_level": "debug", - "range": "10.56.{{ ip_third_digit }}.0/16", + "range": "10.56.{{ ip_third_digit }}.0/24", "range_start": "10.56.{{ ip_third_digit }}.100", "range_end": "10.56.{{ ip_third_digit }}.200", "routes": [{ diff --git a/roles/minio_install/vars/main.yml b/roles/minio_install/vars/main.yml new file mode 100644 index 00000000..170d32e5 --- /dev/null +++ b/roles/minio_install/vars/main.yml @@ -0,0 +1,23 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +install_dependencies: + Debian: + - git + - make + RedHat: + - git + - make diff --git a/roles/multus_service/charts/multus_service/Chart.yaml b/roles/multus_service/charts/multus_service/Chart.yaml new file mode 100755 index 00000000..1235eab0 --- /dev/null +++ b/roles/multus_service/charts/multus_service/Chart.yaml @@ -0,0 +1,26 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +apiVersion: v2 +description: A Helm chart for Multus Service +name: multus-service +version: 0.0.1 +appVersion: v0.0.1 +sources: + - https://github.com/k8snetworkplumbingwg/multus-service.git +maintainers: + - name: Tomofumi Hayashi + email: tohayash@redhat.com +type: application diff --git a/roles/multus_service/charts/multus_service/templates/multus_service_clusterrole.yaml b/roles/multus_service/charts/multus_service/templates/multus_service_clusterrole.yaml new file mode 100755 index 00000000..9c4c05ed --- /dev/null +++ b/roles/multus_service/charts/multus_service/templates/multus_service_clusterrole.yaml @@ -0,0 +1,56 @@ +--- +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: multus-service +rules: + - apiGroups: + - "k8s.cni.cncf.io" + resources: + - '*' + verbs: + - '*' + - apiGroups: + - "discovery.k8s.io" + resources: + - endpointslices + verbs: + - create + - get + - list + - update + - watch + - apiGroups: + - "" + resources: + - pods + - namespaces + - nodes + - services + - statefulsets + verbs: + - get + - list + - watch + - apiGroups: + - "" + resources: + - events + verbs: + - create + - patch + - update + - apiGroups: + - coordination.k8s.io + resources: + - leases + verbs: + - '*' + - apiGroups: + - events.k8s.io + resources: + - events + verbs: + - create + - patch + - update diff --git a/roles/multus_service/charts/multus_service/templates/multus_service_clusterrolebinding.yaml b/roles/multus_service/charts/multus_service/templates/multus_service_clusterrolebinding.yaml new file mode 100755 index 00000000..26369ea2 --- /dev/null +++ b/roles/multus_service/charts/multus_service/templates/multus_service_clusterrolebinding.yaml @@ -0,0 +1,13 @@ +--- +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: multus-service +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: multus-service +subjects: +- kind: ServiceAccount + name: multus-service + namespace: kube-system diff --git a/roles/multus_service/charts/multus_service/templates/multus_service_daemonset.yaml b/roles/multus_service/charts/multus_service/templates/multus_service_daemonset.yaml new file mode 100755 index 00000000..be300c63 --- /dev/null +++ b/roles/multus_service/charts/multus_service/templates/multus_service_daemonset.yaml @@ -0,0 +1,69 @@ +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: multus-proxy-ds-amd64 + namespace: kube-system + labels: + tier: node + app: multus-proxy + name: multus-proxy +spec: + selector: + matchLabels: + name: multus-proxy + updateStrategy: + type: RollingUpdate + template: + metadata: + labels: + tier: node + app: multus-proxy + name: multus-proxy + spec: + hostNetwork: true + nodeSelector: + kubernetes.io/arch: amd64 + tolerations: + - operator: Exists + effect: NoSchedule + serviceAccountName: multus-service + containers: + - name: multus-proxy + # crio support requires multus:latest for now. support 3.3 or later. 
+          image: ghcr.io/k8snetworkplumbingwg/multus-service:latest-nft
+          imagePullPolicy: Always
+          command: ["/usr/bin/multus-proxy"]
+          args:
+            - "--host-prefix=/host"
+            # change this if the runtime is docker
+            - "--container-runtime=cri"
+            # change this if the runtime endpoint differs from the crio default
+            - "--container-runtime-endpoint=/run/crio/crio.sock"
+            # remove this if you do not want to store iptables rules
+            - "--pod-iptables=/var/lib/multus-proxy/iptables"
+            - "--logtostderr"
+            - "-v=4"
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "80Mi"
+            limits:
+              cpu: "100m"
+              memory: "150Mi"
+          securityContext:
+            privileged: true
+            capabilities:
+              add: ["SYS_ADMIN", "SYS_NET_ADMIN"]
+          volumeMounts:
+            - name: host
+              mountPath: /host
+            - name: var-lib-multusproxy
+              mountPath: /var/lib/multus-proxy
+      volumes:
+        - name: host
+          hostPath:
+            path: /
+        - name: var-lib-multusproxy
+          hostPath:
+            path: /var/lib/multus-proxy
diff --git a/roles/multus_service/charts/multus_service/templates/multus_service_deployment.yaml b/roles/multus_service/charts/multus_service/templates/multus_service_deployment.yaml
new file mode 100755
index 00000000..200689df
--- /dev/null
+++ b/roles/multus_service/charts/multus_service/templates/multus_service_deployment.yaml
@@ -0,0 +1,42 @@
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: multus-service-controller
+  namespace: kube-system
+  labels:
+    app: multus-service-controller
+    name: multus-service-controller
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: multus-service-controller
+      name: multus-service-controller
+  template:
+    metadata:
+      labels:
+        app: multus-service-controller
+        name: multus-service-controller
+    spec:
+      nodeSelector:
+        kubernetes.io/arch: amd64
+      tolerations:
+        - operator: Exists
+          effect: NoSchedule
+      serviceAccountName: multus-service
+      containers:
+        - name: multus-service-controller
+          image: ghcr.io/k8snetworkplumbingwg/multus-service:latest-nft
+          imagePullPolicy: Always
+          command: ["/usr/bin/multus-service-controller"]
+          args:
+            - "--logtostderr"
+            - "-v=4"
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "80Mi"
+            limits:
+              cpu: "100m"
+              memory: "150Mi"
diff --git a/roles/multus_service/charts/multus_service/templates/multus_service_serviceaccount.yaml b/roles/multus_service/charts/multus_service/templates/multus_service_serviceaccount.yaml
new file mode 100755
index 00000000..4411c091
--- /dev/null
+++ b/roles/multus_service/charts/multus_service/templates/multus_service_serviceaccount.yaml
@@ -0,0 +1,6 @@
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: multus-service
+  namespace: kube-system
diff --git a/roles/multus_service/charts/multus_service/values.yaml b/roles/multus_service/charts/multus_service/values.yaml
new file mode 100755
index 00000000..9c4273dc
--- /dev/null
+++ b/roles/multus_service/charts/multus_service/values.yaml
@@ -0,0 +1,17 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and +## limitations under the License. +## + +# Default values for Multus Service. diff --git a/roles/multus_service/defaults/main.yml b/roles/multus_service/defaults/main.yml new file mode 100755 index 00000000..2bbb5641 --- /dev/null +++ b/roles/multus_service/defaults/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +multus_service_namespace: "kube-system" # Multus Service namespace +multus_service_release_name: "multus-service" # Multus Service Helm Charts release name + +multus_service_git_url: "https://github.com/k8snetworkplumbingwg/multus-service.git" +multus_service_commit_hash: "392817f2441b5443d7868fd9f0f692447a027554" diff --git a/roles/multus_service/tasks/main.yml b/roles/multus_service/tasks/main.yml new file mode 100755 index 00000000..005991dc --- /dev/null +++ b/roles/multus_service/tasks/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: install Multus Service + import_tasks: multus_service.yml + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/multus_service/tasks/multus_service.yml b/roles/multus_service/tasks/multus_service.yml new file mode 100755 index 00000000..d7a13b3f --- /dev/null +++ b/roles/multus_service/tasks/multus_service.yml @@ -0,0 +1,49 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+##
+---
+- name: check Multus Service Helm charts directory
+  stat:
+    path: "{{ (project_root_dir, 'charts', 'multus_service') | path_join }}"
+  register: multus_service_path
+
+- name: create Multus Service Helm charts directory if needed
+  file:
+    path: "{{ (project_root_dir, 'charts', 'multus_service') | path_join }}"
+    state: directory
+    mode: 0755
+  when:
+    - multus_service_path.stat.exists is defined and not multus_service_path.stat.exists
+
+- name: copy Multus Service charts to the controller node
+  copy:
+    src: "{{ (role_path, 'charts', 'multus_service') | path_join }}"
+    dest: "{{ (project_root_dir, 'charts') | path_join }}"
+    mode: 0755
+
+# - name: clone Multus Service repository
+#   git:
+#     repo: "{{ multus_service_git_url }}"
+#     dest: "{{ (project_root_dir, 'charts', 'multus_service') | path_join }}"
+#     version: "{{ multus_service_commit_hash }}"
+#     force: yes
+
+- name: install Multus Service Helm chart
+  command: >-
+    helm install {{ multus_service_release_name }}
+    {{ (project_root_dir, 'charts', 'multus_service') | path_join }}
+    --namespace {{ multus_service_namespace }}
+    --create-namespace
+  changed_when: true
diff --git a/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml b/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml
index 937bb357..66621df7 100644
--- a/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml
+++ b/roles/net_attach_defs_create/tasks/cndp_net_attach_def.yml
@@ -25,4 +25,3 @@
   k8s:
     state: present
     src: "{{ cndp_k8s_manifest_dir }}/cndp_net_attach_def-e2e.yml"
-
diff --git a/roles/nfd_install/charts/node-feature-discovery/.helmignore b/roles/nfd_install/charts/node-feature-discovery/.helmignore
index 50af0317..0e8a0eb3 100644
--- a/roles/nfd_install/charts/node-feature-discovery/.helmignore
+++ b/roles/nfd_install/charts/node-feature-discovery/.helmignore
@@ -14,6 +14,7 @@
 *.swp
 *.bak
 *.tmp
+*.orig
 *~
 # Various IDEs
 .project
diff --git a/roles/nfd_install/charts/node-feature-discovery/Chart.yaml b/roles/nfd_install/charts/node-feature-discovery/Chart.yaml
index 30a33017..48bf5f9f 100644
--- a/roles/nfd_install/charts/node-feature-discovery/Chart.yaml
+++ b/roles/nfd_install/charts/node-feature-discovery/Chart.yaml
@@ -13,10 +13,18 @@
 ## See the License for the specific language governing permissions and
 ## limitations under the License.
 ##
-apiVersion: v1
+apiVersion: v2
+appVersion: v0.11.1
+description: |
+  Detects hardware features available on each node in a Kubernetes cluster, and advertises
+  those features using node labels.
 name: node-feature-discovery
-version: 0.11.0
-description: Node Feature Discovery (NFD)
 sources:
-  - https://github.com/kubernetes-sigs/node-feature-discovery
-appVersion: 0.11.0
+  - https://github.com/kubernetes-sigs/node-feature-discovery
+home: https://github.com/kubernetes-sigs/node-feature-discovery
+keywords:
+  - feature-discovery
+  - feature-detection
+  - node-labels
+type: application
+version: 0.2.1
diff --git a/roles/nfd_install/charts/node-feature-discovery/README.md b/roles/nfd_install/charts/node-feature-discovery/README.md
new file mode 100644
index 00000000..c24e63e3
--- /dev/null
+++ b/roles/nfd_install/charts/node-feature-discovery/README.md
@@ -0,0 +1,25 @@
+
+# Node Feature Discovery
+
+Node Feature Discovery (NFD) is a Kubernetes add-on for detecting hardware
+features and system configuration. Detected features are advertised as node
+labels. NFD provides flexible configuration and extension points for a wide
+range of vendor and application specific node labeling needs.
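+
+For example, once the chart is deployed you can verify that feature labels
+(published under the `feature.node.kubernetes.io/` prefix) were applied by
+inspecting the node labels, e.g. with `jq` installed:
+
+```bash
+kubectl get node -o json | jq '.items[].metadata.labels'
+```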
+
+See
+[NFD documentation](https://kubernetes-sigs.github.io/node-feature-discovery/v0.11/get-started/deployment-and-usage.html#deployment-with-helm)
+for deployment instructions.
diff --git a/roles/nfd_install/charts/node-feature-discovery/manifests/nodefeaturerule-crd.yaml b/roles/nfd_install/charts/node-feature-discovery/manifests/nodefeaturerule-crd.yaml
new file mode 100644
index 00000000..f85573ab
--- /dev/null
+++ b/roles/nfd_install/charts/node-feature-discovery/manifests/nodefeaturerule-crd.yaml
@@ -0,0 +1,238 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+## http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  annotations:
+    controller-gen.kubebuilder.io/version: v0.7.0
+  creationTimestamp: null
+  name: nodefeaturerules.nfd.k8s-sigs.io
+spec:
+  group: nfd.k8s-sigs.io
+  names:
+    kind: NodeFeatureRule
+    listKind: NodeFeatureRuleList
+    plural: nodefeaturerules
+    singular: nodefeaturerule
+  scope: Cluster
+  versions:
+  - name: v1alpha1
+    schema:
+      openAPIV3Schema:
+        description: NodeFeatureRule resource specifies a configuration for feature-based
+          customization of node objects, such as node labeling.
+        properties:
+          apiVersion:
+            description: 'APIVersion defines the versioned schema of this representation
+              of an object. Servers should convert recognized schemas to the latest
+              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
+            type: string
+          kind:
+            description: 'Kind is a string value representing the REST resource this
+              object represents. Servers may infer this from the endpoint the client
+              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
+            type: string
+          metadata:
+            type: object
+          spec:
+            description: NodeFeatureRuleSpec describes a NodeFeatureRule.
+            properties:
+              rules:
+                description: Rules is a list of node customization rules.
+                items:
+                  description: Rule defines a rule for node customization such as
+                    labeling.
+                  properties:
+                    labels:
+                      additionalProperties:
+                        type: string
+                      description: Labels to create if the rule matches.
+                      type: object
+                    labelsTemplate:
+                      description: LabelsTemplate specifies a template to expand for
+                        dynamically generating multiple labels. Data (after template
+                        expansion) must be keys with an optional value (<key>[=<value>])
+                        separated by newlines.
+                      type: string
+                    matchAny:
+                      description: MatchAny specifies a list of matchers one of which
+                        must match.
+                      items:
+                        description: MatchAnyElem specifies one sub-matcher of MatchAny.
+                        properties:
+                          matchFeatures:
+                            description: MatchFeatures specifies a set of matcher
+                              terms all of which must match.
+                            items:
+                              description: FeatureMatcherTerm defines requirements
+                                against one feature set. All requirements (specified
+                                as MatchExpressions) are evaluated against each element
+                                in the feature set.
+ properties: + feature: + type: string + matchExpressions: + additionalProperties: + description: "MatchExpression specifies an expression + to evaluate against a set of input values. It + contains an operator that is applied when matching + the input and an array of values that the operator + evaluates the input against. \n NB: CreateMatchExpression + or MustCreateMatchExpression() should be used + for creating new instances. NB: Validate() + must be called if Op or Value fields are modified + or if a new instance is created from scratch + without using the helper functions." + properties: + op: + description: Op is the operator to be applied. + enum: + - In + - NotIn + - InRegexp + - Exists + - DoesNotExist + - Gt + - Lt + - GtLt + - IsTrue + - IsFalse + type: string + value: + description: Value is the list of values that + the operand evaluates the input against. + Value should be empty if the operator is + Exists, DoesNotExist, IsTrue or IsFalse. + Value should contain exactly one element + if the operator is Gt or Lt and exactly + two elements if the operator is GtLt. In + other cases Value should contain at least + one element. + items: + type: string + type: array + required: + - op + type: object + description: MatchExpressionSet contains a set of + MatchExpressions, each of which is evaluated against + a set of input values. + type: object + required: + - feature + - matchExpressions + type: object + type: array + required: + - matchFeatures + type: object + type: array + matchFeatures: + description: MatchFeatures specifies a set of matcher terms + all of which must match. + items: + description: FeatureMatcherTerm defines requirements against + one feature set. All requirements (specified as MatchExpressions) + are evaluated against each element in the feature set. + properties: + feature: + type: string + matchExpressions: + additionalProperties: + description: "MatchExpression specifies an expression + to evaluate against a set of input values. It contains + an operator that is applied when matching the input + and an array of values that the operator evaluates + the input against. \n NB: CreateMatchExpression or + MustCreateMatchExpression() should be used for creating + new instances. NB: Validate() must be called if Op + or Value fields are modified or if a new instance + is created from scratch without using the helper functions." + properties: + op: + description: Op is the operator to be applied. + enum: + - In + - NotIn + - InRegexp + - Exists + - DoesNotExist + - Gt + - Lt + - GtLt + - IsTrue + - IsFalse + type: string + value: + description: Value is the list of values that the + operand evaluates the input against. Value should + be empty if the operator is Exists, DoesNotExist, + IsTrue or IsFalse. Value should contain exactly + one element if the operator is Gt or Lt and exactly + two elements if the operator is GtLt. In other + cases Value should contain at least one element. + items: + type: string + type: array + required: + - op + type: object + description: MatchExpressionSet contains a set of MatchExpressions, + each of which is evaluated against a set of input values. + type: object + required: + - feature + - matchExpressions + type: object + type: array + name: + description: Name of the rule. + type: string + vars: + additionalProperties: + type: string + description: Vars is the variables to store if the rule matches. + Variables do not directly inflict any changes in the node + object. 
However, they can be referenced from other rules enabling
+                        more complex rule hierarchies, without exposing intermediary
+                        output values as labels.
+                      type: object
+                    varsTemplate:
+                      description: VarsTemplate specifies a template to expand for
+                        dynamically generating multiple variables. Data (after template
+                        expansion) must be keys with an optional value (<key>[=<value>])
+                        separated by newlines.
+                      type: string
+                  required:
+                  - name
+                  type: object
+                type: array
+            required:
+            - rules
+            type: object
+        required:
+        - spec
+        type: object
+    served: true
+    storage: true
+status:
+  acceptedNames:
+    kind: ""
+    plural: ""
+  conditions: []
+  storedVersions: []
diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/NOTES.txt b/roles/nfd_install/charts/node-feature-discovery/templates/NOTES.txt
deleted file mode 100644
index 895f55d2..00000000
--- a/roles/nfd_install/charts/node-feature-discovery/templates/NOTES.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-{{ .Chart.Name }} was installed.
-
-Your release is named {{ .Release.Name }}.
-
-To learn more about the release, try:
-
-  $ helm status {{ .Release.Name }}
-  $ helm get {{ .Release.Name }}
-
-To inspect node labels, try:
-
-  $ kubectl get node -o json | jq .metadata.labels
-
-  or if jq is not installed:
-
-  $ kubectl get node -o jsonpath="{.metadata.labels}"
\ No newline at end of file
diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/_helpers.tpl b/roles/nfd_install/charts/node-feature-discovery/templates/_helpers.tpl
index 04aeb071..08760ea0 100644
--- a/roles/nfd_install/charts/node-feature-discovery/templates/_helpers.tpl
+++ b/roles/nfd_install/charts/node-feature-discovery/templates/_helpers.tpl
@@ -1,7 +1,16 @@
+{{/* vim: set filetype=mustache: */}}
+{{/*
+Expand the name of the chart.
+*/}}
 {{- define "node-feature-discovery.name" -}}
 {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
 {{- end -}}
 
+{{/*
+Create a default fully qualified app name.
+We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+If release name contains chart name it will be used as a full name.
+*/}}
 {{- define "node-feature-discovery.fullname" -}}
 {{- if .Values.fullnameOverride -}}
 {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
@@ -15,24 +24,62 @@
 {{- end -}}
 {{- end -}}
 
+{{/*
+Create chart name and version as used by the chart label.
+*/}}
 {{- define "node-feature-discovery.chart" -}}
 {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
 {{- end -}}
 
+{{/*
+Common labels
+*/}}
 {{- define "node-feature-discovery.labels" -}}
-app.kubernetes.io/name: {{ include "node-feature-discovery.name" . }}
 helm.sh/chart: {{ include "node-feature-discovery.chart" . }}
-app.kubernetes.io/instance: {{ .Release.Name }}
+{{ include "node-feature-discovery.selectorLabels" . }}
 {{- if .Chart.AppVersion }}
 app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
 {{- end }}
 app.kubernetes.io/managed-by: {{ .Release.Service }}
 {{- end -}}
 
-{{- define "node-feature-discovery.saName" -}}
-{{- if .Values.serviceAccount.create -}}
-    {{ default (include "node-feature-discovery.fullname" .) .Values.serviceAccount.name }}
+{{/*
+Selector labels
+*/}}
+{{- define "node-feature-discovery.selectorLabels" -}}
+app.kubernetes.io/name: {{ include "node-feature-discovery.name" .
}} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end -}} + +{{/* +Create the name of the service account which the nfd master will use +*/}} +{{- define "node-feature-discovery.master.serviceAccountName" -}} +{{- if .Values.master.serviceAccount.create -}} + {{ default (include "node-feature-discovery.fullname" .) .Values.master.serviceAccount.name }} {{- else -}} - {{ default "default" .Values.serviceAccount.name }} + {{ default "default" .Values.master.serviceAccount.name }} +{{- end -}} +{{- end -}} + +{{/* +Create the name of the service account which the nfd worker will use +*/}} +{{- define "node-feature-discovery.worker.serviceAccountName" -}} +{{- if .Values.worker.serviceAccount.create -}} + {{ default (printf "%s-worker" (include "node-feature-discovery.fullname" .)) .Values.worker.serviceAccount.name }} +{{- else -}} + {{ default "default" .Values.worker.serviceAccount.name }} +{{- end -}} +{{- end -}} + +{{/* +Create the name of the service account which topologyUpdater will use +*/}} +{{- define "node-feature-discovery.topologyUpdater.serviceAccountName" -}} +{{- if .Values.topologyUpdater.serviceAccount.create -}} + {{ default (printf "%s-topology-updater" (include "node-feature-discovery.fullname" .)) .Values.topologyUpdater.serviceAccount.name }} +{{- else -}} + {{ default "default" .Values.topologyUpdater.serviceAccount.name }} +{{- end -}} {{- end -}} -{{- end -}} \ No newline at end of file diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-certs.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-certs.yaml new file mode 100644 index 00000000..9e3a3112 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-certs.yaml @@ -0,0 +1,64 @@ +{{- if .Values.tls.certManager }} +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: nfd-master-cert +spec: + secretName: nfd-master-cert + subject: + organizations: + - node-feature-discovery + commonName: nfd-master + dnsNames: + # must match the service name + - {{ include "node-feature-discovery.fullname" . }}-master + # first one is configured for use by the worker; below are for completeness + - {{ include "node-feature-discovery.fullname" . }}-master.{{ $.Release.Namespace }}.svc + - {{ include "node-feature-discovery.fullname" . }}-master.{{ $.Release.Namespace }}.svc.cluster.local + # localhost needed for grpc_health_probe + - localhost + issuerRef: + name: nfd-ca-issuer + kind: Issuer + group: cert-manager.io + +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: nfd-worker-cert +spec: + secretName: nfd-worker-cert + subject: + organizations: + - node-feature-discovery + commonName: nfd-worker + dnsNames: + - {{ include "node-feature-discovery.fullname" . }}-worker.{{ $.Release.Namespace }}.svc.cluster.local + issuerRef: + name: nfd-ca-issuer + kind: Issuer + group: cert-manager.io + +{{- if .Values.topologyUpdater.enable }} +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: nfd-topology-updater-cert +spec: + secretName: nfd-topology-updater-cert + subject: + organizations: + - node-feature-discovery + commonName: nfd-topology-updater + dnsNames: + - {{ include "node-feature-discovery.fullname" . 
}}-topology-updater.{{ $.Release.Namespace }}.svc.cluster.local + issuerRef: + name: nfd-ca-issuer + kind: Issuer + group: cert-manager.io +{{- end }} + +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-issuer.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-issuer.yaml new file mode 100644 index 00000000..0401edd6 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/cert-manager-issuer.yaml @@ -0,0 +1,39 @@ +{{- if .Values.tls.certManager }} +# See https://cert-manager.io/docs/configuration/selfsigned/#bootstrapping-ca-issuers +# - Create a self signed issuer +# - Use this to create a CA cert +# - Use this to now create a CA issuer +--- +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + name: nfd-ca-bootstrap +spec: + selfSigned: {} + +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: nfd-ca-cert +spec: + isCA: true + secretName: nfd-ca-cert + subject: + organizations: + - node-feature-discovery + commonName: nfd-ca-cert + issuerRef: + name: nfd-ca-bootstrap + kind: Issuer + group: cert-manager.io + +--- +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + name: nfd-ca-issuer +spec: + ca: + secretName: nfd-ca-cert +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml new file mode 100644 index 00000000..36a12ecb --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrole.yaml @@ -0,0 +1,63 @@ +{{- if .Values.master.rbac.create }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ include "node-feature-discovery.fullname" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +rules: +- apiGroups: + - "" + resources: + - nodes +{{- if .Values.master.resourceLabels | empty | not }} + - nodes/status +{{- end }} + verbs: + - get + - patch + - update + - list +- apiGroups: + - nfd.k8s-sigs.io + resources: + - nodefeaturerules + verbs: + - get + - list + - watch +{{- if .Values.topologyUpdater.enable }} +- apiGroups: + - topology.node.k8s.io + resources: + - noderesourcetopologies + verbs: + - create + - get + - update +{{- end }} +{{- end }} + +--- +{{- if .Values.topologyUpdater.rbac.create }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-updater + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +rules: +- apiGroups: + - "" + resources: + - nodes + verbs: + - get + - list +- apiGroups: + - "" + resources: + - pods + verbs: + - get +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml new file mode 100644 index 00000000..40033c64 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/clusterrolebinding.yaml @@ -0,0 +1,34 @@ +{{- if .Values.master.rbac.create }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ include "node-feature-discovery.fullname" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ include "node-feature-discovery.fullname" . 
}} +subjects: +- kind: ServiceAccount + name: {{ include "node-feature-discovery.master.serviceAccountName" . }} + namespace: {{ $.Release.Namespace }} +{{- end }} + +--- +{{- if .Values.topologyUpdater.rbac.create }} +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-updater + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ include "node-feature-discovery.fullname" . }}-topology-updater +subjects: +- kind: ServiceAccount + name: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }} + namespace: {{ $.Release.Namespace }} +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/config.yml b/roles/nfd_install/charts/node-feature-discovery/templates/config.yml deleted file mode 100644 index 031c51ba..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/config.yml +++ /dev/null @@ -1,12 +0,0 @@ ---- -{{- if .Values.config }} -apiVersion: v1 -kind: ConfigMap -metadata: - name: {{ template "node-feature-discovery.fullname" . }}-worker-config - labels: -{{ include "node-feature-discovery.labels" . | indent 4 }} -data: - nfd-worker.conf: | -{{ toYaml .Values.config | indent 4 }} -{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml new file mode 100644 index 00000000..ce28646d --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/master.yaml @@ -0,0 +1,117 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-master + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + role: master + annotations: + {{- toYaml .Values.master.deploymentAnnotations | nindent 4 }} +spec: + replicas: {{ .Values.master.replicaCount }} + selector: + matchLabels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 6 }} + role: master + template: + metadata: + labels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 8 }} + role: master + annotations: + {{- toYaml .Values.master.annotations | nindent 8 }} + spec: + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + serviceAccountName: {{ include "node-feature-discovery.master.serviceAccountName" . 
}} + securityContext: + {{- toYaml .Values.master.podSecurityContext | nindent 8 }} + containers: + - name: master + securityContext: + {{- toYaml .Values.master.securityContext | nindent 12 }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + livenessProbe: + exec: + command: + - "/usr/bin/grpc_health_probe" + - "-addr=:8080" + {{- if .Values.tls.enable }} + - "-tls" + - "-tls-ca-cert=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "-tls-client-key=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "-tls-client-cert=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- end }} + initialDelaySeconds: 10 + periodSeconds: 10 + readinessProbe: + exec: + command: + - "/usr/bin/grpc_health_probe" + - "-addr=:8080" + {{- if .Values.tls.enable }} + - "-tls" + - "-tls-ca-cert=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "-tls-client-key=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "-tls-client-cert=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- end }} + initialDelaySeconds: 5 + periodSeconds: 10 + failureThreshold: 10 + ports: + - containerPort: 8080 + name: grpc + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + command: + - "nfd-master" + resources: + {{- toYaml .Values.master.resources | nindent 12 }} + args: + {{- if .Values.master.instance | empty | not }} + - "--instance={{ .Values.master.instance }}" + {{- end }} + {{- if .Values.master.extraLabelNs | empty | not }} + - "--extra-label-ns={{- join "," .Values.master.extraLabelNs }}" + {{- end }} + {{- if .Values.master.resourceLabels | empty | not }} + - "--resource-labels={{- join "," .Values.master.resourceLabels }}" + {{- end }} + {{- if .Values.master.featureRulesController | kindIs "invalid" | not }} + - "-featurerules-controller={{ .Values.master.featureRulesController }}" + {{- else }} + ## By default, disable NodeFeatureRules controller for other than the default instances + - "-featurerules-controller={{ .Values.master.instance | empty }}" + {{- end }} + {{- if .Values.tls.enable }} + - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + volumeMounts: + - name: nfd-master-cert + mountPath: "/etc/kubernetes/node-feature-discovery/certs" + readOnly: true + volumes: + - name: nfd-master-cert + secret: + secretName: nfd-master-cert + ## /TLS ## + {{- end }} + {{- with .Values.master.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.master.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.master.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/master.yml b/roles/nfd_install/charts/node-feature-discovery/templates/master.yml deleted file mode 100644 index 732e6e04..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/master.yml +++ /dev/null @@ -1,72 +0,0 @@ ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - labels: - app: {{ template "node-feature-discovery.fullname" . }}-controller -{{ include "node-feature-discovery.labels" . | indent 4 }} - name: {{ template "node-feature-discovery.fullname" . 
}}-controller -spec: - replicas: 1 - selector: - matchLabels: - app: {{ template "node-feature-discovery.fullname" . }}-controller - template: - metadata: - labels: - app: {{ template "node-feature-discovery.fullname" . }}-controller -{{ include "node-feature-discovery.labels" . | indent 8 }} - spec: - serviceAccount: {{ template "node-feature-discovery.saName" . }} - affinity: - nodeAffinity: - preferredDuringSchedulingIgnoredDuringExecution: - - weight: 1 - preference: - matchExpressions: - - key: "node-role.kubernetes.io/master" - operator: In - values: [""] - tolerations: - - key: "node-role.kubernetes.io/master" - operator: "Equal" - value: "" - effect: "NoSchedule" - containers: - - env: - - name: NODE_NAME - valueFrom: - fieldRef: - fieldPath: spec.nodeName - image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" - imagePullPolicy: {{ .Values.image.pullPolicy }} - name: nfd-controller - command: - - "nfd-master" - - --port={{ .Values.service.port }} -{{- if .Values.nfd_extra_labels_ns | empty | not }} - - "--extra-label-ns={{- join "," .Values.nfd_extra_labels_ns }}" -{{- end }} -{{- if .Values.nfd_resource_labels | empty | not }} - - "--resource-labels={{- join "," .Values.nfd_resource_labels }}" -{{- end }} -{{- if .Values.tls.enabled }} - args: - - "--ca-file=/etc/kubernetes/node-feature-discovery/trust/ca.crt" - - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" - - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" - volumeMounts: - - name: nfd-ca-cert - mountPath: "/etc/kubernetes/node-feature-discovery/trust" - readOnly: true - - name: nfd-controller-cert - mountPath: "/etc/kubernetes/node-feature-discovery/certs" - readOnly: true - volumes: - - name: nfd-ca-cert - configMap: - name: {{ include "node-feature-discovery.fullname" . }}-ca-cert - - name: nfd-controller-cert - secret: - secretName: {{ include "node-feature-discovery.fullname" . }}-controller-cert -{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/nfd-worker-conf.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/nfd-worker-conf.yaml new file mode 100644 index 00000000..93c8d86d --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/nfd-worker-conf.yaml @@ -0,0 +1,9 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-worker-conf + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} +data: + nfd-worker.conf: |- + {{- .Values.worker.config | toYaml | nindent 4 }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/nodefeaturerule-crd.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/nodefeaturerule-crd.yaml new file mode 100644 index 00000000..f5d30850 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/nodefeaturerule-crd.yaml @@ -0,0 +1,3 @@ +{{- if .Values.nodeFeatureRule.createCRD }} +{{ .Files.Get "manifests/nodefeaturerule-crd.yaml" }} +{{- end}} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/rbac.yml b/roles/nfd_install/charts/node-feature-discovery/templates/rbac.yml deleted file mode 100644 index cb3ec995..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/rbac.yml +++ /dev/null @@ -1,49 +0,0 @@ -{{- if .Values.serviceAccount.create }} -apiVersion: v1 -kind: ServiceAccount -metadata: - name: {{ template "node-feature-discovery.saName" . }} - labels: -{{ include "node-feature-discovery.labels" . 
| indent 4 }} -{{- end }} ---- -{{- if .Values.rbac.enabled }} -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: {{ template "node-feature-discovery.fullname" . }} - labels: -{{ include "node-feature-discovery.labels" . | indent 4 }} -rules: -- apiGroups: - - "" - resources: - - nodes -# when using command line flag --resource-labels to create extended resources -# "- nodes/status" is needed -{{- if or (.Values.sgx_dp_enabled) (.Values.gpu_dp.enabled) }} - - nodes/status -{{- end }} - verbs: - - get - - patch - - update -{{- if .Values.gpu_dp.enabled }} - - list -{{- end }} ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: {{ template "node-feature-discovery.fullname" . }} - labels: -{{ include "node-feature-discovery.labels" . | indent 4 }} -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: {{ template "node-feature-discovery.fullname" . }} -subjects: -- kind: ServiceAccount - name: {{ template "node-feature-discovery.saName" . }} - namespace: {{ .Release.Namespace }} -{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml new file mode 100644 index 00000000..97d0a587 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/service.yaml @@ -0,0 +1,16 @@ +apiVersion: v1 +kind: Service +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-master + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + role: master +spec: + type: {{ .Values.master.service.type }} + ports: + - port: {{ .Values.master.service.port }} + targetPort: grpc + protocol: TCP + name: grpc + selector: + {{- include "node-feature-discovery.selectorLabels" . | nindent 4 }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/service.yml b/roles/nfd_install/charts/node-feature-discovery/templates/service.yml deleted file mode 100644 index 0475f4bf..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/service.yml +++ /dev/null @@ -1,14 +0,0 @@ ---- -apiVersion: v1 -kind: Service -metadata: - name: {{ template "node-feature-discovery.fullname" . }}-controller - labels: -{{ include "node-feature-discovery.labels" . | indent 4 }} -spec: - selector: - app: {{ template "node-feature-discovery.fullname" . }}-controller - ports: - - protocol: TCP - port: {{ .Values.service.port }} - type: {{ .Values.service.type }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml new file mode 100644 index 00000000..025c30a5 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/serviceaccount.yaml @@ -0,0 +1,40 @@ +{{- if .Values.master.serviceAccount.create -}} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "node-feature-discovery.master.serviceAccountName" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + {{- with .Values.master.serviceAccount.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +{{- end }} + +--- +{{- if .Values.topologyUpdater.serviceAccount.create }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }} + labels: + {{- include "node-feature-discovery.labels" . 
| nindent 4 }} + {{- with .Values.topologyUpdater.serviceAccount.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +{{- end }} + +--- +{{- if .Values.worker.serviceAccount.create }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "node-feature-discovery.worker.serviceAccountName" . }} + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + {{- with .Values.worker.serviceAccount.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/tls.yml b/roles/nfd_install/charts/node-feature-discovery/templates/tls.yml deleted file mode 100644 index 9ccc3bf1..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/tls.yml +++ /dev/null @@ -1,50 +0,0 @@ -{{ $ca := genCA "nfd-ca" 365 }} -{{ $masterAltNames := list ( printf "%s-controller" (include "node-feature-discovery.fullname" .) ) ( printf "%s-controller.%s" (include "node-feature-discovery.fullname" .) .Release.Namespace ) ( printf "%s-controller.%s.svc" (include "node-feature-discovery.fullname" .) .Release.Namespace ) }} -{{ $masterCert := genSignedCert ( printf "%s-controller" (include "node-feature-discovery.fullname" .) ) nil $masterAltNames 365 $ca }} -{{ $workerAltNames := list ( printf "%s-worker" (include "node-feature-discovery.fullname" .) .Release.Namespace ) }} -{{ $workerCert := genSignedCert ( printf "%s-worker" (include "node-feature-discovery.fullname" .) ) nil $workerAltNames 365 $ca }} ---- -{{- if .Values.tls.enabled }} -apiVersion: v1 -kind: Secret -metadata: - name: {{ include "node-feature-discovery.fullname" . }}-controller-cert -data: -{{- if .Values.tls.generate }} - tls.crt: {{ $masterCert.Cert | b64enc }} - tls.key: {{ $masterCert.Key | b64enc }} -{{- else }} - tls.crt: {{ .Values.tls.masterCert }} - tls.key: {{ .Values.tls.masterKey }} -{{- end }} -type: Opaque ---- -apiVersion: v1 -kind: Secret -metadata: - name: {{ include "node-feature-discovery.fullname" . }}-worker-cert -data: -{{- if .Values.tls.generate }} - tls.crt: {{ $workerCert.Cert | b64enc }} - tls.key: {{ $workerCert.Key | b64enc }} -{{- else }} - tls.crt: {{ .Values.tls.workerCert }} - tls.key: {{ .Values.tls.workerKey }} -{{- end }} -type: Opaque ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: {{ include "node-feature-discovery.fullname" . }}-ca-cert - labels: -{{ include "node-feature-discovery.labels" . 
| indent 4 }} -data: -{{- if .Values.tls.generate }} - ca.crt: | -{{ $ca.Cert | toString | indent 4 }} -{{- else }} - ca.crt: | -{{ .Values.tls.caCert | b64dec | toString | indent 4 }} -{{- end }} -{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml new file mode 100644 index 00000000..cf5daf27 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater-crds.yaml @@ -0,0 +1,145 @@ +{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.createCRDs -}} +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + api-approved.kubernetes.io: https://github.com/kubernetes/enhancements/pull/1870 + controller-gen.kubebuilder.io/version: v0.7.0 + creationTimestamp: null + name: noderesourcetopologies.topology.node.k8s.io +spec: + group: topology.node.k8s.io + names: + kind: NodeResourceTopology + listKind: NodeResourceTopologyList + plural: noderesourcetopologies + shortNames: + - node-res-topo + singular: noderesourcetopology + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: NodeResourceTopology describes node resources and their topology. + properties: + apiVersion: + description: 'APIVersion defines the versioned schema of this representation + of an object. Servers should convert recognized schemas to the latest + internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' + type: string + kind: + description: 'Kind is a string value representing the REST resource this + object represents. Servers may infer this from the endpoint the client + submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + metadata: + type: object + topologyPolicies: + items: + type: string + type: array + zones: + description: ZoneList contains an array of Zone objects. + items: + description: Zone represents a resource topology zone, e.g. socket, + node, die or core. + properties: + attributes: + description: AttributeList contains an array of AttributeInfo objects. + items: + description: AttributeInfo contains one attribute of a Zone. + properties: + name: + type: string + value: + type: string + required: + - name + - value + type: object + type: array + costs: + description: CostList contains an array of CostInfo objects. + items: + description: CostInfo describes the cost (or distance) between + two Zones. + properties: + name: + type: string + value: + format: int64 + type: integer + required: + - name + - value + type: object + type: array + name: + type: string + parent: + type: string + resources: + description: ResourceInfoList contains an array of ResourceInfo + objects. + items: + description: ResourceInfo contains information about one resource + type. + properties: + allocatable: + anyOf: + - type: integer + - type: string + description: Allocatable quantity of the resource, corresponding + to allocatable in node status, i.e. total amount of this + resource available to be used by pods. 
+ pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + available: + anyOf: + - type: integer + - type: string + description: Available is the amount of this resource currently + available for new (to be scheduled) pods, i.e. Allocatable + minus the resources reserved by currently running pods. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + capacity: + anyOf: + - type: integer + - type: string + description: Capacity of the resource, corresponding to capacity + in node status, i.e. total amount of this resource that + the node has. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + name: + description: Name of the resource. + type: string + required: + - allocatable + - available + - capacity + - name + type: object + type: array + type: + type: string + required: + - name + - type + type: object + type: array + required: + - topologyPolicies + - zones + type: object + served: true + storage: true +status: + acceptedNames: + kind: "" + plural: "" + conditions: [] + storedVersions: [] +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml new file mode 100644 index 00000000..ffddc190 --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/topologyupdater.yaml @@ -0,0 +1,111 @@ +{{- if .Values.topologyUpdater.enable -}} +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-topology-updater + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + role: topology-updater +spec: + selector: + matchLabels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 6 }} + role: topology-updater + template: + metadata: + labels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 8 }} + role: topology-updater + annotations: + {{- toYaml .Values.topologyUpdater.annotations | nindent 8 }} + spec: + serviceAccountName: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }} + dnsPolicy: ClusterFirstWithHostNet + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + securityContext: + {{- toYaml .Values.topologyUpdater.podSecurityContext | nindent 8 }} + containers: + - name: topology-updater + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: "{{ .Values.image.pullPolicy }}" + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + command: + - "nfd-topology-updater" + args: + - "--server={{ include "node-feature-discovery.fullname" . 
}}-master:{{ .Values.master.service.port }}" + {{- if .Values.topologyUpdater.updateInterval | empty | not }} + - "--sleep-interval={{ .Values.topologyUpdater.updateInterval }}" + {{- else }} + - "--sleep-interval=3s" + {{- end }} + {{- if .Values.topologyUpdater.watchNamespace | empty | not }} + - "--watch-namespace={{ .Values.topologyUpdater.watchNamespace }}" + {{- else }} + - "--watch-namespace=*" + {{- end }} + {{- if .Values.tls.enable }} + - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" + {{- end }} + volumeMounts: + - name: kubelet-config + mountPath: /host-var/lib/kubelet/config.yaml + - name: kubelet-podresources-sock + mountPath: /host-var/lib/kubelet/pod-resources/kubelet.sock + - name: host-sys + mountPath: /host-sys + {{- if .Values.tls.enable }} + - name: nfd-topology-updater-cert + mountPath: "/etc/kubernetes/node-feature-discovery/certs" + readOnly: true + {{- end }} + + resources: + {{- toYaml .Values.topologyUpdater.resources | nindent 12 }} + securityContext: + {{- toYaml .Values.topologyUpdater.securityContext | nindent 12 }} + volumes: + - name: host-sys + hostPath: + path: "/sys" + - name: kubelet-config + hostPath: + {{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }} + path: {{ .Values.topologyUpdater.kubeletConfigPath }} + {{- else }} + path: /var/lib/kubelet/config.yaml + {{- end }} + - name: kubelet-podresources-sock + hostPath: + {{- if .Values.topologyUpdater.kubeletPodResourcesSockPath | empty | not }} + path: {{ .Values.topologyUpdater.kubeletPodResourcesSockPath }} + {{- else }} + path: /var/lib/kubelet/pod-resources/kubelet.sock + {{- end }} + {{- if .Values.tls.enable }} + - name: nfd-topology-updater-cert + secret: + secretName: nfd-topology-updater-cert + {{- end }} + + {{- with .Values.topologyUpdater.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.topologyUpdater.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.topologyUpdater.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} +{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml b/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml new file mode 100644 index 00000000..6c4b208f --- /dev/null +++ b/roles/nfd_install/charts/node-feature-discovery/templates/worker.yaml @@ -0,0 +1,137 @@ +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: {{ include "node-feature-discovery.fullname" . }}-worker + labels: + {{- include "node-feature-discovery.labels" . | nindent 4 }} + role: worker + annotations: + {{- toYaml .Values.worker.daemonsetAnnotations | nindent 4 }} +spec: + selector: + matchLabels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 6 }} + role: worker + template: + metadata: + labels: + {{- include "node-feature-discovery.selectorLabels" . | nindent 8 }} + role: worker + annotations: + {{- toYaml .Values.worker.annotations | nindent 8 }} + spec: + dnsPolicy: ClusterFirstWithHostNet + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + serviceAccountName: {{ include "node-feature-discovery.worker.serviceAccountName" . 
}} + securityContext: + {{- toYaml .Values.worker.podSecurityContext | nindent 8 }} + containers: + - name: worker + securityContext: + {{- toYaml .Values.worker.securityContext | nindent 12 }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName +{{- if .Values.gpu_dp.enabled }} +{{- if .Values.gpu_dp.max_memory }} + - name: GPU_MEMORY_OVERRIDE + value: "{{ .Values.gpu_dp.max_memory }}" +{{- end }} +{{- end }} + resources: + {{- toYaml .Values.worker.resources | nindent 12 }} + command: + - "nfd-worker" + args: + - "--server={{ include "node-feature-discovery.fullname" . }}-master:{{ .Values.master.service.port }}" +{{- if .Values.tls.enable }} + - "--ca-file=/etc/kubernetes/node-feature-discovery/certs/ca.crt" + - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" + - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" +{{- end }} + volumeMounts: + - name: host-boot + mountPath: "/host-boot" + readOnly: true + - name: host-os-release + mountPath: "/host-etc/os-release" + readOnly: true + - name: host-sys + mountPath: "/host-sys" + readOnly: true + - name: host-usr-lib + mountPath: "/host-usr/lib" + readOnly: true + {{- if .Values.worker.mountUsrSrc }} + - name: host-usr-src + mountPath: "/host-usr/src" + readOnly: true + {{- end }} + - name: source-d + mountPath: "/etc/kubernetes/node-feature-discovery/source.d/" + readOnly: true + - name: features-d + mountPath: "/etc/kubernetes/node-feature-discovery/features.d/" + readOnly: true + - name: nfd-worker-conf + mountPath: "/etc/kubernetes/node-feature-discovery" + readOnly: true +{{- if .Values.tls.enable }} + - name: nfd-worker-cert + mountPath: "/etc/kubernetes/node-feature-discovery/certs" + readOnly: true +{{- end }} + volumes: + - name: host-boot + hostPath: + path: "/boot" + - name: host-os-release + hostPath: + path: "/etc/os-release" + - name: host-sys + hostPath: + path: "/sys" + - name: host-usr-lib + hostPath: + path: "/usr/lib" + {{- if .Values.worker.mountUsrSrc }} + - name: host-usr-src + hostPath: + path: "/usr/src" + {{- end }} + - name: source-d + hostPath: + path: "/etc/kubernetes/node-feature-discovery/source.d/" + - name: features-d + hostPath: + path: "/etc/kubernetes/node-feature-discovery/features.d/" + - name: nfd-worker-conf + configMap: + name: {{ include "node-feature-discovery.fullname" . }}-worker-conf + items: + - key: nfd-worker.conf + path: nfd-worker.conf +{{- if .Values.tls.enable }} + - name: nfd-worker-cert + secret: + secretName: nfd-worker-cert +{{- end }} + {{- with .Values.worker.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.worker.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.worker.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/templates/worker.yml b/roles/nfd_install/charts/node-feature-discovery/templates/worker.yml deleted file mode 100644 index d32c3ccd..00000000 --- a/roles/nfd_install/charts/node-feature-discovery/templates/worker.yml +++ /dev/null @@ -1,96 +0,0 @@ ---- -apiVersion: apps/v1 -kind: DaemonSet -metadata: - labels: - app: {{ template "node-feature-discovery.fullname" . }}-worker -{{ include "node-feature-discovery.labels" . | indent 4 }} - name: {{ template "node-feature-discovery.fullname" . 
}}-worker -spec: - selector: - matchLabels: - app: {{ template "node-feature-discovery.fullname" . }}-worker - template: - metadata: - labels: - app: {{ template "node-feature-discovery.fullname" . }}-worker -{{ include "node-feature-discovery.labels" . | indent 8 }} - spec: - dnsPolicy: ClusterFirstWithHostNet - containers: - - env: - - name: NODE_NAME - valueFrom: - fieldRef: - fieldPath: spec.nodeName -{{- if .Values.gpu_dp.enabled }} -{{- if .Values.gpu_dp.max_memory }} - - name: GPU_MEMORY_OVERRIDE - value: "{{ .Values.gpu_dp.max_memory }}" -{{- end }} -{{- end }} - image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" - name: nfd-worker - command: - - "nfd-worker" - args: - - "--sleep-interval={{ .Values.sleepInterval }}" - - "--server={{ template "node-feature-discovery.fullname" . }}-controller:{{ .Values.service.port }}" -{{- if .Values.tls.enabled }} - - "--ca-file=/etc/kubernetes/node-feature-discovery/trust/ca.crt" - - "--key-file=/etc/kubernetes/node-feature-discovery/certs/tls.key" - - "--cert-file=/etc/kubernetes/node-feature-discovery/certs/tls.crt" -{{- end }} - volumeMounts: - - name: host-boot - mountPath: "/host-boot" - readOnly: true - - name: host-os-release - mountPath: "/host-etc/os-release" - readOnly: true - - name: host-sys - mountPath: "/host-sys" -{{- if .Values.config }} - - name: config - mountPath: "/etc/kubernetes/node-feature-discovery/" -{{- end }} - - name: source-d - mountPath: "/etc/kubernetes/node-feature-discovery/source.d/" - - name: features-d - mountPath: "/etc/kubernetes/node-feature-discovery/features.d/" -{{- if .Values.tls.enabled }} - - name: nfd-ca-cert - mountPath: "/etc/kubernetes/node-feature-discovery/trust" - readOnly: true - - name: nfd-worker-cert - mountPath: "/etc/kubernetes/node-feature-discovery/certs" -{{- end }} - volumes: - - name: host-boot - hostPath: - path: "/boot" - - name: host-os-release - hostPath: - path: "/etc/os-release" - - name: host-sys - hostPath: - path: "/sys" -{{- if .Values.config }} - - name: config - configMap: - name: {{ template "node-feature-discovery.fullname" . }}-worker-config -{{- end }} - - name: source-d - hostPath: - path: "/etc/kubernetes/node-feature-discovery/source.d/" - - name: features-d - hostPath: - path: "/etc/kubernetes/node-feature-discovery/features.d/" -{{- if .Values.tls.enabled }} - - name: nfd-ca-cert - configMap: - name: {{ include "node-feature-discovery.fullname" . }}-ca-cert - - name: nfd-worker-cert - secret: - secretName: {{ include "node-feature-discovery.fullname" . }}-worker-cert -{{- end }} diff --git a/roles/nfd_install/charts/node-feature-discovery/values.yaml b/roles/nfd_install/charts/node-feature-discovery/values.yaml index d4edcd4f..918c24a1 100644 --- a/roles/nfd_install/charts/node-feature-discovery/values.yaml +++ b/roles/nfd_install/charts/node-feature-discovery/values.yaml @@ -13,107 +13,416 @@ ## See the License for the specific language governing permissions and ## limitations under the License. 
## ---- image: - repository: k8s.gcr.io/nfd/node-feature-discovery - tag: v0.11.0 - pullPolicy: IfNotPresent + repository: k8s.gcr.io/nfd/node-feature-discovery + # This should be set to 'IfNotPresent' for released version + pullPolicy: IfNotPresent + # tag, if defined will use the given image tag, else Chart.AppVersion will be used + # tag +imagePullSecrets: [] -sleepInterval: "120s" +nameOverride: "" +fullnameOverride: "" -serviceAccount: - create: true - name: "" +nodeFeatureRule: + createCRD: true -rbac: - enabled: true +master: + instance: + extraLabelNs: [] + resourceLabels: [] + featureRulesController: null + deploymentAnnotations: {} + replicaCount: 1 + podSecurityContext: {} + # fsGroup: 2000 + + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsNonRoot: true + # runAsUser: 1000 + + serviceAccount: + # Specifies whether a service account should be created + create: true + # Annotations to add to the service account + annotations: {} + # The name of the service account to use. + # If not set and create is true, a name is generated using the fullname template + name: + + rbac: + create: true + + service: + type: ClusterIP + port: 8080 + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + + tolerations: + - key: "node-role.kubernetes.io/master" + operator: "Equal" + value: "" + effect: "NoSchedule" + - key: "node-role.kubernetes.io/control-plane" + operator: "Equal" + value: "" + effect: "NoSchedule" + + annotations: {} + + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 1 + preference: + matchExpressions: + - key: "node-role.kubernetes.io/master" + operator: In + values: [""] + - weight: 1 + preference: + matchExpressions: + - key: "node-role.kubernetes.io/control-plane" + operator: In + values: [""] + +worker: + config: ### + #core: + # labelWhiteList: + # noPublish: false + # sleepInterval: 60s + # featureSources: [all] + # labelSources: [all] + # klog: + # addDirHeader: false + # alsologtostderr: false + # logBacktraceAt: + # logtostderr: true + # skipHeaders: false + # stderrthreshold: 2 + # v: 0 + # vmodule: + ## NOTE: the following options are not dynamically run-time configurable + ## and require a nfd-worker restart to take effect after being changed + # logDir: + # logFile: + # logFileMaxSize: 1800 + # skipLogHeaders: false + #sources: + # cpu: + # cpuid: + ## NOTE: whitelist has priority over blacklist + # attributeBlacklist: + # - "BMI1" + # - "BMI2" + # - "CLMUL" + # - "CMOV" + # - "CX16" + # - "ERMS" + # - "F16C" + # - "HTT" + # - "LZCNT" + # - "MMX" + # - "MMXEXT" + # - "NX" + # - "POPCNT" + # - "RDRAND" + # - "RDSEED" + # - "RDTSCP" + # - "SGX" + # - "SSE" + # - "SSE2" + # - "SSE3" + # - "SSE4" + # - "SSE42" + # - "SSSE3" + # attributeWhitelist: + # kernel: + # kconfigFile: "/path/to/kconfig" + # configOpts: + # - "NO_HZ" + # - "X86" + # - "DMI" + # pci: + # deviceClassWhitelist: + # - "0200" + # - "03" + # - "12" + # deviceLabelFields: + # - "class" + # - "vendor" + # - "device" + # - "subsystem_vendor" + # - 
"subsystem_device" + # usb: + # deviceClassWhitelist: + # - "0e" + # - "ef" + # - "fe" + # - "ff" + # deviceLabelFields: + # - "class" + # - "vendor" + # - "device" + # custom: + # # The following feature demonstrates the capabilities of the matchFeatures + # - name: "my custom rule" + # labels: + # my-ng-feature: "true" + # # matchFeatures implements a logical AND over all matcher terms in the + # # list (i.e. all of the terms, or per-feature matchers, must match) + # matchFeatures: + # - feature: cpu.cpuid + # matchExpressions: + # AVX512F: {op: Exists} + # - feature: cpu.cstate + # matchExpressions: + # enabled: {op: IsTrue} + # - feature: cpu.pstate + # matchExpressions: + # no_turbo: {op: IsFalse} + # scaling_governor: {op: In, value: ["performance"]} + # - feature: cpu.rdt + # matchExpressions: + # RDTL3CA: {op: Exists} + # - feature: cpu.sst + # matchExpressions: + # bf.enabled: {op: IsTrue} + # - feature: cpu.topology + # matchExpressions: + # hardware_multithreading: {op: IsFalse} + # + # - feature: kernel.config + # matchExpressions: + # X86: {op: Exists} + # LSM: {op: InRegexp, value: ["apparmor"]} + # - feature: kernel.loadedmodule + # matchExpressions: + # e1000e: {op: Exists} + # - feature: kernel.selinux + # matchExpressions: + # enabled: {op: IsFalse} + # - feature: kernel.version + # matchExpressions: + # major: {op: In, value: ["5"]} + # minor: {op: Gt, value: ["10"]} + # + # - feature: storage.block + # matchExpressions: + # rotational: {op: In, value: ["0"]} + # dax: {op: In, value: ["0"]} + # + # - feature: network.device + # matchExpressions: + # operstate: {op: In, value: ["up"]} + # speed: {op: Gt, value: ["100"]} + # + # - feature: memory.numa + # matchExpressions: + # node_count: {op: Gt, value: ["2"]} + # - feature: memory.nv + # matchExpressions: + # devtype: {op: In, value: ["nd_dax"]} + # mode: {op: In, value: ["memory"]} + # + # - feature: system.osrelease + # matchExpressions: + # ID: {op: In, value: ["fedora", "centos"]} + # - feature: system.name + # matchExpressions: + # nodename: {op: InRegexp, value: ["^worker-X"]} + # + # - feature: local.label + # matchExpressions: + # custom-feature-knob: {op: Gt, value: ["100"]} + # + # # The following feature demonstrates the capabilities of the matchAny + # - name: "my matchAny rule" + # labels: + # my-ng-feature-2: "my-value" + # # matchAny implements a logical IF over all elements (sub-matchers) in + # # the list (i.e. 
at least one feature matcher must match) + # matchAny: + # - matchFeatures: + # - feature: kernel.loadedmodule + # matchExpressions: + # driver-module-X: {op: Exists} + # - feature: pci.device + # matchExpressions: + # vendor: {op: In, value: ["8086"]} + # class: {op: In, value: ["0200"]} + # - matchFeatures: + # - feature: kernel.loadedmodule + # matchExpressions: + # driver-module-Y: {op: Exists} + # - feature: usb.device + # matchExpressions: + # vendor: {op: In, value: ["8086"]} + # class: {op: In, value: ["02"]} + # + # # The following features demonstreate label templating capabilities + # - name: "my template rule" + # labelsTemplate: | + # {{ range .system.osrelease }}my-system-feature.{{ .Name }}={{ .Value }} + # {{ end }} + # matchFeatures: + # - feature: system.osrelease + # matchExpressions: + # ID: {op: InRegexp, value: ["^open.*"]} + # VERSION_ID.major: {op: In, value: ["13", "15"]} + # + # - name: "my template rule 2" + # labelsTemplate: | + # {{ range .pci.device }}my-pci-device.{{ .class }}-{{ .device }}=with-cpuid + # {{ end }} + # matchFeatures: + # - feature: pci.device + # matchExpressions: + # class: {op: InRegexp, value: ["^06"]} + # vendor: ["8086"] + # - feature: cpu.cpuid + # matchExpressions: + # AVX: {op: Exists} + # + # # The following examples demonstrate vars field and back-referencing + # # previous labels and vars + # - name: "my dummy kernel rule" + # labels: + # "my.kernel.feature": "true" + # matchFeatures: + # - feature: kernel.version + # matchExpressions: + # major: {op: Gt, value: ["2"]} + # + # - name: "my dummy rule with no labels" + # vars: + # "my.dummy.var": "1" + # matchFeatures: + # - feature: cpu.cpuid + # matchExpressions: {} + # + # - name: "my rule using backrefs" + # labels: + # "my.backref.feature": "true" + # matchFeatures: + # - feature: rule.matched + # matchExpressions: + # my.kernel.feature: {op: IsTrue} + # my.dummy.var: {op: Gt, value: ["0"]} + # +### + + daemonsetAnnotations: {} + podSecurityContext: {} + # fsGroup: 2000 + + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsNonRoot: true + # runAsUser: 1000 + + serviceAccount: + # Specifies whether a service account should be created. + # We create this by default to make it easier for downstream users to apply PodSecurityPolicies. + create: true + # Annotations to add to the service account + annotations: {} + # The name of the service account to use. + # If not set and create is true, a name is generated using the fullname template + name: + + # Allow users to mount the hostPath /usr/src, useful for RHCOS on s390x + # Does not work on systems without /usr/src AND a read-only /usr, such as Talos + mountUsrSrc: false + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. 
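The commented-out rules above are the upstream chart's examples. If any of them are enabled, the resulting labels can be confirmed per node; a minimal sketch, with `vm-work-1` standing in for a real node name from your inventory:

```bash
# List NFD-managed labels on one node (replace vm-work-1 with an actual node name)
kubectl get node vm-work-1 --show-labels | tr ',' '\n' | grep 'feature.node.kubernetes.io'
```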
+ # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + + tolerations: [] + + annotations: {} + + affinity: {} + +topologyUpdater: + enable: false + createCRDs: false + + serviceAccount: + create: false + annotations: {} + name: + rbac: + create: false + + kubeletConfigPath: + kubeletPodResourcesSockPath: + updateInterval: 60s + watchNamespace: "*" + + podSecurityContext: {} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsUser: 0 + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + tolerations: [] + annotations: {} + affinity: {} + +# Optionally use encryption for worker <--> master comms +# TODO: verify hostname is not yet supported +# +# If you do not enable certManager (and have it installed) you will +# need to manually, or otherwise, provision the TLS certs as secrets tls: - enabled: true - # automatically generate self-signed certificate - generate: true - # base64-encoded nfd-master private TLS key, required when tls.enabled is true and tls.generate is false - masterKey: "" - # base64-encoded nfd-master private TLS certificate, required when tls.enabled is true and tls.generate is false - masterCert: "" - # base64-encoded nfd-worker private TLS key, required when tls.enabled is true and tls.generate is false - workerKey: "" - # base64-encoded nfd-worker private TLS certificate, required when tls.enabled is true and tls.generate is false - workerCert: "" - # base64-encoded additional CA certificate which will be used to validate peer certificates, required when tls.enabled is true and tls.generate is false - caCert: "" - -service: - port: 8080 - type: ClusterIP - -#nameOverride: "node-feature-discovery" -#fullnameOverride: "node-feature-discovery" - -config: -# sources: -# cpu: -# cpuid: -# # NOTE: whitelist has priority over blacklist -# attributeBlacklist: -# - "BMI1" -# - "BMI2" -# - "CLMUL" -# - "CMOV" -# - "CX16" -# - "ERMS" -# - "F16C" -# - "HTT" -# - "LZCNT" -# - "MMX" -# - "MMXEXT" -# - "NX" -# - "POPCNT" -# - "RDRAND" -# - "RDSEED" -# - "RDTSCP" -# - "SGX" -# - "SSE" -# - "SSE2" -# - "SSE3" -# - "SSE4.1" -# - "SSE4.2" -# - "SSSE3" -# attributeWhitelist: -# kernel: -# kconfigFile: "/path/to/kconfig" -# configOpts: -# - "NO_HZ" -# - "X86" -# - "DMI" -# pci: -# deviceClassWhitelist: -# - "0200" -# - "03" -# - "12" -# deviceLabelFields: -# - "class" -# - "vendor" -# - "device" -# - "subsystem_vendor" -# - "subsystem_device" -# custom: -# - name: "my.kernel.feature" -# matchOn: -# - loadedKMod: ["example_kmod1", "example_kmod2"] -# - name: "my.pci.feature" -# matchOn: -# - pciId: -# class: ["0200"] -# vendor: ["15b3"] -# device: ["1014", "1017"] -# - pciId : -# vendor: ["8086"] -# device: ["1000", "1100"] -# - name: "my.combined.feature" -# matchOn: -# - pciId: -# vendor: ["15b3"] -# device: ["1014", "1017"] -# loadedKMod : ["vendor_kmod1", "vendor_kmod2"] -# + enable: false + certManager: false diff --git a/roles/nfd_install/defaults/main.yml b/roles/nfd_install/defaults/main.yml index 4ef5ba04..66f22de4 100644 --- 
a/roles/nfd_install/defaults/main.yml +++ b/roles/nfd_install/defaults/main.yml @@ -14,32 +14,10 @@ ## limitations under the License. ## --- -nfd_git_url: "https://github.com/kubernetes-sigs/node-feature-discovery.git" -nfd_git_ref: "v0.11.0" -nfd_dir: "{{ (project_root_dir, 'nfd') | path_join }}" +nfd_image: "k8s.gcr.io/nfd/node-feature-discovery" +nfd_image_tag: "v0.11.1" -nfd_external_image_name: "k8s.gcr.io/nfd/node-feature-discovery" -nfd_external_image_tag: "v0.11.0" - -nfd_build_image_locally: false nfd_namespace: "kube-system" -nfd_sleep_interval: "60s" - - -nfd_tls_enabled: true -# automatically generate self-signed certificate -nfd_tls_generate: true -# base64-encoded nfd-master private TLS key, required when tls.enabled is true and tls.generate is false -nfd_tls_master_key: "" -# base64-encoded nfd-master private TLS certificate, required when tls.enabled is true and tls.generate is false -nfd_tls_master_cert: "" -# base64-encoded nfd-worker private TLS key, required when tls.enabled is true and tls.generate is false -nfd_tls_worker_key: "" -# base64-encoded nfd-worker private TLS certificate, required when tls.enabled is true and tls.generate is false -nfd_tls_worker_cert: "" -# base64-encoded additional CA certificate which will be used to validate peer certificates, required when tls.enabled is true and tls.generate is false -nfd_tls_ca_cert: "" - nfd_sa_create: true nfd_sa_name: "" @@ -47,4 +25,3 @@ nfd_rbac_enabled: true nfd_svc_port: 8080 nfd_svc_type: ClusterIP - diff --git a/roles/nfd_install/tasks/main.yml b/roles/nfd_install/tasks/main.yml index ef8c9ac3..7104f8f9 100644 --- a/roles/nfd_install/tasks/main.yml +++ b/roles/nfd_install/tasks/main.yml @@ -18,105 +18,12 @@ include_role: name: install_dependencies -- name: clone NFD repository - git: - repo: "{{ nfd_git_url }}" - dest: "{{ nfd_dir }}" - version: "{{ nfd_git_ref }}" - force: yes - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - -- name: build NFD image - make: - target: all - chdir: "{{ nfd_dir }}" - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - - container_runtime == "docker" - -- name: read current NFD version -# noqa 303 - git is called intentionally here - command: git describe --tags --dirty --always - args: - chdir: "{{ nfd_dir }}" - register: nfd_img_version - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - -- name: tag NFD image - command: "docker tag {{ nfd_external_image_name }}:{{ nfd_img_version.stdout }} \ - {{ registry_local_address }}/node-feature-discovery:{{ nfd_img_version.stdout }}" - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - - container_runtime == "docker" - -- name: push NFD image to local registry - command: docker push {{ registry_local_address }}/node-feature-discovery:{{ nfd_img_version.stdout }} - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - - container_runtime == "docker" - -- name: build and tag NFD image - command: >- - podman build -f Dockerfile - --build-arg=VERSION={{ nfd_img_version.stdout }} - --build-arg=HOSTMOUNT_PREFIX=/host- - --build-arg=BASE_IMAGE_FULL=debian:buster-slim - --build-arg=BASE_IMAGE_MINIMAL=gcr.io/distroless/base - -t {{ registry_local_address }}/node-feature-discovery:{{ nfd_img_version.stdout }} - args: - chdir: "{{ nfd_dir }}" - changed_when: true - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - - '"docker" not in 
container_runtime' - -- name: push NFD image to local registry - command: podman push {{ registry_local_address }}/node-feature-discovery:{{ nfd_img_version.stdout }} - changed_when: true - when: - - inventory_hostname == groups['kube_node'][0] - - nfd_build_image_locally - - '"docker" not in container_runtime' - -- name: create Helm charts directory if needed - file: - path: "{{ (project_root_dir, 'charts') | path_join }}" - state: directory - mode: 0755 - when: - - inventory_hostname == groups['kube_control_plane'][0] - - name: copy NFD Helm chart to the controller node copy: src: "{{ (role_path, 'charts', 'node-feature-discovery') | path_join }}" dest: "{{ (project_root_dir, 'charts') | path_join }}" mode: 0755 - when: - - inventory_hostname == groups['kube_control_plane'][0] - -- name: set values for NFD Helm chart values for locally built and stored image - set_fact: - nfd_image: "{{ registry_local_address }}/node-feature-discovery" - nfd_version: "{{ hostvars[groups['kube_node'][0]].nfd_img_version.stdout }}" - when: - - nfd_build_image_locally - - inventory_hostname == groups['kube_control_plane'][0] - -- name: set values for NFD Helm chart values for external image - set_fact: - nfd_image: "{{ nfd_external_image_name }}" - nfd_version: "{{ nfd_external_image_tag }}" - when: - - not nfd_build_image_locally - - inventory_hostname == groups['kube_control_plane'][0] + when: inventory_hostname == groups['kube_control_plane'][0] - name: populate NFD Helm chart values template and push to controller node template: @@ -124,31 +31,41 @@ dest: "{{ (project_root_dir, 'charts', 'nfd-values.yml') | path_join }}" force: yes mode: preserve - when: - - inventory_hostname == groups['kube_control_plane'][0] - -- name: check if NFD namespace exists - command: kubectl get namespace {{ nfd_namespace }} - register: ns_exists - failed_when: no when: inventory_hostname == groups['kube_control_plane'][0] -- name: create a namespace for NFD - command: kubectl create namespace {{ nfd_namespace }} - when: inventory_hostname == groups['kube_control_plane'][0] and "NotFound" in ns_exists.stderr +- name: Deploy NFD chart using values files on target + kubernetes.core.helm: + name: node-feature-discovery + release_state: present + chart_ref: "{{ (project_root_dir, 'charts', 'node-feature-discovery') | path_join }}" + release_namespace: "{{ nfd_namespace }}" + values_files: "{{ (project_root_dir, 'charts', 'nfd-values.yml') | path_join }}" + wait: yes + when: inventory_hostname == groups['kube_control_plane'][0] - name: wait for kubernetes service to be accessible wait_for: port: 6443 delay: 10 - when: - - inventory_hostname == groups['kube_control_plane'][0] + when: inventory_hostname == groups['kube_control_plane'][0] -- name: install NFD helm chart - command: >- - helm upgrade -i node-feature-discovery - --namespace {{ nfd_namespace }} - -f {{ (project_root_dir, 'charts', 'nfd-values.yml') | path_join }} - {{ (project_root_dir, 'charts', 'node-feature-discovery') | path_join }} +- name: NodeFeatureRules for DPs + block: + - name: populate NodeFeatureRules yaml file and push to controller node + template: + src: "node-feature-rules.yml.j2" + dest: "{{ (project_root_dir, 'node-feature-rules.yml') | path_join }}" + force: yes + mode: preserve + + - name: apply NodeFeatureRules + k8s: + state: present + src: "{{ (project_root_dir, 'node-feature-rules.yml') | path_join }}" when: - inventory_hostname == groups['kube_control_plane'][0] + - (qat_dp_enabled | d(false)) or + (sgx_dp_enabled | d(false)) or + 
(gpu_dp_enabled | d(false)) or + (dsa_dp_enabled | d(false)) or + (dlb_dp_enabled | d(false)) diff --git a/roles/nfd_install/templates/helm_values.yml.j2 b/roles/nfd_install/templates/helm_values.yml.j2 index 11691621..655e086f 100644 --- a/roles/nfd_install/templates/helm_values.yml.j2 +++ b/roles/nfd_install/templates/helm_values.yml.j2 @@ -1,98 +1,219 @@ --- image: - repository: {{ nfd_image | default("k8s.gcr.io/nfd/node-feature-discovery") }} - tag: {{ nfd_version | default("v0.11.0")}} + repository: {{ nfd_image |d("k8s.gcr.io/nfd/node-feature-discovery") }} + # This should be set to 'IfNotPresent' for released version pullPolicy: IfNotPresent + # tag, if defined will use the given image tag, else Chart.AppVersion will be used + tag: {{ nfd_image_tag |d("v0.11.1") }} +imagePullSecrets: [] +gpu_dp: + enabled: {{ gpu_dp_enabled | default(false) | bool | lower }} +{% if gpu_dp_max_memory is defined %} + max_memory: "{{ gpu_dp_max_memory | human_to_bytes }}" +{% else %} + max_memory: "0" +{% endif %} + +nameOverride: "" +fullnameOverride: "" + +nodeFeatureRule: + createCRD: true + +master: + instance: {% if sgx_dp_enabled | default(false) or gpu_dp_enabled | default(false) %} -nfd_resource_labels: + extraLabelNs: {% if sgx_dp_enabled | default(false) %} - - "sgx.intel.com/epc" + - "sgx.intel.com" {% endif %} {% if gpu_dp_enabled | default(false) %} - - "gpu.intel.com/memory.max" - - "gpu.intel.com/millicores" -{% endif %} -{% else %} -nfd_resource_labels: [] + - "gpu.intel.com" {% endif %} - -{% if sgx_dp_enabled | default(false) or gpu_dp_enabled | default(false) %} -nfd_extra_labels_ns: + resourceLabels: {% if sgx_dp_enabled | default(false) %} - - "sgx.intel.com" + - "sgx.intel.com/epc" {% endif %} {% if gpu_dp_enabled | default(false) %} - - "gpu.intel.com" + - "gpu.intel.com/memory.max" + - "gpu.intel.com/millicores" {% endif %} {% else %} -nfd_extra_labels_ns: [] + extraLabelNs: [] + resourceLabels: [] {% endif %} -gpu_dp: - enabled: {{ gpu_dp_enabled | default(false) | bool | lower }} -{% if gpu_dp_max_memory is defined %} - max_memory: "{{ gpu_dp_max_memory | human_to_bytes }}" -{% else %} - max_memory: "0" -{% endif %} + featureRulesController: null + deploymentAnnotations: {} + replicaCount: 1 -{% if sgx_dp_enabled | default(false) or qat_dp_enabled | default(false) %} -sgx_dp_enabled: {{ sgx_dp_enabled | default(false) | bool | lower }} + podSecurityContext: {} + # fsGroup: 2000 -config: - sources: - custom: -{% if sgx_dp_enabled | default(false) %} - - name: "intel.sgx" - labels: - intel.sgx: "true" - matchFeatures: - - feature: cpu.cpuid - matchExpressions: - SGX: {op: Exists} - SGXLC: {op: Exists} - - feature: cpu.sgx - matchExpressions: - enabled: {op: IsTrue} -{% if not (ansible_distribution == "Ubuntu" and ansible_distribution_version == "20.04") %} - - feature: kernel.config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsNonRoot: true + # runAsUser: 1000 + + serviceAccount: + # Specifies whether a service account should be created + create: {{ nfd_sa_create | default(true) | bool | lower }} + # Annotations to add to the service account + annotations: {} + # The name of the service account to use. 
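Context for the `extraLabelNs`/`resourceLabels` rendering above: labels listed under `resourceLabels` are advertised by nfd-master as extended resources rather than plain node labels. A hedged spot check with the SGX device plugin enabled (node name illustrative):

```bash
# EPC capacity should appear under Allocatable once nfd-master registers it
kubectl describe node vm-work-1 | grep -A15 'Allocatable:' | grep 'sgx.intel.com/epc'
```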
+ # If not set and create is true, a name is generated using the fullname template + name: {{ nfd_sa_name | default(false) | bool | lower }} + + rbac: + create: {{ nfd_rbac_enabled | default(true) | bool | lower }} + + service: + type: {{ nfd_svc_type }} + port: {{ nfd_svc_port }} + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + + tolerations: + - key: "node-role.kubernetes.io/master" + operator: "Equal" + value: "" + effect: "NoSchedule" + - key: "node-role.kubernetes.io/control-plane" + operator: "Equal" + value: "" + effect: "NoSchedule" + + annotations: {} + + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 1 + preference: matchExpressions: - X86_SGX: {op: Exists} -{% endif %} -{% endif %} -{% if qat_dp_enabled | default(false) %} - - name: "intel.qat" - labels: - intel.qat: "true" - matchFeatures: - - feature: pci.device + - key: "node-role.kubernetes.io/master" + operator: In + values: [""] + - weight: 1 + preference: matchExpressions: - vendor: {op: In, value: ["8086"]} - device: {op: In, value: {{ qat_supported_pf_dev_ids | list + qat_supported_vf_dev_ids | list }}} -{% endif %} -{% endif %} + - key: "node-role.kubernetes.io/control-plane" + operator: In + values: [""] -sleepInterval: {{ nfd_sleep_interval | default("60s")}} +worker: + config: + core: + sleepInterval: {{ nfd_sleep_interval |d("60s") }} -tls: - enabled: {{ nfd_tls_enabled | default(false) | bool | lower }} - generate: {{ nfd_tls_generate | default(false) | bool | lower }} - masterKey: "{{ nfd_tls_master_key }}" - masterCert: "{{ nfd_tls_master_cert }}" - workerKey: "{{nfd_tls_worker_key}}" - workerCert: "{{ nfd_tls_worker_cert }}" - caCert: "{{ nfd_tls_ca_cert }}" + daemonsetAnnotations: {} + podSecurityContext: {} + # fsGroup: 2000 + + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsNonRoot: true + # runAsUser: 1000 + + serviceAccount: + # Specifies whether a service account should be created. + # We create this by default to make it easier for downstream users to apply PodSecurityPolicies. + create: {{ nfd_sa_create | default(false) | bool | lower }} + # Annotations to add to the service account + annotations: {} + # The name of the service account to use. + # If not set and create is true, a name is generated using the fullname template + name: {{ nfd_sa_name | default(false) | bool | lower }} + + # Allow users to mount the hostPath /usr/src, useful for RHCOS on s390x + # Does not work on systems without /usr/src AND a read-only /usr, such as Talos + mountUsrSrc: false + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. 
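The worker values rendered here, including `core.sleepInterval` taken from `nfd_sleep_interval`, end up in the nfd-worker ConfigMap. A quick look at the rendered config, assuming the chart's default fullname for a release called `node-feature-discovery` in `kube-system`:

```bash
# Inspect the rendered nfd-worker configuration (ConfigMap name is an assumption
# based on the chart's "<fullname>-worker-conf" naming convention)
kubectl -n kube-system get configmap node-feature-discovery-worker-conf -o yaml \
  | grep -B1 -A2 sleepInterval
```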
+ # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + tolerations: [] -serviceAccount: - create: {{ nfd_sa_create | default(false) | bool | lower }} - name: "{{ nfd_sa_name }}" + annotations: {} -rbac: - enabled: {{ nfd_rbac_enabled | default(false) | bool | lower }} + affinity: {} -service: - port: {{ nfd_svc_port }} - type: "{{ nfd_svc_type }}" +topologyUpdater: + enable: false + createCRDs: false + serviceAccount: + create: false + annotations: {} + name: + rbac: + create: false + + kubeletConfigPath: + kubeletPodResourcesSockPath: + updateInterval: 60s + watchNamespace: "*" + + podSecurityContext: {} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: [ "ALL" ] + readOnlyRootFilesystem: true + runAsUser: 0 + + resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + + nodeSelector: {} + tolerations: [] + annotations: {} + affinity: {} + +# Optionally use encryption for worker <--> master comms +# TODO: verify hostname is not yet supported +# +# If you do not enable certManager (and have it installed) you will +# need to manually, or otherwise, provision the TLS certs as secrets +tls: + enable: true + certManager: {{ cert_manager_enabled |d(false) }} diff --git a/roles/nfd_install/templates/node-feature-rules.yml.j2 b/roles/nfd_install/templates/node-feature-rules.yml.j2 new file mode 100644 index 00000000..2acc8b8e --- /dev/null +++ b/roles/nfd_install/templates/node-feature-rules.yml.j2 @@ -0,0 +1,79 @@ +apiVersion: nfd.k8s-sigs.io/v1alpha1 +kind: NodeFeatureRule +metadata: + name: intel-dp-devices +spec: + rules: +{% if dlb_dp_enabled | d(false) %} + - name: "intel.dlb" + labels: + "intel.feature.node.kubernetes.io/dlb": "true" + matchFeatures: + - feature: pci.device + matchExpressions: + vendor: {op: In, value: ["8086"]} + device: {op: In, value: ["2710"]} + class: {op: In, value: ["0b40"]} + - feature: kernel.loadedmodule + matchExpressions: + dlb2: {op: Exists} +{% endif %} +{% if dsa_dp_enabled | d(false) %} + - name: "intel.dsa" + labels: + "intel.feature.node.kubernetes.io/dsa": "true" + matchFeatures: + - feature: pci.device + matchExpressions: + vendor: {op: In, value: ["8086"]} + device: {op: In, value: ["0b25"]} + class: {op: In, value: ["0880"]} + - feature: kernel.loadedmodule + matchExpressions: + idxd: {op: Exists} +{% endif %} +{% if gpu_dp_enabled | d(false) %} + - name: "intel.gpu" + labels: + "intel.feature.node.kubernetes.io/gpu": "true" + matchFeatures: + - feature: pci.device + matchExpressions: + vendor: {op: In, value: ["8086"]} + class: {op: In, value: ["0300"]} + - feature: kernel.loadedmodule + matchExpressions: + drm: {op: Exists} +{% endif %} +{% if qat_dp_enabled | d(false) %} + - name: "intel.qat" + labels: + "intel.feature.node.kubernetes.io/qat": "true" + matchFeatures: + - feature: pci.device + matchExpressions: + vendor: {op: In, value: ["8086"]} + device: {op: In, value: {{ qat_supported_pf_dev_ids | list + qat_supported_vf_dev_ids | list }}} + class: {op: In, value: ["0b40"]} + - feature: kernel.loadedmodule + matchExpressions: + intel_qat: {op: Exists} +{% 
endif %} +{% if sgx_dp_enabled | d(false) %} + - name: "intel.sgx" + labels: + "intel.feature.node.kubernetes.io/sgx": "true" + matchFeatures: + - feature: cpu.cpuid + matchExpressions: + SGX: {op: Exists} + SGXLC: {op: Exists} + - feature: cpu.sgx + matchExpressions: + enabled: {op: IsTrue} +{% if not (ansible_distribution == "Ubuntu" and ansible_distribution_version == "20.04") %} + - feature: kernel.config + matchExpressions: + X86_SGX: {op: Exists} +{% endif%} +{% endif %} diff --git a/roles/openssl_engine_install/defaults/main.yml b/roles/openssl_engine_install/defaults/main.yml index d9765512..6e01be01 100644 --- a/roles/openssl_engine_install/defaults/main.yml +++ b/roles/openssl_engine_install/defaults/main.yml @@ -16,9 +16,9 @@ --- openssl_engine_dir: "{{ (project_root_dir, 'openssl') | path_join }}" openssl_engine_url: "https://github.com/intel/QAT_Engine.git" -openssl_engine_version: "v0.6.12" +openssl_engine_version: "v0.6.15" libarchive_url: "https://github.com/libarchive/libarchive/releases/download/v3.5.1/libarchive-3.5.1.tar.xz" ipp_crypto_url: "https://github.com/intel/ipp-crypto.git" -ipp_crypto_version: "ippcp_2021.5" +ipp_crypto_version: "ippcp_2021.6" intel_ipsec_url: "https://github.com/intel/intel-ipsec-mb.git" intel_ipsec_version: "v1.2" diff --git a/roles/openssl_engine_install/tasks/openssl_engine_config.yml b/roles/openssl_engine_install/tasks/openssl_engine_config.yml index 2f6330a2..8b704c85 100644 --- a/roles/openssl_engine_install/tasks/openssl_engine_config.yml +++ b/roles/openssl_engine_install/tasks/openssl_engine_config.yml @@ -135,24 +135,24 @@ - name: autogen configuration for OpenSSL*Engine command: ./autogen.sh args: - chdir: "{{ openssl_engine_dir }}/openssl_engine" + chdir: "{{ (openssl_engine_dir, 'openssl_engine') | path_join }}" changed_when: true - name: check all configuration is present for OpenSSL*Engine - command: "./configure --enable-multibuff_offload --enable-ipsec_offload --enable-multibuff_ecx --enable-qat_sw" + command: "./configure --enable-multibuff_offload --enable-ipsec_offload --enable-multibuff_ecx --enable-qat_sw" args: - chdir: "{{ openssl_engine_dir }}/openssl_engine" + chdir: "{{ (openssl_engine_dir, 'openssl_engine') | path_join }}" changed_when: false - name: Build OpenSSL*Engine Library once successfully configured command: 'make -j{{ nproc_out.stdout | int }}' args: - chdir: "{{ openssl_engine_dir }}/openssl_engine" + chdir: "{{ (openssl_engine_dir, 'openssl_engine') | path_join }}" changed_when: true - name: make install OpenSSL*Engine make: - chdir: "{{ openssl_engine_dir }}/openssl_engine" + chdir: "{{ (openssl_engine_dir, 'openssl_engine') | path_join }}" target: install environment: "MAKEFLAGS": "-j{{ nproc_out.stdout | int }}" @@ -169,5 +169,6 @@ - name: OpenSSL*Engine command returns errors, playbook terminated fail: - msg: "OpenSSL Engine failed to load... Cause of failure can be unsupported hardware or misconfiguration of Intel QAT OpenSSL*Engine" - when: "openssl_engine_version not in confirm_openssl_engine.stdout" + msg: "OpenSSL Engine failed to load... Cause of failure can be unsupported hardware or misconfiguration of Intel QAT OpenSSL*Engine" + when: + - "openssl_engine_version not in confirm_openssl_engine.stdout" diff --git a/roles/opentelemetry_install/defaults/main.yml b/roles/opentelemetry_install/defaults/main.yml new file mode 100644 index 00000000..e8a2cccf --- /dev/null +++ b/roles/opentelemetry_install/defaults/main.yml @@ -0,0 +1,20 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. 
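A note on the OpenSSL*Engine verification above: the role greps the engine's output for `openssl_engine_version`. A manual spot check on a target host can look like the following; the `qatengine` engine id is the QAT_Engine default and is an assumption here:

```bash
# Test-load the engine and print its version and capabilities
openssl engine -t -c qatengine
```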
+## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +opentelemetry_repo: "https://open-telemetry.github.io/opentelemetry-helm-charts" +opentelemetry_operator_namespace: "opentelemetry-operator-system" +opentelemetry_operator_chart_name: "opentelemetry-operator" +opentelemetry_operator_chart_version: "0.11.8" diff --git a/roles/opentelemetry_install/files/collector-demo.yml b/roles/opentelemetry_install/files/collector-demo.yml new file mode 100644 index 00000000..cb694403 --- /dev/null +++ b/roles/opentelemetry_install/files/collector-demo.yml @@ -0,0 +1,49 @@ +apiVersion: opentelemetry.io/v1alpha1 +kind: OpenTelemetryCollector +metadata: + name: otel-telegraf +spec: + mode: deployment + serviceAccount: prometheus-k8s + volumeMounts: + - name: telegraf-ca + mountPath: "/var/run/secrets/telegraf-tls" + volumes: + - name: telegraf-ca + secret: + secretName: telegraf-tls + items: + - key: ca.crt + path: "ca.crt" + config: | + receivers: + prometheus: + config: + scrape_configs: + - job_name: "otel-collector" + scrape_interval: 5s + static_configs: + - targets: ["telegraf:9273"] + authorization: + credentials_file: "/var/run/secrets/kubernetes.io/serviceaccount/token" + scheme: https + tls_config: + ca_file: "/var/run/secrets/telegraf-tls/ca.crt" + + processors: + batch: + send_batch_size: 10000 + timeout: 5s + + exporters: + logging: + + service: + telemetry: + logs: + level: debug + pipelines: + metrics: + receivers: [prometheus] + processors: [batch] + exporters: [logging] diff --git a/roles/opentelemetry_install/tasks/cleanup.yml b/roles/opentelemetry_install/tasks/cleanup.yml new file mode 100644 index 00000000..df3b6694 --- /dev/null +++ b/roles/opentelemetry_install/tasks/cleanup.yml @@ -0,0 +1,44 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
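Once the operator picks up the `OpenTelemetryCollector` resource above, it creates a Deployment named after the CR with a `-collector` suffix. A hedged way to confirm the demo pipeline is up and scraping Telegraf (the namespace variable is a placeholder that should match `telegraf_namespace` from the telegraf role):

```bash
TELEGRAF_NS=monitoring   # assumption: set to the telegraf_namespace used in your deployment
kubectl -n "$TELEGRAF_NS" get opentelemetrycollectors
kubectl -n "$TELEGRAF_NS" logs deploy/otel-telegraf-collector --tail=20
```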
+## +--- +- block: + - name: load variables from telegraf role + ansible.builtin.import_role: + name: telegraf_install + when: false + - name: delete demo collector deployment + k8s: + state: absent + definition: "{{ lookup('file', '../files/collector-demo.yml') | from_yaml }}" + namespace: "{{ telegraf_namespace }}" + failed_when: false # quick workaround until proper cleanup is implemented (todo) + tags: + - opentelemetry + when: + - inventory_hostname == groups['kube_control_plane'][0] + - opentelemetry_demo_enabled | default(true) | bool +- block: + - name: remove opentelemetry operator + kubernetes.core.helm: + release_name: "{{ opentelemetry_operator_chart_name }}" + release_namespace: "{{ opentelemetry_operator_namespace }}" + update_repo_cache: true + state: absent + failed_when: false # quick workaround until proper cleanup is implemented (todo) + tags: + - opentelemetry + when: + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/opentelemetry_install/tasks/install.yml b/roles/opentelemetry_install/tasks/install.yml new file mode 100644 index 00000000..6e3e97bc --- /dev/null +++ b/roles/opentelemetry_install/tasks/install.yml @@ -0,0 +1,34 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: deploy opentelemetry operator + kubernetes.core.helm: + chart_repo_url: "{{ opentelemetry_repo }}" + chart_ref: "{{ opentelemetry_operator_chart_name }}" + chart_version: "{{ opentelemetry_operator_chart_version }}" + create_namespace: true + release_name: "{{ opentelemetry_operator_chart_name }}" + release_namespace: "{{ opentelemetry_operator_namespace }}" + wait: true + +- name: create demo collector deployment + k8s: + state: present + definition: "{{ lookup('file', '../files/collector-demo.yml') | from_yaml }}" + namespace: "{{ telegraf_namespace }}" + when: + - opentelemetry_demo_enabled | default(true) | bool + - telegraf_enabled | default(false) | bool diff --git a/roles/opentelemetry_install/tasks/main.yml b/roles/opentelemetry_install/tasks/main.yml new file mode 100644 index 00000000..a7d3dc4c --- /dev/null +++ b/roles/opentelemetry_install/tasks/main.yml @@ -0,0 +1,21 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
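The Helm cleanup above maps to a single CLI call if it ever needs to be run by hand; release and namespace names come from the role defaults shown earlier:

```bash
helm uninstall opentelemetry-operator --namespace opentelemetry-operator-system
```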
+## +--- +- name: install opentelemetry + import_tasks: install.yml + when: + - opentelemetry_enabled is defined and opentelemetry_enabled + - inventory_hostname == groups['kube_control_plane'][0] diff --git a/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/templates/gas-deployment.yaml b/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/templates/gas-deployment.yaml index 7957f369..d43212a0 100644 --- a/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/templates/gas-deployment.yaml +++ b/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/templates/gas-deployment.yaml @@ -51,6 +51,8 @@ spec: tolerations: - key: node-role.kubernetes.io/master operator: Exists + - key: node-role.kubernetes.io/control-plane + operator: Exists affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: @@ -58,4 +60,6 @@ spec: - matchExpressions: - key: node-role.kubernetes.io/master operator: Exists - + - matchExpressions: + - key: node-role.kubernetes.io/control-plane + operator: Exists diff --git a/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/values.yaml b/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/values.yaml index bb623da7..cb6ed0f0 100644 --- a/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/values.yaml +++ b/roles/platform_aware_scheduling_install/charts/gpu-aware-scheduling/values.yaml @@ -42,4 +42,3 @@ tls: # key: # certificate: verbosity: 4 - diff --git a/roles/platform_aware_scheduling_install/defaults/main.yml b/roles/platform_aware_scheduling_install/defaults/main.yml index 0bc051de..2d2e8122 100644 --- a/roles/platform_aware_scheduling_install/defaults/main.yml +++ b/roles/platform_aware_scheduling_install/defaults/main.yml @@ -29,15 +29,15 @@ pas_namespace: kube-system # Descheduler descheduler_git_url: https://github.com/kubernetes-sigs/descheduler.git -descheduler_git_version: "v0.23.1" +descheduler_git_version: "v0.24.1" descheduler_dir: "{{ (project_root_dir, 'sigs.k8s.io/descheduler') | path_join }}" sigs_k8s_io_dir: "{{ (project_root_dir, 'sigs.k8s.io') | path_join }}" # TAS deployment tas_enabled: false tas_build_image_locally: false -tas_git_version: "telemetry-aware-scheduling/v0.2.0" -tas_extender_image_tag_default: "0.2.0" +tas_git_version: "telemetry-aware-scheduling/v0.3.0" +tas_extender_image_tag_default: "0.3.0" tas_version: |- {{ ('telemetry-aware-scheduling' in tas_git_version) | ternary(tas_git_version[28:], tas_git_version[1:]) }} @@ -69,8 +69,8 @@ tas_verbosity: 4 # GAS deployment gas_enabled: false gas_build_image_locally: false -gas_git_version: "gpu-aware-scheduling/v0.3.0" -gas_extender_image_tag_default: "0.3.0" +gas_git_version: "gpu-aware-scheduling/v0.4.0" +gas_extender_image_tag_default: "0.4.0" gas_version: |- {{ ('gpu-aware-scheduling' in gas_git_version) | ternary(gas_git_version[22:], gas_git_version[1:]) }} diff --git a/roles/platform_aware_scheduling_install/tasks/backups_restore.yml b/roles/platform_aware_scheduling_install/tasks/backups_restore.yml index cd4638b2..64b18564 100644 --- a/roles/platform_aware_scheduling_install/tasks/backups_restore.yml +++ b/roles/platform_aware_scheduling_install/tasks/backups_restore.yml @@ -46,4 +46,3 @@ state: absent when: - config_backup.stat.exists - diff --git a/roles/platform_aware_scheduling_install/tasks/create-scheduler-config.yml b/roles/platform_aware_scheduling_install/tasks/create-scheduler-config.yml index c7efbf22..50f3829d 100644 --- 
a/roles/platform_aware_scheduling_install/tasks/create-scheduler-config.yml +++ b/roles/platform_aware_scheduling_install/tasks/create-scheduler-config.yml @@ -35,7 +35,7 @@ - name: configure API version in kube-scheduler config file lineinfile: path: /tmp/kubescheduler-config.yaml - line: "apiVersion: kubescheduler.config.k8s.io/v1beta2" + line: "apiVersion: kubescheduler.config.k8s.io/v1beta3" regexp: "apiVersion: " state: present mode: 0600 @@ -130,8 +130,8 @@ state: present mode: 0600 with_items: - - { arg: " - --policy-configmap", value: "pas-scheduler-extenders-policy" } - - { arg: " - --policy-configmap-namespace", value: "{{ pas_namespace }}" } + - {arg: " - --policy-configmap", value: "pas-scheduler-extenders-policy"} + - {arg: " - --policy-configmap-namespace", value: "{{ pas_namespace }}"} when: kube_version is version('v1.22', '<') - name: configure arguments from Kubernetes Scheduler file - dnsPolicy diff --git a/roles/platform_aware_scheduling_install/tasks/gas.yml b/roles/platform_aware_scheduling_install/tasks/gas.yml index e2e6b40f..9a1c393f 100644 --- a/roles/platform_aware_scheduling_install/tasks/gas.yml +++ b/roles/platform_aware_scheduling_install/tasks/gas.yml @@ -28,13 +28,13 @@ mode: preserve loop: - { - src: "gas-values.yaml.j2", - dest: "{{ (project_root_dir, 'charts/gpu-aware-scheduling/values.yaml') | path_join }}" - } + src: "gas-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts/gpu-aware-scheduling/values.yaml') | path_join }}" + } - { - src: "gas-chart.yaml.j2", - dest: "{{ (project_root_dir, 'charts/gpu-aware-scheduling/Chart.yaml') | path_join }}" - } + src: "gas-chart.yaml.j2", + dest: "{{ (project_root_dir, 'charts/gpu-aware-scheduling/Chart.yaml') | path_join }}" + } - name: install GAS helm chart command: >- @@ -48,4 +48,3 @@ register: result until: result.rc == 0 changed_when: true - diff --git a/roles/platform_aware_scheduling_install/tasks/main.yml b/roles/platform_aware_scheduling_install/tasks/main.yml index 78f30d2c..7b7b28b4 100644 --- a/roles/platform_aware_scheduling_install/tasks/main.yml +++ b/roles/platform_aware_scheduling_install/tasks/main.yml @@ -14,7 +14,7 @@ ## limitations under the License. 
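On the `kubescheduler.config.k8s.io/v1beta2` to `v1beta3` switch a few hunks above: newer kube-scheduler releases drop the older API version, so after `create-scheduler-config.yml` has run it is worth confirming both the rendered config and a healthy scheduler. A minimal sketch:

```bash
# The config file path comes from the task above; expect kubescheduler.config.k8s.io/v1beta3
grep 'apiVersion:' /tmp/kubescheduler-config.yaml
# The static-pod scheduler should come back healthy after the config change
kubectl -n kube-system get pods -l component=kube-scheduler
```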
## --- -- name : install dependencies +- name: install dependencies include_role: name: install_dependencies when: diff --git a/roles/platform_aware_scheduling_install/tasks/tas.yml b/roles/platform_aware_scheduling_install/tasks/tas.yml index 0897b294..6a7e698e 100644 --- a/roles/platform_aware_scheduling_install/tasks/tas.yml +++ b/roles/platform_aware_scheduling_install/tasks/tas.yml @@ -49,13 +49,13 @@ mode: preserve loop: - { - src: "tas-values.yaml.j2", - dest: "{{ (project_root_dir, 'charts/telemetry-aware-scheduling/values.yaml') | path_join }}" - } + src: "tas-values.yaml.j2", + dest: "{{ (project_root_dir, 'charts/telemetry-aware-scheduling/values.yaml') | path_join }}" + } - { - src: "tas-chart.yaml.j2", - dest: "{{ (project_root_dir, 'charts/telemetry-aware-scheduling/Chart.yaml') | path_join }}" - } + src: "tas-chart.yaml.j2", + dest: "{{ (project_root_dir, 'charts/telemetry-aware-scheduling/Chart.yaml') | path_join }}" + } - name: install TAS helm chart command: >- diff --git a/roles/platform_aware_scheduling_install/tasks/tls.yml b/roles/platform_aware_scheduling_install/tasks/tls.yml index 75703fc4..cb6fc4a7 100644 --- a/roles/platform_aware_scheduling_install/tasks/tls.yml +++ b/roles/platform_aware_scheduling_install/tasks/tls.yml @@ -57,8 +57,8 @@ chdir: "{{ extender.ssl }}" executable: /bin/bash loop: - - { name: server, target: "tls-extender" } - - { name: client, target: "tls-extender-client" } + - {name: server, target: "tls-extender"} + - {name: client, target: "tls-extender-client"} changed_when: true - name: create secret {{ extender.secret_name }} diff --git a/roles/platform_aware_scheduling_install/vars/main.yml b/roles/platform_aware_scheduling_install/vars/main.yml index 18e5098b..1b09ad60 100644 --- a/roles/platform_aware_scheduling_install/vars/main.yml +++ b/roles/platform_aware_scheduling_install/vars/main.yml @@ -18,33 +18,33 @@ # please use defaults/main.yml instead extenders: - { - name: "{{ tas_name }}", - dir: "{{ tas_extender_dir }}", - image: "{{ tas_extender_image }}", - tag: "{{ tas_extender_image_tag }}", - context: "{{ tas_image_build_context }}", - build: "{{ tas_build_image_locally }}", - bin_build: "{{ tas_build_bin }}", - secret_name: "{{ tas_extender_secret_name }}", - service_name: "{{ tas_service_name }}", - tls_enabled: "{{ tas_tls_enabled }}", - ssl: "{{ tas_ssl }}", - version: "{{ tas_git_version }}" - } + name: "{{ tas_name }}", + dir: "{{ tas_extender_dir }}", + image: "{{ tas_extender_image }}", + tag: "{{ tas_extender_image_tag }}", + context: "{{ tas_image_build_context }}", + build: "{{ tas_build_image_locally }}", + bin_build: "{{ tas_build_bin }}", + secret_name: "{{ tas_extender_secret_name }}", + service_name: "{{ tas_service_name }}", + tls_enabled: "{{ tas_tls_enabled }}", + ssl: "{{ tas_ssl }}", + version: "{{ tas_git_version }}" + } - { - name: "{{ gas_name }}", - dir: "{{ gas_extender_dir }}", - image: "{{ gas_extender_image }}", - tag: "{{ gas_extender_image_tag }}", - context: "{{ gas_image_build_context }}", - build: "{{ gas_build_image_locally }}", - bin_build: "{{ gas_build_bin }}", - secret_name: "{{ gas_extender_secret_name }}", - service_name: "{{ gas_service_name }}", - tls_enabled: "{{ gas_tls_enabled }}", - ssl: "{{ gas_ssl }}", - version: "{{ gas_git_version }}" - } + name: "{{ gas_name }}", + dir: "{{ gas_extender_dir }}", + image: "{{ gas_extender_image }}", + tag: "{{ gas_extender_image_tag }}", + context: "{{ gas_image_build_context }}", + build: "{{ gas_build_image_locally }}", + bin_build: "{{ 
gas_build_bin }}", + secret_name: "{{ gas_extender_secret_name }}", + service_name: "{{ gas_service_name }}", + tls_enabled: "{{ gas_tls_enabled }}", + ssl: "{{ gas_ssl }}", + version: "{{ gas_git_version }}" + } # kube-scheduler config files to backup kube_scheduler_configs: diff --git a/roles/qat_dp_install/defaults/main.yml b/roles/qat_dp_install/defaults/main.yml index d7e82519..b38970bd 100644 --- a/roles/qat_dp_install/defaults/main.yml +++ b/roles/qat_dp_install/defaults/main.yml @@ -15,8 +15,8 @@ ## --- intel_qat_dp_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" -intel_qat_dp_git_ref: "v0.23.0" -intel_qat_dp_version: "0.23.0" +intel_qat_dp_git_ref: "v0.24.0" +intel_qat_dp_version: "0.24.0" intel_qat_dp_dir: "{{ (project_root_dir, 'intel-qat-dp') | path_join }}" qat_dp_namespace: "kube-system" diff --git a/roles/qat_dp_install/files/noiommu.patch b/roles/qat_dp_install/files/noiommu.patch deleted file mode 100644 index 03a0da4a..00000000 --- a/roles/qat_dp_install/files/noiommu.patch +++ /dev/null @@ -1,78 +0,0 @@ -diff --git a/cmd/qat_plugin/dpdkdrv/dpdkdrv.go b/cmd/qat_plugin/dpdkdrv/dpdkdrv.go -index 8a33e78..87da7e8 100644 ---- a/cmd/qat_plugin/dpdkdrv/dpdkdrv.go -+++ b/cmd/qat_plugin/dpdkdrv/dpdkdrv.go -@@ -169,6 +169,15 @@ func (dp *DevicePlugin) getDpdkDevice(vfBdf string) (string, error) { - - s := filepath.Base(group) - -+ // If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin -+ // has explicitly set enable_unsafe_noiommu_mode VFIO parameter, -+ // VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU -+ // group name. If these conditions are true, the /dev/vfio/ devices -+ // are prefixed with "noiommu-". -+ if isVfioNoIOMMU(vfioDirPath) { -+ s = fmt.Sprintf("noiommu-%s", s) -+ } -+ - return s, nil - - default: -@@ -176,6 +185,16 @@ func (dp *DevicePlugin) getDpdkDevice(vfBdf string) (string, error) { - } - } - -+func isVfioNoIOMMU(iommuGroupPath string) bool { -+ if fileData, err := os.ReadFile(filepath.Join(iommuGroupPath, "name")); err == nil { -+ if strings.TrimSpace(string(fileData)) == "vfio-noiommu" { -+ return true -+ } -+ } -+ -+ return false -+} -+ - func (dp *DevicePlugin) getDpdkDeviceSpecs(dpdkDeviceName string) []pluginapi.DeviceSpec { - switch dp.dpdkDriver { - case igbUio: -diff --git a/cmd/qat_plugin/dpdkdrv/dpdkdrv_test.go b/cmd/qat_plugin/dpdkdrv/dpdkdrv_test.go -index c55a4e9..a231beb 100644 ---- a/cmd/qat_plugin/dpdkdrv/dpdkdrv_test.go -+++ b/cmd/qat_plugin/dpdkdrv/dpdkdrv_test.go -@@ -348,7 +348,7 @@ func TestScan(t *testing.T) { - expectedErr: true, - }, - { -- name: "vfio-pci DPDKdriver with one kernel bound device (QAT device) where vfdevID is equal to qatDevId (37c9), running in a VM", -+ name: "vfio-pci DPDKdriver with one kernel bound device (QAT device) where vfdevID is equal to qatDevId (37c9), running in a VM with vIOMMU", - dpdkDriver: "vfio-pci", - kernelVfDrivers: []string{"c6xxvf"}, - dirs: []string{ -@@ -366,6 +366,27 @@ func TestScan(t *testing.T) { - maxDevNum: 1, - expectedDevNum: 1, - }, -+ { -+ name: "vfio-pci DPDKdriver in unsafe NOIOMMU mode with one kernel bound device (QAT device) where vfdevID is equal to qatDevId (37c9), running in a VM without IOMMU", -+ dpdkDriver: "vfio-pci", -+ kernelVfDrivers: []string{"c6xxvf"}, -+ dirs: []string{ -+ "sys/bus/pci/drivers/c6xxvf", -+ "sys/bus/pci/drivers/vfio-pci", -+ "sys/bus/pci/devices/0000:02:01.0", -+ "sys/kernel/iommu_groups/vfiotestfile", -+ }, -+ files: map[string][]byte{ -+ "sys/bus/pci/devices/0000:02:01.0/device": 
[]byte("0x37c9"), -+ "sys/kernel/iommu_groups/vfiotestfile/name": []byte("vfio-noiommu"), -+ }, -+ symlinks: map[string]string{ -+ "sys/bus/pci/devices/0000:02:01.0/iommu_group": "sys/kernel/iommu_groups/vfiotestfile", -+ "sys/bus/pci/devices/0000:02:01.0/driver": "sys/bus/pci/drivers/c6xxvf", -+ }, -+ maxDevNum: 1, -+ expectedDevNum: 1, -+ }, - } - for _, tt := range tcases { - t.Run(tt.name, func(t *testing.T) { diff --git a/roles/qat_dp_install/tasks/main.yml b/roles/qat_dp_install/tasks/main.yml index f1535f34..9604a8b1 100644 --- a/roles/qat_dp_install/tasks/main.yml +++ b/roles/qat_dp_install/tasks/main.yml @@ -28,23 +28,6 @@ - inventory_hostname == groups['kube_node'][0] - qat_dp_build_image_locally -# Once the patch is released in official intel_qat_dp version -# remove this task and disable local build in group_vars.j2 via -# qat_dp_build_image_locally: false -# Correction expected in intel_qat_dp version 0.24.0 -- name: apply qat_plugin patch - patch: - src: "{{ item }}" - basedir: "{{ intel_qat_dp_dir }}" - strip: 1 - state: present - with_items: - - "noiommu.patch" - when: - - inventory_hostname == groups['kube_node'][0] - - qat_dp_build_image_locally - - on_vms is defined and on_vms - - name: prepare containers images block: - name: Build QAT Plugin before Intel QAT Device Plugin image @@ -59,9 +42,11 @@ - name: tag Intel QAT Device Plugin image command: docker tag intel/intel-qat-plugin:{{ intel_qat_dp_version }} {{ registry_local_address }}/intel-qat-plugin:{{ intel_qat_dp_version }} + changed_when: true - name: push Intel QAT Device Plugin image to local registry command: docker push {{ registry_local_address }}/intel-qat-plugin:{{ intel_qat_dp_version }} + changed_when: true when: - inventory_hostname == groups['kube_node'][0] - qat_dp_build_image_locally @@ -75,7 +60,7 @@ chdir: "{{ intel_qat_dp_dir }}" changed_when: true with_items: - - { file: intel-qat-plugin.Dockerfile, name: intel-qat-plugin } + - {file: intel-qat-plugin.Dockerfile, name: intel-qat-plugin} - name: push Intel QAT Device Plugin image to local registry command: podman push {{ registry_local_address }}/intel-qat-plugin:{{ intel_qat_dp_version }} diff --git a/roles/qat_dp_install/templates/intel-qat-plugin.yml.j2 b/roles/qat_dp_install/templates/intel-qat-plugin.yml.j2 index f58cb9af..0afa1c6f 100644 --- a/roles/qat_dp_install/templates/intel-qat-plugin.yml.j2 +++ b/roles/qat_dp_install/templates/intel-qat-plugin.yml.j2 @@ -12,11 +12,14 @@ metadata: container.apparmor.security.beta.kubernetes.io/intel-qat-plugin: {{ qat_dp_apparmor_profile | default("unconfined") }} {% endif %} spec: - image: {{ qat_dp_image | default("docker.io/intel/intel-qat-plugin") }}:{{ intel_qat_dp_version | default("0.23.0") }} + image: {{ qat_dp_image | default("docker.io/intel/intel-qat-plugin") }}:{{ intel_qat_dp_version | default("0.24.0") }} dpdkDriver: {{ qat_dp_dpdk_drivers }} kernelVfDrivers: {{ qat_dp_kernel_drivers }} maxNumDevices: {{ qat_dp_max_num_devices }} +{% if allocation_policy is defined %} + preferredAllocationPolicy: {{ allocation_policy }} +{% endif %} logLevel: {{ qat_dp_verbosity | default(4) }} nodeSelector: - feature.node.kubernetes.io/intel.qat: 'true' + intel.feature.node.kubernetes.io/qat: 'true' qat.configured: 'true' diff --git a/roles/redeploy_cleanup/defaults/main.yml b/roles/redeploy_cleanup/defaults/main.yml index d03f1a37..ee264b3d 100644 --- a/roles/redeploy_cleanup/defaults/main.yml +++ b/roles/redeploy_cleanup/defaults/main.yml @@ -51,3 +51,6 @@ intel_services_to_stop: - 
"sst-bf-configure.service" - "sst-tf-configure.service" - "vpp.service" + +# Mentioned below folder location must match with roles/bootstrap/install_qat_drivers_services/defaults/main.yml +qat_drivers_dir: "{{ (project_root_dir, 'qat_drivers') | path_join }}" diff --git a/roles/redeploy_cleanup/tasks/intel_cleanup.yml b/roles/redeploy_cleanup/tasks/intel_cleanup.yml index 16bed225..e1ed389b 100644 --- a/roles/redeploy_cleanup/tasks/intel_cleanup.yml +++ b/roles/redeploy_cleanup/tasks/intel_cleanup.yml @@ -53,19 +53,20 @@ with_items: - "{{ intel_services_to_stop + (ddp_i40e_service.stdout | ternary([ddp_i40e_service.stdout], [])) }}" -- name: remove custom kernel - block: - - name: remove custom kernel build files - make: - dest: "{{ project_root_dir }}/kernels/gpu_kernel/" - target: "{{ item }}" - with_items: - - clean - - mrproper - - distclean - changed_when: false - failed_when: false +- name: remove custom kernel build files + make: + dest: "{{ project_root_dir }}/kernels/gpu_kernel/" + target: "{{ item }}" + with_items: + - clean" + - mrproper + - distclean + changed_when: false + failed_when: false + when: configure_gpu | default(false) +- name: block required only in case when update_kernel is true in group_vars + block: - name: find files in /boot to remove find: paths: /boot @@ -89,14 +90,22 @@ state: absent mode: 0644 + - name: show current Kernel version + debug: + msg: "{{ ansible_kernel }}" + - name: find generic kernel files in /boot find: paths: /boot - patterns: "vmlinuz-*-*" + patterns: "vmlinuz-{{ ansible_kernel }}*" register: kernel_path changed_when: false failed_when: false + - name: check Kernel version selection before setting it to default in grub + debug: + msg: "{{ kernel_path.files[0].path }}" + - name: use generic kernel on Ubuntu lineinfile: dest: /etc/default/grub @@ -107,17 +116,53 @@ changed_when: false failed_when: false become: yes - when: ansible_distribution == "Ubuntu" - - name: use generic kernel on CentOS/RHEL - command: grubby --set-default "{{ kernel_path.files[0].path }}" + - name: use generic kernel on RHEL / Rocky + command: "grubby --set-default {{ kernel_path.files[0].path }}" changed_when: true failed_when: false become: yes - when: ansible_os_family == "RedHat" + when: + - (ansible_distribution == "Ubuntu" and ansible_distribution_version == "20.04" and update_kernel) or + (ansible_os_family == "RedHat" and ansible_distribution_version < "9.0" and update_kernel) or + configure_gpu | default(false) + +- include_role: + name: bootstrap/update_grub + +# All QAT settings need to be uninstalled / reverted if re-configuration is required +- name: revert QAT setup + block: + - name: make sure qat_service is stopped and disabled + service: + state: stopped + name: qat_service + enabled: no + failed_when: false + + - name: uninstall QAT drivers + make: + chdir: "{{ qat_drivers_dir }}" + target: uninstall + become: yes + failed_when: false + + - name: make clean QAT drivers + make: + chdir: "{{ qat_drivers_dir }}" + target: clean + become: yes + failed_when: false - - include_role: - name: bootstrap/update_grub + - name: make distclean QAT drivers + make: + chdir: "{{ qat_drivers_dir }}" + target: distclean + become: yes + failed_when: false + when: + - update_qat_drivers is defined and update_qat_drivers + - qat_devices is defined and qat_devices != [] - name: cleanup /usr/local/* # ansible find module too slow @@ -154,5 +199,29 @@ changed_when: true failed_when: false +- name: block for remove sgx software on Rocky / RHEL >= 9.0 + block: + - name: 
remove sgx software on Rocky / RHEL >= 9.0 + shell: "set -o pipefail && rpm -e {{ item }}" + args: + executable: /bin/bash + loop: + - "libsgx-launch" + - "libsgx-epid" + - "libsgx-quote-ex" + - "sgx-aesm-service" + failed_when: false + changed_when: true + + - name: remove libproto for sgx on Rocky / RHEL >= 9.0 + file: + path: /usr/lib64/libprotobuf.so.15 + state: absent + changed_when: true + failed_when: false + when: + - ansible_os_family == "RedHat" + - ansible_distribution_version >= "9.0" + - debug: msg: "Intel Container Experience Kit features have been removed ..." diff --git a/roles/redeploy_cleanup/tasks/k8s_cleanup.yml b/roles/redeploy_cleanup/tasks/k8s_cleanup.yml index 412da35f..4d2d8c5a 100644 --- a/roles/redeploy_cleanup/tasks/k8s_cleanup.yml +++ b/roles/redeploy_cleanup/tasks/k8s_cleanup.yml @@ -64,7 +64,7 @@ state: absent with_items: "{{ kubepods_files.files | default([]) }}" when: - - kubepods_files.files | length > 0 + - kubepods_files.files | length > 0 changed_when: false failed_when: false @@ -89,8 +89,8 @@ with_items: - "{{ existing_containers.stdout | default([]) }}" when: - - existing_containers.stdout | length > 0 - - container_runtime == "docker" + - existing_containers.stdout | length > 0 + - container_runtime == "docker" changed_when: false failed_when: false - name: list running or stopped containers @@ -105,8 +105,8 @@ with_items: - "{{ crictl_existing_containers.stdout | default([]) }}" when: - - crictl_existing_containers.stdout | length > 0 - - container_runtime != "docker" + - crictl_existing_containers.stdout | length > 0 + - container_runtime != "docker" changed_when: false failed_when: false @@ -118,14 +118,14 @@ changed_when: false failed_when: false when: - - container_runtime == "docker" + - container_runtime == "docker" - name: remove container images command: "docker rmi -f {{ item }}" with_items: - "{{ container_images.stdout | default([]) }}" when: - - container_images.stdout | length > 0 - - container_runtime == "docker" + - container_images.stdout | length > 0 + - container_runtime == "docker" changed_when: false failed_when: false - name: list existing containers images @@ -134,14 +134,14 @@ changed_when: false failed_when: false when: - - container_runtime != "docker" + - container_runtime != "docker" - name: remove container images command: "crictl rmi {{ item }}" with_items: - "{{ crictl_container_images.stdout | default([]) }}" when: - - container_runtime != "docker" - - crictl_container_images.stdout | length > 0 + - container_runtime != "docker" + - crictl_container_images.stdout | length > 0 changed_when: false failed_when: false when: @@ -198,5 +198,42 @@ failed_when: false when: container_runtime == "crio" +- name: reset DNS settings in dhclient.conf + blockinfile: + path: "{{ item }}" + state: absent + marker: "# Ansible entries {mark}" + failed_when: false + with_items: + - /etc/dhclient.conf + - /etc/dhcp/dhclient.conf + - /etc/resolv.conf + +- name: reset entries in /etc/hosts + blockinfile: + path: "/etc/hosts" + state: absent + marker: "# Ansible inventory hosts {mark}" + failed_when: false + +- name: run dhclient to get IP after restarting network in case of failure + command: "dhclient" + changed_when: true + failed_when: false + +- name: restart network.service on RHEL / Rocky + systemd: + name: network.service + state: restarted + when: ansible_os_family == "RedHat" + failed_when: false + +- name: restart systemd-resolved on Ubuntu + systemd: + name: systemd-resolved + state: restarted + when: ansible_os_family == 
"Debian" + failed_when: false + - debug: msg: "k8s cluster has been removed ..." diff --git a/roles/redeploy_cleanup/tasks/main.yml b/roles/redeploy_cleanup/tasks/main.yml index 3a3db781..f2512337 100644 --- a/roles/redeploy_cleanup/tasks/main.yml +++ b/roles/redeploy_cleanup/tasks/main.yml @@ -14,6 +14,20 @@ ## limitations under the License. ## --- +- name: uninstall opentelemetry + include_role: + name: opentelemetry_install + tasks_from: cleanup + tags: + - opentelemetry + +- name: uninstall cAdvisor + include_role: + name: cadvisor_install + tasks_from: cleanup_cadvisor + tags: + - cadvisor + - name: reset and remove Kubernetes cluster import_tasks: k8s_cleanup.yml diff --git a/roles/remove_kubespray_host_dns_settings/tasks/main.yml b/roles/remove_kubespray_host_dns_settings/tasks/main.yml index 90bc32b6..df9f06bc 100644 --- a/roles/remove_kubespray_host_dns_settings/tasks/main.yml +++ b/roles/remove_kubespray_host_dns_settings/tasks/main.yml @@ -23,6 +23,7 @@ with_items: - /etc/dhclient.conf - /etc/dhcp/dhclient.conf + - /etc/resolv.conf - name: reset entries in /etc/hosts blockinfile: @@ -30,3 +31,22 @@ state: absent marker: "# Ansible inventory hosts {mark}" failed_when: false + +- name: run dhclient to get IP after restarting network in case of failure + command: "dhclient" + changed_when: true + failed_when: false + +- name: restart network.service on RHEL / Rocky + systemd: + name: network.service + state: restarted + when: ansible_os_family == "RedHat" + failed_when: false + +- name: restart systemd-resolved on Ubuntu + systemd: + name: systemd-resolved + state: restarted + when: ansible_os_family == "Debian" + failed_when: false diff --git a/roles/service_mesh_install/templates/istioctl-options.yml.j2 b/roles/service_mesh_install/templates/istioctl-options.yml.j2 deleted file mode 100644 index 22d38fef..00000000 --- a/roles/service_mesh_install/templates/istioctl-options.yml.j2 +++ /dev/null @@ -1,60 +0,0 @@ -argv: - - --skip-confirmation -{% if service_mesh.context is defined and service_mesh.context != '' %} - - --context - - {{ service_mesh.context }} -{% endif -%} -{% if service_mesh.filename is defined and service_mesh.filename != [] %} -{% for item in service_mesh.filename %} - - --filename - - {{ item }} -{% endfor -%} -{% endif -%} -{% if service_mesh.namespace is defined and service_mesh.namespace != '' %} - - --namespace - - {{ service_mesh.namespace }} -{% endif -%} -{% if service_mesh.istio_namespace is defined and service_mesh.istio_namespace != '' %} - - --istioNamespace - - {{ service_mesh.istio_namespace }} -{% endif -%} -{% if service_mesh.kubeconfig is defined and service_mesh.kubeconfig != '' %} - - --kubeconfig - - {{ service_mesh.kubeconfig }} -{% endif -%} -{% if service_mesh.vklog is defined and service_mesh.vklog != '' %} - - --vklog - - {{ service_mesh.vklog }} -{% endif -%} -{% if service_mesh.revision is defined and service_mesh.revision != '' %} - - --revision - - {{ service_mesh.revision }} -{% endif -%} -{% if service_mesh.manifest is defined and service_mesh.manifest != '' %} - - --manifests - - {{ service_mesh.manifest }} -{% endif -%} -{% if service_mesh.dry_run is defined and service_mesh.dry_run | bool %} - - --dry-run -{% endif -%} -{% if service_mesh.force is defined and service_mesh.force | bool %} - - --force -{% endif -%} -{% if service_mesh.readiness_timeout is defined and service_mesh.readiness_timeout != '' %} - - --readiness-timeout - - {{ service_mesh.readiness_timeout }} -{% endif -%} -{% if service_mesh.set is defined and 
service_mesh.set != [] %} -{% for item in service_mesh.set %} - - --set - - {{ item }} -{% endfor -%} -{% endif -%} -{% if service_mesh.verify is defined and service_mesh.verify | bool and service_mesh.profile != 'empty' %} - - --verify -{% endif -%} -{% if service_mesh.profile in ['default', 'demo', 'minimal', 'external', 'empty', 'preview'] %} - - --set profile={{ service_mesh.profile }} -{% else %} - - --filename={{ service_mesh_profiles_dir }}/{{ service_mesh.profile }}.yaml -{% endif -%} diff --git a/roles/service_mesh_install/templates/tcs-cluster-issuer.yaml.j2 b/roles/service_mesh_install/templates/tcs-cluster-issuer.yaml.j2 deleted file mode 100644 index 9a6e38fb..00000000 --- a/roles/service_mesh_install/templates/tcs-cluster-issuer.yaml.j2 +++ /dev/null @@ -1,6 +0,0 @@ -apiVersion: tcs.intel.com/v1alpha1 -kind: TCSClusterIssuer -metadata: - name: {{ service_mesh.sgx_signer.name }} -spec: - secretName: {{ service_mesh.sgx_signer.name }}-secret diff --git a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-clusterrole.yml b/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-clusterrole.yml deleted file mode 100644 index 1b9f37ba..00000000 --- a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-clusterrole.yml +++ /dev/null @@ -1,12 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: intel-sgx-aesmd-demo - labels: - app: intel-sgx-aesmd-demo -rules: -- apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - intel-sgx-aesmd-demo diff --git a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-psp.yml b/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-psp.yml deleted file mode 100644 index dfb70147..00000000 --- a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-psp.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: intel-sgx-aesmd-demo - labels: - app: intel-sgx-aesmd-demo -spec: - privileged: true - hostNetwork: true - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - '*' diff --git a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-rolebinding.yml b/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-rolebinding.yml deleted file mode 100644 index dfb0f3a3..00000000 --- a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-rolebinding.yml +++ /dev/null @@ -1,15 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: intel-sgx-aesmd-demo - namespace: "{{ .Release.Namespace }}" - labels: - app: intel-sgx-aesmd-demo -roleRef: - kind: ClusterRole - name: intel-sgx-aesmd-demo - apiGroup: rbac.authorization.k8s.io -subjects: - - kind: ServiceAccount - name: intel-sgx-aesmd-demo - namespace: {{ .Release.Namespace }} diff --git a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-serviceaccount.yml b/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-serviceaccount.yml deleted file mode 100644 index ad923477..00000000 --- a/roles/sgx_dp_install/charts/intel-sgx-aesmd/templates/intel-sgx-aesmd-demo-serviceaccount.yml +++ /dev/null @@ -1,7 +0,0 @@ -apiVersion: v1 -kind: 
ServiceAccount -metadata: - name: intel-sgx-aesmd-demo - namespace: {{ .Release.Namespace }} - labels: - app: intel-sgx-aesmd-demo diff --git a/roles/sgx_dp_install/defaults/main.yaml b/roles/sgx_dp_install/defaults/main.yaml index ce8877b4..b5980b12 100644 --- a/roles/sgx_dp_install/defaults/main.yaml +++ b/roles/sgx_dp_install/defaults/main.yaml @@ -15,10 +15,9 @@ ## --- intel_sgx_dp_git_url: "https://github.com/intel/intel-device-plugins-for-kubernetes.git" -intel_sgx_dp_git_ref: "v0.23.0" -intel_sgx_dp_version: "0.23.0" +intel_sgx_dp_git_ref: "v0.24.0" +intel_sgx_dp_version: "0.24.0" intel_sgx_dp_dir: "{{ (project_root_dir, 'intel-sgx-dp') | path_join }}" -intel_sgx_psp_rbac_dir: "{{ (project_root_dir, 'psp-rbac-rules') | path_join }}" sgx_dp_build_image_locally: true sgx_dp_provision_limit: 20 diff --git a/roles/sgx_dp_install/files/sgx-psp.yml b/roles/sgx_dp_install/files/sgx-psp.yml deleted file mode 100644 index b8038c81..00000000 --- a/roles/sgx_dp_install/files/sgx-psp.yml +++ /dev/null @@ -1,22 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: intel-sgx-plugin -spec: - privileged: true - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - '*' diff --git a/roles/sgx_dp_install/files/sgx-rbac-cluster-role.yml b/roles/sgx_dp_install/files/sgx-rbac-cluster-role.yml deleted file mode 100644 index 5b8f3b4a..00000000 --- a/roles/sgx_dp_install/files/sgx-rbac-cluster-role.yml +++ /dev/null @@ -1,10 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: intel-sgx-plugin -rules: -- apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - intel-sgx-plugin diff --git a/roles/sgx_dp_install/tasks/main.yaml b/roles/sgx_dp_install/tasks/main.yaml index 363ee81a..d731d3d1 100644 --- a/roles/sgx_dp_install/tasks/main.yaml +++ b/roles/sgx_dp_install/tasks/main.yaml @@ -14,19 +14,11 @@ ## limitations under the License. 
## --- -- name: determine machine type - include_role: - name: check_machine_type - when: - - inventory_hostname == groups['kube_node'][0] - - name: install dependencies include_role: name: install_dependencies when: - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - name: clone Intel Device Plugins repository git: @@ -36,8 +28,6 @@ force: yes when: - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool # docker is used as container runtime: - name: prepare containers images @@ -72,8 +62,6 @@ when: sgx_dp_build_image_locally when: - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - container_runtime == "docker" # containerd/cri-o is used as container runtime: @@ -85,8 +73,8 @@ chdir: "{{ intel_sgx_dp_dir }}" changed_when: true with_items: - - { file: intel-sgx-initcontainer.Dockerfile, name: intel-sgx-initcontainer } - - { file: intel-sgx-plugin.Dockerfile, name: intel-sgx-plugin } + - {file: intel-sgx-initcontainer.Dockerfile, name: intel-sgx-initcontainer} + - {file: intel-sgx-plugin.Dockerfile, name: intel-sgx-plugin} when: sgx_dp_build_image_locally - name: push Intel SGX Device Plugin images to local registry @@ -98,49 +86,8 @@ when: sgx_dp_build_image_locally when: - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - '"docker" not in container_runtime' -# start deployment of SGX DP -- name: prepare and deploy PSP and RBAC - block: - - name: make sure directory for PSP and RBAC rules exists - file: - path: "{{ intel_sgx_psp_rbac_dir }}" - state: directory - mode: 0755 - - - name: copy PSP and RBAC files - copy: - src: "{{ (role_path , 'files', item) | path_join }}" - dest: "{{ (intel_sgx_psp_rbac_dir, item) | path_join }}" - mode: 0755 - loop: - - sgx-psp.yml - - sgx-rbac-cluster-role.yml - - - name: populate RBAC role binding yaml file and push to controller node - template: - src: "sgx-rbac-role-binding.yml.j2" - dest: "{{ (intel_sgx_psp_rbac_dir, 'sgx-rbac-role-binding.yml') | path_join }}" - force: yes - mode: preserve - - - name: create PSP and RBAC - k8s: - state: present - src: "{{ (intel_sgx_psp_rbac_dir, item) | path_join }}" - loop: - - sgx-psp.yml - - sgx-rbac-cluster-role.yml - - sgx-rbac-role-binding.yml - when: - - inventory_hostname == groups['kube_control_plane'][0] - - hostvars[groups['kube_node'][0]]['is_icx'] or - hostvars[groups['kube_node'][0]]['is_spr'] - - psp_enabled | default(true) - - name: prepare and deploy Intel SGX Device Plugin block: - name: set values @@ -163,15 +110,10 @@ src: "{{ (project_root_dir, 'intel-sgx-plugin.yml') | path_join }}" when: - inventory_hostname == groups['kube_control_plane'][0] - - hostvars[groups['kube_node'][0]]['is_icx'] or - hostvars[groups['kube_node'][0]]['is_spr'] - name: wait for Intel SGX Device Plugin pause: minutes: 1 - when: - - is_icx | default(false) | bool or - is_spr | default(false) | bool - name: build Intel sgx-aesmd demo image Docker engine block: @@ -186,14 +128,14 @@ - name: tag Intel sgx-aesmd image command: docker tag intel/sgx-aesmd-demo:{{ intel_sgx_dp_version }} {{ registry_local_address }}/intel-sgx-aesmd-demo:{{ intel_sgx_dp_version }} + changed_when: true - name: push Intel sgx-aesmd image to local registry command: docker push {{ registry_local_address }}/intel-sgx-aesmd-demo:{{ intel_sgx_dp_version }} + changed_when: true when: - 
sgx_aesmd_demo_enable | default(false) | bool - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - container_runtime == "docker" - name: build Intel sgx-aesmd demo image non-Docker engine @@ -210,8 +152,6 @@ when: - sgx_aesmd_demo_enable | default(false) | bool - inventory_hostname == groups['kube_node'][0] - - is_icx | default(false) | bool or - is_spr | default(false) | bool - '"docker" not in container_runtime' - name: prepare and deploy Intel SGX aesmd demo @@ -242,8 +182,7 @@ --namespace {{ sgx_aesmd_namespace }} --create-namespace {{ (project_root_dir, 'charts', 'intel-sgx-aesmd') | path_join }} + changed_when: true when: - sgx_aesmd_demo_enable | default(false) | bool - inventory_hostname == groups['kube_control_plane'][0] - - hostvars[groups['kube_node'][0]]['is_icx'] or - hostvars[groups['kube_node'][0]]['is_spr'] diff --git a/roles/sgx_dp_install/templates/intel-sgx-plugin.yml.j2 b/roles/sgx_dp_install/templates/intel-sgx-plugin.yml.j2 index 60bc739d..724b086e 100644 --- a/roles/sgx_dp_install/templates/intel-sgx-plugin.yml.j2 +++ b/roles/sgx_dp_install/templates/intel-sgx-plugin.yml.j2 @@ -3,11 +3,11 @@ kind: SgxDevicePlugin metadata: name: intel-sgx-device-plugin spec: - image: {{ sgx_dp_image | default("docker.io/intel/intel-sgx-plugin") }}:{{ sgx_dp_version | default("0.23.0") }} - initImage: {{ sgx_dp_init_image | default("docker.io/intel/intel-sgx-initcontainer") }}:{{ sgx_dp_version | default("0.23.0") }} + image: {{ sgx_dp_image | default("docker.io/intel/intel-sgx-plugin") }}:{{ sgx_dp_version | default("0.24.0") }} + initImage: {{ sgx_dp_init_image | default("docker.io/intel/intel-sgx-initcontainer") }}:{{ sgx_dp_version | default("0.24.0") }} enclaveLimit: {{ sgx_dp_enclave_limit }} provisionLimit: {{ sgx_dp_provision_limit }} logLevel: {{ sgx_dp_verbosity | default(4) }} nodeSelector: - feature.node.kubernetes.io/intel.sgx: 'true' + intel.feature.node.kubernetes.io/sgx: 'true' sgx.configured: 'true' diff --git a/roles/sgx_dp_install/templates/sgx-rbac-role-binding.yml.j2 b/roles/sgx_dp_install/templates/sgx-rbac-role-binding.yml.j2 deleted file mode 100644 index 09c5e381..00000000 --- a/roles/sgx_dp_install/templates/sgx-rbac-role-binding.yml.j2 +++ /dev/null @@ -1,13 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: intel-sgx-plugin - namespace: "{{ intel_dp_namespace }}" -roleRef: - kind: ClusterRole - name: intel-sgx-plugin - apiGroup: rbac.authorization.k8s.io -subjects: -- kind: Group - name: system:authenticated - apiGroup: rbac.authorization.k8s.io diff --git a/roles/sriov_cni_install/tasks/main.yml b/roles/sriov_cni_install/tasks/main.yml index f21e2719..389fa2f9 100644 --- a/roles/sriov_cni_install/tasks/main.yml +++ b/roles/sriov_cni_install/tasks/main.yml @@ -41,10 +41,10 @@ - name: create /opt/cni/bin file: - path: "/opt/cni/bin" - state: directory - recurse: yes - mode: 0755 + path: "/opt/cni/bin" + state: directory + recurse: yes + mode: 0755 - name: install sriov-cni binary to /opt/cni/bin directory copy: diff --git a/roles/sriov_dp_install/charts/sriov-net-dp/values.yaml b/roles/sriov_dp_install/charts/sriov-net-dp/values.yaml index b4b8c36c..f6d0a500 100644 --- a/roles/sriov_dp_install/charts/sriov-net-dp/values.yaml +++ b/roles/sriov_dp_install/charts/sriov-net-dp/values.yaml @@ -18,7 +18,7 @@ namespace: kube-system image: repository: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin - tag: v3.4.0 + tag: v3.5.0 pullPolicy: IfNotPresent 
configPath: /etc/pcidp/config.json diff --git a/roles/sriov_dp_install/defaults/main.yml b/roles/sriov_dp_install/defaults/main.yml index e1dbf9e7..faa904fa 100644 --- a/roles/sriov_dp_install/defaults/main.yml +++ b/roles/sriov_dp_install/defaults/main.yml @@ -17,7 +17,7 @@ sriov_net_dp_git_url: "https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin.git" sriov_net_dp_dir: "{{ (project_root_dir, 'sriov-network-device-plugin') | path_join }}" sriov_net_dp_image: "ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin" -sriov_net_dp_tag: "v3.4.0" +sriov_net_dp_tag: "v3.5.0" sriov_net_dp_build_image_locally: false sriov_net_dp_namespace: kube-system diff --git a/roles/sriov_dp_install/tasks/main.yml b/roles/sriov_dp_install/tasks/main.yml index 0ae2b05f..ea68092f 100644 --- a/roles/sriov_dp_install/tasks/main.yml +++ b/roles/sriov_dp_install/tasks/main.yml @@ -83,7 +83,7 @@ chdir: "{{ sriov_net_dp_dir }}" changed_when: true with_items: - - { file: images/Dockerfile, name: sriov-device-plugin } + - {file: images/Dockerfile, name: sriov-device-plugin} register: sriov_dp_image_build retries: 5 until: sriov_dp_image_build is success @@ -100,7 +100,7 @@ chdir: "{{ sriov_net_dp_dir }}" changed_when: true with_items: - - { file: images/Dockerfile, name: sriov-device-plugin } + - {file: images/Dockerfile, name: sriov-device-plugin} register: sriov_dp_image_build retries: 5 until: sriov_dp_image_build is success diff --git a/roles/sriov_dp_install/templates/helm_values.yml.j2 b/roles/sriov_dp_install/templates/helm_values.yml.j2 index 3b5a24ca..d859c0cf 100644 --- a/roles/sriov_dp_install/templates/helm_values.yml.j2 +++ b/roles/sriov_dp_install/templates/helm_values.yml.j2 @@ -3,7 +3,7 @@ namespace: {{ sriov_net_dp_namespace | default("kube-system") }} image: repository: {{ sriov_net_dp_image | default("ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin") }} - tag: {{ sriov_net_dp_tag | default("v3.4.0") }} + tag: {{ sriov_net_dp_tag | default("v3.5.0") }} pullPolicy: IfNotPresent configPath: {{ sriov_net_dp_config_path | default("/etc/pcidp/config.json") }} diff --git a/roles/sriov_network_operator_install/charts/sriov-network-operator/bindata/manifests/plugins/002-rbac.yaml b/roles/sriov_network_operator_install/charts/sriov-network-operator/bindata/manifests/plugins/002-rbac.yaml index 5e752e76..9d191333 100644 --- a/roles/sriov_network_operator_install/charts/sriov-network-operator/bindata/manifests/plugins/002-rbac.yaml +++ b/roles/sriov_network_operator_install/charts/sriov-network-operator/bindata/manifests/plugins/002-rbac.yaml @@ -19,14 +19,7 @@ rules: - securitycontextconstraints verbs: - use - - apiGroups: - - policy - resources: - - podsecuritypolicies - verbs: - - use - resourceNames: - - sriov-psp + --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding diff --git a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/clusterrole.yaml b/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/clusterrole.yaml index e5c9b781..c2d1461b 100644 --- a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/clusterrole.yaml +++ b/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/clusterrole.yaml @@ -32,11 +32,7 @@ rules: - apiGroups: ["machineconfiguration.openshift.io"] resources: ["*"] verbs: ["*"] - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - sriov-psp + --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole 
@@ -57,8 +53,3 @@ rules: - apiGroups: [""] resources: ["pods/eviction"] verbs: ["create"] - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - sriov-psp diff --git a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/psp.yaml b/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/psp.yaml deleted file mode 100644 index ceb6afbc..00000000 --- a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/psp.yaml +++ /dev/null @@ -1,26 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: sriov-psp -spec: - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - readOnlyRootFilesystem: false - fsGroup: - rule: RunAsAny - hostNetwork: true - hostIPC: true - hostPID: true - privileged: true - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - "*" diff --git a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/role.yaml b/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/role.yaml index 48244413..e9e5f66f 100644 --- a/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/role.yaml +++ b/roles/sriov_network_operator_install/charts/sriov-network-operator/templates/role.yaml @@ -51,11 +51,6 @@ rules: - rolebindings verbs: - '*' - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - sriov-psp --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role @@ -105,11 +100,7 @@ rules: - 'leases' verbs: - '*' - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - sriov-psp + --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role @@ -125,8 +116,3 @@ rules: - configmaps verbs: - get - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - sriov-psp diff --git a/roles/sriov_network_operator_install/defaults/main.yml b/roles/sriov_network_operator_install/defaults/main.yml index 49ea989b..6023815e 100644 --- a/roles/sriov_network_operator_install/defaults/main.yml +++ b/roles/sriov_network_operator_install/defaults/main.yml @@ -13,8 +13,8 @@ ## See the License for the specific language governing permissions and ## limitations under the License. 
## - -#sriov-network-operator: github.com/k8snetworkplumbingwg/sriov-network-operator +--- +# sriov-network-operator: github.com/k8snetworkplumbingwg/sriov-network-operator sriov_network_charts_dir: "{{ (project_root_dir, 'charts', 'sriov-network-operator') | path_join }}" sriov_network_policies_dir: "{{ (project_root_dir, 'charts', 'sriov-network-nodes-policies') | path_join }}" @@ -24,13 +24,13 @@ sriov_network_operator_helm_release_name: "sriov-network-operator" # helm values defaults sriov_network_operator_images: - operator: ghcr.io/k8snetworkplumbingwg/sriov-network-operator@sha256:3f1b2e2d0792e96f2c6031a8bc8e42a4a093e573844f15f6f6f2c086646b19ba - sriovConfigDaemon: ghcr.io/k8snetworkplumbingwg/sriov-network-operator-config-daemon@sha256:9916b47350ad85be883e54e761baf617d6ca9b3bb4656b75d57194b52af46995 - sriovCni: ghcr.io/k8snetworkplumbingwg/sriov-cni@sha256:ab4675dcea6c4a810997e728da3a7890ea47ec9f13cf5e744c1b0bd3cc1a6978 - ibSriovCni: ghcr.io/k8snetworkplumbingwg/ib-sriov-cni@sha256:3eb56b423056c145a1a428df9dd2681fd3ccdd85ffd6d620e5dbe295dd721507 - sriovDevicePlugin: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin@sha256:830672be46c0c828333af0fe076e5c515396baea3a7c0ad21752077e62c4594c - resourcesInjector: ghcr.io/k8snetworkplumbingwg/network-resources-injector@sha256:48e66db279e4965e1172323dff1d909ea68106685b784340838de3a64f7afbd6 - webhook: ghcr.io/k8snetworkplumbingwg/sriov-network-operator-webhook@sha256:e435b2a4676e0ee74189636641a6e14b0979b9b96ec2904b03a1b446dfbb1262 + operator: ghcr.io/k8snetworkplumbingwg/sriov-network-operator@sha256:cb125d483afa434b78cc722ace1a3a4e8d768f73c5c7f74960e6f8e5f5bc5660 + sriovConfigDaemon: ghcr.io/k8snetworkplumbingwg/sriov-network-operator-config-daemon@sha256:018fb75722a5287bcd74c4b6aac27e3480bad1c39465873d2ecc62cd04c92001 + sriovCni: ghcr.io/k8snetworkplumbingwg/sriov-cni@sha256:99ba2d85c3dbd2ec5ad9c56e1a41d309ad9f773964c7350590619562c420d938 + ibSriovCni: ghcr.io/k8snetworkplumbingwg/ib-sriov-cni@sha256:2091243e0d6bee15c588f93e9e7b19f35c7170e6a45d7fd7363fa46d114cb09d + sriovDevicePlugin: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin@sha256:5b73397cd20ee12c3e823b374fee215fac113601498e16660fad77ea1add78d7 + resourcesInjector: ghcr.io/k8snetworkplumbingwg/network-resources-injector@sha256:6f3fcc4aa4eead8aae8179c02a6ba0f3c63c17c0a32817634cb7437b6539c2b3 + webhook: ghcr.io/k8snetworkplumbingwg/sriov-network-operator-webhook@sha256:c091ca374caec31afdb6439a37ac6de724990d385b36951d38d5d535f0b84057 sriov_resource_name_prefix: "intel.com" sriov_network_operator_name_override: "" diff --git a/roles/sriov_network_operator_install/tasks/load_ddp_profile.yml b/roles/sriov_network_operator_install/tasks/load_ddp_profile.yml index 73a19767..1ad7c763 100644 --- a/roles/sriov_network_operator_install/tasks/load_ddp_profile.yml +++ b/roles/sriov_network_operator_install/tasks/load_ddp_profile.yml @@ -34,4 +34,4 @@ - nic_port.ddp_profile|default("")|length > 0 loop: "{{ dataplane_interfaces }}" loop_control: - loop_var: nic_port \ No newline at end of file + loop_var: nic_port diff --git a/roles/sriov_network_operator_install/tasks/sriov_network_node_policy_configure.yml b/roles/sriov_network_operator_install/tasks/sriov_network_node_policy_configure.yml index bff56f33..03110415 100644 --- a/roles/sriov_network_operator_install/tasks/sriov_network_node_policy_configure.yml +++ b/roles/sriov_network_operator_install/tasks/sriov_network_node_policy_configure.yml @@ -43,3 +43,17 @@ args: chdir: "{{ sriov_network_policies_dir }}" changed_when: 
true
+
+- name: wait for SriovNetworkNodeState CR after sriov network node policy
+  shell: >-
+    set -o pipefail && kubectl get SriovNetworkNodeState {{ item }}
+    -n {{ sriov_network_operator_namespace }} -o yaml | grep 'syncStatus'
+  args:
+    executable: /bin/bash
+  register: sriov_wait_policy
+  until: "'InProgress' not in sriov_wait_policy.stdout"
+  failed_when: "'Failed' in sriov_wait_policy.stdout or 'InProgress' in sriov_wait_policy.stdout or sriov_wait_policy.rc != 0"
+  retries: 50
+  delay: 10
+  changed_when: false
+  loop: "{{ groups['kube_node'] }}"
diff --git a/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml b/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml
index 7322d318..2fb53ffa 100644
--- a/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml
+++ b/roles/sriov_network_operator_install/tasks/sriov_network_operator_install.yml
@@ -53,17 +53,6 @@
     - hostvars[item]['dataplane_interfaces'] | default({}) | length > 0
   loop: "{{ groups['kube_node'] }}"

-# NOTE(kmlynekx): this is workaround for
-# https://github.com/kubernetes-sigs/node-feature-discovery/issues/812
-- name: label configured nodes with feature.node.kubernetes.io/network-sriov.capable=true label
-  shell: "set -o pipefail && kubectl label nodes {{ hostvars[item]['ansible_hostname'] }} feature.node.kubernetes.io/network-sriov.capable=true --overwrite"
-  args:
-    executable: /bin/bash
-  changed_when: true
-  when:
-    - hostvars[item]['iommu_enabled']
-  loop: "{{ groups['kube_node'] }}"
-
 - name: deploy sriov-network-operator
   command: |
     helm upgrade \
@@ -74,3 +63,40 @@
   args:
     chdir: "{{ sriov_network_charts_dir }}"
   changed_when: true
+
+- name: wait for sriov-network-config-daemon daemonset
+  pause:
+    seconds: 20
+
+- name: wait for sriov-network-config-daemon pods
+  shell: >-
+    set -o pipefail && kubectl get ds sriov-network-config-daemon
+    -n {{ sriov_network_operator_namespace }} -o yaml | grep 'numberReady'
+  args:
+    executable: /bin/bash
+  register: sriov_daemon_wait
+  until: "'0' not in sriov_daemon_wait.stdout"
+  failed_when: "'0' in sriov_daemon_wait.stdout or sriov_daemon_wait.rc != 0"
+  retries: 30
+  delay: 10
+  changed_when: false
+
+- name: check if SriovNetworkNodeState CR was created
+  command: kubectl get SriovNetworkNodeState -n {{ sriov_network_operator_namespace }}
+  register: sriov_node_state_check
+  until: "'No resources found in {{ sriov_network_operator_namespace }}' not in sriov_node_state_check.stderr"
+  failed_when: "'No resources found in {{ sriov_network_operator_namespace }}' in sriov_node_state_check.stderr"
+  retries: 30
+  delay: 10
+  changed_when: false
+
+- name: wait for SriovNetworkNodeState CR sync
+  command: kubectl get SriovNetworkNodeState {{ item }} -n {{ sriov_network_operator_namespace }} -o yaml
+  register: sriov_node_wait
+  until: sriov_node_wait.stdout is not search('InProgress') and sriov_node_wait.stdout is search('syncStatus')
+  failed_when:
+    sriov_node_wait.stdout is search('Failed') or sriov_node_wait.stdout is search('InProgress') or sriov_node_wait.stdout is not search('syncStatus')
+  retries: 30
+  delay: 10
+  changed_when: false
+  loop: "{{ groups['kube_node'] }}"
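For orientation, the wait tasks above just poll plain kubectl output until each node's CR converges. Roughly the equivalent of this shell sketch, where the namespace value and node names are illustrative stand-ins for `sriov_network_operator_namespace` and `groups['kube_node']`:

```bash
# Placeholder values; the playbook substitutes its own namespace and node list.
ns="sriov-network-operator"
for node in node1 node2; do
  # One iteration of the polling loop: the task succeeds once syncStatus is
  # reported and neither InProgress nor Failed appears in the output.
  kubectl get SriovNetworkNodeState "${node}" -n "${ns}" -o yaml | grep 'syncStatus'
done
```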
"node-role.kubernetes.io/master" operator: "Exists" effect: "NoSchedule" - nodeSelector: - node-role.kubernetes.io/master: "" + - key: "node-role.kubernetes.io/control-plane" + operator: "Exists" + effect: "NoSchedule" + nodeSelector: {} + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: "node-role.kubernetes.io/master" + operator: In + values: [ "" ] + - matchExpressions: + - key: "node-role.kubernetes.io/control-plane" + operator: In + values: [ "" ] nameOverride: "{{ sriov_network_operator_name_override }}" fullnameOverride: "{{ sriov_network_operator_full_name_override }}" resourcePrefix: "{{ sriov_resource_name_prefix }}" diff --git a/roles/sriov_nic_init/files/cek_sriov_nic_init b/roles/sriov_nic_init/files/cek_sriov_nic_init index b4179567..22b4ca80 100644 --- a/roles/sriov_nic_init/files/cek_sriov_nic_init +++ b/roles/sriov_nic_init/files/cek_sriov_nic_init @@ -18,6 +18,8 @@ DEVBIND_TOOL=${DEVBIND_TOOL:-"/usr/local/bin/dpdk-devbind.py"} SRIOV_NUMVFS_MAPPINGS=${SRIOV_NUMVFS_MAPPINGS:-"/etc/cek/cek_sriov_numvfs"} DEVICE_DRIVER_MAPPINGS=${DEVICE_DRIVER_MAPPINGS:-"/etc/cek/cek_interfaces"} +FORCE_DRIVER_BINDING=${FORCE_DRIVER_BINDING:-"/etc/cek/cek_force_driver_binding"} +DO_DRIVER_BINDING="" setup_vfs() { echo "Setting up VFs" @@ -26,9 +28,9 @@ setup_vfs() { return 0 fi - while read -r pci_address numvfs; do - if [[ ${pci_address} == "" ]] || [[ ${numvfs} == "" ]]; then - echo "Empty PCI address or number of VFs, skipping..." + while read -r pci_address numvfs interface_name; do + if [[ ${pci_address} == "" ]] || [[ ${numvfs} == "" ]] || [[ ${interface_name} == "" ]]; then + echo "Empty PCI address or number of VFs or interface name, skipping..." continue fi @@ -45,6 +47,9 @@ setup_vfs() { # if change is needed we must reset it first echo 0 > "${numvfs_path}" echo "${numvfs}" > "${numvfs_path}" + DO_DRIVER_BINDING="${DO_DRIVER_BINDING} ${interface_name}" + else + echo "${numvfs} Virtual Functions are already present on ${pci_address} Do nothing." fi echo "Setting admin state UP for ${pci_address}" @@ -57,47 +62,58 @@ setup_vfs() { } bind_all() { - if [[ ! -r "${DEVICE_DRIVER_MAPPINGS}" ]]; then - echo "File ${DEVICE_DRIVER_MAPPINGS} doesn't exist, driver bindings won't be changed" - return 0 + if [[ -r "${FORCE_DRIVER_BINDING}" ]]; then + echo "Force driver binding" + while read -r interface_name; do + DO_DRIVER_BINDING="${DO_DRIVER_BINDING} ${interface_name}" + done < "${FORCE_DRIVER_BINDING}" + rm -f ${FORCE_DRIVER_BINDING} fi - while read -r pci_address driver; do - if [[ ${pci_address} == "" ]] || [[ ${driver} == "" ]]; then - echo "Empty PCI address or driver, skipping..." - continue + for if_name in ${DO_DRIVER_BINDING}; do + echo "Driver binding to VFs from PF ${if_name}" + if [[ ! -r "${DEVICE_DRIVER_MAPPINGS}_${if_name}" ]]; then + echo "File ${DEVICE_DRIVER_MAPPINGS}_${if_name} doesn't exist, driver bindings won't be changed" + return 0 fi - echo "Binding ${pci_address} to ${driver}" + while read -r pci_address driver; do + if [[ ${pci_address} == "" ]] || [[ ${driver} == "" ]]; then + echo "Empty PCI address or driver, skipping..." + continue + fi - device_path="/sys/bus/pci/devices/${pci_address}" + echo "Binding ${pci_address} to ${driver}" - # skip if device doesn't exist - if [[ ! -e "${device_path}" ]]; then - echo "Could not find device ${pci_address}, skipping..." 
diff --git a/roles/sriov_nic_init/tasks/bind_vf_driver.yml b/roles/sriov_nic_init/tasks/bind_vf_driver.yml
index 7a8eac3a..494874e2 100644
--- a/roles/sriov_nic_init/tasks/bind_vf_driver.yml
+++ b/roles/sriov_nic_init/tasks/bind_vf_driver.yml
@@ -46,7 +46,13 @@
   loop_control:
     loop_var: vf
     extended: yes
-  when: ansible_loop.index < (item.sriov_numvfs | default(0) | int )
+  when: ansible_loop.index < (item.sriov_numvfs | default(0) | int )
+
+- name: clean up existing configuration file cek_interfaces_{{ item.name }}
+  file:
+    path: "{{ sriov_config_path }}/cek_interfaces_{{ item.name }}"
+    state: absent
+  become: yes

 # get a list of VFs PCI addresses and save the configuration
 - name: attach VFs driver
@@ -60,13 +66,14 @@

 - name: save VF driver binding
   lineinfile:
-    path: "{{ sriov_config_path }}/cek_interfaces"
+    path: "{{ sriov_config_path }}/cek_interfaces_{{ item.name }}"
     line: "{{ this_item[0] }} {{ this_item[1].value }}"
     regexp: "^{{ this_item[0] }}"
     create: yes
     owner: root
     group: root
     mode: '0600'
+  become: yes
   loop: "{{ vf_pciids.stdout_lines | zip(vfs_acc | dict2items) | list }}"
   loop_control:
     loop_var: this_item
diff --git a/roles/sriov_nic_init/tasks/create_vfs.yml b/roles/sriov_nic_init/tasks/create_vfs.yml
index f89aa433..f0277c7e 100644
--- a/roles/sriov_nic_init/tasks/create_vfs.yml
+++ b/roles/sriov_nic_init/tasks/create_vfs.yml
@@ -34,11 +34,22 @@
 # in case when SR-IOV VFs have been already configured we reset it first to avoid "device or resource busy" error
   - name: reset SR-IOV Virtual Functions
     shell: echo 0 > /sys/class/net/{{ item.name }}/device/sriov_numvfs
-    when: existing_vfs|int != item.sriov_numvfs
+    when: existing_vfs.stdout|int != 0 and existing_vfs.stdout|int != item.sriov_numvfs

   - name: enable SR-IOV Virtual Functions
     shell: echo {{ item.sriov_numvfs }} > /sys/class/net/{{ item.name }}/device/sriov_numvfs
-    when: existing_vfs|int != item.sriov_numvfs
+    when: existing_vfs.stdout|int != item.sriov_numvfs
+
+  - name: force driver binding when VFs are created
+    lineinfile:
+      path: "{{ sriov_config_path 
}}/cek_force_driver_binding" + line: "{{ item.name }}" + regexp: "^{{ item.name }}" + create: yes + owner: root + group: root + mode: '0600' + when: existing_vfs.stdout|int != item.sriov_numvfs - name: get full PCI address of the PF shell: basename $(readlink -f /sys/class/net/{{ item.name }}/device) @@ -50,11 +61,12 @@ - name: save number of VFs lineinfile: path: "{{ sriov_config_path }}/cek_sriov_numvfs" - line: "{{ pf_pci_address.stdout }} {{ item.sriov_numvfs | default(0) | int }}" + line: "{{ pf_pci_address.stdout }} {{ item.sriov_numvfs | default(0) | int }} {{ item.name }}" regexp: "^{{ pf_pci_address.stdout }}" create: yes owner: root group: root mode: '0600' - when: - - item.sriov_numvfs | default(0) | int != 0 + become: yes + when: + - item.sriov_numvfs | default(0) | int != 0 diff --git a/roles/sriov_nic_init/tasks/main.yml b/roles/sriov_nic_init/tasks/main.yml index c4926725..d29cb3da 100644 --- a/roles/sriov_nic_init/tasks/main.yml +++ b/roles/sriov_nic_init/tasks/main.yml @@ -54,15 +54,17 @@ - name: clean up existing configuration file file: - path: "{{ item }}" + path: "{{ sriov_config_path }}/{{ item }}" state: absent - become: yes with_items: - - "{{ sriov_config_path }}/cek_sriov_numvfs" - - "{{ sriov_config_path }}/cek_interfaces" + - cek_sriov_numvfs + - cek_force_driver_binding + become: yes - name: configure VFs include_tasks: create_vfs.yml + when: + - item.sriov_numvfs | default(0) > 0 with_items: "{{ dataplane_interfaces }}" - name: bring up PF interfaces @@ -75,7 +77,7 @@ when: item.sriov_numvfs | default(0) > 0 with_items: "{{ dataplane_interfaces }}" -- name: copy NIC setup script to /usr/bin +- name: copy NIC SRIOV setup script to /usr/local/bin copy: src: "{{ role_path }}/files/cek_sriov_nic_init" dest: /usr/local/bin/cek_sriov_nic_init @@ -95,8 +97,8 @@ - name: ensure that systemd service is enabled on startup and restarted to apply the configuration systemd: - daemon_reload: yes - enabled: yes - state: restarted name: cek_sriov_nic_init + state: restarted + enabled: yes + daemon_reload: yes become: yes diff --git a/roles/tadk_install/charts/tadk/.helmignore b/roles/tadk_install/charts/tadk/.helmignore new file mode 100644 index 00000000..4ba40535 --- /dev/null +++ b/roles/tadk_install/charts/tadk/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Various IDEs +.vscode/ +.idea/ +*.tmproj +.project +# Common backup files +*~ +*.swp +*.bak +*.orig +*.tmp +# Common VCS dirs +.svn/ +.gitignore +.git/ +.bzr/ +.hg/ +.bzrignore +.hgignore diff --git a/roles/tadk_install/charts/tadk/Chart.yaml b/roles/tadk_install/charts/tadk/Chart.yaml new file mode 100644 index 00000000..a8df497d --- /dev/null +++ b/roles/tadk_install/charts/tadk/Chart.yaml @@ -0,0 +1,51 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. + +apiVersion: v2 +name: tadkchart +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "v22.03" diff --git a/roles/tadk_install/charts/tadk/templates/NOTES.txt b/roles/tadk_install/charts/tadk/templates/NOTES.txt new file mode 100644 index 00000000..b7a7c554 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/NOTES.txt @@ -0,0 +1,30 @@ +1. Get the application URL by running these commands: +{{- if .Values.ingress.enabled }} +{{- range $host := .Values.ingress.hosts }} + {{- range .paths }} + http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }} + {{- end }} +{{- end }} +{{- else if contains "NodePort" .Values.service.type }} + export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "tadkchart.fullname" . }}) + export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}") + curl http://$NODE_IP:$NODE_PORT +{{- else if contains "LoadBalancer" .Values.service.type }} + NOTE: It may take a few minutes for the LoadBalancer IP to be available. + You can watch the status of by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "tadkchart.fullname" . }}' + export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "tadkchart.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}") + curl http://$SERVICE_IP:{{ .Values.service.port }} +{{- else if contains "ClusterIP" .Values.service.type }} + export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "tadkchart.name" . 
}},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}") + export container_port=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}") + echo "Visit http://127.0.0.1:8080 to use your application" + kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$container_port +{{- end }} + + +{{- if contains "NodePort" .Values.service.type }} + +2. Test the tadk image by using these commands: + export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "tadkchart.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}") + curl -d "username=admin&password=unknown' or '1'='1" "$NODE_IP:$NODE_PORT" +{{- end }} diff --git a/roles/tadk_install/charts/tadk/templates/_helpers.tpl b/roles/tadk_install/charts/tadk/templates/_helpers.tpl new file mode 100644 index 00000000..1331fdd9 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/_helpers.tpl @@ -0,0 +1,62 @@ +{{/* +Expand the name of the chart. +*/}} +{{- define "tadkchart.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. +*/}} +{{- define "tadkchart.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- $name := default .Chart.Name .Values.nameOverride }} +{{- if contains $name .Release.Name }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "tadkchart.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "tadkchart.labels" -}} +helm.sh/chart: {{ include "tadkchart.chart" . }} +{{ include "tadkchart.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "tadkchart.selectorLabels" -}} +app.kubernetes.io/name: {{ include "tadkchart.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create the name of the service account to use +*/}} +{{- define "tadkchart.serviceAccountName" -}} +{{- if .Values.serviceAccount.create }} +{{- default (include "tadkchart.fullname" .) .Values.serviceAccount.name }} +{{- else }} +{{- default "default" .Values.serviceAccount.name }} +{{- end }} +{{- end }} diff --git a/roles/tadk_install/charts/tadk/templates/clusterrole.yml b/roles/tadk_install/charts/tadk/templates/clusterrole.yml new file mode 100644 index 00000000..693e3002 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/clusterrole.yml @@ -0,0 +1,26 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). 
Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ .Release.Name }} +rules: + - apiGroups: ["authentication.k8s.io"] + resources: + - tokenreviews + verbs: ["create"] + - apiGroups: ["authorization.k8s.io"] + resources: + - subjectaccessreviews + verbs: ["create"] diff --git a/roles/tadk_install/charts/tadk/templates/deployment.yaml b/roles/tadk_install/charts/tadk/templates/deployment.yaml new file mode 100644 index 00000000..e438e5a7 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/deployment.yaml @@ -0,0 +1,67 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "tadkchart.fullname" . }} + labels: + {{- include "tadkchart.labels" . | nindent 4 }} +spec: + {{- if not .Values.autoscaling.enabled }} + replicas: {{ .Values.replicaCount }} + {{- end }} + selector: + matchLabels: + {{- include "tadkchart.selectorLabels" . | nindent 6 }} + template: + metadata: + {{- with .Values.podAnnotations }} + annotations: + {{- toYaml . | nindent 8 }} + {{- end }} + labels: + {{- include "tadkchart.selectorLabels" . | nindent 8 }} + spec: + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + serviceAccountName: {{ .Release.Name }} + securityContext: + {{- toYaml .Values.podSecurityContext | nindent 8 }} + containers: + - name: {{ .Chart.Name }} + securityContext: + {{- toYaml .Values.securityContext | nindent 12 }} + image: "{{.Values.image.registry}}/{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + ports: + - name: http + containerPort: {{ .Values.service.port }} + # hostPort: {{ .Values.service.port }} + protocol: TCP + resources: + {{- toYaml .Values.resources | nindent 12 }} + {{- with .Values.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} diff --git a/roles/tadk_install/charts/tadk/templates/hpa.yaml b/roles/tadk_install/charts/tadk/templates/hpa.yaml new file mode 100644 index 00000000..73557c16 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/hpa.yaml @@ -0,0 +1,40 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). 
Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +{{- if .Values.autoscaling.enabled }} +apiVersion: autoscaling/v2beta1 +kind: HorizontalPodAutoscaler +metadata: + name: {{ include "tadkchart.fullname" . }} + labels: + {{- include "tadkchart.labels" . | nindent 4 }} +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: {{ include "tadkchart.fullname" . }} + minReplicas: {{ .Values.autoscaling.minReplicas }} + maxReplicas: {{ .Values.autoscaling.maxReplicas }} + metrics: + {{- if .Values.autoscaling.targetCPUUtilizationPercentage }} + - type: Resource + resource: + name: cpu + targetAverageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }} + {{- end }} + {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }} + - type: Resource + resource: + name: memory + targetAverageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }} + {{- end }} +{{- end }} diff --git a/roles/tadk_install/charts/tadk/templates/ingress.yaml b/roles/tadk_install/charts/tadk/templates/ingress.yaml new file mode 100644 index 00000000..999b82c5 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/ingress.yaml @@ -0,0 +1,73 @@ +# Copyright 2019-2021 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +{{- if .Values.ingress.enabled -}} +{{- $fullName := include "tadkchart.fullname" . -}} +{{- $svcPort := .Values.service.port -}} +{{- if and .Values.ingress.className (not (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion)) }} + {{- if not (hasKey .Values.ingress.annotations "kubernetes.io/ingress.class") }} + {{- $_ := set .Values.ingress.annotations "kubernetes.io/ingress.class" .Values.ingress.className}} + {{- end }} +{{- end }} +{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}} +apiVersion: networking.k8s.io/v1 +{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}} +apiVersion: networking.k8s.io/v1beta1 +{{- else -}} +apiVersion: extensions/v1beta1 +{{- end }} +kind: Ingress +metadata: + name: {{ $fullName }} + labels: + {{- include "tadkchart.labels" . | nindent 4 }} + {{- with .Values.ingress.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + {{- if and .Values.ingress.className (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion) }} + ingressClassName: {{ .Values.ingress.className }} + {{- end }} + {{- if .Values.ingress.tls }} + tls: + {{- range .Values.ingress.tls }} + - hosts: + {{- range .hosts }} + - {{ . 
| quote }} + {{- end }} + secretName: {{ .secretName }} + {{- end }} + {{- end }} + rules: + {{- range .Values.ingress.hosts }} + - host: {{ .host | quote }} + http: + paths: + {{- range .paths }} + - path: {{ .path }} + {{- if and .pathType (semverCompare ">=1.18-0" $.Capabilities.KubeVersion.GitVersion) }} + pathType: {{ .pathType }} + {{- end }} + backend: + {{- if semverCompare ">=1.19-0" $.Capabilities.KubeVersion.GitVersion }} + service: + name: {{ $fullName }} + port: + number: {{ $svcPort }} + {{- else }} + serviceName: {{ $fullName }} + servicePort: {{ $svcPort }} + {{- end }} + {{- end }} + {{- end }} +{{- end }} diff --git a/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-cluster-role-binding.yml b/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-cluster-role-binding.yml new file mode 100644 index 00000000..fe23e468 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-cluster-role-binding.yml @@ -0,0 +1,28 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ .Release.Name }} +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: {{ .Release.Name }} +subjects: + - kind: ServiceAccount + name: {{ .Release.Name }} + namespace: "{{ .Release.Namespace }}" + - kind: Group + apiGroup: rbac.authorization.k8s.io + name: system:serviceaccounts diff --git a/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-service-account.yml b/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-service-account.yml new file mode 100644 index 00000000..0abed99c --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/modsec-tadk-loadkey-rbac-service-account.yml @@ -0,0 +1,17 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ .Release.Name }} diff --git a/roles/tadk_install/charts/tadk/templates/service.yaml b/roles/tadk_install/charts/tadk/templates/service.yaml new file mode 100644 index 00000000..6f2059f5 --- /dev/null +++ b/roles/tadk_install/charts/tadk/templates/service.yaml @@ -0,0 +1,27 @@ +# Copyright 2022 Intel Corporation. + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). 
Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. +apiVersion: v1 +kind: Service +metadata: + name: {{ include "tadkchart.fullname" . }} + labels: + {{- include "tadkchart.labels" . | nindent 4 }} +spec: + type: {{ .Values.service.type }} + ports: + - port: {{ .Values.service.port }} + targetPort: http + protocol: TCP + name: http + selector: + {{- include "tadkchart.selectorLabels" . | nindent 4 }} diff --git a/roles/tadk_install/charts/tadk/values.yaml b/roles/tadk_install/charts/tadk/values.yaml new file mode 100644 index 00000000..2331c036 --- /dev/null +++ b/roles/tadk_install/charts/tadk/values.yaml @@ -0,0 +1,110 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## + +# This software and the related documents are Intel copyrighted materials, +# and your use of them is governed by the express license under which +# they were provided to you ("License"). Unless the License provides +# otherwise, you may not use, modify, copy, publish, distribute, disclose or +# transmit this software or the related documents without Intel's prior +# written permission. + +# This software and the related documents are provided as is, with no express +# or implied warranties, other than those that are expressly stated in the +# License. + +# Default values for tadkchart. +# This is a YAML-formatted file. +# Declare variables to be passed into your templates. + +replicaCount: 1 + +image: + registry: "intel" + repository: "tadk-waf" + pullPolicy: IfNotPresent + # Overrides the image tag whose default is the chart appVersion. + tag: "v22.03" + +imagePullSecrets: [] +nameOverride: "" +fullnameOverride: "" + +serviceAccount: {} +# # Specifies whether a service account should be created +# create: true +# # Annotations to add to the service account +# annotations: {} +# # The name of the service account to use. +# # If not set and create is true, a name is generated using the fullname template +# name: "" + +podAnnotations: {} + +podSecurityContext: {} + # fsGroup: 2000 + +securityContext: {} + # capabilities: + # drop: + # - ALL + # readOnlyRootFilesystem: true + # runAsNonRoot: true + # runAsUser: 1000 + +service: + type: "NodePort" + port: 8005 + +ingress: + enabled: false + className: "" + annotations: {} + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: "true" + hosts: + - host: chart-example.local + paths: + - path: / + pathType: ImplementationSpecific + tls: [] + # - secretName: chart-example-tls + # hosts: + # - chart-example.local + +resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. 
This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + +autoscaling: + enabled: false + minReplicas: 1 + maxReplicas: 100 + targetCPUUtilizationPercentage: 80 + # targetMemoryUtilizationPercentage: 80 + +nodeSelector: {} + +tolerations: [] + +affinity: {} diff --git a/roles/tadk_install/defaults/main.yml b/roles/tadk_install/defaults/main.yml new file mode 100644 index 00000000..e0c485f8 --- /dev/null +++ b/roles/tadk_install/defaults/main.yml @@ -0,0 +1,28 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +dest_path: "{{ (project_root_dir, 'charts') | path_join }}" + +image_registry: intel +image_name: tadk-waf +tadk_version: "v22.03" +container_port: 8005 + +service_type: NodePort + +deploy_name: tadk-intel + +tadk_namespace: modsec-tadk diff --git a/roles/tadk_install/tasks/main.yml b/roles/tadk_install/tasks/main.yml new file mode 100644 index 00000000..a0cf88f4 --- /dev/null +++ b/roles/tadk_install/tasks/main.yml @@ -0,0 +1,54 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
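The tadk_install defaults shown above are ordinary role variables, so a deployment can override them from the inventory instead of editing the role. A minimal sketch (only the variable names come from roles/tadk_install/defaults/main.yml; placing them in group_vars/all.yml is an assumption about the cluster layout, not something this patch mandates):

```yaml
# group_vars/all.yml (hypothetical placement) - tadk_install overrides
image_registry: intel        # registry part of the image reference
image_name: tadk-waf         # repository part of the image reference
tadk_version: "v22.03"       # image tag, also rendered as the chart appVersion
container_port: 8005         # port exposed by the Service
service_type: NodePort       # switch to ClusterIP if no external access is needed
deploy_name: tadk-intel      # helm release name
tadk_namespace: modsec-tadk  # namespace passed to helm --create-namespace
```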
+##
+---
+- name: create chart destination directory
+  ansible.builtin.file:
+    path: "{{ dest_path }}"
+    state: directory
+    mode: '0755'
+  when:
+    - inventory_hostname == groups['kube_control_plane'][0]
+
+- name: copy tadk chart files with owner and permissions
+  ansible.builtin.copy:
+    src: "{{ (role_path, 'charts', 'tadk') | path_join }}"
+    dest: "{{ dest_path }}"
+    mode: '0755'
+    owner: "{{ ansible_user | default(ansible_user_id) }}"
+    group: "{{ ansible_user | default(ansible_user_id) }}"
+  when:
+    - inventory_hostname == groups['kube_control_plane'][0]
+
+- name: populate chart template files with values
+  ansible.builtin.template:
+    src: "{{ item }}.j2"
+    dest: "{{ (dest_path, 'tadk', item) | path_join }}"
+    force: yes
+    mode: preserve
+  loop:
+    - values.yaml
+    - Chart.yaml
+  when:
+    - inventory_hostname == groups['kube_control_plane'][0]
+
+- name: deploy TADK helm chart
+  ansible.builtin.command: >-
+    helm upgrade -i {{ deploy_name }} --create-namespace --namespace {{ tadk_namespace }}
+    -f {{ (dest_path, 'tadk', 'values.yaml') | path_join }} {{ (dest_path, 'tadk') | path_join }}
+  args:
+    chdir: "{{ (dest_path, 'tadk') | path_join }}"
+  when:
+    - inventory_hostname == groups['kube_control_plane'][0]
diff --git a/roles/tadk_install/templates/Chart.yaml.j2 b/roles/tadk_install/templates/Chart.yaml.j2
new file mode 100644
index 00000000..1986df2d
--- /dev/null
+++ b/roles/tadk_install/templates/Chart.yaml.j2
@@ -0,0 +1,24 @@
+apiVersion: v2
+name: tadkchart
+description: A Helm chart for Kubernetes
+
+# A chart can be either an 'application' or a 'library' chart.
+#
+# Application charts are a collection of templates that can be packaged into versioned archives
+# to be deployed.
+#
+# Library charts provide useful utilities or functions for the chart developer. They're included as
+# a dependency of application charts to inject those utilities and functions into the rendering
+# pipeline. Library charts do not define any templates and therefore cannot be deployed.
+type: application
+
+# This is the chart version. This version number should be incremented each time you make changes
+# to the chart and its templates, including the app version.
+# Versions are expected to follow Semantic Versioning (https://semver.org/)
+version: 0.1.0
+
+# This is the version number of the application being deployed. This version number should be
+# incremented each time you make changes to the application. Versions are not expected to
+# follow Semantic Versioning. They should reflect the version the application is using.
+# It is recommended to use it with quotes.
+appVersion: "{{ tadk_version }}"
diff --git a/roles/tadk_install/templates/values.yaml.j2 b/roles/tadk_install/templates/values.yaml.j2
new file mode 100644
index 00000000..77aec5bd
--- /dev/null
+++ b/roles/tadk_install/templates/values.yaml.j2
@@ -0,0 +1,83 @@
+# Default values for tadkchart.
+# This is a YAML-formatted file.
+# Declare variables to be passed into your templates.
+
+replicaCount: 1
+
+image:
+  registry: "{{ image_registry }}"
+  repository: "{{ image_name }}"
+  pullPolicy: IfNotPresent
+  # Overrides the image tag whose default is the chart appVersion.
+  tag: "{{ tadk_version }}"
+
+imagePullSecrets: []
+nameOverride: ""
+fullnameOverride: ""
+
+serviceAccount: {}
+# # Specifies whether a service account should be created
+# create: true
+# # Annotations to add to the service account
+# annotations: {}
+# # The name of the service account to use.
+# # If not set and create is true, a name is generated using the fullname template +# name: "" + +podAnnotations: {} + +podSecurityContext: {} + # fsGroup: 2000 + +securityContext: {} + # capabilities: + # drop: + # - ALL + # readOnlyRootFilesystem: true + # runAsNonRoot: true + # runAsUser: 1000 + +service: + type: "{{ service_type }}" + port: {{ container_port }} + +ingress: + enabled: false + className: "" + annotations: {} + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: "true" + hosts: + - host: chart-example.local + paths: + - path: / + pathType: ImplementationSpecific + tls: [] + # - secretName: chart-example-tls + # hosts: + # - chart-example.local + +resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + +autoscaling: + enabled: false + minReplicas: 1 + maxReplicas: 100 + targetCPUUtilizationPercentage: 80 + # targetMemoryUtilizationPercentage: 80 + +nodeSelector: {} + +tolerations: [] + +affinity: {} diff --git a/roles/tca_install/defaults/main.yml b/roles/tca_install/defaults/main.yml index 21221c4e..408230dc 100644 --- a/roles/tca_install/defaults/main.yml +++ b/roles/tca_install/defaults/main.yml @@ -15,5 +15,6 @@ ## --- tca_git_repo_url: https://github.com/intel/trusted-attestation-controller -tca_git_version: v0.1.0 +tca_git_version: 0.2.0 tca_git_path: "{{ (project_root_dir, 'tca') | path_join }}" +tca_image_tag: 0.2.0 diff --git a/roles/tca_install/tasks/tca_install.yml b/roles/tca_install/tasks/tca_install.yml index 98e6ce88..c715d0bf 100644 --- a/roles/tca_install/tasks/tca_install.yml +++ b/roles/tca_install/tasks/tca_install.yml @@ -24,7 +24,7 @@ - name: "prepare to build_image_locally: clean deps" command: "go mod tidy -go=1.17" args: - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" changed_when: true when: - tca.build_image_locally | default(false) @@ -32,7 +32,7 @@ - name: "prepare to build_image_locally: create vendor dir" command: "go mod vendor" args: - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" changed_when: true when: - tca.build_image_locally | default(false) @@ -43,7 +43,7 @@ make: target: docker-build - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" environment: REGISTRY: "{{ registry_local_address }}" @@ -51,7 +51,7 @@ make: target: docker-push - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" environment: REGISTRY: "{{ registry_local_address }}" when: @@ -63,13 +63,13 @@ - name: build container image command: "podman build -t {{ registry_local_address }}/sgx-attestation-controller:latest -f Dockerfile ." 
args: - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" changed_when: false - name: push container image command: "podman push {{ registry_local_address }}/sgx-attestation-controller:latest" args: - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" changed_when: false when: - tca.build_image_locally | default(false) @@ -127,31 +127,17 @@ state: present mode: 0644 -- name: pre-create psp - block: - - name: pre-create TCA namespace - shell: "set -o pipefail && kubectl create ns {{ tca.namespace }} -o yaml --dry-run=client | kubectl apply -f -" - args: - executable: /bin/bash - changed_when: true - - - name: pre-create TCA ns rolebinding - shell: |- - set -o pipefail && \ - kubectl create -n {{ tca.namespace }} rolebinding wa-psp \ - --clusterrole psp:privileged \ - --group system:serviceaccounts:{{ tca.namespace }} \ - -o yaml --dry-run=client | kubectl apply -f - - args: - executable: /bin/bash - changed_when: true - when: - - psp_enabled | default(false) +- name: create tca namespace + k8s: + name: "{{ tca.namespace }}" + kind: Namespace + state: present - name: deploy TCA make: target: deploy - chdir: "{{ tca_git_path }}" + chdir: "{{ tca_git_path }}" environment: - REGISTRY: "{{ (tca.build_image_locally) | ternary(registry_local_address, 'intel') }}" + REGISTRY: "{{ (tca.build_image_locally) | ternary(registry_local_address, 'docker.io') }}" + IMG_TAG: "{{ tca_image_tag }}" diff --git a/roles/tcs_install/defaults/main.yml b/roles/tcs_install/defaults/main.yml index 8050367a..c9139b17 100644 --- a/roles/tcs_install/defaults/main.yml +++ b/roles/tcs_install/defaults/main.yml @@ -13,7 +13,8 @@ ## See the License for the specific language governing permissions and ## limitations under the License. ## +--- tcs_git_repo_url: https://github.com/intel/trusted-certificate-issuer -tcs_git_version: v0.1.0 - +tcs_git_version: 0.2.0 tcs_git_path: "{{ (project_root_dir, 'tcs') | path_join }}" +tcs_image_tag: 0.2.0 diff --git a/roles/tcs_install/tasks/tcs_install.yml b/roles/tcs_install/tasks/tcs_install.yml index cf45d90c..61607c43 100644 --- a/roles/tcs_install/tasks/tcs_install.yml +++ b/roles/tcs_install/tasks/tcs_install.yml @@ -27,7 +27,7 @@ make: target: docker-build - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" environment: REGISTRY: "{{ registry_local_address }}" @@ -35,7 +35,7 @@ make: target: docker-push - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" environment: REGISTRY: "{{ registry_local_address }}" when: @@ -48,13 +48,13 @@ make: target: enclave-config/privatekey.pem - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" - name: prepare vendor dir make: target: vendor - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" - name: "collect required values: read project Makefile" slurp: @@ -79,39 +79,24 @@ -t {{ registry_local_address }}/trusted-certificate-issuer:latest -f Dockerfile . 
args: - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" executable: "/bin/bash" changed_when: false - name: push container image command: "podman push {{ registry_local_address }}/trusted-certificate-issuer:latest" args: - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" changed_when: false when: - tcs.build_image_locally | default(false) - '"docker" not in container_runtime' -- name: pre-create psp - block: - - name: pre-create tcs namespace - shell: "set -o pipefail && kubectl create ns {{ tcs.namespace }} -o yaml --dry-run=client | kubectl apply -f -" - args: - executable: /bin/bash - changed_when: true - - - name: pre-create tcs ns rolebinding - shell: |- - set -o pipefail && \ - kubectl create -n {{ tcs.namespace }} rolebinding wa-psp \ - --clusterrole psp:privileged \ - --group system:serviceaccounts:{{ tcs.namespace }} \ - -o yaml --dry-run=client | kubectl apply -f - - args: - executable: /bin/bash - changed_when: true - when: - - psp_enabled | default(false) +- name: create tcs namespace + k8s: + name: "{{ tcs.namespace }}" + kind: Namespace + state: present - name: update TCS config - kustomization file lineinfile: @@ -142,12 +127,13 @@ make: target: install - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" - name: deploy TCS make: target: deploy - chdir: "{{ tcs_git_path }}" + chdir: "{{ tcs_git_path }}" environment: - REGISTRY: "{{ (tcs.build_image_locally) | ternary(registry_local_address, 'intel') }}" + REGISTRY: "{{ (tcs.build_image_locally) | ternary(registry_local_address, 'docker.io') }}" + IMG_TAG: "{{ tcs_image_tag }}" diff --git a/roles/telegraf_install/charts/telegraf/templates/clusterrole.yml b/roles/telegraf_install/charts/telegraf/templates/clusterrole.yml index a105062d..7781d380 100644 --- a/roles/telegraf_install/charts/telegraf/templates/clusterrole.yml +++ b/roles/telegraf_install/charts/telegraf/templates/clusterrole.yml @@ -14,8 +14,3 @@ rules: resources: - subjectaccessreviews verbs: ["create"] - - apiGroups: ['policy'] - resources: ['podsecuritypolicies'] - verbs: ['use'] - resourceNames: - - {{ include "telegraf.fullname" . }} diff --git a/roles/telegraf_install/charts/telegraf/templates/podsecuritypolicy.yaml b/roles/telegraf_install/charts/telegraf/templates/podsecuritypolicy.yaml deleted file mode 100644 index d4efce0f..00000000 --- a/roles/telegraf_install/charts/telegraf/templates/podsecuritypolicy.yaml +++ /dev/null @@ -1,34 +0,0 @@ ---- -apiVersion: policy/v1beta1 -kind: PodSecurityPolicy -metadata: - name: {{ include "telegraf.fullname" . }} - labels: - {{- include "telegraf.labels" . 
| nindent 4 }} -spec: - allowPrivilegeEscalation: true - allowedCapabilities: - - '*' - allowedUnsafeSysctls: - - '*' - fsGroup: - rule: RunAsAny - hostNetwork: true - hostPorts: - - max: {{ .Values.prometheusMetricsEndpointPort }} - min: {{ .Values.prometheusMetricsEndpointPort }} - privileged: true - runAsUser: - rule: RunAsAny - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - volumes: - - "configMap" - - "downwardAPI" - - "emptyDir" - - "persistentVolumeClaim" - - "secret" - - "projected" - - "hostPath" diff --git a/roles/userspace_cni_install/defaults/main.yml b/roles/userspace_cni_install/defaults/main.yml index 1ef14770..923623a8 100644 --- a/roles/userspace_cni_install/defaults/main.yml +++ b/roles/userspace_cni_install/defaults/main.yml @@ -17,7 +17,7 @@ userspace_cni_git_url: "https://github.com/intel/userspace-cni-network-plugin.git" userspace_cni_version: "v1.3" -vpp_version: 2110 +vpp_version: 2210 ovs_dir: "{{ (project_root_dir, 'ovs') | path_join }}" ovs_repo: https://github.com/openvswitch/ovs.git diff --git a/roles/userspace_cni_install/tasks/main.yml b/roles/userspace_cni_install/tasks/main.yml index fe5f4340..086e872a 100644 --- a/roles/userspace_cni_install/tasks/main.yml +++ b/roles/userspace_cni_install/tasks/main.yml @@ -21,9 +21,7 @@ - name: determine whether VPP can be installed on the target set_fact: vpp_supported: false - when: - - ((ansible_os_family == 'RedHat') and (ansible_distribution_version >= '8')) or - ((ansible_distribution == 'Ubuntu') and (ansible_distribution_version >= '21.04')) + when: ansible_os_family == 'RedHat' and ansible_distribution_version >= '8' - name: install OVS-DPDK include: ovs_install.yml @@ -47,4 +45,3 @@ when: - vpp_enabled | default(false) - vpp_supported | default(false) - diff --git a/roles/userspace_cni_install/tasks/ovs_install.yml b/roles/userspace_cni_install/tasks/ovs_install.yml index 5b351971..19982bd6 100644 --- a/roles/userspace_cni_install/tasks/ovs_install.yml +++ b/roles/userspace_cni_install/tasks/ovs_install.yml @@ -147,7 +147,7 @@ - name: WA for bug in DPDK initial device scan - block qat devices block: - name: get Device ID from PCI address - shell: "set -o pipefail && lspci -s {{ item.qat_id }} -n |cut -f3 -d' ' |cut -f2 -d':'" + shell: "set -o pipefail && lspci -s {{ item.qat_id }} -n | cut -f3 -d' ' | cut -f2 -d':'" args: executable: /bin/bash failed_when: false @@ -156,7 +156,7 @@ with_items: "{{ qat_devices }}" - name: get Device ID for VFs from PCI address - shell: "set -o pipefail && lspci -s {{ item.qat_id }} -vv |grep 'Device ID:' |cut -f4 -d':' |cut -f2 -d' '" + shell: "set -o pipefail && lspci -s {{ item.qat_id }} -vv |grep 'Device ID:' | cut -f4 -d':' | cut -f2 -d' '" args: executable: /bin/bash failed_when: false @@ -192,7 +192,7 @@ msg: "unique dev_ids_list_var: {{ dev_ids_list_var }}" - name: list QAT devices with vfio-pci driver - shell: "set -o pipefail && dpdk-devbind.py -s |grep '{{ item }}' |grep 'drv=vfio-pci' |cut -f1 -d' '" + shell: "set -o pipefail && dpdk-devbind.py -s | grep '{{ item }}' |grep 'drv=vfio-pci' | cut -f1 -d' '" args: executable: /bin/bash failed_when: false @@ -217,7 +217,7 @@ - name: Prepare info for WA set_fact: block_list_var: "{{ block_list_var }} {{ ovs_dpdk_extra }}{{ item }}" - with_items: "{{ qat_dev_list_var.split('\n') }}" + loop: "{{ qat_dev_list_var.split('\n') }}" when: - "qat_dev_list_var | length>0" - "item | length>0" diff --git a/roles/userspace_cni_install/tasks/userspace_cni_install.yml 
b/roles/userspace_cni_install/tasks/userspace_cni_install.yml index d721bb8c..fb551cad 100644 --- a/roles/userspace_cni_install/tasks/userspace_cni_install.yml +++ b/roles/userspace_cni_install/tasks/userspace_cni_install.yml @@ -16,10 +16,10 @@ --- - name: create /opt/cni/bin file: - path: "/opt/cni/bin" - state: directory - recurse: yes - mode: 0755 + path: "/opt/cni/bin" + state: directory + recurse: yes + mode: 0755 - name: set path to the Userspace CNI plugin sources set_fact: @@ -34,10 +34,10 @@ - name: replace CentOS with Rocky in Makefile replace: - path: "{{ userspace_cni_path }}/Makefile" - regexp: 'centos' - replace: 'rocky' - mode: 0600 + path: "{{ userspace_cni_path }}/Makefile" + regexp: 'centos' + replace: 'rocky' + mode: 0600 when: ansible_distribution == "Rocky" - name: build Userspace CNI plugin diff --git a/roles/userspace_cni_install/tasks/vpp_install.yml b/roles/userspace_cni_install/tasks/vpp_install.yml index 382736ba..3a9c7120 100644 --- a/roles/userspace_cni_install/tasks/vpp_install.yml +++ b/roles/userspace_cni_install/tasks/vpp_install.yml @@ -58,8 +58,8 @@ - name: pick final sysctl entries values set_fact: vpp_nr_hugepages: "{{ (default_hugepage_size == '2M') | ternary(number_of_hugepages_2M, number_of_hugepages_1G) }}" - vpp_shmmax: "{{ [vpp_calc_shmmax|int, vpp_orig_shmmax|int] | max }}" - vpp_max_map_count: "{{ [vpp_orig_max_map_count|int, vpp_calc_max_map_count|int] | max }}" + vpp_shmmax: "{{ [vpp_calc_shmmax | int, vpp_orig_shmmax | int] | max }}" + vpp_max_map_count: "{{ [vpp_orig_max_map_count | int, vpp_calc_max_map_count | int] | max }}" - name: download packagecloud.io to install VPP packages get_url: @@ -70,7 +70,7 @@ - name: execute VPP bash script command: "./script.deb.sh" args: - chdir: "{{ vpp_dir }}" + chdir: "{{ vpp_dir }}" changed_when: true - name: install vpp packages in Red Hat @@ -91,6 +91,8 @@ - vpp-plugin-core - vpp-plugin-dpdk - libvppinfra + - vpp-plugin-devtools + - vpp-dev - libvppinfra-dev - python3-vpp-api state: present diff --git a/roles/vm/compile_libvirt/defaults/main.yml b/roles/vm/compile_libvirt/defaults/main.yml new file mode 100644 index 00000000..28030ec4 --- /dev/null +++ b/roles/vm/compile_libvirt/defaults/main.yml @@ -0,0 +1,19 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+##
+libvirt_groups:
+  - libvirt
+  - libvirtd
+  - libvirt-qemu
diff --git a/roles/vm/compile_libvirt/files/preferences b/roles/vm/compile_libvirt/files/preferences
new file mode 100644
index 00000000..0b9d758a
--- /dev/null
+++ b/roles/vm/compile_libvirt/files/preferences
@@ -0,0 +1,15 @@
+Package: libvirt-daemon-system
+Pin: release *
+Pin-Priority: -1
+
+Package: libvirt-clients
+Pin: release *
+Pin-Priority: -1
+
+Package: libvirt-daemon-driver-qemu
+Pin: release *
+Pin-Priority: -1
+
+Package: libvirt-daemon
+Pin: release *
+Pin-Priority: -1
diff --git a/roles/vm/compile_libvirt/files/qemu.conf b/roles/vm/compile_libvirt/files/qemu.conf
new file mode 100644
index 00000000..6b59251e
--- /dev/null
+++ b/roles/vm/compile_libvirt/files/qemu.conf
@@ -0,0 +1,956 @@
+# Master configuration file for the QEMU driver.
+# All settings described here are optional - if omitted, sensible
+# defaults are used.
+
+# Use of TLS requires that x509 certificates be issued. The default is
+# to keep them in /etc/pki/qemu. This directory must contain
+#
+#  ca-cert.pem - the CA master certificate
+#  server-cert.pem - the server certificate signed with ca-cert.pem
+#  server-key.pem - the server private key
+#
+# and optionally may contain
+#
+#  dh-params.pem - the DH params configuration file
+#
+# If the directory does not exist, libvirtd will fail to start. If the
+# directory doesn't contain the necessary files, QEMU domains will fail
+# to start if they are configured to use TLS.
+#
+# In order to overwrite the default path alter the following. This path
+# definition will be used as the default path for other *_tls_x509_cert_dir
+# configuration settings if their default path does not exist or is not
+# specifically set.
+#
+#default_tls_x509_cert_dir = "/etc/pki/qemu"
+
+
+# The default TLS configuration only uses certificates for the server
+# allowing the client to verify the server's identity and establish
+# an encrypted channel.
+#
+# It is possible to use x509 certificates for authentication too, by
+# issuing an x509 certificate to every client who needs to connect.
+#
+# Enabling this option will reject any client who does not have a
+# certificate signed by the CA in /etc/pki/qemu/ca-cert.pem
+#
+# The default_tls_x509_cert_dir directory must also contain
+#
+#  client-cert.pem - the client certificate signed with the ca-cert.pem
+#  client-key.pem - the client private key
+#
+# If this option is supplied it provides the default for the "_verify" option
+# of specific TLS users such as vnc, backups, migration, etc. The specific
+# users of TLS may override this by setting the specific "_verify" option.
+#
+# When not supplied the specific TLS users provide their own defaults.
+#
+#default_tls_x509_verify = 1
+
+#
+# Libvirt assumes the server-key.pem file is unencrypted by default.
+# To use an encrypted server-key.pem file, the password to decrypt
+# the PEM file is required. This can be provided by creating a secret
+# object in libvirt and then to uncomment this setting to set the UUID
+# of the secret.
+#
+# NB This default all-zeros UUID will not work. Replace it with the
+# output from the UUID for the TLS secret from a 'virsh secret-list'
+# command and then uncomment the entry
+#
+#default_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000"
+
+
+# VNC is configured to listen on 127.0.0.1 by default.
+# To make it listen on all public interfaces, uncomment
+# this next option.
+# +# NB, strong recommendation to enable TLS + x509 certificate +# verification when allowing public access +# +#vnc_listen = "0.0.0.0" + +# Enable this option to have VNC served over an automatically created +# unix socket. This prevents unprivileged access from users on the +# host machine, though most VNC clients do not support it. +# +# This will only be enabled for VNC configurations that have listen +# type=address but without any address specified. This setting takes +# preference over vnc_listen. +# +#vnc_auto_unix_socket = 1 + +# Enable use of TLS encryption on the VNC server. This requires +# a VNC client which supports the VeNCrypt protocol extension. +# Examples include vinagre, virt-viewer, virt-manager and vencrypt +# itself. UltraVNC, RealVNC, TightVNC do not support this +# +# It is necessary to setup CA and issue a server certificate +# before enabling this. +# +#vnc_tls = 1 + + +# In order to override the default TLS certificate location for +# vnc certificates, supply a valid path to the certificate directory. +# If the provided path does not exist, libvirtd will fail to start. +# If the path is not provided, but vnc_tls = 1, then the +# default_tls_x509_cert_dir path will be used. +# +#vnc_tls_x509_cert_dir = "/etc/pki/libvirt-vnc" + + +# Uncomment and use the following option to override the default secret +# UUID provided in the default_tls_x509_secret_uuid parameter. +# +#vnc_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000" + + +# The default TLS configuration only uses certificates for the server +# allowing the client to verify the server's identity and establish +# an encrypted channel. +# +# It is possible to use x509 certificates for authentication too, by +# issuing an x509 certificate to every client who needs to connect. +# +# Enabling this option will reject any client that does not have a +# certificate (as described in default_tls_x509_verify) signed by the +# CA in the vnc_tls_x509_cert_dir (or default_tls_x509_cert_dir). +# +# If this option is not supplied, it will be set to the value of +# "default_tls_x509_verify". If "default_tls_x509_verify" is not supplied either, +# the default is "0". +# +#vnc_tls_x509_verify = 1 + + +# The default VNC password. Only 8 bytes are significant for +# VNC passwords. This parameter is only used if the per-domain +# XML config does not already provide a password. To allow +# access without passwords, leave this commented out. An empty +# string will still enable passwords, but be rejected by QEMU, +# effectively preventing any use of VNC. Obviously change this +# example here before you set this. +# + + +# Enable use of SASL encryption on the VNC server. This requires +# a VNC client which supports the SASL protocol extension. +# Examples include vinagre, virt-viewer and virt-manager +# itself. UltraVNC, RealVNC, TightVNC do not support this +# +# It is necessary to configure /etc/sasl2/qemu.conf to choose +# the desired SASL plugin (eg, GSSPI for Kerberos) +# +#vnc_sasl = 1 + + +# The default SASL configuration file is located in /etc/sasl2/ +# When running libvirtd unprivileged, it may be desirable to +# override the configs in this location. Set this parameter to +# point to the directory, and create a qemu.conf in that location +# +#vnc_sasl_dir = "/some/directory/sasl2" + + +# QEMU implements an extension for providing audio over a VNC connection, +# though if your VNC client does not support it, your only chance for getting +# sound output is through regular audio backends. 
By default, libvirt will +# disable all QEMU sound backends if using VNC, since they can cause +# permissions issues. Enabling this option will make libvirtd honor the +# QEMU_AUDIO_DRV environment variable when using VNC. +# +#vnc_allow_host_audio = 0 + + + +# SPICE is configured to listen on 127.0.0.1 by default. +# To make it listen on all public interfaces, uncomment +# this next option. +# +# NB, strong recommendation to enable TLS + x509 certificate +# verification when allowing public access +# +#spice_listen = "0.0.0.0" + + +# Enable use of TLS encryption on the SPICE server. +# +# It is necessary to setup CA and issue a server certificate +# before enabling this. +# +#spice_tls = 1 + + +# In order to override the default TLS certificate location for +# spice certificates, supply a valid path to the certificate directory. +# If the provided path does not exist, libvirtd will fail to start. +# If the path is not provided, but spice_tls = 1, then the +# default_tls_x509_cert_dir path will be used. +# +#spice_tls_x509_cert_dir = "/etc/pki/libvirt-spice" + + +# Enable this option to have SPICE served over an automatically created +# unix socket. This prevents unprivileged access from users on the +# host machine. +# +# This will only be enabled for SPICE configurations that have listen +# type=address but without any address specified. This setting takes +# preference over spice_listen. +# +#spice_auto_unix_socket = 1 + + +# The default SPICE password. This parameter is only used if the +# per-domain XML config does not already provide a password. To +# allow access without passwords, leave this commented out. An +# empty string will still enable passwords, but be rejected by +# QEMU, effectively preventing any use of SPICE. Obviously change +# this example here before you set this. +# +#spice_password = "XYZ12345" + + +# Enable use of SASL encryption on the SPICE server. This requires +# a SPICE client which supports the SASL protocol extension. +# +# It is necessary to configure /etc/sasl2/qemu.conf to choose +# the desired SASL plugin (eg, GSSPI for Kerberos) +# +#spice_sasl = 1 + +# The default SASL configuration file is located in /etc/sasl2/ +# When running libvirtd unprivileged, it may be desirable to +# override the configs in this location. Set this parameter to +# point to the directory, and create a qemu.conf in that location +# +#spice_sasl_dir = "/some/directory/sasl2" + +# Enable use of TLS encryption on the chardev TCP transports. +# +# It is necessary to setup CA and issue a server certificate +# before enabling this. +# +#chardev_tls = 1 + + +# In order to override the default TLS certificate location for character +# device TCP certificates, supply a valid path to the certificate directory. +# If the provided path does not exist, libvirtd will fail to start. +# If the path is not provided, but chardev_tls = 1, then the +# default_tls_x509_cert_dir path will be used. +# +#chardev_tls_x509_cert_dir = "/etc/pki/libvirt-chardev" + + +# The default TLS configuration only uses certificates for the server +# allowing the client to verify the server's identity and establish +# an encrypted channel. +# +# It is possible to use x509 certificates for authentication too, by +# issuing an x509 certificate to every client who needs to connect. +# +# Enabling this option will reject any client that does not have a +# certificate (as described in default_tls_x509_verify) signed by the +# CA in the chardev_tls_x509_cert_dir (or default_tls_x509_cert_dir). 
+# +# If this option is not supplied, it will be set to the value of +# "default_tls_x509_verify". If "default_tls_x509_verify" is not supplied either, +# the default is "1". +# +#chardev_tls_x509_verify = 1 + + +# Uncomment and use the following option to override the default secret +# UUID provided in the default_tls_x509_secret_uuid parameter. +# +# NB This default all-zeros UUID will not work. Replace it with the +# output from the UUID for the TLS secret from a 'virsh secret-list' +# command and then uncomment the entry +# +#chardev_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000" + + +# Enable use of TLS encryption for all VxHS network block devices that +# don't specifically disable. +# +# When the VxHS network block device server is set up appropriately, +# x509 certificates are required for authentication between the clients +# (qemu processes) and the remote VxHS server. +# +# It is necessary to setup CA and issue the client certificate before +# enabling this. +# +#vxhs_tls = 1 + + +# In order to override the default TLS certificate location for VxHS +# backed storage, supply a valid path to the certificate directory. +# This is used to authenticate the VxHS block device clients to the VxHS +# server. +# +# If the provided path does not exist, libvirtd will fail to start. +# If the path is not provided, but vxhs_tls = 1, then the +# default_tls_x509_cert_dir path will be used. +# +# VxHS block device clients expect the client certificate and key to be +# present in the certificate directory along with the CA master certificate. +# If using the default environment, default_tls_x509_verify must be configured. +# Since this is only a client the server-key.pem certificate is not needed. +# Thus a VxHS directory must contain the following: +# +# ca-cert.pem - the CA master certificate +# client-cert.pem - the client certificate signed with the ca-cert.pem +# client-key.pem - the client private key +# +#vxhs_tls_x509_cert_dir = "/etc/pki/libvirt-vxhs" + + +# Uncomment and use the following option to override the default secret +# UUID provided in the default_tls_x509_secret_uuid parameter. +# +# NB This default all-zeros UUID will not work. Replace it with the +# output from the UUID for the TLS secret from a 'virsh secret-list' +# command and then uncomment the entry +# +#vxhs_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000" + + +# Enable use of TLS encryption for all NBD disk devices that don't +# specifically disable it. +# +# When the NBD server is set up appropriately, x509 certificates are required +# for authentication between the client and the remote NBD server. +# +# It is necessary to setup CA and issue the client certificate before +# enabling this. +# +#nbd_tls = 1 + + +# In order to override the default TLS certificate location for NBD +# backed storage, supply a valid path to the certificate directory. +# This is used to authenticate the NBD block device clients to the NBD +# server. +# +# If the provided path does not exist, libvirtd will fail to start. +# If the path is not provided, but nbd_tls = 1, then the +# default_tls_x509_cert_dir path will be used. +# +# NBD block device clients expect the client certificate and key to be +# present in the certificate directory along with the CA certificate. +# Since this is only a client the server-key.pem certificate is not needed. 
+# Thus an NBD directory must contain the following:
+#
+#  ca-cert.pem - the CA master certificate
+#  client-cert.pem - the client certificate signed with the ca-cert.pem
+#  client-key.pem - the client private key
+#
+#nbd_tls_x509_cert_dir = "/etc/pki/libvirt-nbd"
+
+
+# Uncomment and use the following option to override the default secret
+# UUID provided in the default_tls_x509_secret_uuid parameter.
+#
+# NB This default all-zeros UUID will not work. Replace it with the
+# output from the UUID for the TLS secret from a 'virsh secret-list'
+# command and then uncomment the entry
+#
+#nbd_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000"
+
+
+# In order to override the default TLS certificate location for migration
+# certificates, supply a valid path to the certificate directory. If the
+# provided path does not exist, libvirtd will fail to start. If the path is
+# not provided, but TLS-encrypted migration is requested, then the
+# default_tls_x509_cert_dir path will be used. Once/if a default certificate is
+# enabled/defined, migration will then be able to use the certificate via
+# migration API flags.
+#
+#migrate_tls_x509_cert_dir = "/etc/pki/libvirt-migrate"
+
+
+# The default TLS configuration only uses certificates for the server
+# allowing the client to verify the server's identity and establish
+# an encrypted channel.
+#
+# It is possible to use x509 certificates for authentication too, by
+# issuing an x509 certificate to every client who needs to connect.
+#
+# Enabling this option will reject any client that does not have a
+# certificate (as described in default_tls_x509_verify) signed by the
+# CA in the migrate_tls_x509_cert_dir (or default_tls_x509_cert_dir).
+#
+# If this option is not supplied, it will be set to the value of
+# "default_tls_x509_verify". If "default_tls_x509_verify" is not supplied
+# either, the default is "1".
+#
+#migrate_tls_x509_verify = 1
+
+
+# Uncomment and use the following option to override the default secret
+# UUID provided in the default_tls_x509_secret_uuid parameter.
+#
+# NB This default all-zeros UUID will not work. Replace it with the
+# output from the UUID for the TLS secret from a 'virsh secret-list'
+# command and then uncomment the entry
+#
+#migrate_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000"
+
+
+# By default TLS is requested using the VIR_MIGRATE_TLS flag, thus not requested
+# automatically. Setting 'migrate_tls_force' to "1" will prevent any migration
+# which is not using VIR_MIGRATE_TLS to ensure higher level of security in
+# deployments with TLS.
+#
+#migrate_tls_force = 0
+
+
+# In order to override the default TLS certificate location for backup NBD
+# server certificates, supply a valid path to the certificate directory. If the
+# provided path does not exist, libvirtd will fail to start. If the path is
+# not provided, but TLS-encrypted backup is requested, then the
+# default_tls_x509_cert_dir path will be used.
+#
+#backup_tls_x509_cert_dir = "/etc/pki/libvirt-backup"
+
+
+# The default TLS configuration only uses certificates for the server
+# allowing the client to verify the server's identity and establish
+# an encrypted channel.
+#
+# It is possible to use x509 certificates for authentication too, by
+# issuing an x509 certificate to every client who needs to connect.
+#
+# Enabling this option will reject any client that does not have a
+# certificate (as described in default_tls_x509_verify) signed by the
+# CA in the backup_tls_x509_cert_dir (or default_tls_x509_cert_dir).
+# +# If this option is not supplied, it will be set to the value of +# "default_tls_x509_verify". If "default_tls_x509_verify" is not supplied either, +# the default is "1". +# +#backup_tls_x509_verify = 1 + + +# Uncomment and use the following option to override the default secret +# UUID provided in the default_tls_x509_secret_uuid parameter. +# +# NB This default all-zeros UUID will not work. Replace it with the +# output from the UUID for the TLS secret from a 'virsh secret-list' +# command and then uncomment the entry +# +#backup_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000" + + +# By default, if no graphical front end is configured, libvirt will disable +# QEMU audio output since directly talking to alsa/pulseaudio may not work +# with various security settings. If you know what you're doing, enable +# the setting below and libvirt will passthrough the QEMU_AUDIO_DRV +# environment variable when using nographics. +# +#nographics_allow_host_audio = 1 + + +# Override the port for creating both VNC and SPICE sessions (min). +# This defaults to 5900 and increases for consecutive sessions +# or when ports are occupied, until it hits the maximum. +# +# Minimum must be greater than or equal to 5900 as lower number would +# result into negative vnc display number. +# +# Maximum must be less than 65536, because higher numbers do not make +# sense as a port number. +# +#remote_display_port_min = 5900 +#remote_display_port_max = 65535 + +# VNC WebSocket port policies, same rules apply as with remote display +# ports. VNC WebSockets use similar display <-> port mappings, with +# the exception being that ports start from 5700 instead of 5900. +# +#remote_websocket_port_min = 5700 +#remote_websocket_port_max = 65535 + +# The default security driver is SELinux. If SELinux is disabled +# on the host, then the security driver will automatically disable +# itself. If you wish to disable QEMU SELinux security driver while +# leaving SELinux enabled for the host in general, then set this +# to 'none' instead. It's also possible to use more than one security +# driver at the same time, for this use a list of names separated by +# comma and delimited by square brackets. For example: +# +# security_driver = [ "selinux", "apparmor" ] +# +# Notes: The DAC security driver is always enabled; as a result, the +# value of security_driver cannot contain "dac". The value "none" is +# a special value; security_driver can be set to that value in +# isolation, but it cannot appear in a list of drivers. +# +#security_driver = "selinux" + +# If set to non-zero, then the default security labeling +# will make guests confined. If set to zero, then guests +# will be unconfined by default. Defaults to 1. +#security_default_confined = 1 + +# If set to non-zero, then attempts to create unconfined +# guests will be blocked. Defaults to 0. +#security_require_confined = 1 + +# The user for QEMU processes run by the system instance. It can be +# specified as a user name or as a user id. The qemu driver will try to +# parse this value first as a name and then, if the name doesn't exist, +# as a user id. +# +# Since a sequence of digits is a valid user name, a leading plus sign +# can be used to ensure that a user id will not be interpreted as a user +# name. 
+# +# Some examples of valid values are: +# +# user = "qemu" # A user named "qemu" +# user = "+0" # Super user (uid=0) +# user = "100" # A user named "100" or a user with uid=100 +# +#user = "libvirt-qemu" + +# The group for QEMU processes run by the system instance. It can be +# specified in a similar way to user. +#group = "kvm" + +# Whether libvirt should dynamically change file ownership +# to match the configured user/group above. Defaults to 1. +# Set to 0 to disable file ownership changes. +#dynamic_ownership = 1 + +# Whether libvirt should remember and restore the original +# ownership over files it is relabeling. Defaults to 1, set +# to 0 to disable the feature. +#remember_owner = 1 + +# What cgroup controllers to make use of with QEMU guests +# +# - 'cpu' - use for scheduler tunables +# - 'devices' - use for device access control +# - 'memory' - use for memory tunables +# - 'blkio' - use for block devices I/O tunables +# - 'cpuset' - use for CPUs and memory nodes +# - 'cpuacct' - use for CPUs statistics. +# +# NB, even if configured here, they won't be used unless +# the administrator has mounted cgroups, e.g.: +# +# mkdir /dev/cgroup +# mount -t cgroup -o devices,cpu,memory,blkio,cpuset none /dev/cgroup +# +# They can be mounted anywhere, and different controllers +# can be mounted in different locations. libvirt will detect +# where they are located. +# +cgroup_controllers = [ "cpu", "memory", "blkio", "cpuset", "cpuacct" ] + +# This is the basic set of devices allowed / required by +# all virtual machines. +# +# As well as this, any configured block backed disks, +# all sound device, and all PTY devices are allowed. +# +# This will only need setting if newer QEMU suddenly +# wants some device we don't already know about. +# +cgroup_device_acl = [ + "/dev/null", "/dev/full", "/dev/zero", + "/dev/random", "/dev/urandom", + "/dev/ptmx", "/dev/kvm", + "/dev/sgx_vepc", "/dev/sgx_provision", "/dev/sgx_enclave" +] +# +# RDMA migration requires the following extra files to be added to the list: +# "/dev/infiniband/rdma_cm", +# "/dev/infiniband/issm0", +# "/dev/infiniband/issm1", +# "/dev/infiniband/umad0", +# "/dev/infiniband/umad1", +# "/dev/infiniband/uverbs0" + + +# The default format for QEMU/KVM guest save images is raw; that is, the +# memory from the domain is dumped out directly to a file. If you have +# guests with a large amount of memory, however, this can take up quite +# a bit of space. If you would like to compress the images while they +# are being saved to disk, you can also set "lzop", "gzip", "bzip2", or "xz" +# for save_image_format. Note that this means you slow down the process of +# saving a domain in order to save disk space; the list above is in descending +# order by performance and ascending order by compression ratio. +# +# save_image_format is used when you use 'virsh save' or 'virsh managedsave' +# at scheduled saving, and it is an error if the specified save_image_format +# is not valid, or the requested compression program can't be found. +# +# dump_image_format is used when you use 'virsh dump' at emergency +# crashdump, and if the specified dump_image_format is not valid, or +# the requested compression program can't be found, this falls +# back to "raw" compression. +# +# snapshot_image_format specifies the compression algorithm of the memory save +# image when an external snapshot of a domain is taken. This does not apply +# on disk image format. 
It is an error if the specified format isn't valid, +# or the requested compression program can't be found. +# +#save_image_format = "raw" +#dump_image_format = "raw" +#snapshot_image_format = "raw" + +# When a domain is configured to be auto-dumped when libvirtd receives a +# watchdog event from qemu guest, libvirtd will save dump files in directory +# specified by auto_dump_path. Default value is /var/lib/libvirt/qemu/dump +# +#auto_dump_path = "/var/lib/libvirt/qemu/dump" + +# When a domain is configured to be auto-dumped, enabling this flag +# has the same effect as using the VIR_DUMP_BYPASS_CACHE flag with the +# virDomainCoreDump API. That is, the system will avoid using the +# file system cache while writing the dump file, but may cause +# slower operation. +# +#auto_dump_bypass_cache = 0 + +# When a domain is configured to be auto-started, enabling this flag +# has the same effect as using the VIR_DOMAIN_START_BYPASS_CACHE flag +# with the virDomainCreateWithFlags API. That is, the system will +# avoid using the file system cache when restoring any managed state +# file, but may cause slower operation. +# +#auto_start_bypass_cache = 0 + +# If provided by the host and a hugetlbfs mount point is configured, +# a guest may request huge page backing. When this mount point is +# unspecified here, determination of a host mount point in /proc/mounts +# will be attempted. Specifying an explicit mount overrides detection +# of the same in /proc/mounts. Setting the mount point to "" will +# disable guest hugepage backing. If desired, multiple mount points can +# be specified at once, separated by comma and enclosed in square +# brackets, for example: +# +# hugetlbfs_mount = ["/dev/hugepages2M", "/dev/hugepages1G"] +# +# The size of huge page served by specific mount point is determined by +# libvirt at the daemon startup. +# +# NB, within these mount points, guests will create memory backing +# files in a location of $MOUNTPOINT/libvirt/qemu +# +#hugetlbfs_mount = "/dev/hugepages" + + +# Path to the setuid helper for creating tap devices. This executable +# is used to create interfaces when libvirtd is +# running unprivileged. libvirt invokes the helper directly, instead +# of using "-netdev bridge", for security reasons. +#bridge_helper = "/usr/libexec/qemu-bridge-helper" + + +# If enabled, libvirt will have QEMU set its process name to +# "qemu:VM_NAME", where VM_NAME is the name of the VM. The QEMU +# process will appear as "qemu:VM_NAME" in process listings and +# other system monitoring tools. By default, QEMU does not set +# its process title, so the complete QEMU command (emulator and +# its arguments) appear in process listings. +# +#set_process_name = 1 + + +# If max_processes is set to a positive integer, libvirt will use +# it to set the maximum number of processes that can be run by qemu +# user. This can be used to override default value set by host OS. +# The same applies to max_files which sets the limit on the maximum +# number of opened files. +# +#max_processes = 0 +#max_files = 0 + +# If max_threads_per_process is set to a positive integer, libvirt +# will use it to set the maximum number of threads that can be +# created by a qemu process. Some VM configurations can result in +# qemu processes with tens of thousands of threads. systemd-based +# systems typically limit the number of threads per process to +# 16k. max_threads_per_process can be used to override default +# limits in the host OS. 
+#
+#max_threads_per_process = 0
+
+# If max_core is set to a non-zero integer, then QEMU will be
+# permitted to create core dumps when it crashes, provided its
+# RAM size is smaller than the limit set.
+#
+# Be warned that the core dump will include a full copy of the
+# guest RAM, if the 'dump_guest_core' setting has been enabled,
+# or if the guest XML contains
+#
+#   <memory dumpCore="on">...guest ram...</memory>
+#
+# If guest RAM is to be included, ensure the max_core limit
+# is set to at least the size of the largest expected guest
+# plus another 1GB for any QEMU host side memory mappings.
+#
+# As a special case it can be set to the string "unlimited"
+# to allow arbitrarily sized core dumps.
+#
+# By default the core dump size is set to 0 disabling all dumps
+#
+# Size is a positive integer specifying bytes or the
+# string "unlimited"
+#
+#max_core = "unlimited"
+
+# Determine if guest RAM is included in QEMU core dumps. By
+# default guest RAM will be excluded if a new enough QEMU is
+# present. Setting this to '1' will force guest RAM to always
+# be included in QEMU core dumps.
+#
+# This setting will be ignored if the guest XML has set the
+# dumpcore attribute on the <memory> element.
+#
+#dump_guest_core = 1
+
+# mac_filter enables MAC address based filtering on bridge ports.
+# This currently requires ebtables to be installed.
+#
+#mac_filter = 1
+
+
+# By default, PCI devices below non-ACS switch are not allowed to be assigned
+# to guests. By setting relaxed_acs_check to 1 such devices will be allowed to
+# be assigned to guests.
+#
+#relaxed_acs_check = 1
+
+
+# In order to prevent accidentally starting two domains that
+# share one writable disk, libvirt offers two approaches for
+# locking files. The first one is sanlock, the other one,
+# virtlockd, is then our own implementation. Accepted values
+# are "sanlock" and "lockd".
+#
+#lock_manager = "lockd"
+
+
+# Set limit of maximum APIs queued on one domain. All other APIs
+# over this threshold will fail on acquiring job lock. Specially,
+# setting to zero turns this feature off.
+# Note, that job lock is per domain.
+#
+#max_queued = 0
+
+###################################################################
+# Keepalive protocol:
+# This allows qemu driver to detect broken connections to remote
+# libvirtd during peer-to-peer migration. A keepalive message is
+# sent to the daemon after keepalive_interval seconds of inactivity
+# to check if the daemon is still responding; keepalive_count is a
+# maximum number of keepalive messages that are allowed to be sent
+# to the daemon without getting any response before the connection
+# is considered broken. In other words, the connection is
+# automatically closed approximately after
+# keepalive_interval * (keepalive_count + 1) seconds since the last
+# message received from the daemon. If keepalive_interval is set to
+# -1, qemu driver will not send keepalive requests during
+# peer-to-peer migration; however, the remote libvirtd can still
+# send them and source libvirtd will send responses. When
+# keepalive_count is set to 0, connections will be automatically
+# closed after keepalive_interval seconds of inactivity without
+# sending any keepalive messages.
+#
+#keepalive_interval = 5
+#keepalive_count = 5
+
+
+
+# Use seccomp syscall filtering sandbox in QEMU.
+# 1 == filter enabled, 0 == filter disabled
+#
+# Unless this option is disabled, QEMU will be run with
+# a seccomp filter that stops it from executing certain
+# syscalls.
+# +#seccomp_sandbox = 1 + + +# Override the listen address for all incoming migrations. Defaults to +# 0.0.0.0, or :: if both host and qemu are capable of IPv6. +#migration_address = "0.0.0.0" + + +# The default hostname or IP address which will be used by a migration +# source for transferring migration data to this host. The migration +# source has to be able to resolve this hostname and connect to it so +# setting "localhost" will not work. By default, the host's configured +# hostname is used. +#migration_host = "host.example.com" + + +# Override the port range used for incoming migrations. +# +# Minimum must be greater than 0, however when QEMU is not running as root, +# setting the minimum to be lower than 1024 will not work. +# +# Maximum must not be greater than 65535. +# +#migration_port_min = 49152 +#migration_port_max = 49215 + + + +# Timestamp QEMU's log messages (if QEMU supports it) +# +# Defaults to 1. +# +#log_timestamp = 0 + + +# Location of master nvram file +# +# This configuration option is obsolete. Libvirt will follow the +# QEMU firmware metadata specification to automatically locate +# firmware images. See docs/interop/firmware.json in the QEMU +# source tree. These metadata files are distributed alongside any +# firmware images intended for use with QEMU. +# +# NOTE: if ANY firmware metadata files are detected, this setting +# will be COMPLETELY IGNORED. +# +# ------------------------------------------ +# +# When a domain is configured to use UEFI instead of standard +# BIOS it may use a separate storage for UEFI variables. If +# that's the case libvirt creates the variable store per domain +# using this master file as image. Each UEFI firmware can, +# however, have different variables store. Therefore the nvram is +# a list of strings when a single item is in form of: +# ${PATH_TO_UEFI_FW}:${PATH_TO_UEFI_VARS}. +# Later, when libvirt creates per domain variable store, this list is +# searched for the master image. The UEFI firmware can be called +# differently for different guest architectures. For instance, it's OVMF +# for x86_64 and i686, but it's AAVMF for aarch64. The libvirt default +# follows this scheme. +#nvram = [ +# "/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd", +# "/usr/share/OVMF/OVMF_CODE.secboot.fd:/usr/share/OVMF/OVMF_VARS.fd", +# "/usr/share/AAVMF/AAVMF_CODE.fd:/usr/share/AAVMF/AAVMF_VARS.fd", +# "/usr/share/AAVMF/AAVMF32_CODE.fd:/usr/share/AAVMF/AAVMF32_VARS.fd" +#] + +# The backend to use for handling stdout/stderr output from +# QEMU processes. +# +# 'file': QEMU writes directly to a plain file. This is the +# historical default, but allows QEMU to inflict a +# denial of service attack on the host by exhausting +# filesystem space +# +# 'logd': QEMU writes to a pipe provided by virtlogd daemon. +# This is the current default, providing protection +# against denial of service by performing log file +# rollover when a size limit is hit. +# +#stdio_handler = "logd" + +# QEMU gluster libgfapi log level, debug levels are 0-9, with 9 being the +# most verbose, and 0 representing no debugging output. +# +# The current logging levels defined in the gluster GFAPI are: +# +# 0 - None +# 1 - Emergency +# 2 - Alert +# 3 - Critical +# 4 - Error +# 5 - Warning +# 6 - Notice +# 7 - Info +# 8 - Debug +# 9 - Trace +# +# Defaults to 4 +# +#gluster_debug_level = 9 + +# virtiofsd debug +# +# Whether to enable the debugging output of the virtiofsd daemon. +# Possible values are 0 or 1. Disabled by default. 
+#
+#virtiofsd_debug = 1
+
+# To enhance security, the QEMU driver is capable of creating private namespaces
+# for each domain started. So far only the "mount" namespace is supported. If
+# enabled, the qemu process is unable to see all the devices on the system,
+# only those configured for the domain in question. Libvirt then manages
+# device entries throughout the domain lifetime. This namespace is turned on
+# by default.
+#namespaces = [ "mount" ]
+
+# This directory is used for memoryBacking source if configured as file.
+# NOTE: big files will be stored here
+#memory_backing_dir = "/var/lib/libvirt/qemu/ram"
+
+# Path to the SCSI persistent reservations helper. This helper is
+# used whenever <reservations/> are enabled for SCSI LUN devices.
+#pr_helper = "/usr/bin/qemu-pr-helper"
+
+# Path to the SLIRP networking helper.
+#slirp_helper = "/usr/bin/slirp-helper"
+
+# Path to the dbus-daemon
+#dbus_daemon = "/usr/bin/dbus-daemon"
+
+# User for the swtpm TPM Emulator
+#
+# Default is 'tss'; this is the same user that tcsd (TrouSerS) installs
+# and uses; the alternative is 'root'
+#
+#swtpm_user = "tss"
+#swtpm_group = "tss"
+
+# For debugging and testing purposes it's sometimes useful to be able to disable
+# libvirt behaviour based on the capabilities of the qemu process. This option
+# allows you to do so. DO _NOT_ use in production, and be aware that the behaviour
+# may change across versions.
+#
+#capability_filters = [ "capname" ]
+
+# The 'deprecation_behavior' setting controls how the qemu process behaves towards
+# deprecated commands and arguments used by libvirt.
+#
+# This setting is meant for developers and CI efforts to make it obvious when
+# libvirt relies on fields which are deprecated, so that it can be fixed as soon
+# as possible.
+#
+# Possible options are:
+#  "none" - (default) qemu is supposed to accept and output deprecated fields
+#           and commands
+#  "omit" - qemu is instructed to omit deprecated fields on output; behaviour
+#           towards fields and commands from qemu is not changed
+#  "reject" - qemu is instructed to report an error if a deprecated command or
+#             field is used by libvirtd
+#  "crash" - qemu crashes when a deprecated command or field is used by libvirtd
+#
+# For both "reject" and "crash" qemu is instructed to omit any deprecated fields
+# on output.
+#
+# The "reject" option is less harsh towards the VMs, but some code paths ignore
+# errors reported by qemu and thus it may not be obvious that a deprecated
+# command/field was used; it's therefore suggested to use the "crash" option instead.
+#
+# In cases when qemu doesn't support configuring the behaviour, this setting is
+# silently ignored to allow testing older qemu versions without having to
+# reconfigure libvirtd.
+#
+# DO NOT use in production.
+#
+#deprecation_behavior = "none"
+user = "root"
+group = "root"
diff --git a/roles/vm/compile_libvirt/files/x86_features.xml b/roles/vm/compile_libvirt/files/x86_features.xml
new file mode 100644
index 00000000..7b57df18
--- /dev/null
+++ b/roles/vm/compile_libvirt/files/x86_features.xml
@@ -0,0 +1,619 @@
+<!-- x86 CPU feature definitions for libvirt's cpu_map (619 lines of XML; markup not preserved in this extraction) -->
diff --git a/roles/vm/compile_libvirt/tasks/main.yml b/roles/vm/compile_libvirt/tasks/main.yml
new file mode 100644
index 00000000..b1c3c5aa
--- /dev/null
+++ b/roles/vm/compile_libvirt/tasks/main.yml
@@ -0,0 +1,80 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+##   http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: Copy preferences
+  copy:
+    src: preferences
+    dest: /etc/apt/preferences
+    mode: '0644'
+
+- name: Install dependencies
+  include_role:
+    name: install_dependencies
+
+- name: Clone libvirt fork with sgx support
+  git:
+    repo: 'https://github.com/hhb584520/libvirt.git'
+    dest: "{{ (project_root_dir, 'libvirt') | path_join }}"
+    version: sgx-dev
+
+- name: Disable apparmor
+  systemd:
+    name: apparmor
+    enabled: no
+    state: stopped
+
+- name: Add libvirt groups
+  group:
+    name: "{{ item }}"
+    state: present
+    system: true
+  with_items: "{{ libvirt_groups }}"
+
+- name: Add libvirt user
+  user:
+    name: libvirt-qemu
+    group: libvirt-qemu
+    groups: libvirtd, libvirt
+
+- name: Configure libvirt build with meson
+  command:
+    cmd: meson build -Dsystem=true -Ddriver_qemu=enabled --prefix=/usr/
+    chdir: "{{ (project_root_dir, 'libvirt') | path_join }}"
+  changed_when: true
+
+- name: Install libvirt with sgx support
+  command:
+    cmd: ninja install
+    chdir: "{{ (project_root_dir, 'libvirt', 'build') | path_join }}"
+  changed_when: true
+
+- name: Copy qemu.conf file
+  copy:
+    src: qemu.conf
+    dest: /etc/libvirt/qemu.conf
+    mode: '0644'
+
+- name: Copy x86_features.xml file
+  copy:
+    src: x86_features.xml
+    dest: /usr/share/libvirt/cpu_map/x86_features.xml
+    mode: '0644'
+
+- name: Enable libvirtd service
+  systemd:
+    name: libvirtd
+    enabled: yes
+    state: started
diff --git a/roles/vm/compile_libvirt/vars/main.yml b/roles/vm/compile_libvirt/vars/main.yml
new file mode 100644
index 00000000..88851d30
--- /dev/null
+++ b/roles/vm/compile_libvirt/vars/main.yml
@@ -0,0 +1,48 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+##   http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+install_dependencies:
+  Debian:
+    - virt-manager
+    - libxml2-utils
+    - xsltproc
+    - pkg-config
+    - libglib2.0-dev
+    - libgnutls28-dev
+    - libxml2-dev
+    - python3-docutils
+    - libyajl-dev
+    - libsanlock-dev
+    - libssh2-1-dev
+    - libssh-dev
+    - libparted-dev
+    - libcap-ng-dev
+    - libcurl4-openssl-dev
+    - augeas-tools
+    - flake8
+    - open-iscsi
+    - libapparmor-dev
+    - libaudit-dev
+    - libavahi-client-dev
+    - libnl-3-dev
+    - libnl-route-3-dev
+    - libnuma-dev
+    - libpcap-dev
+    - libpciaccess-dev
+    - librados-dev
+    - librbd-dev
+    - libsasl2-dev
+    - libsystemd-dev
+    - dnsmasq
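For reference, the clone/configure/install tasks above correspond roughly to the following manual build, run as root on the VM host (a sketch based only on the task parameters; it assumes meson and ninja are present via the dependency list):

```bash
# Clone the SGX-enabled libvirt fork and build/install it system-wide
git clone --branch sgx-dev https://github.com/hhb584520/libvirt.git
cd libvirt
meson build -Dsystem=true -Ddriver_qemu=enabled --prefix=/usr/
ninja -C build install   # same as running "ninja install" from the build dir
```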
diff --git a/roles/vm/conf_libvirt/tasks/main.yml b/roles/vm/conf_libvirt/tasks/main.yml
index c37e92ba..ec6474c9 100644
--- a/roles/vm/conf_libvirt/tasks/main.yml
+++ b/roles/vm/conf_libvirt/tasks/main.yml
@@ -14,10 +14,18 @@
 ## limitations under the License.
 ##
 ---
-- name: install dependencies
+- name: Install dependencies
   include_role:
     name: install_dependencies
 
+- name: Install packages only if sgx not enabled
+  apt:
+    pkg:
+      - libvirt-daemon-system
+      - libvirt-clients
+  when:
+    - ansible_facts['os_family'] == "Debian"
+    - not sgx_dp_enabled | default(false)
+
 # This task might not be needed as libvirt will use root group
 - name: Add root user to libvirt group
@@ -96,35 +104,53 @@
     group: root
     mode: '0644'
 
-- name: Destroy vm-default network
+- name: Check current vm-default network
   virt_net:
-    command: destroy
-    name: "vm-default"
-  ignore_errors: true
-
-- name: Undefine vm-default network
-  virt_net:
-    command: undefine
-    name: "vm-default"
-  ignore_errors: true
-
-- name: Read vm-default.xml
-  command: cat {{ (vm_project_root_dir, 'vm-default.xml') | path_join }}
-  register: vm_default_xml
-  changed_when: false
-
-- name: Define vm-default network
-  virt_net:
-    command: define
-    name: "vm-default"
-    xml: "{{ vm_default_xml.stdout }}"
-
-- name: Start vm-default network
-  virt_net:
-    command: start
-    name: "vm-default"
-
-- name: Autostart vm-default network
-  virt_net:
-    autostart: yes
-    name: "vm-default"
+    command: list_nets
+  register: net_list
+  failed_when: false
+
+- name: Handle vm-default network
+  block:
+    - name: Destroy vm-default network
+      virt_net:
+        command: destroy
+        name: "vm-default"
+      failed_when: false
+
+    - name: Undefine vm-default network
+      virt_net:
+        command: undefine
+        name: "vm-default"
+      failed_when: false
+
+    - name: Read vm-default.xml
+      command: cat {{ (vm_project_root_dir, 'vm-default.xml') | path_join }}
+      register: vm_default_xml
+      changed_when: false
+
+    - name: Define vm-default network
+      virt_net:
+        command: define
+        name: "vm-default"
+        xml: "{{ vm_default_xml.stdout }}"
+
+    - name: Start vm-default network
+      virt_net:
+        command: start
+        name: "vm-default"
+
+    - name: Autostart vm-default network
+      virt_net:
+        autostart: yes
+        name: "vm-default"
+  when:
+    - (not 'vm-default' in net_list.list_nets) or
+      vm_recreate_existing | default(true)
+
+- name: Current vm-default network
+  debug:
+    msg: "Current vm-default network was not changed"
+  when:
+    - ('vm-default' in net_list.list_nets)
+    - not vm_recreate_existing | default(true)
diff --git a/roles/vm/conf_libvirt/vars/main.yml b/roles/vm/conf_libvirt/vars/main.yml
index 865ab0fc..90758414 100644
--- a/roles/vm/conf_libvirt/vars/main.yml
+++ b/roles/vm/conf_libvirt/vars/main.yml
@@ -16,9 +16,7 @@
 ---
 install_dependencies:
   Debian:
-    - qemu-kvm
-    - libvirt-daemon-system
-    - libvirt-clients
+    - qemu-system-x86
     - genisoimage
     - virt-manager
     - bridge-utils
@@ -35,7 +33,6 @@ install_dependencies:
     - python3-lxml
     - perl
     - numactl
-    - "@virt"
     - virt-top
     - libguestfs-tools
     - virt-install
diff --git a/roles/vm/manage_bridges/tasks/configure_bridges.yml b/roles/vm/manage_bridges/tasks/configure_bridges.yml
index cfff2434..269d1db2 100644
--- a/roles/vm/manage_bridges/tasks/configure_bridges.yml
+++ b/roles/vm/manage_bridges/tasks/configure_bridges.yml
@@ -14,40 +14,62 @@
 ## limitations under the License.
## --- -- name: Stop VXLAN bridge - virt_net: - command: destroy - name: "vxlanbr{{ item.vxlan }}" - ignore_errors: true +- name: Set VXLAN bridge name for {{ item.name }} + set_fact: + vxlan_bridge_name: "vxlanbr{{ item.vxlan }}" -- name: Undefine VXLAN bridge +- name: Check current VXLAN bridge network - {{ vxlan_bridge_name }} virt_net: - command: undefine - name: "vxlanbr{{ item.vxlan }}" - ignore_errors: true + command: list_nets + register: vxlan_net_list + failed_when: false -- name: Define simple VXLAN bridge if needed - virt_net: - command: define - name: "vxlanbr{{ item.vxlan }}" - xml: '{{ lookup("template", "simple-bridge.xml.j2") }}' - when: - - item.vxlan not in dhcp +- name: Handle VXLAN bridge network + block: + - name: Stop VXLAN bridge + virt_net: + command: destroy + name: "{{ vxlan_bridge_name }}" + failed_when: false -- name: Define dhcp VXLAN bridge if needed - virt_net: - command: define - name: "vxlanbr{{ item.vxlan }}" - xml: '{{ lookup("template", "dhcp-bridge.xml.j2") }}' - when: - - item.vxlan in dhcp + - name: Undefine VXLAN bridge + virt_net: + command: undefine + name: "{{ vxlan_bridge_name }}" + failed_when: false -- name: Create VXLAN bridge - virt_net: - command: create - name: "vxlanbr{{ item.vxlan }}" + - name: Define simple VXLAN bridge if needed + virt_net: + command: define + name: "{{ vxlan_bridge_name }}" + xml: '{{ lookup("template", "simple-bridge.xml.j2") }}' + when: + - item.vxlan not in dhcp -- name: Autostart VXLAN bridge - virt_net: - autostart: yes - name: "vxlanbr{{ item.vxlan }}" + - name: Define dhcp VXLAN bridge if needed + virt_net: + command: define + name: "{{ vxlan_bridge_name }}" + xml: '{{ lookup("template", "dhcp-bridge.xml.j2") }}' + when: + - item.vxlan in dhcp + + - name: Create VXLAN bridge + virt_net: + command: create + name: "{{ vxlan_bridge_name }}" + + - name: Autostart VXLAN bridge + virt_net: + autostart: yes + name: "{{ vxlan_bridge_name }}" + when: + - (not vxlan_bridge_name in vxlan_net_list.list_nets) or + vm_recreate_existing | default(true) + +- name: Current VXLAN bridge network + debug: + msg: "Current VXLAN bridge network - {{ vxlan_bridge_name }} was not changed" + when: + - (vxlan_bridge_name in vxlan_net_list.list_nets) + - not vm_recreate_existing | default(true) diff --git a/roles/vm/manage_imgs/tasks/main.yml b/roles/vm/manage_imgs/tasks/main.yml index 9e9f47df..d85f1e06 100644 --- a/roles/vm/manage_imgs/tasks/main.yml +++ b/roles/vm/manage_imgs/tasks/main.yml @@ -36,41 +36,6 @@ - name: Include vars for VM image links include_vars: "vm_image_links_vars.yml" - -- name: Make sure user directories exist - file: - path: "{{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}" - state: directory - owner: root - group: root - mode: 0700 - recurse: yes - loop: "{{ vms }}" - -- name: Destroy VMs if changing state - command: virsh destroy {{ item.name }} - loop: "{{ vms }}" - changed_when: true - ignore_errors: true - -- name: Undefine VMs if changing state - command: virsh undefine {{ item.name }} - loop: "{{ vms }}" - changed_when: true - ignore_errors: true - -- name: Remove VM disk images if changing state - file: - path: "{{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/cek.qcow2" - state: absent - loop: "{{ vms }}" - -- name: Remove VM config images if changing state - file: - path: "{{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/cek.iso" - state: absent - loop: "{{ vms }}" - - name: Generate SSH keypair if not present openssh_keypair: path: /root/.ssh/id_rsa @@ -84,20 +49,6 @@ 
delegate_to: "{{ item }}" loop: "{{ groups['vm_host'] }}" -- name: Generate cloud-init user-data - template: - src: user-data.j2 - dest: "{{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/user-data" - mode: 0644 - loop: "{{ vms }}" - -- name: Generate cloud-init meta-data - template: - src: meta-data.j2 - dest: "{{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/meta-data" - mode: 0644 - loop: "{{ vms }}" - - name: check if {{ vm_image_destination_file }} is already downloaded stat: path: "{{ vm_image_destination_file }}" @@ -123,6 +74,7 @@ args: executable: /bin/bash register: vm_image_checksum + changed_when: false - name: download vm_image file get_url: @@ -150,16 +102,8 @@ msg: "Image not found in playbook directory" when: not img.stat.exists -- name: Create disk images for VMs - command: > - qemu-img create -f qcow2 -F qcow2 -o backing_file={{ vm_project_root_dir }}/{{ vm_image }} - {{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/cek.qcow2 256G - loop: "{{ vms }}" - changed_when: true - -- name: Create config images for VMs - command: > - genisoimage -output {{ vm_project_root_dir }}//{{ item.type }}/{{ item.name }}/cek.iso -volid cidata -joliet - -rock {{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/meta-data {{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/user-data +- name: Prepare images for each VM + include_tasks: prepare_vm_img.yml loop: "{{ vms }}" - changed_when: true + loop_control: + loop_var: vm diff --git a/roles/vm/manage_imgs/tasks/prepare_vm_img.yml b/roles/vm/manage_imgs/tasks/prepare_vm_img.yml new file mode 100644 index 00000000..bfe8d3ba --- /dev/null +++ b/roles/vm/manage_imgs/tasks/prepare_vm_img.yml @@ -0,0 +1,97 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+##
+---
+- name: Check VM status - {{ vm.name }}
+  command: virsh list
+  register: current_vms
+  changed_when: false
+  failed_when: false
+
+- name: Handle VMs
+  block:
+    - name: Make sure user directories exist
+      file:
+        path: "{{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}"
+        state: directory
+        owner: root
+        group: root
+        mode: 0700
+        recurse: yes
+
+    - name: Destroy VMs if changing state
+      command: virsh destroy {{ vm.name }}
+      changed_when: true
+      register: destroy_result
+      failed_when: (destroy_result.stderr != '') and
+                   (('domain is not running' not in destroy_result.stderr) and
+                    ('failed to get domain' not in destroy_result.stderr))
+
+    - name: Undefine VMs if changing state
+      command: virsh undefine {{ vm.name }}
+      changed_when: true
+      register: undefine_result
+      failed_when: (undefine_result.stderr != '') and
+                   ('failed to get domain' not in undefine_result.stderr)
+
+    - name: Remove VM disk images if changing state
+      file:
+        path: "{{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.qcow2"
+        state: absent
+
+    - name: Remove VM config images if changing state
+      file:
+        path: "{{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.iso"
+        state: absent
+
+    - name: Generate cloud-init user-data
+      template:
+        src: user-data.j2
+        dest: "{{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/user-data"
+        mode: 0644
+
+    - name: Generate cloud-init meta-data
+      template:
+        src: meta-data.j2
+        dest: "{{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/meta-data"
+        mode: 0644
+
+    - name: Create disk images for VMs
+      command: >
+        qemu-img create -f qcow2 -F qcow2 -o backing_file={{ vm_project_root_dir }}/{{ vm_image }}
+        {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.qcow2 256G
+      changed_when: true
+
+    - name: Create config images for VMs
+      command: >
+        genisoimage -output {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.iso -volid cidata -joliet
+        -rock {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/meta-data {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/user-data
+      changed_when: true
+  when:
+    - (not vm.name in current_vms.stdout) or
+      vm_recreate_existing | default(true)
+
+- name: Current VM - {{ vm.name }}
+  debug:
+    msg: "Current VM - {{ vm.name }} was not changed"
+  when:
+    - (vm.name in current_vms.stdout)
+    - not vm_recreate_existing | default(true)
+
+- name: Set current_vms_to_skip for VMs in next play
+  set_fact:
+    current_vms_to_skip: "{{ current_vms.stdout }}"
+  delegate_to: "{{ vm.name }}"
+  delegate_facts: true
diff --git a/roles/vm/manage_imgs/templates/backup-user-data.j2 b/roles/vm/manage_imgs/templates/backup-user-data.j2
index 26ffb6fb..2ada4ad1 100644
--- a/roles/vm/manage_imgs/templates/backup-user-data.j2
+++ b/roles/vm/manage_imgs/templates/backup-user-data.j2
@@ -24,10 +24,6 @@ runcmd:
   - sed -i 's/ssh_args.*/& -o ServerAliveInterval=60 -o ServerAliveCountMax=10/g' /root/container-experience-kits/ansible.cfg
 
 write_files:
-  - path: /etc/environment
-    content: |
-      PROFILE=full_nfv
-    append: true
   - path: /root/.ssh/id_rsa
     content: |
       {{ lookup('file', '/home/{{ item.0.id }}/.ssh/id_rsa') }}
diff --git a/roles/vm/manage_imgs/templates/meta-data.j2 b/roles/vm/manage_imgs/templates/meta-data.j2
index 17cae4a9..35851c21 100644
--- a/roles/vm/manage_imgs/templates/meta-data.j2
+++ b/roles/vm/manage_imgs/templates/meta-data.j2
@@ -1,2 +1,2 @@
-local-hostname: {{ item.name }}
+local-hostname: {{ vm.name }}
 disable_root: 0
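The skip-if-present behaviour introduced throughout this patch hinges on the `vm_recreate_existing` flag, which defaults to `true` (always recreate). A minimal sketch of opting out, assuming the flag is set as a host variable for the vm_host (the exact vars file is deployment-specific):

```yaml
# host_vars/host-for-vms-1.yml (assumed location)
vm_recreate_existing: false   # keep existing VMs, images, and networks untouched
```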
diff --git a/roles/vm/manage_imgs/templates/user-data.j2 b/roles/vm/manage_imgs/templates/user-data.j2
index 6dbe1fa7..c9fd306f 100644
--- a/roles/vm/manage_imgs/templates/user-data.j2
+++ b/roles/vm/manage_imgs/templates/user-data.j2
@@ -19,7 +19,7 @@ runcmd:
 {% if vm_image_distribution == "ubuntu" %}
   - netplan apply
 {% endif %}
-{% if item.type == "work" %}
+{% if vm.type == "work" %}
   - systemctl daemon-reload
   - systemctl enable unsafe-iommu.service
   - systemctl start unsafe-iommu.service
@@ -68,7 +68,6 @@ write_files:
 {% endif %}
   - path: /etc/environment
     content: |
-      PROFILE=full_nfv
       http_proxy={{ http_proxy | default('') }}
      https_proxy={{ https_proxy | default('') }}
       no_proxy=
@@ -81,7 +80,7 @@ write_files:
     content: {{ lookup('file', '/root/.ssh/id_rsa.pub') }}
     permissions: '0644'
 
-{% if item.type == "work" %}
+{% if vm.type == "work" %}
   - path: /etc/systemd/system/unsafe-iommu.service
     content: |
       [Unit]
diff --git a/roles/vm/manage_imgs/templates/vm_image_links_vars.yml.j2 b/roles/vm/manage_imgs/templates/vm_image_links_vars.yml.j2
index 25252b69..01149400 100644
--- a/roles/vm/manage_imgs/templates/vm_image_links_vars.yml.j2
+++ b/roles/vm/manage_imgs/templates/vm_image_links_vars.yml.j2
@@ -1,18 +1,30 @@
 ---
 {% if vm_image_distribution == "ubuntu" %}
 {% if vm_image_version_ubuntu == "22.04" %}
-vm_image_url: "https://cloud-images.ubuntu.com/jammy/current/{{ vm_image }}"
+vm_image_url: "https://cloud-images.ubuntu.com/jammy/current/{{ vm_image }}"
 vm_image_checksums: "https://cloud-images.ubuntu.com/jammy/current/MD5SUMS"
+
 {% else %}
-vm_image_url: "https://cloud-images.ubuntu.com/releases/{{ vm_image_version_ubuntu }}/release/{{ vm_image }}"
+vm_image_url: "https://cloud-images.ubuntu.com/releases/{{ vm_image_version_ubuntu }}/release/{{ vm_image }}"
 vm_image_checksums: "https://cloud-images.ubuntu.com/releases/{{ vm_image_version_ubuntu }}/release/MD5SUMS"
+
 {% endif %}
-vm_image_destination_file: "{{ vm_project_root_dir }}/{{ vm_image }}"
+
 {% elif vm_image_distribution == "rocky" %}
-vm_image_url: "https://dl.rockylinux.org/vault/rocky/{{ vm_image_version_rocky }}/images/{{ vm_image }}"
+{% if vm_image_version_rocky == "9.0" %}
+
+vm_image_url: "https://dl.rockylinux.org/pub/rocky/9/images/x86_64/{{ vm_image }}"
+vm_image_checksums: "https://dl.rockylinux.org/pub/rocky/9/images/x86_64/CHECKSUM"
+{% else %}
+
+vm_image_url: "https://dl.rockylinux.org/vault/rocky/{{ vm_image_version_rocky }}/images/{{ vm_image }}"
 vm_image_checksums: "https://dl.rockylinux.org/vault/rocky/{{ vm_image_version_rocky }}/images/CHECKSUM"
-vm_image_destination_file: "{{ vm_project_root_dir }}/{{ vm_image }}"
+
 {% endif %}
+
+{% endif %}
+
+vm_image_destination_file: "{{ vm_project_root_dir }}/{{ vm_image }}"
diff --git a/roles/vm/manage_imgs/templates/vm_image_vars.yml.j2 b/roles/vm/manage_imgs/templates/vm_image_vars.yml.j2
index c7d43b6a..3c3b4e3f 100644
--- a/roles/vm/manage_imgs/templates/vm_image_vars.yml.j2
+++ b/roles/vm/manage_imgs/templates/vm_image_vars.yml.j2
@@ -11,10 +11,15 @@ vm_image_checksum_type: "md5"
 vm_image_checksum_cut_line: "-f1"
 vm_os_variant: "ubuntu20.04"
 {% elif vm_image_distribution|lower == "rocky" %}
+vm_image_checksum_type: "sha256"
+vm_image_checksum_cut_line: "-f4"
 vm_image_distribution: "{{ vm_image_distribution|lower }}"
 vm_image_version: "{{ vm_image_version_rocky }}"
+{% if vm_image_version_rocky == "9.0" %}
+vm_image: "Rocky-9-GenericCloud-{{ vm_image_version_rocky }}-20220830.0.x86_64.qcow2"
+vm_os_variant: "rocky9.0"
+{% else %}
 vm_image: "Rocky-8-GenericCloud-{{ vm_image_version_rocky }}-20211114.2.x86_64.qcow2"
-vm_image_checksum_type: "sha256"
-vm_image_checksum_cut_line: "-f4"
 vm_os_variant: "rocky8.5"
 {% endif %}
+{% endif %}
diff --git a/roles/vm/manage_vms/tasks/main.yml b/roles/vm/manage_vms/tasks/main.yml
index 72e0b5b8..8fca557e 100644
--- a/roles/vm/manage_vms/tasks/main.yml
+++ b/roles/vm/manage_vms/tasks/main.yml
@@ -14,61 +14,14 @@
 ## limitations under the License.
 ##
 ---
-- name: Allocate requested number of CPUs
-  cpupin:
-    name: "{{ item.name }}"
-    number: "{{ item.cpu_total if item.cpu_total is defined else omit }}"
-    cpus: "{{ item.cpus if item.cpus is defined else omit }}"
-    numa: "{{ item.numa if item.numa is defined else omit }}"
-    number_host_os: "{{ cpu_host_os if cpu_host_os is defined else omit }}"
-    alloc_all: "{{ item.alloc_all if item.alloc_all is defined else omit }}"
-    pinning: false
-  loop: "{{ vms }}"
-  changed_when: true
-  register: allocated_cpus
-  throttle: 1
-
-- name: Initialize new_vms variable
-  set_fact:
-    cpupin_vms: []
-  changed_when: true
-
-- name: Merge data structures
-  include: merge_dicts.yml
-  loop: "{{ vms }}"
-  loop_control:
-    loop_var: vm
+- name: Allocate cpus in case it's not done already by isolcpus
+  include_role:
+    name: bootstrap/allocate_cpus
+  when:
+    - isolcpus_cpus_total is not defined or not isolcpus_cpus_total
 
 - name: Start VMs
-  command: >
-    virt-install
-    --connect qemu:///system
-    --name {{ item.name }}
-    --cpu host
-    --ram {{ item.memory }}
-    --vcpus={{ item.cpu_total }},sockets=1,cores={{ (item.cpu_total / 2)|int }},threads=2
-    --cpuset={{ item.cpus }}
-    --os-variant {{ vm_os_variant }}
-    --disk path={{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/cek.qcow2,format=qcow2
-    --disk {{ vm_project_root_dir }}/{{ item.type }}/{{ item.name }}/cek.iso,device=cdrom
-    --network network=vm-default,model=virtio
-    {%- if item.type == "work" %}
-    {%- for pci in item.pci %}
-    --hostdev {{ pci }},address.type=pci
-    {%- endfor -%}
-    {%- endif %}
-    --network network=vxlanbr{{ item.vxlan }},model=virtio
-    --import --noautoconsole
-  loop: "{{ cpupin_vms }}"
-  changed_when: true
-
-- name: Make VMs persistent accross VM host reboot
-  command: virsh autostart {{ item.name }}
-  loop: "{{ cpupin_vms }}"
-  changed_when: true
-
-- name: Optimize VMs
-  include_tasks: optimize.yml
-  loop: "{{ cpupin_vms }}"
+  include: start_vm.yml
+  loop: "{{ cpupin_vms | default([]) }}"
   loop_control:
     loop_var: vm
diff --git a/roles/vm/manage_vms/tasks/start_vm.yml b/roles/vm/manage_vms/tasks/start_vm.yml
new file mode 100644
index 00000000..bd1d3d3e
--- /dev/null
+++ b/roles/vm/manage_vms/tasks/start_vm.yml
@@ -0,0 +1,56 @@
+##
+## Copyright (c) 2020-2022 Intel Corporation.
+##
+## Licensed under the Apache License, Version 2.0 (the "License");
+## you may not use this file except in compliance with the License.
+## You may obtain a copy of the License at
+##
+##   http://www.apache.org/licenses/LICENSE-2.0
+##
+## Unless required by applicable law or agreed to in writing, software
+## distributed under the License is distributed on an "AS IS" BASIS,
+## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+## See the License for the specific language governing permissions and
+## limitations under the License.
+##
+---
+- name: Handle VM start
+  block:
+    - name: Start VM - {{ vm.name }}
+      command: >
+        virt-install
+        --connect qemu:///system
+        --name {{ vm.name }}
+        --cpu host
+        --ram {{ vm.memory }}
+        --vcpus={{ vm.cpu_total }},sockets=1,cores={{ (vm.cpu_total / 2) | int }},threads=2
+        --cpuset={{ vm.cpus }}
+        --os-variant {{ vm_os_variant }}
+        --disk path={{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.qcow2,format=qcow2
+        --disk {{ vm_project_root_dir }}/{{ vm.type }}/{{ vm.name }}/cek.iso,device=cdrom
+        --network network=vm-default,model=virtio
+        {%- if vm.type == "work" %}
+        {%- for pci in vm.pci %}
+        --hostdev {{ pci }},address.type=pci
+        {%- endfor -%}
+        {%- endif %}
+        --network network=vxlanbr{{ vm.vxlan }},model=virtio
+        --import --noautoconsole
+      changed_when: true
+
+    - name: Make VM persistent across VM host reboot - {{ vm.name }}
+      command: virsh autostart {{ vm.name }}
+      changed_when: true
+
+    - name: Optimize VM - {{ vm.name }}
+      include_tasks: optimize.yml
+  when:
+    - (not vm.name in current_vms.stdout) or
+      vm_recreate_existing | default(true)
+
+- name: Current VM start - {{ vm.name }}
+  debug:
+    msg: "Current VM - {{ vm.name }} was already running"
+  when:
+    - (vm.name in current_vms.stdout)
+    - not vm_recreate_existing | default(true)
diff --git a/roles/vm/prepare_bastion_host_config/tasks/main.yml b/roles/vm/prepare_bastion_host_config/tasks/main.yml
index f859dd57..993f69da 100644
--- a/roles/vm/prepare_bastion_host_config/tasks/main.yml
+++ b/roles/vm/prepare_bastion_host_config/tasks/main.yml
@@ -29,24 +29,45 @@
   with_items: "{{ vm_ips | dict2items }}"
   delegate_to: localhost
 
+- name: Initialize bastion host usage
+  set_fact:
+    used_bastion: "{{ 'false' | bool }}"
+
+- name: Check bastion host usage
+  set_fact:
+    used_bastion: "{% if item.key in current_vms.stdout %}{{ used_bastion or 'true' | bool }}{% else %}{{ used_bastion }}{% endif %}"
+  with_items: "{{ vm_ips | dict2items }}"
+
 - name: Remove old ECDSA key fingerprint for bastion host
   command: ssh-keygen -f "~/.ssh/known_hosts" -R "{{ ansible_host }}"
   delegate_to: localhost
   changed_when: true
+  when:
+    - (not used_bastion | bool) or
+      vm_recreate_existing | default(true)
 
 - name: Store ECDSA key fingerprint for bastion host
   command: ssh -o StrictHostKeyChecking=No {{ ansible_host }} hostname
   delegate_to: localhost
   changed_when: true
+  when:
+    - (not used_bastion | bool) or
+      vm_recreate_existing | default(true)
 
 - name: Remove old ECDSA key fingerprint on localhost
   command: ssh-keygen -f "~/.ssh/known_hosts" -R "{{ item.key }}"
   with_items: "{{ vm_ips | dict2items }}"
   delegate_to: localhost
   changed_when: true
+  when:
+    - (not item.key in current_vms.stdout) or
+      vm_recreate_existing | default(true)
 
 - name: Store ECDSA key fingerprint on localhost
   command: ssh -o StrictHostKeyChecking=No {{ item.key }} hostname
   with_items: "{{ vm_ips | dict2items }}"
   delegate_to: localhost
   changed_when: true
+  when:
+    - (not item.key in current_vms.stdout) or
+      vm_recreate_existing | default(true)
diff --git a/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml b/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml
index 4844d5c3..08ad9fba 100644
--- a/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml
+++ b/roles/vm/prepare_bastion_host_config_vxlan/tasks/main.yml
@@ -32,18 +32,27 @@
   with_items: "{{ vm_vxlan_ips | dict2items }}"
   delegate_to: localhost
   changed_when: true
+  when:
+    - (not item.key in current_vms.stdout) or
+      vm_recreate_existing | default(true)
 
 - name: Store ECDSA key fingerprint for VXLAN
command: ssh -o StrictHostKeyChecking=No {{ item.value }} hostname with_items: "{{ vm_vxlan_ips | dict2items }}" delegate_to: "{{ groups['vm_host'][0] }}" changed_when: true + when: + - (not item.key in current_vms.stdout) or + vm_recreate_existing | default(true) - name: Store ECDSA key fingerprint on localhost for VXLAN command: ssh -o StrictHostKeyChecking=No {{ item.key }} hostname with_items: "{{ vm_vxlan_ips | dict2items }}" delegate_to: localhost changed_when: true + when: + - (not item.key in current_vms.stdout) or + vm_recreate_existing | default(true) - name: Update vm inventory file - all ini_file: diff --git a/roles/vm/prepare_cek/tasks/main.yml b/roles/vm/prepare_cek/tasks/main.yml index 9e980115..0496a89d 100644 --- a/roles/vm/prepare_cek/tasks/main.yml +++ b/roles/vm/prepare_cek/tasks/main.yml @@ -28,7 +28,7 @@ - name: Store primary IPs of running VMs set_fact: - vm_ips: "{{ vm_ips|default({}) | combine( {item.item.name: item.stdout.splitlines() | first} ) }}" + vm_ips: "{{ vm_ips | default({}) | combine( {item.item.name: item.stdout.splitlines() | first} ) }}" when: item.changed and item.item.name is defined # noqa 503 loop: "{{ vm_out.results }}" @@ -56,11 +56,17 @@ command: ssh-keygen -f "~/.ssh/known_hosts" -R "{{ item.value }}" with_items: "{{ vm_ips | dict2items }}" changed_when: true + when: + - (not item.key in current_vms.stdout) or + vm_recreate_existing | default(true) - name: Store ECDSA key fingerprint command: ssh -o StrictHostKeyChecking=No {{ item.value }} hostname with_items: "{{ vm_ips | dict2items }}" changed_when: true + when: + - (not item.key in current_vms.stdout) or + vm_recreate_existing | default(true) - name: Set VM host path set_fact: diff --git a/roles/vm/prepare_cek_vxlan/tasks/main.yml b/roles/vm/prepare_cek_vxlan/tasks/main.yml index f0c38382..1ba51d0b 100644 --- a/roles/vm/prepare_cek_vxlan/tasks/main.yml +++ b/roles/vm/prepare_cek_vxlan/tasks/main.yml @@ -51,7 +51,7 @@ - name: Store VXLAN MACs of running VMs set_fact: - vm_vxlan_macs: "{{ vm_vxlan_macs|default({}) | combine( {item.item.name: item.stdout} ) }}" + vm_vxlan_macs: "{{ vm_vxlan_macs | default({}) | combine( {item.item.name: item.stdout} ) }}" when: item.changed and item.item.name is defined # noqa 503 loop: "{{ vm_vxlan_mac_out.results }}" @@ -60,7 +60,7 @@ var: vm_vxlan_macs - name: Get VXLAN IPs of running VMs - shell: set -o pipefail && ip a | grep -n1 {{ vm_vxlan_macs[item.name] }} | grep 'inet ' | awk '{print $3}' | awk -F'/' '{print $1}' + shell: set -o pipefail && ip a | grep -n2 {{ vm_vxlan_macs[item.name] }} | grep 'inet ' | awk '{print $3}' | awk -F'/' '{print $1}' args: executable: /bin/bash register: vm_vxlan_ip_out @@ -73,7 +73,7 @@ - name: Store VXLAN IPs of running VMs set_fact: - vm_vxlan_ips: "{{ vm_vxlan_ips|default({}) | combine( {item.item.name: item.stdout} ) }}" + vm_vxlan_ips: "{{ vm_vxlan_ips | default({}) | combine( {item.item.name: item.stdout} ) }}" when: item.changed and item.item.name is defined # noqa 503 loop: "{{ vm_vxlan_ip_out.results }}" @@ -93,6 +93,9 @@ command: ssh-keygen -f "~/.ssh/known_hosts" -R "{{ item.value }}" with_items: "{{ vm_vxlan_ips | dict2items }}" changed_when: true + when: + - (not item.key in current_vms.stdout) or + vm_recreate_existing | default(true) - name: Wait up to 300 seconds for port 22 to become available for VXLAN interface wait_for: diff --git a/roles/vm/vm_sgx_enable/tasks/main.yml b/roles/vm/vm_sgx_enable/tasks/main.yml new file mode 100644 index 00000000..fee3f7bc --- /dev/null +++ 
b/roles/vm/vm_sgx_enable/tasks/main.yml @@ -0,0 +1,24 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: Adding SGX memory definition to VM domain + include: vm-domain-edit.yml + loop: '{{ vms }}' + loop_control: + loop_var: vm + when: + - vm.type == 'work' + - sgx_dp_enabled | default(false) diff --git a/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml b/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml new file mode 100644 index 00000000..51cce6fd --- /dev/null +++ b/roles/vm/vm_sgx_enable/tasks/vm-domain-edit.yml @@ -0,0 +1,72 @@ +## +## Copyright (c) 2020-2022 Intel Corporation. +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +- name: Handle SGX configuration for VM + block: + - name: Dump domain XML - {{ vm.name }} + shell: virsh dumpxml "{{ vm.name }}" > "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + changed_when: true + + - name: Adding memory node to domain XML + community.general.xml: + path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + xpath: /domain/devices + add_children: + - memory: + model: 'sgx-epc' + + - name: Adding target node to domain XML + community.general.xml: + path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + xpath: /domain/devices/memory + add_children: + - target: + + - name: Adding size node to domain XML + community.general.xml: + path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + xpath: /domain/devices/memory/target + add_children: + - size: + unit: 'MiB' + + - name: Setting memory size to domain XML + community.general.xml: + path: "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + xpath: /domain/devices/memory/target/size + value: "{{ sgx_memory_size | string }}" + + - name: VM destroy for sgx modifications + command: virsh destroy "{{ vm.name }}" + changed_when: true + + - name: VM undefine for sgx modifications + command: virsh undefine "{{ vm.name }}" + changed_when: true + + - name: VM create with sgx modifications + command: virsh create "{{ (vm_project_root_dir, vm.name) | path_join }}.xml" + changed_when: true + when: + - (not vm.name in current_vms.stdout) or + vm_recreate_existing | default(true) + +- name: Current VM SGX - {{ vm.name }} + debug: + msg: "Current VM - {{ vm.name }} was already running. 
Nothing was changed" + when: + - (vm.name in current_vms.stdout) + - not vm_recreate_existing | default(true) diff --git a/roles/whereabouts_install/defaults/main.yml b/roles/whereabouts_install/defaults/main.yml index 6600e204..c73d62b3 100644 --- a/roles/whereabouts_install/defaults/main.yml +++ b/roles/whereabouts_install/defaults/main.yml @@ -16,3 +16,6 @@ --- whereabouts_git_url: "https://github.com/k8snetworkplumbingwg/helm-charts.git" whereabouts_commit_hash: "05cc22a9c8165c5cba875bebfa58d1b504a2e6c9" + +whereabouts_release_name: "whereabouts" +whereabouts_release_namespace: "kube-system" diff --git a/roles/whereabouts_install/tasks/whereabouts.yml b/roles/whereabouts_install/tasks/whereabouts.yml index cfebbdc9..5b497635 100644 --- a/roles/whereabouts_install/tasks/whereabouts.yml +++ b/roles/whereabouts_install/tasks/whereabouts.yml @@ -36,8 +36,8 @@ - name: install Whereabouts Helm chart command: >- - helm install whereabouts + helm install {{ whereabouts_release_name }} {{ (project_root_dir, 'charts', 'whereabouts', 'whereabouts') | path_join }} - --namespace kube-system + --namespace {{ whereabouts_release_namespace }} --set installCRDs=true changed_when: true diff --git a/verification/check_cluster/README.md b/verification/check_cluster/README.md index 2ed3600c..18fe0da7 100644 --- a/verification/check_cluster/README.md +++ b/verification/check_cluster/README.md @@ -2,65 +2,65 @@ Once deployment is complete, check the status of nodes in the cluster: ``` # kubectl get nodes -o wide -NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME -node1 Ready worker 60m v1.22.3 10.166.30.34 Red Hat Enterprise Linux 8.5 (Ootpa) 4.18.0-348.el8.x86_64 docker://20.10.12 -controller Ready control-plane,master 61m v1.22.3 10.166.31.41 Red Hat Enterprise Linux 8.5 (Ootpa) 4.18.0-348.el8.x86_64 docker://20.10.12 +NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME +node1 Ready worker 12h v1.24.3 10.166.30.64 Ubuntu 22.04 LTS 5.15.0-25-generic docker://20.10.17 +controller Ready control-plane 12h v1.24.3 10.166.31.58 Ubuntu 22.04 LTS 5.15.0-25-generic docker://20.10.17 ``` Also check the status of pods running in the cluster. 
All should be in `Running` or `Completed` status: ``` # kubectl get pods --all-namespaces -NAMESPACE NAME READY STATUS RESTARTS AGE -cert-manager cert-manager-56b686b465-5qxxt 1/1 Running 0 21m -cert-manager cert-manager-cainjector-75c94654d-tztfg 1/1 Running 0 21m -cert-manager cert-manager-webhook-69bd5c9d75-m7rhf 1/1 Running 0 21m -intel-power controller-manager-f584c9458-52wdq 1/1 Running 0 11m -intel-power power-node-agent-2pm84 2/2 Running 0 11m -istio-system istio-ingressgateway-88cc46fd6-n9pfn 1/1 Running 0 5m31s -istio-system istiod-576d9f454-w9nvt 1/1 Running 0 5m37s -kmra kmra-apphsm-85fb47f7f9-8rjb4 2/2 Running 0 14m -kmra kmra-ctk-5558c9954b-zc6zv 0/1 Running 7 (2m7s ago) 14m -kmra kmra-pccs-675f458576-597lq 2/2 Running 0 14m -kube-system bypass-tcpip-5v2vh 1/1 Running 0 5m45s -kube-system bypass-tcpip-nrjxj 1/1 Running 0 5m45s -kube-system calico-kube-controllers-684bcfdc59-8sczm 1/1 Running 2 (22m ago) 23m -kube-system calico-node-865wh 1/1 Running 2 (21m ago) 23m -kube-system calico-node-gb5wz 1/1 Running 1 (22m ago) 23m -kube-system container-registry-55d455b586-z5shr 2/2 Running 0 19m -kube-system coredns-8474476ff8-p67xp 1/1 Running 1 (22m ago) 22m -kube-system coredns-8474476ff8-pkvxs 1/1 Running 1 (22m ago) 22m -kube-system dns-autoscaler-5ffdc7f89d-g9225 1/1 Running 1 (22m ago) 22m -kube-system intel-qat-plugin-flncg 1/1 Running 0 10m -kube-system intel-sgx-aesmd-shhfk 1/1 Running 0 14m -kube-system intel-sgx-plugin-r2jg9 1/1 Running 0 14m -kube-system inteldeviceplugins-controller-manager-6475d97999-d5r8b 2/2 Running 0 17m -kube-system kube-apiserver-ar09-09-cyp 1/1 Running 0 18m -kube-system kube-controller-manager-ar09-09-cyp 1/1 Running 2 (22m ago) 29m -kube-system kube-multus-ds-amd64-kq44m 1/1 Running 1 (21m ago) 23m -kube-system kube-multus-ds-amd64-xllwr 1/1 Running 1 (21m ago) 23m -kube-system kube-proxy-64dqd 1/1 Running 1 (22m ago) 29m -kube-system kube-proxy-pxdzs 1/1 Running 2 (21m ago) 29m -kube-system kube-scheduler-ar09-09-cyp 1/1 Running 0 6m23s -kube-system kubernetes-dashboard-548847967d-slhpt 1/1 Running 1 (22m ago) 22m -kube-system kubernetes-metrics-scraper-6d49f96c97-fn2rc 1/1 Running 1 (22m ago) 22m -kube-system nginx-proxy-ar09-01-cyp 1/1 Running 2 (21m ago) 29m -kube-system node-feature-discovery-controller-5bb85dfdbb-pb8m4 1/1 Running 0 18m -kube-system node-feature-discovery-worker-7g5l2 1/1 Running 0 18m -kube-system tas-telemetry-aware-scheduling-688d74f657-dpr6c 1/1 Running 0 6m29s -minio-operator console-6c9557b87d-tkf4s 1/1 Running 0 5m9s -minio-operator minio-operator-7885bb8d4-2flpf 1/1 Running 0 5m9s -minio-operator minio-operator-7885bb8d4-ddccf 1/1 Running 0 5m9s -minio-operator minio-operator-7885bb8d4-hwqxz 1/1 Running 0 5m9s -minio-operator minio-operator-7885bb8d4-zwv6b 1/1 Running 0 5m9s -minio-tenant minio-tenant-ss-0-0 1/1 Running 0 4m28s -monitoring node-exporter-7bc74 2/2 Running 0 7m23s -monitoring node-exporter-wbxn2 2/2 Running 0 7m23s -monitoring prometheus-k8s-0 4/4 Running 0 7m21s -monitoring prometheus-operator-bf54b8f56-hcc72 2/2 Running 0 7m25s -monitoring telegraf-67nbn 2/2 Running 0 5m49s -sriov-network-operator sriov-device-plugin-z8bjj 1/1 Running 0 12m -sriov-network-operator sriov-network-config-daemon-rkmg2 3/3 Running 0 17m -sriov-network-operator sriov-network-operator-98db5fcbf-6nzcg 1/1 Running 0 17m -sriov-network-operator sriov-network-operator-bb8ff65d9-2td74 1/1 Running 0 40h +NAMESPACE NAME READY STATUS RESTARTS AGE +cert-manager cert-manager-758558b8bd-pcfqz 1/1 Running 0 12h +cert-manager 
cert-manager-cainjector-6d4984d5f5-jtlsx 1/1 Running 0 12h +cert-manager cert-manager-webhook-bf48fb88d-fpdb8 1/1 Running 0 12h +intel-ethernet-operator 3e60ab2bf2a0e94cd94228eb615d3744dc6f4cbd5ec772ef7fcd0d740a428jf 0/1 Completed 0 11h +intel-ethernet-operator clv-discovery-74pk4 1/1 Running 1 (11h ago) 11h +intel-ethernet-operator fwddp-daemon-c5b4h 1/1 Running 1 (11h ago) 11h +intel-ethernet-operator intel-ethernet-operator-controller-manager-69655db6cf-cgpfb 1/1 Running 1 (11h ago) 11h +intel-ethernet-operator intel-ethernet-operator-controller-manager-69655db6cf-nlr9h 1/1 Running 1 (11h ago) 11h +intel-ethernet-operator intel-ethernet-operators-c5brp 1/1 Running 1 (11h ago) 11h +intel-power controller-manager-5fc8f8f874-r6hs2 1/1 Running 1 (11h ago) 11h +intel-power power-node-agent-qhh2x 2/2 Running 1 (11h ago) 11h +istio-system istio-ingressgateway-545d46d996-g4tns 1/1 Running 1 (11h ago) 11h +istio-system istioctl-5bb557cb8b-zb75b 1/1 Running 1 (11h ago) 11h +istio-system istiod-59876b79cd-tqxq9 1/1 Running 1 (11h ago) 11h +kube-system bypass-tcpip-nrj8h 1/1 Running 0 11h +kube-system calico-node-dwn89 1/1 Running 0 12h +kube-system container-registry-7b8b8b4446-rpfjg 2/2 Running 0 12h +kube-system coredns-74d6c5659f-hrcss 1/1 Running 0 12h +kube-system dns-autoscaler-6656dfd4c6-skrbq 1/1 Running 0 12h +kube-system intel-qat-plugin-dv6w7 1/1 Running 0 11h +kube-system inteldeviceplugins-controller-manager-59b46b7949-swgtj 2/2 Running 1 (11h ago) 11h +kube-system kube-afxdp-device-plugin-e2e-5fpxc 1/1 Running 0 11h +kube-system kube-apiserver-as09-16-wpr 1/1 Running 1 (11h ago) 12h +kube-system kube-controller-manager-as09-16-wpr 1/1 Running 0 12h +kube-system kube-multus-ds-amd64-h5ft4 1/1 Running 1 (11h ago) 12h +kube-system kube-proxy-pgvsk 1/1 Running 1 (11h ago) 12h +kube-system kube-scheduler-as09-16-wpr 1/1 Running 0 11h +kube-system kubernetes-dashboard-648989c4b4-xcdvt 1/1 Running 1 (11h ago) 12h +kube-system kubernetes-metrics-scraper-84bbbc8b75-jpkdx 1/1 Running 0 12h +kube-system node-feature-discovery-master-65ddb4669-dz6g7 1/1 Running 0 12h +kube-system node-feature-discovery-worker-qblwf 1/1 Running 0 12h +kube-system tas-telemetry-aware-scheduling-6f965ff445-fg7j7 1/1 Running 1 (11h ago) 11h +modsec-tadk tadk-intel-tadkchart-7869548b67-r52z9 1/1 Running 0 11h +monitoring kube-state-metrics-5d7b5d5bfc-rsvlk 3/3 Running 1 (11h ago) 11h +monitoring node-exporter-zcw7q 2/2 Running 1 (11h ago) 11h +monitoring otel-telegraf-collector-579846745d-t2j64 1/1 Running 1 (11h ago) 11h +monitoring prometheus-k8s-0 4/4 Running 1 (11h ago) 11h +monitoring prometheus-operator-68d5d49646-xql4v 2/2 Running 1 (11h ago) 11h +monitoring telegraf-w9z67 2/2 Running 1 (11h ago) 11h +observability jaeger-7f8f665d7f-jnzh4 1/1 Running 0 11h +observability jaeger-agent-daemonset-xkvv2 1/1 Running 0 11h +observability jaeger-operator-5f45884b86-cl9ls 2/2 Running 1 (11h ago) 11h +olm catalog-operator-6587ff6f69-6p988 1/1 Running 1 (11h ago) 12h +olm olm-operator-6ccdf8f464-w5pv5 1/1 Running 1 (11h ago) 12h +olm operatorhubio-catalog-7g9db 1/1 Running 0 17h +olm packageserver-799bf4b7bf-b9l69 1/1 Running 1 (11h ago) 12h +olm packageserver-799bf4b7bf-n4t7q 1/1 Running 1 (11h ago) 12h +opentelemetry-operator-system opentelemetry-operator-controller-manager-5656d6df74-7p8dl 2/2 Running 1 (11h ago) 11h +sriov-network-operator sriov-device-plugin-bgtd7 1/1 Running 0 11h +sriov-network-operator sriov-network-config-daemon-wqlcg 3/3 Running 1 (11h ago) 11h +sriov-network-operator 
sriov-network-operator-69bbd699f8-xvqz6 1/1 Running 1 (11h ago) 11h ``` diff --git a/verification/device_plugins/dsa/README.md b/verification/device_plugins/dsa/README.md index e8879494..ba1f6b81 100644 --- a/verification/device_plugins/dsa/README.md +++ b/verification/device_plugins/dsa/README.md @@ -15,7 +15,7 @@ List the allocatable node resources for the target worker node: ``` # kubectl get node -o json | jq '.status.allocatable' { - "cpu": "77", + "cpu": "95550m", "dsa.intel.com/wq-user-dedicated": "16", "dsa.intel.com/wq-user-shared": "160", "ephemeral-storage": "282566437625", diff --git a/verification/device_plugins/gpu/README.md b/verification/device_plugins/gpu/README.md index d8aac75c..0330af8e 100644 --- a/verification/device_plugins/gpu/README.md +++ b/verification/device_plugins/gpu/README.md @@ -11,14 +11,14 @@ kubectl get node -o json |jq .metadata.labels "feature.node.kubernetes.io/cpu-cpuid.ADX": "true", …… "feature.node.kubernetes.io/intel.qat": "true", - "feature.node.kubernetes.io/kernel-version.full": "5.4.48", + "feature.node.kubernetes.io/kernel-version.full": "5.15.0-25-generic", "feature.node.kubernetes.io/kernel-version.major": "5", - "feature.node.kubernetes.io/kernel-version.minor": "4", - "feature.node.kubernetes.io/kernel-version.revision": "48", + "feature.node.kubernetes.io/kernel-version.minor": "15", + "feature.node.kubernetes.io/kernel-version.revision": "0", …… "gpu.intel.com/cards": "card0.card1.card2.card3", "kubernetes.io/arch": "amd64", - "kubernetes.io/hostname": "as09-16-wpr", + "kubernetes.io/hostname": "node1", "kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": "" } diff --git a/verification/device_plugins/qat/README.md b/verification/device_plugins/qat/README.md index b8254324..4e46476c 100644 --- a/verification/device_plugins/qat/README.md +++ b/verification/device_plugins/qat/README.md @@ -1,18 +1,17 @@ # QAT Device Plugin -This example show how to use the QAT Device Plugin for assigning Virtual Functions (VFs) to workloads in Kubernetes. The QAT deivces provide access to accelerated cryptographic and compression features. +This example shows how to use the QAT Device Plugin for assigning Virtual Functions (VFs) to workloads in Kubernetes. The QAT devices provide access to accelerated cryptographic and compression features. 
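Once the plugin is running, a workload consumes a VF by requesting the extended resource listed below. A minimal pod sketch (pod name, image, and command are placeholders, not taken from this repository):

```
apiVersion: v1
kind: Pod
metadata:
  name: qat-sample               # hypothetical name
spec:
  containers:
    - name: qat-sample
      image: ubuntu:22.04        # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          qat.intel.com/generic: 1   # one QAT VF from the pool shown below
```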
## Verify Node Resources Start by listing allocatable node resources for the target worker node: ``` # kubectl get node -o json | jq '.status.allocatable' { - "cpu": "93", + "cpu": "95550m", "ephemeral-storage": "452220352993", "hugepages-1Gi": "4Gi", "hugepages-2Mi": "256Mi", - "intel.com/intel_sriov_dpdk_700_series": "2", - "intel.com/intel_sriov_dpdk_800_series": "2", - "intel.com/intel_sriov_netdevice": "4", + "intel.com/node1_ens801f0_intelnics_1": "1", + "intel.com/node1_ens801f0_intelnics_2": "4", "memory": "191733164Ki", "pods": "110", "qat.intel.com/generic": "32" diff --git a/verification/device_plugins/sgx/README.md b/verification/device_plugins/sgx/README.md index 9d52a7b1..fdd0d49e 100644 --- a/verification/device_plugins/sgx/README.md +++ b/verification/device_plugins/sgx/README.md @@ -6,14 +6,14 @@ List the allocatable node resources for the target worker node: ``` # kubectl get node -o json | jq '.status.allocatable' { - "cpu": "125", + "cpu": "95550m", "ephemeral-storage": "353450007582", "hugepages-1Gi": "4Gi", "hugepages-2Mi": "256Mi", - "intel.com/ar09_01_cyp_ens801f0_intelnics_1": "1", - "intel.com/ar09_01_cyp_ens801f0_intelnics_2": "4", - "intel.com/ar09_01_cyp_ens801f0_intelnics_3": "1", - "intel.com/ar09_01_cyp_ens801f1_intelnics_1": "4", + "intel.com/node1_ens801f0_intelnics_1": "1", + "intel.com/node1_ens801f0_intelnics_2": "4", + "intel.com/node1_ens801f0_intelnics_3": "1", + "intel.com/node1_ens801f1_intelnics_1": "4", "memory": "518294736Ki", "pods": "110", "power.intel.com/balance-performance": "76", diff --git a/verification/device_plugins/sriov/README.md b/verification/device_plugins/sriov/README.md index f042a2cc..1fdc845d 100644 --- a/verification/device_plugins/sriov/README.md +++ b/verification/device_plugins/sriov/README.md @@ -44,7 +44,7 @@ Start by listing allocatable node resources for the target worker node: ``` # kubectl get node -o json | jq '.status.allocatable' { - "cpu": "93", + "cpu": "95550m", "ephemeral-storage": "452220352993", "hugepages-1Gi": "4Gi", "hugepages-2Mi": "256Mi", diff --git a/verification/networking_features/bond_cni/README.md b/verification/networking_features/bond_cni/README.md index a6b100f0..f5b8e5ee 100644 --- a/verification/networking_features/bond_cni/README.md +++ b/verification/networking_features/bond_cni/README.md @@ -7,14 +7,14 @@ The Bond CNI Plugin relies on other CNI Plugins to provide the interfaces to be Start by verifying that the Bond CNI Plugin is available on the worker nodes by connecting through SSH and checking that the Bond CNI binary is available: ``` # ll /opt/cni/bin/bond --rwxr-xr-x. 
1 root root 3836352 Feb 27 13:03 /opt/cni/bin/bond
+-rwxr-xr-x 1 root root 3892645 Sep 16 06:37 /opt/cni/bin/bond*
 ```
 Verify that VFs using the kernel driver are available on the worker nodes in the Kubernetes cluster:
 ```
 # kubectl get node -o json | jq '.status.allocatable'
 {
-  "cpu": "93",
+  "cpu": "95550m",
   "ephemeral-storage": "452220352993",
   "hugepages-1Gi": "4Gi",
   "hugepages-2Mi": "256Mi",
diff --git a/verification/networking_features/userspace_cni/README.md b/verification/networking_features/userspace_cni/README.md
index 667b56f6..a9668567 100644
--- a/verification/networking_features/userspace_cni/README.md
+++ b/verification/networking_features/userspace_cni/README.md
@@ -6,7 +6,7 @@ When Userspace CNI is enabled, an example network attachment definition is created
 ```
 # kubectl get net-attach-def
 NAME            AGE
-userspace-ovs   7d
+userspace-ovs   14h
 ```
 
 ## Deploy Workload
@@ -44,7 +44,7 @@ Deploy the pod:
 Start by verifying that a vhostuser socket has been added to the pod:
 ```
 # kubectl exec pod-userspace-1 -- ls /vhu/
-35998a9a2ce2-net1
+5dee26822a53-net1
 ```
 If there are multiple worker nodes in the cluster, check which one the pod has been deployed on:
 ```
@@ -54,15 +54,15 @@ Node: node1/
 Connect to that node using the IP found above, and verify that the vhostuser socket and interface have been added to OVS-DPDK:
 ```
 # ovs-vsctl show
-b11c98d8-080f-4cad-adaa-467256809265
+6836950b-fe14-42f7-823b-06ae680b88f4
     Bridge br0
         datapath_type: netdev
         Port br0
             Interface br0
                 type: internal
-        Port "35998a9a2ce2-net1"
-            Interface "35998a9a2ce2-net1"
+        Port "5dee26822a53-net1"
+            Interface "5dee26822a53-net1"
                 type: dpdkvhostuser
-    ovs_version: "2.17.1"
+    ovs_version: "2.17.2"
 ```
 At this point, the vhostuser socket is ready to use in the pod. The steps for using VPP as the vSwitch are similar, but instead of the Userspace CNI resource name userspace-ovs, use userspace-vpp.
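As a reference for the workload deployment above, a pod attaches to the network attachment definition through the standard Multus annotation. A minimal sketch (name, image, and command are placeholders; the repository's own pod-userspace manifest additionally mounts the vhostuser socket directory, /vhu):

```
apiVersion: v1
kind: Pod
metadata:
  name: pod-userspace-2          # hypothetical name
  annotations:
    k8s.v1.cni.cncf.io/networks: userspace-ovs   # the net-attach-def shown above
spec:
  containers:
    - name: app
      image: ubuntu:22.04        # placeholder image
      command: ["sleep", "infinity"]
```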
diff --git a/verification/nfd/README.md b/verification/nfd/README.md index cfe56c7b..d914f3fc 100644 --- a/verification/nfd/README.md +++ b/verification/nfd/README.md @@ -9,83 +9,109 @@ kube-system node-feature-discovery-worker 1 1 1 1 1 To view the Kubernetes node labels created by NFD: ``` -# kubectl label node --list –-all +# kubectl label node --list --all Listing labels for Node./node1: - feature.node.kubernetes.io/pci-0300_1a03.present=true + feature.node.kubernetes.io/cpu-model.id=143 + feature.node.kubernetes.io/cpu-cpuid.ENQCMD=true + feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true feature.node.kubernetes.io/cpu-rdt.RDTCMT=true - feature.node.kubernetes.io/cpu-pstate.status=active + feature.node.kubernetes.io/kernel-version.revision=0 + feature.node.kubernetes.io/storage-nonrotationaldisk=true feature.node.kubernetes.io/kernel-config.NO_HZ=true - feature.node.kubernetes.io/cpu-cpuid.FMA3=true - feature.node.kubernetes.io/cpu-cpuid.AVX2=true - feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true - feature.node.kubernetes.io/cpu-cpuid.STIBP=true - feature.node.kubernetes.io/cpu-cpuid.ADX=true + feature.node.kubernetes.io/cpu-cpuid.GFNI=true + feature.node.kubernetes.io/cpu-pstate.scaling_governor=performance + feature.node.kubernetes.io/cpu-cpuid.AVX512BF16=true + feature.node.kubernetes.io/cpu-cpuid.AMXINT8=true feature.node.kubernetes.io/cpu-rdt.RDTMON=true - kubernetes.io/arch=amd64 - feature.node.kubernetes.io/system-os_release.VERSION_ID=8.5 - feature.node.kubernetes.io/memory-numa=true - feature.node.kubernetes.io/cpu-cpuid.SHA=true - feature.node.kubernetes.io/intel.sgx=true - feature.node.kubernetes.io/kernel-version.major=4 - feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true - feature.node.kubernetes.io/system-os_release.VERSION_ID.major=8 + feature.node.kubernetes.io/cpu-cpuid.AMXBF16=true + feature.node.kubernetes.io/network-sriov.capable=true + feature.node.kubernetes.io/kernel-version.minor=14 + feature.node.kubernetes.io/system-os_release.VERSION_ID=9.0 + feature.node.kubernetes.io/cpu-rdt.RDTMBM=true + feature.node.kubernetes.io/kernel-version.major=5 + intel.feature.node.kubernetes.io/dlb=true + feature.node.kubernetes.io/system-os_release.VERSION_ID.major=9 + feature.node.kubernetes.io/cpu-pstate.turbo=true + feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true sgx.configured=true - feature.node.kubernetes.io/kernel-version.revision=0 - feature.node.kubernetes.io/cpu-cpuid.AVX=true + feature.node.kubernetes.io/cpu-power.sst_bf.enabled=true + kubernetes.io/arch=amd64 + feature.node.kubernetes.io/cpu-cpuid.AMXTILE=true + intel.feature.node.kubernetes.io/qat=true + feature.node.kubernetes.io/cpu-pstate.status=active feature.node.kubernetes.io/cpu-cpuid.AESNI=true + feature.node.kubernetes.io/cpu-cpuid.MOVDIR64B=true + feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=0 + feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true + feature.node.kubernetes.io/cpu-cpuid.SERIALIZE=true + kubernetes.io/os=linux + feature.node.kubernetes.io/cpu-cpuid.FMA3=true + feature.node.kubernetes.io/cpu-cpuid.AVX=true + intel.feature.node.kubernetes.io/dsa=true + feature.node.kubernetes.io/cpu-cpuid.MOVBE=true + feature.node.kubernetes.io/cpu-cpuid.TSXLDTRK=true + feature.node.kubernetes.io/cpu-rdt.RDTL2CA=true + feature.node.kubernetes.io/cpu-cpuid.MOVDIRI=true + kubernetes.io/hostname=node1 + feature.node.kubernetes.io/cpu-model.vendor_id=Intel + feature.node.kubernetes.io/cpu-cpuid.AVX512BITALG=true + feature.node.kubernetes.io/cpu-cpuid.VPCLMULQDQ=true 
feature.node.kubernetes.io/cpu-rdt.RDTL3CA=true feature.node.kubernetes.io/cpu-cpuid.VAES=true - feature.node.kubernetes.io/cpu-cpuid.AVX512VNNI=true - feature.node.kubernetes.io/cpu-sgx.enabled=true - feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true - feature.node.kubernetes.io/cpu-cpuid.VPCLMULQDQ=true - feature.node.kubernetes.io/network-sriov.capable=true - feature.node.kubernetes.io/pci-1200_8086.sriov.capable=true - feature.node.kubernetes.io/pci-1200_8086.present=true - feature.node.kubernetes.io/cpu-cpuid.IBPB=true - feature.node.kubernetes.io/cpu-cpuid.AVX512VPOPCNTDQ=true + feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true feature.node.kubernetes.io/network-sriov.configured=true - storage=minio - feature.node.kubernetes.io/cpu-cpuid.GFNI=true - beta.kubernetes.io/arch=amd64 + feature.node.kubernetes.io/cpu-cpuid.AVX512VPOPCNTDQ=true feature.node.kubernetes.io/cpu-cpuid.WBNOINVD=true - feature.node.kubernetes.io/system-os_release.ID=rhel - feature.node.kubernetes.io/cpu-cpuid.AVX512BITALG=true - feature.node.kubernetes.io/storage-nonrotationaldisk=true - feature.node.kubernetes.io/cpu-pstate.scaling_governor=performance - app=kmra - feature.node.kubernetes.io/cpu-pstate.turbo=true - feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true - feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true - feature.node.kubernetes.io/kernel-version.full=4.18.0-348.el8.x86_64 - feature.node.kubernetes.io/cpu-cpuid.AVX512VBMI2=true - kubernetes.io/hostname=node1 feature.node.kubernetes.io/cpu-cpuid.AVX512VBMI=true - feature.node.kubernetes.io/cpu-rdt.RDTMBA=true - qat.configured=true + feature.node.kubernetes.io/cpu-model.family=6 + feature.node.kubernetes.io/cpu-cpuid.AVX512FP16=true + feature.node.kubernetes.io/pci-0300_1a03.present=true + feature.node.kubernetes.io/cpu-cpuid.SHA=true + feature.node.kubernetes.io/cpu-cpuid.AVX512VBMI2=true + feature.node.kubernetes.io/cpu-cpuid.WAITPKG=true + feature.node.kubernetes.io/cpu-cpuid.ADX=true feature.node.kubernetes.io/cpu-cpuid.VMX=true - feature.node.kubernetes.io/pci-0b40_8086.sriov.capable=true - feature.node.kubernetes.io/intel.qat=true + feature.node.kubernetes.io/system-os_release.ID=rocky + qat.configured=true + feature.node.kubernetes.io/memory-numa=true + feature.node.kubernetes.io/cpu-cpuid.STIBP=true + intel.feature.node.kubernetes.io/sgx=true + feature.node.kubernetes.io/cpu-cpuid.AVX512VP2INTERSECT=true + feature.node.kubernetes.io/cpu-cpuid.LAHF=true + feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true + feature.node.kubernetes.io/kernel-version.full=5.14.0-70.13.1.el9_0.x86_64 + feature.node.kubernetes.io/cpu-sgx.enabled=true + feature.node.kubernetes.io/cpu-hardware_multithreading=true + feature.node.kubernetes.io/cpu-cpuid.AVX512IFMA=true feature.node.kubernetes.io/pci-0b40_8086.present=true + cndp=true + feature.node.kubernetes.io/cpu-cpuid.FXSR=true + feature.node.kubernetes.io/cpu-cpuid.X87=true + beta.kubernetes.io/os=linux + feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true feature.node.kubernetes.io/cpu-cstate.enabled=true + feature.node.kubernetes.io/cpu-rdt.RDTMBA=true + feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true + feature.node.kubernetes.io/pci-0b40_8086.sriov.capable=true + beta.kubernetes.io/arch=amd64 + feature.node.kubernetes.io/cpu-cpuid.CLDEMOTE=true + feature.node.kubernetes.io/cpu-cpuid.IBPB=true + feature.node.kubernetes.io/cpu-cpuid.AVX2=true + feature.node.kubernetes.io/cpu-cpuid.CETIBT=true + feature.node.kubernetes.io/cpu-cpuid.CETSS=true feature.node.kubernetes.io/cpu-cpuid.AVX512F=true - 
feature.node.kubernetes.io/cpu-hardware_multithreading=true - kubernetes.io/os=linux + feature.node.kubernetes.io/cpu-cpuid.AVX512VNNI=true + feature.node.kubernetes.io/cpu-cpuid.XSAVE=true + feature.node.kubernetes.io/cpu-cpuid.SCE=true + feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true +Listing labels for Node./controller1: + node.kubernetes.io/exclude-from-external-load-balancers= + beta.kubernetes.io/arch=amd64 beta.kubernetes.io/os=linux - intel.power.node=true - feature.node.kubernetes.io/kernel-version.minor=18 - feature.node.kubernetes.io/cpu-cpuid.AVX512IFMA=true - feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=5 - feature.node.kubernetes.io/cpu-rdt.RDTMBM=true -Listing labels for Node./controller: kubernetes.io/arch=amd64 - kubernetes.io/hostname=controller + kubernetes.io/hostname=controller1 kubernetes.io/os=linux node-role.kubernetes.io/control-plane= - node-role.kubernetes.io/master= - node.kubernetes.io/exclude-from-external-load-balancers= - beta.kubernetes.io/arch=amd64 - beta.kubernetes.io/os=linux ``` The node labels can be used when provisioning a pod. The provided pod manifest [pod-nfd.yml](pod-nfd.yml) can be used to test this. The pod is scheduled only if there is a node with QAT available. The content of the file is: ``` diff --git a/verification/power_manager/README.md b/verification/power_manager/README.md index facb03bc..e89eb214 100644 --- a/verification/power_manager/README.md +++ b/verification/power_manager/README.md @@ -11,6 +11,18 @@ power-node-agent-9dkch 2/2 Running 0 34m ``` **Note:** Each profile is deployed in a separate pod. +Check the power profiles: +``` +# kubectl get powerprofiles -n intel-power +NAME AGE +balance-performance 30m +balance-performance-node1 30m +balance-power 30m +balance-power-node1 30m +performance 30m +performance-node1 30m +``` + You can check the frequencies that will be set by the balance-performance Power Profile ``` # kubectl get PowerProfiles -n intel-power balance-performance-node1 -o yaml diff --git a/verification/qat_openssl/README.md b/verification/qat_openssl/README.md index 02faea19..86423be0 100644 --- a/verification/qat_openssl/README.md +++ b/verification/qat_openssl/README.md @@ -2,20 +2,18 @@ Check the version of Intel QAT crypto engine present on the system. ``` # openssl engine -v qatengine -(qatengine) Reference implementation of QAT crypto engine(qat_sw) v0.6.10 +(qatengine) Reference implementation of QAT crypto engine(qat_sw) v0.6.15 ENABLE_EXTERNAL_POLLING, POLL, ENABLE_HEURISTIC_POLLING, - GET_NUM_REQUESTS_IN_FLIGHT, INIT_ENGINE + GET_NUM_REQUESTS_IN_FLIGHT, INIT_ENGINE, SW_ALGO_BITMAP ``` Run an OpenSSL speed test.
``` # openssl speed rsa2048 -Doing 2048 bits private rsa's for 10s: -10364 2048 bits private RSA's in 9.97s -Doing 2048 bits public rsa's for 10s: -354887 2048 bits public RSA's in 9.98s -OpenSSL 1.1.1k FIPS 25 Mar 2021 -built on: Fri Nov 12 17:53:47 2021 UTC +Doing 2048 bits private rsa's for 10s: 10364 2048 bits private RSA's in 9.97s +Doing 2048 bits public rsa's for 10s: 354887 2048 bits public RSA's in 9.98s +version: 3.0.2 +built on: Mon Jul 4 11:20:23 2022 UTC options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config" sign verify sign/s verify/s @@ -26,12 +24,10 @@ Repeat the OpenSSL speed test with QAT engine for increased performance. ``` # openssl speed -engine qatengine -async_jobs 8 rsa2048 engine "qatengine" set. -Doing 2048 bits private rsa's for 10s: -42728 2048 bits private RSA's in 9.93s -Doing 2048 bits public rsa's for 10s: -797952 2048 bits public RSA's in 9.14s -OpenSSL 1.1.1k FIPS 25 Mar 2021 -built on: Fri Nov 12 17:53:47 2021 UTC +Doing 2048 bits private rsa's for 10s: 42728 2048 bits private RSA's in 9.93s +Doing 2048 bits public rsa's for 10s: 797952 2048 bits public RSA's in 9.14s +version: 3.0.2 +built on: Mon Jul 4 11:20:23 2022 UTC options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config" sign verify sign/s verify/s diff --git a/verification/sriov_network_operator/README.md b/verification/sriov_network_operator/README.md index 510ca7d5..8d6ef399 100644 --- a/verification/sriov_network_operator/README.md +++ b/verification/sriov_network_operator/README.md @@ -6,7 +6,7 @@ Start by listing allocatable node resources for the target worker node: ``` # kubectl get node 
-o json | jq '.status.allocatable' { - "cpu": "125", + "cpu": "95550m", "ephemeral-storage": "189274027310", "hugepages-1Gi": "4Gi", "hugepages-2Mi": "256Mi", @@ -25,7 +25,7 @@ To get more details about each of the SR-IOV Network Operator resources, we can ``` # kubectl get SriovNetworkNodeState -n sriov-network-operator -o json | jq '.spec' { - "dpConfigVersion": "3347", + "dpConfigVersion": "5941", "interfaces": [ { "linkType": "eth", diff --git a/verification/tas/README.md b/verification/tas/README.md index d18ae180..89bcdbd0 100644 --- a/verification/tas/README.md +++ b/verification/tas/README.md @@ -1,5 +1,5 @@ # Check Telemetry Aware Scheduler -A Health Metric Demo Policy [https://github.com/intel/platform-aware-scheduling/blob/master/telemetry-aware-scheduling/docs/health-metric-example.md] is deployed for Telemetry Aware Scheduler (TAS) when `tas_enable_demo_policy: true`, in `group_vars/all.yml` as shown below: +A [Health Metric Demo Policy](https://github.com/intel/platform-aware-scheduling/blob/master/telemetry-aware-scheduling/docs/health-metric-example.md) is deployed for Telemetry Aware Scheduler (TAS) when `tas_enable_demo_policy: true` is set in `group_vars/all.yml`, as shown below: ``` # Intel Telemetry Aware Scheduling tas_enabled: true @@ -119,7 +119,7 @@ The pod should fail to schedule and end up in state “Pending” as shown below ``` # kubectl get pods -n kube-system | grep tas-test NAME READY STATUS RESTARTS AGE - tas-test-xxxx-yyyy 1/1 Pending 0 3m + tas-test-xxxx-yyyy 0/1 Pending 0 3m ``` As `node_health_metric` is set to 1 (dontschedule), this is expected. Delete the pod before continuing with the next test: @@ -128,7 +128,7 @@ Delete the pod before continuing with the next test: ``` ## Check Deschedule Policy -To see the impact of the descheduling policy, use a component called descheduler. For more details, visit [https://github.com/intel/platform-aware-scheduling/blob/master/telemetry-aware-scheduling/docs/health-metric-example.md#seeing-the-impact]. +To see the impact of the descheduling policy, use a component called the descheduler. For more details, see [Seeing the impact](https://github.com/intel/platform-aware-scheduling/blob/master/telemetry-aware-scheduling/docs/health-metric-example.md#seeing-the-impact).
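For context on how the strategies referenced in this file fit together, a TASPolicy groups dontschedule, scheduleonmetric, and deschedule rules in a single custom resource. The sketch below assumes the `node_health_metric` semantics described above (1 = dontschedule, 0 = scheduleonmetric); the deschedule threshold is an assumption for illustration, not a value copied from the demo policy:

```yaml
apiVersion: telemetry.intel.com/v1alpha1
kind: TASPolicy
metadata:
  name: demo-policy
  namespace: default
spec:
  strategies:
    dontschedule:                        # block scheduling when the metric equals 1
      rules:
      - metricname: node_health_metric
        operator: Equals
        target: 1
    scheduleonmetric:                    # prefer nodes with the lowest metric value
      rules:
      - metricname: node_health_metric
        operator: LessThan
    deschedule:                          # mark nodes for the descheduler when unhealthy
      rules:
      - metricname: node_health_metric
        operator: GreaterThan
        target: 1
```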
Start by setting `node_health_metric` to 0 (scheduleonmetric) on all worker nodes as follows: ``` diff --git a/verification/topology_manager/README.md b/verification/topology_manager/README.md index ac9f7f6a..c5c0e7de 100644 --- a/verification/topology_manager/README.md +++ b/verification/topology_manager/README.md @@ -11,17 +11,11 @@ After the pod is in a terminated state, the Kubernetes scheduler will not attemp To verify that Topology Manager is running, use the following command: ``` # journalctl | grep topology_manager -Feb 03 09:30:38 ar09-09-cyp kubelet[79709]: I0203 09:30:38.418222 79709 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="best-effort" topologyScopeName="container" -Feb 03 09:30:38 ar09-09-cyp kubelet[79709]: I0203 09:30:38.620193 79709 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:38 ar09-09-cyp kubelet[79709]: I0203 09:30:38.636035 79709 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:38 ar09-09-cyp kubelet[79709]: I0203 09:30:38.652310 79709 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:44 ar09-09-cyp kubelet[80859]: I0203 09:30:44.589837 80859 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="best-effort" topologyScopeName="container" -Feb 03 09:30:44 ar09-09-cyp kubelet[80859]: I0203 09:30:44.761846 80859 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:44 ar09-09-cyp kubelet[80859]: I0203 09:30:44.761979 80859 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:44 ar09-09-cyp kubelet[80859]: I0203 09:30:44.762071 80859 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:30:57 ar09-09-cyp kubelet[80859]: I0203 09:30:57.175938 80859 topology_manager.go:200] "Topology Admit Handler" -Feb 03 09:31:04 ar09-09-cyp kubelet[85026]: I0203 09:31:04.125414 85026 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="best-effort" topologyScopeName="container" -Feb 03 09:31:04 ar09-09-cyp kubelet[85026]: I0203 09:31:04.300730 85026 topology_manager.go:200] "Topology Admit Handler" +Sep 19 05:58:35 node1 kubelet[137442]: I0919 05:58:35.380933 137442 topology_manager.go:200] "Topology Admit Handler" +Sep 19 06:58:35 node1 kubelet[137442]: I0919 06:58:35.161388 137442 topology_manager.go:200] "Topology Admit Handler" +Sep 19 07:58:35 node1 kubelet[137442]: I0919 07:58:35.059673 137442 topology_manager.go:200] "Topology Admit Handler" +Sep 19 08:58:42 node1 kubelet[137442]: I0919 08:58:42.098158 137442 topology_manager.go:200] "Topology Admit Handler" +Sep 19 09:58:42 node1 kubelet[137442]: I0919 09:58:42.903203 137442 topology_manager.go:200] "Topology Admit Handler" ``` ## Change Topology Manager Policy: Redeploy Kubernetes Playbook
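The redeploy steps themselves are not shown in this hunk. The usual flow is to change the policy in the cluster configuration and re-run the deployment playbook; a minimal sketch, assuming the policy is exposed through group_vars variables such as the ones below (the names are assumptions based on this repository's conventions):

```yaml
# group_vars/all.yml -- variable names assumed, verify against your profile's generated template
topology_manager_enabled: true
topology_manager_policy: "single-numa-node"   # none | best-effort | restricted | single-numa-node
```

Re-running the profile playbook used for the original deployment (e.g. `ansible-playbook -i inventory.ini playbooks/<profile>.yml`) then applies the new policy to the kubelet configuration, which can be verified again with the `journalctl | grep topology_manager` command above.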