Fix CML 2.8 deployments on AWS and Azure #29

Open: wants to merge 9 commits into base branch dev
3 changes: 2 additions & 1 deletion config.yml
@@ -27,6 +27,7 @@ aws:
# When specifying a VPC ID below then this prefix must exist on that VPC!
public_vpc_ipv4_cidr: 10.0.0.0/16
enable_ebs_encryption: false
allowed_ipv4_subnets: ["0.0.0.0/0"]
#
# Leave empty to create a custom VPC / Internet gateway, or provide the IDs
# of the VPC / gateway to use, they must exist and properly associated.
@@ -45,12 +46,12 @@ azure:
size_compute: unused_at_the_moment
storage_account: storage-account-name
container_name: container-name
allowed_ipv4_subnets: ["*"]

common:
disk_size: 64
controller_hostname: cml-controller
key_name: ssh-key-name
allowed_ipv4_subnets: ["0.0.0.0/0"]
enable_patty: true

cluster:
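With `allowed_ipv4_subnets` moved out of `common` into the `aws` and `azure` sections, the two providers use different notations: the AWS list feeds security-group `cidr_blocks`, whereas the Azure list feeds NSG `source_address_prefixes` and defaults to `"*"`. A pre-flight check like the Python sketch below (hypothetical, not part of this PR) could catch a wildcard or malformed prefix in the AWS list before running Terraform. It assumes `config.yml` has already been loaded into a dict; the name `check_allowed_subnets` is illustrative.

```python
import ipaddress


def check_allowed_subnets(cfg: dict) -> list:
    """Return a list of problems in the per-provider allowed_ipv4_subnets.

    Hypothetical pre-flight check: AWS entries must be IPv4 CIDRs,
    Azure entries may additionally be the wildcard "*".
    """
    problems = []
    for provider, wildcard_ok in (("aws", False), ("azure", True)):
        subnets = cfg.get(provider, {}).get("allowed_ipv4_subnets")
        if not subnets:
            problems.append(f"{provider}: allowed_ipv4_subnets is missing or empty")
            continue
        for entry in subnets:
            if wildcard_ok and entry == "*":
                continue  # Azure-only wildcard meaning "any source"
            try:
                # strict=True also rejects entries with host bits set, e.g. "10.0.0.1/16"
                net = ipaddress.ip_network(entry, strict=True)
            except ValueError:
                problems.append(f"{provider}: {entry!r} is not a valid IPv4 CIDR")
                continue
            if net.version != 4:
                problems.append(f"{provider}: {entry!r} is not IPv4")
    return problems
```

For example, passing `{"aws": {"allowed_ipv4_subnets": ["*"]}, "azure": {"allowed_ipv4_subnets": ["*"]}}` reports exactly one problem, because only the Azure section accepts the wildcard.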
8 changes: 5 additions & 3 deletions documentation/AWS.md
@@ -367,7 +367,7 @@ AWS CLI configurations are stored in `$HOME/.aws`.

If everything was configured correct then you should be able to list instances (remember that we permitted EC2 access for the deployment users):

```
```bash
$ aws ec2 describe-instances
{
"Reservations": []
@@ -379,7 +379,7 @@ As there are no instances running in this case, the output is empty. The importa

### Configuration file

CML specific settings are specified in the configuration file `config.yml`. See also [VPC support](#vpc-support) and [Cluster support](#cluster-suport) sections further down in the document.
CML specific settings are specified in the configuration file `config.yml`. See also [VPC support](#vpc-support) and [Cluster support](#cluster-support) sections further down in the document.

#### AWS section

@@ -515,7 +515,7 @@ Start the tool by providing the bucket name as an argument and the location of t

The tool will then display a simple dialog where the images which should be copied to the bucket can be selected:

![](../images/upload-refplat.png)
![Dialog preview](../images/upload-refplat.png)

After selecting OK the upload process will be started immediately. To abort the process, Ctrl-C can be used.

@@ -539,6 +539,8 @@ export TF_VAR_secret_key="your-secret-key-string-from-iam"

Alternatively, it's also possible to provide values for variables via a file called `terraform.tfvars` file. There are various ways how to define / set variables with Terraform. See the Terraform [documentation](https://developer.hashicorp.com/terraform/language/values/variables#assigning-values-to-root-module-variables) for additional details.

In addition to the above methods, Terraform can also automatically retrieve authentication credentials from the AWS configuration files located in the .aws folder. This includes credentials set up by running `aws configure`, which stores your access key and secret key in the `~/.aws/credentials` file. This method allows Terraform to use the same credentials configured for the AWS CLI, [documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs).

## Lifecycle management

When all requirements are met, an instance can be deployed using Terraform.
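The paragraph added in this hunk notes that Terraform can reuse the credentials that `aws configure` writes to `~/.aws/credentials`. Before relying on that fallback, it can help to confirm the file actually holds both keys for the intended profile. The Python sketch below parses the file with `configparser`; it assumes the standard INI layout (a `[default]` section with `aws_access_key_id` and `aws_secret_access_key`), honours the `AWS_SHARED_CREDENTIALS_FILE` override, and the helper name `has_static_credentials` is hypothetical. It does not cover other credential sources such as environment variables, SSO, or instance roles.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def has_static_credentials(profile: str = "default", path: Optional[str] = None) -> bool:
    """Return True if the shared credentials file defines both keys for `profile`.

    Hypothetical helper: it only inspects the INI file written by `aws configure`.
    """
    # Explicit path wins, then AWS_SHARED_CREDENTIALS_FILE, then the default location.
    raw = path or os.environ.get("AWS_SHARED_CREDENTIALS_FILE", "~/.aws/credentials")
    cred_path = Path(raw).expanduser()
    parser = configparser.ConfigParser()
    # read() silently skips missing files and returns the list of files it parsed.
    if not parser.read(cred_path):
        return False
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(section.get("aws_secret_access_key"))
```

On a workstation, `has_static_credentials()` returning `False` suggests that Terraform will need one of the other methods described above.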
2 changes: 2 additions & 0 deletions main.tf
@@ -28,6 +28,8 @@ module "deploy" {
source = "./modules/deploy"
cfg = local.cfg
extras = local.extras
azure_subscription_id = var.subscription_id
azure_tenant_id = var.tenant_id
}

provider "cml2" {
17 changes: 9 additions & 8 deletions modules/deploy/aws/main.tf
@@ -87,7 +87,7 @@ locals {
"from_port" : 1122,
"to_port" : 1122
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -98,7 +98,7 @@ locals {
"from_port" : 22,
"to_port" : 22
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -109,7 +109,7 @@ locals {
"from_port" : 9090,
"to_port" : 9090
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -120,7 +120,7 @@ locals {
"from_port" : 80,
"to_port" : 80
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -131,7 +131,7 @@ locals {
"from_port" : 443,
"to_port" : 443
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -145,7 +145,7 @@ locals {
"from_port" : 2000,
"to_port" : 7999
"protocol" : "tcp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
@@ -156,7 +156,7 @@ locals {
"from_port" : 2000,
"to_port" : 7999
"protocol" : "udp",
"cidr_blocks" : var.options.cfg.common.allowed_ipv4_subnets,
"cidr_blocks" : var.options.cfg.aws.allowed_ipv4_subnets,
"ipv6_cidr_blocks" : [],
"prefix_list_ids" : [],
"security_groups" : [],
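Every ingress rule touched in this `locals` block now takes its `cidr_blocks` from `cfg.aws.allowed_ipv4_subnets` instead of `cfg.common.allowed_ipv4_subnets`. Because the rules differ only in port range and protocol, the resulting structure can be illustrated compactly. The Python sketch below rebuilds an equivalent list of rule dicts from a port table; it mirrors only the fields visible in this diff, and `build_ingress_rules` is a hypothetical name, not part of the module.

```python
# (from_port, to_port, protocol) for each rule shown in the hunks above.
PORTS = [
    (1122, 1122, "tcp"),
    (22, 22, "tcp"),
    (9090, 9090, "tcp"),
    (80, 80, "tcp"),
    (443, 443, "tcp"),
    (2000, 7999, "tcp"),  # PaTTY range (cml_patty_tcp in the Azure module)
    (2000, 7999, "udp"),  # PaTTY range (cml_patty_udp in the Azure module)
]


def build_ingress_rules(cfg: dict) -> list:
    """Illustrative rebuild of the ingress rule list from the AWS section of the config."""
    cidrs = cfg["aws"]["allowed_ipv4_subnets"]
    return [
        {
            "from_port": lo,
            "to_port": hi,
            "protocol": proto,
            "cidr_blocks": list(cidrs),  # fresh copy per rule, no shared list
            "ipv6_cidr_blocks": [],
            "prefix_list_ids": [],
            "security_groups": [],
        }
        for lo, hi, proto in PORTS
    ]
```

Reading a single provider-specific key in every rule means restricting AWS access is now a one-line change in the `aws` section of `config.yml`.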
@@ -279,6 +279,7 @@ resource "aws_network_interface" "pub_int_cml" {
resource "aws_eip" "server_eip" {
network_interface = aws_network_interface.pub_int_cml.id
tags = { "Name" = "CML-controller-eip-${var.options.rand_id}", "device" = "server" }
depends_on = [aws_instance.cml_controller]
}

#------------- compute subnet, NAT GW, routing and interfaces -----------------
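The added `depends_on` gives `aws_eip.server_eip` an explicit edge to `aws_instance.cml_controller`, on top of the implicit dependency on the network interface it references. The ordering effect can be sketched with Python's `graphlib` (3.9+). The graph below is a simplified assumption containing only the three resources named here; that the instance attaches `pub_int_cml` is inferred from the names, not shown in this hunk.

```python
from graphlib import TopologicalSorter


def creation_order(with_depends_on: bool) -> list:
    # Each key maps to the set of resources it depends on (its predecessors).
    graph = {
        "aws_network_interface.pub_int_cml": set(),
        # Assumption: the controller instance attaches the public interface.
        "aws_instance.cml_controller": {"aws_network_interface.pub_int_cml"},
        # Implicit edge from `network_interface = aws_network_interface.pub_int_cml.id`.
        "aws_eip.server_eip": {"aws_network_interface.pub_int_cml"},
    }
    if with_depends_on:
        # Explicit edge added by `depends_on = [aws_instance.cml_controller]`.
        graph["aws_eip.server_eip"].add("aws_instance.cml_controller")
    return list(TopologicalSorter(graph).static_order())
```

Without the extra edge, the instance and the EIP become ready at the same time, so Terraform is free to create them concurrently; with it, the EIP is always handled after the instance.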
@@ -498,7 +499,7 @@ data "aws_ami" "ubuntu" {

filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
values = ["ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"]
}

filter {
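The AMI lookup switches from Ubuntu 20.04 (Focal) to Ubuntu 24.04 (Noble) images published under the `hvm-ssd-gp3` prefix, matched via a `*` wildcard on the image name. Assuming the data source also sets `most_recent = true` (not visible in this hunk), the newest matching image is selected. The Python sketch below imitates that selection with `fnmatchcase` over hypothetical image records; the names and dates in the usage are invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import Optional

NAME_FILTER = "ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"


def newest_matching(images: list, pattern: str = NAME_FILTER) -> Optional[dict]:
    """Pick the most recent image whose Name matches the wildcard (case-sensitive)."""
    matches = [img for img in images if fnmatchcase(img["Name"], pattern)]
    # CreationDate is an ISO 8601 UTC string, so lexical order equals time order.
    return max(matches, key=lambda img: img["CreationDate"], default=None)


images = [  # hypothetical records
    {"Name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240101",
     "CreationDate": "2024-01-02T00:00:00.000Z"},
    {"Name": "ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240601",
     "CreationDate": "2024-06-02T00:00:00.000Z"},
    {"Name": "ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240901",
     "CreationDate": "2024-09-02T00:00:00.000Z"},
]
print(newest_matching(images)["Name"])
```

Note that the old Focal name no longer matches at all, since the path segment changed from `hvm-ssd/` to `hvm-ssd-gp3/`.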
2 changes: 1 addition & 1 deletion modules/deploy/aws/output.tf
@@ -5,7 +5,7 @@
#

output "public_ip" {
value = aws_instance.cml_controller.public_ip
value = aws_eip.server_eip.public_ip
}

output "sas_token" {
10 changes: 5 additions & 5 deletions modules/deploy/azure/main.tf
@@ -105,7 +105,7 @@ resource "azurerm_network_security_rule" "cml_std" {
protocol = "Tcp"
source_port_range = "*"
destination_port_ranges = [22, 80, 443, 1122, 9090]
source_address_prefix = "*"
source_address_prefixes = var.options.cfg.azure.allowed_ipv4_subnets
destination_address_prefix = "*"
resource_group_name = data.azurerm_resource_group.cml.name
network_security_group_name = azurerm_network_security_group.cml.name
@@ -120,7 +120,7 @@ resource "azurerm_network_security_rule" "cml_patty_tcp" {
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "2000-7999"
source_address_prefix = "*"
source_address_prefixes = var.options.cfg.azure.allowed_ipv4_subnets
destination_address_prefix = "*"
resource_group_name = data.azurerm_resource_group.cml.name
network_security_group_name = azurerm_network_security_group.cml.name
@@ -135,7 +135,7 @@ resource "azurerm_network_security_rule" "cml_patty_udp" {
protocol = "Udp"
source_port_range = "*"
destination_port_range = "2000-7999"
source_address_prefix = "*"
source_address_prefixes = var.options.cfg.azure.allowed_ipv4_subnets
destination_address_prefix = "*"
resource_group_name = data.azurerm_resource_group.cml.name
network_security_group_name = azurerm_network_security_group.cml.name
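On Azure the three NSG rules switch from the singular `source_address_prefix = "*"` to the plural `source_address_prefixes`, fed from `cfg.azure.allowed_ipv4_subnets`; the `["*"]` default in `config.yml` presumably keeps the previous allow-all behaviour. The Python sketch below only illustrates the intended matching semantics (a source is admitted if any listed prefix contains it, with `"*"` meaning any address). It is not how Azure evaluates NSG rules, and `source_allowed` is a hypothetical helper.

```python
import ipaddress


def source_allowed(source_ip: str, prefixes: list) -> bool:
    """Illustrative check: is source_ip covered by any entry in prefixes?"""
    addr = ipaddress.ip_address(source_ip)
    for prefix in prefixes:
        if prefix == "*":
            return True  # wildcard: any source address
        # strict=False tolerates host bits, e.g. "203.0.113.5/24"
        if addr in ipaddress.ip_network(prefix, strict=False):
            return True
    return False
```

With the default `["*"]` every source passes, whereas a list such as `["203.0.113.0/24"]` limits SSH, HTTPS, and PaTTY access to that range.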
@@ -235,8 +235,8 @@ resource "azurerm_linux_virtual_machine" "cml" {
# https://canonical-azure.readthedocs-hosted.com/en/latest/azure-explanation/daily-vs-release-images/
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-focal"
sku = "20_04-lts"
offer = "ubuntu-24_04-lts"
sku = "server"
version = "latest"
}

2 changes: 1 addition & 1 deletion modules/deploy/azure/output.tf
@@ -5,7 +5,7 @@
#

output "public_ip" {
value = azurerm_linux_virtual_machine.cml.public_ip_address
value = azurerm_public_ip.cml.ip_address
}

output "sas_token" {
37 changes: 30 additions & 7 deletions modules/deploy/data/cml.sh
@@ -16,7 +16,12 @@ source /provision/vars.sh

function setup_pre_aws() {
export AWS_DEFAULT_REGION=${CFG_AWS_REGION}
apt-get install -y awscli
apt-get install -y unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
./aws/install
rm -f awscliv2.zip
rm -rf aws/
}

function setup_pre_azure() {
@@ -25,6 +30,23 @@ function setup_pre_azure() {
chmod a+x /usr/local/bin/azcopy
}

function wait_for_network_manager() {
counter=0
max_wait=60

while ! systemctl is-active --quiet NetworkManager && [ $counter -lt $max_wait ]; do
echo "Waiting for NetworkManager to become active..."
sleep 5
counter=$((counter + 5))
done

if systemctl is-active --quiet NetworkManager; then
echo "NetworkManager is active."
else
echo "NetworkManager did not become active after $max_wait seconds."
fi
}
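The new `wait_for_network_manager` polls `systemctl is-active --quiet NetworkManager` every 5 seconds for at most 60 seconds and only logs the outcome, so provisioning continues even if the service never comes up. The same bounded-polling pattern is sketched below in Python. The probe and sleep functions are injected, an assumption made so the logic can be exercised without systemd; `wait_until` is a hypothetical name.

```python
import subprocess
import time
from typing import Callable


def network_manager_active() -> bool:
    # Same probe as the shell function: exit status 0 means "active".
    result = subprocess.run(["systemctl", "is-active", "--quiet", "NetworkManager"])
    return result.returncode == 0


def wait_until(probe: Callable[[], bool], max_wait: int = 60, interval: int = 5,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() every `interval` seconds, giving up after `max_wait` seconds."""
    waited = 0
    while not probe() and waited < max_wait:
        print("Waiting for NetworkManager to become active...")
        sleep(interval)
        waited += interval
    # Re-check once after the loop, as the shell version does before logging.
    return probe()
```

On a provisioned host this would be called as `wait_until(network_manager_active)`; returning a boolean (rather than only printing, as the shell function does) would let a caller decide whether a timeout should abort provisioning.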

function base_setup() {

# Check if this device is a controller
@@ -66,8 +88,9 @@ function base_setup() {
apt-get install -y /tmp/*.deb
# Fixing NetworkManager in netplan, and interface association in virl2-base-config.yml
/provision/interface_fix.py
systemctl restart network-manager
systemctl restart NetworkManager
netplan apply
wait_for_network_manager
# Fix for the headless setup (tty remove as the cloud VM has none)
sed -i '/^Standard/ s/^/#/' /lib/systemd/system/virl2-initial-setup.service
touch /etc/.virl2_unconfigured
@@ -98,9 +121,8 @@ function base_setup() {
exit 1
fi

# for good measure, apply the network config again
netplan apply
systemctl enable --now ssh.service
wait_for_network_manager

# clean up software .pkg / .deb packages
rm -f /provision/*.pkg /provision/*.deb /tmp/*.deb
@@ -110,9 +132,10 @@ function base_setup() {
/usr/local/bin/virl2-bridge-setup.py --delete
sed -i /usr/local/bin/virl2-bridge-setup.py -e '2iexit()'
# remove the CML specific netplan config
rm /etc/netplan/00-cml2-base.yaml
find /etc/netplan/ -maxdepth 1 -type f -name '*.yaml' ! -name '50-cloud-init.yaml' -exec rm -f {} +
# apply to ensure gateway selection below works
netplan apply
wait_for_network_manager
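The single `rm /etc/netplan/00-cml2-base.yaml` is replaced by a `find` that deletes every top-level `*.yaml` file in `/etc/netplan/` except `50-cloud-init.yaml`, so only the cloud-init configuration remains when `netplan apply` runs. When checking a host by hand, a dry run that only lists what would be deleted can be useful. The Python sketch below does that; `netplan_files_to_remove` is a hypothetical helper, not part of the provisioning scripts.

```python
from pathlib import Path


def netplan_files_to_remove(directory: str = "/etc/netplan",
                            keep: str = "50-cloud-init.yaml") -> list:
    """Dry run of the find command: list top-level *.yaml files other than `keep`."""
    # Path.glob("*.yaml") does not recurse, like `find -maxdepth 1`.
    # is_file() approximates `-type f` (it follows symlinks, find does not).
    return sorted(p for p in Path(directory).glob("*.yaml")
                  if p.is_file() and p.name != keep)
```

For example, on a controller this would list `00-cml2-base.yaml` plus any other netplan file left behind by earlier steps, while leaving subdirectories untouched.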

# no PaTTY on computes
if ! is_controller; then
@@ -141,7 +164,7 @@ function cml_configure() {
# Directory doesn't exist - Move the entire .ssh directory
mv /home/$clouduser/.ssh/ /home/${CFG_SYS_USER}/
fi
chown -R ${CFG_SYS_USER}.${CFG_SYS_USER} /home/${CFG_SYS_USER}/.ssh
chown -R ${CFG_SYS_USER}:${CFG_SYS_USER} /home/${CFG_SYS_USER}/.ssh

# disable access for the user but keep it as cloud-init requires it to be
# present, otherwise one of the final modules will fail.
@@ -152,7 +175,7 @@ function cml_configure() {
chmod g+r /provision/vars.sh

# Change the ownership of the del.sh script to the sysadmin user
chown ${CFG_SYS_USER}.${CFG_SYS_USER} /provision/del.sh
chown ${CFG_SYS_USER}:${CFG_SYS_USER} /provision/del.sh

# Check if this device is a controller
if ! is_controller; then
17 changes: 10 additions & 7 deletions modules/deploy/data/interface_fix.py
@@ -30,21 +30,24 @@ def get_interface_names(netplan_file):
return [interface[0] for interface in interfaces] # Return just the interface names


def update_netplan_config(netplan_file, renderer="NetworkManager"):
def update_netplan_config(netplan_file, primary_interface, renderer="NetworkManager"):
"""Updates the Netplan config file with the specified renderer.

Args:
netplan_file (str): Path to the Netplan configuration file.
primary_interface (str): The primary network interface to update.
renderer (str, optional): The renderer to use. Defaults to 'NetworkManager'.
"""
with open(netplan_file, "r") as f:
netplan_data = yaml.safe_load(f)

if "network" not in netplan_data:
netplan_data["network"] = {}

netplan_data.setdefault("network", {})
netplan_data["network"]["renderer"] = renderer

ethernets = netplan_data["network"].get("ethernets", {})
if primary_interface in ethernets:
ethernets[primary_interface]["renderer"] = renderer

with open(netplan_file, "w") as f:
yaml.safe_dump(netplan_data, f)
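The revised `update_netplan_config` sets `renderer` both at the global `network` level and on the primary ethernet device, and uses `setdefault` in place of the explicit `if "network" not in ...` check. Netplan accepts `renderer` globally, per device type, or per device. Below is a file-free Python sketch of the same transformation on an already-parsed dict, which makes the behaviour easy to test; `set_renderer` is a hypothetical name and the interface names in the usage are invented.

```python
import copy


def set_renderer(netplan_data: dict, primary_interface: str,
                 renderer: str = "NetworkManager") -> dict:
    """Return a copy of netplan_data with the renderer set globally and on one device."""
    data = copy.deepcopy(netplan_data)  # leave the caller's dict untouched
    network = data.setdefault("network", {})
    network["renderer"] = renderer
    ethernets = network.get("ethernets", {})
    if primary_interface in ethernets:
        ethernets[primary_interface]["renderer"] = renderer
    return data


sample = {"network": {"version": 2,
                      "ethernets": {"ens5": {"dhcp4": True}, "ens6": {"dhcp4": True}}}}
print(set_renderer(sample, "ens5")["network"]["ethernets"]["ens5"])
```

Because the function now needs the primary interface, `main()` computes `primary_interface` and `cluster_interface` before calling it, as the next hunk shows.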

@@ -76,13 +79,13 @@ def main():

# Get interface names
interface_names = get_interface_names(netplan_file)
primary_interface = interface_names[0]
cluster_interface = interface_names[1] if len(interface_names) > 1 else None

# Update Netplan config
update_netplan_config(netplan_file)
update_netplan_config(netplan_file, primary_interface)

# Update VIRL2 config
primary_interface = interface_names[0]
cluster_interface = interface_names[1] if len(interface_names) > 1 else None
update_virl2_config(virl2_config_file, primary_interface, cluster_interface)

