From 604eeb5716cf713460c42e49c174625dfee5f51a Mon Sep 17 00:00:00 2001 From: Florian Paul Azim Hoberg Date: Tue, 20 Aug 2024 19:59:22 +0200 Subject: [PATCH] feature: Add storage balancing function. [#51]. feature: Add feature to allow the API hosts being provided as a comma separated list. [#60] Fixes: #51 Fixes: #60 --- .../1.0.3/51_add_storage_balancing.yml | 2 + .changelogs/1.0.3/53_code_improvements.yml | 8 +- ..._hosts_to_be_given_as_an_optional_list.yml | 2 + .../8_add_best_next_node_for_placement.yml | 2 +- README.md | 17 +- proxlb | 538 +++++++++++++++--- 6 files changed, 492 insertions(+), 77 deletions(-) create mode 100644 .changelogs/1.0.3/51_add_storage_balancing.yml create mode 100644 .changelogs/1.0.3/60_allow_api_hosts_to_be_given_as_an_optional_list.yml diff --git a/.changelogs/1.0.3/51_add_storage_balancing.yml b/.changelogs/1.0.3/51_add_storage_balancing.yml new file mode 100644 index 0000000..1c67b75 --- /dev/null +++ b/.changelogs/1.0.3/51_add_storage_balancing.yml @@ -0,0 +1,2 @@ +added: + - Add storage balancing function. [#51] diff --git a/.changelogs/1.0.3/53_code_improvements.yml b/.changelogs/1.0.3/53_code_improvements.yml index 7956fb3..92f5b1d 100644 --- a/.changelogs/1.0.3/53_code_improvements.yml +++ b/.changelogs/1.0.3/53_code_improvements.yml @@ -1,6 +1,6 @@ added: - - Added convert function to cast all bool alike options from configparser to bools. [#53] - - Added config parser options for future features. [#53] - - Added a config versio schema that must be supported by ProxLB. [#53] + - Add a convert function to cast all bool alike options from configparser to bools. [#53] + - Add a config parser options for future features. [#53] + - Add a config versio schema that must be supported by ProxLB. [#53] changed: - - Improved the underlying code base for future implementations. [#53] + - Improve the underlying code base for future implementations. [#53] diff --git a/.changelogs/1.0.3/60_allow_api_hosts_to_be_given_as_an_optional_list.yml b/.changelogs/1.0.3/60_allow_api_hosts_to_be_given_as_an_optional_list.yml new file mode 100644 index 0000000..3f4c625 --- /dev/null +++ b/.changelogs/1.0.3/60_allow_api_hosts_to_be_given_as_an_optional_list.yml @@ -0,0 +1,2 @@ +added: + - Add feature to allow the API hosts being provided as a comma separated list. [#60] diff --git a/.changelogs/1.0.3/8_add_best_next_node_for_placement.yml b/.changelogs/1.0.3/8_add_best_next_node_for_placement.yml index 6e77163..d5ac747 100644 --- a/.changelogs/1.0.3/8_add_best_next_node_for_placement.yml +++ b/.changelogs/1.0.3/8_add_best_next_node_for_placement.yml @@ -1,2 +1,2 @@ added: - - Added cli arg `-b` to return the next best node for next VM/CT placement. [#8] + - Add cli arg `-b` to return the next best node for next VM/CT placement. [#8] diff --git a/README.md b/README.md index e73faad..22ff723 100644 --- a/README.md +++ b/README.md @@ -55,10 +55,14 @@ Automated rebalancing reduces the need for manual actions, allowing operators to ## Features -* Rebalance the cluster by: +* Rebalance VMs/CTs in the cluster by: * Memory * Disk (only local storage) * CPU +* Rebalance Storage in the cluster + * Rebalance VMs/CTs disks to other storage pools + * Rebalance by used storage +* Get best Node for new VM/CT placement in cluster * Performing * Periodically * One-shot solution @@ -66,6 +70,7 @@ Automated rebalancing reduces the need for manual actions, allowing operators to * Rebalance only VMs * Rebalance only CTs * Rebalance all (VMs and CTs) + * Rebalance VM/CT disks (Storage) * Filter * Exclude nodes * Exclude virtual machines @@ -100,7 +105,7 @@ The following options can be set in the `proxlb.conf` file: | Section | Option | Example | Description | |------|:------:|:------:|:------:| -| `proxmox` | api_host | hypervisor01.gyptazy.ch | Host or IP address of the remote Proxmox API. | +| `proxmox` | api_host | hypervisor01.gyptazy.ch | Host or IP address (or comma separated list) of the remote Proxmox API. | | | api_user | root@pam | Username for the API. | | | api_pass | FooBar | Password for the API. | | | verify_ssl | 1 | Validate SSL certificates (1) or ignore (0). (default: 1) | @@ -115,9 +120,11 @@ The following options can be set in the `proxlb.conf` file: | | ignore_vms | testvm01,testvm02 | Defines a comma separated list of VMs to exclude. (`*` as suffix wildcard or tags are also supported) | | | master_only | 0 | Defines is this should only be performed (1) on the cluster master node or not (0). (default: 0) | | `storage_balancing` | enable | 0 | Enables storage balancing. | +| | balanciness | 10 | Value of the percentage of lowest and highest storage consumption may differ before rebalancing. (default: 10) | +| | parallel_migrations | 1 | Defines if migrations should be done parallely or sequentially. (default: 1) | | `update_service` | enable | 0 | Enables the automated update service (rolling updates). | | `api` | enable | 0 | Enables the ProxLB API. | -| | daemon | 1 | Run as a daemon (1) or one-shot (0). (default: 1) | +| `service`| daemon | 1 | Run as a daemon (1) or one-shot (0). (default: 1) | | | schedule | 24 | Hours to rebalance in hours. (default: 24) | | | log_verbosity | INFO | Defines the log level (default: CRITICAL) where you can use `INFO`, `WARN` or `CRITICAL` | | | config_version | 3 | Defines the current config version schema for ProxLB | @@ -323,12 +330,12 @@ Bugs can be reported via the GitHub issue tracker [here](https://github.com/gypt Feel free to add further documentation, to adjust already existing one or to contribute with code. Please take care about the style guide and naming conventions. You can find more in our [CONTRIBUTING.md](https://github.com/gyptazy/ProxLB/blob/main/CONTRIBUTING.md) file. ### Support -If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join our community for real-time help, advice, and discussions. Connect with us in our dedicated chat room for immediate support and live interaction with other users and developers. You can also visit our [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also just open directly an issue [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible. +If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join our community for real-time help, advice, and discussions. Connect with us in our dedicated chat room for immediate support and live interaction with other users and developers. You can also visit our [GitHub Community](https://github.com/gyptazy/ProxLB/discussions/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also just open directly an issue [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible. | Support Channel | Link | |------|:------:| | Matrix | [#proxlb:gyptazy.ch](https://matrix.to/#/#proxlb:gyptazy.ch) | -| Reddit | [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) | +| GitHub Community | [GitHub Community](https://github.com/gyptazy/ProxLB/discussions/) | GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) | ### Author(s) diff --git a/proxlb b/proxlb index aa2ca37..870cc0f 100755 --- a/proxlb +++ b/proxlb @@ -22,6 +22,7 @@ import argparse import configparser +import copy import json import logging import os @@ -177,32 +178,35 @@ def initialize_config_options(config_path): config = configparser.ConfigParser() config.read(config_path) # Proxmox config - proxlb_config['proxmox_api_host'] = config['proxmox']['api_host'] - proxlb_config['proxmox_api_user'] = config['proxmox']['api_user'] - proxlb_config['proxmox_api_pass'] = config['proxmox']['api_pass'] - proxlb_config['proxmox_api_ssl_v'] = config['proxmox']['verify_ssl'] + proxlb_config['proxmox_api_host'] = config['proxmox']['api_host'] + proxlb_config['proxmox_api_user'] = config['proxmox']['api_user'] + proxlb_config['proxmox_api_pass'] = config['proxmox']['api_pass'] + proxlb_config['proxmox_api_ssl_v'] = config['proxmox']['verify_ssl'] # VM Balancing - proxlb_config['vm_balancing_enable'] = config['vm_balancing'].get('enable', 1) - proxlb_config['vm_balancing_method'] = config['vm_balancing'].get('method', 'memory') - proxlb_config['vm_balancing_mode'] = config['vm_balancing'].get('mode', 'used') - proxlb_config['vm_balancing_mode_option'] = config['vm_balancing'].get('mode_option', 'bytes') - proxlb_config['vm_balancing_type'] = config['vm_balancing'].get('type', 'vm') - proxlb_config['vm_balanciness'] = config['vm_balancing'].get('balanciness', 10) - proxlb_config['vm_parallel_migrations'] = config['vm_balancing'].get('parallel_migrations', 1) - proxlb_config['vm_ignore_nodes'] = config['vm_balancing'].get('ignore_nodes', None) - proxlb_config['vm_ignore_vms'] = config['vm_balancing'].get('ignore_vms', None) + proxlb_config['vm_balancing_enable'] = config['vm_balancing'].get('enable', 1) + proxlb_config['vm_balancing_method'] = config['vm_balancing'].get('method', 'memory') + proxlb_config['vm_balancing_mode'] = config['vm_balancing'].get('mode', 'used') + proxlb_config['vm_balancing_mode_option'] = config['vm_balancing'].get('mode_option', 'bytes') + proxlb_config['vm_balancing_type'] = config['vm_balancing'].get('type', 'vm') + proxlb_config['vm_balanciness'] = config['vm_balancing'].get('balanciness', 10) + proxlb_config['vm_parallel_migrations'] = config['vm_balancing'].get('parallel_migrations', 1) + proxlb_config['vm_ignore_nodes'] = config['vm_balancing'].get('ignore_nodes', None) + proxlb_config['vm_ignore_vms'] = config['vm_balancing'].get('ignore_vms', None) # Storage Balancing - proxlb_config['storage_balancing_enable'] = config['storage_balancing'].get('enable', 0) + proxlb_config['storage_balancing_enable'] = config['storage_balancing'].get('enable', 0) + proxlb_config['storage_balancing_method'] = config['storage_balancing'].get('method', 'disk_space') + proxlb_config['storage_balanciness'] = config['storage_balancing'].get('balanciness', 10) + proxlb_config['storage_parallel_migrations'] = config['storage_balancing'].get('parallel_migrations', 1) # Update Support - proxlb_config['update_service'] = config['update_service'].get('enable', 0) + proxlb_config['update_service'] = config['update_service'].get('enable', 0) # API - proxlb_config['api'] = config['update_service'].get('enable', 0) + proxlb_config['api'] = config['update_service'].get('enable', 0) # Service - proxlb_config['master_only'] = config['service'].get('master_only', 0) - proxlb_config['daemon'] = config['service'].get('daemon', 1) - proxlb_config['schedule'] = config['service'].get('schedule', 24) - proxlb_config['log_verbosity'] = config['service'].get('log_verbosity', 'CRITICAL') - proxlb_config['config_version'] = config['service'].get('config_version', 2) + proxlb_config['master_only'] = config['service'].get('master_only', 0) + proxlb_config['daemon'] = config['service'].get('daemon', 1) + proxlb_config['schedule'] = config['service'].get('schedule', 24) + proxlb_config['log_verbosity'] = config['service'].get('log_verbosity', 'CRITICAL') + proxlb_config['config_version'] = config['service'].get('config_version', 2) except configparser.NoSectionError: logging.critical(f'{error_prefix} Could not find the required section.') sys.exit(2) @@ -263,6 +267,8 @@ def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_ap requests.packages.urllib3.disable_warnings() logging.warning(f'{warn_prefix} API connection does not verify SSL certificate.') + proxmox_api_host = __api_connect_get_host(proxmox_api_host) + try: api_object = proxmoxer.ProxmoxAPI(proxmox_api_host, user=proxmox_api_user, password=proxmox_api_pass, verify_ssl=proxmox_api_ssl_v) except urllib3.exceptions.NameResolutionError: @@ -279,6 +285,66 @@ def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_ap return api_object +def __api_connect_get_host(proxmox_api_host): + """ Validate if a list of API hosts got provided and pre-validate the hosts. """ + info_prefix = 'Info: [api-connect-get-host]:' + proxmox_port = 8006 + + if ',' in proxmox_api_host: + logging.info(f'{info_prefix} Multiple hosts for API connection are given. Testing hosts for further usage.') + proxmox_api_host = proxmox_api_host.split(',') + + # Validate all given hosts and check for responsive on Proxmox web port. + for host in proxmox_api_host: + logging.info(f'{info_prefix} Testing host {host} on port tcp/{proxmox_port}.') + reachable = __api_connect_test_ipv4_host(host, proxmox_port) + if reachable: + return host + else: + logging.info(f'{info_prefix} Using host {host} on port tcp/{proxmox_port}.') + return proxmox_api_host + + +def __api_connect_test_ipv4_host(proxmox_api_host, port): + error_prefix = 'Error: [api-connect-test-host]:' + info_prefix = 'Info: [api-connect-test-host]:' + proxmox_connection_timeout = 2 + + sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + sock.settimeout(proxmox_connection_timeout) + logging.info(f'{info_prefix} Timeout for host {proxmox_api_host} is set to {proxmox_connection_timeout} seconds.') + result = sock.connect_ex((proxmox_api_host,port)) + + if result == 0: + sock.close() + logging.info(f'{info_prefix} Host {proxmox_api_host} is reachable on port tcp/{port}.') + return True + else: + sock.close() + logging.critical(f'{error_prefix} Host {proxmox_api_host} is unreachable on port tcp/{port}.') + return False + + +def __api_connect_test_ipv6_host(proxmox_api_host, port): + error_prefix = 'Error: [api-connect-test-host]:' + info_prefix = 'Info: [api-connect-test-host]:' + proxmox_connection_timeout = 2 + + sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) + sock.settimeout(proxmox_connection_timeout) + logging.info(f'{info_prefix} Timeout for host {proxmox_api_host} is set to {proxmox_connection_timeout}.') + result = sock.connect_ex((proxmox_api_host,port)) + + if result == 0: + sock.close() + logging.info(f'{info_prefix} Host {proxmox_api_host} is reachable on port tcp/{port}.') + return True + else: + sock.close() + logging.critical(f'{error_prefix} Host {proxmox_api_host} is unreachable on port tcp/{port}.') + return False + + def execute_rebalancing_only_by_master(api_object, master_only): """ Validate if balancing should only be done by the cluster master. Afterwards, validate if this node is the cluster master. """ info_prefix = 'Info: [only-on-master-executor]:' @@ -376,14 +442,15 @@ def get_node_statistics(api_object, ignore_nodes): def get_vm_statistics(api_object, ignore_vms, balancing_type): """ Get statistics of cpu, memory and disk for each vm in the cluster. """ - info_prefix = 'Info: [vm-statistics]:' - warn_prefix = 'Warn: [vm-statistics]:' - vm_statistics = {} - ignore_vms_list = ignore_vms.split(',') - group_include = None - group_exclude = None - vm_ignore = None - vm_ignore_wildcard = False + info_prefix = 'Info: [vm-statistics]:' + warn_prefix = 'Warn: [vm-statistics]:' + vm_statistics = {} + ignore_vms_list = ignore_vms.split(',') + group_include = None + group_exclude = None + vm_ignore = None + vm_ignore_wildcard = False + _vm_details_storage_allowed = ['ide', 'nvme', 'scsi', 'virtio', 'sata', 'rootfs'] # Wildcard support: Initially validate if we need to honour # any wildcards within the vm_ignore list. @@ -420,11 +487,38 @@ def get_vm_statistics(api_object, ignore_vms, balancing_type): vm_statistics[vm['name']]['disk_used'] = vm['disk'] vm_statistics[vm['name']]['vmid'] = vm['vmid'] vm_statistics[vm['name']]['node_parent'] = node['node'] - vm_statistics[vm['name']]['type'] = 'vm' - # Rebalancing node will be overwritten after calculations. - # If the vm stays on the node, it will be removed at a - # later time. vm_statistics[vm['name']]['node_rebalance'] = node['node'] + vm_statistics[vm['name']]['storage'] = {} + vm_statistics[vm['name']]['type'] = 'vm' + + # Get disk details of the related object. + _vm_details = api_object.nodes(node['node']).qemu(vm['vmid']).config.get() + logging.info(f'{info_prefix} Getting disk information for vm {vm["name"]}.') + + for vm_detail_key, vm_detail_value in _vm_details.items(): + # vm_detail_key_validator = re.sub('\d+$', '', vm_detail_key) + vm_detail_key_validator = re.sub(r'\d+$', '', vm_detail_key) + + if vm_detail_key_validator in _vm_details_storage_allowed: + vm_statistics[vm['name']]['storage'][vm_detail_key] = {} + match = re.match(r'([^:]+):[^/]+/(.+),iothread=\d+,size=(\d+G)', _vm_details[vm_detail_key]) + + # Create an efficient match group and split the strings to assign them to the storage information. + if match: + _volume = match.group(1) + _disk_name = match.group(2) + _disk_size = match.group(3) + + vm_statistics[vm['name']]['storage'][vm_detail_key]['name'] = _disk_name + vm_statistics[vm['name']]['storage'][vm_detail_key]['device_name'] = vm_detail_key + vm_statistics[vm['name']]['storage'][vm_detail_key]['volume'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_parent'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_rebalance'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['size'] = _disk_size[:-1] + logging.info(f'{info_prefix} Added disk for {vm["name"]}: Name {_disk_name} on volume {_volume} with size {_disk_size}.') + else: + logging.info(f'{info_prefix} No disks for {vm["name"]} found.') + logging.info(f'{info_prefix} Added vm {vm["name"]}.') # Add all containers if type is ct or all. @@ -458,11 +552,38 @@ def get_vm_statistics(api_object, ignore_vms, balancing_type): vm_statistics[vm['name']]['disk_used'] = vm['disk'] vm_statistics[vm['name']]['vmid'] = vm['vmid'] vm_statistics[vm['name']]['node_parent'] = node['node'] - vm_statistics[vm['name']]['type'] = 'ct' - # Rebalancing node will be overwritten after calculations. - # If the vm stays on the node, it will be removed at a - # later time. vm_statistics[vm['name']]['node_rebalance'] = node['node'] + vm_statistics[vm['name']]['storage'] = {} + vm_statistics[vm['name']]['type'] = 'ct' + + # Get disk details of the related object. + _vm_details = api_object.nodes(node['node']).lxc(vm['vmid']).config.get() + logging.info(f'{info_prefix} Getting disk information for vm {vm["name"]}.') + + for vm_detail_key, vm_detail_value in _vm_details.items(): + # vm_detail_key_validator = re.sub('\d+$', '', vm_detail_key) + vm_detail_key_validator = re.sub(r'\d+$', '', vm_detail_key) + + if vm_detail_key_validator in _vm_details_storage_allowed: + vm_statistics[vm['name']]['storage'][vm_detail_key] = {} + match = re.match(r'(?P[^:]+):(?P[^,]+),size=(?P\S+)', _vm_details[vm_detail_key]) + + # Create an efficient match group and split the strings to assign them to the storage information. + if match: + _volume = match.group(1) + _disk_name = match.group(2) + _disk_size = match.group(3) + + vm_statistics[vm['name']]['storage'][vm_detail_key]['name'] = _disk_name + vm_statistics[vm['name']]['storage'][vm_detail_key]['device_name'] = vm_detail_key + vm_statistics[vm['name']]['storage'][vm_detail_key]['volume'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_parent'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_rebalance'] = _volume + vm_statistics[vm['name']]['storage'][vm_detail_key]['size'] = _disk_size[:-1] + logging.info(f'{info_prefix} Added disk for {vm["name"]}: Name {_disk_name} on volume {_volume} with size {_disk_size}.') + else: + logging.info(f'{info_prefix} No disks for {vm["name"]} found.') + logging.info(f'{info_prefix} Added vm {vm["name"]}.') logging.info(f'{info_prefix} Created VM statistics.') @@ -496,6 +617,57 @@ def update_node_statistics(node_statistics, vm_statistics): return node_statistics +def get_storage_statistics(api_object): + """ Get statistics of all storage in the cluster. """ + info_prefix = 'Info: [storage-statistics]:' + storage_statistics = {} + + for node in api_object.nodes.get(): + + for storage in api_object.nodes(node['node']).storage.get(): + + # Only add enabled and active storage repositories that might be suitable for further + # storage balancing. + if storage['enabled'] and storage['active'] and storage['shared']: + storage_statistics[storage['storage']] = {} + storage_statistics[storage['storage']]['name'] = storage['storage'] + storage_statistics[storage['storage']]['total'] = storage['total'] + storage_statistics[storage['storage']]['used'] = storage['used'] + storage_statistics[storage['storage']]['used_percent'] = storage['used'] / storage['total'] * 100 + storage_statistics[storage['storage']]['used_percent_last_run'] = 0 + storage_statistics[storage['storage']]['free'] = storage['total'] - storage['used'] + storage_statistics[storage['storage']]['free_percent'] = storage_statistics[storage['storage']]['free'] / storage['total'] * 100 + storage_statistics[storage['storage']]['used_fraction'] = storage['used_fraction'] + storage_statistics[storage['storage']]['type'] = storage['type'] + storage_statistics[storage['storage']]['content'] = storage['content'] + storage_statistics[storage['storage']]['usage_type'] = '' + + # Split the Proxmox returned values to a list and validate the supported + # types of the underlying storage for further migrations. + storage_content_list = storage['content'].split(',') + usage_ct = False + usage_vm = False + + if 'rootdir' in storage_content_list: + usage_ct = True + storage_statistics[storage['storage']]['usage_type'] = 'ct' + logging.info(f'{info_prefix} Storage {storage["storage"]} support CTs.') + + if 'images' in storage_content_list: + usage_vm = True + storage_statistics[storage['storage']]['usage_type'] = 'vm' + logging.info(f'{info_prefix} Storage {storage["storage"]} support VMs.') + + if usage_ct and usage_vm: + storage_statistics[storage['storage']]['usage_type'] = 'all' + logging.info(f'{info_prefix} Updateing storage {storage["storage"]} support to CTs and VMs.') + + logging.info(f'{info_prefix} Added storage {storage["storage"]}.') + + logging.info(f'{info_prefix} Created storage statistics.') + return storage_statistics + + def __validate_ignore_vm_wildcard(ignore_vms): """ Validate if a wildcard is used for ignored VMs. """ if '*' in ignore_vms: @@ -584,11 +756,6 @@ def balancing_vm_calculations(balancing_method, balancing_mode, balancing_mode_o node_statistics, vm_statistics = __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method, balancing_mode) node_statistics, vm_statistics = __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method, balancing_mode) - # Remove VMs that are not being relocated. - vms_to_remove = [vm_name for vm_name, vm_info in vm_statistics.items() if 'node_rebalance' in vm_info and vm_info['node_rebalance'] == vm_info.get('node_parent')] - for vm_name in vms_to_remove: - del vm_statistics[vm_name] - logging.info(f'{info_prefix} Balancing calculations done.') return node_statistics, vm_statistics @@ -840,13 +1007,18 @@ def __wait_job_finalized(api_object, node_name, job_id, counter): logging.info(f'{info_prefix} Job {job_id} for migration from {node_name} terminiated succesfully.') -def __run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations): +def __run_vm_rebalancing(api_object, _vm_vm_statistics, app_args, parallel_migrations): """ Run & execute the VM rebalancing via API. """ - error_prefix = 'Error: [rebalancing-executor]:' - info_prefix = 'Info: [rebalancing-executor]:' + error_prefix = 'Error: [vm-rebalancing-executor]:' + info_prefix = 'Info: [vm-rebalancing-executor]:' + + # Remove VMs/CTs that do not have a new node location. + vms_to_remove = [vm_name for vm_name, vm_info in _vm_vm_statistics.items() if 'node_rebalance' in vm_info and vm_info['node_rebalance'] == vm_info.get('node_parent')] + for vm_name in vms_to_remove: + del _vm_vm_statistics[vm_name] - if len(vm_statistics_rebalanced) > 0 and not app_args.dry_run: - for vm, value in vm_statistics_rebalanced.items(): + if len(_vm_vm_statistics) > 0 and not app_args.dry_run: + for vm, value in _vm_vm_statistics.items(): try: # Migrate type VM (live migration). @@ -872,17 +1044,55 @@ def __run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, paralle else: logging.info(f'{info_prefix} No rebalancing needed.') + return _vm_vm_statistics + + +def __run_storage_rebalancing(api_object, _storage_vm_statistics, app_args, parallel_migrations): + """ Run & execute the storage rebalancing via API. """ + error_prefix = 'Error: [storage-rebalancing-executor]:' + info_prefix = 'Info: [storage-rebalancing-executor]:' + + # Remove VMs/CTs that do not have a new storage location. + vms_to_remove = [vm_name for vm_name, vm_info in _storage_vm_statistics.items() if all(storage.get('storage_rebalance') == storage.get('storage_parent') for storage in vm_info.get('storage', {}).values())] + for vm_name in vms_to_remove: + del _storage_vm_statistics[vm_name] + + if len(_storage_vm_statistics) > 0 and not app_args.dry_run: + for vm, value in _storage_vm_statistics.items(): + for disk, disk_info in value['storage'].items(): + + if disk_info.get('storage_rebalance', None) is not None: + try: + # Migrate type VM (live migration). + logging.info(f'{info_prefix} Rebalancing storage of VM {vm} from node.') + job_id = api_object.nodes(value['node_parent']).qemu(value['vmid']).move_disk().post(disk=disk,storage=disk_info.get('storage_rebalance', None), delete=1) + + except proxmoxer.core.ResourceException as error_resource: + logging.critical(f'{error_prefix} {error_resource}') + + # Wait for migration to be finished unless running parallel migrations. + if not bool(int(parallel_migrations)): + logging.info(f'{info_prefix} Rebalancing will be performed sequentially.') + __wait_job_finalized(api_object, value['node_parent'], job_id, counter=1) + else: + logging.info(f'{info_prefix} Rebalancing will be performed parallely.') + + else: + logging.info(f'{info_prefix} No rebalancing needed.') + + return _storage_vm_statistics + -def __create_json_output(vm_statistics_rebalanced, app_args): +def __create_json_output(vm_statistics, app_args): """ Create a machine parsable json output of VM rebalance statitics. """ info_prefix = 'Info: [json-output-generator]:' if app_args.json: logging.info(f'{info_prefix} Printing json output of VM statistics.') - print(json.dumps(vm_statistics_rebalanced)) + print(json.dumps(vm_statistics)) -def __create_cli_output(vm_statistics_rebalanced, app_args): +def __create_cli_output(vm_statistics, app_args): """ Create output for CLI when running in dry-run mode. """ info_prefix_dry_run = 'Info: [cli-output-generator-dry-run]:' info_prefix_run = 'Info: [cli-output-generator]:' @@ -895,11 +1105,12 @@ def __create_cli_output(vm_statistics_rebalanced, app_args): info_prefix = info_prefix_run logging.info(f'{info_prefix} Start rebalancing vms to their new nodes.') - vm_to_node_list.append(['VM', 'Current Node', 'Rebalanced Node', 'VM Type']) - for vm_name, vm_values in vm_statistics_rebalanced.items(): - vm_to_node_list.append([vm_name, vm_values['node_parent'], vm_values['node_rebalance'], vm_values['type']]) + vm_to_node_list.append(['VM', 'Current Node', 'Rebalanced Node', 'Current Storage', 'Rebalanced Storage', 'VM Type']) + for vm_name, vm_values in vm_statistics.items(): + for disk, disk_values in vm_values['storage'].items(): + vm_to_node_list.append([vm_name, vm_values['node_parent'], vm_values['node_rebalance'], f'{disk_values.get("storage_parent", "N/A")} ({disk_values.get("device_name", "N/A")})', f'{disk_values.get("storage_rebalance", "N/A")} ({disk_values.get("device_name", "N/A")})', vm_values['type']]) - if len(vm_statistics_rebalanced) > 0: + if len(vm_statistics) > 0: logging.info(f'{info_prefix} Printing cli output of VM rebalancing.') __print_table_cli(vm_to_node_list, app_args.dry_run) else: @@ -929,15 +1140,201 @@ def __print_table_cli(table, dry_run=False): logging.info(f'{info_prefix} {row_format.format(*row)}') -def run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations): +def run_rebalancing(api_object, vm_statistics, app_args, parallel_migrations, balancing_type): """ Run rebalancing of vms to new nodes in cluster. """ - __run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations) - __create_json_output(vm_statistics_rebalanced, app_args) - __create_cli_output(vm_statistics_rebalanced, app_args) + _vm_vm_statistics = {} + _storage_vm_statistics = {} + + if balancing_type == 'vm': + _vm_vm_statistics = copy.deepcopy(vm_statistics) + _vm_vm_statistics = __run_vm_rebalancing(api_object, _vm_vm_statistics, app_args, parallel_migrations) + return _vm_vm_statistics + + if balancing_type == 'storage': + _storage_vm_statistics = copy.deepcopy(vm_statistics) + _storage_vm_statistics = __run_storage_rebalancing(api_object, _storage_vm_statistics, app_args, parallel_migrations) + return _storage_vm_statistics + + +def run_output_rebalancing(app_args, vm_output_statistics, storage_output_statistics): + """ Generate output of rebalanced resources. """ + output_statistics = {**vm_output_statistics, **storage_output_statistics} + __create_json_output(output_statistics, app_args) + __create_cli_output(output_statistics, app_args) + + +def balancing_storage_calculations(storage_balancing_method, storage_statistics, vm_statistics, balanciness, rebalance, processed_vms): + """ Calculate re-balancing of storage on present datastores across the cluster. """ + info_prefix = 'Info: [storage-rebalancing-calculator]:' + + # Validate for a supported balancing method, mode and if rebalancing is required. + __validate_vm_statistics(vm_statistics) + rebalance = __validate_storage_balanciness(balanciness, storage_balancing_method, storage_statistics) + + if rebalance: + vm_name, vm_disk_device = __get_most_used_resources_vm_storage(vm_statistics) + + if vm_name not in processed_vms: + processed_vms.append(vm_name) + resources_storage_most_free = __get_most_free_storage(storage_balancing_method, storage_statistics) + + # Update resource statistics for VMs and storage. + storage_statistics, vm_statistic = __update_resource_storage_statistics(storage_statistics, resources_storage_most_free, vm_statistics, vm_name, vm_disk_device) + + # Start recursion until we do not have any needs to rebalance anymore. + balancing_storage_calculations(storage_balancing_method, storage_statistics, vm_statistics, balanciness, rebalance, processed_vms) + + logging.info(f'{info_prefix} Balancing calculations done.') + return storage_statistics, vm_statistics + + +def __validate_storage_balanciness(balanciness, storage_balancing_method, storage_statistics): + """ Validate for balanciness of storage to ensure further rebalancing is needed. """ + info_prefix = 'Info: [storage-balanciness-validation]:' + error_prefix = 'Error: [storage-balanciness-validation]:' + storage_resource_percent_list = [] + storage_assigned_percent_match = [] + + # Validate for an allowed balancing method and define the storage resource selector. + if storage_balancing_method == 'disk_space': + logging.info(f'{info_prefix} Getting most free storage volume by disk size.') + storage_resource_selector = 'used' + elif storage_balancing_method == 'disk_io': + logging.error(f'{error_prefix} Getting most free storage volume by disk IO is not yet supported.') + sys.exit(2) + else: + logging.error(f'{error_prefix} Getting most free storage volume by disk IO is not yet supported.') + sys.exit(2) + + # Obtain the metrics + for storage_name, storage_info in storage_statistics.items(): + + logging.info(f'{info_prefix} Validating storage: {storage_name} for balanciness for usage with: {storage_balancing_method}.') + # Save information of nodes from current run to compare them in the next recursion. + if storage_statistics[storage_name][f'{storage_resource_selector}_percent_last_run'] == storage_statistics[storage_name][f'{storage_resource_selector}_percent']: + storage_statistics[storage_name][f'{storage_resource_selector}_percent_match'] = True + else: + storage_statistics[storage_name][f'{storage_resource_selector}_percent_match'] = False + + # Update value to the current value of the recursion run. + storage_statistics[storage_name][f'{storage_resource_selector}_percent_last_run'] = storage_statistics[storage_name][f'{storage_resource_selector}_percent'] + + # If all node resources are unchanged, the recursion can be left. + for key, value in storage_statistics.items(): + storage_assigned_percent_match.append(value.get(f'{storage_resource_selector}_percent_match', False)) + + if False not in storage_assigned_percent_match: + return False + + # Add node information to resource list. + storage_resource_percent_list.append(int(storage_info[f'{storage_resource_selector}_percent'])) + logging.info(f'{info_prefix} Storage: {storage_name} with values: {storage_info}') + + # Create a sorted list of the delta + balanciness between the node resources. + storage_resource_percent_list_sorted = sorted(storage_resource_percent_list) + storage_lowest_percent = storage_resource_percent_list_sorted[0] + storage_highest_percent = storage_resource_percent_list_sorted[-1] + + # Validate if the recursion should be proceeded for further rebalancing. + if (int(storage_lowest_percent) + int(balanciness)) < int(storage_highest_percent): + logging.info(f'{info_prefix} Rebalancing for type "{storage_resource_selector}" of storage is needed. Highest usage: {int(storage_highest_percent)}% | Lowest usage: {int(storage_lowest_percent)}%.') + return True + else: + logging.info(f'{info_prefix} Rebalancing for type "{storage_resource_selector}" of storage is not needed. Highest usage: {int(storage_highest_percent)}% | Lowest usage: {int(storage_lowest_percent)}%.') + return False + + +def __get_most_used_resources_vm_storage(vm_statistics): + """ Get and return the most used disk of a VM by storage. """ + info_prefix = 'Info: [get-most-used-disks-resources-vm]:' + + # Get the biggest storage of a VM/CT. A VM/CT can hold multiple disks. Therefore, we need to iterate + # over all assigned disks to get the biggest one. + vm_object = sorted( + vm_statistics.items(), + key=lambda x: max( + (size_in_bytes(storage['size']) for storage in x[1].get('storage', {}).values() if 'size' in storage), + default=0 + ), + reverse=True + ) + + vm_object = vm_object[0] + vm_name = vm_object[0] + vm_disk_device = max(vm_object[1]['storage'], key=lambda x: int(vm_object[1]['storage'][x]['size'])) + logging.info(f'{info_prefix} Got most used VM: {vm_name} with storage device: {vm_disk_device}.') + + return vm_name, vm_disk_device + + +def __get_most_free_storage(storage_balancing_method, storage_statistics): + """ Get the storage with the most free space or IO, depending on the balancing mode. """ + info_prefix = 'Info: [get-most-free-storage]:' + error_prefix = 'Error: [get-most-free-storage]:' + storage_volume = None + logging.info(f'{info_prefix} Starting to evaluate the most free storage volume.') + + if storage_balancing_method == 'disk_space': + logging.info(f'{info_prefix} Getting most free storage volume by disk space.') + storage_volume = max(storage_statistics, key=lambda x: storage_statistics[x]['free_percent']) + + if storage_balancing_method == 'disk_io': + logging.info(f'{info_prefix} Getting most free storage volume by disk IO.') + logging.error(f'{error_prefix} Getting most free storage volume by disk IO is not yet supported.') + sys.exit(2) + + return storage_volume + + +def __update_resource_storage_statistics(storage_statistics, resources_storage_most_free, vm_statistics, vm_name, vm_disk_device): + """ Update VM and storage resource statistics. """ + info_prefix = 'Info: [rebalancing-storage-resource-statistics-update]:' + current_storage = vm_statistics[vm_name]['storage'][vm_disk_device]['storage_parent'] + current_storage_size = storage_statistics[current_storage]['free'] / (1024 ** 3) + rebalance_storage = resources_storage_most_free + rebalance_storage_size = storage_statistics[rebalance_storage]['free'] / (1024 ** 3) + vm_storage_size = vm_statistics[vm_name]['storage'][vm_disk_device]['size'] + vm_storage_size_bytes = int(vm_storage_size) * 1024**3 + + # Assign new storage device to vm + logging.info(f'{info_prefix} Validating VM {vm_name} for potential storage rebalancing.') + if vm_statistics[vm_name]['storage'][vm_disk_device]['storage_rebalance'] == vm_statistics[vm_name]['storage'][vm_disk_device]['storage_parent']: + logging.info(f'{info_prefix} Setting VM {vm_name} from {current_storage} to {rebalance_storage} storage.') + vm_statistics[vm_name]['storage'][vm_disk_device]['storage_rebalance'] = resources_storage_most_free + else: + logging.info(f'{info_prefix} Setting VM {vm_name} from {current_storage} to {rebalance_storage} storage.') + + # Recalculate values for storage + ## Add freed resources to old parent storage device + storage_statistics[current_storage]['used'] = storage_statistics[current_storage]['used'] - vm_storage_size_bytes + storage_statistics[current_storage]['free'] = storage_statistics[current_storage]['free'] + vm_storage_size_bytes + storage_statistics[current_storage]['free_percent'] = (storage_statistics[current_storage]['free'] / storage_statistics[current_storage]['total']) * 100 + storage_statistics[current_storage]['used_percent'] = (storage_statistics[current_storage]['used'] / storage_statistics[current_storage]['total']) * 100 + logging.info(f'{info_prefix} Adding free space of {vm_storage_size}G to old storage with {current_storage_size}G. [free: {int(current_storage_size) + int(vm_storage_size)}G | {storage_statistics[current_storage]["free_percent"]}%]') + + ## Removed newly allocated resources to new rebalanced storage device + storage_statistics[rebalance_storage]['used'] = storage_statistics[rebalance_storage]['used'] + vm_storage_size_bytes + storage_statistics[rebalance_storage]['free'] = storage_statistics[rebalance_storage]['free'] - vm_storage_size_bytes + storage_statistics[rebalance_storage]['free_percent'] = (storage_statistics[rebalance_storage]['free'] / storage_statistics[rebalance_storage]['total']) * 100 + storage_statistics[rebalance_storage]['used_percent'] = (storage_statistics[rebalance_storage]['used'] / storage_statistics[rebalance_storage]['total']) * 100 + logging.info(f'{info_prefix} Adding used space of {vm_storage_size}G to new storage with {rebalance_storage_size}G. [free: {int(rebalance_storage_size) - int(vm_storage_size)}G | {storage_statistics[rebalance_storage]["free_percent"]}%]') + + logging.info(f'{info_prefix} Updated VM and storage statistics.') + return storage_statistics, vm_statistics + + +def size_in_bytes(size_str): + size_unit = size_str[-1].upper() + size_value = float(size_str) + size_multipliers = {'K': 1024, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4} + return size_value * size_multipliers.get(size_unit, 1) def main(): """ Run ProxLB for balancing VM workloads across a Proxmox cluster. """ + vm_output_statistics = {} + storage_output_statistics = {} + # Initialize PAS. initialize_logger('CRITICAL') app_args = initialize_args() @@ -965,17 +1362,24 @@ def main(): # Get metric & statistics for vms and nodes. if proxlb_config['vm_balancing_enable'] or proxlb_config['storage_balancing_enable'] or app_args.best_node: - node_statistics = get_node_statistics(api_object, proxlb_config['vm_ignore_nodes']) - vm_statistics = get_vm_statistics(api_object, proxlb_config['vm_ignore_vms'], proxlb_config['vm_balancing_type']) - node_statistics = update_node_statistics(node_statistics, vm_statistics) + node_statistics = get_node_statistics(api_object, proxlb_config['vm_ignore_nodes']) + vm_statistics = get_vm_statistics(api_object, proxlb_config['vm_ignore_vms'], proxlb_config['vm_balancing_type']) + node_statistics = update_node_statistics(node_statistics, vm_statistics) + storage_statistics = get_storage_statistics(api_object) - # Execute VM balancing sub-routines. + # Execute VM/CT balancing sub-routines. if proxlb_config['vm_balancing_enable'] or app_args.best_node: - # Calculate rebalancing of vms. - node_statistics_rebalanced, vm_statistics_rebalanced = balancing_vm_calculations(proxlb_config['vm_balancing_method'], proxlb_config['vm_balancing_mode'], proxlb_config['vm_balancing_mode_option'], - node_statistics, vm_statistics, proxlb_config['vm_balanciness'], app_args, rebalance=False, processed_vms=[]) - # Rebalance vms to new nodes within the cluster. - run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, proxlb_config['vm_parallel_migrations']) + node_statistics, vm_statistics = balancing_vm_calculations(proxlb_config['vm_balancing_method'], proxlb_config['vm_balancing_mode'], proxlb_config['vm_balancing_mode_option'], node_statistics, vm_statistics, proxlb_config['vm_balanciness'], app_args, rebalance=False, processed_vms=[]) + vm_output_statistics = run_rebalancing(api_object, vm_statistics, app_args, proxlb_config['vm_parallel_migrations'], 'vm') + + # Execute storage balancing sub-routines. + if proxlb_config['storage_balancing_enable']: + storage_statistics, vm_statistics = balancing_storage_calculations(proxlb_config['storage_balancing_method'], storage_statistics, vm_statistics, proxlb_config['storage_balanciness'], rebalance=False, processed_vms=[]) + storage_output_statistics = run_rebalancing(api_object, vm_statistics, app_args, proxlb_config['storage_parallel_migrations'], 'storage') + + # Generate balancing output + if proxlb_config['vm_balancing_enable'] or proxlb_config['storage_balancing_enable']: + run_output_rebalancing(app_args, vm_output_statistics, storage_output_statistics) # Validate for any errors. post_validations()