
oVirt hosted engine installation failure because of missing netaddr for python 3.11 in ovirt-node image #695

Closed
daliborfilus opened this issue Mar 22, 2023 · 39 comments
Labels
bug Something isn't working

Comments

@daliborfilus

daliborfilus commented Mar 22, 2023

SUMMARY

Don't know if this is the correct place for this issue; please redirect me if it isn't.

COMPONENT NAME

oVirt hosted engine installation procedure.

STEPS TO REPRODUCE

As a new user... go to the download page.

  1. Download ovirt-node 4.5.4, 4.5.3, 4.5.2 ISO (el8).
  2. Install the ovirt-node image, reboot.
  3. Run hosted-engine --deploy --4, fill-in the form.
  4. After ~twenty minutes, watch it hang on the "Wait for the host to be up" message for several more tens of minutes. Then it crashes and rolls back the installation.
  5. Optional steps: Pull your hair out trying to find anything relevant to this issue via Google. Find nothing meaningful. Retry the whole installation multiple times with different versions of the image and network configuration. (I thought the issue was that the engine VM couldn't connect to the host, which led me to try fixing non-existent issues with my DNS and other network stuff.) I reinstalled the node image 6+ times in total and reran the hosted-engine deploy commands 10+ times. Two days of my life are gone.
  6. Finally discover that you can go to the failed VM directly (it's still present and active) and hunt for logs there.
  7. Discover /var/log/ovirt-engine/engine.log. See this inside:
2023-03-22 15:55:34,168+01 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [9c814912-05ec-49eb-8e4f-912347e59f0d] Host installation failed for host 'e2fc0443-9a67-4c11-a11e-791903212bc2', 'ovirtnode1.b-one.cz': Task Install ovs failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20230322155412-ovirtnode1.b-one.cz-9c814912-05ec-49eb-8e4f-912347e59f0d.log

Go to that log and see this inside:

  "stdout" : "fatal: [ovirt-node-1.lan]: FAILED! => {\"msg\": \"The conditional check 'cluster_switch == \\\"ovs\\\" or (ovn_central is defined and ovn_central | ipaddr)' failed. The error was: The ipaddr filter requires python's netaddr be installed on the ansible controller\\n\\nThe error appears to be in '/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-provider-ovn-driver/tasks/configure.yml': line 3, column 5, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n- block:\\n  - name: Install ovs\\n    ^ here\\n\"}",
  8. So you're telling me it's not networking or something hardcore, but a missing package? And that you couldn't just report this problem in the main output directly, instead of making me hunt it down two layers deep?
  9. Curse. Calm down. Find a single mention of this error... in a closed issue, in an archived repository. Hmmph.

The real issue is that the task "Update all packages" also updates ansible on the engine side. The new ansible now points to python3.11, which doesn't have netaddr installed. There is no python311-netaddr package in the repositories, and there isn't even a python311-pip.

Before the task "Update all packages", ansible points to python3.8.
After that it's on 3.11.

Because the installation creates the engine VM for you, any fix has to be applied right after the updates are done, but before ansible runs.
The only workaround I found that worked was this:

cp -a /usr/lib/python3.8/site-packages/netaddr* /usr/lib/python3.11/site-packages/

(Optionally, you can install python39-netaddr and copy that instead.)
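The copy workaround above can be wrapped in a small guard so it fails loudly instead of silently copying nothing. This is a minimal POSIX sh sketch, not part of any oVirt tooling: the function name copy_netaddr is hypothetical, and the default paths only mirror the ones in this issue (python3.8 as the source, python3.11 as the interpreter the updated ansible-core uses). Check both paths on your own engine VM before running it as root.

```shell
#!/bin/sh
# Hypothetical helper wrapping the workaround from this issue: copy an
# existing netaddr package into the site-packages of the Python that the
# updated ansible-core runs under. Defaults mirror the paths in the issue.

copy_netaddr() {
  src=${1:-/usr/lib/python3.8/site-packages}
  dst=${2:-/usr/lib/python3.11/site-packages}

  # Expand the glob; if nothing matches, "$1" stays the literal pattern.
  set -- "$src"/netaddr*
  if [ ! -e "$1" ]; then
    echo "no netaddr found in $src" >&2
    return 1
  fi
  if [ ! -d "$dst" ]; then
    echo "destination $dst does not exist" >&2
    return 1
  fi

  # -a keeps ownership, permissions and timestamps, as in the original cp.
  cp -a "$@" "$dst"/
}
```

On the engine VM, source the file and call copy_netaddr with no arguments after "Update all packages" has finished and before host deployment starts; pass explicit paths if you copy from python3.9 instead.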

EXPECTED RESULTS

Installation doesn't fail. If it does fail, it tells me the reason directly.
A failed installation shouldn't require you to become an expert in oVirt installation details, procedures and internals.

ACTUAL RESULTS

"The host is not up. [..] Go see logs." What logs? Where? It doesn't say.

I'm now an expert in hosted engine installation. Where can I get my certificate? :-)

(P.S. I appreciate all the work everyone is doing on this project. I'm just mad after losing two days to this issue just because of a missing package.)

@daliborfilus daliborfilus added the bug Something isn't working label Mar 22, 2023
@michalskrivanek
Member

Yeah, ansible "surprised" us by requiring python 3.11 all of a sudden, which requires a fair number of packages to be rebuilt and released. No ETA yet.
Downgrading ansible would likely work.
Also, el9stream works fine.

@michalskrivanek
Member

And sorry about the frustrating experience; ansible has a history of these....

@daliborfilus
Author

daliborfilus commented Mar 23, 2023

I understand; the dependencies are not under your control, and when they break, there's nothing you can do (except going the NixOS route of static dependencies and forbidding users from upgrading packages themselves). Btw, the installation then failed on the engine's liveness check, and before that it displayed "Engine VM IP address is  while the engine's he_fqdn ovirtengine.lan resolves to 192.168.88.150". I don't know why that is, but Google told me it might be because of a broken qemu version (I had qemu-6.2 installed). So either it's another broken dependency or it's (finally) something broken in my network config. I had this error on the ovirt-node-4.5.2 ISO; I don't know whether it's fixed in .3 or .4. But it could still be something wrong with my nested KVM setup.

(Offtopic: I need this oVirt installation because I have to write a stats-gathering app for it, to get host statistics (running VMs, etc.) and show them in Grafana. I found I could do that via vdsm, which is why I'm installing it in the first place.)
Because this is a test setup, I installed the ovirt node image as a libvirt VM with bridged networking.
I read everywhere that the engine's IP must be in the same subnet, although I don't know how the nested qemu can work with that without being bridged too. But my networking knowledge is limited. Maybe the scripts assign the IP on the ovirt node host and pass it through to the nested VM? Don't know.

I gave up on the hosted version and am installing the engine manually right now as we speak. Fingers crossed...

@michalskrivanek
Member

Posted oVirt/ovirt-engine#826. It might actually be enough for the deployment problem. There's no one around who would remember why it was added, so... let's blindly drop it and see what happens... :)

As for the empty address, it means the VM didn't boot up. You'd have to check the VM: whether it's there at all and whether it's stuck in qemu (that happens sometimes in CI). Try getting to the serial console; maybe the OS has an issue.

@daliborfilus
Author

daliborfilus commented Mar 23, 2023

Well, that was a quick fix. Thank you.

The VM was running (virsh list showed "HostedEngine"), but it didn't respond on the IP I assigned, and net-dhcp-leases didn't show any lease either. I didn't think of the serial console; that could've done the trick. Well, I deleted the VM and went the non-hosted install route, so sadly I can't check anymore.

@jameswadsworth

jameswadsworth commented Mar 24, 2023

Same problem here running RHEL 8.7 with ovirt-engine 4.5.4. We were unable to add any additional hosts to the cluster beyond the host we used to redeploy the oVirt engine. We resolved it the same way as @daliborfilus, by copying the netaddr module from python3.9 to python3.11. We lost a whole night's sleep trying to debug the issue. We are not ansible/python experts, but we know a lot more now!!

@simmonscs

The option that worked for me was to edit the /etc/dnf/dnf.conf file on the engine VM and add the line exclude=ansible-core to prevent ansible from being updated when the Update all packages task runs. You can do this as soon as the local engine VM gets an IP address.
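The dnf.conf exclusion can be applied idempotently, so re-running it during a retried deployment doesn't pile up duplicate lines. A sketch, assuming the stock EL8 /etc/dnf/dnf.conf where [main] is the only section, and GNU sed for in-place editing; exclude_ansible_core is a hypothetical helper name, and extending an existing exclude= line with a comma relies on dnf accepting comma-separated lists.

```shell
#!/bin/sh
# Keep the "Update all packages" task from upgrading ansible-core on the
# engine VM by adding it to dnf's global exclude list.
# Assumes [main] is the only (or last) section in dnf.conf, and GNU sed.

exclude_ansible_core() {
  conf=${1:-/etc/dnf/dnf.conf}

  # Already excluded: nothing to do, so the function is safe to re-run.
  if grep -Eq '^exclude=.*ansible-core' "$conf"; then
    return 0
  fi

  if grep -q '^exclude=' "$conf"; then
    # Extend the existing exclude list; dnf accepts comma-separated lists.
    sed -i 's/^exclude=\(.*\)$/exclude=\1,ansible-core/' "$conf"
  else
    # Make sure the file ends with a newline before appending a new line.
    if [ -n "$(tail -c 1 "$conf")" ]; then
      echo >> "$conf"
    fi
    echo 'exclude=ansible-core' >> "$conf"
  fi
}
```

Run it on the engine VM as soon as it has an IP address, before "Update all packages" runs; remove the line again later if you want ansible-core to receive updates.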

@laduchesneau

I lost two days trying to figure out why I couldn't rebuild my lab. Like the OP, I was looking in the wrong direction.

The workaround proposed by @simmonscs (adding exclude=ansible-core to /etc/dnf/dnf.conf on the engine VM as soon as it gets an IP address, so the "Update all packages" task doesn't update ansible) worked for me.

@mnecas
Member

mnecas commented Mar 30, 2023

Just rebuilt the ovirt-ansible-collection with python3.11 for el8; it took some time because of the deps.
Please let me know if the release 3.1.2-1 will work for you
#697

@nodespar

nodespar commented Apr 1, 2023

@mnecas Are there instructions on how to build the ansible collection with python3.11? I've been pulling my hair out for the past week trying to get this installed.

@blablak

blablak commented Apr 3, 2023

I found a workaround for this issue.
Start the deployment with:
hosted-engine --deploy --4 --ansible-extra-vars=he_pause_before_engine_setup=true
When the deployment pauses, connect to the engine VM using ssh and install the missing dependency:
dnf install python3.11-pip.noarch
python3.11 -m pip install netaddr
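Before and after applying the pip workaround, it helps to confirm which Python the updated ansible actually runs under and whether that interpreter can import netaddr. A sketch with two hypothetical helpers; it assumes the ansible entry point is a script whose shebang names its interpreter (as RPM-installed Python scripts usually do), which may not hold for every packaging.

```shell
#!/bin/sh
# Hypothetical helpers to check the pip workaround on the engine VM.

# Print the interpreter named in the shebang of the ansible entry point.
# Handles both "#!/usr/bin/python3.11 -s" and "#!/usr/bin/env python3" forms.
ansible_python() {
  bin=${1:-$(command -v ansible)}
  [ -n "$bin" ] || return 1
  head -n 1 "$bin" | sed -n 's/^#![[:space:]]*//p' |
    awk '{ if ($1 ~ /\/env$/) print $2; else print $1 }'
}

# Succeed only if the given interpreter can import netaddr.
has_netaddr() {
  "$1" -c 'import netaddr' 2>/dev/null
}
```

For example, py=$(ansible_python) && has_netaddr "$py" && echo "netaddr OK for $py" tells you whether the pip install step is still needed while the deployment is paused.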

@mnecas
Copy link
Member

mnecas commented Apr 3, 2023

@nodespar you can just install it from the cbs [1] or copr [2] repo. I have already built the collection with python3.11.
I don't know the release schedule right now, so I don't know when it will be in the 4.5 repo.
[1] https://cbs.centos.org/koji/buildinfo?buildID=43404
[2] https://copr.fedorainfracloud.org/coprs/ovirt/ovirt-master-snapshot/

@deepakramanath


I'm having the same issue you reported: oVirt 4.5 on EL8.

@deepakramanath

deepakramanath commented Apr 9, 2023

I want to give @blablak's workaround (start the deployment with he_pause_before_engine_setup=true, then install netaddr for python3.11) a try. Do you mean ssh into the engine VM and then install the missing dependency on the engine?

@daliborfilus
Author

@ghost

ghost commented Apr 13, 2023

When the installation is paused due to he_pause_before_engine_setup, you can grab the IP address of the engine VM, ssh into it and run the commands.

@bcostescu

bcostescu commented Apr 17, 2023

I'd like to propose a different solution:

hosted-engine --deploy --4 --ansible-extra-vars=he_offline_deployment=true

This will prevent the HE VM from updating packages, such that ansible remains at the original, older version, which guarantees a working deployment (at least for the time being).

Once the deployment finishes, the HE VM will run CentOS Stream 8, as this is what ovirt-engine-appliance is based on. At this point, you can log on normally to the HE VM, using the IP or name you gave it during deployment; you don't need to use the temporary IP. In my case, this was followed by (after logging on to the HE VM):

curl -O https://raw.githubusercontent.com/AlmaLinux/almalinux-deploy/master/almalinux-deploy.sh; bash almalinux-deploy.sh --downgrade

which switches the HE VM to AlmaLinux 8.7. This contains the same ansible version as the host, so things continue to work afterwards. Of course, if your goal is to keep running CentOS Stream 8 in the HE VM, this won't help and you're probably better off with installing the missing python 3.11 module.

@michalskrivanek
Member

#704 dropped netaddr for good, I hope

@michalskrivanek
Member

With https://www.mail-archive.com/[email protected]/msg72302.html in mind (ovirt-master-snapshot with the node image from https://resources.ovirt.org/repos/ovirt/github-ci/ovirt-node-ng-image/), it should work from now on on el8stream too. el9stream has been working for a while.

@daliborfilus
Author

Thank you. I agree with going nightly for these cases, because you practically go "nightly" on "stable" too, due to the yum update included in the engine installation.
So either going "fully stable" from known versions (while risking security vulnerabilities, since their patches require a yum update anyway) or going "all latest" are both valid options.

@sea2space

Hello
Thank the gods for finding this thread!

Also killed a couple of days on this problem.
Installation from test build 4.5.5 did not help.

The solution suggested by @blablak helped!

@michalskrivanek
Member

@sea2space what didn't work for you exactly? can you describe the exact OS and package versions and what failed?

@sea2space

Update:

Download from
https://resources.ovirt.org/repos/ovirt/github-ci/ovirt-node-ng-image/
ovirt-node-ng-installer-4.5.5-2023050307.el8.iso
Ran into another error.

Deployed HE again. Now all good.
Sorry, my bad.

Waiting for the official stable 4.5.5 )

@kriipke

kriipke commented May 18, 2023

@blablak's workaround (start the deployment with --ansible-extra-vars=he_pause_before_engine_setup=true, then ssh into the engine VM and run dnf install python3.11-pip.noarch and python3.11 -m pip install netaddr) worked for me. Finally wrapping up this "afternoon project" 48 hours later, smh. It fixed the following error:

The conditional check 'cluster_switch == "ovs" or (ovn_central is defined and ovn_central | ipaddr)' failed.
The error was: The ipaddr filter requires python's netaddr be installed on the ansible controller.

@nodespar

Is ovirt a dying project? I'm just surprised there has not been a major release with this fix.
Anyone coming new to try ovirt cannot install this and go forward.

@deepakramanath

deepakramanath commented May 23, 2023 via email

@Ecsi1337

Currently, the only working solution is to have both the hosts and the hosted engine on version 4.4.10, then update the hosted engine to 4.5.4 first, and then the hosts. Note that you cannot add a 4.4.10 host to a 4.5.4 hosted engine; everything must be started from 4.4.10.

@jorgevisentini

Unfortunately, the problem still persists.... Any workaround? Any tips?

Tested the stable iso 4.4.10, 4.5.4, 4.5.3.2

@jorgevisentini

I tested @Ecsi1337's approach (starting with the hosts and hosted engine on 4.4.10, then upgrading the engine to 4.5.4 before the hosts) just now and it didn't work.
In the Engine VM, I added the line "exclude=ansible-core" in the /etc/dnf/dnf.conf file, as @simmonscs commented.

@jorgevisentini

Just an update... I tested with ovirt-node-ng-installer-4.5.5-2023070606.el9 and it worked.

I believe the next releases will work fine...
Just a tip: download a copy of the CentOS 9 Stream repo, because we don't know whether it will change, you know? lol

@cgoudie

cgoudie commented Sep 24, 2023

I don't quite know why this issue is closed. The problem persists in Sept 2023 when installing oVirt hosts,
on EL8 and EL9 (CentOS Stream) hosts.

Edit: To fix it, go to your oVirt host and run dnf install python3.11-netaddr.noarch (the oVirt host installer didn't update this package automatically; it seems a dependency is missing).

@dim-nail

dim-nail commented Oct 9, 2023

Hi, if you're trying to install from the CentOS repo, upgrade the ovirt-engine-appliance RPM; the repo has a buggy version. I installed ovirt-engine-appliance-4.5-20231009063645.1.el9.x86_64.rpm (https://resources.ovirt.org/repos/ovirt/github-ci/ovirt-appliance/el9/) and everything installed without any problems or workarounds.

@vladsol

vladsol commented Nov 15, 2023

oVirt Node 4.5.4 (stable? :) )

python-netaddr version: 0.9.0 (tried 0.8.0-5.el9 also)
same problem:
The ipaddr filter requires python's netaddr be installed on the ansible controller.

Tried ovirt 4.5.4 el8 - same problem.

@mwperina
Member

mwperina commented Jan 2, 2024

python-netaddr dependency was removed from oVirt Ansible Collection in 3.1.3 release: #696

Please make sure you are using the latest available packages during installation. When installing oVirt Hosted Engine from oVirt Engine Appliance, you need to pause the deployment using he_pause_before_engine_setup and perform a dnf update (more details can be found at https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#Deploying_the_Self-Hosted_Engine_Using_the_CLI_install_RHVM)

@safodz

safodz commented Feb 22, 2024

@daliborfilus
Thanks for your workaround. I followed the steps and it passed, but then I got the following error. Can you please advise?

[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Check if the host is up]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Set host_id]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Collect error events from the Engine]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Generate the error message from the engine events]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fail with error description]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, deployment errors: code 505: Host rhv2........local installation failed. Failed to configure management network on the host., code 519: Host rhv2.cloudone.cloudz.local does not comply with the cluster Default networks, the following networks are missing on host: 'ovirtmgmt', code 9000: Failed to verify Power Management configuration for Host rhv2.cloudone.cloudz.local., fix accordingly and re-deploy."}

Regards
Sofiane

@almaclang

We're also hitting the same issue. We tried every possible workaround, but it still fails.

@safodz

safodz commented Mar 12, 2024

@almaclang for me it works after applying this workaround on the Engine VM (you access it via the temporary IP assigned during the installation). If it gets stuck, try opening all the firewall ports (1-9999 TCP/UDP) and verify that forward and reverse DNS resolution work for the hosts and the Engine VM.
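The forward and reverse DNS check suggested above can be scripted per name. A sketch, with check_fwd_rev_dns as a hypothetical helper: it uses getent, so it reflects the system resolver (including /etc/hosts) rather than a DNS server alone; query your DNS server directly with dig or host if you need to rule out /etc/hosts entries.

```shell
#!/bin/sh
# Hypothetical helper: check that an FQDN resolves forward, and that the
# resulting address resolves back to the same FQDN (via the NSS resolver).

check_fwd_rev_dns() {
  fqdn=$1
  ip=$(getent hosts "$fqdn" | awk 'NR == 1 { print $1 }')
  if [ -z "$ip" ]; then
    echo "forward lookup failed for $fqdn" >&2
    return 1
  fi

  # getent prints "address canonical-name [aliases...]"; accept any name.
  if getent hosts "$ip" |
    awk -v n="$fqdn" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                      END { exit !found }'; then
    echo "$fqdn -> $ip -> ok"
  else
    echo "reverse lookup of $ip does not return $fqdn" >&2
    return 1
  fi
}
```

Run it on each host and on the engine VM for every FQDN involved in the deployment (the engine's he_fqdn and each host name); any line printed to stderr points at a missing forward or reverse record.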

@daliborfilus
Author

As the OP of this issue, I'm unsubscribing from notifications, because the issue is closed and I no longer use oVirt. I understand it comes up in searches, and I think there should be some kind of FAQ / discussion thread somewhere more prominent, instead of this closed issue.

@almaclang

Quoting @safodz: "For me it works after doing this workaround on the Engine VM [...]. My issue is after this stage, with Gluster storage and the Engine VM, and I still cannot detect the problem."

@safodz There's no issue with DNS; the engine VM can reach the public yum repo. It gets stuck during package installation and upgrade, then suddenly goes to the "Wait for the host to be up" state.
