NetworkManager-wait-online can fail on slower machines #32

phillxnet · 2020-10-06T17:09:06Z

On some low-power devices, e.g. the Pi4 / Ten64, and on slower/older x86_64 machines, the default NetworkManager-wait-online service leaves insufficient time before 'declaring' to its dependants that no online state is available. This false negative on online status can lead to dependants, e.g. KVM installs or HashiCorp Vault instances, failing to start, as their dependency on an online state was never indicated.

The proposed fix is to increase the default wait setting for the NetworkManager-wait-online service.

The service derives its timeout setting from the following parameter in /etc/sysconfig/network/config:

```
## Type:        int
## Default:     30
#
# When using NetworkManager you may define a timeout to wait for NetworkManager
# to connect in NetworkManager-wait-online.service.  Other network services
# may require the system to have a valid network setup in order to succeed.
#
# This variable has no effect if NetworkManager is disabled.
#
NM_ONLINE_TIMEOUT="30"
```
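Since the header marks this as a plain `KEY="value"` shell assignment, one way to read the effective value is to source the file in a subshell and fall back to the documented default of 30 when the variable is absent. This is a minimal sketch under that assumption; the function name `get_nm_online_timeout` is hypothetical and not part of any Rockstor or openSUSE tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the effective NM_ONLINE_TIMEOUT from a sysconfig file.
# Assumes the file contains plain shell assignments (KEY="value").
get_nm_online_timeout() {
    conf="${1:-/etc/sysconfig/network/config}"
    # '.' searches PATH for names without a slash, so make relative names explicit.
    case "$conf" in
        */*) ;;
        *) conf="./$conf" ;;
    esac
    (
        # Subshell: variables from the sourced file do not leak into the caller.
        NM_ONLINE_TIMEOUT=""
        if [ -r "$conf" ]; then
            . "$conf"
        fi
        # Fall back to the documented default when unset or empty.
        printf '%s\n' "${NM_ONLINE_TIMEOUT:-30}"
    )
}
```

On an unmodified install, `get_nm_online_timeout` should print the same value as the `grep` shown further down in this thread.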

Some experimentation has indicated that a setting of 45 seconds appears to resolve the observed failures.


phillxnet · 2020-10-06T17:25:11Z

The current setting can be retrieved via:

```
# grep "NM_ONLINE_TIMEOUT" /etc/sysconfig/network/config
NM_ONLINE_TIMEOUT="30"
```

Assuming the default is as expected, the following will change that specific setting to the proposed 45 seconds:

```
sed -i 's/NM_ONLINE_TIMEOUT="30"/NM_ONLINE_TIMEOUT="45"/g' /etc/sysconfig/network/config
```
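The `sed` above only matches the exact default line. A slightly more defensive sketch (hypothetical function name `set_nm_online_timeout`) replaces whatever value is currently set, appends the variable if it is missing, rejects non-integer input, and keeps a `.bak` copy of the original file. It assumes GNU `sed`, as shipped on openSUSE, for in-place editing.

```shell
#!/bin/sh
# Hypothetical helper: set NM_ONLINE_TIMEOUT to a given number of seconds.
# Usage: set_nm_online_timeout SECONDS [CONFIG_FILE]
set_nm_online_timeout() {
    value="$1"
    conf="${2:-/etc/sysconfig/network/config}"

    # Only accept a non-negative integer number of seconds.
    case "$value" in
        ''|*[!0-9]*)
            echo "timeout must be a non-negative integer: '$value'" >&2
            return 1 ;;
    esac
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi

    if grep -q '^NM_ONLINE_TIMEOUT=' "$conf"; then
        # Replace the existing value, whatever it is; -i.bak keeps a backup copy.
        sed -i.bak "s/^NM_ONLINE_TIMEOUT=.*/NM_ONLINE_TIMEOUT=\"$value\"/" "$conf"
    else
        # Variable not present: back up the file, then append the setting.
        cp "$conf" "$conf.bak" &&
            printf 'NM_ONLINE_TIMEOUT="%s"\n' "$value" >> "$conf"
    fi
}
```

Run as root, e.g. `set_nm_online_timeout 45`, this has the same effect as the `sed` one-liner on a default install, but does not silently do nothing when the value has already been changed.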

YaST can also configure this setting via:

```
sudo yast sysconfig set NM_ONLINE_TIMEOUT="45"
```

However, as we are akin to a JeOS install, a regular Rockstor system will not have YaST configured and is not, as yet, YaST compatible.

phillxnet · 2020-10-06T17:27:54Z

The failed state of NetworkManager-wait-online can be checked via:

```
systemctl status NetworkManager-wait-online
```
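For scripted checks, `systemctl show` (or `systemctl is-failed --quiet`, which exits 0 when the unit has failed) avoids parsing the human-oriented `status` output. A minimal sketch, with a hypothetical helper `wait_online_failed` that reads the `KEY=VALUE` lines printed by `systemctl show -p ActiveState` and, on failure, prints this boot's journal entries for the unit:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the unit's ActiveState is "failed".
# Reads KEY=VALUE lines, as printed by `systemctl show -p ActiveState ...`, on stdin.
wait_online_failed() {
    state=""
    while IFS='=' read -r key val; do
        if [ "$key" = "ActiveState" ]; then
            state="$val"
        fi
    done
    [ "$state" = "failed" ]
}

unit=NetworkManager-wait-online.service
# Only query systemd if systemctl exists; errors (e.g. no running systemd) are silenced.
if command -v systemctl >/dev/null 2>&1 &&
    systemctl show -p ActiveState "$unit" 2>/dev/null | wait_online_failed; then
    echo "$unit failed during this boot; last journal entries:"
    journalctl -b -u "$unit" --no-pager -n 20
fi
```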

phillxnet · 2020-10-06T19:32:14Z

I am undecided on the route to take here. Adding many tens of seconds to boot times for what looks to be a non-critical service may not be the way to go, especially given that no Rockstor-native service seems to be affected. Also note that on, for example, the Ten64, if one starts this service post boot, it takes around 46 seconds to start successfully, whereas during boot the delay required to achieve a successful start (no timeout), with the typical Samba service enabled, is 185 seconds.

The above increase to 185 seconds (from the default of 30) affects boot times as follows (timings measured from the GRUB screen):

| Milestone | NM_ONLINE_TIMEOUT="30" | NM_ONLINE_TIMEOUT="185" |
|---|---|---|
| Rockstor Web-UI login available | 120 seconds | 120 seconds |
| Command-line login | 60 seconds | 210 seconds |

Holding off on this change for the time being, as this may all be a red herring of sorts.
I also need timings for when this service is disabled, and the consequences of doing so.
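To gather comparable numbers for the cases discussed above (post-boot start vs. during boot, default vs. raised timeout, service disabled), a rough sketch follows. `measure_nm_online` times `nm-online` directly; without `-s` (`--wait-for-startup`), `nm-online` waits for connectivity rather than just NetworkManager start-up, so this measures time to connectivity, which may differ from what the service itself waits for. `record_boot_time` appends the first line of `systemd-analyze time` to a log, so timings from successive reboots (for example after `systemctl disable NetworkManager-wait-online.service`) can be compared. Both function names and the default log path are hypothetical. Note that `systemd-analyze time` measures until the default target is reached, which may not match the "command-line login" milestone exactly; `systemd-analyze blame` additionally lists how long the unit itself took during boot.

```shell
#!/bin/sh
# Hypothetical helper: time how long nm-online takes to report connectivity now.
# Usage: measure_nm_online [LIMIT_SECONDS]   (default limit 185, as tested above)
measure_nm_online() {
    limit="${1:-185}"
    start=$(date +%s)
    # -q: no progress output; nm-online exits non-zero when offline or timed out.
    nm-online -q --timeout="$limit"
    rc=$?
    end=$(date +%s)
    echo "nm-online exit=$rc after $((end - start))s (limit ${limit}s)"
    return "$rc"
}

# Hypothetical helper: append a labelled `systemd-analyze time` summary to a log.
# Usage: record_boot_time LABEL [LOG_FILE]
record_boot_time() {
    label="$1"
    log="${2:-$HOME/boot-times.log}"
    # systemd-analyze time fails while boot is still in progress; stderr is dropped.
    summary=$(systemd-analyze time 2>/dev/null | head -n 1)
    if [ -z "$summary" ]; then
        echo "no output from 'systemd-analyze time' (boot not finished?)" >&2
        return 1
    fi
    printf '%s | %s | %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$label" "$summary" >> "$log"
}
```

For example, after each reboot: `record_boot_time "NM_ONLINE_TIMEOUT=30"`, `record_boot_time "NM_ONLINE_TIMEOUT=185"`, then `record_boot_time "wait-online disabled"`.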

FroggyFlox · 2020-10-06T19:50:36Z

@phillxnet, the same thing happened to me a little while back on some of my Rockstor KVMs, but I never could pinpoint the source, and it was clearly due to my situation at the time... I remember looking around a bit and seeing some people report such timeouts at boot when having multiple NICs; this was my best guess at the time, as I was erroneously binding a few interfaces to my KVM. I haven't tried those VMs in a while (not sure I still have them), but could the number of interfaces be relevant here? It seems fitting given the high number of interfaces on the Ten64, for instance.
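When comparing machines on this point, NetworkManager's terse device listing gives a quick count of the wired interfaces it is handling. A minimal sketch (hypothetical function name), assuming `nmcli` is available and device names contain no colons, since `:` is the terse-output field separator:

```shell
#!/bin/sh
# Hypothetical helper: count ethernet devices known to NetworkManager.
# `nmcli -t -f DEVICE,TYPE device status` prints lines like "eth0:ethernet".
count_nm_ethernet() {
    nmcli -t -f DEVICE,TYPE device status |
        awk -F: '$2 == "ethernet" { n++ } END { print n + 0 }'
}
```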

phillxnet · 2020-10-06T20:26:12Z

@FroggyFlox

> reporting such timeout at boot when having multiple NICs;

That's interesting: the Ten64 does have 10 NICs, so it's possible, but I've also seen it on a Haswell NUC with a single NIC and on an i5 Ivy Bridge desktop with a single NIC. In the latter two cases, though, both machines were fairly heavily loaded starting multiple KVMs. This was with generic Leap 15.0/15.1.

It's really perplexing; it also doesn't look like anything is hanging, just waiting around. I'm actually inclined to disable it, but I'm not sure of the consequences. In the Vault instance, I think I removed the dependency on this service at one point, as Vault then worked fine anyway in my context here. I think testing on the Pi4 may help shed light, as it seems to affect slow / loaded / CPU-bound machines. But it may just be a hardware quirk, as on KVMs here it seems to work immediately.

Early timeout settings were 0, I think, i.e. wait forever. This was changed to 1 at some point to stop infinite hangs on the service in some contexts. I've moved to 40-60 in some contexts to make things work, and have finally got to do some testing here in the Rockstor realm.

FroggyFlox · 2020-10-06T20:37:29Z

I still have my VM that shows this... and it only has one NIC, so the number of NICs seems irrelevant, actually. I'm currently leaning towards an IPv6 issue, as we still have a lot of log messages about IPv6-related operations failing (understandably so). Maybe we should make sure we're not missing something IPv6-related somewhere.
In this VM, I also see everything running fine, as NetworkManager itself still starts fine, just a little later than NetworkManager-wait-online would like.
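As a starting point for the IPv6 theory, one could list the IPv6 method configured on each NetworkManager connection (e.g. `auto`, `ignore` or `disabled`). A minimal sketch with a hypothetical helper name, assuming `nmcli` supports `-g` (`--get-values`) and that connection names contain no colons, which nmcli would escape in this output:

```shell
#!/bin/sh
# Hypothetical helper: print "NAME: ipv6.method" for every NetworkManager connection.
list_ipv6_methods() {
    nmcli -g NAME connection show | while IFS= read -r name; do
        # </dev/null ensures nmcli cannot read the loop's stdin (the remaining names).
        method=$(nmcli -g ipv6.method connection show "$name" </dev/null)
        printf '%s: %s\n' "$name" "$method"
    done
}
```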

FroggyFlox mentioned this issue on Dec 12, 2023 in rockstor/rockstor-core#2762, "Make explicit to systemd our NetworkManager dependency #2685" (merged).
