ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01 #566

Open
MarbolanGos opened this issue Sep 4, 2024 · 0 comments

Hello,

When using ClusterShell with MilkCheck, I get the following error:

ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01

The error occurs as soon as a topology file is present.

MilkCheck's debug mode shows:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/MilkCheck/UI/Cli.py", line 538, in execute
    self.manager.call_services(services, action, conf=self._conf)
  File "/usr/lib/python3.6/site-packages/MilkCheck/ServiceManager.py", line 173, in call_services
    self.run(action)
  File "/usr/lib/python3.6/site-packages/MilkCheck/Engine/Service.py", line 236, in run
    action_manager_self().run()
  File "/usr/lib/python3.6/site-packages/MilkCheck/Engine/Action.py", line 182, in run
    self._master_task.run()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 877, in run
    self.resume(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 831, in resume
    self._resume()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 794, in _resume
    self._run(self.timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 404, in _run
    self._engine.run(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
    self.runloop(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/EPoll.py", line 170, in runloop
    self.remove_stream(client, stream)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 520, in remove_stream
    self.remove(client)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 495, in remove
    self._remove(client, abort, did_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 483, in _remove
    client._close(abort=abort, timeout=did_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Exec.py", line 142, in _close
    self.worker._check_fini()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Exec.py", line 384, in _check_fini
    self._has_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Worker.py", line 55, in _eh_sigspec_invoke_compat
    return method(*args)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 417, in ev_close
    mw._relaunch(gateway)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 404, in _relaunch
    self._launch(targets)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 265, in _launch
    next_hops = self._distribute(self.task.info("fanout"), nodes.copy())
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 342, in _distribute
    for gw, dstset in self.router.dispatch(dst_nodeset):
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 106, in dispatch
    yield self.next_hop(host), host
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 141, in next_hop
    str(dst))
ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01

I cannot reproduce the error using clush alone:

$ clush --remote=no -u2 -bw pm4-nod01 hostname
---------------
pm4-nod01
---------------
mngt0-2
$ clush -u2 -bw pm4-nod01 hostname
---------------
pm4-nod01
---------------
pm4-nod01
$ cat /etc/clustershell/topology.conf
[routes]
mngt0-1: mngt0-2
mngt0-2: @compute
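
To illustrate how a topology like this one could end up with "No route available", here is a minimal, self-contained sketch. It does not use ClusterShell: `ToyRouter` and `NoRoute` are hypothetical names, and the expansion of `@compute` to just `pm4-nod01` is an assumption. It only models the idea that a destination loses its route once its only gateway has been marked unreachable.

```python
class NoRoute(Exception):
    """Hypothetical stand-in for RouteResolvingError."""


class ToyRouter:
    """Toy next-hop lookup; NOT ClusterShell's implementation."""

    def __init__(self, routes):
        # routes maps a gateway name to the set of nodes reachable through it
        self.routes = routes
        self.unreachable = set()

    def mark_unreachable(self, gateway):
        # Remember that this gateway must no longer be used
        self.unreachable.add(gateway)

    def next_hop(self, dst):
        # Return the first usable gateway that serves dst
        for gateway, targets in self.routes.items():
            if dst in targets and gateway not in self.unreachable:
                return gateway
        raise NoRoute("No route available to %s" % dst)


# Assumption: @compute expands to only pm4-nod01 in this sketch
router = ToyRouter({"mngt0-2": {"pm4-nod01"}})
first_hop = router.next_hop("pm4-nod01")

# Once the only gateway is marked unreachable, the lookup fails
router.mark_unreachable("mngt0-2")
try:
    router.next_hop("pm4-nod01")
    error_message = None
except NoRoute as exc:
    error_message = str(exc)
```

With a single gateway in front of the compute nodes, one `mark_unreachable` call is enough to make every later lookup fail, which would be consistent with the traceback going through `_relaunch` before `next_hop` raises.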

Python version: 3.6.8

As a temporary workaround, I changed this:

--- /usr/lib/python3.6/site-packages/ClusterShell/Propagation.py.orig   2023-06-27 15:00:39.099237135 +0200
+++ /usr/lib/python3.6/site-packages/ClusterShell/Propagation.py        2023-06-27 15:00:47.504344461 +0200
@@ -405,7 +405,7 @@ class PropagationChannel(Channel):
         self.logger.debug("ev_close rc=%s", self._rc) # may be None

         # NOTE: self._rc may be None if the communication channel has aborted
-        if self._rc != 0:
+        if self._rc != 0 and self._rc is not None:
             self.logger.debug("error on gateway %s (setup=%s)", gateway,
                               self.setup)
             self.task.router.mark_unreachable(gateway)
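
The effect of this change can be checked in isolation. The sketch below is not ClusterShell code: the two helper functions are hypothetical and only reproduce the comparison from the diff. In Python, `None != 0` evaluates to `True`, so the original condition treats an `_rc` of `None` (which the source comment says can happen when the communication channel has aborted) the same way as a non-zero return code.

```python
def treated_as_error_original(rc):
    # Condition before the patch: anything other than 0 counts as an error
    return rc != 0


def treated_as_error_patched(rc):
    # Condition after the patch: None is no longer counted as an error
    return rc != 0 and rc is not None


# Compare both conditions for a success, a failure and an aborted channel
results = {
    rc: (treated_as_error_original(rc), treated_as_error_patched(rc))
    for rc in (0, 1, None)
}
```

Only the `None` case differs between the two conditions. Whether ignoring `None` is the right behaviour, or whether it hides a real gateway failure, is something I cannot judge.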

And this, to run milkcheck with python3 instead of platform-python:

--- /bin/milkcheck.orig 2024-09-04 09:19:15.826180684 +0200
+++ /bin/milkcheck      2024-09-04 09:19:22.076099490 +0200
@@ -1,4 +1,4 @@
-#!/usr/libexec/platform-python
+#!/usr/bin/python3
 #
 # Copyright CEA (2011)
 #  Contributor: Jeremie TATIBOUET