Tunnel tests (#260)
* Updated tunnel code, tests.
* Added host output tests
* Added scp and sftp tests. Updated sftp code.
* Added ssh-python client tests
* Updated tunnel shutdown
* Updated single client
* Fix issue with identity auth - #222
* Updated documentation
* Updated readme
pkittenis authored Jan 2, 2021
1 parent 8f4d7c4 commit 0af3d23
Showing 20 changed files with 752 additions and 347 deletions.
18 changes: 18 additions & 0 deletions Changelog.rst
@@ -1,6 +1,24 @@
Change Log
============

2.5.0
+++++

Changes
-------

* Python 2 no longer supported.
* Updated class arguments, refactor for ``pssh.clients.native.tunnel``.

Fixes
-----

* Closed clients with proxy host enabled would not shutdown their proxy servers.
* Clients with proxy host enabled would not disconnect the proxy client on ``.disconnect`` being called.
* Default identity files would not be used when private key was not specified - #222.
* ``ParallelSSHClient(<..>, identity_auth=False)`` would not be honoured.


2.4.0
+++++

205 changes: 65 additions & 140 deletions README.rst

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions doc/advanced.rst
@@ -513,7 +513,7 @@ Stderr is empty:

.. code-block:: python
for line in output[client.hosts[0]].stderr:
for line in output[0].stderr:
print(line)
No output from ``stderr``.
@@ -523,9 +523,9 @@ No output from ``stderr``.
SFTP and SCP
*************

SFTP and SCP are both supported by ``parallel-ssh`` and functions are provided by the client for copying files with SFTP to and from remote servers - default native client only.
SFTP and SCP are both supported by ``parallel-ssh`` and functions are provided by the client for copying files to and from remote servers - default native clients only.

Neither SFTP nor SCP have a shell interface and no output is provided for any SFTP/SCP commands.
Neither SFTP nor SCP have a shell interface and no output is sent for any SFTP/SCP commands.

As such, SFTP functions in ``ParallelSSHClient`` return greenlets that will need to be joined to raise any exceptions from them. :py:func:`gevent.joinall` may be used for that.

@@ -542,15 +542,15 @@ To copy the local file with relative path ``../test`` to the remote relative pat
client = ParallelSSHClient(hosts)
greenlets = client.copy_file('../test', 'test_dir/test')
joinall(greenlets, raise_error=True)
cmds = client.copy_file('../test', 'test_dir/test')
joinall(cmds, raise_error=True)
To recursively copy directory structures, enable the ``recurse`` flag:

.. code-block:: python
greenlets = client.copy_file('my_dir', 'my_dir', recurse=True)
joinall(greenlets, raise_error=True)
cmds = client.copy_file('my_dir', 'my_dir', recurse=True)
joinall(cmds, raise_error=True)
.. seealso::

@@ -570,8 +570,8 @@ Copying remote files in parallel requires that file names are de-duplicated othe
client = ParallelSSHClient(hosts)
greenlets = client.copy_remote_file('remote.file', 'local.file')
joinall(greenlets, raise_error=True)
cmds = client.copy_remote_file('remote.file', 'local.file')
joinall(cmds, raise_error=True)
The above will create files ``local.file_host1`` where ``host1`` is the host name the file was copied from.

@@ -855,7 +855,7 @@ Clients for hosts that are no longer on the host list are removed on host list a
<..>
When wanting to reassign host list frequently, it is best to sort or otherwise ensure order is maintained to avoid reconnections on hosts that are still in the host list but in a different order.
When reassigning host list frequently, it is best to sort or otherwise ensure order is maintained to avoid reconnections on hosts that are still in the host list but in a different position.

For example, the following will cause reconnections on both hosts, though both are still in the list.
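A minimal sketch of such a reassignment - host names are placeholders:

.. code-block:: python

    from pssh.clients import ParallelSSHClient

    client = ParallelSSHClient(['host1', 'host2'])
    client.run_command('uname')

    # Same hosts in a different order - both clients are recreated and
    # reconnected despite both hosts still being in the list.
    client.hosts = ['host2', 'host1']
    client.run_command('uname')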

74 changes: 74 additions & 0 deletions doc/alternatives.rst
@@ -0,0 +1,74 @@
Comparison With Alternatives
*****************************

There are not many alternatives for SSH libraries in Python. Of the few that do exist, here is how they compare with ``parallel-ssh``.

As always, it is best to use a tool that is suited to the task at hand. ``parallel-ssh`` is a library for programmatic and non-interactive use. If requirements do not match what it provides, it is best not to use it. The same applies to the tools described below.

Paramiko
________

The default SSH client library in ``parallel-ssh<=1.6.x`` series.

Paramiko is pure Python code, though with native extensions as dependencies, and has poor performance and numerous bugs compared to both OpenSSH binaries and the ``libssh2`` based native clients in ``parallel-ssh`` ``1.2.x`` and above. Recent versions have regressed in performance and have `blocker issues <https://github.com/ParallelSSH/parallel-ssh/issues/83>`_.

It does not support non-blocking mode, so to make it non-blocking monkey patching must be used, which affects all other uses of the Python standard library.

Based on its use in historical ``parallel-ssh`` releases as well as `performance testing <https://parallel-ssh.org/post/parallel-ssh-libssh2>`_, paramiko is very far from being mature enough to be used.

This is why ``parallel-ssh`` has moved away from paramiko entirely as of ``2.0.0``, where it was dropped as a dependency.

asyncssh
________

Pure Python client library built on the ``asyncio`` framework. Its license (`EPL`) is not compatible with GPL, BSD or other open source licenses and `combined works cannot be distributed <https://www.eclipse.org/legal/eplfaq.php#USEINANOTHER>`_.

Therefore unsuitable for use in many projects, including ``parallel-ssh``.

Fabric
______

Port of Capistrano from Ruby to Python. Intended for command line use and heavily systems administration oriented, rather than being a non-interactive library. Same maintainer as Paramiko.

Uses Paramiko and suffers from the same limitations. Moreover, it uses threads for parallelisation, while `not being thread safe <https://github.com/fabric/fabric/issues/1433>`_, and exhibits very poor performance and extremely high CPU usage even for a limited number of hosts - 1 to 10 - with scaling limited to one core.

The library API is non-standard, poorly documented and has numerous issues, as library API use is not an intended use case.

Ansible
_______

A configuration management and automation tool that makes use of SSH remote commands. Uses, in parts, both Paramiko and OpenSSH binaries.

Similarly to Fabric, uses threads for parallelisation and suffers from the poor scaling that this model offers.

See `The State of Python SSH Libraries <https://parallel-ssh.org/post/ssh2-python/>`_ for what to expect from scaling SSH with threads, as compared `to non-blocking I/O <https://parallel-ssh.org/post/parallel-ssh-libssh2/>`_ with ``parallel-ssh``.

Again similar to Fabric, its intended and documented use is interactive via command line rather than library API based. It may, however, be an option if Ansible is already being used for automation purposes with existing playbooks, the number of hosts is small, and the use case is interactive command line use.

``parallel-ssh`` is, on the other hand, a suitable option for Ansible as an SSH client that would improve its parallel SSH performance significantly.

ssh2-python
___________

Bindings for the ``libssh2`` C library. Used by ``parallel-ssh`` as of ``1.2.0`` and by the same author.

Does not do parallelisation out of the box, but can be made parallel via Python's ``threading`` library relatively easily. As it is a wrapper to a native library that releases Python's GIL, it can scale to multiple cores.
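A rough sketch of that threaded approach - using ``ssh2-python`` directly rather than ``parallel-ssh``, with placeholder host names, user and key path:

.. code-block:: python

    import socket
    import threading

    from ssh2.session import Session

    def run(host, command):
        sock = socket.create_connection((host, 22))
        session = Session()
        session.handshake(sock)
        # Placeholder user and private key path.
        session.userauth_publickey_fromfile('user', '/home/user/.ssh/id_rsa')
        channel = session.open_session()
        channel.execute(command)
        size, data = channel.read()
        while size > 0:
            print(host, data.decode())
            size, data = channel.read()
        channel.close()

    threads = [threading.Thread(target=run, args=(host, 'uname'))
               for host in ('host1', 'host2')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()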

``parallel-ssh`` uses ``ssh2-python`` in its native non-blocking mode with event loop and co-operative sockets provided by ``gevent`` for an extremely high performance library without the side-effects of monkey patching - see `benchmarks <https://parallel-ssh.org/post/parallel-ssh-libssh2>`_.

In addition, ``parallel-ssh`` uses native threads to offload CPU bound tasks like authentication in order to scale to multiple cores while still remaining non-blocking for network I/O.

``pssh.clients.native.SSHClient`` is a single host natively non-blocking client for users that do not need parallel capabilities but still want a fully featured client with native code performance.
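A short sketch of that single host client - the host name is a placeholder:

.. code-block:: python

    from pssh.clients.native import SSHClient

    client = SSHClient('host1')
    host_out = client.run_command('uname')
    for line in host_out.stdout:
        print(line)
    print(host_out.exit_code)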

Out of all the available Python SSH libraries, ``libssh2`` and ``ssh2-python`` have been shown - see the benchmarks above - to perform the best with the least resource utilisation and, ironically for a native code extension, the least amount of dependencies: only the ``libssh2`` C library and its own dependencies, which are included in binary wheels.

However, it lacks support for some SSH features present elsewhere like GSS-API and certificate authentication.

ssh-python
__________

Bindings for the ``libssh`` C library. A client option in ``parallel-ssh``, by the same author. Similar performance to ``ssh2-python`` above.

For non-blocking use, only certain functions are supported. SCP/SFTP in particular cannot be used in non-blocking mode, nor can tunnels.

Supports more authentication options compared to ``ssh2-python`` like GSS-API (Kerberos) and certificate authentication.
2 changes: 2 additions & 0 deletions doc/index.rst
@@ -76,6 +76,8 @@ Single host client is also available with similar API.
advanced
api
clients
scaling
alternatives
Changelog
api_upgrade_2_0

11 changes: 11 additions & 0 deletions doc/installation.rst
@@ -56,3 +56,14 @@ Or for developing changes:
pip install -r requirements_dev.txt
Python 2
--------

As of January 2021, Python 2 is no longer supported by the Python Software Foundation nor ``parallel-ssh`` - see `Sunset Python 2 <https://www.python.org/doc/sunset-python-2/>`_.

Versions of ``parallel-ssh<=2.4.0`` will still work.

Future releases are not guaranteed to be compatible or work at all with Python 2.

If your company requires Python 2 support, contact the author directly at the email address on Github commits to discuss rates.
22 changes: 10 additions & 12 deletions doc/quickstart.rst
@@ -68,7 +68,7 @@ Output::


Step by Step
-------------
============

Make a list or other iterable of the hosts to run on:

@@ -119,7 +119,7 @@ Standard output, aka ``stdout``, for a given :py:class:`HostOutput <pssh.output.
Iterating over ``stdout`` will only end when the remote command has finished unless interrupted.

The ``timeout`` keyword argument to ``run_command`` may be used to cause output generators to timeout if no output is received after the given number of seconds - see `join and output timeouts <advanced.html#join-and-output-timeouts>`_.
The ``read_timeout`` keyword argument to ``run_command`` may be used to cause reading to timeout if no output is received after the given number of seconds - see `join and output timeouts <advanced.html#join-and-output-timeouts>`_.
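A sketch of ``read_timeout`` in use - the host name is a placeholder and the command is an arbitrary one that produces no output:

.. code-block:: python

    from pssh.clients import ParallelSSHClient
    from pssh.exceptions import Timeout

    client = ParallelSSHClient(['host1'])
    output = client.run_command('sleep 60', read_timeout=5)
    try:
        for line in output[0].stdout:
            print(line)
    except Timeout:
        # No output was received within the five second read timeout.
        pass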

``stdout`` is a generator. To retrieve all of stdout, wrap it with ``list``, per below.

@@ -176,8 +176,8 @@ First, ensure that all commands have finished by either joining on the output ob
.. code-block:: python
client.join(output)
for host, host_output in output:
print("Host %s exit code: %s" % (host, host_output.exit_code))
for host_output in output:
print("Host %s exit code: %s" % (host_output.host, host_output.exit_code))
As of ``1.11.0``, ``client.join`` is not required as long as output has been gathered.

@@ -235,8 +235,6 @@ To use files under a user's ``.ssh`` directory:

.. code-block:: python
import os
client = ParallelSSHClient(hosts, pkey='~/.ssh/my_pkey')
@@ -271,8 +269,8 @@ The helper function :py:func:`pssh.utils.enable_host_logger` will enable host lo
from pssh.utils import enable_host_logger
enable_host_logger()
output = client.run_command('uname')
client.join(output, consume_output=True)
client.run_command('uname')
client.join(consume_output=True)
:Output:
.. code-block:: python
@@ -288,10 +286,10 @@ The ``stdin`` attribute on :py:class:`HostOutput <pssh.output.HostOutput>` is a

.. code-block:: python
output = client.run_command('read')
output = client.run_command('read line; echo $line')
host_output = output[0]
stdin = host_output.stdin
stdin.write("writing to stdin\\n")
stdin.write("writing to stdin\n")
stdin.flush()
for line in host_output.stdout:
print(line)
@@ -325,8 +323,8 @@ With this flag, the ``exception`` output attribute will contain the exception on
:Output:
.. code-block:: python
host1: 0, None
host2: None, AuthenticationError <..>
Host host1: exit code 0, exception None
Host host2: exit code None, exception AuthenticationError <..>
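A sketch of code that produces output in this format, assuming ``stop_on_errors=False`` was passed and authentication fails on ``host2``:

.. code-block:: python

    from pssh.clients import ParallelSSHClient

    client = ParallelSSHClient(['host1', 'host2'])
    output = client.run_command('uname', stop_on_errors=False)
    client.join(output)
    for host_out in output:
        print("Host %s: exit code %s, exception %s" % (
            host_out.host, host_out.exit_code, host_out.exception))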
.. seealso::

24 changes: 24 additions & 0 deletions doc/scaling.rst
@@ -0,0 +1,24 @@
********
Scaling
********

Some guidelines on scaling ``parallel-ssh`` and choosing pool sizes.

In general, long lived commands with little or no output *gathering* will scale better. Pool sizes in the multiple thousands have been used successfully with little CPU overhead in the single thread running them in these use cases.

Conversely, many short lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regards to CPU overhead in the event loop.
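A sketch of the pool size guidance above - the host list and sizes are illustrative only:

.. code-block:: python

    from pssh.clients import ParallelSSHClient

    hosts = ['host%s' % (i,) for i in range(1, 2001)]

    # Long lived commands with little output gathering - a large pool.
    client = ParallelSSHClient(hosts, pool_size=2000)

    # Many short lived commands with output gathering - a smaller pool.
    client = ParallelSSHClient(hosts, pool_size=200)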

Multiple Python native threads, each of which can get its own event loop, may be used to scale this use case further as the number of CPU cores allows. Note that ``parallel-ssh`` imports *must* be done within the target function of the newly started thread for it to receive its own event loop. ``gevent.get_hub()`` may be used to confirm that the worker thread event loop differs from the main thread - see the sketch below.
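A sketch of that threaded layout - imports are done inside the thread target and host names are placeholders:

.. code-block:: python

    import threading

    def worker(hosts):
        # Import inside the thread so it receives its own event loop.
        from pssh.clients import ParallelSSHClient
        import gevent
        # gevent.get_hub() here differs from the main thread's hub.
        client = ParallelSSHClient(hosts)
        output = client.run_command('uptime')
        client.join(output)

    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in (['host1', 'host2'], ['host3', 'host4'])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()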

Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered either over multiple still running commands, or while more commands are being triggered, is overhead increased.

Technical Details
******************

To understand why this is, consider that in co-operative multitasking, which is used in this project via the ``gevent`` library, a co-routine (greenlet) needs to ``yield`` the event loop to allow others to execute - *co-operation*. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes contention with other co-routines that also want to use the event loop.

This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regards to scaling improvements from increasing pool size.
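A tiny illustration of the co-operative yielding described above, using plain ``gevent`` greenlets:

.. code-block:: python

    import gevent

    def coro(name):
        for i in range(3):
            print(name, i)
            # Yield the event loop so other greenlets may run.
            gevent.sleep(0)

    gevent.joinall([gevent.spawn(coro, 'a'), gevent.spawn(coro, 'b')])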

On the other end of the spectrum, long lived remote commands that generate *no* output only need the event loop at the start, when they are establishing connections, and at the end, when they are finished and need to gather exit codes, which results in practically zero CPU overhead at any time other than start or end of command execution.

Output *generation* is done remotely and has no effect on the event loop until output is gathered - output buffers are iterated on. Only at that point does the event loop need to be held.
19 changes: 9 additions & 10 deletions pssh/clients/base/parallel.py
@@ -88,12 +88,7 @@ def hosts(self, _hosts):
def _check_host_config(self):
if self.host_config is None:
return
host_len = 0
try:
host_len = len(self.hosts)
except TypeError:
# Generator
return
host_len = len(self.hosts)
if host_len != len(self.host_config):
raise ValueError(
"Host config entries must match number of hosts if provided. "
@@ -169,8 +164,10 @@ def join_shells(self, shells, timeout=None):
finished_shells = [g.get() for g in finished]
unfinished_shells = list(set(shells).difference(set(finished_shells)))
if len(unfinished_shells) > 0:
raise Timeout("Timeout of %s sec(s) reached with commands "
"still running", timeout, finished_shells, unfinished_shells)
raise Timeout(
"Timeout of %s sec(s) reached with commands still running",
timeout, finished_shells, unfinished_shells,
)

def run_command(self, command, user=None, stop_on_errors=True,
host_args=None, use_pty=False, shell=None,
@@ -354,8 +351,10 @@ def join(self, output=None, consume_output=False, timeout=None,
if unfinished_cmds:
finished_output = self.get_last_output(cmds=finished_cmds)
unfinished_output = list(set.difference(set(output), set(finished_output)))
raise Timeout("Timeout of %s sec(s) reached with commands "
"still running", timeout, finished_output, unfinished_output)
raise Timeout(
"Timeout of %s sec(s) reached with commands still running",
timeout, finished_output, unfinished_output,
)

def _join(self, host_out, consume_output=False, timeout=None,
encoding="utf-8"):
9 changes: 5 additions & 4 deletions pssh/clients/base/single.py
@@ -323,7 +323,10 @@ def auth(self):
def _password_auth(self):
raise NotImplementedError

def _pkey_auth(self, password=None):
def _pkey_auth(self, pkey_file, password=None):
raise NotImplementedError

def _open_session(self):
raise NotImplementedError

def open_session(self):
@@ -500,9 +503,7 @@ def copy_file(self, local_file, remote_file, recurse=False,
raise NotImplementedError

def _sftp_put(self, remote_fh, local_file):
with open(local_file, 'rb') as local_fh:
for data in local_fh:
self._eagain(remote_fh.write, data)
raise NotImplementedError

def sftp_put(self, sftp, local_file, remote_file):
raise NotImplementedError