Add integration tests for MFA-enabled SSH/Slurm container #60

Closed
wants to merge 1 commit into from


ml-evs
Member

@ml-evs ml-evs commented Jan 31, 2024

Related to #58.

This PR adds a test environment for an MFA-enabled SSH/Slurm worker. It uses the google-authenticator-libpam wrapper for Ubuntu to set up time-based MFA on the default jobflow user, with a pre-defined emergency backup code (which can be used for only one login before the container has to be rebuilt).

Currently, the tests fail. They should probably be updated to instead check that sensible errors are returned for the MFA-enabled worker.

This is hopefully a somewhat realistic emulation of an MFA-enabled supercomputer (perhaps excluding the backup codes), and can be used to test potential future workarounds involving multiplexing or otherwise.

Note: as stated in the Dockerfile.slurm.mfa header, the two integration test Dockerfiles are identical except for the final build stage. Because the Docker Python SDK does not support BuildKit (docker/docker-py#2230), we cannot use multi-stage builds here.

@ml-evs
Member Author

ml-evs commented Jan 31, 2024

Results of some testing with paramiko and oath:

  1. oath can programmatically generate TOTP codes from the initial secret key/QR code you are given (e.g., I was able to ssh into the jobflow-mfa container using an oath code rather than a backup code). This would require storing the secret key provided by the server, which is presumably exactly what apps like Google Authenticator do. Obviously this would need to be done securely, but perhaps it isn't any worse than allowing jobflow to use ssh keys?
  2. Paramiko does not support password + 2FA verification; it only supports pubkey + 2FA. With some gross hacking, I was able to connect to the jobflow-mfa container using a backup code with paramiko, but could not keep the connection open long enough to run commands. I got it to this point by overriding the detection of "partial" auth to work with passwords:
  def _parse_userauth_failure(self, m):
      authlist = m.get_list()
      # TODO 4.0: we aren't giving callers access to authlist _unless_ it's
      # partial authentication, so eg authtype=none can't work unless we
      # tweak this.
      partial = m.get_boolean()
      if partial:
          self._log(INFO, "Authentication continues...")
          self._log(DEBUG, "Methods: " + str(authlist))
          self.transport.saved_exception = PartialAuthentication(authlist)

(https://github.com/paramiko/paramiko/blob/eb470b129ce0651102aec6340a08190fe393b94e/paramiko/auth_handler.py#L740-L744)

It might be possible to enable this route with the help of the paramiko devs; otherwise we could make a specialised SSHClient for this case if password + 2FA token is going to be a common thing.

If 2. were implemented, I think this would give us everything required to use 2FA (as long as it's based on OATH), as jobflow could mimic the user running an app on a mobile phone for auth. Of course, the problem with this is that it presumably breaks the usage terms of the supercomputers that use 2FA (which in some cases might actually be illegal too? 😅)
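
For reference, here is a minimal sketch of what "mimicking the app" amounts to, assuming the server uses standard RFC 6238 TOTP with HMAC-SHA1 and a 30-second time step (the usual Google Authenticator settings). The function name `totp` and its parameters are illustrative only; they are not part of oath-toolkit, paramiko or jobflow.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, step=30, digits=6):
    """Return the RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    # Secrets are usually displayed as unpadded base32, sometimes with spaces.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # restore padding for b32decode
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the Unix epoch.
    now = time.time() if for_time is None else for_time
    counter = int(now // step)

    # HOTP (RFC 4226): HMAC the 8-byte big-endian counter, then truncate.
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Check against the RFC 6238 Appendix B test vector (SHA-1, T = 59 s, 8 digits).
RFC_SECRET = base64.b32encode(b"12345678901234567890").decode()
print(totp(RFC_SECRET, for_time=59, digits=8))  # 94287082
```

Anything that holds the base32 secret can produce valid codes this way, which is why where and how that secret is stored is the real question.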

@gpetretto
Contributor

I managed to connect with password+OTP using paramiko with this code:

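import socket
import paramiko

user = "jobflow"  # placeholder username
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("mfahost.example.com", 22))
ts = paramiko.Transport(sock)
ts.start_client(timeout=10)
# Prompts on the terminal for each keyboard-interactive field (password, then OTP)
ts.auth_interactive_dumb(user)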

from here. This prompts for both. I did not test it, but in principle it should be possible to replace auth_interactive_dumb with auth_interactive and define a handler that inserts password and OTP (as suggested in another answer to the same question). I was investigating this to reduce the constraints of the solution I mentioned in the issue.
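
An untested sketch of the `auth_interactive` variant described above. `Transport.auth_interactive(username, handler)` calls `handler(title, instructions, prompt_list)`, where `prompt_list` holds `(prompt, echo)` tuples, and expects a list of answers back. The helper names `make_interactive_handler` and `connect`, and the prompt texts being matched ("Password", "Verification code"), are assumptions for illustration; the actual prompts depend on the server's PAM configuration.

```python
def make_interactive_handler(password, get_otp):
    """Build a keyboard-interactive handler that answers password and OTP prompts.

    `get_otp` is any zero-argument callable returning the current code, e.g.
    `lambda: input("OTP: ")`, so no secret has to be stored locally.
    """

    def handler(title, instructions, prompt_list):
        responses = []
        for prompt, _echo in prompt_list:
            text = prompt.lower()
            if "password" in text:
                responses.append(password)
            elif "verification code" in text:
                responses.append(get_otp())
            else:
                # Fail loudly rather than guess at an unknown prompt.
                raise ValueError(f"Unexpected prompt: {prompt!r}")
        return responses

    return handler


def connect(host, user, password, get_otp, port=22, timeout=10):
    """Open a paramiko Transport authenticated via keyboard-interactive."""
    import socket

    import paramiko  # imported here so the handler is usable without paramiko

    sock = socket.create_connection((host, port), timeout=timeout)
    transport = paramiko.Transport(sock)
    transport.start_client(timeout=timeout)
    transport.auth_interactive(user, make_interactive_handler(password, get_otp))
    return transport
```

Passing the OTP in as a callable keeps the choice of where the code comes from (typed by the user, or generated some other way) outside the connection logic.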

However, I would rather not offer a standardized way of automatically generating the OTP by storing the secret locally, unless this is a policy generally approved by the clusters. I imagine that such an approach will make the OTP generation pointless from the point of view of the administrators. While we cannot prevent individual users from hacking their way through, the administrators will likely disapprove if we actively document and support tools that break their policies. Maybe we can get in touch with the administrators of a few computing centers and check before proceeding with this approach?

@ml-evs
Member Author

ml-evs commented Feb 2, 2024

> I managed to connect with password+OTP using paramiko with this code:
>
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.connect(("mfahost.example.com", 22))
> ts = paramiko.Transport(sock)
> ts.start_client(timeout=10)
> ts.auth_interactive_dumb(user)

Nice, my attempts were to essentially trigger the call to the "dumb" route using the normal SSHClient but I didn't really get anywhere.

> However, I would rather not offer a standardized way of automatically generating the OTP by storing the secret locally, unless this is a policy generally approved by the clusters. I imagine that such an approach will make the OTP generation pointless from the point of view of the administrators. While we cannot prevent individual users from hacking their way through, the administrators will likely disapprove if we actively document and support tools that break their policies. Maybe we can get in touch with the administrators of a few computing centers and check before proceeding with this approach?

I think reaching out is a good idea. Whilst this should never be the default, I'm sure some enterprising individuals would try this anyway so it might be good to have a proper encrypted store version built-in to jobflow. If the first login when launching a runner still required the TOTP then this is somehow spiritually the same as trusting Google or whoever with your secret key...

@ml-evs
Member Author

ml-evs commented Jun 3, 2024

This has gone stale

@ml-evs ml-evs closed this Jun 3, 2024

2 participants