Handling Multi-Factor Authentication for workers #58
Comments
One "simple" approach that might work here (and that I have seen suggested on some machines) is SSH multiplexing: once the user creates an authenticated connection (with TOTP or whatever), that connection is kept open and all further connections within the session go through it. This is handled with a simple socket file, so Paramiko/Fabric etc. should just seamlessly work too. This would require some structure whereby JFR notifies the user that a new TOTP code is required to keep monitoring jobs, and it will depend on how stringently machines enforce these timeouts (in practice, the timeout can be infinite...). The relevant SSH config parameters are
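For illustration only: the parameter list from the comment above is not preserved here, so the following is a sketch that assumes the standard OpenSSH multiplexing options (`ControlMaster`, `ControlPath`, `ControlPersist`) are what was meant. The function names, the socket directory and the `8h` persistence value are hypothetical choices, not something taken from jobflow-remote.

```python
import subprocess


def multiplex_stanza(host_alias, hostname, user, persist="8h",
                     socket_dir="~/.ssh/sockets"):
    """Return an ssh_config block that reuses one authenticated connection.

    Assumption: the cluster allows OpenSSH connection sharing. The persist
    value and socket directory are illustrative defaults.
    """
    lines = [
        f"Host {host_alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        # First connection becomes the master; later ones reuse its socket.
        "    ControlMaster auto",
        # %r, %h, %p expand to remote user, host and port.
        f"    ControlPath {socket_dir}/%r@%h-%p",
        # Keep the master alive in the background after the first session ends.
        f"    ControlPersist {persist}",
    ]
    return "\n".join(lines) + "\n"


def master_is_alive(host_alias):
    """Return True if an OpenSSH control master for this alias is running."""
    result = subprocess.run(
        ["ssh", "-O", "check", host_alias],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Something like `master_is_alive` could be one way to decide when the user needs to be asked for a new TOTP code. It only covers connections made through the OpenSSH client itself.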
Also related is the previous discussion at Matgenix/qtoolkit#14
Would this also require a setup without a password? We currently have both a password and MFA, and using a key pair does not remove the password prompt.
I think in general it would not require a password (at least from my tests). The problem is that multiplexing unfortunately does not seem to be supported (yet) by paramiko: paramiko/paramiko#852. Since jobflow-remote should keep the connection with the host open, it would be possible to pass the OTP when the Runner starts.
As an aside, I've just pushed #60 which can be used as a test bed for some of these approaches (both by manually building and launching the MFA-enabled Slurm container and testing locally, and by the eventual full JFR automation...). For now we should at least add clear error messages and a docs page about this until we have a real solution.
Also, I'm going to assume this isn't the case for the supercomputers in question. Again, at least for Cambridge, resetting TOTP requires a video call where you show Government ID, which I assume we don't want to try to spoof 😅
I have a solution that "works" under certain conditions:
These can be relatively strict, but given the above limitations, I have tested JFR on a simple VM with MFA based on Google Authenticator. Just setting an OTP as

The limitation of having a single password prompt is not strictly a limitation of paramiko, but as far as I have seen it cannot be lifted with fabric's built-in options. I would need a bit more time to check how to use the lower-level paramiko machinery to properly set up the fabric Connection in that case.

I agree that it would be better not to mess with the token generation. I suppose in some cases this could lead to a ban from the cluster.
As an update, I managed to create a fabric connection even with password+OTP. It is a bit involved, but it should be possible to implement in jobflow-remote, if needed.
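To give a rough idea of what handling password+OTP looks like at the paramiko level, here is a minimal sketch based on paramiko's keyboard-interactive authentication (`Transport.auth_interactive`). It is not the code referred to in the comment above, and it does not show how to wire the result into a fabric Connection. The function names are hypothetical, and the rule "any prompt that does not mention a password is the OTP prompt" is an assumption that depends on how the server words its prompts.

```python
import getpass


def make_interactive_handler(password, ask_otp):
    """Build a handler for paramiko's Transport.auth_interactive.

    paramiko calls the handler with (title, instructions, prompt_list), where
    prompt_list holds (prompt_text, echo) pairs, and expects one answer per
    prompt, in order.
    """
    def handler(title, instructions, prompt_list):
        answers = []
        for prompt, _echo in prompt_list:
            if "password" in prompt.lower():
                answers.append(password)
            else:
                # Assumption: every non-password prompt asks for the OTP.
                answers.append(ask_otp(prompt))
        return answers

    return handler


def connect_with_otp(host, username, password, port=22):
    """Open a paramiko Transport, asking the user for the OTP on the terminal."""
    import paramiko  # imported lazily so the handler can be tested without it

    transport = paramiko.Transport((host, port))
    transport.start_client(timeout=30)
    # Host key verification is omitted in this sketch; a real setup should
    # compare transport.get_remote_server_key() against known hosts.
    handler = make_interactive_handler(password, getpass.getpass)
    transport.auth_interactive(username, handler)
    return transport
```

Because the OTP is requested through a callback, the same handler would work whether the code is typed in by the user or supplied some other way. Given the concerns raised in this thread, prompting the user looks like the safer default.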
I am still testing with the cluster support to see if the key-pair connection could at least allow for a passwordless connection. They think it should work but, in practice, it does not work yet... I will keep you updated.
Update on this topic. I have managed to implement a solution to address this issue. In case anyone else is interested, it can be found in this branch: https://github.com/Matgenix/jobflow-remote/tree/interactive. I will merge it after testing it more. The idea is the following: if an OTP needs to be provided when the daemon is started, the CLI will let the user connect to the daemon process and interact with it (through supervisor's "foreground" option). In this specific case the Runner will immediately try to connect to the remote host, and the user will be prompted for the password (if requested) and the OTP. This of course still has some of the limitations listed above:
I should add that the administrators of one computing center told us that storing the secret locally (even encrypted) is not considered an acceptable procedure for them. So I am afraid that the main limitations will remain for the moment.
Following a question from @JaGeo, I'm opening an issue here on the MFA topic. Let's gather ideas, info, existing solutions, problems, ... related to the fact that clusters are slowly (or maybe rapidly?) moving to MFA authentication.
@gpetretto did some tests (could you maybe summarize the insights here?)