Add entries for DRAC cluster during mila init [MT-61] #54

Merged 48 commits on Jan 23, 2024
Commits
e3cb1c5
Add SSH entries for the DRAC clusters
lebrice Aug 29, 2023
fbbef41
Update tests for mila init to add DRAC setup
lebrice Aug 31, 2023
3fa6b1c
Also setup the passwordless authentication to DRAC
lebrice Aug 31, 2023
9dcc4c0
Bug: Unable to check passwordless SSH on DRAC
lebrice Sep 25, 2023
68b3575
Re-add the "User" in DRAC entries
lebrice Sep 25, 2023
19b1341
Fix isort issue
lebrice Nov 13, 2023
0fe8d83
Fix test broken because of additional prompts
lebrice Nov 28, 2023
81449d4
Simplify the code a bit, show entries in dict
lebrice Nov 30, 2023
18a40a1
`mila init` sets up SSH access to DRAC
lebrice Dec 10, 2023
36d2334
Add explicit `timeout` to `run` method, fix tests
lebrice Dec 10, 2023
329c367
Fix ordering of entries in windows config from WSL
lebrice Dec 11, 2023
753c490
Fix pre-commit issues
lebrice Dec 18, 2023
653a98f
Tweak ssh-keygen call and test timeout
lebrice Dec 18, 2023
bc621ef
Fix bug in `create_ssh_keypair`
lebrice Dec 18, 2023
9f82833
Dont add SSH multiplexing for DRAC compute nodes
lebrice Dec 18, 2023
8c0ef8c
Fix issue with create_ssh_keypair on Windows
lebrice Jan 11, 2024
4fc6bce
Try to make the ssh-keygen work on Windows in CI
lebrice Jan 15, 2024
aa4eeb5
Fix issue(?) with ssh-keygen on Windows
lebrice Jan 16, 2024
e06c254
Fix weird issues with test_local.py on Windows
lebrice Jan 16, 2024
0dced13
Fix test for create_ssh_keypair on Windows
lebrice Jan 16, 2024
98670da
Simplify redundant use of xfail mark
lebrice Jan 16, 2024
2c28618
Move test_check_passwordless to test_local.py
lebrice Jan 16, 2024
15bf96b
Move init command steps to init_command.py
lebrice Jan 17, 2024
17c75f3
Simplify check_passwordless and add more tests
lebrice Jan 17, 2024
82465e0
Add link to DRAC website for passwordless SSH
lebrice Jan 17, 2024
ca8b6f5
Greatly simplify check_passwordless
lebrice Jan 17, 2024
3009e89
Update `mila init`, setup passworless SSH to DRAC
lebrice Jan 17, 2024
75be8c8
Add failing test stub
lebrice Jan 17, 2024
2d1a19e
Add tests for setup_passwordless_ssh_to_cluster
lebrice Jan 18, 2024
78814e0
Move common fixtures to common.py
lebrice Jan 18, 2024
6456887
Add a test for _get_drac_username
lebrice Jan 18, 2024
8fc8fae
Fix issue with shutil.copytree for py3.7
lebrice Jan 18, 2024
a7b3b5c
Add integration test for setup_passwordless_ssh
lebrice Jan 18, 2024
85d8111
Remove outdated comments
lebrice Jan 18, 2024
f09469f
Fix issue with check_passwordless, fix test
lebrice Jan 19, 2024
2933fe2
Add comments in test
lebrice Jan 19, 2024
3a3ba14
Replace computecanada.ca with alliancecan.ca
lebrice Jan 19, 2024
4dc62ed
Fix bug in test_setup_passwordless_ssh_access
lebrice Jan 19, 2024
d44a216
Increase timeout value to try to help Win issue
lebrice Jan 19, 2024
bdd187d
Increase timeout value for a test
lebrice Jan 19, 2024
e4c8c8a
Increase timeout for test_create_ssh_keypair
lebrice Jan 19, 2024
7d39086
Fix issue with setup_passwordless_ssh on Windows
lebrice Jan 22, 2024
762b25d
Remove the check for ssh access to niagara
lebrice Jan 22, 2024
61ec17e
Make an ssh key for Mila and Drac clusters
lebrice Jan 22, 2024
6ea36d4
Fix tests for setup_passwordless_ssh
lebrice Jan 22, 2024
146d66e
Revert "Fix tests for setup_passwordless_ssh"
lebrice Jan 22, 2024
ed2955a
Revert "Make an ssh key for Mila and Drac clusters"
lebrice Jan 22, 2024
1202630
Increase the timeout value in test_init_command.py
lebrice Jan 22, 2024
120 changes: 9 additions & 111 deletions milatools/cli/commands.py
@@ -24,12 +24,13 @@
from urllib.parse import urlencode

import questionary as qn
from invoke.exceptions import UnexpectedExit
from typing_extensions import TypedDict

from ..version import version as mversion
from .init_command import (
create_ssh_keypair,
print_welcome_message,
setup_keys_on_login_node,
setup_passwordless_ssh_access,
setup_ssh_config,
setup_vscode_settings,
setup_windows_ssh_config_from_wsl,
@@ -47,13 +48,13 @@
randname,
running_inside_WSL,
with_control_file,
yn,
)

logger = get_logger(__name__)
if typing.TYPE_CHECKING:
from typing_extensions import Unpack

logger = get_logger(__name__)


def main():
if sys.platform != "win32" and get_fully_qualified_name().endswith(
@@ -139,12 +140,12 @@ def mila():
intranet_parser.set_defaults(function=intranet)

# ----- mila init ------

init_parser = subparsers.add_parser(
"init",
help="Set up your configuration and credentials.",
formatter_class=SortingHelpFormatter,
)

init_parser.set_defaults(function=init)

# ----- mila forward ------
@@ -396,7 +397,6 @@ def init():
print("Checking ssh config")

ssh_config = setup_ssh_config()
print("# OK")

# if we're running on WSL, we actually just copy the id_rsa + id_rsa.pub and the
# ~/.ssh/config to the Windows ssh directory (taking care to remove the
@@ -405,116 +405,14 @@ def init():
     if running_inside_WSL():
         setup_windows_ssh_config_from_wsl(linux_ssh_config=ssh_config)

-    setup_passwordless_ssh_access()
+    success = setup_passwordless_ssh_access(ssh_config=ssh_config)
+    if not success:
+        exit()
     setup_keys_on_login_node()
     setup_vscode_settings()
     print_welcome_message()


-def setup_passwordless_ssh_access():
-    print("Checking passwordless authentication")
-
-    here = Local()
-
-    # Check that there is an id file
-    ssh_private_key_path = Path.home() / ".ssh" / "id_rsa"
-
-    sshdir = os.path.expanduser("~/.ssh")
-    if not any(
-        entry.startswith("id") and entry.endswith(".pub")
-        for entry in os.listdir(sshdir)
-    ):
-        if yn("You have no public keys. Generate one?"):
-            # Run ssh-keygen with the given location and no passphrase.
-            create_ssh_keypair(ssh_private_key_path, here)
-        else:
-            exit("No public keys.")
-
-    # Check that it is possible to connect using the key
-
-    if not here.check_passwordless("mila"):
-        if yn(
-            "Your public key does not appear be registered on the cluster. Register it?"
-        ):
-            # NOTE: If we're on a Windows machine, we do something different here:
-            if sys.platform == "win32":
-                command = (
-                    "powershell.exe type $env:USERPROFILE\\.ssh\\id_rsa.pub | ssh mila "
-                    '"cat >> ~/.ssh/authorized_keys"'
-                )
-                here.run(command)
-            else:
-                here.run("ssh-copy-id", "mila")
-            if not here.check_passwordless("mila"):
-                exit("ssh-copy-id appears to have failed")
-        else:
-            exit("No passwordless login.")
-
-
-def setup_keys_on_login_node():
-    print("Checking connection to compute nodes")
-
-    remote = Remote("mila")
-    try:
-        pubkeys = remote.get_lines("ls -t ~/.ssh/id*.pub")
-        print("# OK")
-    except UnexpectedExit:
-        print("# MISSING")
-        if yn("You have no public keys on the login node. Generate them?"):
-            # print("(Note: You can just press Enter 3x to accept the defaults)")
-            # _, keyfile = remote.extract(
-            #     "ssh-keygen",
-            #     pattern="Your public key has been saved in ([^ ]+)",
-            #     wait=True,
-            # )
-            private_file = "~/.ssh/id_rsa"
-            remote.run(f'ssh-keygen -q -t rsa -N "" -f {private_file}')
-            pubkeys = [f"{private_file}.pub"]
-        else:
-            exit("Cannot proceed because there is no public key")
-
-    common = remote.with_bash().get_output(
-        "comm -12 <(sort ~/.ssh/authorized_keys) <(sort ~/.ssh/*.pub)"
-    )
-    if common:
-        print("# OK")
-    else:
-        print("# MISSING")
-        if yn(
-            "To connect to a compute node from a login node you need one id_*.pub to "
-            "be in authorized_keys. Do it?"
-        ):
-            pubkey = pubkeys[0]
-            remote.run(f"cat {pubkey} >> ~/.ssh/authorized_keys")
-        else:
-            exit("You will not be able to SSH to a compute node")
-
-
-def print_welcome_message():
-    print(T.bold_cyan("=" * 60))
-    print(T.bold_cyan("Congrats! You are now ready to start working on the cluster!"))
-    print(T.bold_cyan("=" * 60))
-    print(T.bold("To connect to a login node:"))
-    print(" ssh mila")
-    print(T.bold("To allocate and connect to a compute node:"))
-    print(" ssh mila-cpu")
-    print(T.bold("To open a directory on the cluster with VSCode:"))
-    print(" mila code path/to/code/on/cluster")
-    print(T.bold("Same as above, but allocate 1 GPU, 4 CPUs, 32G of RAM:"))
-    print(" mila code path/to/code/on/cluster --alloc --gres=gpu:1 --mem=32G -c 4")
-    print()
-    print(
-        "For more information, read the milatools documentation at",
-        T.bold_cyan("https://github.com/mila-iqia/milatools"),
-        "or run `mila --help`.",
-        "Also make sure you read the Mila cluster documentation at",
-        T.bold_cyan("https://docs.mila.quebec/"),
-        "and join the",
-        T.bold_green("#mila-cluster"),
-        "channel on Slack.",
-    )
-
-
 def forward(
     remote: str,
     page: str | None,
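
In the last hunk, `init()` now reads a result from `setup_passwordless_ssh_access` and decides for itself whether to exit, whereas the removed version of that function called `exit()` internally. The sketch below illustrates that pattern together with the public-key check from the removed code. It is only an illustration under stated assumptions: `has_public_key`, `setup_access`, and the `confirm` callback (standing in for the `yn` prompt) are hypothetical names, and none of this is taken from `init_command.py`, whose contents this diff does not show.

```python
from pathlib import Path
from typing import Callable


def has_public_key(ssh_dir: Path) -> bool:
    """Return True if ssh_dir contains a file named like id*.pub."""
    if not ssh_dir.is_dir():
        return False
    return any(
        entry.name.startswith("id") and entry.name.endswith(".pub")
        for entry in ssh_dir.iterdir()
    )


def setup_access(ssh_dir: Path, confirm: Callable[[str], bool]) -> bool:
    """Hypothetical stand-in: report failure to the caller instead of exiting."""
    if has_public_key(ssh_dir):
        return True
    if not confirm("You have no public keys. Generate one?"):
        # The caller decides what to do; nothing is terminated here.
        return False
    # A real implementation would run ssh-keygen at this point. This sketch
    # only creates an empty placeholder file so that the check above passes.
    ssh_dir.mkdir(parents=True, exist_ok=True)
    (ssh_dir / "id_rsa.pub").touch()
    return True
```

A caller can then mirror the new lines in `init()`: `if not setup_access(ssh_dir, confirm): exit()`. Returning a boolean keeps the helper testable, since a test can pass a `confirm` that always declines and assert on the result without the process exiting.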