Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to ubuntu for GCP and avoid key pair checking #1641

Closed
wants to merge 16 commits into from

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Jan 27, 2023

Prerequisite: skypilot-org/skypilot-catalog#25

Main changes:

  1. Have GCP default to ubuntu image instead of Debian to fix Python version mismatch when migrating from AWS to GCP #1604
  2. Fix the ssh connection issue caused by control master for GCP with ubuntu image
    Before the fix, the ssh connection will be disconnected when ray trying to setup the runtime dependencies, which is probably because the ssh connection is unstable when the cluster is just provisioned [autoscaler][gcp] "ray up" on GCP isn't working ray-project/ray#16539 (comment). We added retry for the ssh commands executed by ray up, which is ok since our setup commands are idempotent.
  3. Refactor the authentication
  4. Use the cloud-init to setup the key pairs for GCP, to avoid the permissions to check the key_pairs.
  5. Use the cloud-init to disable the apt unattended-upgrade, to avoid the lock for apt install (Note: cloud-init is not installed on debian image, so we could not do it for our previous debian image)

This PR may cause some backward compatibility issue for GCP user who created images from our previous debian-based image, as the cloud-init is not installed on those images. Trying to fix that now. It should now be fixed.

A good thing about the new image is that the ray==2.4.0 is installed by default in the image, the provision time would be faster if #1734 is merged.

Tested (run the relevant ones):

  • sky launch --cloud gcp --gpus t4 test.yaml
    setup: |
      echo "setup"
      for i in `seq 1 10000000`; do
        echo $i
        sleep 1
      done
    
  • sky launch --cloud gcp --gpus k80 test.yaml
    setup: |
      echo "setup"
      for i in `seq 1 10000000`; do
        echo $i
        sleep 1
      done
    
  • sky launch -c test-new-image-1 --cloud gcp echo hi
  • sky launch -c test-old-image --cloud gcp echo hi --image-id projects/deeplearning-platform-release/global/images/common-cpu-v20221215
  • for i in seq 5; do pytest tests/test_smoke.py::test_cancel_pytorch --terminate-on-failure; done makes sure that the wait time is enough (the magic number in the head_start_ray_commands)
  • All smoke tests: pytest tests/test_smoke.py
  • pytest tests/test_smoke.py --aws
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll marked this pull request as draft January 27, 2023 23:29
@Michaelvll
Copy link
Collaborator Author

The deep learning ubuntu image on GCP does not work well, as the ssh connection will be disconnected when the VM is firstly booted. Here are several things we have tried:

  1. The ubuntu-os-cloud image does not have the problem, but it does not have the NVIDIA driver installed. We have to install it ourselves (here), which will take a while.
  2. Changing the ServerAliveInternal and ClientAliveInterval does not work well, i.e. the cluster will still be disconnected.
  3. Removing the control path helps a bit with the disconnection problem, but the problem still exists and it will make the ssh slower
  4. Adding retry for the ssh command runner for ray will help the ray up, but the ssh commands by our SkyPilot could still fail due to the same disconnection issue.

Due to the investigation above, we decided to defer this PR and probably host our own image in the future instead (e.g. we can create an image based on the ubuntu-os-cloud).

@Michaelvll Michaelvll marked this pull request as ready for review May 10, 2023 02:39
@Michaelvll
Copy link
Collaborator Author

Michaelvll commented May 10, 2023

Just figured out that the interruption of the ssh was caused by the async installation of the nvidia-driver, which will restart the server after it finishes. We now wait for the nvidia-driver to be installed before start running the user's program, and it works.

use ubuntu

fix

Disable control master for GCP

fix

test no control master and longer alive interval

retry run

command runner with retry

add sleep

retry for ray status

Longer timeout

avoid control persist

format

set ClientAliveInterval

revert command runner

revert command_runner

fix ssh disconnection issue

format

fallback to debian for K80

fix k80

Backward compatibility for old debian based images

minor fix

Fix TPU VM

add clientaliveinterval

add markers for specific clouds

uninstall before install

uninstall -y

azure uninstall

longer wait time

remove apt related setting

fix ssh name
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll! Still need to read the PR. But I just tried it out with a previously launched & autostopped GCP controller, and got:

» sky spot launch echo hi --cloud gcp
...
I 05-14 09:33:18 cloud_vm_ray_backend.py:1033] Cluster 'sky-spot-controller-8a3968f2' (status: STOPPED) was previously launched in GCP us-central1. Relaunching in that region.
...

  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/backends/cloud_vm_ray_backend.py", line 1259, in _retry_zones
    keep_launch_fields_in_existing_config=cluster_exists)
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/utils/common_utils.py", line 241, in _record
    return f(*args, **kwargs)
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/backends/backend_utils.py", line 797, in write_cluster_config
    to_provision, region, zones)
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/clouds/gcp.py", line 356, in make_deploy_resources_variables
    assert image_id is not None, (image_id, r)
AssertionError: (None, GCP(n2-standard-4, disk_size=50))

@@ -177,7 +177,7 @@ setup_commands:
(type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
(which conda > /dev/null 2>&1 && conda init > /dev/null) || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh -b && eval "$(~/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true);
source ~/.bashrc;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 uninstall -y ray ray-cpp && pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: since we do a grep ray ... || ..., this means ray is not found. Why is pip3 uninstall necessary? Why add ray-cpp?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the grep {{ray_version}} to check the correct ray version is installed on the remote VM. It is possible that ray with another version is pre-installed on the VM. For example, the ubuntu image on GCP has the ray==2.4.0 installed, which will cause pip problem if we directly pip install -U ray[default]==2.0.1 (causing the ray package corrupted.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Can we copy this comment into the j2 files? Something like "Ensure only one Ray version (which is our ray_version) is installed, regardless of if the image comes pre-installed with another Ray version."

Comment on lines 298 to 299
# We use the debian image, as the ubuntu image has some connectivity
# issue when first booted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remnant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using ubuntu image now, we use grep ubuntu-2004 to find the correct images.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, meant that

        # We use the debian image, as the ubuntu image has some connectivity
        # issue when first booted.

could be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, good catch! Removed. Thanks!

Copy link
Collaborator Author

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @concretevitamin! This PR requires the PR for adding the ubuntu image in GCP catalog skypilot-org/skypilot-catalog#25. I am planning to increase the version number of the catalog, to make sure users will download the latest catalog automatically to avoid the issue of staled catalog.

Comment on lines 298 to 299
# We use the debian image, as the ubuntu image has some connectivity
# issue when first booted.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using ubuntu image now, we use grep ubuntu-2004 to find the correct images.

@@ -177,7 +177,7 @@ setup_commands:
(type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
(which conda > /dev/null 2>&1 && conda init > /dev/null) || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh -b && eval "$(~/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true);
source ~/.bashrc;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 uninstall -y ray ray-cpp && pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the grep {{ray_version}} to check the correct ray version is installed on the remote VM. It is possible that ray with another version is pre-installed on the VM. For example, the ubuntu image on GCP has the ray==2.4.0 installed, which will cause pip problem if we directly pip install -U ray[default]==2.0.1 (causing the ray package corrupted.)

Michaelvll

This comment was marked as duplicate.

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, haven't got the chance to test it due to the V6 catalog.

sky/authentication.py Outdated Show resolved Hide resolved
if image_id.startswith('skypilot:'):
return DEFAULT_GCP_IMAGE_GB
@classmethod
def get_image_infos(cls, image_id) -> Dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: get_image_info() and s/infos/info elsewhere?

def get_image_size(self, image_id: str, region: Optional[str]) -> float:
del region # unused
if image_id.startswith('skypilot:'):
return DEFAULT_GCP_IMAGE_GB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: how do we guarantee that the ubuntu & debian tags have the same size, DEFAULT_GCP_IMAGE_GB?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image size can be got using gcloud compute images describe projects/deeplearning-platform-release/global/images/common-cu113-v20230501-ubuntu-2004-py37, and it seems both of them have the same size of 50GB. Added a comment for the hack.

Comment on lines 272 to 273
if 'diskSizeGb' not in image_infos:
return DEFAULT_GCP_IMAGE_GB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading get_image_size() for the first time and this seems a bit unclear. If user passes in a custom image that does not have diskSizeGb, returning a default 50GB seems like a guess (maybe ok), rather than we the func name suggests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the images should have that field, but this is just a safeguard to avoid the function raising the error, as the image size check is not critical. Added a comment.

Comment on lines 298 to 299
# We use the debian image, as the ubuntu image has some connectivity
# issue when first booted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, meant that

        # We use the debian image, as the ubuntu image has some connectivity
        # issue when first booted.

could be removed.

sky/templates/gcp-ray.yml.j2 Show resolved Hide resolved
@@ -170,8 +218,8 @@ setup_commands:
test -f /home/gcpuser/miniconda3/etc/profile.d/conda.sh && source /home/gcpuser/miniconda3/etc/profile.d/conda.sh && conda activate base || true;
pip3 install --upgrade google-api-python-client;
{%- endif %}
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep skypilot && [ "$(cat {{sky_remote_path}}/current_sky_wheel_hash)" == "{{sky_wheel_hash}}" ]) || (pip3 uninstall skypilot -y; pip3 install "$(echo {{sky_remote_path}}/{{sky_wheel_hash}}/skypilot-{{sky_version}}*.whl)[gcp]" && echo "{{sky_wheel_hash}}" > {{sky_remote_path}}/current_sky_wheel_hash || exit 1);
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null && python3 -c "import ray" || pip3 uninstall -y ray ray-cpp && pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: GCP j2 has an extra "python3 -c "import ray" vs. other j2's. Any reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because the reboot caused by the nvidia-driver installation can cause the pip install interrupted during execution, leading to a corrupted installation of ray. The python3 -c "import ray" is to check the integrity of the ray package installation. Added a comment above. Thanks!

Comment on lines 249 to 251
# Magic number for waiting the nvidia driver to be ready and the instance
# to be rebooted.
sleep 18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Magic number for waiting the nvidia driver to be ready and the instance
# to be rebooted.
sleep 18
# Magic number for waiting for the nvidia driver to be ready and the instance
# to be rebooted.
sleep 18

By magic number, do you mean it's arbitrarily chosen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, sorry for the confusion. The number was chosen by binary search, finding the smallest number that will make sure the ssh connection issue not happens for the later commands.

sky/templates/gcp-ray.yml.j2 Outdated Show resolved Hide resolved
head_start_ray_commands:
# Start skylet daemon. (Should not place it in the head_setup_commands, otherwise it will run before sky is installed.)
# NOTE: --disable-usage-stats in `ray start` saves 10 seconds of idle wait.
# Line "which prlimit ..": increase the limit of the number of open files for the raylet process, as the `ulimit` may not take effect at this point, because it requires
# all the sessions to be reloaded. This is a workaround.
{%- if gpu is not none %}
- |
echo "Installing NVIDIA GPU driver." >> ~/.sky/nvlog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason we need to do this some peculiarities of the ubuntu image?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reason we need this is that for the ubuntu image, after the nvidia-driver is installed, the machine will be rebooted, causing the ssh connection issue for the later user script, and the ray cluster not existing problem. We have this here to make sure the reboot is finished before we proceed to the next stages.

Copy link
Collaborator Author

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized I have some pending comments. Will merge the master branch and test this PR with smoke test again:

  • pytest tests/test_smoke.py

sky/authentication.py Outdated Show resolved Hide resolved
def get_image_size(self, image_id: str, region: Optional[str]) -> float:
del region # unused
if image_id.startswith('skypilot:'):
return DEFAULT_GCP_IMAGE_GB
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image size can be got using gcloud compute images describe projects/deeplearning-platform-release/global/images/common-cu113-v20230501-ubuntu-2004-py37, and it seems both of them have the same size of 50GB. Added a comment for the hack.

Comment on lines 272 to 273
if 'diskSizeGb' not in image_infos:
return DEFAULT_GCP_IMAGE_GB
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the images should have that field, but this is just a safeguard to avoid the function raising the error, as the image size check is not critical. Added a comment.

Comment on lines 298 to 299
# We use the debian image, as the ubuntu image has some connectivity
# issue when first booted.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, good catch! Removed. Thanks!

sky/clouds/gcp.py Outdated Show resolved Hide resolved
head_start_ray_commands:
# Start skylet daemon. (Should not place it in the head_setup_commands, otherwise it will run before sky is installed.)
# NOTE: --disable-usage-stats in `ray start` saves 10 seconds of idle wait.
# Line "which prlimit ..": increase the limit of the number of open files for the raylet process, as the `ulimit` may not take effect at this point, because it requires
# all the sessions to be reloaded. This is a workaround.
{%- if gpu is not none %}
- |
echo "Installing NVIDIA GPU driver." >> ~/.sky/nvlog
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reason we need this is that for the ubuntu image, after the nvidia-driver is installed, the machine will be rebooted, causing the ssh connection issue for the later user script, and the ray cluster not existing problem. We have this here to make sure the reboot is finished before we proceed to the next stages.

Comment on lines 249 to 251
# Magic number for waiting the nvidia driver to be ready and the instance
# to be rebooted.
sleep 18
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, sorry for the confusion. The number was chosen by binary search, finding the smallest number that will make sure the ssh connection issue not happens for the later commands.

sky/templates/gcp-ray.yml.j2 Outdated Show resolved Hide resolved
{%- if gpu is not none %}
{%- if 'tesla-k80' in gpu %}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, good catch! It should be if... else. Will run the tests again.

Comment on lines 79 to 83
- path: /etc/ssh/sshd_config
append: true
content: |
ClientAliveInterval 720
ClientAliveCountMax 720
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that is a remnant. Just tested it again without it, and it seems working well. Removed. Thanks!

@cblmemo cblmemo mentioned this pull request Jul 10, 2023
5 tasks
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this feature! 🚀 Just went through the code and it looks good to me except for some minor details. Left some comments here:

sky/authentication.py Show resolved Hide resolved
SSHCommandRunner,
)

class SSHCommandRunnerWithRetry(SSHCommandRunner):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe consider moving this to sky/skylet/providers/command_runner.py after #1910 is merged?

if _image_df[_image_df['Tag'] == 'skypilot:cpu-ubuntu-2004'].empty:
# Update the image catalog if it does not include the updated images
# https://github.com/skypilot-org/skypilot-catalog/pull/25.
_image_df = common.read_catalog('gcp/images.csv', pull_frequency_hours=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, after increasing our version number, we don't need this right?

@@ -187,7 +186,7 @@ setup_commands:
(type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
(which conda > /dev/null 2>&1 && conda init > /dev/null) || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh -b && eval "$(~/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true);
source ~/.bashrc;
(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null || pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null || pip3 uninstall -y ray ray-cpp && pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a comment like sky/templates/gcp-ray.yml.j2, L192?

@@ -126,7 +126,7 @@ setup_commands:
(type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
which conda > /dev/null 2>&1 || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh -b && eval "$(/home/azureuser/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true);
source ~/.bashrc;
(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null || pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app && touch ~/.sudo_as_admin_successful;
(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null || pip3 uninstall -y ray ray-cpp && pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app && touch ~/.sudo_as_admin_successful;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before

(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null || pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep "skypilot " && [ "$(cat {{sky_remote_path}}/current_sky_wheel_hash)" == "{{sky_wheel_hash}}" ]) || (pip3 uninstall skypilot -y; pip3 install "$(echo {{sky_remote_path}}/{{sky_wheel_hash}}/skypilot-{{sky_version}}*.whl)[gcp]" && echo "{{sky_wheel_hash}}" > {{sky_remote_path}}/current_sky_wheel_hash || exit 1);
(pip3 list | grep "ray " | grep {{ray_version}} 2>&1 > /dev/null && python3 -c "import ray" || pip3 uninstall -y ray ray-cpp && pip3 install --exists-action w -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
(pip3 list | grep "skypilot " && [ "$(cat {{sky_remote_path}}/current_sky_wheel_hash)" == "{{sky_wheel_hash}}" ]) && python3 -c "import sky" || (pip3 uninstall skypilot -y; pip3 install "$(echo {{sky_remote_path}}/{{sky_wheel_hash}}/skypilot-{{sky_version}}*.whl)[gcp]" && echo "{{sky_wheel_hash}}" > {{sky_remote_path}}/current_sky_wheel_hash || exit 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need import ray here? IIUC ray with correct version is installed in L211?

Copy link
Contributor

github-actions bot commented Nov 8, 2023

This PR is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Nov 8, 2023
Copy link
Contributor

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python version mismatch when migrating from AWS to GCP
3 participants