-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UX] warning before launching jobs/serve when using a reauth required credentials #4479
base: master
Are you sure you want to change the base?
Conversation
@@ -536,6 +536,10 @@ def get_credential_file_mounts(self) -> Dict[str, str]: | |||
""" | |||
raise NotImplementedError | |||
|
|||
def can_credential_expire(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
def can_credential_expire(self) -> bool: | |
def can_credentials_expire(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks the active credential(only one), the original make sense I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @weih1121!
SSO/Container-role refresh token: | ||
https://docs.aws.amazon.com/solutions/latest/dea-api/auth-refreshtoken.html | ||
""" | ||
# TODO(hong): Add a check for the expiration of the temporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incomplete?
# TODO(hong): Add a check for the expiration of the temporary | |
# TODO(hong): Add a CLI based check for the expiration of the temporary credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When provisioning an AWS instance for a jobs controller or server, we attach an IAM Role to the instance, which automatically refreshes the credentials. We only update the local credentials for clusters when the SHARED_CREDENTIALS credential type is configured. For other types such as ENV or SSO, we ignore them.
There are some reasons why I have marked the following as TODO:
- For ENV, SHARED_CREDENTIALS, and IAM ROLE credential types, there is no valid CLI tool that can assist.
- For SSO, although I can check the expiry from the config files in
~/.aws/sso/cache/xxx.json
, the cluster does not use localSSO
credentials at all. - The only credentials that are updated are SHARED_CREDENTIALS, which are generated from the AWS access portal. These credentials are expirable.
- Another factor is performance concern but it is not the most important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I meant the TODO sentence was incomplete :)
sky/backends/cloud_vm_ray_backend.py
Outdated
warnings = (f'\nWarning: Expiring credentials detected for ' | ||
f'{expirable_clouds}. Clusters may be leaked if ' | ||
f'the credentials expire while jobs are running. ' | ||
f'It is recommended to use credentials that never' | ||
f' expire or a service account.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do not verify if the credentials are expiring, we should update the message:
warnings = (f'\nWarning: Expiring credentials detected for ' | |
f'{expirable_clouds}. Clusters may be leaked if ' | |
f'the credentials expire while jobs are running. ' | |
f'It is recommended to use credentials that never' | |
f' expire or a service account.') | |
warnings = (f'\nWarning: Credentials used for {expirable_clouds} may expire. ' | |
f' Resources may be leaked if ' | |
f'the credentials expire while jobs are running. ' | |
f'It is recommended to use credentials that never' | |
f' expire or a service account.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
sky/backends/backend_utils.py
Outdated
|
||
credentials = sky_check.get_cloud_credential_file_mounts(excluded_clouds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by accident, removed
sky/backends/cloud_vm_ray_backend.py
Outdated
f'the credentials expire while jobs are running. ' | ||
f'It is recommended to use credentials that never' | ||
f' expire or a service account.') | ||
click.secho(warnings, fg='yellow') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of importing and using click, can we use logger.warning with colors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, I think this choice is for performance concern, right?
issue link: #4433
Solution:
context:
detail:
LOCAL_CREDENTIALS
and credentials are expiring and warning to the user before provision.Test case 1:
config in
~/.sky/config.yaml
Local
AWS
credential configured bySSO
Launch controller on
GCP
and task onAWS
Command to run:
Logs:
Warnings detail:
worker provision failed since
AWS
not accessible fromGCP
cloud.Test case 2:
Provision controller on
AWS
and worker onGCP
Still warning for expiring local credentials.
Job completed in this case
Test case 3:
same as test case 2 but using service account for
GCP
Provision instance without warning message.
Job on
GCP
finished.@Michaelvll @romilbhardwaj can you help review?
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh