Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSPS-324 Add download command #12

Merged
merged 13 commits into from
Nov 20, 2024
Merged

TSPS-324 Add download command #12

merged 13 commits into from
Nov 20, 2024

Conversation

mmorgantaylor
Copy link
Collaborator

@mmorgantaylor mmorgantaylor commented Nov 19, 2024

Description

Add terralab download command. Checks that the requested pipeline run's status is SUCCEEDED, then downloads all the output files in parallel.

How it looks mid-download (in this screenshot the first two files have already completed):
image

How it looks post download:
image

Note: created TSPS-378 to address the newline weirdness in above screenshot.

Help menus:
image

Jira Ticket

https://broadworkbench.atlassian.net/browse/TSPS-324

@mmorgantaylor mmorgantaylor marked this pull request as ready for review November 19, 2024 20:47
@@ -1,5 +1,6 @@
# Teaspoons service API URL
TEASPOONS_API_URL=https://tsps.dsde-dev.broadinstitute.org
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops revert this

Copy link
Collaborator

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 with some questions/suggestions

from terralab.logic import pipelines_logic

LOGGER = logging.getLogger(__name__)


@click.command()
@click.argument("pipeline_name")
@click.command(short_help="Submit a pipeline run")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a short help for all of our commands?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly, it might make things clearer for developers. the way the help text works is that the top-level help output (what you see when you run terralab -h) comes from the docstrings, but if the docstrings are long or multiline, it gets cut off in sometimes arbitrary places with ... at the end. so adding the short_help here lets you enforce what shows up there, and the longer docstring is used for the command-specific -h output. so I think I'd only added it in places where the top-level help output was getting truncated weirdly, but we could just add it everywhere if we prefer. what do you think?

"""Submit a pipeline run"""
"""Submit a pipeline run

PIPELINE_NAME is the name of the pipeline to run"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why this was added, where is PIPELINE_NAME coming from?

If you're going to want to add info for the parameters of a function we should follow one of the many patterns that exist

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it comes from the fact that when you run e.g. terralab submit -h the help text begins with a usage hint that looks like Usage: terralab submit [OPTIONS] PIPELINE_NAME, so somewhere in the click documentation there was a suggestion to explain the PIPELINE_NAME field in this help text.

that said, we can definitely choose a more conventional pattern to follow, like perhaps

Parameters
----------
PIPELINE_NAME : str
    The name of the pipeline to run

I don't have strong feelings about this, let me know what you think

Copy link
Collaborator Author

@mmorgantaylor mmorgantaylor Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the output of terralab download -h is

✗ terralab download -h
Usage: terralab download [OPTIONS] PIPELINE_NAME JOB_ID

  Download all output files from a pipeline run

  PIPELINE_NAME is the name of the pipeline that was run

  JOB_ID is the uuid identifier for the pipeline run

Options:
  -d, --local_destination DIRECTORY
                                  optional location to download results to.
                                  defaults to the current directory.
  -h, --help                      Show this message and exit.

i.e. click doesn't describe the arguments, so if we want descriptions we need to add them ourselves. I'm all for making it more consistent with the "Options" formatting, though, like:

✗ terralab download -h
Usage: terralab download [OPTIONS] PIPELINE_NAME JOB_ID

  Download all output files from a pipeline run

  PIPELINE_NAME       the name of the pipeline that was run
  JOB_ID              the uuid identifier for the pipeline run

Options:
  -d, --local_destination DIRECTORY
                                  optional location to download results to.
                                  defaults to the current directory.
  -h, --help                      Show this message and exit.

unfortunately I don't think we'll get the indentation/spacing to match up exactly

help="optional location to download results to. defaults to the current directory.",
)
@handle_api_exceptions
def download(pipeline_name: str, job_id: uuid.UUID, local_destination: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should job_id be type str here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, yes, good catch. are you making the other uuid->str changes in your PR or should I make them here?

@@ -75,3 +82,76 @@ def upload_file_with_signed_url(local_file_path, signed_url):
except Exception as e:
LOGGER.error(add_blankline_after(f"Error uploading file: {e}"))
exit(1)


class SignedUrlDownload:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooo nice, would be nice to have a descripton of what the class does/is used for

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, adding!

## download action


def get_result_and_download_pipeline_run_outputs(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we eventually have a status command in the CLI that will just try to grab the status of the pipeline run or will it just be this download call?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will add a CLI status command next

) -> Optional[str]:
job_status: str = pipeline_run_response.job_report.status
if job_status == "SUCCEEDED":
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might wanna raise if it's meant to handle non success statuses and its getting a success status

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, also renamed the function to be a bit more descriptive

def download_with_pbar(download: SignedUrlDownload) -> str:
"""Helper function to take a SignedUrlDownload object and perform a download with a progress bar.
Return the local file path of the downloaded file."""
download_block_size = 1024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what makes this the optimal number? (and should it be configurable?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was in an example I'd found, so no justification there. it looks like 8192 is the accepted number: https://stackoverflow.com/questions/48719893/why-is-the-block-size-for-python-httplibs-reads-hard-coded-as-8192-bytes

doesn't seem like it should be configurable, but let me know if you disagree?

Copy link

sonarcloud bot commented Nov 20, 2024

@mmorgantaylor mmorgantaylor merged commit f43fa45 into main Nov 20, 2024
4 checks passed
@mmorgantaylor mmorgantaylor deleted the TSPS-324_mma_download branch November 20, 2024 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants