Skip to content

google-github-actions/run-vertexai-notebook

Use this GitHub action with your project
Add this Action to an existing workflow or create a new one
View on Marketplace

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

run-vertexai-notebook GitHub Action

GitHub composite action to trigger asynchronous execution of a Jupyter Notebook via Google Cloud Vertex AI.

The typical SDLC for a Jupyter Notebook includes source control of the notebook file without it's output cells. It is a best practice that notebooks should be stored this way to prevent commiting potentially sensitive data. A downside of this practice is that code reviewers will not be able to see the output while reviewing and may not be able to accurately gauge the impact of a change.

The main purpose of this action is to provide a secure way to execute a notebook, store the output (outside of source control), and serve it to a reviewer with proper access controls.

This action relies on the notebook execution functionality of Google Cloud's Vertex AI to execute the notebook and store the executed notebook with output cells in Google Cloud Storage. Access to the output is controled by Google Cloud Storage ACLs.

NOTE: Notebooks executed by this action will fall under the notebook executor requirements defined by Vertex AI.

This action will provision cloud resources with associated costs so it is recommended that you control the usage of this action by:

  • Limiting the triggers of this action: e.g. on pull request with a specific label

  • Limiting the set of notebooks that it executes for via the allowlist parameter

  • Managing the size of the Vertex AI infrastructure via the vertex_machine_type parameter

This is not an officially supported Google product, and it is not covered by a Google Cloud support contract. To report bugs or request features in a Google Cloud product, please contact Google Cloud support.

Prerequisites

This action requires Google Cloud credentials to execute gcloud commands. See setup-gcloud for details.

Setup

  1. Create a new Google Cloud Project (or select an existing project) and enable the Vertex AI APIs.

  2. Create or reuse a GitHub repository for the example workflow:

    1. Create a repository.

    2. Move into the repository directory:

      $ cd <repo>
      
    3. Copy the example into the repository:

      $ cp -r <path_to>/notebook-review-action/examples/notebook-review/ .
      
  3. Create a GCS bucket if one does not already exist.

  4. Create a Google Cloud service account if one does not already exist.

  5. Add the following Cloud IAM roles to your service account:

    • roles/aiplatform.user - allows running jobs in Vertex AI
    • roles/storage.objectWriter - allows writing notebook files to object storage

    Note: These permissions are overly broad to favor a quick start. They do not represent best practices around the Principle of Least Privilege. To properly restrict access, you should create a custom IAM role with the most restrictive permissions.

  6. Setup authenticaion to Google Cloud using workload identity federation with the above service account.

Usage

 jobs:
   notebook-review:
    name: Notebook Review
    needs: changes
    runs-on: ubuntu-latest

    steps:
    - id: 'auth'
      name: 'Authenticate to Google Cloud'
      uses: 'google-github-actions/auth@v0'
      with:
        workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider'
        service_account: '[email protected]'

    - id: notebook-review
      uses: google-github-actions/run-vertexai-notebook@v0
        with:
          gcs_source_bucket: '${{ env.GCS_SOURCE }}'
          gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
          allowlist: '${{ needs.changes.outputs.notebooks_files }}'

Running R notebooks

R requires a different base container and kernel

    - id: notebook-review
      uses: google-github-actions/run-vertexai-notebook@v0
        with:
          gcs_source_bucket: '${{ env.GCS_SOURCE }}'
          gcs_output_bucket: '${{ env.GCS_OUTPUT }}'
          allowlist: '${{ needs.changes.outputs.notebooks_files }}'
          vertex_container_name: 'gcr.io/deeplearning-platform-release/r-cpu.4-1:latest' # R base container
          kernel_name: 'ir' # The stock R kernel

See a more complete example in examples.

Inputs

  • gcs_source_bucket - (Required) Google Cloud Storage bucket to store notebooks to be run by Vertex AI. e.g. mygcp-bucket-0001/nbr/source. This bucket was created during setup above.

  • gcs_output_bucket - (Required) Google Cloud Storage bucket to store the results of the notebooks executed by Vertex AI. e.g. mygcp-bucket-0001/nbr/output. This bucket was created during setup above.

    Note: It is recommended that the source and output values share the same bucket and utilize a path structure to seperate source from output.

  • region: (Optional) Google Cloud region to execute Vertex AI jobs in. Defaults to us-central1.

  • vertex_machine_type - (Optional) Machine type to use for Vertex AI job execution. Defaults to a n1-standard-4 machine shape.

  • allowlist - (Required) List of notebooks to execute. Comma separated list of files to run on Vertex AI. e.g. mynotebook.ipynb,somedir/another_notebook.pynb. It is expected that this is the output from an action like dorny/paths-filter.

  • add_comment - (Optional) By default the action will attempt to write a comment to the open PR or issue that triggered this action. This flag allows workflows that are triggered on direct push to a branch to disable this behavior.

  • kernel_name - (Optional) Kernel to use as the environment for the notebook when it executes. Defaults to python3.

  • vertex_container_name - (Optional) The base container to use for the notebook execution job. Defaults to gcr.io/deeplearning-platform-release/base-cu110:latest