Stable Diffusion XL in Red Hat OpenShift AI

This readme shows how to serve a Stable Diffusion XL model in Red Hat OpenShift AI (RHOAI) along with the steps to fine-tune the model.

This project takes the latest SDXL model, which does not recognize Toy Jensen out of the box, and fine-tunes it on a few pictures of him, teaching it to generate new images that include him.

Once the model is fine-tuned, we show how to deploy it in RHOAI and how to access the deployed model to generate images.

Prerequisites

Before you can fine-tune and serve a model in Red Hat OpenShift AI, you need to install RHOAI and enable NVIDIA GPU support by following the respective installation guides.

This project generates LoRA (Low-Rank Adaptation of Large Language Models) weights when the base model is fine-tuned. These weights can be uploaded to one of the following:

  1. MinIO
    • Install the oc client if using MinIO for model storage
  2. AWS S3
    • Set up an IAM user with the credentials and permissions needed to create the bucket and upload objects when using this approach

Quickstart

Open Red Hat OpenShift AI by selecting it from the OpenShift Application Launcher. It will open in a new tab.

Create Data Science project

Select Data Science Projects in the left navigation menu.

Create a new Data Science project by clicking on Create data science project button. Provide the Name as well as the Resource name for the project, and click on Create button. This will create a new data science project for you.

Select your newly created project by clicking on it.

Below is a gif showing Create data science project dialogs: Create Project gif

Setup LoRA upload approach

The LoRA weights that are generated while fine-tuning the base model need to be uploaded to either AWS S3 or MinIO so that they are available when the model is deployed.

Setup MinIO

To set up MinIO for storing the LoRA weights, execute the following commands in a terminal/console:

# Login to OpenShift (if not already logged in)
oc login --token=<OCP_TOKEN>

# Install MinIO
MINIO_USER=<USERNAME> \
   MINIO_PASSWORD="<PASSWORD>" \
   envsubst < minio-setup.yml | \
   oc apply -f - -n <PROJECT_CREATED_IN_PREVIOUS_STEP>
  • Set <USERNAME> and <PASSWORD> to valid values in the above command before executing it

Once MinIO is set up, you can access it within your project. The YAML that was applied above creates these two routes:

  • minio-ui - for accessing the MinIO UI
  • minio-api - for API access to MinIO
    • Take note of the minio-api route location, as it will be needed in the next section.

Setup AWS S3

To set up AWS S3 for storing the LoRA weights, do the following:

  • Create an IAM user
  • Add the following permissions to the user, with Effect: "Allow":
    • s3:ListBucket
    • s3:*Object
    • s3:ListAllMyBuckets
    • s3:CreateBucket
      • This permission is ONLY needed if you want the bucket to be created by the notebook
  • For the above permissions, set Resource to:
    • arn:aws:s3:::*
    • If an existing bucket is used, the Resource can be set to that specific bucket, e.g.
      • arn:aws:s3:::<EXISTING_BUCKET_NAME>
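
Taken together, the permissions above can be expressed as a single IAM policy statement. A minimal sketch (the Sid is a placeholder; narrow the Resource to a specific bucket ARN if you already have one):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SdxlLoraWeights",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:*Object",
        "s3:ListAllMyBuckets",
        "s3:CreateBucket"
      ],
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
```

Drop s3:CreateBucket from the Action list if the bucket already exists and the notebook does not need to create it.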

Create workbench

To use RHOAI for this project, you need to create a workbench first. In the newly created data science project, create a new Workbench by clicking Create workbench button in the Workbenches tab.

When creating the workbench, add the following environment variables:

  • AWS_ACCESS_KEY_ID

    • MinIO user name if using MinIO; otherwise your AWS access key ID
  • AWS_SECRET_ACCESS_KEY

    • MinIO password if using MinIO; otherwise your AWS secret access key
  • AWS_S3_ENDPOINT

    • minio-api route location if using MinIO; otherwise the AWS S3 endpoint, in the format https://s3.<REGION>.amazonaws.com
  • AWS_S3_BUCKET

    • This bucket must either already exist, or it will be created by one of the Jupyter notebooks when uploading the LoRA weights.
    • If using AWS S3 and the bucket does not exist, make sure the IAM user has the permissions needed to create the bucket, as described in the Setup AWS S3 section
  • AWS_DEFAULT_REGION

    • Set it to us-east-1 if using MinIO; otherwise use the correct AWS region

    The environment variables can be added one by one, or all at once by uploading a secret YAML file
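
A secret YAML for these variables might look like the following sketch (the secret name and all values are placeholders; the MinIO endpoint is the minio-api route from the earlier step):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: sdxl-workbench-secret
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <USERNAME>
  AWS_SECRET_ACCESS_KEY: <PASSWORD>
  AWS_S3_ENDPOINT: https://<MINIO_API_ROUTE_HOST>
  AWS_S3_BUCKET: <BUCKET_NAME>
  AWS_DEFAULT_REGION: us-east-1
```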

Use the following values for other fields:

  • Notebook image:
    • Image selection: PyTorch
    • Version selection: 2024.1
  • Deployment size:
    • Container size: Medium
    • Accelerator: NVIDIA GPU
    • Number of accelerators: 1
  • Cluster storage: 50GB

Create the workbench with the above settings.

Below is a gif showing various sections of Create Workbench: Create Workbench gif

Create Data connection

Create a new data connection that will be used by the init-container (storage-initializer) to fetch the LoRA weights, generated in the next step, when deploying the model.

To create a Data connection, use the following steps:

  • Click on Add data connection button in the Data connections tab in your newly created project
  • Use the following values for this data connection:
    • Name: minio
    • Access key: value specified for AWS_ACCESS_KEY_ID field in Create Workbench section
    • Secret key: value specified for AWS_SECRET_ACCESS_KEY field in Create Workbench section
    • Endpoint: value specified for AWS_S3_ENDPOINT field in Create Workbench section
    • Region: value specified for the AWS_DEFAULT_REGION field in Create Workbench section
    • Bucket: value specified for AWS_S3_BUCKET field in Create Workbench section
  • Create the data connection by clicking on Add data connection button

Below is a gif showing the Add data connection dialog (the values shown are for MinIO): Add data connection gif

Add Serving runtime

You can either build the Serving runtime from the igm-repo sub-directory by following the instructions provided there, or use the existing YAML for adding the serving runtime to deploy the model generated in this project.

Follow these steps to use the existing yaml:

  • Expand Settings sidebar menu in RHOAI
  • Click on Serving runtimes in the expanded sidebar menu
  • Click on Add serving runtime button
  • Use the following values in the Add serving runtime page:
    • Select the model serving platforms this runtime supports: Single-model serving platform
    • Select the API protocol this runtime supports: REST
    • YAML: Drag & drop Stable_Diffusion-ServingRuntime yaml file or paste the contents of this file after selecting Start from scratch option
  • Click on Create button to create this new Serving runtime

You can read more about model serving in the RHOAI documentation

Below is a gif showing fields on Add serving runtime page: Add serving runtime gif

Open workbench

Now that the workbench is created and running, follow these steps to set up the project:

  • Select your newly created project by clicking on Data Science Projects in the sidebar menu
  • Click on Workbenches tab and open the newly created workbench by clicking on the Open link
  • The workbench will open up in a new tab
  • When the workbench is opened for the first time, you will be shown an Authorize Access page.
    • Click Allow selected permissions button in this page.
  • In the workbench, click on Terminal icon in the Launcher tab.
  • Clone this repository in the Terminal by running the following command:
    git clone https://github.com/sgahlot/workbench-example-sdxl-customization.git

Below is a gif showing Open workbench pages: Open workbench gif

Run Jupyter notebook

The notebook mentioned in this section takes the base model and fine-tunes it to generate the LoRA weights that are used later on to generate a Toy Jensen image

  • Once the repository is cloned, select the folder where you cloned the repository (in the sidebar) and navigate to code/rhoai directory and open up FineTuning-SDXL.ipynb
  • If AWS S3 is used to store the LoRA weights, modify the last cell as shown below:
    • XFER_LOCATION = 'MINIO' - change it to XFER_LOCATION = 'AWS'
  • Run this notebook by selecting Run -> Run All Cells menu item
  • When the notebook successfully runs, your fine-tuned model should have been uploaded to AWS or MinIO in the bucket specified for AWS_S3_BUCKET in Create Workbench section.
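
The XFER_LOCATION switch boils down to whether the upload client is given an explicit endpoint. Below is a minimal sketch of that selection logic; the function name and structure are illustrative, not the notebook's actual code. It builds the keyword arguments for a boto3-style S3 client from the workbench environment variables:

```python
import os


def s3_client_config(xfer_location: str) -> dict:
    """Build keyword arguments for an S3 client, e.g. boto3.client("s3", **cfg).

    MinIO must be addressed through an explicit endpoint_url (the minio-api
    route), while AWS S3 derives its endpoint from the region.
    """
    cfg = {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "region_name": os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
    }
    if xfer_location == "MINIO":
        # MinIO is reached through the minio-api route created earlier.
        cfg["endpoint_url"] = os.environ["AWS_S3_ENDPOINT"]
    elif xfer_location != "AWS":
        raise ValueError(f"Unknown XFER_LOCATION: {xfer_location!r}")
    return cfg
```

The resulting dictionary could then be passed as boto3.client("s3", **cfg) before uploading the weights to the AWS_S3_BUCKET bucket.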

Deploy model

Once the initial notebook has run successfully and the data connection is created, you can deploy the model by following these steps:

  • In the RHOAI tab, select Models tab (for your newly created project) and click on Deploy model button
  • Fill in the following fields as described below:
    • Model name: <PROVIDE_a_name_for_the_model>
    • Serving runtime: Stable Diffusion
    • Model framework: sdxl
    • Model server size: Small
    • Accelerator: NVIDIA GPU
    • Model route:
      • If you want to access this model endpoint from outside the cluster, make sure to check the Make deployed models available through an external route checkbox. By default the model endpoint is only available as an internal service.
    • Model location: Select Existing data connection option
      • Name: Name of data connection created in previous step
      • Path: model
  • Click on Deploy to deploy this model

Copy the inference endpoint once the model is deployed successfully (it will take a few minutes to deploy the model).

Generate image

A Toy Jensen image can now be generated using the deployed model. To generate and retrieve the image, use the following steps:

  • Open up GenerateImageUsingModel.ipynb
  • Set the value of inference_endpoint variable correctly by pointing it to your model's inference endpoint
    • Your model inference endpoint should have been copied in the previous section
  • Run this notebook by selecting Run -> Run All Cells menu item
  • When the notebook successfully runs, you should see a toy-jensen image generated in the last cell.
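
Conceptually, what the notebook does can be sketched as a plain HTTP POST: send the prompt as JSON to the inference endpoint and decode the returned image. The payload and response fields below ("prompt", "image") are illustrative assumptions, not the serving runtime's documented schema; GenerateImageUsingModel.ipynb is the authoritative reference:

```python
import base64
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    """Illustrative request body; the real schema is defined by the serving runtime."""
    return {"prompt": prompt}


def generate_image(inference_endpoint: str, prompt: str, out_file: str) -> None:
    """POST the prompt to the deployed model and save the returned image."""
    req = urllib.request.Request(
        inference_endpoint,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumes the runtime returns the image as a base64-encoded string under
    # an "image" key -- adjust to the notebook's actual response handling.
    with open(out_file, "wb") as f:
        f.write(base64.b64decode(body["image"]))
```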

System used

  • Red Hat OpenShift AI: 2.10.0, 2.13.0, 2.14.0
  • GPU: 1x NVIDIA A10G
  • Storage: 50GB

Python module versions

Although the latest version of each module is installed for this project, here are the exact versions that were used (in case a version incompatibility occurs in the future):

  • accelerate: 1.1.1
  • boto3: 1.34.111
  • botocore: 1.34.111
  • dataclass_wizard: 0.26.0
  • diffusers: 0.32.0.dev0
  • ipywidgets: 8.1.2
  • jupyterlab: 3.6.8
  • huggingface_hub: 0.26.2
  • minio: 7.2.9
  • peft: 0.13.2
  • transformers: 4.46.2
  • torch: 2.2.2+cu121
  • torchvision: 0.17.2+cu121
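
If a version incompatibility does occur, the versions above can be pinned in a requirements.txt along these lines (diffusers 0.32.0.dev0 is a development build and may need to be installed from source rather than PyPI; the +cu121 builds come from the PyTorch CUDA wheel index):

```
accelerate==1.1.1
boto3==1.34.111
botocore==1.34.111
dataclass_wizard==0.26.0
ipywidgets==8.1.2
jupyterlab==3.6.8
huggingface_hub==0.26.2
minio==7.2.9
peft==0.13.2
transformers==4.46.2
torch==2.2.2+cu121
torchvision==0.17.2+cu121
```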

Notebooks with output

The following notebooks contain output to give you an idea on how the outputs will look when the notebooks are run:

Links