This README shows how to fine-tune a Stable Diffusion XL (SDXL) model and serve it in Red Hat OpenShift AI (RHOAI). The project takes the latest SDXL model and familiarizes it with Toy Jensen by fine-tuning on a few pictures of him, teaching it to generate new images that include him, which the base model could not do previously. Once the model is fine-tuned, we show the steps to deploy it in RHOAI and to access the deployed model to generate images.
Before you can fine-tune and serve a model in Red Hat OpenShift AI, you will need to install RHOAI and enable NVIDIA GPU support by following these links:
This project generates LoRA (Low-Rank Adaptation of Large Language Models) weights when the base model is fine-tuned. These weights can be uploaded to one of the following:

- MinIO
  - Install the `oc` client if using MinIO for model storage
- AWS S3
  - Set up an IAM user, credentials, and permissions to create the bucket as well as to upload the objects when using this approach
Open up Red Hat OpenShift AI by selecting it from the OpenShift Application Launcher. This will open Red Hat OpenShift AI in a new tab.

- Select **Data Science Projects** in the left navigation menu.
- Create a new data science project by clicking the **Create data science project** button.
- Provide the **Name** as well as the **Resource name** for the project, and click the **Create** button. This will create a new data science project for you.
- Select your newly created project by clicking on it.

Below is a gif showing the **Create data science project** dialogs:
The LoRA weights that are generated while fine-tuning the base model need to be uploaded to either AWS S3 or MinIO so that they are available when the model is deployed.
To set up MinIO for storing the LoRA weights, execute the following commands in a terminal/console:

```shell
# Login to OpenShift (if not already logged in)
oc login --token=<OCP_TOKEN>

# Install MinIO
MINIO_USER=<USERNAME> \
MINIO_PASSWORD="<PASSWORD>" \
envsubst < minio-setup.yml | \
  oc apply -f - -n <PROJECT_CREATED_IN_PREVIOUS_STEP>
```

- Set `<USERNAME>` and `<PASSWORD>` to valid values in the above command before executing it
Once MinIO is set up, you can access it within your project. The yaml that was applied above creates these two routes:

- `minio-ui` - for accessing the MinIO UI
- `minio-api` - for API access to MinIO

Take note of the `minio-api` route location as it will be needed in the next section.
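The `minio-api` route can be sanity-checked with the `minio` Python client that the notebooks install. Below is a minimal sketch, assuming the route URL and the credentials from the setup above; the helper names are illustrative:

```python
from urllib.parse import urlparse


def minio_host(route_url):
    """The minio Python client expects host[:port] without a scheme,
    so strip it from the route URL if present."""
    parsed = urlparse(route_url)
    return parsed.netloc or parsed.path


def check_minio(route_url, access_key, secret_key):
    """Sanity-check MinIO by listing buckets (minio imported lazily,
    as it is only available once the notebook dependencies are installed)."""
    from minio import Minio

    client = Minio(
        minio_host(route_url),
        access_key=access_key,
        secret_key=secret_key,
        secure=route_url.startswith("https"),
    )
    return [b.name for b in client.list_buckets()]
```

If `check_minio` returns without an error, the route and credentials are usable by the notebooks.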
To set up AWS S3 for storing the LoRA weights, do the following:

- Create an IAM user
- Add the following permissions to the user, with `Effect: "Allow"`:
  - `s3:ListBucket`
  - `s3:*Object`
  - `s3:ListAllMyBuckets`
  - `s3:CreateBucket`
    - This permission is ONLY needed if you want the bucket to be created by the notebook
- For the above permissions, set `Resource` to `arn:aws:s3:::*`
  - If an already existing bucket is used, then `Resource` can be set to the specific bucket, e.g. `arn:aws:s3:::<EXISTING_BUCKET_NAME>`
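Expressed as an IAM policy document, the permissions above would look roughly like this (a sketch; narrow `Resource` to a specific bucket ARN if the bucket already exists, and drop `s3:CreateBucket` if the notebook does not need to create it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:*Object",
        "s3:ListAllMyBuckets",
        "s3:CreateBucket"
      ],
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
```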
To use RHOAI for this project, you need to create a workbench first. In the newly created data science project, create a new workbench by clicking the **Create workbench** button in the **Workbenches** tab.
When creating the workbench, add the following environment variables:

- `AWS_ACCESS_KEY_ID`
  - MinIO user name if using MinIO, else use AWS credentials
- `AWS_SECRET_ACCESS_KEY`
  - MinIO password if using MinIO, else use AWS credentials
- `AWS_S3_ENDPOINT`
  - `minio-api` route location if using MinIO, else use the AWS S3 endpoint, which is in the format `https://s3.<REGION>.amazonaws.com`
- `AWS_S3_BUCKET`
  - This bucket should either already exist or will be created by one of the Jupyter notebooks to upload the LoRA weights
  - If using AWS S3 and the bucket does not exist, make sure correct permissions are assigned to the IAM user to be able to create the bucket, as shown here
- `AWS_DEFAULT_REGION`
  - Set it to `us-east-1` if using MinIO, otherwise use the correct AWS region

The environment variables can be added one by one, or all together by uploading a secret yaml file.
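A secret yaml for uploading all the variables at once could look like the following sketch; the secret name, endpoint, and all values are placeholders to replace with your own:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: sdxl-workbench-secret   # hypothetical name
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <USERNAME>
  AWS_SECRET_ACCESS_KEY: <PASSWORD>
  AWS_S3_ENDPOINT: <MINIO_API_ROUTE_OR_AWS_S3_ENDPOINT>
  AWS_S3_BUCKET: <BUCKET_NAME>
  AWS_DEFAULT_REGION: us-east-1
```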
Use the following values for other fields:
- Notebook image:
- Image selection: PyTorch
- Version selection: 2024.1
- Deployment size:
- Container size: Medium
- Accelerator: NVIDIA GPU
- Number of accelerators: 1
- Cluster storage: 50GB
Create the workbench with the above settings.

Below is a gif showing various sections of **Create Workbench**:
Create a new data connection that can be used by the init-container (`storage-initializer`) to fetch the LoRA weights, generated in the next step, when deploying the model.
To create a data connection, use the following steps:

- Click the **Add data connection** button in the **Data connections** tab in your newly created project
- Use the following values for this data connection:
  - Name: `minio`
  - Access key: value specified for the `AWS_ACCESS_KEY_ID` field in the *Create Workbench* section
  - Secret key: value specified for the `AWS_SECRET_ACCESS_KEY` field in the *Create Workbench* section
  - Endpoint: value specified for the `AWS_S3_ENDPOINT` field in the *Create Workbench* section
  - Region: value specified for the `AWS_DEFAULT_REGION` field in the *Create Workbench* section
  - Bucket: value specified for the `AWS_S3_BUCKET` field in the *Create Workbench* section
- Create the data connection by clicking the **Add data connection** button

Below is a gif showing the **Add data connection** dialog (the values shown are for MinIO):
You can either build the serving runtime from the igm-repo sub-directory by following the instructions provided there, or use the existing yaml to add the serving runtime for deploying the model generated in this project.

Follow these steps to use the existing yaml:

- Expand the **Settings** sidebar menu in RHOAI
- Click **Serving runtimes** in the expanded sidebar menu
- Click the **Add serving runtime** button
- Use the following values in the **Add serving runtime** page:
  - Select the model serving platforms this runtime supports: `Single-model serving platform`
  - Select the API protocol this runtime supports: `REST`
  - YAML: Drag & drop the Stable_Diffusion-ServingRuntime yaml file, or paste the contents of this file after selecting the **Start from scratch** option
- Click the **Create** button to create this new serving runtime

You can read more about Model serving here.

Below is a gif showing fields on the **Add serving runtime** page:
Now that the workbench is created and running, follow these steps to set up the project:

- Select your newly created project by clicking **Data Science Projects** in the sidebar menu
- Click the **Workbenches** tab and open the newly created workbench by clicking the **Open** link
  - The workbench will open up in a new tab
  - When the workbench is opened for the first time, you will be shown an **Authorize Access** page
    - Click the **Allow selected permissions** button on this page
- In the workbench, click the **Terminal** icon in the **Launcher** tab
- Clone this repository in the terminal by running the following command:

```shell
git clone https://github.com/sgahlot/workbench-example-sdxl-customization.git
```

Below is a gif showing **Open workbench** pages:
The notebook mentioned in this section takes the base model and fine-tunes it to generate LoRA weights, which are used later on to generate a Toy Jensen image.

- Once the repository is cloned, select the folder where you cloned the repository (in the sidebar), navigate to the `code/rhoai` directory, and open up FineTuning-SDXL.ipynb
- If AWS S3 is used to store the LoRA weights, modify the last cell by changing `XFER_LOCATION = 'MINIO'` to `XFER_LOCATION = 'AWS'`
- Run this notebook by selecting the **Run** -> **Run All Cells** menu item
- When the notebook runs successfully, your fine-tuned model should have been uploaded to AWS or MinIO in the bucket specified for `AWS_S3_BUCKET` in the *Create Workbench* section.
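To sanity-check that the weights actually landed in the bucket, a short script along these lines can list the objects under the prefix used when deploying the model. This is a sketch, not part of the notebooks; the helper names are illustrative, and it assumes the same environment variables set for the workbench:

```python
import os


def lora_prefix(path="model"):
    """Normalize the deploy Path value into an S3 prefix (hypothetical helper)."""
    return path.rstrip("/") + "/"


def list_lora_objects(bucket, prefix=lora_prefix()):
    """List object keys under the prefix using the workbench env vars.
    boto3 is imported lazily so the helper above stays stdlib-only."""
    import boto3

    client = boto3.client(
        "s3",
        endpoint_url=os.environ.get("AWS_S3_ENDPOINT"),  # only needed for MinIO
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
    )
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

An empty result means the upload cell did not run, or the bucket/prefix does not match what was configured.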
Once the initial notebook has run successfully and the data connection is created, you can deploy the model by following these steps:

- In the RHOAI tab, select the **Models** tab (for your newly created project) and click the **Deploy model** button
- Fill in the following fields as described below:
  - Model name: <PROVIDE_a_name_for_the_model>
  - Serving runtime: `Stable Diffusion`
  - Model framework: `sdxl`
  - Model server size: `Small`
  - Accelerator: `NVIDIA GPU`
  - Model route:
    - If you want to access this model endpoint from outside the cluster, make sure to check the **Make deployed models available through an external route** checkbox. By default the model endpoint is only available as an internal service.
  - Model location: Select the **Existing data connection** option
    - Name: name of the data connection created in the previous step
    - Path: `model`
- Click **Deploy** to deploy this model
Copy the **inference endpoint** once the model is deployed successfully (it will take a few minutes to deploy the model).
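The deployed model is called over REST. The exact request format depends on the Stable Diffusion serving runtime; GenerateImageUsingModel.ipynb shows the actual call. As a rough sketch, assuming a KServe v2 style payload (the input name, shape, and response handling here are assumptions, not the runtime's documented API):

```python
import json
from urllib import request as urlrequest


def build_infer_payload(prompt):
    """KServe v2 style inference payload (input name is an assumption;
    see GenerateImageUsingModel.ipynb for the actual format)."""
    return {
        "inputs": [
            {"name": "prompt", "shape": [1], "datatype": "BYTES", "data": [prompt]}
        ]
    }


def generate_image(inference_endpoint, prompt):
    """POST the prompt to the deployed model's inference endpoint."""
    req = urlrequest.Request(
        inference_endpoint,
        data=json.dumps(build_infer_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)
```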
A Toy Jensen image can now be generated using the deployed model. To generate and retrieve the image, use the following steps:

- Open up GenerateImageUsingModel.ipynb
- Set the value of the `inference_endpoint` variable correctly by pointing it to your model's inference endpoint
  - Your model's inference endpoint should have been copied in the previous section
- Run this notebook by selecting the **Run** -> **Run All Cells** menu item
- When the notebook runs successfully, you should see a Toy Jensen image generated in the last cell.
- Red Hat OpenShift AI: `2.10.0`, `2.13.0`, `2.14.0`
- GPU: 1x NVIDIA `A10G`
- Storage: 50GB
Even though the latest version is used for all the modules installed for this project, here are the versions used underneath (in case any version incompatibility occurs in the future):
- accelerate: `1.1.1`
- boto3: `1.34.111`
- botocore: `1.34.111`
- dataclass_wizard: `0.26.0`
- diffusers: `0.32.0.dev0`
- ipywidgets: `8.1.2`
- jupyterlab: `3.6.8`
- huggingface_hub: `0.26.2`
- minio: `7.2.9`
- peft: `0.13.2`
- transformers: `4.46.2`
- torch: `2.2.2+cu121`
- torchvision: `0.17.2+cu121`
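If an incompatibility does show up, the versions above can be pinned with a requirements file along these lines (a sketch; the `+cu121` torch builds come from the PyTorch CUDA wheel index, so a plain PyPI install may need the versions without the local suffix):

```
accelerate==1.1.1
boto3==1.34.111
botocore==1.34.111
dataclass_wizard==0.26.0
diffusers==0.32.0.dev0
ipywidgets==8.1.2
jupyterlab==3.6.8
huggingface_hub==0.26.2
minio==7.2.9
peft==0.13.2
transformers==4.46.2
torch==2.2.2+cu121
torchvision==0.17.2+cu121
```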
The following notebooks contain output to give you an idea of how the outputs will look when the notebooks are run: