Commit: Merge pull request #1 from redhat-ai-services/doc_updates

Merging Doc updates for Philip

Showing 10 changed files with 168 additions and 109 deletions.
@@ -0,0 +1,3 @@
{
    "asciidoc.antora.enableAntoraSupport": true
}
@@ -1,5 +1,15 @@
-# Bootcamp instructions
-This repository contains the instructions for Red Hat OpenShift AI Accelerate Bootcamp.
-
-To view the static version of the instructions, please use this URL: https://redhat-ai-services.github.io/ai-accelerator-bootcamp-instructions/
+# Red Hat OpenShift AI (RHOAI) Accelerator Boot Camp
+
+This repository contains the source code for the instructions used by the Red Hat OpenShift AI Accelerator project boot camp.
+
+To view the static version of the instructions, please use this URL:
+
+https://redhat-ai-services.github.io/ai-accelerator-bootcamp-instructions/
+
+## Authoring
+
+The framework of this repo is derived from the format found at [redhat-scholars/build-course](https://github.com/redhat-scholars/build-course).
+
+See the [course documentation](https://redhat-scholars.github.io/build-course/rhs-build-course) for instructions on authoring and build workflows.
+
+All course content can be found inside the [content/modules/ROOT/pages](content/modules/ROOT/pages) subdirectory.
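
Since the repo follows the build-course format, the course can typically be previewed locally with Antora. This is a hedged sketch only: the playbook filename and output directory are assumptions, and the build-course template ships its own build/serve scripts which should be preferred if present.

```shell
# Generate the static site (playbook filename is a guess -- check the
# repo root for the actual Antora playbook or build scripts).
npx antora default-site.yml --to-dir ./www

# Serve the generated site locally for review:
python3 -m http.server 8080 --directory ./www
```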
@@ -1,109 +1,36 @@
-:preinstall_operators: %preinstall_operators%
+# Red Hat OpenShift Artificial Intelligence - Accelerator Boot Camp
-# Welcome to the Red Hat OpenShift Artificial Intelligence Accelerator Bootcamp
+Welcome to the Red Hat OpenShift Artificial Intelligence (RHOAI) accelerator boot camp!
-For this bootcamp, provision 3 instances of the Red Hat OpenShift Container Platform Cluster from demo.redhat.com.
+This module is a short, intensive and rigorous course of training covering the following topics:
-These would be used as demo, dev and prod clusters in this bootcamp.
+* Rapidly **provisioning RHOAI** onto your new OpenShift cluster
+* Working with **Kustomize** to tailor your GitOps based installation
+* Creating and deploying custom **data science container images**
-## Spin up Demo Cluster
-Navigate to https://demo.redhat.com/catalog?search=aws&item=babylon-catalog-prod%2Fsandboxes-gpte.sandbox-ocp.prod and order a _**AWS with OpenShift Open Environment**_.
+Once the RHOAI environment is up and running on your cluster, we will dive deeper into some common use cases:
-IMPORTANT: The Control Plane Instance Type should be set to _**m6a.4xlarge**_
-Please make sure this is set correctly or your cluster will not have enough resources
+* Training an ML model using a customized development container image
+* Using https://en.wikipedia.org/wiki/Amazon_S3[Amazon S3] based storage to save the newly trained model
+* Serving the model using a custom inference engine (runtime)
+* Using data science pipelines
+* Distributed model training, utilizing multiple compute nodes
+* Using Large Language Models (LLMs) in RHOAI
-[.bordershadow]
-image::OrderAWS_env.png[width=35%]
+At completion we hope that you'll have a greater understanding of the features in RHOAI, and the ability to rapidly implement your own AI/ML based project.
-This will take around 1-1:20 hrs.
+## What is the RHOAI Accelerator Project?
-### Spin up Dev and Prod Cluster
+Building an OpenShift cluster to support containerized AI/ML workloads can be quite complex. There are a number of additional operators, features and configurations that must be provisioned in order to expose features such as GPU-enabled compute nodes, monitoring, the ML training and deployment lifecycle, and so on.
-Navigate to https://demo.redhat.com/catalog?search=Red+Hat+OpenShift+Container+Platform+Cluster&item=babylon-catalog-prod%2Fopenshift-cnv.ocpmulti-wksp-cnv.prod and order 2 _**Red Hat OpenShift Container Platform Cluster**_. One for the _**Dev**_ environment and one for the _**Prod**_ environment. Create using the defaults:
+It's possible to install all the various components by hand, making notes such as an installation "cheat sheet" while doing so. However, this process will be quite tedious and take a fair amount of time, which is a problem for a number of reasons:
-[.bordershadow]
-image::clustersettings_Dev_Prod.png[width=50%]
+**Repetition**: For short-lived clusters (e.g. demo.redhat.com) the cluster has a very short lifespan, measured in days. It's important that we have a way to rapidly re-provision a new cluster through the use of automation.
-**We'll be using and setting up the DEV and PROD clusters for the later sections.**
+**Customizable Framework**: The intent of the Accelerator project is to be cloned, copied and altered to match your exact requirements. This could be a customized cluster for experimentation on specific components, features or architecture. Or it could be a great starting point for building a GitOps cluster design for real-world customer-facing implementations.
+## RHOAI Accelerator - Open Source
-## Install and Setup RHOAI & Components: Demo Environment
+The source code for the AI Accelerator can be found at: https://github.com/redhat-ai-services/ai-accelerator
-The environment install and setup will be performed with the help of the Red Hat AI Accelerator repository. This repo is intended to provide a core set of OpenShift features that would commonly be used for a Data Science environment, but can also be highly customized for specific scenarios.
-### Set up Demo cluster
-Follow the following steps to complete the install and setup:
-. After the cluster is running and ready, log in as the admin.
-. In the top right drop down, select the _**Copy Login Command**_. Enter credentials again. Copy the login token as shown in the image. Paste and run the command in your local terminal. This should log you into the cluster through the terminal.
-[.bordershadow]
-image::Login_command.png[Copy the login token]
-NOTE: If the `oc login` command fails because of certificate issue, use: `--insecure-skip-tls-verify`
-[start=3]
-. Git clone the following repository to your local machine:
-[.console-input]
-[source,adoc]
-----
-git clone https://github.com/redhat-ai-services/ai-accelerator.git
-----
-[start=4]
-. Navigate to the cloned folder with the command:
-[source,terminal]
-----
-cd ai-accelerator/
-----
-[start=5]
-. Run the bootstrap script by running the bootstrap.sh script
-[source,terminal]
-----
-./bootstrap.sh
-----
-* This will first install the GitOps Operator and then provide the user with the following overlays:
-* If the script times out waiting for GitOps to come up, you may need to run the bootstrap script again.
-[.bordershadow]
-image::Bootstrap_selection_1.png[]
-[start=6]
-. For _**Demo**_ cluster type the number 3 and press Enter.
-This will install all the applications in the bootstrap script and also provide a openshift-gitops-server (ArgoCD) link.
-[.bordershadow]
-image::Bootstrap_argo_url.png[]
-[start=7]
-. Log into the Argo CD link with the Openshift credentials and wait till everything syncs successfully.
-[.bordershadow]
-image::Argo_home_screen.png[]
-This will take around 25-30 minutes for everything to spin up.
-This will install RHOAI and related operators. Since we are using GPUs for the demo instance, it will also install the Nvidia GPU Operator and the Node Feature Discovery (NFD) Operator.
-This GPU overlay also uses _**MachineAutoscalers**_. Since there are Inferencing Service examples that use GPUs, a _**g5.2xlarge**_ machineset (with GPU) will spin up. This can take a few minutes.
-[NOTE]
-====
-If the granite inference service fails to spin up, delete the deployment and Argo should redeploy it.
-[SOURCE]
-----
-oc delete deployment granite-predictor-00001-deployment -n ai-example-single-model-serving
-----
-====
-We will cover the ai-accelerator project overview in a later section.
----
-Continue using the _**DEMO**_ cluster for the exercises.
+The accelerator was created and is currently maintained by the Red Hat AI Services organization; however, contributions from anywhere are always greatly appreciated!
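
The Kustomize-based, customizable framework described in the new page typically takes the shape of a shared base plus per-cluster overlays. A minimal illustrative sketch follows; the directory and patch file names are placeholders, not the accelerator's actual tree:

```yaml
# overlays/demo/kustomization.yaml -- illustrative only
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Start from the shared base definition...
resources:
  - ../../base

# ...then layer on demo-cluster-specific changes, e.g. a hypothetical
# patch enabling a GPU machineset for this overlay.
patches:
  - path: gpu-machineset-patch.yaml
```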
content/modules/ROOT/pages/05_environment_provisioning.adoc: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
:preinstall_operators: %preinstall_operators%

# Provisioning a Demonstration Environment

The subsequent sections of this lab will utilize three instances of the Red Hat OpenShift Container Platform Cluster:

. **Demo Cluster**: for running the model training and deployment exercises
. **Dev Cluster**: to simulate a development OpenShift cluster
. **Prod Cluster**: to simulate a production OpenShift cluster

In this section we will order all three clusters at demo.redhat.com. Note that these clusters are fairly short lived: they typically have a 6-hour runtime and are deleted after 48 hours, although the runtime can be temporarily extended as needed.

The demo clusters typically take 1-2 hours to provision, although they may take a little longer on Monday mornings as they use AWS spot instances and demand is usually high at the start of the work week. It's therefore suggested to provision them now and continue with the subsequent sections that don't yet require cluster access.

## Provision a Demo Cluster

The first cluster, where we will run our demonstration projects, requires a few more resources than the other two, so let's provision it first.

. Navigate to https://demo.redhat.com/catalog?search=aws&item=babylon-catalog-prod%2Fsandboxes-gpte.sandbox-ocp.prod[demo.redhat.com] and select the _**AWS with OpenShift Open Environment**_. Note that this is a blank/empty instance of OpenShift with no other operators or demo components preloaded, perfect for our lab exercises.
. Change the **Control Plane Instance Type** to **m6a.4xlarge**, as the default machine configuration does not have sufficient compute resources for the additional components we will be installing.
. You should be able to use the default version of OpenShift; however, since there are continuous releases it's a good idea to double-check the https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/[RHOAI Documentation], under the **Supported configurations** subsection, to ensure compatibility.

[.bordershadow]
image::OrderAWS_env.png[width=35%]

## Provision the Development and Production Clusters

These are simple clusters used to demonstrate Kustomize overlays in a subsequent lab. One cluster will be used for the _**Dev**_ environment and the other for the _**Prod**_ environment.

. Navigate to https://demo.redhat.com/catalog?search=Red+Hat+OpenShift+Container+Platform+Cluster&item=babylon-catalog-prod%2Fopenshift-cnv.ocpmulti-wksp-cnv.prod[demo.redhat.com] and order the _**Red Hat OpenShift Container Platform Cluster (AWS)**_.
. Select all default configuration options.
. Repeat these steps twice: once for the Dev cluster and once for the Prod cluster.

[.bordershadow]
image::clustersettings_Dev_Prod.png[width=50%]

## While You Wait

The provisioning process will take a while to complete, so why not take some time to check out some of the documentation in the AI Accelerator project that we will be installing once the clusters are ready:

* https://github.com/redhat-ai-services/ai-accelerator[Project Introduction README]
* https://github.com/redhat-ai-services/ai-accelerator/blob/main/documentation/overview.md[AI Accelerator Overview]
* https://github.com/redhat-ai-services/ai-accelerator/blob/main/documentation/installation.md[AI Accelerator Installation Procedure]
* https://github.com/redhat-ai-services/ai-accelerator/tree/main/tenants[Tenants documentation]

## When the Clusters are Ready

Once the clusters have been provisioned, you should receive an email containing the cluster URLs as well as an administrative user (such as `kubeadmin`) and password. You can also obtain these from the status dashboard at https://demo.redhat.com[demo.redhat.com], where you can also perform administrative functions on your clusters, such as starting/stopping or extending their lifespan if desired.
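
Once the credentials arrive, logging in from a terminal can be sketched as follows. The API URL and password below are placeholders; substitute the values from your provisioning email:

```shell
# Placeholder values -- use the API URL and password from your email.
oc login https://api.cluster-example.example.com:6443 \
  -u kubeadmin -p '<password-from-email>'

# If the login fails due to an untrusted certificate, append:
#   --insecure-skip-tls-verify

# Print the web console URL to confirm which cluster you are logged into:
oc whoami --show-console
```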
@@ -0,0 +1,73 @@
# Bootstrap the AI Accelerator Project

In this section we will execute the bootstrap installation script found in the AI Accelerator.

TIP: Note the term "bootstrap" rather than "install", since this script simply sets up the bare minimum components such as ArgoCD, and thereafter ArgoCD takes over to perform the remainder of the GitOps process, installing the rest of the software stack, including RHOAI.

## (Optional) Create a Fork of the AI Accelerator Project

It's highly recommended that you https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo[create a fork] of the AI Accelerator project, which gives you a copy that you can customize and manipulate as desired.

[start=1]
. Navigate to the project: https://github.com/redhat-ai-services/ai-accelerator
. Click the "fork" button in the navigation bar.
. Select the Owner. This is typically your personal GitHub account, but could be an organization if desired.

TIP: You can see who else has forked the repo by clicking the https://github.com/redhat-ai-services/ai-accelerator/forks[forks link] in the "About" section. It's interesting to see who else is using this accelerator project!

## Clone the AI Accelerator Project

Clone (download) the Git repository containing the AI Accelerator, since we will be running the bootstrap scripts from your local machine.

TIP: If you can't or prefer not to run the installation from your local machine (such as in a locked-down corporate environment), you can also use the Bastion host instead. This is a Linux virtual machine running on AWS; the SSH login details are provided in the provisioning email you received from demo.redhat.com. Just be aware that the Bastion host is deprovisioned when the cluster is deleted, so be sure to commit and push any changes frequently.

[start=2]
. Git clone the following repository to your local machine. If you're using a fork, then change the repository URL in the command below to match yours:

[.console-input]
[source,terminal]
----
git clone https://github.com/redhat-ai-services/ai-accelerator.git
----
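
If you cloned a fork, a common convention is to keep the shared accelerator repository as a second remote so you can pull future updates. The sketch below is illustrative: the `example-user` fork URL is a placeholder, and the local `git init` merely simulates your actual clone so the remote layout can be shown offline.

```shell
# Simulate a fresh clone locally, then wire up the two-remote layout.
mkdir -p ai-accelerator && cd ai-accelerator
git init -q .

# "origin" points at your fork (placeholder URL)...
git remote add origin https://github.com/example-user/ai-accelerator.git
# ...and "upstream" at the shared accelerator repo, for pulling updates.
git remote add upstream https://github.com/redhat-ai-services/ai-accelerator.git

# Show the resulting remote configuration.
git remote -v
```

With this layout, `git pull upstream main` brings in new accelerator changes while your own work is pushed to `origin`.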
## Bootstrap the Demo Cluster

Carefully follow the instructions found in https://github.com/redhat-ai-services/ai-accelerator/blob/main/documentation/installation.md[`documentation/installation.md`], with the following specifics:

[start=3]
. Use the _**Demo**_ cluster credentials when logging into OpenShift.
. Select number 3 when prompted:

[.bordershadow]
image::Bootstrap_selection_1.png[]

This will install all the applications in the bootstrap script and also provide an openshift-gitops-server (ArgoCD) link. Option 3 will install RHOAI and related operators. Since we are using GPUs for the demo instance, it will also install the NVIDIA GPU Operator and the Node Feature Discovery (NFD) Operator.

[.bordershadow]
image::Bootstrap_argo_url.png[]

[start=4]
. Log into the ArgoCD link with the OpenShift credentials and wait until everything syncs successfully.

[.bordershadow]
image::Argo_home_screen.png[]

This will take around 25-30 minutes for everything to provision and start up. You can monitor the status of all the components in the ArgoCD console.

TIP: Once provisioned, the project will create a link to ArgoCD in the Applications menu of OpenShift. However, you can also copy the ArgoCD URL from the terminal once the bootstrap script has completed.

This GPU overlay also uses _**MachineAutoscalers**_. Since there are Inferencing Service examples that use GPUs, a _**g5.2xlarge**_ machineset (with GPU) will spin up. This can take a few minutes.

[NOTE]
====
If the granite inference service fails to spin up, delete the deployment and Argo should redeploy it.
[source,terminal]
----
oc delete deployment granite-predictor-00001-deployment -n ai-example-single-model-serving
----
====

We will cover the ai-accelerator project overview in a later section.

---
Continue using the _**DEMO**_ cluster for the subsequent exercises.
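
While waiting for the sync, the application status can also be checked from a terminal. This is a hedged sketch: it assumes the default `openshift-gitops` namespace used by the OpenShift GitOps operator and relies only on standard `oc` output formatting:

```shell
# List each Argo CD application with its sync and health status.
oc get applications.argoproj.io -n openshift-gitops \
  -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status
```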
File renamed without changes.
File renamed without changes.