Automate machineset creation with rosa cli (#58)

* Automate machineset creation with rosa cli.
* Further refine setup instructions.
odh-labs · Jul 1, 2024 · a3064d3
1 parent d8a184e
commit a3064d3
Showing 1 changed file with 70 additions and 33 deletions.
diff --git a/README.org b/README.org
@@ -6,41 +6,95 @@ This repository contains a basic [[https://nextjs.org/][nextjs]] frontend design
 
 Below are the instructions for manually setting up an environment to run the hackathon.
 
-* Cluster provisioning
 
-** Cloud cluster
+* Pre-requisites
+
+This guide assumes you have the following packages installed locally:
+- [[https://formulae.brew.sh/formula/openshift-cli][OpenShift client]] `oc`
+- [[https://formulae.brew.sh/formula/rosa-cli][Red Hat OpenShift Service on AWS client]] `rosa`
+- [[https://formulae.brew.sh/formula/awscli][Amazon Web Services client]] `aws`
+
+#+NAME: Check pre-requisites
+#+begin_src tmux
+oc version && rosa version && aws --version
+#+end_src
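Review note: the chained `oc version && rosa version && aws --version` check stops at the first failing command, so it reports at most one missing tool per run. A minimal sketch of an alternative, assuming a POSIX shell, looks each tool up on `PATH` and lists every one that is missing. The function name `missing_tools` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print every tool
# from the argument list that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools oc rosa aws` prints nothing when all three clients are installed.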
+
 
-Each team participating in the hackathon will require a [[https://aws.amazon.com/rosa][Red Hat OpenShift on AWS (ROSA)]] cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.
+* Cluster provisioning
 
-** On-premises cluster
+Each team participating in the hackathon will require a [[https://aws.amazon.com/rosa][Red Hat OpenShift Service on AWS (ROSA)]] cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments, we enable the workshop user interface and use the following settings:
+- The title `OpenShift AI Hackathon`
+- Number of instances set to `12`
+- AWS Region `eu-west-1`
 
-Each team will also require a Single Node OpenShift cluster, representing their teams on premises environment which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/openshift-cnv.ocpmulti-single-node-cnv.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.
 
+* Cluster setup
 
-* Cluster setup - on-premises
+For each cluster provisioned for the hackathon, the following steps need to be performed:
 
-For each on-premises cluster provisioned for the hackathon, the following steps need to be performed:
 
+** Log in to cluster and rosa cli
 
-** Log in to cluster
+Before we begin, let's ensure our command-line tools are authenticated. For `rosa`, you'll need a token from the [[https://console.redhat.com/openshift/create/rosa/getstarted][ROSA console]].
 
+#+NAME: Authenticate cli tools
 #+begin_src tmux
-oc login --web <api-route>
+oc login --username "cluster-admin" --password "${PASSWORD}" <api-route>
+
+rosa login --token "${ROSA_TOKEN}"
+
+aws configure
 #+end_src
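Review note: the login commands above expand `${PASSWORD}` and `${ROSA_TOKEN}`. If either variable is unset, the shell silently passes an empty string to the client. A small guard, sketched here for a POSIX shell, can fail early instead. The helper name `require_env` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: succeed only if every named variable is set and
# non-empty; report each missing name on stderr.
require_env() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted input).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf 'missing environment variable: %s\n' "$name" >&2
      status=1
    fi
  done
  return "$status"
}
```

One possible use is to run `require_env PASSWORD ROSA_TOKEN` before the `oc login` and `rosa login` commands and stop if it returns a non-zero status.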
 
-** Install minio via oc
 
+** Create gpu machine pool
+
+Our first task is to ensure each cluster has a GPU `MachineSet` present. We can follow the instructions at https://cloud.redhat.com/experts/rosa/gpu to complete this.
+
+#+NAME: Create machine pool
 #+begin_src tmux
-oc new-project minio
+# Define parameters for machineset
+export GPU_INSTANCE_TYPE='g5.8xlarge'
+export CLUSTER_NAME=rosa-jfccs
+export MACHINE_POOL_NAME=nvidia-gpu-pool
+export MACHINE_POOL_REPLICA_COUNT=1
+
+# Create the machineset with rosa cli
+rosa create machinepool \
+  --cluster="${CLUSTER_NAME}" \
+  --name="${MACHINE_POOL_NAME}" \
+  --replicas="${MACHINE_POOL_REPLICA_COUNT}" \
+  --instance-type="${GPU_INSTANCE_TYPE}"
+
+# Wait for the machineset to be ready
+oc wait --for=jsonpath='{.status.readyReplicas}'=1 machineset \
+  --selector hive.openshift.io/machine-pool="${MACHINE_POOL_NAME}" \
+  --namespace openshift-machine-api \
+  --timeout=600s
+#+end_src
+
+
+** Install and configure minio via oc
+
+Once the cluster GPU machine pool has been created, we need to deploy [[https://min.io/][minio]] so we can create storage buckets and pre-seed models on the cluster for hackathon participants to consume.
 
-oc apply -f setup/minio-setup.yaml
+#+NAME: Install minio via oc
+#+begin_src tmux
+# Deploy minio
+oc new-project minio && oc --namespace minio apply -f setup/minio-setup.yaml
 
-oc rollout status deployment/minio --watch
+# Wait for minio to come up
+oc --namespace minio rollout status deployment/minio --watch
 #+end_src
 
-** TODO Consider creating a cluster web terminal pod
 
-** TODO Create Minio Bucket `models`
+With minio deployed, we need to create a bucket and upload some content to it.
+
+#+NAME: Configure minio via oc
+#+begin_src tmux
+oc get pods -n minio
+#+end_src
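Review note: the `Configure minio via oc` block above only lists the pods and does not yet create a bucket. One possible way to create it, sketched here with the `aws` client already listed in the pre-requisites, is to point `aws s3 mb` at the minio S3 endpoint. Everything environment-specific is an assumption: `MINIO_ENDPOINT` must be set to the minio API URL, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` must hold the minio credentials, and the helper names `run` and `create_bucket` are hypothetical. The bucket name `models` follows the TODO item removed in this commit.

```shell
# Assumptions (not taken from the repository): MINIO_ENDPOINT is the URL of
# the minio S3 API, and AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY hold the
# minio credentials. The bucket name defaults to "models".
MINIO_BUCKET="${MINIO_BUCKET:-models}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the bucket through the S3-compatible API; aborts with a message
# if MINIO_ENDPOINT is unset.
create_bucket() {
  run aws --endpoint-url "${MINIO_ENDPOINT:?set MINIO_ENDPOINT first}" \
    s3 mb "s3://${MINIO_BUCKET}"
}
```

Calling `create_bucket` with `DRY_RUN=1` set prints the command without contacting the cluster, which makes it easy to review before running it for real.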
+
 
 ** Download model from huggingface into each `on prem` clusters Minio `model`'s bucket
 
@@ -55,7 +109,7 @@ or use this:
 https://github.com/tnscorcoran/rhods-finetunning-demo/blob/main/vllm_get_from_huggingface.ipynb
 
 
-** Upload model to on-prem cluster minio
+** Upload model to cluster minio
 Consider using this:
 https://github.com/tnscorcoran/rhods-finetunning-demo/blob/main/vllm_push_to_minio.ipynb
 
@@ -72,20 +126,3 @@ python3 setup/minio-upload.py
 #+end_src
 
 
-* Cluster setup - cloud
-
-** Login to rosa cluster
-
-#+begin_src tmux
-oc login --web <api-route>
-#+end_src
-
-** Install minio via oc
-
-#+begin_src tmux
-oc new-project minio
-
-oc apply -f setup/minio-setup.yaml
-
-oc rollout status minio --watch
-#+end_src