Automate machineset creation with rosa cli (#58)
* Automate machineset creation with rosa cli.

* Further refine setup instructions.
jmhbnz authored Jul 1, 2024
1 parent d8a184e commit a3064d3
Showing 1 changed file with 70 additions and 33 deletions.
103 changes: 70 additions & 33 deletions README.org
@@ -6,41 +6,95 @@ This repository contains a basic [[https://nextjs.org/][nextjs]] frontend design

Below are the instructions for manually setting up an environment to run the hackathon.

* Cluster provisioning

** Cloud cluster
* Pre-requisites

This guide assumes you have the following packages installed locally:
- [[https://formulae.brew.sh/formula/openshift-cli][OpenShift client]] `oc`
- [[https://formulae.brew.sh/formula/rosa-cli][Red Hat OpenShift Service on AWS client]] `rosa`
- [[https://formulae.brew.sh/formula/awscli][Amazon Web Services client]] `aws`

#+NAME: Check pre-requisites
#+begin_src tmux
oc version && rosa version && aws --version
#+end_src
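
The one-line check above stops at the first tool that fails. A minimal sketch of a check that reports every missing tool, assuming a POSIX shell; the function name `check_prerequisites` is hypothetical and not part of this repository:

```shell
# Hypothetical helper: report every required tool that is missing from PATH.
check_prerequisites() {
  missing=0
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "missing: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Check the three clients this guide relies on.
check_prerequisites oc rosa aws || echo "Install the missing tools before continuing."
```

The function returns non-zero when any tool is absent, so it can also gate a longer setup script.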


Each team participating in the hackathon will require a [[https://aws.amazon.com/rosa][Red Hat OpenShift on AWS (ROSA)]] cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.
* Cluster provisioning

** On-premises cluster
Each team participating in the hackathon will require a [[https://aws.amazon.com/rosa][Red Hat OpenShift on AWS (ROSA)]] cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface with:
- The title `OpenShift AI Hackathon`
- Number of instances set to `12`
- AWS Region `eu-west-1`

Each team will also require a Single Node OpenShift cluster, representing their team's on-premises environment, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/openshift-cnv.ocpmulti-single-node-cnv.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.

* Cluster setup

* Cluster setup - on-premises
For each cluster provisioned for the hackathon, the following steps need to be performed:

For each on-premises cluster provisioned for the hackathon, the following steps need to be performed:

** Log in to cluster and rosa cli

** Log in to cluster
Before we begin, let's ensure our command line tools are authenticated. For `rosa` you'll need a token from the [[https://console.redhat.com/openshift/create/rosa/getstarted][ROSA console]].

#+NAME: Authenticate cli tools
#+begin_src tmux
oc login --web <api-route>
oc login --username "cluster-admin" --password "${PASSWORD}" <api-route>

rosa login --token "${ROSA_TOKEN}"

aws configure
#+end_src
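
After logging in, it can help to confirm that each client really holds a valid session before moving on. A minimal sketch, assuming the standard `oc whoami`, `rosa whoami` and `aws sts get-caller-identity` commands; the wrapper name `verify_logins` is hypothetical:

```shell
# Hypothetical helper: confirm each client holds a valid session.
verify_logins() {
  rc=0
  oc whoami >/dev/null 2>&1 || { echo "oc: not logged in" >&2; rc=1; }
  rosa whoami >/dev/null 2>&1 || { echo "rosa: not logged in" >&2; rc=1; }
  aws sts get-caller-identity >/dev/null 2>&1 || { echo "aws: no valid credentials" >&2; rc=1; }
  return "${rc}"
}

verify_logins || echo "Re-run the login commands above for the tools listed."
```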

** Install minio via oc

** Create gpu machine pool

Our first task is to ensure each cluster has a GPU `MachineSet` present. We can follow the instructions from https://cloud.redhat.com/experts/rosa/gpu to complete this.

#+NAME: Create machine pool
#+begin_src tmux
oc new-project minio
# Define parameters for machineset
export GPU_INSTANCE_TYPE='g5.8xlarge'
export CLUSTER_NAME=rosa-jfccs
export MACHINE_POOL_NAME=nvidia-gpu-pool
export MACHINE_POOL_REPLICA_COUNT=1

# Create the machineset with rosa cli
rosa create machinepool \
--cluster="${CLUSTER_NAME}" \
--name="${MACHINE_POOL_NAME}" \
--replicas="${MACHINE_POOL_REPLICA_COUNT}" \
--instance-type="${GPU_INSTANCE_TYPE}"

# Wait for every requested replica in the machineset to be ready
oc wait --for=jsonpath='{.status.readyReplicas}'="${MACHINE_POOL_REPLICA_COUNT}" machineset \
--selector hive.openshift.io/machine-pool="${MACHINE_POOL_NAME}" \
--namespace openshift-machine-api \
--timeout=600s
#+end_src
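
Once the machineset reports ready, a quick follow-up is to confirm the GPU node has joined the cluster. A minimal sketch, assuming nodes carry the well-known `node.kubernetes.io/instance-type` label; the function name `count_ready_gpu_nodes` is hypothetical:

```shell
# Hypothetical helper: count nodes of a given instance type whose STATUS is exactly "Ready".
count_ready_gpu_nodes() {
  oc get nodes \
    --selector "node.kubernetes.io/instance-type=${1}" \
    --no-headers 2>/dev/null \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

echo "Ready GPU nodes: $(count_ready_gpu_nodes "${GPU_INSTANCE_TYPE:-g5.8xlarge}")"
```

Nodes that are cordoned show a status such as `Ready,SchedulingDisabled` and are deliberately not counted here.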


** Install and configure minio via oc

Once the cluster GPU machine pool has been created, we need to deploy [[https://min.io/][minio]] so we can create storage buckets and pre-seed models on the cluster for hackathon participants to consume.

oc apply -f setup/minio-setup.yaml
#+NAME: Install minio via oc
#+begin_src tmux
# Deploy minio
oc new-project minio && oc --namespace minio apply -f setup/minio-setup.yaml

oc rollout status deployment/minio --watch
# Wait for minio to come up
oc --namespace minio rollout status deployment/minio --watch
#+end_src

** TODO Consider creating a cluster web terminal pod

** TODO Create Minio Bucket `models`
With minio deployed we need to create a bucket and upload some content to it.

#+NAME: Configure minio via oc
#+begin_src tmux
oc get pods -n minio
#+end_src
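
The block above only lists the minio pods. One possible way to create the bucket is with the MinIO client `mc`; this is an assumption, not the method this repository uses. A minimal sketch in which `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, the alias `hackathon` and the function name are all hypothetical:

```shell
# Hypothetical helper: create a bucket with the MinIO client `mc`.
# MINIO_ENDPOINT, MINIO_ACCESS_KEY and MINIO_SECRET_KEY are assumed to be set
# to the route and credentials of the minio deployment.
create_models_bucket() {
  mc alias set hackathon "${MINIO_ENDPOINT}" "${MINIO_ACCESS_KEY}" "${MINIO_SECRET_KEY}" \
    && mc mb --ignore-existing "hackathon/${1:-models}"
}

# Example: create_models_bucket models
```

The `--ignore-existing` flag makes the call safe to repeat across clusters where the bucket may already exist.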


** Download model from Hugging Face into each `on prem` cluster's Minio `models` bucket

@@ -55,7 +109,7 @@ or use this:
https://github.com/tnscorcoran/rhods-finetunning-demo/blob/main/vllm_get_from_huggingface.ipynb


** Upload model to on-prem cluster minio
** Upload model to cluster minio
Consider using this:
https://github.com/tnscorcoran/rhods-finetunning-demo/blob/main/vllm_push_to_minio.ipynb

@@ -72,20 +126,3 @@ python3 setup/minio-upload.py
#+end_src


* Cluster setup - cloud

** Login to rosa cluster

#+begin_src tmux
oc login --web <api-route>
#+end_src

** Install minio via oc

#+begin_src tmux
oc new-project minio

oc apply -f setup/minio-setup.yaml

oc rollout status minio --watch
#+end_src
