Merge pull request #3 from jmhbnz/main
Progress on setup guide and scenario 3
tnscorcoran authored Jun 5, 2024
2 parents 3d25dd7 + 2f81c59 commit d238d71
Showing 5 changed files with 282 additions and 28 deletions.
89 changes: 70 additions & 19 deletions README.org
@@ -1,32 +1,83 @@
-#+TITLE: OpenShift Workshops
-#+AUTHOR: James Blair
-#+DATE: <2023-12-04 Mon>
+#+TITLE: OpenShift AI Hackathon
+#+AUTHOR: James Blair, Tom Corcoran, Neo Xu
+#+DATE: <2024-06-06 Thu>

-This repository contains a basic [[https://nextjs.org/][nextjs]] frontend designed to be exported as a static site and served via [[https://pages.github.com/][github pages]].
+This repository contains a basic [[https://nextjs.org/][nextjs]] frontend designed to be exported as a static site and served via [[https://pages.github.com/][github pages]], for the purposes of running an OpenShift AI hackathon.

-The frontend is used to serve workshop instructions for several workshops.
+Below are the instructions for manually setting up an environment to run the hackathon.

-** Local development
+* Cluster provisioning

-To set up a local development environment run the following:
+** Cloud cluster

-#+begin_src bash
-# Install dependencies
-npm install
+Each team will require an AWS ROSA cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.

-# Build and serve the site
-npm run build && npm run serve
+** On-premises cluster
+
+Each team will require a Single Node OpenShift cluster, representing their team's on-premises environment, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/openshift-cnv.ocpmulti-single-node-cnv.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.
+
+
+* Cluster setup - on-premises
+
+For each cluster provisioned for the hackathon, the following steps need to be performed:
+
+
+** Log in to cluster
+
+#+begin_src tmux
+oc login --web <api-route>
 #+end_src

+** Install minio via oc
+
+#+begin_src tmux
+oc new-project minio
+
+oc apply -f setup/minio-setup.yaml
+
+oc rollout status deployment/minio --watch
+#+end_src
+
+** TODO Consider creating a cluster web terminal pod
+
+** Download model from huggingface
+
+#+begin_src tmux
+HUGGINGFACE_TOKEN="HUGGINGFACE_TOKEN"
+pip install --upgrade huggingface_hub
+huggingface-cli login --token "${HUGGINGFACE_TOKEN}"
+git lfs install && git clone https://huggingface.co/instructlab/granite-7b-lab
+#+end_src
+
+** Upload model to on-prem cluster minio
+
+TODO Run aws configure and pull values out of that automatically.
+
+#+begin_src tmux
+export AWS_ACCESS_KEY_ID=<placeholder>
+export AWS_SECRET_ACCESS_KEY=<placeholder>
+export AWS_DEFAULT_REGION=<placeholder>
+export AWS_S3_ENDPOINT="https://$(oc get route minio-api -o jsonpath='{.spec.host}')"
+export AWS_S3_BUCKET="models"
+
+python3 setup/minio-upload.py
+#+end_src
+
+
+* Cluster setup - cloud
+
+** Log in to ROSA cluster
+
+#+begin_src tmux
+oc login --web <api-route>
+#+end_src

-** Exporting static site
+** Install minio via oc

-To export the site to static html to serve for example via github pages, run:
+#+begin_src tmux
+oc new-project minio

-#+begin_src bash
-# Install dependencies
-npm install
+oc apply -f setup/minio-setup.yaml

-# Build and export the site
-npm run build && npm run export
+oc rollout status deployment/minio --watch
 #+end_src
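The upload step above exports five environment variables before running `setup/minio-upload.py`. A minimal sketch of how the script might read and check them, using a hypothetical helper `load_s3_settings` that is not part of the repository. It assumes that when `AWS_S3_ENDPOINT` holds only the Route hostname, the endpoint should be reached over HTTPS (the `minio-api` Route uses edge TLS termination), because boto3 rejects an `endpoint_url` that has no scheme.

```python
import os
from typing import Mapping

# Variables exported in the "Upload model to on-prem cluster minio" step.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_S3_ENDPOINT",
    "AWS_S3_BUCKET",
)


def load_s3_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect upload settings, failing fast on unset or empty values."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_VARS}
    # A bare Route hostname has no scheme; assume HTTPS because the Route is edge-terminated.
    if "://" not in settings["AWS_S3_ENDPOINT"]:
        settings["AWS_S3_ENDPOINT"] = "https://" + settings["AWS_S3_ENDPOINT"]
    return settings
```

Passing the environment in as a mapping keeps the check testable without touching the real process environment.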
22 changes: 13 additions & 9 deletions data/workshop/scenario2.mdx
@@ -1,23 +1,27 @@
 ---
-title: Hybrid Cloud AI model deployment
+title: Enabling GPU accelerators
 exercise: 2
-date: '2024-06-05'
+date: '2024-06-06'
 tags: ['openshift','ai','kubernetes']
 draft: false
 authors: ['default']
-summary: "Let's deploy the first model across the hybrid cloud."
+summary: "How do we use GPU accelerators?"
 ---

-As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have been training models on their laptops.
-The team have given you access to one of their models in the ACME Financial Services object storage and want to see how this could be deployed to a cluster running in the cloud.
+As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have asked you to show them how to enable GPU support in Red Hat OpenShift Service on AWS (ROSA).

+You've spun up a demo environment to show them how it's done.

-## 2.1 - Replicate Model to Cloud Storage
+## 3.1 - Replicate Model to Cloud Storage

 For this task, your team are required to use the `granite-7b-lab` model available in the object storage running in the ACME Financial Services on prem cluster which is based on Minio.

-After locating the model in on premises object storage, your team need to replicate this model to the ACME Financial Services cloud object storage
+After locating the model in on-premises object storage, your team need to replicate this model to the ACME Financial Services cloud cluster object storage so that it can be served in future.

-## 2.2 - Deploy the model

-For this challenge you'll be given two OpenShift clusters
+## 3.2 - Install OpenShift AI
+
+Now that you've helped the ACME team replicate their chosen model to their cloud OpenShift cluster, they want to serve the model ASAP.
+
+For this challenge your team must demonstrate to ACME how to install OpenShift AI, and serve the existing model called `granite-7b-lab` via OpenShift AI.
+
27 changes: 27 additions & 0 deletions data/workshop/scenario3.mdx
@@ -0,0 +1,27 @@
---
title: Hybrid Cloud AI model deployment
exercise: 3
date: '2024-06-05'
tags: ['openshift','ai','kubernetes']
draft: false
authors: ['default']
summary: "Let's deploy the first model across the hybrid cloud."
---

As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have been training models on their laptops.
The team have given you access to one of their models in the ACME Financial Services object storage and want to see how this could be deployed to a cluster running in the cloud.


## 3.1 - Replicate Model to Cloud Storage

For this task, your team are required to use the `granite-7b-lab` model available in the object storage running in the ACME Financial Services on-premises cluster, which is based on MinIO.

After locating the model in on-premises object storage, your team need to replicate this model to the ACME Financial Services cloud cluster object storage so that it can be served in future.


## 3.2 - Install OpenShift AI

Now that you've helped the ACME team replicate their chosen model to their cloud OpenShift cluster, they want to serve the model ASAP.

For this challenge your team must demonstrate to ACME how to install OpenShift AI, and serve the existing model called `granite-7b-lab` via OpenShift AI.
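Task 3.1 asks teams to move the model between two MinIO instances on different clusters. A sketch of one possible approach, assuming two boto3 S3 clients (one per cluster's `minio-api` Route) and a hypothetical `replicate_prefix` helper. Because `copy_object` performs a server-side copy within a single endpoint, each object is instead downloaded to a temporary file and re-uploaded, so the machine running it needs enough local disk for the largest object. The MinIO client's `mc mirror` command is an alternative that avoids custom code.

```python
import tempfile
from typing import Any, List


def replicate_prefix(src: Any, dst: Any, src_bucket: str, dst_bucket: str, prefix: str = "") -> List[str]:
    """Hypothetical helper: copy every object under `prefix` between two S3-compatible endpoints.

    `src` and `dst` are assumed to be boto3 S3 clients for the on-premises and
    cloud MinIO routes. Objects are staged in temporary files rather than memory.
    """
    copied = []
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key at all.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            with tempfile.TemporaryFile() as staging:
                src.download_fileobj(src_bucket, key, staging)
                staging.seek(0)  # rewind before handing the file to the uploader
                dst.upload_fileobj(staging, dst_bucket, key)
            copied.append(key)
    return copied
```

The `granite-7b-lab/` prefix and the `models` bucket name on both sides are assumptions about how the model was uploaded; adjust them to the actual bucket layout.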

164 changes: 164 additions & 0 deletions setup/minio-setup.yaml
@@ -0,0 +1,164 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi

---
kind: Secret
apiVersion: v1
metadata:
  name: minio-secret
stringData:
  # change the username and password to your own values.
  # ensure that the user is at least 3 characters long and the password at least 8
  minio_root_user: minio
  minio_root_password: minio123

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: minio
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minio-pvc
      containers:
        - resources:
            limits:
              cpu: 250m
              memory: 1Gi
            requests:
              cpu: 20m
              memory: 100Mi
          readinessProbe:
            tcpSocket:
              port: 9000
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: minio
          livenessProbe:
            tcpSocket:
              port: 9000
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: minio-secret
                  key: minio_root_user
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: minio-secret
                  key: minio_root_password
          ports:
            - containerPort: 9000
              protocol: TCP
            - containerPort: 9090
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: data
              mountPath: /data
              subPath: minio
          terminationMessagePolicy: File
          image: quay.io/minio/minio:RELEASE.2024-06-04T19-20-08Z
          args:
            - server
            - /data
            - --console-address
            - :9090
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: Recreate
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600

---
kind: Service
apiVersion: v1
metadata:
  name: minio-service
spec:
  ipFamilies:
    - IPv4
  ports:
    - name: api
      protocol: TCP
      port: 9000
      targetPort: 9000
    - name: ui
      protocol: TCP
      port: 9090
      targetPort: 9090
  internalTrafficPolicy: Cluster
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    app: minio

---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: minio-api
spec:
  to:
    kind: Service
    name: minio-service
    weight: 100
  port:
    targetPort: api
  wildcardPolicy: None
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: minio-ui
spec:
  to:
    kind: Service
    name: minio-service
    weight: 100
  port:
    targetPort: ui
  wildcardPolicy: None
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
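The `minio-secret` comments ask for a root user of at least 3 characters and a password of at least 8, and the committed `minio` / `minio123` values are placeholders. A minimal sketch of generating compliant values with Python's `secrets` module and rendering a replacement Secret; the function names are hypothetical and the 24-character alphanumeric password is an arbitrary choice.

```python
import json
import secrets
import string

# Letters and digits only, so the value never needs YAML or shell escaping.
ALPHABET = string.ascii_letters + string.digits


def generate_minio_credentials(user: str = "minio", password_length: int = 24) -> tuple:
    """Hypothetical helper: return a (user, password) pair meeting the minimums noted in minio-secret."""
    if len(user) < 3:
        raise ValueError("the MinIO root user must be at least 3 characters long")
    if password_length < 8:
        raise ValueError("the MinIO root password must be at least 8 characters long")
    password = "".join(secrets.choice(ALPHABET) for _ in range(password_length))
    return user, password


def render_secret(user: str, password: str, name: str = "minio-secret") -> str:
    """Render a Secret with the same keys the minio Deployment reads via secretKeyRef."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    return (
        "kind: Secret\n"
        "apiVersion: v1\n"
        "metadata:\n"
        f"  name: {name}\n"
        "stringData:\n"
        f"  minio_root_user: {json.dumps(user)}\n"
        f"  minio_root_password: {json.dumps(password)}\n"
    )


if __name__ == "__main__":
    print(render_secret(*generate_minio_credentials()), end="")
```

The printed manifest could be piped to `oc apply -n minio -f -` after `setup/minio-setup.yaml` has been applied. Environment variables sourced from a Secret are resolved when the container starts, so the pod would need `oc rollout restart deployment/minio` to pick up the new values, and re-applying `setup/minio-setup.yaml` would reset them to the defaults.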
8 changes: 8 additions & 0 deletions setup/minio-upload.py
@@ -0,0 +1,8 @@
import os
import boto3

key_id = os.getenv("AWS_ACCESS_KEY_ID")
secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
region = os.getenv("AWS_DEFAULT_REGION")
endpoint = os.getenv("AWS_S3_ENDPOINT")
bucket_name = os.getenv("AWS_S3_BUCKET")
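As committed, `setup/minio-upload.py` reads its settings but does not yet upload anything. One way the remaining step could look, sketched under assumptions: the model is the local `granite-7b-lab` clone from the Hugging Face step, objects go under a `granite-7b-lab/` prefix (an assumed layout), and the bucket already exists. The helpers `iter_model_files`, `upload_model`, and `make_client` are hypothetical; `boto3.client` and the client's `upload_file(Filename, Bucket, Key)` are standard boto3 calls.

```python
import os
from typing import Any, Iterator, List, Tuple


def iter_model_files(local_dir: str, prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, object_key) for every file under local_dir, skipping .git."""
    for root, dirs, files in os.walk(local_dir):
        # Prune .git in place so os.walk never descends into it; sort for a stable order.
        dirs[:] = sorted(d for d in dirs if d != ".git")
        for name in sorted(files):
            path = os.path.join(root, name)
            # Object keys always use "/" regardless of the local path separator.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix}/{rel}" if prefix else rel
            yield path, key


def upload_model(client: Any, bucket: str, local_dir: str, prefix: str) -> List[str]:
    """Upload a local model directory using a boto3-style S3 client."""
    uploaded = []
    for path, key in iter_model_files(local_dir, prefix):
        client.upload_file(path, bucket, key)  # boto3 order: Filename, Bucket, Key
        uploaded.append(key)
    return uploaded


def make_client(endpoint: str, key_id: str, secret_key: str, region: str) -> Any:
    """Create a boto3 S3 client for the MinIO API route; endpoint must include a scheme."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=key_id,
        aws_secret_access_key=secret_key,
        region_name=region,
    )
```

With the variables the script already defines, the call would be `upload_model(make_client(endpoint, key_id, secret_key, region), bucket_name, "granite-7b-lab", "granite-7b-lab")`. Passing the client in, rather than creating it inside `upload_model`, keeps the directory walk testable without a live MinIO instance.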
