diff --git a/README.org b/README.org
index 2e94a8a..8650b69 100644
--- a/README.org
+++ b/README.org
@@ -1,32 +1,83 @@
-#+TITLE: OpenShift Workshops
-#+AUTHOR: James Blair
-#+DATE: <2023-12-04 Mon>
+#+TITLE: OpenShift AI Hackathon
+#+AUTHOR: James Blair, Tom Corcoran, Neo Xu
+#+DATE: <2024-06-06 Thu>

-This repository contains a basic [[https://nextjs.org/][nextjs]] frontend designed to be exported as a static site and served via [[https://pages.github.com/][github pages]].
+This repository contains a basic [[https://nextjs.org/][nextjs]] frontend designed to be exported as a static site and served via [[https://pages.github.com/][github pages]], for the purposes of running an OpenShift AI hackathon.

-The frontend is used to serve workshop instructions for several workshops.
+Below are the instructions for manually setting up an environment to run the hackathon.

-** Local development
+* Cluster provisioning

-To set up a local development environment run the following:
+** Cloud cluster

-#+begin_src bash
-# Install dependencies
-npm install
+Each team will require an AWS ROSA cluster, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.rosa.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.

-# Build and serve the site
-npm run build && npm run serve
+** On-premises cluster
+
+Each team will require a Single Node OpenShift cluster, representing their team's on-premises environment, which we will provision via the [[https://demo.redhat.com/catalog?item=babylon-catalog-prod/openshift-cnv.ocpmulti-single-node-cnv.prod&utm_source=webapp&utm_medium=share-link][Red Hat Demo System]]. When requesting the environments we enable the workshop user interface.
+
+
+* Cluster setup - on-premises
+
+For each cluster provisioned for the hackathon, the following steps need to be performed:
+
+
+** Log in to cluster
+
+#+begin_src tmux
+oc login --web
 #+end_src

+** Install minio via oc
+
+#+begin_src tmux
+oc new-project minio
+
+oc apply -f setup/minio-setup.yaml
+
+oc rollout status deployment/minio --watch
+#+end_src
+
+** TODO Consider creating a cluster web terminal pod
+
+** Download model from huggingface
+
+#+begin_src tmux
+HUGGINGFACE_TOKEN="HUGGINGFACE_TOKEN"
+pip install --upgrade huggingface_hub
+huggingface-cli login --token "${HUGGINGFACE_TOKEN}"
+git clone https://huggingface.co/instructlab/granite-7b-lab
+#+end_src
+
+** Upload model to on-prem cluster minio
+
+TODO Run aws configure and pull values out of that automatically.
+
+#+begin_src tmux
+export AWS_ACCESS_KEY_ID=
+export AWS_SECRET_ACCESS_KEY=
+export AWS_DEFAULT_REGION=
+export AWS_S3_ENDPOINT=$(oc get route minio-api -o jsonpath='{.spec.host}')
+export AWS_S3_BUCKET="models"
+
+python3 setup/minio-upload.py
+#+end_src
+
+
+* Cluster setup - cloud
+
+** Log in to ROSA cluster
+
+#+begin_src tmux
+oc login --web
+#+end_src
-** Exporting static site
+** Install minio via oc

-To export the site to static html to serve for example via github pages, run:
+#+begin_src tmux
+oc new-project minio

-#+begin_src bash
-# Install dependencies
-npm install
+oc apply -f setup/minio-setup.yaml

-# Build and export the site
-npm run build && npm run export
+oc rollout status deployment/minio --watch
 #+end_src
diff --git a/data/workshop/scenario2.mdx b/data/workshop/scenario2.mdx
index bf1f86a..05b97b4 100644
--- a/data/workshop/scenario2.mdx
+++ b/data/workshop/scenario2.mdx
@@ -1,23 +1,27 @@
 ---
-title: Hybrid Cloud AI model deployment
+title: Enabling GPU accelerators
 exercise: 2
-date: '2024-06-05'
+date: '2024-06-06'
 tags: ['openshift','ai','kubernetes']
 draft: false
 authors: ['default']
-summary: "Let's deploy the first model across the hybrid cloud."
+summary: "How do we use GPU accelerators?"
 ---

-As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have been training models on their laptops.
-The team have given you access to one of their models in the ACME Financial Services object storage and want to see how this could be deployed to a cluster running in the cloud.
+As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have asked you to show them how to enable GPU support in Red Hat OpenShift Service on AWS (ROSA).
+You've spun up a demo environment to show them how it's done.


-## 2.1 - Replicate Model to Cloud Storage
+## 3.1 - Replicate Model to Cloud Storage

 For this task, your team are required to use the `granite-7b-lab` model available in the object storage running in the ACME Financial Services on prem cluster which is based on Minio.

-After locating the model in on premises object storage, your team need to replicate this model to the ACME Financial Services cloud object storage
+After locating the model in on premises object storage, your team need to replicate this model to the ACME Financial Services cloud cluster object storage so that it could be served in future.


-## 2.2 - Deploy the model
-For this challenge you'll be given two OpenShift clusters
+## 3.2 - Install OpenShift AI
+
+Now that you've helped the ACME team replicate their chosen model to their cloud OpenShift Cluster, they want to serve the model ASAP.
+
+For this challenge your team must demonstrate to ACME how to install OpenShift AI, and serve the existing model called `granite-7b-lab` via OpenShift AI.
+
diff --git a/data/workshop/scenario3.mdx b/data/workshop/scenario3.mdx
new file mode 100644
index 0000000..73b42e9
--- /dev/null
+++ b/data/workshop/scenario3.mdx
@@ -0,0 +1,27 @@
+---
+title: Hybrid Cloud AI model deployment
+exercise: 3
+date: '2024-06-05'
+tags: ['openshift','ai','kubernetes']
+draft: false
+authors: ['default']
+summary: "Let's deploy the first model across the hybrid cloud."
+---
+
+As a sales team you've got an upcoming demo with the Acme Financial Services data science team, who have been training models on their laptops.
+The team have given you access to one of their models in the ACME Financial Services object storage and want to see how this could be deployed to a cluster running in the cloud.
+
+
+## 3.1 - Replicate Model to Cloud Storage
+
+For this task, your team are required to use the `granite-7b-lab` model available in the object storage running in the ACME Financial Services on prem cluster which is based on Minio.
+
+After locating the model in on premises object storage, your team need to replicate this model to the ACME Financial Services cloud cluster object storage so that it could be served in future.
+
+
+## 3.2 - Install OpenShift AI
+
+Now that you've helped the ACME team replicate their chosen model to their cloud OpenShift Cluster, they want to serve the model ASAP.
+
+For this challenge your team must demonstrate to ACME how to install OpenShift AI, and serve the existing model called `granite-7b-lab` via OpenShift AI.
+
diff --git a/setup/minio-setup.yaml b/setup/minio-setup.yaml
new file mode 100644
index 0000000..d82e370
--- /dev/null
+++ b/setup/minio-setup.yaml
@@ -0,0 +1,164 @@
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: minio-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 40Gi
+
+---
+kind: Secret
+apiVersion: v1
+metadata:
+  name: minio-secret
+stringData:
+  # change the username and password to your own values.
+  # ensure that the user is at least 3 characters long and the password at least 8
+  minio_root_user: minio
+  minio_root_password: minio123
+
+---
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: minio
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: minio
+  template:
+    metadata:
+      creationTimestamp: null
+      labels:
+        app: minio
+    spec:
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: minio-pvc
+      containers:
+        - resources:
+            limits:
+              cpu: 250m
+              memory: 1Gi
+            requests:
+              cpu: 20m
+              memory: 100Mi
+          readinessProbe:
+            tcpSocket:
+              port: 9000
+            initialDelaySeconds: 5
+            timeoutSeconds: 1
+            periodSeconds: 5
+            successThreshold: 1
+            failureThreshold: 3
+          terminationMessagePath: /dev/termination-log
+          name: minio
+          livenessProbe:
+            tcpSocket:
+              port: 9000
+            initialDelaySeconds: 30
+            timeoutSeconds: 1
+            periodSeconds: 5
+            successThreshold: 1
+            failureThreshold: 3
+          env:
+            - name: MINIO_ROOT_USER
+              valueFrom:
+                secretKeyRef:
+                  name: minio-secret
+                  key: minio_root_user
+            - name: MINIO_ROOT_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: minio-secret
+                  key: minio_root_password
+          ports:
+            - containerPort: 9000
+              protocol: TCP
+            - containerPort: 9090
+              protocol: TCP
+          imagePullPolicy: IfNotPresent
+          volumeMounts:
+            - name: data
+              mountPath: /data
+              subPath: minio
+          terminationMessagePolicy: File
+          image: quay.io/minio/minio:RELEASE.2024-06-04T19-20-08Z
+          args:
+            - server
+            - /data
+            - --console-address
+            - :9090
+      restartPolicy: Always
+      terminationGracePeriodSeconds: 30
+      dnsPolicy: ClusterFirst
+      securityContext: {}
+      schedulerName: default-scheduler
+  strategy:
+    type: Recreate
+  revisionHistoryLimit: 10
+  progressDeadlineSeconds: 600
+
+---
+kind: Service
+apiVersion: v1
+metadata:
+  name: minio-service
+spec:
+  ipFamilies:
+    - IPv4
+  ports:
+    - name: api
+      protocol: TCP
+      port: 9000
+      targetPort: 9000
+    - name: ui
+      protocol: TCP
+      port: 9090
+      targetPort: 9090
+  internalTrafficPolicy: Cluster
+  type: ClusterIP
+  ipFamilyPolicy: SingleStack
+  sessionAffinity: None
+  selector:
+    app: minio
+
+---
+kind: Route
+apiVersion: route.openshift.io/v1
+metadata:
+  name: minio-api
+spec:
+  to:
+    kind: Service
+    name: minio-service
+    weight: 100
+  port:
+    targetPort: api
+  wildcardPolicy: None
+  tls:
+    termination: edge
+    insecureEdgeTerminationPolicy: Redirect
+
+---
+kind: Route
+apiVersion: route.openshift.io/v1
+metadata:
+  name: minio-ui
+spec:
+  to:
+    kind: Service
+    name: minio-service
+    weight: 100
+  port:
+    targetPort: ui
+  wildcardPolicy: None
+  tls:
+    termination: edge
+    insecureEdgeTerminationPolicy: Redirect
diff --git a/setup/minio-upload.py b/setup/minio-upload.py
new file mode 100644
index 0000000..9e4ba46
--- /dev/null
+++ b/setup/minio-upload.py
@@ -0,0 +1,8 @@
+import os
+import boto3
+
+key_id = os.getenv("AWS_ACCESS_KEY_ID")
+secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+region = os.getenv("AWS_DEFAULT_REGION")
+endpoint = os.getenv("AWS_S3_ENDPOINT")
+bucket_name = os.getenv("AWS_S3_BUCKET")
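
Review note: as committed, `setup/minio-upload.py` only reads the five environment variables and does not yet upload anything. One way the remainder might look is sketched below; it is not part of this change. It assumes the model was cloned into a local `granite-7b-lab` directory (as in the README step), that the `models` bucket already exists, and that the bare route host in `AWS_S3_ENDPOINT` should be prefixed with `https://` because the `minio-api` route uses edge TLS. The helper names (`iter_upload_targets`, `normalise_endpoint`, `upload_directory`), the `granite-7b-lab` key prefix, and the `us-east-1` fallback region are hypothetical choices for illustration.

```python
import os
from pathlib import Path


def iter_upload_targets(model_dir, prefix):
    """Yield (local_path, object_key) pairs for every file under model_dir.

    Files inside a top-level or nested ".git" directory are skipped.
    """
    root = Path(model_dir)
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if path.is_file() and ".git" not in relative.parts:
            yield path, f"{prefix}/{relative.as_posix()}"


def normalise_endpoint(endpoint):
    """Prefix a bare host with https:// (assumption: the route terminates TLS)."""
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return f"https://{endpoint}"


def upload_directory(client, bucket_name, model_dir, prefix):
    """Upload each file under model_dir to bucket_name; return the keys written."""
    keys = []
    for path, key in iter_upload_targets(model_dir, prefix):
        client.upload_file(str(path), bucket_name, key)
        keys.append(key)
    return keys


def main():
    # Imported here so the helpers above can be exercised without boto3 installed.
    import boto3

    client = boto3.client(
        "s3",
        endpoint_url=normalise_endpoint(os.environ["AWS_S3_ENDPOINT"]),
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        # Hypothetical fallback for the empty AWS_DEFAULT_REGION exported in the README.
        region_name=os.getenv("AWS_DEFAULT_REGION") or "us-east-1",
    )
    # Assumes the bucket named in AWS_S3_BUCKET already exists.
    uploaded = upload_directory(
        client, os.environ["AWS_S3_BUCKET"], "granite-7b-lab", "granite-7b-lab"
    )
    print(f"Uploaded {len(uploaded)} objects")
```

Calling `main()` at the end of the script would run the upload against the cluster. Keeping the directory walk separate from the client construction means the key layout can be checked locally with a stub client before any credentials are involved.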