Here you can find examples of using Katib with Argo Workflows.
Note: You have to install Argo Workflows >= v3.1.3
to use it in Katib Experiments.
To deploy Argo Workflows v3.1.3
, run the following commands:
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.1.3/install.yaml
Check that Argo Workflow components are running:
$ kubectl get pods -n argo
NAME READY STATUS RESTARTS AGE
argo-server-5bbd69cc6b-6nvb6 1/1 Running 0 20s
workflow-controller-5f48fb7c8-vw9bp 1/1 Running 0 20s
After that, run below command to enable Katib Metrics Collector sidecar injection:
kubectl patch namespace argo -p '{"metadata":{"labels":{"katib.kubeflow.org/metrics-collector-injection":"enabled"}}}'
Note: Argo Workflows are using docker
as a
default container runtime executor.
Since Katib is using Metrics Collector sidecar container and Argo Workflows controller
should not kill sidecar containers, you have to modify this
executor to emissary
.
Run the following command to change the containerRuntimeExecutor
to emissary
in the
Argo workflow-controller-configmap
kubectl patch ConfigMap -n argo workflow-controller-configmap --type='merge' -p='{"data":{"containerRuntimeExecutor":"emissary"}}'
Verify that containerRuntimeExecutor
has been modified:
$ kubectl get ConfigMap -n argo workflow-controller-configmap -o yaml | grep containerRuntimeExecutor
containerRuntimeExecutor: emissary
To run Argo Workflow within Katib Trials you have to update Katib ClusterRole's rules with the appropriate permission:
- apiGroups:
- argoproj.io
resources:
- workflows
verbs:
- "get"
- "list"
- "watch"
- "create"
- "delete"
Run the following command to update Katib ClusterRole:
kubectl patch ClusterRole katib-controller -n kubeflow --type=json \
-p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups":["argoproj.io"],"resources":["workflows"],"verbs":["get", "list", "watch", "create", "delete"]}}]'
In addition to that, you have to modify Katib
Controller args
with the new flag --trial-resources
.
Run the following command to update Katib Controller args:
kubectl patch Deployment katib-controller -n kubeflow --type=json \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--trial-resources=Workflow.v1alpha1.argoproj.io"}]'
Check that Katib Controller's pod was restarted:
$ kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
katib-controller-784994d449-9bgj9 1/1 Running 0 28s
katib-db-manager-78697c7bd4-ck7l8 1/1 Running 0 6m13s
katib-mysql-854cdb87c4-krcm9 1/1 Running 0 6m13s
katib-ui-57b9d7f6dd-cv6gn 1/1 Running 0 6m13s
Check logs from Katib Controller to verify Argo Workflow integration:
$ kubectl logs $(kubectl get pods -n kubeflow -o name | grep katib-controller) -n kubeflow | grep '"CRD Kind":"Workflow"'
{"level":"info","ts":1628032648.6285546,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"argoproj.io","CRD Version":"v1alpha1","CRD Kind":"Workflow"}
If you ran the above steps successfully, you should be able to run Argo Workflow examples.
Learn more about using custom Kubernetes resource as a Trial template in the official Kubeflow guides.