To interact with your underlying k8s cluster (for instance, to connect to a pod), you need to provide your kube config file. Copy your Kubernetes config file to this location within the pod: /root/.kube/config.
Protip: kubectl cp <src> <pod-name>:/root/.kube/config
Note: We are assuming that kubectl is installed locally on the machine you are using to access a k8s cluster.
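The copy step can be sketched as a small shell helper. The function name copy_kubeconfig and the default source path ~/.kube/config are assumptions made for illustration. The mkdir step is a precaution in case /root/.kube does not yet exist in the pod.

```shell
# Hypothetical helper (not part of this repository): copy a local kube
# config file into the pod.
# Usage: copy_kubeconfig <pod-name> [<src>]
copy_kubeconfig() {
    local pod="$1"
    local src="${2:-$HOME/.kube/config}"   # assumed default location of the local config
    if [ -z "$pod" ]; then
        echo "usage: copy_kubeconfig <pod-name> [<src>]" >&2
        return 1
    fi
    if [ ! -f "$src" ]; then
        echo "kube config not found: $src" >&2
        return 1
    fi
    # Make sure the target directory exists; harmless if it already does.
    kubectl exec "$pod" -- mkdir -p /root/.kube || return 1
    kubectl cp "$src" "$pod:/root/.kube/config"
}
```

After defining the function in your local shell, call it as copy_kubeconfig <pod-name>. If the pod lives in a namespace other than your context's default, both kubectl calls would also need a --namespace flag.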
To communicate with Pachyderm and connect to a preferred database (e.g. the ChEMBL database), you need to work from a pod within the cluster. First, create a pod from the yaml file manager-pod.yaml with the command: kubectl create -f manager-pod.yaml --namespace=labinf.
To connect to the pod, use: kubectl exec -it <pod-name> -- bash (add --namespace=labinf if that is not your context's default namespace).
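A minimal sketch of the create-then-connect sequence is shown here. The function name start_manager_pod is hypothetical, the pod name must match the one defined in manager-pod.yaml, and the 120-second timeout is an arbitrary choice. It assumes a kubectl version that provides the wait subcommand.

```shell
# Hypothetical helper: create the manager pod and wait until it is ready.
# Usage: start_manager_pod <pod-name> [<namespace>]
start_manager_pod() {
    local pod="$1"
    local ns="${2:-labinf}"   # namespace used in the instructions above
    if [ -z "$pod" ]; then
        echo "usage: start_manager_pod <pod-name> [<namespace>]" >&2
        return 1
    fi
    kubectl create -f manager-pod.yaml --namespace="$ns" || return 1
    # Block until the pod reports Ready; the timeout value is an assumption.
    kubectl wait --for=condition=Ready "pod/$pod" --namespace="$ns" --timeout=120s
}
```

Once the function returns successfully, open a shell in the pod with kubectl exec -it <pod-name> --namespace=labinf -- bash.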
By following this helm chart's instructions, deploying a MySQL server pod within the cluster should be relatively quick. Depending on your scenario, you may need to override some of the default values in the chart's values.yaml file. In this specific case, installing the helm chart with default values is sufficient, since the underlying k8s cluster supports dynamic provisioning of persistent storage. The only exception might be to double-check resources.requests and resources.limits for the mysql pod.
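One way to double-check the requests and limits after installation is to read them back from the running pod. The sketch below assumes you know the mysql pod's name and namespace. The function name show_pod_resources is hypothetical.

```shell
# Hypothetical helper: print the resource requests/limits of every
# container in a pod, one container per line.
# Usage: show_pod_resources <pod-name> [<namespace>]
show_pod_resources() {
    local pod="$1"
    local ns="${2:-default}"   # assumed namespace; pass the real one explicitly
    if [ -z "$pod" ]; then
        echo "usage: show_pod_resources <pod-name> [<namespace>]" >&2
        return 1
    fi
    kubectl get pod "$pod" --namespace="$ns" \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

The exact formatting of the printed resources block depends on the kubectl version.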
Credentials can be provided to the Pachyderm workers as Kubernetes secrets. This is done in the pipeline specification under the transform section, see here. Credentials common to most services, like the CPSign license or the Modelingweb Keycloak credentials, can be shared and don't require creating new secrets.
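For a credential that is not already shared, the secret has to exist in the cluster before a pipeline specification can refer to it by name. A minimal sketch follows. The function name create_service_secret, the key names username and password, and the default namespace labinf are all assumptions for illustration.

```shell
# Hypothetical helper: create a generic Kubernetes secret holding a
# username/password pair for one service.
# Usage: create_service_secret <secret-name> <user> <password> [<namespace>]
create_service_secret() {
    local name="$1"
    local user="$2"
    local pass="$3"
    local ns="${4:-labinf}"   # assumed namespace of the Pachyderm workers
    if [ -z "$name" ] || [ -z "$user" ] || [ -z "$pass" ]; then
        echo "usage: create_service_secret <secret-name> <user> <password> [<namespace>]" >&2
        return 1
    fi
    # The key names "username" and "password" are illustrative only.
    kubectl create secret generic "$name" --namespace="$ns" \
        --from-literal=username="$user" \
        --from-literal=password="$pass"
}
```

Passing a password on the command line leaves it in the shell history, so for real credentials reading the values from a file may be preferable.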
To communicate with Pachyderm via its command-line tool pachctl, you need to either set the environment variable ADDRESS to the IP of the pachd service (i.e. export ADDRESS=<pachd IP>) or run pachctl's built-in port-forward command like so: pachctl port-forward --namespace=<desired namespace> &. This is required every time you connect to the pod.
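Since this step is repeated on every connection, the two options can be wrapped in one helper. This is only a sketch: the function name connect_pachctl and its address/forward modes are hypothetical.

```shell
# Hypothetical helper: make pachd reachable for pachctl in this shell.
# Usage: connect_pachctl address <pachd IP>
#        connect_pachctl forward <namespace>
connect_pachctl() {
    if [ -z "$2" ]; then
        echo "usage: connect_pachctl address <pachd IP> | forward <namespace>" >&2
        return 1
    fi
    case "$1" in
        address)
            # Point pachctl directly at the pachd service.
            export ADDRESS="$2"
            ;;
        forward)
            # Run the port-forward in the background of the current shell.
            pachctl port-forward --namespace="$2" &
            ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}
```

The function has to be defined in the interactive shell itself (not run as a separate script), otherwise the exported ADDRESS would not be visible to later pachctl calls.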
The first time you run a new workflow, the Pachyderm pipelines need to be set up. Modify the pipeline specifications as necessary and create a repo called config, then create the pipelines from the specs using pachctl create-pipeline -f <spec file>. From then on, the workflow should be up and running, awaiting the initial input!
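The first-time setup can be sketched as a single function. It assumes the pachctl version used here also provides the hyphenated create-repo command, that the spec files live in one directory (pipeline-setup by default), and that their file names sort in dependency order so that upstream pipelines are created first. The function name setup_pipelines is hypothetical.

```shell
# Hypothetical helper: create the config repo and all pipelines.
# Usage: setup_pipelines [<spec-dir>]
setup_pipelines() {
    local spec_dir="${1:-pipeline-setup}"
    local spec
    # The config repo must exist before pipelines can use it as input.
    pachctl create-repo config || return 1
    # Assumption: alphabetical order matches the pipeline dependency order.
    for spec in "$spec_dir"/*.json; do
        [ -f "$spec" ] || continue
        pachctl create-pipeline -f "$spec" || return 1
    done
}
```

If the alphabetical order does not match the dependencies between stages, create the pipelines one by one in the correct order instead.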
The following steps assume that you are connected to the manager pod.
Modify the JSON file configuration.json as necessary and add it to the Pachyderm repo config via the command: pachctl put-file config master -o -f configuration.json. The first time, the -o flag (overwrite) might cause an error since the file does not exist yet, so just omit it. Whenever this file is updated in the Pachyderm repo, the workflow is triggered from start to finish. Since each change is versioned, a good way to keep things tidy is to simply overwrite the config file every time.
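The "overwrite, but omit -o the first time" rule can be sketched as a fallback: try with -o, and retry without it only if that fails. The function name push_config is hypothetical, and the sketch assumes it is run from the directory that contains configuration.json.

```shell
# Hypothetical helper: upload configuration.json to the config repo,
# overwriting the previous version if there is one.
# Usage: push_config   (run from the directory containing configuration.json)
push_config() {
    if [ ! -f configuration.json ]; then
        echo "configuration.json not found in $(pwd)" >&2
        return 1
    fi
    if ! pachctl put-file config master -o -f configuration.json; then
        # Assumed cause: first upload, so there is nothing to overwrite yet.
        echo "overwrite failed, retrying without -o" >&2
        pachctl put-file config master -f configuration.json
    fi
}
```

The retry also fires when the first call fails for an unrelated reason (for example, pachd is unreachable). In that case the second call fails too and the function returns a non-zero status.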
If you wish to modify the pipeline stages, edit the specification JSON files under the pipeline-setup folder (which exists within the pod), then run: pachctl update-pipeline -f <spec file>.
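When several stages change at once, the update command can be applied to each edited spec in turn. A minimal sketch, with the hypothetical function name update_pipelines:

```shell
# Hypothetical helper: update one or more pipelines from their spec files.
# Usage: update_pipelines <spec file> [<spec file> ...]
update_pipelines() {
    local spec
    if [ "$#" -eq 0 ]; then
        echo "usage: update_pipelines <spec file> [<spec file> ...]" >&2
        return 1
    fi
    for spec in "$@"; do
        if [ ! -f "$spec" ]; then
            echo "spec file not found: $spec" >&2
            return 1
        fi
        # Stop at the first failure so later stages are not updated blindly.
        pachctl update-pipeline -f "$spec" || return 1
    done
}
```

Pass only the spec files you edited, with paths into the pipeline-setup folder.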