Skip to content

Repo with tools and info for managing data ingestion from new ChEMBL data for LogD training

Notifications You must be signed in to change notification settings

pharmbio/cpsign-service-management

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CPSign-based service management (WIP)

workflow_overview

Initial setup of workflow

Configure kubernetes

In order to interact with your underlying k8s cluster, for instance connecting to a pod, you need to provide your kube config file. Copy your kubernetes config file to this location within the pod: /root/.kube/config.

Protip: kubectl cp <src> <pod-name>:/root/.kube/config

Note: We are assuming that kubectl is installed locally on the machine you are using to access a k8s cluster.

Create manager pod

In order to communicate with Pachyderm and connect to a preferred database (e.g. the ChEMBL database), you need to work from a pod within the cluster. Thus first create a pod from the following yaml file: manager-pod.yaml with the command kubectl create -f manager-pod.yaml --namespace=labinf .

To connect to the pod, use kubectl exec -it <pod-name> bash .

Creating and populate database

By following this helm chart's instructions, it should be relatively quick to deploy a mysql server pod within the cluster. Depending on one's scenario, it may be necessary to enter custom flags' values different than the default ones available in the related values.yaml file. In this specific case, installing the helm chart with default values is more than enough since the underlying k8s cluster supports dynamic provisioning for persistent storages. Only exceptions might be to double-check resource.request and resource.limits for the mysql pod.

Managing secrets

In order to provide various credentials to the Pachyderm workers, it is possible to provide them as kubernetes secrets. This is done in the pipeline specification under the transform section, see here. Credentials common to most services can be shared, like the CPSign license or Modelingweb Keycloak credentials, and don't require creating new secrets.

Connecting workspace to pachd

To communicate with Pachyderm via its command-line tool pachctl, one needs to either set the environment variable ADDRESS equal to the IP of the pachd service (i.e. export ADDRESS=<pachd IP>), or to execute the pachctl's built-in port-forward command like so: pachctl port-forward --namespace=<desired namespace> &. Every time one connects to the pod this is required.

Set up pachyderm pipelines

The first time with a new workflow the Pachyderm pipelines need to be set up. Modify pipeline specifications as neccessary and create a repo called config, then create the pipelines from the specs using pachctl create-pipeline -f <spec file>. From now, the workflow should be up and running and await the initial input!

Running training in workflow

The following steps assumes that you are connected to the manager pod

Start

Modify the json file configuration.json as neccessary and add it to the pachyderm repo configvia the command: pachctl put-file config master -o -f configuration.json. The first time, the -o flag (overwrite) might cause an error since the file does not exist yet, so just omit it. Whenever this file is updated in the Pachyderm repo, this triggers the workflow from start to finish, and since each change is versioned a good way to keep things tidy is to just overwrite the config file every time.

(Advanced) modify pachyderm pipeline

If you wish to modify the pipeline stages, edit the specification json files under the pipeline-setup folder (which exists within the pod), then run: pachctl update-pipeline -f <spec file>.

About

Repo with tools and info for managing data ingestion from new ChEMBL data for LogD training

Resources

Stars

Watchers

Forks

Packages

No packages published