This repository has been archived by the owner on Jul 10, 2024. It is now read-only.

[DESIGN] Add a CRD to install and control jupyter image in k8s #895

Open
cdmikechen opened this issue Mar 5, 2022 · 0 comments

At present, the Jupyter image is very large, so when users deploy the Jupyter service on a new k8s cluster or node, they face a long wait while the image is pulled.
This issue discusses the design of a CRD-based operator that integrates with the existing Submarine services and makes image availability controllable and predictable. Based on a new CRD, the operator can automatically pre-pull images on every suitable node before the Jupyter service is deployed, so that each of those nodes already has the corresponding image.

To do this, we need a CRD that contains the list of images to pull, the refresh interval, and the pull secret for each image (if necessary). An example of the custom resource:

apiVersion: org.apache.submarine/v1
kind: JupyterImagePuller
metadata:
  name: example-image-puller
  namespace: submarine
spec:
  images: # the list of images to pre-pull
    - name: jupyter # environment name
      image: apache/submarine:jupyter-notebook-0.7.0 # image name
    - name: jupyter-gpu
      image: xxx.harbor.com/5000/apache/submarine:jupyter-notebook-gpu-0.7.0
      auth: # docker registry authentication
        username: xxxx
        password: xxxx
        email: xxxx # Optional
    - name: jupyter 
      image: apache/submarine:jupyter-notebook-0.7.0-chinese
      auth: 
        secret: xxxx # If there is already a specified secret, we can fill in the secret name 
  nodeSelector: {} # node selector applied to pods created by the daemonset
  refreshHours: '2' # number of hours between health checks
status:
  images:
    - name: apache/submarine:jupyter-notebook-0.7.0
      state: success/failure/pulling
      message: Reasons for pull failure ...
      digest: sha256:f04468d5ec5bdcda7a6ebdd65b20a7b363f348f1caef915df4a6cc8d1eb09029
      nodes:
        - worker1.xxxx.com

Every time Submarine updates its environments, it also updates the image list in the custom resource. When the operator sees the resource being added or modified, it reads the spec and creates a DaemonSet in the specified namespace (honoring nodeSelector). The DaemonSet contains N containers, one per entry in the image list, so that scheduling its pods forces every selected node to pull every image.
Each container overrides the image's entrypoint with a command that only prints a message such as "Pulling complete" and exits, so the task itself is lightweight.

spec:
  initContainers:
    - name: image-pull-{image-name}
      command:
        - /bin/sh
        - -c
        - echo "Pulling complete"
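
As a rough illustration of this step, the following Python sketch turns a JupyterImagePuller resource (as a plain dict) into a DaemonSet manifest. It is only a sketch under several assumptions that are not part of the proposal: the function names are hypothetical, a `registry.k8s.io/pause:3.9` container is added because a pod needs at least one regular container after its init containers finish, and an index is appended to each container name because container names must be unique within a pod while the example resource lists the environment name `jupyter` twice. Only entries with `auth.secret` are wired up here; inline username/password credentials would first have to be turned into a Secret, which this sketch does not do.

```python
import re


def container_name(env_name: str, index: int) -> str:
    """Build a unique, DNS-1123 compatible init container name."""
    slug = re.sub(r"[^a-z0-9]+", "-", env_name.lower()).strip("-")
    suffix = f"-{index}"
    # Container names are limited to 63 characters.
    return ("image-pull-" + slug)[: 63 - len(suffix)].rstrip("-") + suffix


def build_puller_daemonset(puller: dict) -> dict:
    """Create a DaemonSet manifest with one init container per image.

    Hypothetical helper: the proposal does not define this function.
    """
    meta = puller["metadata"]
    spec = puller["spec"]
    init_containers = []
    pull_secrets = []
    for index, entry in enumerate(spec["images"]):
        init_containers.append({
            "name": container_name(entry["name"], index),
            "image": entry["image"],
            # Replace the entrypoint so the container exits right after the pull.
            "command": ["/bin/sh", "-c", 'echo "Pulling complete"'],
        })
        secret = entry.get("auth", {}).get("secret")
        if secret and {"name": secret} not in pull_secrets:
            pull_secrets.append({"name": secret})

    labels = {"app": meta["name"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "nodeSelector": spec.get("nodeSelector", {}),
                    "initContainers": init_containers,
                    # Assumption: a tiny pause container keeps the pod running.
                    "containers": [
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ],
                    "imagePullSecrets": pull_secrets,
                },
            },
        },
    }
```

Using init containers means the images are pulled one after another on each node; whether sequential pulls are acceptable for large images is a design choice this issue leaves open.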

Docker image registry authorization

Docker registry credentials should be part of the environment definition. Private clouds and private image registries (such as Harbor) often require authentication before an image can be pulled, so the environment needs a way to supply it.
We can let users either enter a username and password directly or reference credentials already stored in a k8s Secret.

{
  "name": "notebook-gpu-env",
  "dockerImage": "apache/submarine:jupyter-notebook-gpu-0.7.0",
  "dockerAuthSpec": {
    "type": "password",
    "username": "xxxx",
    "password": "xxxx",
    "email": null
  }
} 
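
For the username/password case, one possible approach is for the operator to convert the environment's `dockerAuthSpec` into a Secret of type `kubernetes.io/dockerconfigjson`, which pods can then reference through `imagePullSecrets`. The sketch below assumes this approach; the function names and the `<environment name>-pull-secret` naming scheme are hypothetical, and the registry host is guessed from the image reference using the usual Docker convention (a first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry; anything else falls back to Docker Hub).

```python
import base64
import json

DOCKER_HUB = "https://index.docker.io/v1/"


def registry_of(image: str) -> str:
    """Guess the registry host from an image reference."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return DOCKER_HUB


def build_pull_secret(env: dict, namespace: str) -> dict:
    """Build a dockerconfigjson Secret manifest from an environment spec.

    Hypothetical helper: only the "password" auth type is handled.
    """
    auth = env["dockerAuthSpec"]
    if auth["type"] != "password":
        raise ValueError(f"unsupported auth type: {auth['type']}")

    token = base64.b64encode(
        f"{auth['username']}:{auth['password']}".encode()
    ).decode()
    entry = {
        "username": auth["username"],
        "password": auth["password"],
        "auth": token,
    }
    if auth.get("email"):  # email is optional
        entry["email"] = auth["email"]

    config = {"auths": {registry_of(env["dockerImage"]): entry}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            # Assumption: the secret name is derived from the environment name.
            "name": f"{env['name']}-pull-secret",
            "namespace": namespace,
        },
        "type": "kubernetes.io/dockerconfigjson",
        "data": {
            ".dockerconfigjson": base64.b64encode(
                json.dumps(config).encode()
            ).decode()
        },
    }
```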

Image nodeSelector

At present, this section mainly concerns GPU images. In some clusters, GPU resources exist only on dedicated nodes. Therefore, we need to add a nodeSelector to the pods the operator deploys, and the environment needs a matching nodeSelector as well. In this way, the GPU image pod is scheduled onto the correct nodes.
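
To make the matching rule concrete: a nodeSelector selects a node only if every key/value pair in the selector appears among the node's labels, and an empty selector selects all nodes. The sketch below mimics that rule on plain dicts, for example to predict which nodes should appear in the resource's status. The function names and the `accelerator: nvidia-gpu` label are hypothetical; in a real cluster the scheduler performs this matching.

```python
def node_matches(node_labels: dict, node_selector: dict) -> bool:
    """Return True if the node carries every label required by the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: dict, node_selector: dict) -> list:
    """Return the names of nodes (name -> labels) that satisfy the selector."""
    return [name for name, labels in nodes.items() if node_matches(labels, node_selector)]
```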

Docker image version update strategy

Build a ConfigMap that records each image together with its tag and digest.
When the refresh interval (refreshHours) elapses, the recorded digest of each image is compared with the latest digest for its tag to determine whether the image has been updated. If it has, a new pull operation is triggered automatically.

kind: ConfigMap
apiVersion: v1
metadata:
  name: submarine-image-pull-operator-records
  namespace: submarine
data:
  images.list: |-
    apache/submarine:jupyter-notebook-0.7.0@sha256:f04468d5ec5bdcda7a6ebdd65b20a7b363f348f1caef915df4a6cc8d1eb09029
    xxx.harbor.com/5000/apache/submarine:jupyter-notebook-gpu-0.7.0@sha256:1ccc0a0ca577e5fb5a0bdf2150a1a9f842f47c8865e861fa0062c5d343eb8cac
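
The comparison step could look roughly like the Python sketch below. It parses the `images.list` value (one `image:tag@digest` entry per line) and reports which images need a new pull. The function names are hypothetical, and `latest` is assumed to be a mapping from image reference to the digest currently reported by the registry; how the operator obtains that mapping is not covered here.

```python
def parse_records(images_list: str) -> dict:
    """Parse the ConfigMap's images.list value into {image: digest}."""
    records = {}
    for line in images_list.splitlines():
        line = line.strip()
        if not line:
            continue
        image, sep, digest = line.rpartition("@")
        if not sep:
            raise ValueError(f"missing digest in record: {line!r}")
        records[image] = digest
    return records


def images_to_refresh(records: dict, latest: dict) -> list:
    """Return images whose latest digest differs from the recorded one.

    Images with no record yet are also returned, since they were never pulled.
    """
    return sorted(
        image for image, digest in latest.items() if records.get(image) != digest
    )
```

After a successful pull, the operator would write the new digest back to the ConfigMap so the next refresh compares against the updated record.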

Note that each tenant may have its own jupyter and jupyter-gpu images. Since pulling an image that is already present adds little extra load, we allow different tenants to list the same image.


Some parts of the design are still open and will be described later.

  • TODO 1: How to initialize or replace the Docker image name and credentials when deploying a new Submarine service.
@cdmikechen cdmikechen changed the title Add a CRD to install and control jupyter image in k8s [DESIGN] Add a CRD to install and control jupyter image in k8s May 15, 2022