exclude image-field from notebook controller reconciliation for all containers present in annotation #72
For functional testing, assuming the ci tests run through alright, I will take the notebook controller image from this PR, |
…tatefulset if present in notebook, to exclude image-fields in compare, to set image-field to new value in reconcile
…ranch changes in tag v.1.6.1
3c5297c to 2497bf2
/ok-to-test |
regarding tests, absolutely, yes. Just give me a pointer where I need to go to, please. |
…per utility differences related to image-field reconciliation and affected containers exclude
…ents passed as key-value-pairs for logging and controller restarting by correctly using go-logr (msg, keysAndValues...)
It worked. Tested on an OCP 4.10.43 Cluster without internal Openshift Registry. Used Manifests and Operator Version 1.4.2. to
Modified a Notebook CR assembled by odh-dashboard to include the image change trigger annotation. Image-field-value of the container in the notebook spec can be anything, it does not matter, could even be empty. The generated StatefulSet has the same annotation now and the image-field-value is resolved correctly by the image policy admission plug-in from the imagestream name and tag in the annotation to imagestream.tag.from.name, as I do not have an internal openshift registry. And, most important, the StatefulSet container image field value stays that way :-) |
Now thinking of a good reconciliation unit test at https://github.com/opendatahub-io/kubeflow/blob/v1.6-branch/components/notebook-controller/controllers/notebook_controller_test.go, not with regard to notebook status, but comparing the notebook-derived StatefulSet image-field pre-reconcile vs. post-reconcile. @VaishnaviHire I don't really see any detailed testing logic for the notebook controller reconciliation utils. How critical is it to have such tests? I tested the built kubeflow-notebook-controller in conjunction with odh dashboard and the new Notebook CR assembly logic, and it is working well. That is, the original placeholder value of the container image field is not kept as the source of truth; instead, the updated container image field value is set as the new desired value, so that reconciliation does not overwrite the image value that the openshift image policy plug-in set based on the annotation.
@shalberd Was the original issue caused by the application of |
No, the original issue was not directly caused by Kubernetes applying the image.openshift.io/triggers annotation to the StatefulSet. With the help of the OpenShift ImagePolicy admission plug-in, the image.openshift.io/triggers annotation fills in the image-field value of the container correctly after the StatefulSet is applied. But then, when comparing the StatefulSet's new values with the original values during reconciliation, the notebook controller overwrites the image-field with the original value. This makes sense, though, because Kubernetes does not know the concept of image trigger admission controllers and imagestreams.
Background: discussion with Ben Parees, Senior Principal Software Engineer at RedHat.
The annotation also needs to be applied manually to the Notebook CR, because we do not build StatefulSet CRs; we assemble and submit Notebook CRs, which the controller reacts to and assembles a StatefulSet from. Now, here is the historical flow of events leading to this solution:
End-to-end test here: |
/hold Holding this PR until we test a scenario for long running notebooks, and how this change will affect Notebook restarts |
@VaishnaviHire @lucferbux @andrewballantyne @LaVLaS First of all, we assume imagestream spec.lookupPolicy.local: true, as it is in all our imagestreams.
To your question on what happens if the tag (of the image underlying the imagestream tag) moves, that is, if the SHA256 digest of an image tag changes: it depends on which tag.referencePolicy.type you use in the imagestream. You can set that per tag.
If it is set to "Source", then the container is created from the external tag url, so the new image behind the tag is always used. The pod parameter imagePullPolicy: Always is important in this case, otherwise the old image may be cached on the node by the container runtime.
If it is set to "Local", then the ImageStream tag points to the sha256 ID of the image, so the container is created from the same image even if the external tag moves. In contrast, using --scheduled=true or manually refreshing the ImageStream (e.g. via oc apply of the imagestream) will update the sha256 ID it refers to; this can be seen under the imagestream status section for all tags.
ImageStreams can periodically (every 15 mins) monitor whether the external tag changes if you set tag[x].importPolicy: {scheduled: true}. By default, imagestream.tag importPolicy is set to importPolicy: {}, i.e. non-scheduled. I think most of your imagestreams, except Elyra in the overlay, use tag.referencePolicy.type: Local. That is the current situation.
About triggering updates on imagestream changes (the new annotation way): when one of the core Kubernetes resources (e.g. StatefulSet) contains both a pod template and this annotation, OpenShift Container Platform attempts to update the object by using the image currently associated with the image stream tag that is referenced by the trigger, meaning imagestream.status.tag[x].items.image.
So, currently, unless you have tag.referencePolicy.type: Source, or you re-apply an imagestream yaml as a whole, or you set tag[x].importPolicy: {scheduled: true}, the status section of the imagestream and the tag[x].items.image sha256 digest should always stay the same when the image behind the -weekly build tag changes. That holds true for the new image change trigger annotation, too.
As a whole, it really depends on what you want to achieve. Currently, your -weekly images in the imagestream tags for CICD have referencePolicy: type: Local. Infos compiled from https://itnext.io/variations-on-imagestreams-in-openshift-4-f8ee5e8be633
Essentially, on any given openshift cluster, with the way you use the dynamic -weekly imagestream tags right now, your containers will stay stable and refer to the old digest from before the update of the -weekly external image. Let me know how your running notebooks behave once an image changes in -weekly (digest and last-update time change in quay.io). They should stay stable and not restart, provided everything else on the cluster is unchanged, including the imagestream yaml.
@shalberd I might be missing some steps, I tested this with
However, the image is not updated Podspec
@VaishnaviHire when you add the annotation manually to a Notebook CR instance, it is important to check what the name of that first container is. is the name of the notebook container, the one where you then checked the image-field in the created statefulset, really In any case, it has to be the same as the container name field value. fieldPath in the annotation is kind of a lookup pointing to what image-field to update (of which container, by container name). |
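To make the container-name lookup described above concrete, here is a minimal Go sketch that parses an image.openshift.io/triggers annotation value and extracts the container name each fieldPath points at. The helper name triggerContainerNames, the regular expression and the sample values (my-notebook, opendatahub) are assumptions for illustration only; this is not the code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// imageTrigger mirrors one entry of the JSON array stored in the
// image.openshift.io/triggers annotation.
type imageTrigger struct {
	From struct {
		Kind      string `json:"kind"`
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"from"`
	FieldPath string `json:"fieldPath"`
}

// containerNameRe pulls the container name out of a fieldPath such as
// spec.template.spec.containers[?(@.name=="my-notebook")].image
var containerNameRe = regexp.MustCompile(`containers\[\?\(@\.name==["']([^"']+)["']\)\]\.image$`)

// triggerContainerNames (hypothetical helper) returns the container names
// referenced by the annotation value, in the order they appear.
func triggerContainerNames(annotation string) ([]string, error) {
	var triggers []imageTrigger
	if err := json.Unmarshal([]byte(annotation), &triggers); err != nil {
		return nil, fmt.Errorf("parsing trigger annotation: %w", err)
	}
	names := make([]string, 0, len(triggers))
	for _, t := range triggers {
		m := containerNameRe.FindStringSubmatch(t.FieldPath)
		if m == nil {
			return nil, fmt.Errorf("unsupported fieldPath %q", t.FieldPath)
		}
		names = append(names, m[1])
	}
	return names, nil
}

func main() {
	// Sample annotation value; names and namespace are made up.
	annotation := `[{"from":{"kind":"ImageStreamTag","name":"my-notebook:latest","namespace":"opendatahub"},"fieldPath":"spec.template.spec.containers[?(@.name==\"my-notebook\")].image"}]`
	names, err := triggerContainerNames(annotation)
	fmt.Println(names, err) // prints: [my-notebook] <nil>
}
```

If a name returned here does not match any container name in the pod spec, the trigger has nothing to update, which is the mismatch to rule out first.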
Yes it matches with my container name. I also do not see the log message defined here. |
/test kf-notebook-controller-pr-image-mirror |
Also, what about the namespace in the annotation? Is it really the main namespace where e.g. odh dashboard, the notebook controller and the imagestreams live? In my case, I was using a namespace called opendatahub, but I know sometimes your default is e.g. odh. There are also wrongly formatted quotation marks surrounding opendatahub and before fieldPath in the snippet you pasted. Sorry I did not see that before in the snippet you pasted:
If you test this without odh-dashboard:pr-800, then it is also important that you take a notebook definition not managed by odh dashboard, I think. Just make a copy and name the Notebook CR slightly differently. But I do not think this point is that critical; you mentioned you are manually testing on your own Notebook CRs anyway. Try again with the quotation marks corrected.
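A quick way to rule out the quotation-mark problem is to check that the annotation value parses as a JSON array at all. The following Go sketch, with a hypothetical helper isValidTriggerJSON and made-up sample values, shows that typographic (curly) quotes make the value invalid JSON:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidTriggerJSON (hypothetical helper) reports whether the annotation
// value parses as a JSON array of objects, which is the shape the
// image.openshift.io/triggers annotation uses.
func isValidTriggerJSON(value string) bool {
	var entries []map[string]interface{}
	return json.Unmarshal([]byte(value), &entries) == nil
}

func main() {
	// Straight ASCII quotes throughout: valid JSON.
	straight := `[{"from":{"kind":"ImageStreamTag","name":"nb:latest","namespace":"opendatahub"},"fieldPath":"spec.template.spec.containers[?(@.name==\"nb\")].image"}]`
	// Curly quotes around the namespace and before fieldPath: not valid JSON.
	curly := `[{"from":{"kind":"ImageStreamTag","name":"nb:latest","namespace":“opendatahub”},“fieldPath":"spec.template.spec.containers[?(@.name==\"nb\")].image"}]`

	fmt.Println(isValidTriggerJSON(straight)) // prints: true
	fmt.Println(isValidTriggerJSON(curly))    // prints: false
}
```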
/retest |
/retest |
@harshad16 this change lets the notebook controller reconcile all elements of a notebook / statefulset pod spec when no image change trigger annotation is present. That is, old notebooks assembled the old way will still work as expected. For notebooks newly created the ODH Dashboard PR 800 way, the image change trigger annotation admission plug-in of openshift will handle setting the value of the container image field. When there is no internal openshift registry, the value will always be the sha256 digest location and path from the external registry, e.g. quay.io. When there is an internal openshift registry and tag referencePolicy is set to Local, the internal registry location will be used. When there is an internal openshift registry and tag referencePolicy is set to Source, the sha256 digest location and path from the external registry, e.g. quay.io, will be used. The changes in this PR make sure that the original placeholder value for the pod container image field from the Notebook CR does not come back to the StatefulSet, and therefore to the pod container image field, after the image change trigger admission plug-in has done its work resolving the image location. @VaishnaviHire thank you, too, for all the help and for having looked into this, especially with regard to long-running notebooks. |
fixes kubeflow/notebooks#98
superseded by PR against v1.7-branch in PR-133
Looking for feedback before thinking of merging. AFAIK, this should be pretty close to being mergeable, though.
It seems to build alright locally, except that it is unclear to me how to update the common/components, where reconcilehelper lies.
Got an error that is probably easy to resolve and dependent on the build chain:
That problem was resolved by referencing our version of the reconcilehelper utilities module under /components/common in the notebook controller go.mod:
Key idea:
Update the notebook controller logic and the reconcilehelper util logic to take into account the optional Openshift image policy admission plug-in image-change trigger annotation. The annotation lists one or more containers and their related imagestreams (as a json array) and leads to an update of the container image-field; the change keeps the notebook controller from overwriting that updated image-field value.
Discussed problem with @bparees , received input from @VaishnaviHire, member of Open Data Hub (a RedHat project for Openshift) team. Also discussed how to resolve the issue in Notebook reconciler with Ben.
Context: Notebook yaml contains the (optional) metadata annotation
referencing imagestream name and tag, imagestream namespace, and a fieldPath reference to the image-value to be replaced by the image policy admission plug-in for a given container name. This annotation can reference one or more containers, in the form of a json array. The image policy admission plug-in does not resolve the image name in the Notebook yaml itself; it resolves it in basic Kubernetes objects, here the StatefulSet derived from the Notebook yaml. The resolved image name is set shortly after the StatefulSet is applied to the server.
Or, referencing the openshift documentation:
When one of the core Kubernetes resources contains both a pod template and this annotation, OpenShift Container Platform attempts to update the object by using the image currently associated with the image stream tag that is referenced by trigger. The update is performed against the fieldPath specified.
To ensure that image-fields referenced via this annotation in the StatefulSet are not overwritten by the kubeflow notebook controller reconcile process, we set the image-field values before and after the change to be equal, so that the DeepEqual comparison finds no difference in those fields, and therefore triggers no update, even when the image-field value has changed.
Because the annotation is optional, checks are introduced in the code to ensure any custom logic is only done when the annotation is present.
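The idea can be sketched in Go roughly as follows. This is a simplified illustration and not the code from this PR: container and podSpec are stand-ins for the Kubernetes API types, and needsUpdate and alignTriggeredImages are hypothetical helper names. The set of triggered container names is assumed to have been derived from the annotation beforehand.

```go
package main

import (
	"fmt"
	"reflect"
)

// container and podSpec are simplified stand-ins for the Kubernetes
// corev1.Container and corev1.PodSpec types.
type container struct {
	Name  string
	Image string
}

type podSpec struct {
	Containers []container
}

// alignTriggeredImages copies the image of every container listed in
// triggered from the live spec into the desired spec, so that those
// image-fields no longer count as a difference.
func alignTriggeredImages(desired *podSpec, live podSpec, triggered map[string]bool) {
	if len(triggered) == 0 {
		return // annotation absent: compare every field as before
	}
	liveImages := make(map[string]string, len(live.Containers))
	for _, c := range live.Containers {
		liveImages[c.Name] = c.Image
	}
	for i := range desired.Containers {
		name := desired.Containers[i].Name
		if img, ok := liveImages[name]; ok && triggered[name] {
			desired.Containers[i].Image = img
		}
	}
}

// needsUpdate reports whether the live spec must be overwritten with the
// desired spec, ignoring image-fields managed by the image trigger.
func needsUpdate(desired, live podSpec, triggered map[string]bool) bool {
	// Work on a copy so the caller's desired spec is left untouched.
	d := podSpec{Containers: append([]container(nil), desired.Containers...)}
	alignTriggeredImages(&d, live, triggered)
	return !reflect.DeepEqual(d, live)
}

func main() {
	desired := podSpec{Containers: []container{{Name: "nb", Image: "placeholder"}}}
	live := podSpec{Containers: []container{{Name: "nb", Image: "registry.example/org/nb@sha256:abc"}}}

	fmt.Println(needsUpdate(desired, live, map[string]bool{"nb": true})) // prints: false
	fmt.Println(needsUpdate(desired, live, nil))                         // prints: true
}
```

With the annotation present, the resolved image in the live object is treated as the desired value, so the placeholder from the Notebook CR never triggers an update. Without the annotation, the comparison behaves as it did before.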
Background:
Making the image-field value independent of the internal openshift registry and supporting ImageContentSourcePolicy for Openshift environments with a third-party Docker repo, like VMWare Harbor. This is achieved by using the imagestream abstraction in openshift: referencing imagestream name and tag instead of image-digest values directly.
This is what happens to the image-field as a result of the image policy admission plug-in doing its work:
When there is no internal openshift registry, <image stream name>:<image stream tag> references in the image-field or annotation of a Kubernetes object (Deployment, StatefulSet, Pod etc.) always resolve to the external image reference from imagestream spec.tags[tagname].from.name, regardless of whether spec.tags[tagname].referencePolicy.type is set to Local or Source in the image stream. This is perfect for air-gapped and ImageContentSourcePolicy use, or if one just wants to pull directly from the external location.
When there is an internal openshift registry and spec.tags[tagname].referencePolicy.type of the imagestream is set to Source, <image stream name>:<image stream tag> references in the image-field or annotation of a Kubernetes object (Deployment, StatefulSet, Pod etc.) resolve to the remote repo location.
When there is an internal openshift registry and spec.tags[tagname].referencePolicy.type of the imagestream is set to Local, <image stream name>:<image stream tag> references in the image-field or annotation of a Kubernetes object (Deployment, StatefulSet, Pod etc.) resolve to the namespace-specific location of the image in the openshift-internal registry.
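The three cases can be summarized as a small decision function. The Go sketch below only models the behaviour described in this PR description; resolvedLocation and the two constants are hypothetical names and do not correspond to any OpenShift API.

```go
package main

import "fmt"

// imageLocation names where a resolved image reference points to.
type imageLocation string

const (
	externalRegistry imageLocation = "external registry (spec.tags[tagname].from.name)"
	internalRegistry imageLocation = "openshift-internal registry (namespace-specific)"
)

// resolvedLocation (hypothetical) models the cases described above:
// only an internal registry combined with referencePolicy.type Local
// resolves to the internal registry; everything else resolves to the
// external image reference.
func resolvedLocation(hasInternalRegistry bool, referencePolicyType string) imageLocation {
	if hasInternalRegistry && referencePolicyType == "Local" {
		return internalRegistry
	}
	return externalRegistry
}

func main() {
	fmt.Println(resolvedLocation(false, "Local"))  // external registry
	fmt.Println(resolvedLocation(false, "Source")) // external registry
	fmt.Println(resolvedLocation(true, "Source"))  // external registry
	fmt.Println(resolvedLocation(true, "Local"))   // internal registry
}
```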