feat: upgrade kubernetes client from 11.0.0 -> 24.2.0, implement List+Watch in KubeWatcher #32
Conversation
#configuration = client.Configuration()
#api_batch_v1 = client.BatchV1Api(client.ApiClient(configuration))
#api_v1 = client.CoreV1Api(client.ApiClient(configuration))
api_batch_v1 = client.BatchV1Api()
api_v1 = client.CoreV1Api()
Overriding the configuration has changed between versions, but it is not required for how we are using the K8s API client.
# Resource version is used to keep track of stream progress (in case of resume)
# List all jobs in the watched namespace to get resource_version
namespaced_jobs: V1JobList = kubejob.api_batch_v1.list_namespaced_job(namespace=kubejob.get_namespace())
resource_version = namespaced_jobs.metadata.resource_version if namespaced_jobs.metadata.resource_version else resource_version

# Then, watch for new events using the most recent resource_version
# Resource version is used to keep track of stream progress (in case of resume/retry)
k8s_event_stream = w.stream(func=kubejob.api_batch_v1.list_namespaced_job,
                            namespace=kubejob.get_namespace(),
                            resource_version=resource_version,
                            timeout_seconds=timeout_seconds)
Attempt to implement List+Watch pattern, as described here:
kubernetes-client/python#843 (comment)
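A minimal sketch of that List+Watch control flow. Here `list_fn` and `stream_fn` are stand-ins for `kubejob.api_batch_v1.list_namespaced_job` and `kubernetes.watch.Watch().stream`; they are passed as parameters so the flow can be shown (and exercised) without a live cluster.

```python
def list_then_watch(list_fn, stream_fn, namespace, timeout_seconds=60):
    # 1) List: capture the current state and a resource_version checkpoint
    job_list = list_fn(namespace=namespace)
    resource_version = job_list.metadata.resource_version

    # 2) Watch: stream events starting from that checkpoint, so nothing
    #    that happened between the List and the Watch call is missed
    yield from stream_fn(func=list_fn,
                         namespace=namespace,
                         resource_version=resource_version,
                         timeout_seconds=timeout_seconds)
```

Starting the Watch from the List's `resource_version` is what closes the gap between the two calls; watching from an empty version would instead replay or drop events nondeterministically.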
Problem
Our Kubernetes Python client (11.0.0) is badly outdated
Intermittent HTTP 500 errors when connecting to the Kube API; once first encountered, this error loops endlessly
Approach
Upgrade the Kubernetes Python client from 11.0.0 to 24.2.0 and implement the List+Watch pattern in KubeWatcher
How to Test
This image has been deployed to job-manager-staging (jobmgr.platform.moleculemaker.org)
Run a test job in each application to verify behavior:
CLEAN
MOLLI
Error Handling
With no way to reliably reproduce the error, all we can do is wait for a few days and watch the logs to see if the error surfaces again 😔
Switch to the mmli1 cluster: kubectl config use-context mmli1
Tail the logs of job-manager-staging: kubectl logs -f deploy/job-manager-staging -n staging
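While waiting, the watch loop itself can be made defensive. This is a hypothetical sketch (not the code in this PR) of a resume/retry loop: when the server reports that the checkpoint has expired (HTTP 410 Gone in the real client, represented here by the stand-in `ExpiredError`), the loop falls back to a fresh List instead of retrying the same stale `resource_version` forever, which is one way an error can loop endlessly.

```python
class ExpiredError(Exception):
    """Stand-in for kubernetes.client.exceptions.ApiException with status 410."""

def watch_with_resume(list_fn, stream_fn, namespace, max_restarts=3):
    resource_version = None
    for _ in range(max_restarts):
        if resource_version is None:
            # (Re-)List to establish a valid checkpoint
            resource_version = list_fn(namespace=namespace).metadata.resource_version
        try:
            for event in stream_fn(namespace=namespace,
                                   resource_version=resource_version):
                # Advance the checkpoint so a retry resumes mid-stream
                resource_version = event["object"].metadata.resource_version
                yield event
            return  # stream ended cleanly
        except ExpiredError:
            # Checkpoint too old for the server's history: start over with a List
            resource_version = None
```

The `max_restarts` bound is an assumption added for illustration; a production watcher would more likely restart indefinitely with backoff.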