wait_until_pods_running should be able to rule out error pods that are not related to the final state of the owners #1611
Comments
Please clarify what the specific requirements are for undoubtedly ruling a collection of pods as "running". If the answer is "the requirements depend on what we're waiting to be running", then this function should be removed from
Alternatively, we can delete those failed pods after the retry has started; we don't need to make the change here.
We are going to reimplement this function in Go and won't make any incremental changes to it. /remove-kind good-first-issue
This issue is stale because it has been open for 90 days with no activity.
/reopen
Seems to me like this entire function could just be:
This doesn't properly check jobs, but what we have today is fairly hit or miss.
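For reference, a minimal version of that idea (a sketch only, assuming kubectl wait is acceptable for the test setup; this is not the snippet from the comment above) might be:

```bash
# Wait for every pod in the namespace to become Ready. Jobs would still need
# their own check, e.g. kubectl wait --for=condition=Complete job --all.
kubectl wait pod --all --namespace "${NAMESPACE}" --for=condition=Ready --timeout=10m
```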
In test-infra/scripts/library.sh (line 133 at 97901db), wait_until_pods_running will only succeed if all pods in the given namespace are in Running or Completed state.

But since k8s has some retry logic (e.g. a K8s Job can create a new pod if there is an error), one error pod does not necessarily mean the Job fails - https://prow.knative.dev/view/gcs/knative-prow/pr-logs/pull/knative_serving/6440/pull-knative-serving-integration-tests/1214219978266382337 is an example. In such a scenario wait_until_pods_running will return an error that is not necessarily true.

This function should be general enough to consider and rule out error pods that are not related to the final state of their owners, e.g.
...
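For illustration, a rough sketch of what such an owner-aware check could look like in bash (the function name, and the use of jq, are assumptions for this sketch, not the actual library.sh code):

```bash
# Sketch only: an owner-aware variant of the check. Assumes kubectl and jq
# are available; the function name and structure are illustrative.
function wait_until_pods_running_owner_aware() {
  local ns="$1"

  # Pods that are neither Running nor Succeeded (shown as Completed by kubectl).
  local bad_pods
  bad_pods="$(kubectl get pods -n "${ns}" -o json \
    | jq -r '.items[]
        | select(.status.phase != "Running" and .status.phase != "Succeeded")
        | .metadata.name')"

  local pod
  for pod in ${bad_pods}; do
    # Look up the pod's owning Job, if it has one.
    local owner_job
    owner_job="$(kubectl get pod "${pod}" -n "${ns}" -o json \
      | jq -r '.metadata.ownerReferences[]? | select(.kind == "Job") | .name')"
    if [[ -n "${owner_job}" ]]; then
      # If the owning Job already succeeded, this failed pod was a retried
      # attempt and should not count against the namespace.
      local succeeded
      succeeded="$(kubectl get job "${owner_job}" -n "${ns}" -o jsonpath='{.status.succeeded}' 2>/dev/null)"
      if [[ "${succeeded:-0}" -ge 1 ]]; then
        continue
      fi
    fi
    echo "ERROR: pod ${pod} in namespace ${ns} is not Running or Completed"
    return 1
  done
  return 0
}
```

The idea is that a Failed pod owned by a Job that eventually succeeded was just a retried attempt, so it should not cause the wait to report an error.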
FYI @mattmoor