Current check for Pipeline/Monovertex health assumes Conditions will remain healthy if they're initially healthy #445

Open
juliev0 opened this issue Dec 7, 2024 · 3 comments · May be fixed by #499

Comments

Collaborator

juliev0 commented Dec 7, 2024

Describe the bug
The current health check for a Pipeline/Monovertex assumes that it is healthy once all Conditions are "true" and the Pipeline/Monovertex is "Running". However, it can be healthy momentarily and then become unhealthy after processing some data.

Therefore, we should check that the Pipeline/Monovertex becomes healthy by some time x and then stays healthy for a further period y (that is, until time x + y).
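A minimal sketch of the two-phase check described above, written in Python purely for illustration. It is not taken from the project: the function name `becomes_and_stays_healthy`, the `is_healthy` callback, and the `poll_interval`, `now`, and `sleep` parameters are all hypothetical. Only x and y come from the description.

```python
import time
from typing import Callable


def becomes_and_stays_healthy(
    is_healthy: Callable[[], bool],  # hypothetical probe: all Conditions true and phase "Running"
    x: float,                        # seconds allowed to reach the healthy state
    y: float,                        # seconds the healthy state must then persist
    poll_interval: float = 1.0,      # assumed polling period
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if is_healthy() turns True within x seconds and
    then stays True on every poll for a further y seconds."""
    start = now()
    # Phase 1: wait up to x seconds for the first healthy observation.
    while not is_healthy():
        if now() - start >= x:
            return False
        sleep(poll_interval)
    # Phase 2: keep polling for y seconds; any unhealthy reading fails the check.
    healthy_since = now()
    while now() - healthy_since < y:
        sleep(poll_interval)
        if not is_healthy():
            return False
    return True
```

The clock and sleep functions are injectable so the check can be exercised in tests without real waiting. A resource that flips to healthy and then back to unhealthy within the y window is reported as unhealthy, which is the case the current check misses.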

(Note: currently, the only failed Pipelines/Monovertices that have been tested had invalid specs, so they were marked "Failed". We haven't yet tested any with failing Conditions. We need to test as many types of failed child health conditions as we can.)


Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

juliev0 added the bug (Something isn't working) label Dec 7, 2024
juliev0 changed the title from "Current check for Pipeline/Monovertex health assumes Conditions have been fully populated" to "Current check for Pipeline/Monovertex health assumes Conditions will remain healthy if they're initially healthy" Dec 13, 2024
Collaborator Author

juliev0 commented Dec 13, 2024

I have spoken to @kohlisid and gotten clarification on this. Because Numaflow initializes Conditions at the beginning of reconciliation, we can be confident that the Conditions have been fully populated, which was my original concern.

But there is still a separate concern, which I have captured in the revised description/title for this Issue.

Collaborator Author

juliev0 commented Dec 13, 2024

moving to future sprint

Collaborator Author

juliev0 commented Jan 4, 2025

Note: we probably need to do the same for isbsvc as well. The isbsvc is a little different in that its Health Assessment combines the health of the isbsvc itself with the health assessment results of the pipelines using it.
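As a rough illustration of that composition, the sketch below assumes the combined assessment is healthy only when the isbsvc itself and every pipeline using it are healthy. That "all must be healthy" rule, the function name `isbsvc_assessment_healthy`, and its parameters are assumptions made for this example, not confirmed behavior.

```python
from typing import Iterable


def isbsvc_assessment_healthy(
    isbsvc_healthy: bool,              # health of the isbsvc itself
    pipeline_results: Iterable[bool],  # assessment results of the pipelines using it
) -> bool:
    # Assumed rule: healthy only if the isbsvc and all of its pipelines are healthy.
    # With no pipelines, the result depends on the isbsvc alone.
    return isbsvc_healthy and all(pipeline_results)
```

Under this assumption, a single pipeline that turns unhealthy after initially passing would flip the isbsvc assessment too, so the same "stays healthy for y" window would need to cover the combined result.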
