Describe the bug
The current check for Pipeline/Monovertex health assumes that the resource is healthy once all of its Conditions are "true" and the Pipeline/Monovertex is "Running". However, a resource can be momentarily healthy and then become unhealthy after processing some data.
Therefore, we should check that the Pipeline/Monovertex enters the healthy condition by some time x and then remains healthy until some time x + y.
(Note: the only failed Pipelines/Monovertices tested so far had invalid specs, so they were marked "Failed". We have not yet tested any with failing Conditions. We need to test as many types of failed child health conditions as we can.)
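The two-phase check proposed above could look roughly like the following. This is a minimal sketch, not the project's implementation: the function names, the result strings, the `poll` interval, and the injectable `clock`/`sleep` parameters are all hypothetical, and it reads "remains healthy until x + y" as a deadline measured from the start of the assessment.

```python
import time
from typing import Callable, Dict


def currently_healthy(phase: str, conditions: Dict[str, str]) -> bool:
    # The existing point-in-time check: "Running" plus every Condition true.
    # (Assumes Conditions are already populated; an empty dict passes.)
    return phase == "Running" and all(s == "True" for s in conditions.values())


def assess_health(
    is_healthy: Callable[[], bool],
    x: float,
    y: float,
    poll: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical two-phase assessment.

    Returns "healthy" only if the resource becomes healthy within x seconds
    and is still healthy at every poll until x + y seconds after the start.
    """
    start = clock()

    # Phase 1: the resource must enter the healthy state by time x.
    while not is_healthy():
        if clock() - start >= x:
            return "never_healthy"
        sleep(poll)

    # Phase 2: the resource must stay healthy until time x + y.
    deadline = start + x + y
    while clock() < deadline:
        if not is_healthy():
            return "regressed"
        sleep(poll)

    return "healthy"
```

In a real controller, `is_healthy` would wrap a fresh read of the resource status (for example, a closure that calls `currently_healthy` on the latest phase and Conditions). The clock and sleep functions are injectable only so the logic can be tested without waiting in real time.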
Message from the maintainers:
Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.
juliev0 changed the title from "Current check for Pipeline/Monovertex health assumes Conditions have been fully populated" to "Current check for Pipeline/Monovertex health assumes Conditions will remain healthy if they're initially healthy" on Dec 13, 2024
I have spoken to @kohlisid and gotten clarification on this. Because Numaflow initializes Conditions at the beginning of reconciliation, we can be confident that the Conditions have been fully populated, which was my original concern.
There is still a separate concern, though, which I have captured in the revised description and title of this Issue.
Note: we probably need to do the same for isbsvc as well. The isbsvc is a little different in that its Health Assessment combines the health of the isbsvc itself with the health assessment results of the pipelines using it.
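As a rough illustration of that combination (a sketch only; the function name and the result strings are hypothetical, and the real assessment may weigh these inputs differently), the isbsvc would count as healthy only when it is healthy itself and every pipeline using it was also assessed as healthy:

```python
from typing import List


def assess_isbsvc(isbsvc_healthy: bool, pipeline_results: List[str]) -> str:
    # Hypothetical aggregation: the isbsvc's own health AND the assessment
    # result of every pipeline that uses it must be "healthy".
    if isbsvc_healthy and all(r == "healthy" for r in pipeline_results):
        return "healthy"
    return "unhealthy"
```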