health_check: add stats counters to monitor health check behavior #37409
base: main
/assign adisuissa
316b739 to c3e09d4
Signed-off-by: Rohit Agrawal <[email protected]>
Overall LGTM, thanks!
Left a couple of high-level questions, but mostly minor comments.
Statistics
----------

The health check filter outputs statistics in the ``<stat_prefix>.health_check.`` namespace.
Please clarify which stat_prefix is used here (and, if it is optional, what the default value will be).
Another option to consider is adding stats support as optional (i.e., only if the stat_prefix is set). This adds the ability to avoid paying the stats costs if they are not needed.
Clarified it. Please let me know if you think we must also do this part: "adding stats support as optional (i.e., only if the stat_prefix is set). This adds the ability to avoid paying the stats costs if they are not needed."
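For reference, the optional-stats idea discussed here could look roughly like the following. This is a minimal sketch under stated assumptions: `Counter`, `HealthCheckStats`, `maybeCreateStats`, and `recordResult` are hypothetical stand-ins invented for illustration and are not Envoy's actual stats API or the code in this PR.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-in for a stats counter (not Envoy's Stats::Counter).
struct Counter {
  uint64_t value{0};
  void inc() { ++value; }
};

// Hypothetical stats holder; the counter names are illustrative only.
struct HealthCheckStats {
  std::string prefix; // e.g. "<stat_prefix>.health_check."
  Counter ok;
  Counter failed;
};

// Create the stats struct only when a non-empty prefix is configured.
std::unique_ptr<HealthCheckStats>
maybeCreateStats(const std::optional<std::string>& stat_prefix) {
  if (!stat_prefix.has_value() || stat_prefix->empty()) {
    return nullptr; // Stats disabled: nothing is allocated.
  }
  auto stats = std::make_unique<HealthCheckStats>();
  stats->prefix = *stat_prefix + ".health_check.";
  return stats;
}

// Call sites guard on the pointer, so a filter without stats pays only a null check.
void recordResult(HealthCheckStats* stats, bool success) {
  if (stats == nullptr) {
    return;
  }
  (success ? stats->ok : stats->failed).inc();
}
```

The trade-off in this sketch is an extra branch at every call site in exchange for skipping counter allocation when no prefix is set.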
@@ -173,9 +174,16 @@ void HealthCheckFilter::onComplete() {
  if (!Http::CodeUtility::is2xx(enumToInt(final_status))) {
    callbacks_->streamInfo().setResponseFlag(
        StreamInfo::CoreResponseFlag::FailedLocalHealthCheck);
    stats_->failed_.inc();
Thinking out loud: should the failed_/ok_ also be incremented for the cached response?
Either way, the doc above (about the new statistics) should reflect this.
I think it does make sense to have fail/ok include the cached response as well. I have clarified this in the doc. Let me know if you think otherwise.
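To illustrate the behavior agreed on here, the sketch below counts ok/failed on the final status whether it came from the cache or from a fresh evaluation. It is an assumption-laden illustration: `Counter`, `Stats`, `cached_response`, `is2xx`, and `completeHealthCheck` are hypothetical names, not the filter's real members or Envoy APIs.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a stats counter (not Envoy's Stats::Counter).
struct Counter {
  uint64_t value{0};
  void inc() { ++value; }
};

// Hypothetical counters; the names are illustrative only.
struct Stats {
  Counter ok;
  Counter failed;
  Counter cached_response;
};

// True for HTTP status codes in the 2xx range.
bool is2xx(int code) { return code >= 200 && code < 300; }

// Pick the final status (cached if present, otherwise fresh) and count it.
// ok/failed are incremented on both paths, so they include cached responses.
int completeHealthCheck(Stats& stats, std::optional<int> cached_code, int fresh_code) {
  int final_code = fresh_code;
  if (cached_code.has_value()) {
    final_code = *cached_code;
    stats.cached_response.inc();
  }
  if (is2xx(final_code)) {
    stats.ok.inc();
  } else {
    stats.failed.inc();
  }
  return final_code;
}
```

With this shape, ok plus failed equals the total number of completed health checks, and the cached counter is a subset of that total rather than a separate bucket.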
LGTM, thanks!
/assign-from @envoyproxy/senior-maintainers
@envoyproxy/senior-maintainers assignee is @RyanTheOptimist
Description
This PR adds stats to the health check HTTP filter. The new stats provide visibility into health check behavior, including request counts, successful and failed checks, cached responses, and cluster health status, which helps operators monitor the health checking system and diagnose issues.
Here is a list of key stats added:
Commit Message: health_check: add stats counters to monitor health check behavior
Additional Description: This change improves observability of the health check filter by exposing key metrics about health check processing and cluster health state. The stats are scoped under the connection manager and follow standard Envoy stats naming conventions.
Risk Level: Low
Testing: Added unit and integration tests verifying all stats counters
Docs Changes: Added
Release Notes: Added