tempo-query: add ReadinessProbe #1061
a0bca33 to 29ddb12
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Is this an enhancement or a bug_fix?
From my point of view it's a pure enhancement. Everything works as before; we just additionally support the Kubernetes readiness check.
component: tempostack

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add ReadinessProbe to tempo-query
Why is this being added? What are the consequences of this patch?
Instead of showing that the pod is ready and handling the failure internally, the k8s API will show that it's not ready. K8s will make regular calls to the gRPC endpoint to verify that it's alive.
Yes, and we should document #1061 (comment) in the changelog: the impact of adding a readiness probe to our deployment.
I would also say that this is more of a bug_fix than an enhancement.
e.g. improves reliability at startup, avoiding lost data
cc @pavolloffay
Yes, but does it fix any particular issues in our deployments?
Yes, during my multi-tenancy debug session I noticed that we sometimes lost some measurements. The test checked for 10 reports but we only had 7-8. Adding a delay solved it, but having the readiness probe made it more resilient. While typing, I think it would actually be a good idea to add this kind of check to Jaeger-Query too. There is an extra health endpoint; server response: {"status":"Server available","upSince":"2024-10-15T08:44:13.514559835Z","uptime":"12.13965331s"}
29ddb12 to 5d6ecd6
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1061      +/-   ##
==========================================
+ Coverage   69.14%   69.18%   +0.04%
==========================================
  Files         110      110
  Lines        7059     7069      +10
==========================================
+ Hits         4881     4891      +10
  Misses       1888     1888
  Partials      290      290
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Without a readiness check in place, there is a risk that data will be lost when the queryfrontend pod is ready but the tempo query API is not yet available.
How can data be lost in the query frontend? It does not ingest any data; it's used only for querying.
With Jaeger enabled, the queryfrontend pod contains 3 containers. tempo-query is one of them.
I understand that. The point I am making is that the phrase "data will be lost when the queryfrontend pod" implies to me that this fixes data ingestion, which is not the case.
Signed-off-by: Benedikt Bongartz <[email protected]>
5d6ecd6 to fc77c2b
Closes #1058
Requires: