Improve observability of index queries #11064

Merged
dannykopping merged 8 commits into grafana:main from dannykopping/index-o11y
Oct 27, 2023

dannykopping
Contributor

@dannykopping dannykopping commented Oct 27, 2023

What this PR does / why we need it:
Requests to the following APIs previously had limited or incorrect logging, which made understanding their behaviour at runtime difficult.

GET /loki/api/v1/labels
GET /loki/api/v1/label/<name>/values
GET /loki/api/v1/series
GET /loki/api/v1/index/stats
GET /loki/api/v1/index/volume
GET /loki/api/v1/index/volume_range

All of these APIs now have querier and query-frontend logs; sharding and time-based splitting are applied to these requests, and it's valuable to see all requests.
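To illustrate why a single API call can produce several log lines, here is a minimal sketch of time-based splitting. It is not Loki's implementation: `timeRange`, `splitByInterval`, and `exampleRanges` are hypothetical names, and the sketch ignores interval alignment and sharding entirely.

```go
package main

import "time"

// timeRange is a hypothetical half-open window [start, end).
type timeRange struct{ start, end time.Time }

// splitByInterval cuts [start, end) into consecutive windows of at most
// `interval`. Each window would become its own sub-request, and therefore
// its own log line. This is an assumption-level sketch, not Loki's splitter.
func splitByInterval(start, end time.Time, interval time.Duration) []timeRange {
	// Degenerate input: return the range unchanged rather than looping.
	if interval <= 0 || !start.Before(end) {
		return []timeRange{{start, end}}
	}
	var out []timeRange
	for s := start; s.Before(end); s = s.Add(interval) {
		e := s.Add(interval)
		if e.After(end) {
			e = end // the last window may be shorter than the interval
		}
		out = append(out, timeRange{s, e})
	}
	return out
}

// exampleRanges splits a 150-minute window into 1-hour pieces.
func exampleRanges() []timeRange {
	start := time.Date(2023, 10, 27, 0, 0, 0, 0, time.UTC)
	return splitByInterval(start, start.Add(150*time.Minute), time.Hour)
}
```

In this sketch, `exampleRanges()` yields three windows (two full hours and a trailing 30 minutes), so one incoming request would fan out into three sub-requests, each of which may be logged separately.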

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:
Both queriers and query-frontends (QFs) produce logs, but only the QF logs showed caller=metrics.go. We regularly search on this substring to find our request logs, whereas querier logs appeared to come from spanlogger.go. I've included a small hack in fixLogger which adds caller=metrics.go to the log line if a SpanLogger is used. It's ugly, but I wanted to get this change in to help with our operational issues rather than refactor the way we log in some cases.
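The idea behind the hack can be sketched roughly as follows. This is a self-contained illustration using only the standard library, not the code in this PR: `Logger`, `bufLogger`, `spanLogger`, `prefixLogger`, and `fixLoggerSketch` are all hypothetical stand-ins for the real logger types.

```go
package main

import (
	"fmt"
	"strings"
)

// Logger is a hypothetical stand-in for a key/value logger interface.
type Logger interface {
	Log(keyvals ...any) error
}

// bufLogger renders key/value pairs as logfmt-style text into a buffer.
type bufLogger struct{ out strings.Builder }

func newBufLogger() *bufLogger { return &bufLogger{} }

func (l *bufLogger) Log(keyvals ...any) error {
	for i := 0; i+1 < len(keyvals); i += 2 {
		if i > 0 {
			l.out.WriteByte(' ')
		}
		fmt.Fprintf(&l.out, "%v=%v", keyvals[i], keyvals[i+1])
	}
	l.out.WriteByte('\n')
	return nil
}

func (l *bufLogger) String() string { return l.out.String() }

// spanLogger is a hypothetical stand-in for a tracing-aware logger
// that wraps another Logger.
type spanLogger struct{ Logger }

// prefixLogger prepends fixed key/value pairs to every log line.
type prefixLogger struct {
	next   Logger
	prefix []any
}

func (p prefixLogger) Log(keyvals ...any) error {
	kvs := append(append([]any{}, p.prefix...), keyvals...)
	return p.next.Log(kvs...)
}

// fixLoggerSketch adds caller=metrics.go only when the logger is a
// spanLogger, so both kinds of log line match the same search substring.
func fixLoggerSketch(l Logger) Logger {
	if _, ok := l.(spanLogger); ok {
		return prefixLogger{next: l, prefix: []any{"caller", "metrics.go"}}
	}
	return l
}
```

With this sketch, logging `"msg", "hello"` through `fixLoggerSketch(spanLogger{buf})` writes a line beginning with `caller=metrics.go`, while a plain `bufLogger` passes through unchanged.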

We're also considering adding limits (and later pagination) to these endpoints. I propose that we let this change soak in production for a couple of weeks so we can make data-driven decisions about what limits are reasonable.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory.

Danny Kopping added 5 commits October 27, 2023 13:46
@dannykopping dannykopping force-pushed the dannykopping/index-o11y branch from 5abcd5d to ba72587 Compare October 27, 2023 11:46
@dannykopping dannykopping marked this pull request as ready for review October 27, 2023 11:57
@dannykopping dannykopping requested a review from a team as a code owner October 27, 2023 11:57
@slim-bean
Collaborator

I love this PR!!! sorry it feels like I am nitting it to death 😬

@dannykopping
Contributor Author

dannykopping commented Oct 27, 2023

I love this PR!!! sorry it feels like I am nitting it to death 😬

Not at all! I appreciate the high level of scrutiny; much prefer it to a rubber stamp

pkg/logql/metrics.go (2 resolved review threads)
Collaborator

@slim-bean slim-bean left a comment

LGTM!

@dannykopping dannykopping enabled auto-merge (squash) October 27, 2023 13:13
Signed-off-by: Danny Kopping <[email protected]>
@dannykopping dannykopping merged commit 34b9b9a into grafana:main Oct 27, 2023
3 checks passed
@dannykopping dannykopping deleted the dannykopping/index-o11y branch October 27, 2023 13:46
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024