[v14] Add a guide to metrics for monitoring Teleport #47411

Merged: 2 commits into branch/v14 from bot/backport-46645-branch/v14 on Oct 15, 2024

ptgott commented Oct 9, 2024

Backport #46645 to branch/v14

ptgott added the no-changelog label Oct 9, 2024
ptgott enabled auto-merge October 10, 2024 14:43
ptgott force-pushed the bot/backport-46645-branch/v14 branch from c24bb0c to 33d85e2 October 15, 2024 18:50
Closes #40664

This change turns the Metrics guide in `admin-guides` into a conceptual
guide to the most important metrics for monitoring a Teleport cluster.

Since metrics coverage for Agents varies across Teleport services, and
to reduce the scope of this change, this guide focuses on self-hosted
clusters.

To make this a conceptual guide instead of a reference, this change
removes the reference table from the `admin-guides` metrics page. There
is already a table in the dedicated metrics reference guide.

Note that, while the new metrics guide is specific to self-hosted
clusters, this change does not move the guide to the subsection of Admin
Guides related to self-hosting Teleport. Doing this would mean having
one subsection of Admin Guides for diagnostics-related guides and one
subsection for self-hosted-specific diagnostics, which is potentially
confusing. We may also want to add Agent-specific metrics eventually.

Finally, this change does not include alert thresholds for the metrics
it describes. We can define these in a subsequent change.

- Describe `backend_write_requests_failed_precondition_total`
- Include the precondition metric in the write availability formula.
- Turn the `registered_servers` discussion into a discussion of Teleport
  instance version, since it's not possible to group this metric by
  service and subtract the count of Auth Service/Proxy Service instances
  from the count of all registered services.
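
The following is a rough sketch of how the precondition metric might enter a write availability calculation. The formula itself is not stated in this PR, so the sketch rests on two assumptions: that precondition failures are counted within a total-failures counter, and that they reflect expected conflicts rather than backend unavailability. The counter names `backend_write_requests_total` and `backend_write_requests_failed_total`, and the helper `write_availability`, are used for illustration only and may not match the guide.

```python
def write_availability(total: float, failed: float, failed_precondition: float) -> float:
    """Estimate the fraction of backend writes that succeeded or failed only
    because of an unmet precondition.

    Arguments are counter increases over the same time window:
      total               -- assumed: backend_write_requests_total
      failed              -- assumed: backend_write_requests_failed_total
      failed_precondition -- backend_write_requests_failed_precondition_total

    Assumption: precondition failures are included in `failed` and should not
    count against availability.
    """
    if total <= 0:
        # No writes observed in the window, so there is nothing to penalize.
        return 1.0
    # Clamp at zero in case the counters were sampled at slightly different times.
    backend_failures = max(failed - failed_precondition, 0.0)
    return 1.0 - backend_failures / total


if __name__ == "__main__":
    # Hypothetical counter increases over one window.
    print(write_availability(total=1000, failed=30, failed_precondition=20))
```

Under these assumptions, a burst of precondition failures (for example, from concurrent updates to the same resource) would not lower the availability figure, while other write failures would.
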
ptgott force-pushed the bot/backport-46645-branch/v14 branch from 33d85e2 to 2d3c2f3 October 15, 2024 18:50
ptgott added this pull request to the merge queue Oct 15, 2024
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 15, 2024
ptgott added this pull request to the merge queue Oct 15, 2024
Merged via the queue into branch/v14 with commit 6ab05bb Oct 15, 2024
27 checks passed
ptgott deleted the bot/backport-46645-branch/v14 branch October 15, 2024 19:47
Labels: backport, documentation, no-changelog, size/md