Periodically check connectivity between peer proxies #48838

Merged: 3 commits, Nov 13, 2024

espadolini
Contributor

The current implementation of proxy peering reports the state of each connection through the proxy_peer_client_connections metric. However, each connection follows the default gRPC behavior of dropping to IDLE after 30 minutes of disuse, so connectivity problems are only noticed when a new connection is attempted as a result of user interaction.

This PR adds a periodic health check of proxy peering connections, initiated by the client side of each connection. The results of the health checks are exposed through two new metrics, teleport_proxy_peer_client_pings_total and teleport_proxy_peer_client_failed_pings_total, labeled with the host ID, hostname, and group ID of the peer. The metrics can be used to alert proactively on connectivity issues, either for a specific cluster or across clusters (if the group ID matches some geographical region or deployment group, for example).

changelog: added periodic health checks between proxies in proxy peering

@espadolini
Contributor Author

The old PR was marked as merged due to an unfortunate rebase/merge incident; the content should be the same, save for the QUIC implementation, which now lives in #47587 instead.

Base automatically changed from espadolini/quic-proxy-peering-preparation to master November 13, 2024 16:24
@espadolini espadolini force-pushed the espadolini/proxy-peering-ping branch from 5cc26a9 to 9422a05 Compare November 13, 2024 16:42
@espadolini espadolini enabled auto-merge November 13, 2024 16:42
@espadolini espadolini added this pull request to the merge queue Nov 13, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 13, 2024
@espadolini espadolini enabled auto-merge November 13, 2024 17:41
@espadolini espadolini added this pull request to the merge queue Nov 13, 2024
Merged via the queue into master with commit 0f3d691 Nov 13, 2024
42 checks passed
@espadolini espadolini deleted the espadolini/proxy-peering-ping branch November 13, 2024 18:04
@public-teleport-github-review-bot

@espadolini See the table below for backport results.

| Branch | Result |
|---|---|
| branch/v17 | Create PR |

github-merge-queue bot pushed a commit that referenced this pull request Nov 14, 2024
* Make the peer clientConn generic

* Convert the peer server to slog

* Move lib/proxy/clusterdial to lib/peer/dial

* Move peer.clientConn to lib/proxy/peer/internal

* Periodically check connectivity between peer proxies (#48838)