-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds prometheus export point to the agent pod. #2983
base: main
Are you sure you want to change the base?
Adds prometheus export point to the agent pod. #2983
Conversation
Note - this should be behind a flag. |
…int-to-the-agent-pod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go over the code in agent submodules yet, because I have an important request that might require changes there - please no new actor framework ;_;
You're already using axum
router, so why not go with State
? 👀
I'd just put all metrics in there (AtomicU64
), wrap them together in Arc
and share between the modules and the metrics endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're missing:
- Stolen HTTP requests in progress
- Connected clients
- DNS lookups in progress
I see you went for global gauges, that's fine as well
mirrord/agent/src/entrypoint.rs
Outdated
tokio::spawn(async move { | ||
start_metrics(address) | ||
.await | ||
.inspect_err(|fail| tracing::error!(?fail, "Failed starting metrics server!")) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something should watch over the metrics task. Maybe the CancellationToken
we have in start_agent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Help, how do I make this happen? Just clone
the cancellation token before tokio::spawn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't like statics, you can go with sth like this:
#[derive(Clone)]
struct Metrics {
open_file_descriptors: IntCounter,
// other counters
}
// pass registry to the metrics http server
// pass metrics clones to agent tasks
fn setup_metrics_registry() -> (Registry, Metrics) {
let open_file_descriptors = IntCounter::new("name", "help").unwrap();
let registry = Registry::new();
registry.register(Box::new(open_file_descriptors.clone())).unwrap();
let metrics = Metrics {
open_file_descriptors,
};
(registry, metrics)
}
// metrics endpoint handler
async fn get_metrics(State(registry): State<Registry>) -> (StatusCode, String) {
let mut buffer = String::new();
let encoder = TextEncoder::new();
let metric_families = registry.gather();
match encoder.encode_utf8(&metric_families, &mut buffer) {
Ok(()) => (StatusCode::OK, buffer),
Err(error) => {
(StatusCode::INTERNAL_SERVER_ERROR, error.to_string())
}
}
}
// metrics http server router
async fn metrics_router(registry: Registry) -> Router {
Router::new().route("/metrics", get(get_metrics)).with_state(registry)
}
Imho looks nicer and will allow for unit testing (statics would be shared across tests)
Co-authored-by: Michał Smolarek <[email protected]>
Co-authored-by: Michał Smolarek <[email protected]>
(wip, missing docs)
Note for reviewer
I've tried keeping the changes in each system to each commit, hopefully that helps when reviewing this, since it touches a bunch of files.