Prometheus integration #456

Open
2 tasks
vitaliimelnychuk opened this issue Jul 25, 2022 · 1 comment
Labels: agent, api (Maestro API), enhancement (Enhancement of an existing feature)

Comments

@vitaliimelnychuk
Contributor

vitaliimelnychuk commented Jul 25, 2022

Description

We are starting to see performance issues with metrics rendering and, more importantly, with running analysis on top of the metrics we already collect.

Solution

To improve performance for real-time metrics, we should consider a time-series database. Using Prometheus as the main data source for metrics would also give us Grafana integration by default.

Proof of concept

  • Use pushgateway to send metrics from every agent
  • Integration with Prometheus API to make queries
vitaliimelnychuk added the enhancement, api, and agent labels Jul 25, 2022
vitaliimelnychuk self-assigned this Jul 25, 2022
vitaliimelnychuk pinned this issue Jul 25, 2022
@vitaliimelnychuk
Contributor Author

@Farfetch/team-maestro

Since many agents can be provisioned dynamically for Maestro, it makes sense to let them push data to Prometheus automatically instead of having Prometheus scrape metrics from them.

Another consideration is security. By default, agents don't expose any ports externally. They work on a pull basis only and receive updates by making regular requests to the API.

That said, I think the best way to use Prometheus as a time-series database is through the Prometheus Pushgateway, which lets Maestro agents send metrics directly to it. This keeps metrics aggregation and sending as an agent responsibility, based on runner type.
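To make the push path concrete, here is a minimal sketch of what an agent could do, using only the Python standard library. It assumes the Pushgateway's documented HTTP interface (a `PUT` to `/metrics/job/<job>/instance/<instance>` with a body in the Prometheus text exposition format). The metric names, the job name, the gateway address, and every function name below are hypothetical and only for illustration; this is not existing Maestro code.

```python
import urllib.parse
import urllib.request


def render_metrics(samples):
    """Render {metric_name: value} as Prometheus text exposition lines.

    Every metric is declared as a gauge here to keep the sketch small.
    """
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # Each line of the text format, including the last one, must end with "\n".
    return "\n".join(lines) + "\n"


def push_url(gateway, job, instance):
    """Build the grouping-key URL: /metrics/job/<job>/instance/<instance>.

    Assumes job and instance contain no "/" characters.
    """
    job = urllib.parse.quote(job, safe="")
    instance = urllib.parse.quote(instance, safe="")
    return f"{gateway.rstrip('/')}/metrics/job/{job}/instance/{instance}"


def push_metrics(gateway, job, instance, samples, timeout=5):
    """Send the samples with PUT, replacing what this grouping key pushed before."""
    request = urllib.request.Request(
        push_url(gateway, job, instance),
        data=render_metrics(samples).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

An agent would call something like `push_metrics("http://pushgateway:9091", "maestro_agent", "agent-1", {"maestro_agent_active_users": 10})` on a timer. Using the agent identifier as the `instance` label keeps pushes from different agents from overwriting each other.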

The main downside is that the Prometheus Pushgateway becomes a single point of failure and the main performance bottleneck. In the future, we can probably use more than one gateway to scale things up, but I don't see this as a problem for the first versions.
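If more than one gateway is ever needed, one possible approach (an assumption on my side, not something decided here) is to assign each agent to a gateway deterministically by hashing its identifier. The sketch below uses a hypothetical `pick_gateway` helper and made-up gateway addresses. It only spreads the load; Prometheus would still have to scrape every gateway, and losing one gateway still affects the agents assigned to it.

```python
import hashlib


def pick_gateway(agent_id, gateways):
    """Map an agent to one gateway, stable across restarts and processes."""
    if not gateways:
        raise ValueError("at least one gateway is required")
    # hashlib is used instead of hash() because hash() of a str is
    # randomized per Python process.
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]
```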

Here is a quick diagram of how things will look:

  graph TB;
      M[Maestro API] -->|GET| P[Prometheus];
      G[Grafana] -->|GET| P;
      P -->|GET| PG[Prometheus PushGateway];

      MA1[Maestro Agent] -->|PUSH| PG;
      MA2[Maestro Agent] -->|PUSH| PG;
      MA3[Maestro Agent] -->|PUSH| PG;
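
For the `Maestro API --> Prometheus` edge in the diagram, a rough sketch of a range query against the Prometheus HTTP API (`/api/v1/query_range`) could look like the code below. It relies on the documented response shape for `matrix` results; the function names, the base URL, and the metric name in the usage note are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def range_query_url(base_url, promql, start, end, step):
    """Build a /api/v1/query_range URL for the given PromQL expression."""
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def parse_matrix(payload):
    """Turn a 'matrix' response into {label tuple: [(timestamp, value), ...]}."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    series = {}
    for item in payload["data"]["result"]:
        # Labels become a sorted tuple so they can be used as a dict key.
        key = tuple(sorted(item["metric"].items()))
        # Prometheus returns sample values as strings; convert them to floats.
        series[key] = [(float(ts), float(value)) for ts, value in item["values"]]
    return series


def query_range(base_url, promql, start, end, step, timeout=5):
    """Run the range query and return the parsed series."""
    url = range_query_url(base_url, promql, start, end, step)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_matrix(json.load(response))
```

The API could then serve a chart with a call such as `query_range("http://prometheus:9090", "maestro_agent_active_users", start, end, "15s")`, where `start` and `end` are Unix timestamps.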
