We'd like to understand more about runners and providers.
We have metrics for the GH API calls, but none for provider calls. Currently we cannot see whether a runner failed to reach the idle state and is being recreated over and over because of the bootstrap timeout.
Let's try to add metrics for provider calls.
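As a rough illustration of the idea, the sketch below wraps a provider call and counts attempts and failures per provider and operation. It uses only the standard library; `ProviderMetrics`, `Observe`, `Counts`, and the label values `lxd` and `CreateInstance` are hypothetical names chosen for this example, not garm's actual API. In a real setup the two maps would presumably be Prometheus counters.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// callKey identifies one counter series (hypothetical label set).
type callKey struct {
	Provider  string
	Operation string
}

// ProviderMetrics is an in-memory stand-in for two counters:
// total provider operations and failed provider operations.
type ProviderMetrics struct {
	mu         sync.Mutex
	operations map[callKey]int
	failures   map[callKey]int
}

func NewProviderMetrics() *ProviderMetrics {
	return &ProviderMetrics{
		operations: make(map[callKey]int),
		failures:   make(map[callKey]int),
	}
}

// Observe runs call, counts the attempt, and counts a failure if call
// returned an error. The error is passed through unchanged.
func (m *ProviderMetrics) Observe(provider, operation string, call func() error) error {
	err := call()

	m.mu.Lock()
	defer m.mu.Unlock()
	k := callKey{Provider: provider, Operation: operation}
	m.operations[k]++
	if err != nil {
		m.failures[k]++
	}
	return err
}

// Counts returns the recorded totals for one provider/operation pair.
func (m *ProviderMetrics) Counts(provider, operation string) (ops, fails int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := callKey{Provider: provider, Operation: operation}
	return m.operations[k], m.failures[k]
}

// errBootstrapTimeout is an illustrative error value for the example.
var errBootstrapTimeout = errors.New("bootstrap timeout")

func main() {
	m := NewProviderMetrics()
	_ = m.Observe("lxd", "CreateInstance", func() error { return nil })
	_ = m.Observe("lxd", "CreateInstance", func() error { return errBootstrapTimeout })

	ops, fails := m.Counts("lxd", "CreateInstance")
	fmt.Printf("operations=%d failures=%d\n", ops, fails)
}
```

With counters like these, a runner that keeps failing during bootstrap would show up as a growing failure count for the create operation of one provider.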
We are already running a patched version of v0.1.4 into which we cherry-picked the changes we wanted on our side (#217 is among them). Feel free to build our patched garm version yourself and give it a try; all of the patches are already part of the main branch in garm itself.
Out of curiosity: do you want more metrics, or is this exactly what you are looking for?
PromQL query:
    (
        sum by (operation, provider) (
            rate(
                garm_runner_errors_total{app_kubernetes_io_instance="garm-prod",app_kubernetes_io_name="garm"}[5m]
            )
        )
        or
        sum by (operation, provider) (
            garm_runner_operations_total{app_kubernetes_io_instance="garm-prod",app_kubernetes_io_name="garm"}
            *
            0
        )
    )
    /
    sum by (operation, provider) (
        rate(
            garm_runner_operations_total{app_kubernetes_io_instance="garm-prod",app_kubernetes_io_name="garm"}[5m]
        )
    )
    *
    100
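The query computes an error percentage per operation and provider. The `or ... * 0` branch appears to be there to supply a zero for every pair that has operations but no error series yet, so that the division yields 0% rather than no result. The same arithmetic can be sketched in Go; `seriesKey` and `errorPercent` are hypothetical names, and the sample rates are made up for illustration.

```go
package main

import "fmt"

// seriesKey mirrors the "sum by (operation, provider)" grouping.
type seriesKey struct {
	Operation string
	Provider  string
}

// errorPercent returns errors/operations*100 for every key in opRate.
// A key missing from errRate reads as 0 from the map, which plays the
// role of the "or ... * 0" fallback in the query.
func errorPercent(errRate, opRate map[seriesKey]float64) map[seriesKey]float64 {
	out := make(map[seriesKey]float64, len(opRate))
	for k, ops := range opRate {
		if ops == 0 {
			// Assumption of this sketch: skip idle series.
			// PromQL itself would return NaN for 0/0.
			continue
		}
		out[k] = errRate[k] / ops * 100
	}
	return out
}

func main() {
	create := seriesKey{Operation: "CreateInstance", Provider: "lxd"}
	remove := seriesKey{Operation: "DeleteInstance", Provider: "lxd"}

	errRate := map[seriesKey]float64{create: 0.5} // no error series for remove
	opRate := map[seriesKey]float64{create: 2.0, remove: 1.0}

	pct := errorPercent(errRate, opRate)
	fmt.Printf("create: %.1f%%\n", pct[create])
	fmt.Printf("delete: %.1f%%\n", pct[remove])
}
```

Without the zero fallback, an operation that has never failed would simply be absent from the result instead of showing an error rate of zero.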