Adds prometheus metrics #37

anguswilliams · 2024-10-16T23:54:10Z

Adds an endpoint for Prometheus metrics and instruments the snapshot function.

This allows alerts to be created when snapshots fail or have not been taken for some time.


anguswilliams · 2024-10-16T23:56:32Z

Hey @Argelbargel, I've added some Prometheus metrics, as I like to be alerted if backups fail. This lets me generate alerts via Alertmanager, e.g.:

vrsa_last_snapshot_success != 1

I'll also add a ServiceMonitor and the additional ports to your Helm chart if you're happy with this.

Argelbargel · 2024-11-04T13:11:12Z

Hi @anguswilliams,

Sorry, I have not had time to review your pull request yet. I'll try to find some time for it this week...

erlisb · 2024-12-10T23:05:18Z

Hi @anguswilliams, thanks a lot for this PR.

I tried to test it locally:

erlisb ✗ go build -o vault-raft-snapshot-agent cmd/vault-raft-snapshot-agent/main.go

Using the following config.yaml file:

vault:
  nodes:
    urls:
      - https://127.0.0.1:8200
  auth:
    token: hvs.mP0IOUkQ6SzOXpEehQjGc3Di
snapshots:
  frequency: "2m"
  retain: 10
  storages:
    local:
      path: ./snapshots

erlisb ✗ ./vault-raft-snapshot-agent --config ./config.yaml --log-level="info" --metrics-port=10050                    
2024/12/10 23:56:18 INFO Using configuration from /home/erlisb/hashicorp-vault/vault-raft-snapshot-agent/config.yaml...
2024/12/10 23:56:19 INFO (re-)connected to leader node=https://127.0.0.1:8200
2024/12/10 23:56:19 INFO Successfully uploaded snapshot to all scheduled destinations nextSnapshot=2024-12-10T23:58:18.879+01:00

The feature seems to work as expected:

erlisb ✗ curl -k http://127.0.0.1:10050/metrics | grep vrsa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9285    0  9285    0     0  1511k      0 --:--:-- --:--:-- --:--:-- 1511k
# HELP vrsa_last_snapshot_size Size of the last snapshot in bytes
# TYPE vrsa_last_snapshot_size gauge
vrsa_last_snapshot_size 17665
# HELP vrsa_last_snapshot_success Returns 1 if the last snapshot was successful and 0 if not
# TYPE vrsa_last_snapshot_success gauge
vrsa_last_snapshot_success 1
# HELP vrsa_last_snapshot_time Unix timestamp of the last snapshot time
# TYPE vrsa_last_snapshot_time gauge
vrsa_last_snapshot_time 1.733871595e+09
# HELP vrsa_next_snapshot_time Unix timestamp of the next scheduled snapshot time
# TYPE vrsa_next_snapshot_time gauge
vrsa_next_snapshot_time 1.733871715e+09
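
To check these values without a full Prometheus setup, output like the above can be parsed directly. The following is a rough Go sketch using only the standard library; `parseGauges` is a hypothetical helper, not part of this PR, and it assumes plain unlabelled `name value` samples like the four `vrsa_*` gauges shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGauges is a hypothetical helper, not part of this PR.
// It reads text in the Prometheus exposition format, skips blank lines and
// "# HELP" / "# TYPE" comments, and returns each sample as name -> value.
// Assumption: samples are unlabelled (or their labels contain no spaces).
func parseGauges(text string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // not a "name value" sample
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		out[fields[0]] = v
	}
	return out, sc.Err()
}

func main() {
	// Sample lines copied from the curl output in this comment.
	sample := `# TYPE vrsa_last_snapshot_success gauge
vrsa_last_snapshot_success 1
# TYPE vrsa_last_snapshot_time gauge
vrsa_last_snapshot_time 1.733871595e+09
`
	m, err := parseGauges(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["vrsa_last_snapshot_success"], int64(m["vrsa_last_snapshot_time"]))
}
```

For anything beyond a quick local check, a proper exposition-format parser would be the safer choice, since this sketch ignores labels with embedded spaces and optional timestamps.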

Could you also provide some documentation on how to access the metrics and what each metric means?
That would be really helpful for the end user.

@Argelbargel, could you also have a look at it so that we can maybe merge it?

Thank you very much.

erlisb · 2024-12-17T22:54:52Z

Hi @Argelbargel, another kind reminder as we are depending a bit on this feature.
Thank you.

Adds prometheus metrics

e36ede2

