NVVS DevOps Monitor

What is it?

A monitoring solution developed by the NVVS (Network Voice Video Service) DevOps team to monitor the applications that the team currently manages.

What applications are being monitored by this solution?

What metrics are monitored?

This is a high-level list of the metrics that are monitored. If a metric is not mentioned here, that does not necessarily mean it is not monitored.

  • MoJO DNS:
    • Uptime
    • Bandwidth
  • MoJO DHCP:
    • Uptime
    • Subnet usage
    • Bandwidth
    • Runtime errors
  • DNS / DHCP Admin Portal
  • SMTP Relay:
    • Message count
    • Deferred messages count
  • Network Access Control Service:
    • Uptime
    • Resource usage
    • Errors
    • Authentication success / failures
  • Monitoring infrastructure (EKS Cluster):
    • Uptime
    • Resource usage
    • Bandwidth / Network

Where do we send alerts to?

Alerts are sent to various Slack channels and to PagerDuty.

How does it work?

This solution consists of Prometheus, Thanos, Grafana and a set of exporters. Exporters expose metrics from different sources for Prometheus to scrape, and Grafana builds dashboards from those metrics. Thanos leverages the Prometheus storage format to store historical metric data cost-efficiently in an S3 bucket while retaining fast query latencies. It also provides a global query view across all Prometheus installations. This means Prometheus instances running elsewhere can remote write metrics to this system, where they are kept in the central storage and can be visualised in Grafana.

Helm charts used in this solution:

To access the dashboards and query metrics, use Grafana at the address below.

📊 Grafana
https://monitoring-alerting.staff.service.justice.gov.uk

Login access to Grafana is managed through the Production Azure AD. Please contact the Azure team to gain access.

To send metrics from another Prometheus instance to this system using the remote write functionality, configure that Prometheus instance to remote write to the URL below:

✍️ Prometheus Remote Write
https://thanos-receive.monitoring-alerting.staff.service.justice.gov.uk/api/v1/receive
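As a minimal sketch of what that configuration might look like, the snippet below writes a `remote_write` fragment to a file so it can be merged into an existing Prometheus configuration. The file name `remote-write-snippet.yml` is hypothetical, and any authentication, TLS or network allow-listing the receive endpoint may require is not covered here because it is not described in this README.

```shell
# Write a minimal remote_write fragment to a file.
# The file name is a placeholder chosen for this example.
snippet_file="remote-write-snippet.yml"

cat > "$snippet_file" <<'EOF'
remote_write:
  - url: https://thanos-receive.monitoring-alerting.staff.service.justice.gov.uk/api/v1/receive
EOF

# Print the fragment so it can be reviewed before merging it into prometheus.yml.
cat "$snippet_file"
```

The fragment would normally sit at the top level of the Prometheus configuration file, alongside `global` and `scrape_configs`.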

For technical details, HLDs, LLDs and developer instructions, please visit the technical documentation page.

Setting up the Development environment

To test changes to our monitoring solution, we have a Development environment set up. To get access to that environment from your local machine, work through the following steps:

  • Check which environment the Kube context is pointing to. This is likely to be Production.
kubectl config get-contexts
  • To add the Development context you will have to generate the kube config. To do this, run the following commands:
make clean
make gen-env
make get-kubeconfig
  • Re-check the Kube contexts to confirm that the Development context is now present and selected:
kubectl config get-contexts
  • Check the connection to the cluster by running:
kubectl get pods -A
  • The Grafana dashboard is not available over the internet. To access it, forward a local port to the Grafana service:
kubectl port-forward svc/grafana 3000:80 -n grafana
  • Open the dashboard in a browser at localhost:3000

  • The Grafana dashboard requires a username and password. The username is 'admin'. To get the password, run:

make grafana-pwd
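Because the default context is likely to be Production, it can help to guard later commands behind a context check. The following is a minimal sketch, not part of this repository: `require_context` is a hypothetical helper, and the context name passed to it is an assumption that should be replaced with the name shown by `kubectl config get-contexts` after running `make get-kubeconfig`.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the active kube context matches
# the name expected for the Development environment.
require_context() {
  local expected="$1"
  local current
  # `kubectl config current-context` prints the name of the active context.
  current="$(kubectl config current-context)"
  if [ "$current" != "$expected" ]; then
    echo "Refusing to continue: current context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "Using context '$current'"
}

# Example usage (the context name below is a placeholder):
#   require_context "my-development-context" && kubectl get pods -A
```

Chaining the guard with `&&` means the following command only runs when the check passes.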

Useful commands

  • To view the deployed Grafana release and chart versions
helm list -n grafana
  • To view the status of the deployment
kubectl get pods -n grafana
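For a quicker read of the pod status, the output can be filtered down to pods that are not healthy. This is a minimal sketch with a hypothetical helper name, `not_ready_pods`, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print the name and status of every pod in a
# namespace whose STATUS is neither Running nor Completed.
not_ready_pods() {
  namespace="$1"
  # --no-headers drops the column header so awk only sees pod rows.
  # In the default output, column 1 is NAME and column 3 is STATUS.
  kubectl get pods -n "$namespace" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 " " $3 }'
}

# Example usage:
#   not_ready_pods grafana
```

An empty result means every pod in the namespace reported Running or Completed at the time of the call.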