Commit

Add boards description
sergeymatov authored and Frostman committed Jul 8, 2024
1 parent 7406361 commit 099371b
Showing 2 changed files with 120 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/user-guide/.pages
@@ -6,4 +6,5 @@ nav:
- external.md
- harvester.md
- shrink-expand.md
- grafana.md
- ...
119 changes: 119 additions & 0 deletions docs/user-guide/grafana.md
@@ -0,0 +1,119 @@
# Grafana Dashboards

To monitor the most critical metrics from switches managed by Hedgehog Fabric, several dashboards are available for use in Grafana deployments.

## Alloy enablement

`hedgehog-alloy` is installed on each SONiC-based switch in Hedgehog Fabric. However, the service configuration must be defined at the Fabric init stage.
Add the following configuration to the `hhfab.yaml` init file:

```yaml
...
fabric:
  controlProxy: true
  alloy:
    agentScrapeIntervalSeconds: # Interval between metrics collection from the agent
    unixScrapeIntervalSeconds: # Interval between metrics collection for Node Exporter
    unixExporterEnabled: true # Enable Node Exporter
    collectSyslogEnabled: true # Enable syslog collection for inspection in Loki
    lokiTargets:
      lab:
        url: # URL of the Loki endpoint to push logs to
        useControlProxy: true # Route messages through the Control node
        labels:
          env: # Environment name applied as a label
    prometheusTargets:
      lab:
        url: # URL of the Prometheus endpoint to push metrics to
        useControlProxy: true
        labels:
          env:
        sendIntervalSeconds: # Interval between pushes to the Prometheus collector
...
```


## Variables
Common variables used in the Hedgehog Grafana dashboards:

- **env** (Label: Env): `label_values(env)` - Environment to monitor
- **node** (Label: Switch): `label_values(hostname)` - Switch Name
- **vrf** (Label: VRF): `label_values(vrf)` - VRF name (Multi-value)
- **neighbor** (Label: Neighbor): `label_values(neighbor)` - BGP Neighbor IP address (Multi-value)
- **interface** (Label: Interface): `label_values(interface)` - Switch Interface name as defined in wiring (Multi-value)
- **file** (Label: File): `label_values(filename)` - Name of the log file to inspect (Loki)
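
For a Prometheus data source, a variable such as `label_values(env)` offers the values Prometheus knows for that label. The same list can be read from the Prometheus HTTP API endpoint `/api/v1/label/<label_name>/values`, which is handy for checking which environments or switches are reporting before opening a board. The following is a minimal Python sketch. The helper names are hypothetical, and Grafana may additionally restrict its lookup to the dashboard's time range.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def label_values_url(prometheus_base: str, label: str) -> str:
    """Build the Prometheus HTTP API URL that lists every value of `label`."""
    return f"{prometheus_base.rstrip('/')}/api/v1/label/{quote(label, safe='')}/values"


def parse_label_values(body: bytes) -> list:
    """Extract the value list from a Prometheus label-values JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]


def fetch_label_values(prometheus_base: str, label: str) -> list:
    """Return the values a variable like label_values(<label>) would offer."""
    with urlopen(label_values_url(prometheus_base, label), timeout=10) as resp:
        return parse_label_values(resp.read())
```

For example, `fetch_label_values("http://prometheus.example:9090", "hostname")` (placeholder address) should list the switch names available to the **node** variable. This assumes the metrics are pushed with a `hostname` label, as the variable definition implies.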

## Switch Critical Resources

This table reports the usage and capacity of the ASIC's programmable resources,
such as:

- ACLs
- IPv4 Routes
- IPv4 Nexthops
- IPv4 Neighbors
- IPMC Table
- FDB

[JSON](./boards/grafana_crm.json)
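
SONiC's Critical Resource Monitoring (CRM) tracks each resource as a pair of counters: entries in use and entries still available. Capacity is therefore their sum, and utilization is `used / (used + available)`. The following is a minimal sketch that assumes the board's metrics carry those two values per resource. The resource names and the 85% alert threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CrmResource:
    """One ASIC resource as SONiC CRM reports it: entries used and entries still free."""

    name: str
    used: int
    available: int

    @property
    def capacity(self) -> int:
        # Capacity is what is in use plus what can still be programmed.
        return self.used + self.available

    @property
    def utilization_pct(self) -> float:
        # A resource with zero capacity (unsupported on this ASIC) counts as 0%.
        if self.capacity == 0:
            return 0.0
        return 100.0 * self.used / self.capacity


def over_threshold(resources: list[CrmResource], high_pct: float = 85.0) -> list[str]:
    """Names of resources at or above `high_pct` utilization (threshold is hypothetical)."""
    return [r.name for r in resources if r.utilization_pct >= high_pct]
```

For example, `over_threshold([CrmResource("ipv4_route", 9000, 1000), CrmResource("fdb_entry", 100, 32668)])` returns `["ipv4_route"]`. IPv4 routes sit at 90%, while the FDB is well under 1%.
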
## Fabric

Monitoring of the Fabric underlay and external peering, including reporting for:

- BGP Neighbors
- BGP Session state
- Number of BGP Updates and Prefixes sent/received for each BGP Neighbor
- Keepalive counters


[JSON](./boards/grafana_fabric.json)
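
BGP update, prefix, and keepalive counters only increase, except when they reset (for example, when the process exporting them restarts). Rates are therefore derived from deltas between samples. The following minimal sketch uses the same reset rule as Prometheus: a drop means the counter restarted at zero. Unlike Prometheus, it does not extrapolate across the range. The function names and sample values are hypothetical.

```python
from __future__ import annotations


def counter_increase(samples: list[float]) -> float:
    """Total increase of a monotonic counter across ordered samples.

    A drop between two samples is treated as a reset to zero, so the
    post-reset value is counted as the increase for that step.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def per_second_rate(samples: list[float], interval_s: float) -> float:
    """Average per-second rate for samples taken every `interval_s` seconds."""
    if len(samples) < 2:
        return 0.0
    return counter_increase(samples) / (interval_s * (len(samples) - 1))
```

Consider samples `[100, 130, 160, 10, 40]` taken every 60 s. The third step is read as a reset, giving a total increase of 100 and an average of about 0.42 updates per second.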

## Interfaces

Switch interface monitoring visualizations, including:

- Interface Oper/Admin state
- Total input/output packet counters
- Input/output rates in packets per second (PPS) and bits per second
- Interface utilization
- Counters for Unicast/Broadcast/Multicast packets
- Errors and discards counters


[JSON](./boards/grafana_interfaces.json)
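
Bit rates and utilization are typically derived from interface octet (byte) counters. The byte delta between two samples, times 8 and divided by the interval, gives bits per second. Dividing that by the link speed gives utilization. The following minimal sketch uses hypothetical function names and treats a counter that went backwards (cleared or wrapped) as no traffic for that window.

```python
def bits_per_second(octets_start: int, octets_end: int, seconds: float) -> float:
    """Bit rate from two samples of an interface octet counter."""
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    # A counter that went backwards was cleared or wrapped; report no rate for that window.
    delta = max(octets_end - octets_start, 0)
    return delta * 8 / seconds


def utilization_pct(bps: float, speed_bps: float) -> float:
    """Share of the link's line rate in use, in percent."""
    return 100.0 * bps / speed_bps
```

For example, suppose a 25 Gbps interface's octet counter grows by 37,500,000,000 bytes over 30 s. In that case, `bits_per_second` gives 10 Gbps and `utilization_pct(..., 25e9)` gives 40%.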

## Logs

System and fabric logs:

- Kernel and BGP logs from Syslog
- Errors in agent and syslog
- Full output of the selected file

[JSON](./boards/grafana_logs.json)
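
Logs in Loki are selected with a LogQL stream selector and can be narrowed with a line filter, such as `|~` for a regular expression. The following sketch builds such a query for ad-hoc use in Grafana Explore. The label names (`env`, `filename`) come from the variables listed earlier, but it is an assumption that the Loki streams carry exactly these labels. The helper name and the example file path are also hypothetical.

```python
from __future__ import annotations

import json


def logql_query(labels: dict[str, str], pattern: str | None = None) -> str:
    """Build a LogQL query: a stream selector plus an optional regex line filter."""
    if not labels:
        raise ValueError("LogQL needs at least one label matcher")
    # json.dumps yields a double-quoted string with quotes and backslashes escaped.
    selector = ", ".join(f"{key}={json.dumps(value)}" for key, value in labels.items())
    query = "{" + selector + "}"
    if pattern:
        query += f" |~ {json.dumps(pattern)}"
    return query
```

For example, `logql_query({"env": "lab", "filename": "/var/log/syslog"}, "(?i)error")` produces `{env="lab", filename="/var/log/syslog"} |~ "(?i)error"`.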


## Platform

Information from PSUs, temperature sensors, and fan trays:

- Input/output PSU voltage
- Fan speed
- Temperature from switch sensors (CPU, PSU, etc.)
- Optic sensor temperature for transceivers with DOM


[JSON](./boards/grafana_platform.json)

## Node Exporter

[Grafana Node Exporter Full](https://grafana.com/grafana/dashboards/1860-node-exporter-full/) is an open-source Grafana board that provides
visualizations for monitoring Linux nodes. Here, Node Exporter is used to track the SONiC OS's own statistics, such as:

- Memory/disks usage
- CPU/System utilization
- Networking stats (traffic that hits SONiC interfaces)
...


[JSON](./boards/grafana_node_exporter.json)
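
Node Exporter exposes cumulative CPU time per mode as `node_cpu_seconds_total`. One common way to turn two samples into a utilization figure is to take the share of elapsed CPU time not spent in the `idle` mode. The Node Exporter Full board's own queries may differ in detail. The following minimal sketch uses a hypothetical function name and sample values.

```python
from __future__ import annotations


def cpu_busy_pct(start: dict[str, float], end: dict[str, float]) -> float:
    """CPU utilization (%) between two samples of per-mode cumulative CPU seconds.

    Each dict maps a mode label of node_cpu_seconds_total (idle, user, system,
    iowait, ...) to its counter value, already summed across CPU cores.
    """
    deltas = {mode: end[mode] - start.get(mode, 0.0) for mode in end}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    # Busy time is everything that was not spent idle.
    return 100.0 * (1.0 - deltas.get("idle", 0.0) / total)
```

For example, suppose idle time grew by 60 s while user and system time each grew by 20 s. In that case, the node was busy for 40% of the interval.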
