Merge pull request #16 from thenodon/main
Documentation fixes
camrossi authored Jul 18, 2024
2 parents 74aabc5 + dc4b4b4 commit 2c4d548
Showing 3 changed files with 25 additions and 25 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -17,7 +17,7 @@ The ACI-Monitoring-Stack integrates the following key components:

- [Syslog-ng](https://github.com/syslog-ng/syslog-ng): is an open-source implementation of the Syslog protocol, its role in this stack is to translate syslog messages from RFC 3164 to 5424. This is needed because Promtail only support Syslog RFC 5424 over TCP and this capability is only available in ACI 6.1 and above.

- [ACI-Exporter](https://github.com/opsdis/aci-exporter): A custom-built exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The ACI-Exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively.
- [aci-exporter](https://github.com/opsdis/aci-exporter): A Prometheus exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The aci-exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively.

- Pre-configured ACI data collections queries, alerts, and dashboards (Work In Progress): The ACI-Monitoring-Stack provides a solid foundation for monitoring an ACI fabric with its pre-defined queries, dashboards, and alerts. While these tools are crafted based on best practices to offer immediate insights into network performance, they are not exhaustive. The strength of the ACI-Monitoring-Stack lies in its community-driven approach. Users are invited to contribute their expertise by providing feedback, sharing custom solutions, and helping enhance the stack. Your input helps to refine and expand the stack's capabilities, ensuring it remains a relevant and powerful tool for network monitoring.

@@ -42,7 +42,7 @@ flowchart-elk
PT["Promtail"]
SL["Syslog-ng"]
AM["Alertmanager"]
A["ACI Exporter"]
A["aci-exporter"]
G--"PromQL"-->P
G--"LogQL"-->L
P-->AM
@@ -75,7 +75,7 @@ If you want to contribute to this project star from [Here](docs/development.md)
## Pre Requisites
- Familiarity with Kubernetes: This installation guide is intended to assist with the setup of the ACI Monitoring stack and assumes prior familiarity with Kubernetes; it is not designed to provide instruction on Kubernetes itself.
- A Kubernetes Cluster: Currently the stack has been tested on `Upstream Kubernetes 1.30.x` and `Minikube`.
- Persistent Volumes: 10G should be plenty for a small/demo environment. Many Storage provisioner support Volume expansion so should be easy to increase this post installation.
- Persistent Volumes: 10G should be plenty for a small/demo environment. Many storage provisioner support Volume expansion so should be easy to increase this post installation.
- Ability to expose services for:
- Access to the Grafana/Prometheus and Alert Manager dashboards: This will be ideally achieved via an `Ingress Controller`
- (Optional) Wildcard DNS Entries for the ingress controller domain.
@@ -93,19 +93,19 @@ If you are installing on Minikube please follow the [Minikube Preparation Steps]

## Config Preparation

The ACI Monitoring Stack is a combination of several [Charts](charts/aci-monitoring-stack/charts), if you are familiar with Helm you are aware of the struggle to propagate dynamic values to sub-charts. For example it is not possible to pass to a sub-chart the name of a service in a dynamic way.
The ACI Monitoring Stack is a combination of several [Charts](charts/aci-monitoring-stack/charts), if you are familiar with Helm you are aware of the struggle to propagate dynamic values to sub-charts. For example, it is not possible to pass to a sub-chart the name of a service in a dynamic way.

In order to simplify the user experience the `chart` comes with a few pre-configured parameters that are populated in the configurations of the various sub-charts.

For example the ACI Exporter Service Name is pre configured as `aci-exporter-svc` and this value is then passed to Prometheus as service Discovery URL.
For example the aci-exporter Service Name is pre-configured as `aci-exporter-svc` and this value is then passed to Prometheus as service Discovery URL.

All these values can be customized and if you need to you can refer to the [Values](charts/aci-monitoring-stack/values.yaml) file.

*Note:* This is the first HELM char `camrossi` created and he is sure it can be improved. If you have suggestions they are extremely welcome! :)
*Note:* This is the first HELM char `camrossi` created, and he is sure it can be improved. If you have suggestions they are extremely welcome! :)

### ACI Exporter
### The aci-exporter

ACI Exporter is the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem, for it to works it needs to know:
The aci-exporter is the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem, for it to works it needs to know:
- `fabrics`: A list of fabrics and how to connect to the APICs.
- Requires a **ReadOnly** **Admin** User
- `service_discovery`: Configure if devices are reachable via Out Of Band (`oobMgmtAddr`) or InBand (`inbMgmtAddr`).
@@ -198,7 +198,7 @@ Grafana is installed via its [own Chart](https://github.com/grafana/helm-charts/
- The `ingress` config: External URL which can access Grafana.
- Persistent Volume Capacity
- (Optional) `adminPassword`: If not set will be auto generated and can be found in the `grafana` secret
- (Optional) `viewers_can_edit`: This allows users with a `view only` role to modify the dashboards and access `Explorer` to execute queries against `Pormetheus` and `Loki`. However the user will not be able to save any changes.
- (Optional) `viewers_can_edit`: This allows users with a `view only` role to modify the dashboards and access `Explorer` to execute queries against `Pormetheus` and `Loki`. However, the user will not be able to save any changes.
- (Optional) `deploymentStrategy`: if the Grafana `Persistent Volume` is of type `ReadWriteOnce`, rolling updates will get stuck as the new pod cannot start before the old one releases the PVC. Setting `deploymentStrategy.type` to `Recreate` destroys the original pod before starting the new one.

Below an example:
@@ -222,7 +222,7 @@ grafana:
```
### Syslog config

The syslog config is the most complicated part as it relies on 3 components (`promtail`, `loki` and `syslog-ng`) with their own individual configs. Furthermore there are two issues we need to overcome:
The syslog config is the most complicated part as it relies on 3 components (`promtail`, `loki` and `syslog-ng`) with their own individual configs. Furthermore, there are two issues we need to overcome:

- The Syslog messages don't contain the ACI Fabric name: to be able to distinguish the messages of one fabric from another, the only solution is to use dedicated `external services` with a unique `IP:Port` pair per Fabric.
- Until ACI 6.1 we need `syslog-ng` between `ACI` and `Promtail` to convert from RFC 3164 to 5424
26 changes: 13 additions & 13 deletions docs/development.md
@@ -16,19 +16,19 @@ Data is currently collected in two ways:

- Syslog Ingestion: The ACI Side Config "decides" what to send and assuming the correct logging level is selected you can then build the dashboards in grafana using Loki as a data source. You can take a look at the `Contract Drops Logs` dashboard for inspiration.

- [ACI Exporter](https://github.com/opsdis/aci-exporter) Queries: which queries and how the data is collected is highly customizable.
- [aci-exporter](https://github.com/opsdis/aci-exporter) Queries: which queries and how the data is collected is highly customizable.

### ACI Exporter and Prometheus
### aci-exporter and Prometheus

The general idea is to use aci-exporter to convert ACI REST API calls into the Prometheus exposition format.

The exporter also has the capability to directly scrape individual switches using the aci-exporter inbuilt HTTP-based service discovery. Doing direct spine and leaf queries is typically useful in large fabrics, where doing all API calls through the APIC can put a high load on the APIC and result in high response times.

**Note:** In the context of this HELM Chart a query **MUST** be executed against a switch if possible. Any code submission that does not adhere to this convention will not be accepted.
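
To make the service discovery idea concrete, here is a minimal sketch of how a client could read the exporter's `/sd` endpoint (used later in the quick start) and list the devices it reports. It assumes the response follows the generic Prometheus HTTP service discovery layout (a JSON list of objects with `targets` and `labels`). The sample addresses and the `role` label are hypothetical, as are the function names `parse_sd_targets` and `fetch_sd_targets`.

```python
import json
from urllib.request import urlopen


def parse_sd_targets(payload: str) -> list[tuple[str, dict]]:
    """Flatten an HTTP service discovery document into (target, labels) pairs."""
    pairs = []
    for group in json.loads(payload):
        labels = group.get("labels", {})
        for target in group.get("targets", []):
            pairs.append((target, labels))
    return pairs


def fetch_sd_targets(base_url: str) -> list[tuple[str, dict]]:
    """Fetch /sd from a running exporter, e.g. http://aci-exporter-ip:9643."""
    with urlopen(f"{base_url}/sd", timeout=10) as response:
        return parse_sd_targets(response.read().decode("utf-8"))


# Hypothetical sample payload: the addresses and the label name are invented
# for illustration and are not taken from a real fabric.
SAMPLE = (
    '[{"targets": ["10.0.0.1"], "labels": {"role": "spine"}},'
    ' {"targets": ["10.0.0.2"], "labels": {"role": "leaf"}}]'
)

for target, labels in parse_sd_targets(SAMPLE):
    print(target, labels)
```

Keeping the parsing separate from the HTTP call makes it possible to try the logic against a saved response before pointing it at a live exporter.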

#### ACI Exporter Quick Start
#### aci-exporter Quick Start

Before working on aci-exporter, Prometheus and Grafana at the same time I strongly suggest to take a look at the [ACI Exporter](https://github.com/opsdis/aci-exporter) git repo and understand how it works and how is configured.
Before working on aci-exporter, Prometheus and Grafana at the same time I strongly suggest to take a look at the [aci-exporter](https://github.com/opsdis/aci-exporter) git repo and understand how it works and how is configured.

Here a complete example to get you started (you need to [install go](https://go.dev/doc/install))

@@ -48,7 +48,7 @@ fabrics:
- https://apic1
- https://apic2
```
- ACI Exporter will, by default, load the queries it can execute from the `config.d` directory. For now we don't want that so we can start the exporter with this command that will just load the bare minimum config to access the fabric.
- The aci-exporter will, by default, load the queries it can execute from the `config.d` directory. For now, we don't want that so we can start the exporter with this command that will just load the bare minimum config to access the fabric.

```bash
./build/aci-exporter -config fab1.yaml -config_dir /dev/null
@@ -57,7 +57,7 @@ fabrics:
{"config_file":"/home/cisco/aci-exporter/fab1.yaml","level":"info","msg":"aci-exporter starting","port":9643,"read_timeout":0,"time":"2024-07-18T14:17:59+10:00","version":"undefined","write_timeout":0}
```

- Now ACI Exporter is running on our host on port 9643, let's try a Service Discovery just run a HTTP request against the `/sd` URL.
- Now aci-exporter is running on our host on port 9643, let's try a Service Discovery just run an HTTP request against the `/sd` URL.

``` bash
curl http://aci-exporter-ip:9643/sd
@@ -95,7 +95,7 @@ This should return a list with all the Controllers and Switches in your fabric a
Now let's try to build a query to check the `interface operation state and speed`.

- The ACI Class we can use for this query is `ethpmPhysIf`
- This class is available both on the APIC as well as from the Switches: we will run this query **against the switchers** because it is the core principle for this HELM chart and it scales better.
- This class is available both on the APIC and on the Switches: we will run this query **against the switches** because it is the core principle for this HELM chart, and it scales better.
- *Tip:* If you use Visual Studio Code you can install the `Thunder Client` to test API Calls.

Every switch will return one `ethpmPhysIf` object for every interface. An example is provided below:
@@ -171,14 +171,14 @@ Of all the various properties of `ethpmPhysIf` we need only 3:
- `interface_type`: Physical, Port-Channel etc...
- `interface`: The interface name, i.e. Eth1/1

With these infos we can create 2 metrics that I am gonna call:
With these infos we can create 2 metrics that I am going to call:

- `interface_oper_speed`
- `interface_oper_state`

Both metrics will be labeled with the`interface_type` and `interface` (name). However we are faced with an issue... Promethesu can only ingest numbers so we can't just pass `40G` or `up` as a valid metric.
Both metrics will be labeled with the`interface_type` and `interface` (name). However, we are faced with an issue... Prometheus can only ingest numbers, so we can't just pass `40G` or `up` as a valid metric.

Thankfully one of the many ACI Exporter capabilities is to perform `value_transform` so we can write something like this:
Thankfully one of the many aci-exporter capabilities is to perform `value_transform` so we can write something like this:

```yaml
value_transform:
@@ -199,7 +199,7 @@ value_transform:
```
To convert text to numbers and allow Prometheus to ingest this data.
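
The idea behind the transform can be sketched in a few lines of Python. This is only an illustration of the text-to-number mapping, not the exporter's implementation: the numeric values chosen for each state, the unit table and the fallback values are assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical mappings for illustration; the values used by the project's
# value_transform config are not reproduced here.
STATE_TO_NUMBER = {"down": 0, "up": 1}
UNIT_MULTIPLIER = {"M": 10**6, "G": 10**9}


def state_to_number(state: str) -> int:
    """Map an operational state string to a number, -1 when it is unknown."""
    return STATE_TO_NUMBER.get(state, -1)


def speed_to_bps(speed: str) -> int:
    """Convert a speed such as '40G' or '100M' to bits per second.

    Returns 0 when the value does not look like <digits><unit>.
    """
    match = re.fullmatch(r"(\d+)([MG])", speed)
    if match is None:
        return 0
    return int(match.group(1)) * UNIT_MULTIPLIER[match.group(2)]


print(state_to_number("up"), speed_to_bps("40G"))
```

Whatever numbers are chosen, they need to stay stable over time, because dashboards and alerts will compare against them.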

Lastly we need to also extract the `labels` from the `dn`. The format for this specific class is always something similar to `"sys/phys-[eth1/34]/phys"` to do this ACI Exporter employs RegEx, below an example:
Lastly we need to also extract the `labels` from the `dn`. The format for this specific class is always something similar to `"sys/phys-[eth1/34]/phys"` to do this aci-exporter employs RegEx, below an example:

```yaml
labels:
@@ -252,7 +252,7 @@ class_queries:
regex: "^sys/(?P<interface_type>[a-z]+)-\\[(?P<interface>[^\\]]+)\\]/"
```
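
To see what this regex extracts, it can be tried outside the exporter. The sketch below uses Python's `re` module, which accepts the same `(?P<name>...)` named-group syntax. The doubled backslashes in the YAML are string escaping, so the pattern is written here with single backslashes in a raw string. The helper name `labels_from_dn` is hypothetical.

```python
import re

# Same pattern as in the YAML above, with the YAML string escaping removed.
DN_PATTERN = re.compile(
    r"^sys/(?P<interface_type>[a-z]+)-\[(?P<interface>[^\]]+)\]/"
)


def labels_from_dn(dn: str) -> dict:
    """Return the named groups captured from a dn, or an empty dict if it does not match."""
    match = DN_PATTERN.match(dn)
    return match.groupdict() if match else {}


print(labels_from_dn("sys/phys-[eth1/34]/phys"))
# {'interface_type': 'phys', 'interface': 'eth1/34'}
```

Each named group becomes one label, so `interface_type` and `interface` in the pattern line up with the label names chosen for the two metrics.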

Now Copy Paste this into the config file.
Now Copy/Paste this into the config file.

Based on the service discovery we executed before we have all the required infos to run a query against a switch, the aci-exporter URL has the following format:

@@ -295,7 +295,7 @@ Selection between APIC or Switches is done by using different re-labeling config

To add a new query follow these steps:

- Develop a new ACI-Exporter query and test is with `curl` to ensure it returns the expected data
- Develop a new aci-exporter query and test is with `curl` to ensure it returns the expected data
- Add the query to one of the files in the [config.d](../charts/aci-monitoring-stack/config.d) folder or create a new file if your query doesn't belong to any of the existing categories.
- add the query name in the `queries` list of the APIC or Switches inside the [ScrapeConfigs](../charts/aci-monitoring-stack/templates/prometheus/configmap-config.yaml).

4 changes: 2 additions & 2 deletions docs/minikube.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

This can be used to run aci-monitoring-stack locally (say on your laptop).

By default minikube only provide access locally and this is an issue for logs ingestion however for a lab you can configure HAProxy to expose you Minikube instance over the Host IP Address. This implies that you should configure all your External Services as `NodePort` and configure HAProxy to send the traffic to the correct `NodePort`
By default, minikube only provide access locally and this is an issue for logs ingestion however for a lab you can configure HAProxy to expose you Minikube instance over the Host IP Address. This implies that you should configure all your External Services as `NodePort` and configure HAProxy to send the traffic to the correct `NodePort`

I have configured minikube with 4GB of RAM and 4 CPUs and that was plenty to monitor a small 10 switch ACI Fabric.

@@ -61,7 +61,7 @@ While installing Minikube I hit the following issues:

## minikube/podman wrong CNI Version

If minikube dosen't start and complains about the wrong CNI version for bridge open /etc/cni/net.d/11-crio-ipv4-bridge.conflist and set "cniVersion": "0.4.0" from 1.0.0
If minikube doesn't start and complains about the wrong CNI version for bridge open /etc/cni/net.d/11-crio-ipv4-bridge.conflist and set "cniVersion": "0.4.0" from 1.0.0

## Prometheus does not install under minikube/podman
