RDI additive changes #2768

Merged
merged 2 commits into from
Aug 17, 2023

92 changes: 30 additions & 62 deletions content/rdi/monitoring-guide.md
@@ -5,103 +5,71 @@ description: Monitor RDI Engine and data processing jobs
weight: 70
alwaysopen: false
categories: ["redis-di"]
aliases:
aliases:
---

RDI Engine accumulates operating statistics that you can:
RDI Engine accumulates operating statistics that you can:

* Observe and analyze to discover various types of problems.
* Use for optimization purposes.
- Observe and analyze to discover various types of problems.
- Use for optimization purposes.

## Console metrics

Some basic RDI operating metrics can be displayed using the [`redis-di status`]({{<relref "/rdi/reference/cli/redis-di-status">}}) command. The command provides information about the current RDI Engine status, target database configuration and processing statistics broken down by stream. This tool is intended to be used by an Operator to get the current snapshot of the system.
RDI can display its operating metrics in the console using the [`redis-di status`]({{<relref "/rdi/reference/cli/redis-di-status">}}) command. The command provides information about the current RDI Engine status, target database configuration and processing statistics broken down by stream. This tool is intended to be used by an Operator to get the current snapshot of the system as well as the ongoing data processing monitoring (when used in live mode).

## Prometheus integration

RDI allows exporting its metrics to [Prometheus](https://prometheus.io/) and visualizing them in [Grafana](https://grafana.com/). Currently, RDI relies on the external [OSS Redis Exporter](https://github.com/oliver006/redis_exporter) that connects to RDI database to source the metrics and serve them for Prometheus job scraping. The diagram below describes this flow and components involved.
RDI allows collecting and exporting its metrics to [Prometheus](https://prometheus.io/) and visualizing them in [Grafana](https://grafana.com/). Operator can start the built-in exporter using the [`redis-di monitor`]({{<relref "/rdi/reference/cli/redis-di-monitor">}}) command. The diagram describes this flow and components involved:

![Metrics architecture](/images/rdi/monitoring-architecture.png)
![Metrics architecture](/images/rdi/monitoring-diagram.png)

> Note: The host names and ports above are examples only and can be changed as needed.

## Install and configure the Exporter
### Test RDI metrics exporter

OSS Metrics Exporter is available as a [pre-built docker container](https://hub.docker.com/r/oliver006/redis_exporter) so that you integrate it into container-based or Kubernetes environments. Alternatively, you can be [build and install](https://hub.docker.com/r/oliver006/redis_exporter) it as a binary to any compute node that has access to an RDI database that needs to be monitored.
Start the RDI metrics exporter using the command below:

To connect OSS Metrics Exporter to the RDI database, provide the following information by using the command-line options or environment variables:

| Variable Name | Description | Example |
| --------------------- | ------------------------------------------------------------------------------------------------ | ------------------------ |
| REDIS_ADDR | RDI database host/port | redis://localhost:12001 |
| REDIS_USER | RDI database user (optional, if `default` is used) | RedisUser |
| REDIS_PASSWORD | RDI database password | Redis123 |
| REDIS_EXPORTER_SCRIPT | Lua script that triggers the Metrics Collector [(see below)](#lua-script-for-metrics-collection) | /scripts/rdi_metrics.lua |

### Lua script for metrics collection

Create the following [Lua script](https://redis.io/docs/manual/programmability/eval-intro/) and make it available for the OSS Metrics Exporter by using the `REDIS_EXPORTER_SCRIPT` environment variable:

```lua
return (redis.call('RG.TRIGGER', 'GetMetrics', '*'))[1]
```bash
redis-di monitor
```

### Test OSS Metrics Exporter
> Note: The default port for the exporter is `9121`. If you need to change it, use the `--exporter-port` option. The default metrics collection interval is 5 seconds. If you need to change it, use the `--collect-interval` option.

Start the OSS Metrics Exporter and navigate to `http://localhost:9121/metrics` to see the exported metrics. You should be able to see the following metric:
Then navigate to `http://localhost:9121/` to see the exported metrics. You should be able to see the following metric:

```
redis_script_values{key="rdi_engine_status{status=RUNNING}"} 1
rdi_engine_state{state="RUNNING",sync_mode="UNKNOWN"} 1.0
```

> Note: The actual value of the metric above can be 0, if you haven't started RDI Engine yet. You must have the RDI database created and configured before observing any metrics. If you are not seeing it or getting an error value instead, this indicates that either the OSS Metrics Exporter or the RDI database is not properly configured.
> Note: The actual value of the metric above can be 0, if you haven't started RDI Engine yet (in which case, the `state` label should indicate that as well). You must have the RDI database created and configured before observing any metrics. If you are not seeing it or getting an error value instead, this indicates that the RDI database is not properly configured.
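
As a rough illustration of the check described in the note above, the following Python sketch parses exporter output in the Prometheus text format and reports whether `rdi_engine_state` shows a running engine. The helper names `parse_sample` and `engine_is_running` are hypothetical and not part of RDI, and the sketch assumes the `state` label and value shown in the sample output line above.

```python
import re

# Matches one Prometheus text-format sample, for example:
#   rdi_engine_state{state="RUNNING",sync_mode="UNKNOWN"} 1.0
# An optional trailing integer timestamp is tolerated.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
# Matches label pairs of the form key="value" inside the braces.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None if it does not parse."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def engine_is_running(exposition_text):
    """Hypothetical helper: True if rdi_engine_state reports value 1 with state RUNNING."""
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        sample = parse_sample(line)
        if sample is not None and sample[0] == "rdi_engine_state":
            _, labels, value = sample
            return value == 1.0 and labels.get("state") == "RUNNING"
    # Metric not present at all: treat as not running.
    return False
```

To try this against a live exporter, the text could be fetched with the standard library's `urllib.request.urlopen`, assuming the exporter listens on the default address `http://localhost:9121/` mentioned above.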

## Configure Prometheus

Next, configure the Prometheus scraper. Edit the `prometheus.yml` file to add the following scraper config:

```yaml
scrape_configs:
# scrape OSS Metrics Exporter
- job_name: redis-exporter
# scrape RDI metrics exporter
- job_name: rdi-exporter
static_configs:
- targets: ["redis-exporter:9121"]
metric_relabel_configs:
- source_labels: [key]
regex: ".+operation=([^,}]+).+"
target_label: "operation"
replacement: "${1}"
- source_labels: [key]
regex: ".+data_source=([^,]+).+"
target_label: "data_source"
replacement: "${1}"
- source_labels: [key]
regex: ".+status=([^,]+).+"
target_label: "status"
replacement: "${1}"
- source_labels: [key]
regex: "([^{]+){.+"
target_label: "metric_name"
replacement: "${1}"
- source_labels: [__name__, metric_name]
regex: "redis_script_values;(.*)"
target_label: __name__
replacement: "${1}"
- action: labeldrop
regex: "metric_name|key"
```

> Note: Make sure the `targets` value above points to the host and port you configured to run the OSS Metrics Exporter.
> Notes:

- Make sure the `targets` value above points to the host and port you configured to run the RDI metrics exporter.
- The `scrape_interval` setting in Prometheus should be the same or more than the `collect_interval` setting for the exporter. For example, if the `collect_interval` is set to 5 seconds, the `scrape_interval` should also be set to 5 seconds or more. If the `scrape_interval` is set to less than the `collect_interval`, Prometheus will scrape the exporter before it has a chance to collect and refresh metrics, and you will see the same values duplicated in Prometheus.
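
The interval rule in the second note can be sketched as a small Python check. This is only an illustration: `duration_to_ms` and `scrape_interval_ok` are hypothetical helper names, the parser handles the usual Prometheus duration syntax (for example `5s`, `1m30s`, `500ms`) in a simplified way, and the collection interval is assumed to be given in seconds, as described for `--collect-interval` above.

```python
import re

# Milliseconds per Prometheus duration unit.
_UNITS_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,
}
# One "<number><unit>" component, e.g. "30s" or "500ms".
_PART_RE = re.compile(r"(\d+)(ms|[smhdwy])")


def duration_to_ms(text):
    """Convert a duration string such as '1m30s' to milliseconds (simplified parser)."""
    position = 0
    total = 0
    for match in _PART_RE.finditer(text):
        # Components must follow each other with no stray characters in between.
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS_MS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def scrape_interval_ok(scrape_interval, collect_interval_seconds):
    """True if Prometheus scrapes no more often than the exporter collects."""
    return duration_to_ms(scrape_interval) >= collect_interval_seconds * 1000
```

For example, with the default collection interval of 5 seconds, `scrape_interval_ok("15s", 5)` holds, while `scrape_interval_ok("1s", 5)` does not, which is the case where duplicated values would show up in Prometheus.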

## Test Prometheus scraper

### Test Prometheus scraper
After the scraper config is added to the Prometheus configuration, you should now be able to navigate to `http://<HOSTNAME>:9090/graph` (replace `<HOSTNAME>` with a valid Prometheus hostname or IP address).

After the scraper config is added to the Prometheus configuration, you should now be able to navigate to `http://<HOSTNAME>:9090/graph` (replace `<HOSTNAME>` with a valid Prometheus hostname or IP address).
Explore RDI metrics using the [expression browser](https://prometheus.io/docs/visualization/browser/).
Explore RDI metrics using the [expression browser](https://prometheus.io/docs/visualization/browser/).

In the expression box, type in a metric name (for example, `rdi_engine_status`) and select `Enter` or the `Execute` button to see the following result:
In the expression box, type in a metric name (for example, `rdi_engine_state`) and select `Enter` or the `Execute` button to see the following result:

```
rdi_engine_status{instance="redis-exporter:9121", job="redis-exporter", status="RUNNING"} 1
rdi_engine_state{instance="redis-exporter:9121", job="rdi-exporter", status="RUNNING", sync_mode="UNKNOWN"} 1
```

> Note: You may see more than just one RDI metric, if RDI Engine has already processed any data. If you do not see any metrics please check your scraper job configuration in Prometheus.
@@ -114,9 +82,9 @@ Optionally, you may deploy the sample Grafana dashboard to monitor the status of

1. Log into Grafana and navigate to the list of dashboards, then choose **New -> Import**:

![New dashboard creation](/images/rdi/monitoring-grafana-new-dash.png.png)
![New dashboard creation](/images/rdi/monitoring-grafana-new-dash.png)

1. On the next screen, select **Upload JSON file** and upload the file you downloaded in step 1. Make sure you select the data source that is connected to the OSS Metrics Exporter:
1. On the next screen, select **Upload JSON file** and upload the file you downloaded in step 1. Make sure you select the data source that is connected to the RDI metrics exporter:

![Data source connection](/images/rdi/monitoring-grafana-dash-configure.png)

@@ -130,6 +98,6 @@ This list shows exported RDI metrics along with their descriptions:

| Metric Name | Labels | Values | Description |
| --------------------------- | ---------------------------------------------------------------------------------------------------------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| rdi_engine_status | {status=RUNNING \| STOPPED} | 0, 1 | Status of RDI Engine. 0 - RDI Engine is stopped, 1 - RDI Engine is running. |
| rdi_engine_state | {status=RUNNING \| STOPPED, sync_mode=SNAPSHOT \| STREAMING \| UNKNOWN} | 0, 1 | Status of RDI Engine. 0 - RDI Engine is stopped, 1 - RDI Engine is running. Sync mode label indicates the last reported ingest synchronization mode. |
| rdi_incoming_entries | {data_source=`<stream name>`, operation=pending \| inserted \| updated \| deleted \| filtered \| rejected} | `<count of records>` | Counters, indicating the number of operations performed for each stream. |
| rdi_stream_event_latency_ms | {data_source=`<stream name>`} | 0 - &infin; | Latency calculated for each stream. Indicates the time in milliseconds the first available record has spent in the stream waiting to be processed by RDI Engine. If no records pending it will always return zero. |
Binary file removed static/images/rdi/monitoring-architecture.png
Binary file not shown.
Binary file added static/images/rdi/monitoring-diagram.png