diff --git a/content/rdi/monitoring-guide.md b/content/rdi/monitoring-guide.md
index 6aaae301a03..47ca7fde6ae 100644
--- a/content/rdi/monitoring-guide.md
+++ b/content/rdi/monitoring-guide.md
@@ -5,56 +5,43 @@ description: Monitor RDI Engine and data processing jobs
 weight: 70
 alwaysopen: false
 categories: ["redis-di"]
-aliases:
+aliases:
 ---

-RDI Engine accumulates operating statistics that you can:
+RDI Engine accumulates operating statistics that you can:

-* Observe and analyze to discover various types of problems.
-* Use for optimization purposes.
+- Observe and analyze to discover various types of problems.
+- Use for optimization purposes.

 ## Console metrics

-Some basic RDI operating metrics can be displayed using the [`redis-di status`]({{}}) command. The command provides information about the current RDI Engine status, target database configuration and processing statistics broken down by stream. This tool is intended to be used by an Operator to get the current snapshot of the system.
+RDI can display its operating metrics in the console using the [`redis-di status`]({{}}) command. The command provides information about the current RDI Engine status, target database configuration and processing statistics broken down by stream. This tool is intended to be used by an Operator to get the current snapshot of the system as well as the ongoing data processing monitoring (when used in live mode).

 ## Prometheus integration

-RDI allows exporting its metrics to [Prometheus](https://prometheus.io/) and visualizing them in [Grafana](https://grafana.com/). Currently, RDI relies on the external [OSS Redis Exporter](https://github.com/oliver006/redis_exporter) that connects to RDI database to source the metrics and serve them for Prometheus job scraping. The diagram below describes this flow and components involved.
+RDI allows collecting and exporting its metrics to [Prometheus](https://prometheus.io/) and visualizing them in [Grafana](https://grafana.com/). Operator can start the built-in exporter using the [`redis-di monitor`]({{}}) command. The diagram describes this flow and components involved:

-![Metrics architecture](/images/rdi/monitoring-architecture.png)
+![Metrics architecture](/images/rdi/monitoring-diagram.png)

 > Note: The host names and ports above are examples only and can be changed as needed.

-## Install and configure the Exporter
+### Test RDI metrics exporter

-OSS Metrics Exporter is available as a [pre-built docker container](https://hub.docker.com/r/oliver006/redis_exporter) so that you integrate it into container-based or Kubernetes environments. Alternatively, you can be [build and install](https://hub.docker.com/r/oliver006/redis_exporter) it as a binary to any compute node that has access to an RDI database that needs to be monitored.
+Start the RDI metrics exporter using the command below:

-To connect OSS Metrics Exporter to the RDI database, provide the following information by using the command-line options or environment variables:
-
-| Variable Name | Description | Example |
-| --------------------- | ------------------------------------------------------------------------------------------------ | ------------------------ |
-| REDIS_ADDR | RDI database host/port | redis://localhost:12001 |
-| REDIS_USER | RDI database user (optional, if `default` is used) | RedisUser |
-| REDIS_PASSWORD | RDI database password | Redis123 |
-| REDIS_EXPORTER_SCRIPT | Lua script that triggers the Metrics Collector [(see below)](#lua-script-for-metrics-collection) | /scripts/rdi_metrics.lua |
-
-### Lua script for metrics collection
-
-Create the following [Lua script](https://redis.io/docs/manual/programmability/eval-intro/) and make it available for the OSS Metrics Exporter by using the `REDIS_EXPORTER_SCRIPT` environment variable:
-
-```lua
-return (redis.call('RG.TRIGGER', 'GetMetrics', '*'))[1]
+```bash
+redis-di monitor
 ```

-### Test OSS Metrics Exporter
+> Note: The default port for the exporter is `9121`. If you need to change it, use the `--exporter-port` option. The default metrics collection interval is 5 seconds. If you need to change it, use the `--collect-interval` option.

-Start the OSS Metrics Exporter and navigate to `http://localhost:9121/metrics` to see the exported metrics. You should be able to see the following metric:
+Then navigate to `http://localhost:9121/` to see the exported metrics. You should be able to see the following metric:

 ```
-redis_script_values{key="rdi_engine_status{status=RUNNING}"} 1
+rdi_engine_state{state="RUNNING",sync_mode="UNKNOWN"} 1.0
 ```

-> Note: The actual value of the metric above can be 0, if you haven't started RDI Engine yet. You must have the RDI database created and configured before observing any metrics. If you are not seeing it or getting an error value instead, this indicates that either the OSS Metrics Exporter or the RDI database is not properly configured.
+> Note: The actual value of the metric above can be 0, if you haven't started RDI Engine yet (in which case, the `state` label should indicate that as well). You must have the RDI database created and configured before observing any metrics. If you are not seeing it or getting an error value instead, this indicates that the RDI database is not properly configured.

 ## Configure Prometheus

@@ -62,46 +49,27 @@ Next, configure the Prometheus scraper. Edit the `prometheus.yml` file to add th

 ```yaml
 scrape_configs:
-  # scrape OSS Metrics Exporter
-  - job_name: redis-exporter
+  # scrape RDI metrics exporter
+  - job_name: rdi-exporter
     static_configs:
       - targets: ["redis-exporter:9121"]
-    metric_relabel_configs:
-      - source_labels: [key]
-        regex: ".+operation=([^,}]+).+"
-        target_label: "operation"
-        replacement: "${1}"
-      - source_labels: [key]
-        regex: ".+data_source=([^,]+).+"
-        target_label: "data_source"
-        replacement: "${1}"
-      - source_labels: [key]
-        regex: ".+status=([^,]+).+"
-        target_label: "status"
-        replacement: "${1}"
-      - source_labels: [key]
-        regex: "([^{]+){.+"
-        target_label: "metric_name"
-        replacement: "${1}"
-      - source_labels: [__name__, metric_name]
-        regex: "redis_script_values;(.*)"
-        target_label: __name__
-        replacement: "${1}"
-      - action: labeldrop
-        regex: "metric_name|key"
 ```

-> Note: Make sure the `targets` value above points to the host and port you configured to run the OSS Metrics Exporter.
+> Notes:
+
+- Make sure the `targets` value above points to the host and port you configured to run the RDI metrics exporter.
+- The `scrape_interval` setting in Prometheus should be the same or more than the `collect_interval` setting for the exporter. For example, if the `collect_interval` is set to 5 seconds, the `scrape_interval` should also be set to 5 seconds or more. If the `scrape_interval` is set to less than the `collect_interval`, Prometheus will scrape the exporter before it has a chance to collect and refresh metrics, and you will see the same values duplicated in Prometheus.
+
+## Test Prometheus scraper

-### Test Prometheus scraper
+After the scraper config is added to the Prometheus configuration, you should now be able to navigate to `http://:9090/graph` (replace `` with a valid Prometheus hostname or IP address).

-After the scraper config is added to the Prometheus configuration, you should now be able to navigate to `http://:9090/graph` (replace `` with a valid Prometheus hostname or IP address).
-Explore RDI metrics using the [expression browser](https://prometheus.io/docs/visualization/browser/).
+Explore RDI metrics using the [expression browser](https://prometheus.io/docs/visualization/browser/).

-In the expression box, type in a metric name (for example, `rdi_engine_status`) and select `Enter` or the `Execute` button to see the following result:
+In the expression box, type in a metric name (for example, `rdi_engine_state`) and select `Enter` or the `Execute` button to see the following result:

 ```
-rdi_engine_status{instance="redis-exporter:9121", job="redis-exporter", status="RUNNING"} 1
+rdi_engine_state{instance="redis-exporter:9121", job="rdi-exporter", status="RUNNING", sync_mode="UNKNOWN"} 1
 ```

 > Note: You may see more than just one RDI metric, if RDI Engine has already processed any data. If you do not see any metrics please check your scraper job configuration in Prometheus.
@@ -114,9 +82,9 @@ Optionally, you may deploy the sample Grafana dashboard to monitor the status of

 1. Log into Grafana and navigate to the list of dashboards, then choose **New -> Import**:

-![New dashboard creation](/images/rdi/monitoring-grafana-new-dash.png.png)
+![New dashboard creation](/images/rdi/monitoring-grafana-new-dash.png)

-1. On the next screen, select **Upload JSON file** and upload the file you downloaded in step 1. Make sure you select the data source that is connected to the OSS Metrics Exporter:
+1. On the next screen, select **Upload JSON file** and upload the file you downloaded in step 1. Make sure you select the data source that is connected to the RDI metrics exporter:

 ![Data source connection](/images/rdi/monitoring-grafana-dash-configure.png)

@@ -130,6 +98,6 @@ This list shows exported RDI metrics along with their descriptions:

 | Metric Name | Labels | Values | Description |
 | --------------------------- | ---------------------------------------------------------------------------------------------------------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| rdi_engine_status | {status=RUNNING \| STOPPED} | 0, 1 | Status of RDI Engine. 0 - RDI Engine is stopped, 1 - RDI Engine is running. |
+| rdi_engine_state | {status=RUNNING \| STOPPED, sync_mode=SNAPSHOT \| STREAMING \| UNKNOWN} | 0, 1 | Status of RDI Engine. 0 - RDI Engine is stopped, 1 - RDI Engine is running. Sync mode label indicates the last reported ingest synchronization mode. |
 | rdi_incoming_entries | {data_source=``, operation=pending \| inserted \| updated \| deleted \| filtered \| rejected} | `` | Counters, indicating the number of operations performed for each stream. |
 | rdi_stream_event_latency_ms | {data_source=``} | 0 - ∞ | Latency calculated for each stream. Indicates the time in milliseconds the first available record has spent in the stream waiting to be processed by RDI Engine. If no records pending it will always return zero. |
diff --git a/static/images/rdi/monitoring-architecture.png b/static/images/rdi/monitoring-architecture.png
deleted file mode 100644
index 54048d25f0f..00000000000
Binary files a/static/images/rdi/monitoring-architecture.png and /dev/null differ
diff --git a/static/images/rdi/monitoring-diagram.png b/static/images/rdi/monitoring-diagram.png
new file mode 100644
index 00000000000..457a02997c8
Binary files /dev/null and b/static/images/rdi/monitoring-diagram.png differ
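
The updated guide shows the exporter emitting a sample line such as `rdi_engine_state{state="RUNNING",sync_mode="UNKNOWN"} 1.0`. The sketch below splits a line of that shape into metric name, labels, and value, which may help when checking exporter output by hand. It is a minimal illustration only: `parse_sample` is a hypothetical helper name, not part of RDI or Prometheus, and it assumes label values contain no escaped quotes.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
# One key="value" pair inside the braces (no escaped quotes handled).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Hypothetical helper: return (name, labels, value) for one sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Sample line taken from the guide text in this change.
name, labels, value = parse_sample(
    'rdi_engine_state{state="RUNNING",sync_mode="UNKNOWN"} 1.0'
)
print(name, labels, value)
```

For anything beyond a quick check, a maintained parser for the Prometheus text format is likely a better fit than a regular expression.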