udpdate SLO doc images (#23036)
* udpdate SLO doc images

* address some comments

* blur name in images

* remove link
roxanne-moslehi authored May 10, 2024
1 parent e985be5 commit 503bc15
Showing 9 changed files with 18 additions and 27 deletions.
@@ -32,9 +32,9 @@ For a description of key terminology around SLOs, including *error budgets*, see
4. Set an alert to trigger when the percentage of the error budget consumed is above the `threshold`.
over the past `target` number of days.
4. Add [Notification information][5] into the **Say what's happening** and **Notify your team** sections.
5. Click the **Save and Set Alert** button on the SLO configuration page.
5. Click the **Create & Set Alert** button on the SLO configuration page.

{{< img src="service_management/service_level_objectives/save_set_alert.png" alt="Save SLO and set up an error budget alert">}}
{{< img src="service_management/service_level_objectives/slo_create_set_alert.png" alt="Create SLO and set up an error budget alert" style="width:80%;">}}

### API and Terraform

12 changes: 5 additions & 7 deletions content/en/service_management/service_level_objectives/metric.md
@@ -18,11 +18,11 @@ further_reading:

Metric-based SLOs are useful for a count-based stream of data where you are differentiating good and bad events. A metric query uses the sum of the good events divided by the sum of total events over time to calculate a Service Level Indicator (or SLI). You can use any metric to create SLOs, including custom metrics generated from [APM spans][1], [RUM events][2], and [logs][3]. For an overview on how SLOs are configured and calculated, see the [Service Level Objective][4] page.
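
The calculation described above (sum of good events divided by sum of total events) can be sketched in a few lines of Python. This is a minimal illustration, not Datadog's implementation; the function name `metric_sli` and the sample counts are assumptions made for the example.

```python
from typing import Sequence


def metric_sli(good_counts: Sequence[float], total_counts: Sequence[float]) -> float:
    """Return the SLI as a percentage: sum(good) / sum(total) * 100.

    Hypothetical helper for illustration only; each sequence holds
    per-interval event counts over the SLO time window.
    """
    total = sum(total_counts)
    if total == 0:
        # With no events in the window, the ratio is undefined.
        raise ValueError("no events in the window; the SLI is undefined")
    return 100.0 * sum(good_counts) / total


# Made-up counts for three intervals.
good = [990, 995, 1000]
total = [1000, 1000, 1000]
print(f"SLI: {metric_sli(good, total):.2f}%")
```

Summing before dividing weights every event equally, so a busy interval influences the SLI more than a quiet one.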

{{< img src="service_management/service_level_objectives/metric-based-slo-example.png" alt="example metric-based SLO" >}}
{{< img src="service_management/service_level_objectives/metric_slo_side_panel.png" alt="example metric-based SLO" >}}

## Setup

On the [SLO status page][5], select **New SLO +**. Then select [**Metric**][6].
On the [SLO status page][5], click **+ New SLO**. Then select, [**By Count**][6].

### Define queries

@@ -38,18 +38,16 @@ Why is `HTTP 3xx` excluded? - These are typically redirects and should not count

#### Multi-group for metric-based SLIs

Metric-based SLIs allow you to focus on the most important attributes of your SLIs. You can add groups to your metric-based SLIs in the editor by using tags like `datacenter`, `partition`, `availability-zone`, `resource`, or any other relevant group:
Metric-based SLIs allow you to focus on the most important attributes of your SLIs. You can add groups to your metric-based SLIs in the editor by using tags like `datacenter`, `env`, `availability-zone`, `resource`, or any other relevant group:

{{< img src="service_management/service_level_objectives/metric_editor.png" alt="grouped metric-based SLO editor" >}}
{{< img src="service_management/service_level_objectives/metric_slo_creation.png" alt="grouped metric-based SLO editor" >}}

By grouping these SLIs you can visualize each individual group's status, good request counts, and remaining error budget on the detail panel:

{{< img src="service_management/service_level_objectives/metric_results.png" alt="metric-based SLO group results" >}}
{{< img src="service_management/service_level_objectives/metric_slo_history_groups.png" alt="metric-based SLO group results" >}}

By default, the bar graph shows the overall counts of good and bad requests for the entire SLO. You can scope the bar graph down to an individual group's good and bad requests counts by clicking on its corresponding row in the table. In addition, you can also choose to show or hide good request counts or bad request counts by selecting the appropriate option in the legend directly below the bar graph.

**Note**: If you are using monitor-based SLIs, you can also [view monitor groups][8].

### Set your SLO targets

An SLO target is comprised of the target percentage and the time window. When you set a target for a metric-based SLO the target percentage specifies what portion of the total events specified in the denominator of the SLO should be good events, while the time window specifies the rolling time period over which the target should be tracked.
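
As a rough sketch of how a target percentage translates into an error budget for a count-based SLO: the budget is the share of total events allowed to be bad, and the remaining budget is whatever has not yet been consumed by bad events. The helper name `error_budget_remaining` and the numbers below are assumptions for illustration, not Datadog's implementation.

```python
def error_budget_remaining(good: float, total: float, target_pct: float) -> float:
    """Return the fraction of the error budget still available.

    Hypothetical helper: `good` and `total` are event counts over the
    rolling time window; `target_pct` is the SLO target (e.g. 99.9).
    The result is negative once the budget is overspent.
    """
    allowed_bad = total * (1 - target_pct / 100.0)
    if allowed_bad <= 0:
        raise ValueError("a 100% target (or an empty window) leaves no error budget")
    bad = total - good
    return 1 - bad / allowed_bad


# Made-up window: 100,000 events, 50 of them bad, 99.9% target.
remaining = error_budget_remaining(good=99_950, total=100_000, target_pct=99.9)
print(f"Error budget remaining: {remaining:.1%}")
```

With a 99.9% target over 100,000 events, roughly 100 bad events are allowed, so 50 bad events consume about half of the budget.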
29 changes: 11 additions & 18 deletions content/en/service_management/service_level_objectives/monitor.md
@@ -14,11 +14,11 @@ further_reading:
---

## Overview
To build an SLO from new or existing Datadog monitors, create a monitor-based SLO.
To build an SLO from new or existing Datadog monitors, create a monitor-based SLO. Using a monitor-based SLO, you can calculate the Service Level Indicator (SLI) by dividing the amount of time your system exhibits good behavior by the total time.

Time-based data sets usually map well to monitor-based SLOs. Using a monitor-based SLO, you can calculate the Service Level Indicator (SLI) by dividing the amount of time your system exhibits good behavior by the total time.
<div class="alert alert-info">Time Slice SLOs are another way to create SLOs with a time-based SLI calculation. With Time Slice SLOs, you can create an uptime SLO without going through a monitor, so you don’t have to create and maintain both a monitor and an SLO.</div>

{{< img src="service_management/service_level_objectives/grouped_monitor_based_slo.png" alt="monitor-based SLO example" >}}
{{< img src="service_management/service_level_objectives/monitor_slo_side_panel.png" alt="monitor-based SLO example" >}}

## Prerequisites

@@ -31,24 +31,23 @@ Datadog monitor-based SLOs support the following monitor types:

## Setup

On the [SLO status page][2], select **New SLO**.

Under **Define the source**, select **Monitor Based**.
On the [SLO status page][2], click **+ New SLO**. Then, select **By Monitor Uptime**.

### Define queries


In the search box, start typing the name of a monitor. A list of matching monitors appears. Click on a monitor name to add it to the source list.

If you're only using a single multi alert monitor in an SLO, you can optionally select "Calculate on selected groups" and pick up to 20 groups. Group selection is not supported for SLOs that contain multiple monitors. For SLOs with multiple monitors, you can add up to 20 monitors.
**Notes**:

- If you're using a single multi alert monitor in an SLO, you can optionally select "Calculate on selected groups" and pick up to 20 groups.
- If you're adding multiple monitors to your SLO, group selection is not supported. You can add up to 20 monitors.

### Set your SLO targets

Select a **target** percentage, **time window**, and optional **warning** level.

The target percentage specifies the portion of time the underlying monitor(s) of the SLO should not be in the ALERT state.

The time window specifies the rolling period the SLO runs its calculation.
The target percentage specifies the portion of time the underlying monitor(s) of the SLO should not be in the ALERT state. The time window specifies the rolling period the SLO runs its calculation.
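
To make the time-based calculation concrete, here is a minimal Python sketch that treats any time outside the ALERT state as good time and compares the resulting percentage with the target. The function name `uptime_sli` and the sample durations are assumptions for illustration; Datadog's actual calculation may differ in its details.

```python
from datetime import timedelta


def uptime_sli(window: timedelta, alert_time: timedelta) -> float:
    """Return the percentage of the window spent outside the ALERT state.

    Hypothetical helper: `window` is the rolling time window and
    `alert_time` is the total time the monitor spent in ALERT within it.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    if not timedelta(0) <= alert_time <= window:
        raise ValueError("alert_time must fall between zero and the window length")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * ((window - alert_time) / window)


# Made-up example: 10 minutes in ALERT over a 7-day window, 99.9% target.
sli = uptime_sli(window=timedelta(days=7), alert_time=timedelta(minutes=10))
target = 99.9
print(f"SLI: {sli:.3f}% (meets {target}% target: {sli >= target})")
```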

Depending on the value of the SLI, the Datadog UI displays the SLO status in a different color:
- While the SLI remains above the target, the UI displays the SLO status in green.
@@ -67,13 +66,11 @@ If you need finer granularity than the once a minute monitor evaluation, conside

### Add name and tags

Choose a name and extended description for your SLO. Select any tags you would like to associate with your SLO.

Select **Save & Exit** to save your new SLO.
Choose a name and extended description for your SLO. Select any tags you would like to associate with your SLO. Select **Create** or **Create & Set Alert** to save your new SLO.

## Status calculation

{{< img src="service_management/service_level_objectives/aggregate_slo.jpg" alt="SLO detail showing 99 percent green with 8 groups aggregated" >}}
{{< img src="service_management/service_level_objectives/monitor_slo_overall_status.png" alt="Monitor-based SLO with groups" >}}

Datadog calculates the overall SLO status as the uptime percentage across all monitors or monitor groups, unless specific groups have been selected:
- If specific groups have been selected (up to 20), the SLO status is calculated with only those groups. The UI displays all selected groups.
@@ -129,10 +126,6 @@ SLO Replay also triggers when you change the underlying metric monitor's query t

**Note:** SLOs based on Synthetic tests or Service Checks do not support SLO Replay.

## Other considerations

Confirm you are using the preferred SLI type for your use case. Datadog supports monitor-based SLIs and [metric-based][3] SLIs.

## Further Reading

{{< partial name="whats-next/whats-next.html" >}}