Adding new Metrics Volume Management Doc #21935

Merged
merged 28 commits into from
Aug 2, 2024
Changes from 20 commits
Commits
192477b
Create volume_management page for Metris
ijkaylin Jan 3, 2024
8e1b096
Update volume_management
ijkaylin Jan 3, 2024
26fcd11
Update volume_management
ijkaylin Feb 23, 2024
132463b
Create volume
ijkaylin Feb 23, 2024
189bea4
Delete static/images/metrics/volume
ijkaylin Feb 23, 2024
266d2cf
Create Test
ijkaylin Feb 23, 2024
aebdcaa
Add files via upload
ijkaylin Feb 23, 2024
9e996eb
Update volume_management
ijkaylin Feb 23, 2024
112699b
Merge branch 'master' into kathy.lin/metricsvolumepage
hestonhoffman Feb 23, 2024
9c33846
Merge branch 'master' into kathy.lin/metricsvolumepage
estherk15 Mar 13, 2024
e30431b
Documentation editorial review (#22185)
estherk15 Jul 1, 2024
1609cb7
Resolve merge conflicts
estherk15 Jul 1, 2024
1d8a07e
Re-review of the overview
estherk15 Jul 1, 2024
d8fb405
Fix links for change alert monitor reference
estherk15 Jul 1, 2024
7c8efc0
Merge branch 'master' into kathy.lin/metricsvolumepage
estherk15 Jul 2, 2024
1da4f50
Update volume.md
ijkaylin Jul 10, 2024
deeabac
Add files via upload
ijkaylin Jul 24, 2024
407fdd8
Update volume.md
ijkaylin Jul 24, 2024
1c97434
Update volume.md
ijkaylin Jul 31, 2024
c3c5821
Update volume.md
ijkaylin Jul 31, 2024
c21ba8f
Merge branch 'master' into kathy.lin/metricsvolumepage
estherk15 Aug 2, 2024
ab8e963
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
3369985
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
be425db
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
c877916
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
d2fbe69
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
d8624c5
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
44da270
Update content/en/metrics/volume.md
ijkaylin Aug 2, 2024
11 changes: 8 additions & 3 deletions config/_default/menus/main.en.yaml
@@ -1714,21 +1714,26 @@ menu:
url: metrics/summary/
parent: metrics_top_level
weight: 6
+ - name: Volume
+ identifier: metrics_volume
+ url: metrics/volume/
+ parent: metrics_top_level
+ weight: 7
- name: Advanced Filtering
url: metrics/advanced-filtering/
parent: metrics_top_level
identifier: metrics_advanced_filtering
- weight: 7
+ weight: 8
- name: Metrics Without Limits™
identifier: metrics_without_limits
url: metrics/metrics-without-limits/
parent: metrics_top_level
- weight: 8
+ weight: 9
- name: Guides
url: metrics/guide
parent: metrics_top_level
identifier: metrics_guide
- weight: 9
+ weight: 10
- name: Watchdog
url: watchdog/
identifier: watchdog_top_level
120 changes: 120 additions & 0 deletions content/en/metrics/volume.md
@@ -0,0 +1,120 @@
---
title: Volume
description: "Understand and manage your custom metrics usage and costs with the Volume Management page."
further_reading:
- link: "/metrics/summary/"
tag: "Documentation"
text: "Metrics Summary"
- link: "/metrics/metrics-without-limits/"
tag: "Documentation"
text: "Metrics without Limits™"
- link: "/metrics/custom_metrics/"
tag: "Documentation"
text: "Custom Metrics"

---

## Overview

{{< img src="metrics/volume/metrics_volume_overview.png" alt="Metrics Volume page set to a timeframe of the past hour (by default) showing the search, filter, facet and column sorting features available" style="width:100%;" >}}

Cloud-based applications generate massive amounts of data, which can be overwhelming as your organization scales. Observability costs become a significant budget item, but core observability teams often lack visibility into what is valuable to each engineering team. Individual teams, in turn, have little incentive to help manage this growth because they have limited insight into the costs of the metrics and tags they submit.

Datadog's [Metrics Volume Management page][1] provides comprehensive visibility and intelligent insights into which metrics to focus your cost-optimization efforts on. When used with [Metrics without Limits™][3], Metrics Volume allows for flexible configuration of metrics ingestion and indexing to reduce costs without sacrificing accuracy.

With the Metrics Volume Management page, you can answer the following questions in real time:

- What is your account's overall real-time estimated Indexed Custom Metrics usage?
- What is your account's overall real-time estimated Ingested Custom Metrics usage?
- What are the Top 500 largest Metrics without Limits configured metric names by Ingested Custom Metrics volume?
- What are the Top 500 largest metric names by Indexed Custom Metrics volume?
- What are the Top 500 spiking cardinality metric names?
Comment on lines +28 to +30
Contributor

Is there a way to group these into one sentence? A short list of bullet points is easier to digest.

Suggested change
- - What are the Top 500 largest Metrics without Limits configured metric names by Ingested Custom Metrics volume?
- - What are the Top 500 largest metric names by Indexed Custom Metrics volume?
- - What are the Top 500 spiking cardinality metric names?
+ - The largest metrics grouped by Indexed and Ingested Custom Metrics
+ - The Top 500 spiking cardinality metric names

Contributor Author

Can we preserve it as 3 different questions? here's why:

  • Was hopeful we can keep this summary section framed as questions that customers would ask. And wanted to follow the example external docs' (see gdoc for governance guide) where they call out each individual functionality to emphasize the huge value the tool brings. Even if people gloss over the summary section, they'll at least see numerically that the page offers a lot.
  • Think it still makes sense to keep as separate bullets here because indexed vs ingested are separate SKUs.
  • conceptually my top 500 largest metric names by indexed volume are likely a different group of metrics and require a different user action to access than the top 500 spiking metric names.

- Which team owns these Top 500 metric names and is responsible for optimizing them?
- Which metrics are actually valuable (or not) to your organization?

Comment on lines +26 to +32
Contributor

Suggested change
- - What is my overall account's realtime estimated Indexed Custom Metrics usage?
- - What is my overall account's realtime estimated Ingested Custom Metrics usage?
- - What are the Top 500 largest Metrics without Limits configured metric names by Ingested Custom Metrics volume?
- - What are the Top 500 largest metric names by Indexed Custom Metrics volume?
- - What are the Top 500 spiking cardinality metric names?
- - Which team owns these Top 500 metric names and is responsible for optimizing?
- - Which metrics are actually valuable (or not) to my organization?
+ - Your account's estimated Indexed Custom Metrics usage and Ingested Custom Metrics usage
+ - The Top 500 largest Metrics without Limits™ configured metric names by Ingested Custom Metrics volume
+ - The Top 500 largest metric names by Indexed Custom Metrics volume
+ - The Top 500 spiking cardinality metric names
+ - The team that owns and is responsible for optimizing metric names
+ - The metrics that are or are not valuable to your organization

Contributor

After reading the whole page, I'm not sure we need this section. Your headers are very clear in what a reader can expect to gain from the Metrics Volume, and adding this puts more friction for a user to scroll through. Let me know what you think

Contributor Author

Yeah agreed it adds a lot of overhead but we should assume that folks don't have patience to scroll vertically to read through the docs' headers -- an overview bulleted section of questions they can answer with this doc is present in other dd docs like logs' docs and other external docs too. Let's keep it


## Real-time visibility and monitoring on your organization's Custom Metrics usage
Datadog provides real-time _estimated_ usage metrics out of the box, so you can understand and alert on your usage as it happens. You can see a breakdown of:
- Your account's indexed custom metrics volume in real time, including how much of that volume has not yet been optimized with [Metrics without Limits™][3]
- Your account's ingested custom metrics volume in real time, emitted from metrics that have been configured with [Metrics without Limits™][3]

{{< img src="metrics/volume/volume_graphs.png" alt="Estimated real-time indexed and ingested Custom Metrics volume. Upon clicking export, you can easily create a monitor or export the graph to a notebook to share." style="width:100%;" >}}


## Search, filter, and sort

Use the search, filter, and sort features to understand:
- Which team owns which metric names
- Which metric names your team should focus on optimizing
- Which metrics have the highest cardinality, and which metric names are spiking (that is, have the highest increase in volume)

The Metric and Tag search bars provide a set of actions to filter the list of metrics. Enter keywords to search metric names. Enter any tag key-value pair in the *Filter by Tag Value* box to filter the list by a specific team, application, or service.

Facets can also filter your metrics by:
- **Configuration**: Metrics with tag configurations
- **Percentiles**: Distribution metrics with percentiles and advanced query capabilities enabled
- **Historical Metrics**: Metrics that have historical metrics ingestion enabled
- **Query Activity** (Beta): Metrics not actively queried in the app or by the API in the past 30 days
- **Metric Type**: Differentiate between distribution and non-distribution metrics (counts, gauges, rates)
- **Distribution Metric Origin**: The product from which the metric originated (for example, metrics generated from Logs or APM Spans)

The Volume page displays a list of the metrics you report to Datadog, sorted by estimated custom metrics or by change in volume. To sort by either option, click the corresponding column header in the metric table.

| Column | Description |
|--------|-------------|
|**Top 500 Metric Names by Estimated Real-time Cardinality** | Identify the top 500 metric names by cardinality (also known as custom metrics volume).|
|**Top 500 Metric Names by Change in Volume** | Discover the top 500 metric names with the greatest variance in cardinality. These metrics may have spiked anomalously (and potentially unintentionally) in the time frame you selected. If you receive an alert on your account's estimated real-time custom metrics usage, use this view to investigate the metric spike. |

## Compare a metric's cardinality (volume) over time

{{< img src="metrics/volume/compare_metric_cardinality.png" alt="Metrics Volume filtered down to metric names with “shopist”, sorted by estimated custom metrics. On hover over the change in volume, displays the cardinality graph of the metric over the past day" style="width:100%;" >}}

When identifying your top 500 metric names by change in volume, you can hover over the number to compare a metric name's number of indexed custom metrics (its cardinality) over time. A single metric name can emit multiple indexed custom metrics. For details on how custom metrics are metered and billed, see the [custom metrics billing documentation][6].
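
As a rough illustration of why one metric name can account for many indexed custom metrics, the sketch below counts the distinct tag-value combinations submitted under a single metric name. It is a simplified model, not Datadog's metering logic: the function name, tag names, and tag values are hypothetical, and metric-type specifics (for example, distributions) are ignored.

```python
from typing import Iterable, Mapping


def count_custom_metrics(series_tags: Iterable[Mapping[str, str]]) -> int:
    """Count distinct tag-value combinations seen for one metric name."""
    return len({frozenset(tags.items()) for tags in series_tags})


# Hypothetical tag sets submitted under one metric name.
submissions = [
    {"host": "web-1", "endpoint": "/cart"},
    {"host": "web-2", "endpoint": "/cart"},
    {"host": "web-1", "endpoint": "/checkout"},
    {"host": "web-1", "endpoint": "/cart"},  # repeat of the first combination
]

cardinality = count_custom_metrics(submissions)
print(cardinality)  # 3
```

Adding a tag with many possible values multiplies the number of combinations, which is how a single metric name can spike in volume.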

To compare your spiking metric's cardinality over time:
1. Select a time frame in the top right corner (the recommended time frame is **Past 1 Day** or **Past 4 Weeks**).
2. Find the metric name whose cardinality you want to view over time and, in the same row, click the value under the **Change in Volume** column. This opens a modal with a graph comparing the metric's cardinality over time and the percentage increase of its spike.
3. (Optional) Create a Change monitor for `% change` to proactively alert on this spiking metric. For more information, see the [Change Alert Monitor][2] documentation.
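
The comparison behind a `% change` alert can be sketched locally as follows. This is an illustrative calculation only, not the monitor's implementation or API: the function names, cardinality values, and threshold are assumptions made for the example.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percent change")
    return (current - previous) / previous * 100.0


def is_spiking(previous: float, current: float, threshold_pct: float) -> bool:
    """Report whether the increase exceeds the chosen threshold (in percent)."""
    return percent_change(previous, current) > threshold_pct


# Hypothetical cardinality of one metric name one day ago and now.
yesterday, today = 2000, 3000
print(percent_change(yesterday, today))  # 50.0
print(is_spiking(yesterday, today, 25))  # True
```

Choosing the threshold is a judgment call: a low value catches small spikes but may alert on normal growth.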

## Identify less valuable, unqueried metrics

{{< img src="metrics/volume/id_unqueried_metrics.png" alt="Facet fields for Query Activity with the 'Not actively queried' facet selected" style="width:100%;" >}}

To start reducing custom metrics costs, organizations often begin with their largest metric names that are not actively queried, because those provide the least value. Datadog's intelligent query insights analyze your queries and surface the metrics that have not been queried over the past 30 days. This analysis runs continuously in the background, so the list of unqueried metrics stays up to date and is available on a self-service basis.

To find the metrics not actively queried in the past 30 days, click on **Not Actively Queried** in the *Query Activity Facet* box. Selecting **Not Actively Queried** generates a list of unused metric names across dashboards, notebooks, monitors, SLOs, Metrics Explorer, and the API.

## Reduce metric volume and cost

After you identify unqueried metrics, you can eliminate the volume and cost of these metric names by using [Metrics without Limits™][3], without changing a single line of code. With Metrics without Limits, you pay only for the metrics you use, because timeseries that are never or rarely queried are eliminated. Based on Datadog's intelligent query insights, the average customer could reduce their custom metrics volume by 70% by using Metrics without Limits on their unqueried metric names.

To configure multiple unqueried metrics at once:
1. Click the **Configure Metrics** dropdown and select **Manage Tags** to open the [Metrics without Limits™ Tag configuration modal][4].
2. Specify the metric namespace of the unqueried metrics you'd like to apply a bulk tag configuration to.
3. Select **Include tags…** and set an empty allowlist of tags.

{{< img src="metrics/volume/configure_metrics.png" alt="Configure Metric dropdown at the top of the page highlighting the Manage tags option" style="width:100%;" >}}

You have full control over the cardinality of your metrics without changing your applications or requiring a remote-write setup. The following example shows how eliminating rarely used timeseries can significantly reduce your custom metrics volume and costs.

In this example, the tag configuration modal shows a metric with a current volume of 13690031 indexed custom metrics. After you select **Include tags…** with an empty allowlist of tags, the modal shows an estimated new volume of 1. This configuration reduces the number of indexed custom metrics by 13690030.
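
To see why an empty allowlist collapses a metric to a single indexed custom metric, the following sketch projects each tag set onto the allowed tag keys and counts what remains. It is a simplified model with hypothetical tag names and a hypothetical helper function, not the Metrics without Limits™ implementation. The final lines only redo the arithmetic from the modal example.

```python
from itertools import product


def cardinality(series_tags, allowlist):
    """Count distinct tag combinations after keeping only allowlisted tag keys."""
    allowed = set(allowlist)
    return len({
        frozenset((key, value) for key, value in tags.items() if key in allowed)
        for tags in series_tags
    })


# Hypothetical metric reported from 3 hosts and 2 endpoints.
hosts = ["web-1", "web-2", "web-3"]
endpoints = ["/cart", "/checkout"]
series = [{"host": h, "endpoint": e} for h, e in product(hosts, endpoints)]

print(cardinality(series, ["host", "endpoint"]))  # 6
print(cardinality(series, ["endpoint"]))          # 2
print(cardinality(series, []))                    # 1

# Arithmetic from the modal example in the text.
current_volume, estimated_new_volume = 13690031, 1
reduction = current_volume - estimated_new_volume
print(reduction)  # 13690030
```

With no tags allowed, every submission maps to the same empty combination, so only one timeseries remains for the metric name.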

{{< img src="metrics/volume/reduce_metric_vol_cost_tags.png" alt="Tag configuration modal showing an example metric with a current volume of 13690031 indexed custom metrics and an estimated new volume of 1, with an empty allowlist of tags" style="width:80%;" >}}

## Analyze metrics' utility and relative value in Datadog
As part of the Metrics without Limits™ suite of governance features, the Metrics Related Assets feature helps you pinpoint valuable metrics that are underutilized in Datadog. A metric's related asset is any dashboard, notebook, monitor, or SLO that queries that metric. Datadog's intelligent query insights surface both the number and the popularity of these related assets, so you can evaluate a metric's utility within your organization and make data-driven decisions. Use this feature to identify how your team can get more value from existing metrics and your observability spend, and to [reduce metric volume and cost][5].

{{< img src="metrics/volume/related_assets.png" alt="Metric detail side panel showing the Related Assets section. The example metric is applied to one dashboard" style="width:100%;" >}}

To view a metric's related assets:
1. Click on the metric name to open its details side panel.
2. Scroll down to the section of the side panel titled **Related Assets**.
3. Click the dropdown button to view the type of related asset you are interested in (dashboards, monitors, notebooks, SLOs). You can use the search bar to validate specific assets.

## Further reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: https://app.datadoghq.com/metric/volume
[2]: /monitors/types/change-alert/
[3]: /metrics/metrics-without-limits
[4]: https://app.datadoghq.com/metric/volume?bulk_manage_tags=true&facet.query_activity=-queried&sort=volume_total
[5]: #reduce-metric-volume-and-cost
[6]: https://docs.datadoghq.com/account_management/billing/custom_metrics/?tab=countrategauge
Binary file added static/images/metrics/volume/volume_graphs.png