Update Profiler Images and Videos to Reflect New Nav #22705

Merged 8 commits on Apr 19, 2024
14 changes: 7 additions & 7 deletions content/en/profiler/connect_traces_and_profiles.md
@@ -21,7 +21,7 @@ You can move directly from span information to profiling data on the Code Hotspo

## Identify code hotspots in slow traces

{{< img src="profiler/code_hotspots_tab-2.mp4" alt="Code Hotspots tab shows profiling information for a APM trace span" video=true >}}
{{< img src="profiler/code_hotspots_tab.png" alt="Code Hotspots tab shows profiling information for a APM trace span" >}}

### Prerequisites

@@ -179,7 +179,7 @@ Click the plus icon `+` to expand the stack trace to that method **in reverse or

### Span execution timeline view

{{< img src="profiler/code_hotspots_tab-timeline.mp4" alt="Code Hotspots tab has a timeline view that breakdown execution over time and threads" video=true >}}
{{< img src="profiler/code_hotspots_tab-timeline.png" alt="Code Hotspots tab has a timeline view that breakdown execution over time and threads" >}}

The **Timeline** view surfaces time-based patterns and work distribution over the period of the span.

@@ -238,9 +238,9 @@ Lanes on the top are runtime activities that may add extra latency to your reque

### Viewing a profile from a trace

{{< img src="profiler/flamegraph_view-1.mp4" alt="Opening a view of the profile in a flame graph" video=true >}}
{{< img src="profiler/view_profile_from_trace.png" alt="Opening a view of the profile in a flame graph" >}}

For each type from the breakdown, click **View In Full Page** to see the same data opened up in a in a new page . From there you can change visualization to the flame graph.
For each type from the breakdown, click **Open in Profiling** to see the same data opened up in a in a new page . From there you can change visualization to the flame graph.
Click the **Focus On** selector to define the scope of the data:

- **Span & Children** scopes the profiling data to the selected span and all descendant spans in the same service.
@@ -320,7 +320,7 @@ With endpoint profiling you can:
- Isolate the top endpoints responsible for the consumption of valuable resources such as CPU, memory, or exceptions. This is particularly helpful when you are generally trying to optimize your service for performance gains.
- Understand if third-party code or runtime libraries are the reason for your endpoints being slow or resource-consumption heavy.

{{< img src="profiler/endpoint_agg.mp4" alt="Troubleshooting a slow endpoint by using endpoint aggregation" video=true >}}
{{< img src="profiler/endpoint_agg.png" alt="Troubleshooting a slow endpoint by using endpoint aggregation" >}}

### Surface code that impacted your production latency

@@ -346,9 +346,9 @@ The following image shows that `GET /store_history` is periodically impacting th

Select `Per endpoint call` to see behavior changes even as traffic shifts over time. This is useful for progressive rollout sanity checks or analyzing daily traffic patterns.

The following video shows that CPU per request doubled for `/GET train`:
The following example shows that CPU per request doubled for `/GET train`:

{{< img src="profiler/endpoint_per_request.mp4" alt="Troubleshooting a endpoint that started using more resource per request" video=true >}}
{{< img src="profiler/endpoint_per_request.png" alt="Troubleshooting a endpoint that started using more resource per request" >}}

## Further reading

@@ -22,18 +22,18 @@ This guide describes how to use the Datadog Continuous Profiler to investigate t

The first step in a performance investigation is to identify anomalies in resource usage over time. Consider the following graph of CPU utilization over the past hour for the service `product-recommendation`:

{{< img src="profiler/guide-monolithic-outliers/1-outliers-monolith-cpu-usage.png" alt="" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/1-outliers-monolith-cpu-usage-2.png" alt="" style="width:100%;" >}}

This doesn't provide the exact root cause, but you can see anomalous peaks in CPU usage.

Select the **Show - Avg of** dropdown (highlighted in the previous image) and change the graph to show `CPU Cores for Top Endpoints` instead. This graph shows how different parts of the application contribute to the overall CPU utilization:

{{< img src="profiler/guide-monolithic-outliers/2-outliers-monolith-cpu-top-endpoints.png" alt="" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/2-outliers-monolith-cpu-top-endpoints-2.png" alt="" style="width:100%;" >}}


The yellow peaks indicate that the `GET /store_history` endpoint has some intermittent usage corresponding to the anomalies identified earlier. However, the peaks might be due to differences in traffic to that endpoint. To understand if profiles can provide further insights, change the metric to `CPU - Average Time Per Call for Top Endpoints`:

{{< img src="profiler/guide-monolithic-outliers/3-outliers-monolith-cpu-avg-time-per-call.png" alt="" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/3-outliers-monolith-cpu-avg-time-per-call-2.png" alt="" style="width:100%;" >}}

The updated graph reveals that there is an intermittent spike in CPU utilization where each call to `GET /store_history` takes on average three seconds of CPU time. This suggests the spikes aren't due to an increase in traffic, but instead an increase in CPU usage per request.
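
The reasoning in the context line above (separating a traffic increase from a per-request cost increase) can be sketched in a few lines. This is only an illustration, not how the profiler computes its metric: the sample values, the `cpu_per_call` helper, and the 5x threshold are all hypothetical.

```python
# Hypothetical per-minute samples for one endpoint: (total CPU seconds, number of calls).
# These values are made up for illustration; they are not taken from the guide.
samples = [
    (2.0, 20),
    (2.1, 21),
    (6.0, 2),   # few calls, but a lot of CPU
    (1.9, 19),
]


def cpu_per_call(cpu_seconds, calls):
    """Average CPU seconds per call; 0.0 when there were no calls."""
    return cpu_seconds / calls if calls else 0.0


per_call = [cpu_per_call(cpu, calls) for cpu, calls in samples]

# Use the median as a rough baseline and flag intervals far above it.
# The 5x factor is an arbitrary assumption for this sketch.
baseline = sorted(per_call)[len(per_call) // 2]
spikes = [i for i, value in enumerate(per_call) if value > 5 * baseline]
```

Dividing by the call count is what removes traffic as an explanation: an interval is flagged only when each call costs more, regardless of how many calls arrived.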

@@ -42,7 +42,7 @@ The updated graph reveals that there is an intermittent spike in CPU utilization

To determine the cause of increased CPU usage each time `GET /store_history` is called, examine the profiling flame graph for this endpoint during one of the spikes. Select a time range where `GET /store_history` is showing more CPU utilization and scope the profiling page to that time range. Then switch to the **Flame Graph** visualization to see the methods using the CPU at this time:

{{< img src="profiler/guide-monolithic-outliers/4-outliers-monolith-flame-graph.png" alt="Your image description" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/4-outliers-monolith-flame-graph-2.png" alt="Your image description" style="width:100%;" >}}

To better understand why the `GET /store_history` endpoint is using more CPU, refer to the table highlighted in the previous image, where the endpoint is second from the top. Select that row to focus the flame graph on the CPU utilization caused by the `GET /store_history` endpoint.

@@ -56,7 +56,7 @@ To see if there are differences in which methods are using a lot of CPU time bet

The view shows two graphs, labeled **A** and **B**, each representing a time range for CPU utilization per `GET /store_history` call. Adjust the time selector for **A** so that it is scoped to a period with low CPU utilization per call:

{{< img src="profiler/guide-monolithic-outliers/5-outliers-monolith-compare-flame-graphs.png" alt="Your image description" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/5-outliers-monolith-compare-flame-graphs-2.png" alt="Your image description" style="width:100%;" >}}

The comparison reveals the different methods causing CPU utilization during the spike (timeframe **B**) that are not used during normal CPU usage (timeframe **A**). As shown in the previous image, `Product.loadAssets(int)` is causing the spikes.
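
The A/B comparison described in the context line above amounts to diffing per-method CPU time between two timeframes. The following is a rough sketch under stated assumptions, not the profiler's comparison logic: the `new_hot_methods` helper, the `min_seconds` threshold, the CPU values, and every method name other than `Product.loadAssets(int)` are hypothetical.

```python
def new_hot_methods(profile_a, profile_b, min_seconds=0.5):
    """Return (method, increase) pairs where CPU time in B exceeds A by at
    least min_seconds, sorted by largest increase first.

    Methods missing from A are treated as having used 0.0 seconds there.
    """
    deltas = {
        method: seconds - profile_a.get(method, 0.0)
        for method, seconds in profile_b.items()
    }
    hot = [(method, delta) for method, delta in deltas.items() if delta >= min_seconds]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Hypothetical CPU seconds per call for each method in the two timeframes.
profile_a = {"Store.getHistory()": 0.2, "Json.serialize()": 0.1}
profile_b = {
    "Store.getHistory()": 0.3,
    "Json.serialize()": 0.1,
    "Product.loadAssets(int)": 2.6,
}

result = new_hot_methods(profile_a, profile_b)
```

With these made-up numbers, only the method that is absent from timeframe A and expensive in timeframe B survives the threshold.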

@@ -70,7 +70,7 @@ There are other attributes available in the profiler. For example, you can filte

The APM `Trace operation` attribute lets you filter and group a flame graph with the same granularity as the traces for the selected endpoints. This is a good balance between the high granularity of threads or methods, and the low granularity of entire endpoints. To isolate operations, select `Trace Operation` from the **CPU time by** dropdown:

{{< img src="profiler/guide-monolithic-outliers/7-outliers-monolith-trace-operation.png" alt="Your image description" style="width:100%;" >}}
{{< img src="profiler/guide-monolithic-outliers/7-outliers-monolith-trace-operation-2.png" alt="Your image description" style="width:100%;" >}}

In the previous image, notice that the `ModelTraining` operation is taking more CPU time than its primary use in the `GET /train` endpoint, so it must be used elsewhere. Click the operation name to determine where else it is used. In this case, `ModelTraining` is also used by `POST /update_model`.

2 changes: 1 addition & 1 deletion content/en/profiler/profile_visualizations.md
@@ -19,7 +19,7 @@ further_reading:

## Search profiles

{{< img src="profiler/search_profiles2.mp4" alt="Search profiles by tags" video=true >}}
{{< img src="profiler/search_profiles3.mp4" alt="Search profiles by tags" video=true >}}

Go to **APM -> Profiles** and select a service to view its profiles. Select a profile type to view different resources (for example, CPU, Memory, Exception, and I/O).

Contributor review comment:

Let's use a /GET train example, more interesting to see

Binary file added static/images/profiler/code_hotspots_tab.png
Binary file added static/images/profiler/endpoint_agg.png
Contributor review comment:

  • Can we use the CPU profile type instead?
  • I think we should hover the endpoint name on the right, to highlight that we filtered on it
  • We barely see the FG in this image, is there a vertical cropping or this is the full size of the screen?

Binary file added static/images/profiler/endpoint_per_request.png
Contributor review comment:

I think we need to keep the video here, otherwise it is not clear how you end up here

Binary file added static/images/profiler/search_profiles3.mp4
Binary file not shown.