[DOC] Add new documentation for distributed tracing #4964

Merged
merged 72 commits into from
Oct 12, 2023
Changes from 19 commits (of 72 commits, all by vagimeli, Sep 5, 2023 to Oct 12, 2023)
139 changes: 139 additions & 0 deletions _observing-your-data/trace/distributed-tracing.md
@@ -0,0 +1,139 @@
---
layout: default
title: Distributed tracing
parent: Trace Analytics
nav_order: 65
---

# Distributed tracing
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](<insertlink>) or the [project board](insert-link).
{: .warning}

Distributed tracing is used to monitor and debug distributed systems. You can track the flow of requests through a system and identify performance bottlenecks and errors. A _trace_ is the complete end-to-end path of a request as it flows through a distributed system. It represents the journey of a specific operation as it traverses the components and services of a distributed architecture. In distributed tracing, a single trace contains a series of tagged time intervals called _spans_. Spans have a start and end time and may include other metadata, such as logs or tags, to help classify what happened.

Distributed tracing offers several benefits, including:

- **Performance optimization:** Identify and resolve bottlenecks, reducing latency in your applications.
- **Debugging:** Quickly pinpoint the source of errors or unexpected behavior in your distributed system.
- **Resource allocation:** Optimize resource allocation by understanding usage patterns of different services.
- **Service dependencies:** Visualize dependencies between services, helping you to manage architectures.

## Distributed tracing pipeline

OpenSearch provides a distributed tracing pipeline that can be used to ingest, process, and visualize tracing data, with support for querying and alerting. OpenTelemetry is an open-source observability framework that provides a set of APIs, libraries, agents, and collectors for generating, capturing, and exporting telemetry data. The distributed tracing pipeline consists of the following components:

- **Creation:** Instrumenting your application code with OpenTelemetry SDKs.
- **Propagation:** Injecting trace context into requests as they propagate through your system.
- **Collection:** Collecting trace data from your application and sending it to a backend.
- **Processing:** Aggregating trace data from multiple sources and enriching it with additional metadata.
- **Exporting:** Sending trace data to a backend for storage and analysis.

OpenSearch serves as the sink for traces.

## Trace analytics

OpenSearch provides a `trace-analytics` plugin for visualizing trace data in real time. The plugin includes pre-built dashboards for analyzing trace data, such as service maps, latency histograms, and error rates. With OpenSearch's distributed tracing pipeline, you can quickly identify bottlenecks and errors in your applications. See the [Trace analytics]({{site.url}}{{site.baseurl}}/observing-your-data/trace/index/) documentation for more information.

## Get started

The distributed tracing feature is experimental as of OpenSearch 2.11. To begin using it, first enable the `opensearch.experimental.feature.telemetry.enabled` feature flag and then activate the tracer using the dynamic setting `telemetry.tracer.enabled`. Exercise caution when enabling this feature because it can consume system resources. The following sections describe how to enable and configure distributed tracing, including on-demand debugging and request sampling.

### Enable on a node using a tarball install

Collaborator: Should this be "Enabling a [noun] on a node using a tarball installation"?

Contributor Author: Revised

Contributor Author: Revised: Enabling the flag on a node using tarball

The flag is toggled using a new Java Virtual Machine (JVM) parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in `config/jvm.options`.

#### Option 1: Modify jvm.options

Contributor @reta, Oct 6, 2023: @Gaganjuneja the first option should be modifying opensearch.yml (see please opensearch-project/OpenSearch#4102), other options are not what we would recommend but still viable, could you please add this section to documentation?

Add the following lines to `config/jvm.options` before starting the OpenSearch process to enable the feature and its dependency:

```bash
-Dopensearch.experimental.feature.telemetry.enabled=true
```

Then run OpenSearch:

```bash
./bin/opensearch
```
{% include copy-curl.html %}

#### Option 2: Enable from an environment variable

As an alternative to directly modifying `config/jvm.options`, you can define the properties by using an environment variable. This can be done in a single command when you start OpenSearch or by defining the variable with export.

Collaborator: Are there words missing at the end of this sentence? It seems like an abrupt stop.

Contributor Author: Revised

To add the flag inline when starting OpenSearch, run the following command:

```bash
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.telemetry.enabled=true" ./bin/opensearch
```
{% include copy-curl.html %}

To define the environment variable separately, prior to running OpenSearch, run the following command:

```bash
export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.telemetry.enabled=true"
./bin/opensearch
```
{% include copy-curl.html %}

### Enable with Docker containers

If you're running Docker, add the following line to `docker-compose.yml` in the `environment` section of the `opensearch-node` service:

```bash
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.telemetry.enabled=true"
```
{% include copy-curl.html %}

### Enable for OpenSearch development

To enable the distributed tracing feature in a development environment, add the required property to `run.gradle` before building OpenSearch. See the [Developer Guide](https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#gradle-build) for information about how to use Gradle to build OpenSearch.

Add the following properties to `run.gradle` to enable the feature:

```bash
testClusters {
  runTask {
    testDistribution = 'archive'
    if (numZones > 1) numberOfZones = numZones
    if (numNodes > 1) numberOfNodes = numNodes
    systemProperty 'opensearch.experimental.feature.telemetry.enabled', 'true'
  }
}
```
{% include copy-curl.html %}

### Enable distributed tracing

After you've enabled the feature flag, enable the tracer using the following dynamic setting. Because the setting is dynamic, you can change it at any time to turn tracing on or off in a running cluster:

```bash
telemetry.tracer.enabled=true
```
{% include copy-curl.html %}
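
Dynamic settings are typically changed through the cluster settings API. The following Python sketch builds, but does not send, such a request using only the standard library. The endpoint `http://localhost:9200`, the use of the `persistent` scope, and the helper name `build_tracer_request` are assumptions made for illustration; adjust the host and add authentication to match your cluster.

```python
import json
import urllib.request


def build_tracer_request(enabled, base_url="http://localhost:9200"):
    """Build a PUT _cluster/settings request that toggles the tracer.

    base_url is an assumed local endpoint and the helper name is hypothetical.
    """
    # "persistent" is an assumed choice of scope for this example.
    body = {"persistent": {"telemetry.tracer.enabled": enabled}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_tracer_request(True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running cluster: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before applying it to a cluster.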

## Install the OpenSearch OpenTelemetry plugin

OpenSearch's distributed tracing framework aims to support various telemetry solutions through plugins. The OpenSearch OpenTelemetry plugin `telemetry-otel` is available and must be installed to enable tracing. The following guide provides you with the installation instructions.

### Exporters

The distributed tracing feature generates traces and spans for HTTP requests only. These traces and spans are initially kept in memory using the OpenTelemetry `BatchSpanProcessor` and then are sent to an exporter based on configured settings. The following are the key components:

1. **Span processors:** As spans conclude on the request path, OpenTelemetry provides them to the `SpanProcessor` for processing and exporting. OpenSearch's distributed tracing framework uses the `BatchSpanProcessor`, which batches spans for a configurable interval and then sends them to the exporter. The following configurations are available for the `BatchSpanProcessor`:
   - `telemetry.otel.tracer.exporter.max_queue_size`: Defines the maximum queue size. When the queue reaches this value, it will be written to the exporter. Default is `2048`.
   - `telemetry.otel.tracer.exporter.delay`: Defines the delay. If the queue does not fill to `max_queue_size` within this delay, the queued spans are flushed. Default is `2 seconds`.
   - `telemetry.otel.tracer.exporter.batch_size`: Configures the maximum batch size for each export to reduce input/output. This value should always be less than the `max_queue_size`. Default is `512`.
2. **Exporters:** Exporters are responsible for persisting the data. OpenTelemetry provides several out-of-the-box exporters, and OpenSearch supports the following:
   - `LoggingSpanExporter`: Exports spans to a separate log file, `_otel_traces.log`, in the logs directory. This is the default exporter, set by `telemetry.otel.tracer.span.exporter.class=io.opentelemetry.exporter.logging.LoggingSpanExporter`.
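
To make the relationship between the batching settings concrete, the following Python sketch models the three documented values and their defaults. It is an illustration only, not the OpenTelemetry or OpenSearch implementation; the class `BatchExportSettings` and the function `split_into_batches` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchExportSettings:
    """Holds the three documented values; the class itself is hypothetical."""
    max_queue_size: int = 2048   # telemetry.otel.tracer.exporter.max_queue_size
    delay_seconds: float = 2.0   # telemetry.otel.tracer.exporter.delay
    batch_size: int = 512        # telemetry.otel.tracer.exporter.batch_size

    def validate(self):
        # The documentation states batch_size should be less than max_queue_size.
        if self.batch_size >= self.max_queue_size:
            raise ValueError("batch_size must be less than max_queue_size")


def split_into_batches(spans, settings):
    """Split queued spans into export batches of at most batch_size each."""
    size = settings.batch_size
    return [spans[i:i + size] for i in range(0, len(spans), size)]


settings = BatchExportSettings()
settings.validate()
batches = split_into_batches(list(range(1200)), settings)
print([len(b) for b in batches])  # [512, 512, 176]
```

With the defaults, 1,200 queued spans would be exported in three batches, none larger than `batch_size`.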

### Sampling

Distributed tracing can generate numerous spans, consuming system resources unnecessarily. To reduce the number of traces recorded, also called samples, you can configure sampling. By default, only 1% of all HTTP requests are sampled. The following sampling types are available:

1. **Head sampling:** Sampling decisions are made before initiating the root span of a request. OpenSearch supports two head sampling methods:
   - **Probabilistic:** A blanket limit on incoming requests, dynamically adjustable with the `telemetry.tracer.sampler.probability` setting. This setting ranges between 0 and 1. Default is 0.01, which indicates that 1% of incoming requests are sampled.
   - **On-demand:** For debugging specific requests, users can send the `trace=true` attribute as part of the header, causing those requests to be sampled regardless of the probabilistic sampling setting.
2. **Tail-based sampling:** To configure tail-based sampling, follow the [OpenTelemetry Sampling](https://opentelemetry.io/docs/concepts/sampling/) documentation. Configuration depends on the type of collector you choose. Updates on ongoing work for OpenSearch are in the [RFC](https://github.com/opensearch-project/OpenSearch/issues/8918) on GitHub.
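
The two head sampling methods can be pictured as a single decision made when a request arrives. The following Python sketch is a simplified illustration under stated assumptions, not OpenSearch's implementation: the function name `should_sample` is hypothetical, the headers are modeled as a plain dictionary with a `trace` key, and the header value is compared case-insensitively.

```python
import random


def should_sample(headers, probability=0.01, rand=random.random):
    """Decide at the start of a request whether to trace it (head sampling)."""
    # On-demand: an explicit trace=true header wins over the probability.
    if headers.get("trace", "").lower() == "true":
        return True
    # Probabilistic: sample roughly `probability` of the remaining requests.
    return rand() < probability


print(should_sample({"trace": "true"}, probability=0.0))  # True
print(should_sample({}, probability=0.0))                 # False

# With the default of 0.01, about 1% of requests are sampled.
sampled = sum(should_sample({}) for _ in range(100_000))
print(f"sampled about {sampled / 1000:.2f}% of requests")
```

Passing the random source as a parameter keeps the decision deterministic in tests while using `random.random` by default.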

### Collection of spans

The `SpanProcessor` writes spans to the exporter, and the choice of exporter defines the endpoint, which can be logs or gRPC. To collect spans by using gRPC, you need to configure the collector as a sidecar process running on each OpenSearch node. From the collectors, these spans can be written to the sink of your choice, such as Jaeger, Prometheus, Grafana, or FileStore, for further analysis.