Skip to content

Commit

Permalink
Doc update
Browse files Browse the repository at this point in the history
  • Loading branch information
bonnefoa committed Mar 19, 2024
1 parent 6efac5d commit ceb749e
Show file tree
Hide file tree
Showing 2 changed files with 195 additions and 167 deletions.
204 changes: 37 additions & 167 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,33 +1,42 @@
# pg_tracing - Distributed Tracing for PostgreSQL
# pg_tracing

> [!WARNING]
> This extension is still in early development and may be unstable.
## Introduction

The pg_tracing PostgreSQL extension allows to generate server-side spans for distributed tracing.
pg_tracing is a PostgreSQL extension allows to generate server-side spans for distributed tracing.

When pg_tracing is active, it generates spans on sampled queries. To access these spans, the extension provides two views: `pg_tracing_consume_spans` and `pg_tracing_peek_spans`. The utility functions `pg_tracing_reset` and `pg_tracing_info` provide ways to read and reset extension's statistics. These are not available globally but can be enabled for a specific database with `CREATE EXTENSION pg_tracing`.

Trace propagation currently relies on [SQLCommenter](https://google.github.io/sqlcommenter/). More mechanisms will be added in the future.

> [!WARNING]
> This extension is still in early development and may be unstable.
## PostgreSQL Version Compatibility

pg_tracing only supports PostgreSQL 16 for the moment.

## Compile From Sources
## Documentation

The following list of files is found in the [doc](doc) folder of the pg_tracing github repository. For [installation instructions](#installation), please see the next section of this README.

| File | Description |
|-------------------------------------------------------------------|-------------------------------------------------------------------------------|
| [pg_tracing.md](doc/pg_tracing.md) | Main reference documentation for pg_tracing. |


## Installation

### From Source

pg_tracing can be compiled against an installed copy of PostgreSQL with development packages using `PGXS`.

To compile and install the extension, run:

```bash
git clone https://github.com/datadog/pg_tracing.git
git clone https://github.com/DataDog/pg_tracing.git
cd pg_tracing
make install
```

## Usage
## Setup

The extension must be loaded by adding pg_tracing to the [shared_preload_libraries](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES) in `postgresql.conf`. A server restart is needed to add or remove the extension.

Expand All @@ -44,7 +53,24 @@ The extension requires additional shared memory proportional to `pg_tracing.max_

When `pg_tracing` is active, it generates spans on sampled queries. To access these spans, the extension provides two views: `pg_tracing_consume_spans` and `pg_tracing_peek_spans`. The utility functions `pg_tracing_reset` and `pg_tracing_info` provide ways to read and reset extension's statistics. These are not available globally but can be enabled for a specific database with `CREATE EXTENSION pg_tracing`.

### Trace Example
## Usage

Trace context can be propagated through [SQLCommenter](https://google.github.io/sqlcommenter/). By default, all queries with a SQLCommenter with a sampled flag enabled will generate spans.

```sql
-- Query with trace context and sampled flag enable
/*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ SELECT 1;

-- Check the generated spans
select trace_id, parent_id, span_id, span_start, span_end, span_type, span_operation from pg_tracing_consume_spans order by span_start;
trace_id | parent_id | span_id | span_start | span_end | span_type | span_operation
----------------------------------+------------------+------------------+-------------------------------+-------------------------------+--------------+----------------
00000000000000000000000000000123 | 0000000000000123 | 4268a4281c5316dd | 2024-03-19 13:46:43.97958+00 | 2024-03-19 13:46:43.980121+00 | Select query | SELECT $1;
00000000000000000000000000000123 | 4268a4281c5316dd | 87cb96b6459880a0 | 2024-03-19 13:46:43.979642+00 | 2024-03-19 13:46:43.979978+00 | Planner | Planner
00000000000000000000000000000123 | 4268a4281c5316dd | f5994f9159d8e80d | 2024-03-19 13:46:43.980081+00 | 2024-03-19 13:46:43.980111+00 | Executor | ExecutorRun
```

Queries can also be sampled randomly through the `pg_tracing.sample_rate` parameter. Setting this to 1 will trace all queries.

```sql
-- Enable tracing for all queries
Expand All @@ -60,164 +86,8 @@ select trace_id, parent_id, span_id, span_start, span_end, span_type, span_opera
458fbefd7034e670eb3d9c930862c378 | eb3d9c930862c378 | bdecb6e35d429f3d | 2024-01-10 09:54:16.321253+00 | 2024-01-10 09:54:16.321587+00 | Select query | SELECT $1;
458fbefd7034e670eb3d9c930862c378 | bdecb6e35d429f3d | ad49f27543b0175d | 2024-01-10 09:54:16.3213+00 | 2024-01-10 09:54:16.321412+00 | Planner | Planner
458fbefd7034e670eb3d9c930862c378 | bdecb6e35d429f3d | 8805f7749249536b | 2024-01-10 09:54:16.321485+00 | 2024-01-10 09:54:16.321529+00 | Executor | ExecutorRun

-- Calling pg_tracing_consume_spans a second time won't return anything as spans were consumed
select trace_id, parent_id, span_id, span_start, span_end, span_type, span_operation from pg_tracing_consume_spans order by span_start;
trace_id | parent_id | span_id | span_start | duration | name | resource
----------+-----------+---------+------------+----------+------+----------
(0 rows)
```

## Views

### pg_tracing_info

The statistics of the pg_tracing module itself are tracked and made available via a view named `pg_tracing_info`. This view contains only a single row.

| Column | Type | Description |
| --- | --- | ---
| `traces` | bigint | Total number of traces captured |
| `spans` | bigint | Total number of spans captured |
| `dropped_spans` | bigint | Total number of spans dropped due to exceeding `pg_tracing.max_span` spans |
| `failed_truncates` | bigint | Total number of times the module couldn't truncate the query file due to conflict lock on pg_tracing's text file |
| `last_consume` | timestamp with time zone | Time at which spans were last consumed |
| `stats_reset` | timestamp with time zone | Time at which all statistics in the `pg_tracing_info` view were last reset |

### pg_tracing_consume_spans/pg_tracing_peek_spans

The spans generated by the module are made available via a view named `pg_tracing_consume_spans`. This view contains one row for each generated span. The `pg_tracing_peek_spans` view provides the same output as `pg_tracing_consume_spans`. `pg_tracing_peek_spans` doesn't delete read spans.

| Column | Type | Description |
| --- | --- | ---
| `traceid` | char(32) | Trace identifier of the span |
| `parent_id` | char(16) | Identifier of the span |
| `queryid` | bigint | Hash code to identify identical normalized queries, 0 if `query_id` is disabled |
| `span_type` | text | Type of span (`Planner`, `Executor`...) |
| `span_operation` | text | Name of the span's operation |
| `span_start` | timestamp with time zone | Start time of the span |
| `span_end` | timestamp with time zone | End time of the span |
| `sql_error_code` | char(5) | Error code of the query. `00000` for a succesful query |
| `userid` | oid | OID of user who executed the statement |
| `dbid` | oid | OID of database in which the statement was executed |
| `pid` | integer | Pid of the backend process that processed the query |
| `subxact_count` | smallint | Active count of backend's subtransaction when the span was created |
| `plan_startup_cost` | float8 | Estimated cost before fetching any tuples by the query planner |
| `plan_total_cost` | double precision | Estimated total cost by the query planner |
| `plan_rows` | double precision | Estimated number of row plan is expected to emit |
| `plan_width` | integer | Estimated row width in bytes by the query planner |
| `rows` | bigint | Number of tuples processed |
| `nloops` | bigint | Number of cycles for this node |
| `shared_blks_hit` | bigint | Total number of shared block cache hits by the node |
| `shared_blks_read` | bigint | Total number of shared blocks read by the node |
| `shared_blks_dirtied` | bigint | Total number of shared blocks dirtied by the node |
| `shared_blks_written` | bigint | Total number of shared blocks written by the node |
| `local_blks_hit` | bigint | Total number of local block cache hits by the node |
| `local_blks_read` | bigint | Total number of local blocks read by the node |
| `local_blks_dirtied` | bigint | Total number of local blocks dirtied by the node |
| `local_blks_written` | bigint | Total number of local blocks written by the node |
| `blk_read_time` | double precision | Time spent reading blocks in milliseconds |
| `blk_write_time` | double precision | Time spent writing blocks in milliseconds |
| `temp_blks_read` | bigint | Total number of temp blocks read by the node |
| `temp_blks_written` | bigint | Total number of temp blocks written by the node |
| `temp_blk_read_time` | double precision | Total time the node spent reading temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) |
| `temp_blk_write_time` | double precision | Total time the node spent writing temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) |
| `wal_records` | bigint | Total number of WAL records generated by the node |
| `wal_fpi` | bigint | Total number of WAL full page images generated by the node |
| `wal_bytes` | numeric | Total amount of WAL generated by the node in bytes |
| `jit_functions` | bigint | Total number of functions JIT-compiled by the node |
| `jit_generation_time` | double precision | Total time spent by the node on generating JIT code, in milliseconds |
| `jit_inlining_time` | double precision | Total time spent by the node on inlining functions, in milliseconds |
| `jit_optimization_time` | double precision | Total time spent by the node on optimizing, in milliseconds |
| `startup` | bigint | Time to the first tuple in nanoseconds |
| `parameters` | text | Value of the query's parameters |

## Functions

### pg_tracing_reset()

Discards all statistics gathered so far by `pg_tracing`. Span buffer is not emptied by this function. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`.

### pg_tracing_spans(boolean)

The `pg_tracing_consume_spans` and `pg_tracing_peek_spans` views are defined in terms of a function named `pg_tracing_spans`. It is possible for clients to call the `pg_tracing_spans` function directly. By calling `pg_tracing_spans` with consume false, spans won't be removed from the span buffer. With consume true, read spans will be deleted. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`.

## Parameters

### pg_tracing.max_span (integer)

Specifies the maximum number of spans stored by the extension. If more spans are generated, the span buffer will be emptied if `pg_tracing.buffer_mode` is set to `drop_on_full`. If `pg_tracing.buffer_mode` is set to `keep_on_full`, the new spans will be dropped and tracing will be aborted. The default value is 5000. This parameter can only be set at server start.

> [!NOTE]
> A span takes around 370 bytes of memory so 5000 spans will use 1.7MB while 10000 spans will use 3.5MB. You can use the following query to get the exact size used by the pg_tracing's spans:
```sql
SELECT
(SELECT setting from pg_settings where name='pg_tracing.max_span') AS max_span,
pg_size_pretty(size) AS total_span_size,
pg_size_pretty(size/(SELECT setting from pg_settings
WHERE name='pg_tracing.max_span')::int)
AS size_per_span
FROM pg_shmem_allocations
WHERE name ='PgTracing Spans';
```

### pg_tracing.buffer_mode (enum)

Controls span buffer's behaviour when `pg_tracing.max_span` spans is reached. If `keep_on_full`, the existing buffer is kept while new spans are dropped. If `drop_on_full`, the existing buffer is dropped and new spans are added. The default value is `keep_on_full`.

### pg_tracing.max_parameter_size (integer)

Controls the maximum size of the parameter string. The default value is 1024.

### pg_tracing.track (enum)

Controls which statements should be generated. Specify `top` to only generate spans for top-level statements (those issued directly by clients), `all` to also generate spans for nested statements (such as statements invoked within functions), or `none` to disable span generation. The default value is `all`.

### pg_tracing.track_utility (boolean)

Controls whether spans should be generated for utility statements. Utility commands are all those other than `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `TABLE`, and `MERGE`. The default value is `on`.

### pg_tracing.sample_rate (real)

Controls the fraction of statements that generate spans. Statements with tracecontext propagated with SQLCommenter and sampled flag enabled are not impacted by this parameter. For traces with nested statements, either all will be explained or none. The default value is 0.

### pg_tracing.caller_sample_rate (real)

Controls the fraction of statements with SQLCommenter tracecontext and an enabled sampled flag that will generate spans. The default value is 1.

### pg_tracing.export_parameters (boolean)

Controls whether the query's parameters should be exported in spans metadata. The default value is `on`.

## Implementation Details

### Tracecontext Propagation

A query with SQLcommenter can propagate a trace context to the database. It will have the following format:
```sql
/*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ select 1;

-- SQLCommenter may be added to the end
select 1 /*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/
```

The traceparent fields are detailed in [w3c's trace-context](https://www.w3.org/TR/trace-context/#traceparent-header-field-values)

```
00000000000000000000000000000009: trace id
0000000000000005: parent id
01: trace flags (01 == sampled)
```

When a query with SQLCommenter is detected, the trace context is extracted and used by pg_tracing.

### Sampling

Spans will only be generated for sampled queries. A query is sampled if:
- It has a trace context propagated throught SQLCommenter with the sampled flag enabled and it passes the `pg_tracing.caller_sample_rate`
- It has no SQLCommenter but the query randomly passes the global `pg_tracing.sample_rate`

With the default values `pg_tracing.caller_sample_rate = 1` and `pg_tracing.sample_rate = 0`, only queries with a trace context and a sampled flag `sampled := 01` will be sampled, effectively offloading sampling decision to the callers.

## Authors

* [Anthonin Bonnefoy](https://github.com/bonnefoa)
Loading

0 comments on commit ceb749e

Please sign in to comment.