From ceb749e66fc72a80f55a448dcde8a7f6d679788a Mon Sep 17 00:00:00 2001 From: Anthonin Bonnefoy Date: Tue, 19 Mar 2024 16:06:21 +0100 Subject: [PATCH] Doc update --- README.md | 204 +++++++++------------------------------------- doc/pg_tracing.md | 158 +++++++++++++++++++++++++++++++++++ 2 files changed, 195 insertions(+), 167 deletions(-) create mode 100644 doc/pg_tracing.md diff --git a/README.md b/README.md index b8a68b2..1da3e98 100644 --- a/README.md +++ b/README.md @@ -1,33 +1,42 @@ -# pg_tracing - Distributed Tracing for PostgreSQL +# pg_tracing -> [!WARNING] -> This extension is still in early development and may be unstable. - -## Introduction - -The pg_tracing PostgreSQL extension allows to generate server-side spans for distributed tracing. +pg_tracing is a PostgreSQL extension allows to generate server-side spans for distributed tracing. When pg_tracing is active, it generates spans on sampled queries. To access these spans, the extension provides two views: `pg_tracing_consume_spans` and `pg_tracing_peek_spans`. The utility functions `pg_tracing_reset` and `pg_tracing_info` provide ways to read and reset extension's statistics. These are not available globally but can be enabled for a specific database with `CREATE EXTENSION pg_tracing`. Trace propagation currently relies on [SQLCommenter](https://google.github.io/sqlcommenter/). More mechanisms will be added in the future. +> [!WARNING] +> This extension is still in early development and may be unstable. + ## PostgreSQL Version Compatibility pg_tracing only supports PostgreSQL 16 for the moment. -## Compile From Sources +## Documentation + +The following list of files is found in the [doc](doc) folder of the pg_tracing github repository. For [installation instructions](#installation), please see the next section of this README. + +| File | Description | +|-------------------------------------------------------------------|-------------------------------------------------------------------------------| +| [pg_tracing.md](doc/pg_tracing.md) | Main reference documentation for pg_tracing. | + + +## Installation + +### From Source pg_tracing can be compiled against an installed copy of PostgreSQL with development packages using `PGXS`. To compile and install the extension, run: ```bash -git clone https://github.com/datadog/pg_tracing.git +git clone https://github.com/DataDog/pg_tracing.git cd pg_tracing make install ``` -## Usage +## Setup The extension must be loaded by adding pg_tracing to the [shared_preload_libraries](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES) in `postgresql.conf`. A server restart is needed to add or remove the extension. @@ -44,7 +53,24 @@ The extension requires additional shared memory proportional to `pg_tracing.max_ When `pg_tracing` is active, it generates spans on sampled queries. To access these spans, the extension provides two views: `pg_tracing_consume_spans` and `pg_tracing_peek_spans`. The utility functions `pg_tracing_reset` and `pg_tracing_info` provide ways to read and reset extension's statistics. These are not available globally but can be enabled for a specific database with `CREATE EXTENSION pg_tracing`. -### Trace Example +## Usage + +Trace context can be propagated through [SQLCommenter](https://google.github.io/sqlcommenter/). By default, all queries with a SQLCommenter with a sampled flag enabled will generate spans. + +```sql +-- Query with trace context and sampled flag enable +/*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ SELECT 1; + +-- Check the generated spans +select trace_id, parent_id, span_id, span_start, span_end, span_type, span_operation from pg_tracing_consume_spans order by span_start; + trace_id | parent_id | span_id | span_start | span_end | span_type | span_operation +----------------------------------+------------------+------------------+-------------------------------+-------------------------------+--------------+---------------- + 00000000000000000000000000000123 | 0000000000000123 | 4268a4281c5316dd | 2024-03-19 13:46:43.97958+00 | 2024-03-19 13:46:43.980121+00 | Select query | SELECT $1; + 00000000000000000000000000000123 | 4268a4281c5316dd | 87cb96b6459880a0 | 2024-03-19 13:46:43.979642+00 | 2024-03-19 13:46:43.979978+00 | Planner | Planner + 00000000000000000000000000000123 | 4268a4281c5316dd | f5994f9159d8e80d | 2024-03-19 13:46:43.980081+00 | 2024-03-19 13:46:43.980111+00 | Executor | ExecutorRun +``` + +Queries can also be sampled randomly through the `pg_tracing.sample_rate` parameter. Setting this to 1 will trace all queries. ```sql -- Enable tracing for all queries @@ -60,164 +86,8 @@ select trace_id, parent_id, span_id, span_start, span_end, span_type, span_opera 458fbefd7034e670eb3d9c930862c378 | eb3d9c930862c378 | bdecb6e35d429f3d | 2024-01-10 09:54:16.321253+00 | 2024-01-10 09:54:16.321587+00 | Select query | SELECT $1; 458fbefd7034e670eb3d9c930862c378 | bdecb6e35d429f3d | ad49f27543b0175d | 2024-01-10 09:54:16.3213+00 | 2024-01-10 09:54:16.321412+00 | Planner | Planner 458fbefd7034e670eb3d9c930862c378 | bdecb6e35d429f3d | 8805f7749249536b | 2024-01-10 09:54:16.321485+00 | 2024-01-10 09:54:16.321529+00 | Executor | ExecutorRun - --- Calling pg_tracing_consume_spans a second time won't return anything as spans were consumed -select trace_id, parent_id, span_id, span_start, span_end, span_type, span_operation from pg_tracing_consume_spans order by span_start; - trace_id | parent_id | span_id | span_start | duration | name | resource -----------+-----------+---------+------------+----------+------+---------- -(0 rows) -``` - -## Views - -### pg_tracing_info - -The statistics of the pg_tracing module itself are tracked and made available via a view named `pg_tracing_info`. This view contains only a single row. - -| Column | Type | Description | -| --- | --- | --- -| `traces` | bigint | Total number of traces captured | -| `spans` | bigint | Total number of spans captured | -| `dropped_spans` | bigint | Total number of spans dropped due to exceeding `pg_tracing.max_span` spans | -| `failed_truncates` | bigint | Total number of times the module couldn't truncate the query file due to conflict lock on pg_tracing's text file | -| `last_consume` | timestamp with time zone | Time at which spans were last consumed | -| `stats_reset` | timestamp with time zone | Time at which all statistics in the `pg_tracing_info` view were last reset | - -### pg_tracing_consume_spans/pg_tracing_peek_spans - -The spans generated by the module are made available via a view named `pg_tracing_consume_spans`. This view contains one row for each generated span. The `pg_tracing_peek_spans` view provides the same output as `pg_tracing_consume_spans`. `pg_tracing_peek_spans` doesn't delete read spans. - -| Column | Type | Description | -| --- | --- | --- -| `traceid` | char(32) | Trace identifier of the span | -| `parent_id` | char(16) | Identifier of the span | -| `queryid` | bigint | Hash code to identify identical normalized queries, 0 if `query_id` is disabled | -| `span_type` | text | Type of span (`Planner`, `Executor`...) | -| `span_operation` | text | Name of the span's operation | -| `span_start` | timestamp with time zone | Start time of the span | -| `span_end` | timestamp with time zone | End time of the span | -| `sql_error_code` | char(5) | Error code of the query. `00000` for a succesful query | -| `userid` | oid | OID of user who executed the statement | -| `dbid` | oid | OID of database in which the statement was executed | -| `pid` | integer | Pid of the backend process that processed the query | -| `subxact_count` | smallint | Active count of backend's subtransaction when the span was created | -| `plan_startup_cost` | float8 | Estimated cost before fetching any tuples by the query planner | -| `plan_total_cost` | double precision | Estimated total cost by the query planner | -| `plan_rows` | double precision | Estimated number of row plan is expected to emit | -| `plan_width` | integer | Estimated row width in bytes by the query planner | -| `rows` | bigint | Number of tuples processed | -| `nloops` | bigint | Number of cycles for this node | -| `shared_blks_hit` | bigint | Total number of shared block cache hits by the node | -| `shared_blks_read` | bigint | Total number of shared blocks read by the node | -| `shared_blks_dirtied` | bigint | Total number of shared blocks dirtied by the node | -| `shared_blks_written` | bigint | Total number of shared blocks written by the node | -| `local_blks_hit` | bigint | Total number of local block cache hits by the node | -| `local_blks_read` | bigint | Total number of local blocks read by the node | -| `local_blks_dirtied` | bigint | Total number of local blocks dirtied by the node | -| `local_blks_written` | bigint | Total number of local blocks written by the node | -| `blk_read_time` | double precision | Time spent reading blocks in milliseconds | -| `blk_write_time` | double precision | Time spent writing blocks in milliseconds | -| `temp_blks_read` | bigint | Total number of temp blocks read by the node | -| `temp_blks_written` | bigint | Total number of temp blocks written by the node | -| `temp_blk_read_time` | double precision | Total time the node spent reading temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) | -| `temp_blk_write_time` | double precision | Total time the node spent writing temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) | -| `wal_records` | bigint | Total number of WAL records generated by the node | -| `wal_fpi` | bigint | Total number of WAL full page images generated by the node | -| `wal_bytes` | numeric | Total amount of WAL generated by the node in bytes | -| `jit_functions` | bigint | Total number of functions JIT-compiled by the node | -| `jit_generation_time` | double precision | Total time spent by the node on generating JIT code, in milliseconds | -| `jit_inlining_time` | double precision | Total time spent by the node on inlining functions, in milliseconds | -| `jit_optimization_time` | double precision | Total time spent by the node on optimizing, in milliseconds | -| `startup` | bigint | Time to the first tuple in nanoseconds | -| `parameters` | text | Value of the query's parameters | - -## Functions - -### pg_tracing_reset() - -Discards all statistics gathered so far by `pg_tracing`. Span buffer is not emptied by this function. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`. - -### pg_tracing_spans(boolean) - -The `pg_tracing_consume_spans` and `pg_tracing_peek_spans` views are defined in terms of a function named `pg_tracing_spans`. It is possible for clients to call the `pg_tracing_spans` function directly. By calling `pg_tracing_spans` with consume false, spans won't be removed from the span buffer. With consume true, read spans will be deleted. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`. - -## Parameters - -### pg_tracing.max_span (integer) - -Specifies the maximum number of spans stored by the extension. If more spans are generated, the span buffer will be emptied if `pg_tracing.buffer_mode` is set to `drop_on_full`. If `pg_tracing.buffer_mode` is set to `keep_on_full`, the new spans will be dropped and tracing will be aborted. The default value is 5000. This parameter can only be set at server start. - -> [!NOTE] -> A span takes around 370 bytes of memory so 5000 spans will use 1.7MB while 10000 spans will use 3.5MB. You can use the following query to get the exact size used by the pg_tracing's spans: -```sql -SELECT - (SELECT setting from pg_settings where name='pg_tracing.max_span') AS max_span, - pg_size_pretty(size) AS total_span_size, - pg_size_pretty(size/(SELECT setting from pg_settings - WHERE name='pg_tracing.max_span')::int) - AS size_per_span -FROM pg_shmem_allocations -WHERE name ='PgTracing Spans'; ``` -### pg_tracing.buffer_mode (enum) - -Controls span buffer's behaviour when `pg_tracing.max_span` spans is reached. If `keep_on_full`, the existing buffer is kept while new spans are dropped. If `drop_on_full`, the existing buffer is dropped and new spans are added. The default value is `keep_on_full`. - -### pg_tracing.max_parameter_size (integer) - -Controls the maximum size of the parameter string. The default value is 1024. - -### pg_tracing.track (enum) - -Controls which statements should be generated. Specify `top` to only generate spans for top-level statements (those issued directly by clients), `all` to also generate spans for nested statements (such as statements invoked within functions), or `none` to disable span generation. The default value is `all`. - -### pg_tracing.track_utility (boolean) - -Controls whether spans should be generated for utility statements. Utility commands are all those other than `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `TABLE`, and `MERGE`. The default value is `on`. - -### pg_tracing.sample_rate (real) - -Controls the fraction of statements that generate spans. Statements with tracecontext propagated with SQLCommenter and sampled flag enabled are not impacted by this parameter. For traces with nested statements, either all will be explained or none. The default value is 0. - -### pg_tracing.caller_sample_rate (real) - -Controls the fraction of statements with SQLCommenter tracecontext and an enabled sampled flag that will generate spans. The default value is 1. - -### pg_tracing.export_parameters (boolean) - -Controls whether the query's parameters should be exported in spans metadata. The default value is `on`. - -## Implementation Details - -### Tracecontext Propagation - -A query with SQLcommenter can propagate a trace context to the database. It will have the following format: -```sql -/*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ select 1; - --- SQLCommenter may be added to the end -select 1 /*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ -``` - -The traceparent fields are detailed in [w3c's trace-context](https://www.w3.org/TR/trace-context/#traceparent-header-field-values) - -``` -00000000000000000000000000000009: trace id -0000000000000005: parent id -01: trace flags (01 == sampled) -``` - -When a query with SQLCommenter is detected, the trace context is extracted and used by pg_tracing. - -### Sampling - -Spans will only be generated for sampled queries. A query is sampled if: -- It has a trace context propagated throught SQLCommenter with the sampled flag enabled and it passes the `pg_tracing.caller_sample_rate` -- It has no SQLCommenter but the query randomly passes the global `pg_tracing.sample_rate` - -With the default values `pg_tracing.caller_sample_rate = 1` and `pg_tracing.sample_rate = 0`, only queries with a trace context and a sampled flag `sampled := 01` will be sampled, effectively offloading sampling decision to the callers. - ## Authors * [Anthonin Bonnefoy](https://github.com/bonnefoa) diff --git a/doc/pg_tracing.md b/doc/pg_tracing.md new file mode 100644 index 0000000..59368fa --- /dev/null +++ b/doc/pg_tracing.md @@ -0,0 +1,158 @@ +# pg_tracing + +pg_tracing extension allows to generate spans from with a PostgreSQL instance, providing informations like: + +- Top Level Query: A span representing the executed query with the query text and parameters used +- Planner: A span representing the time spent planning the query +- ExecutorRun: A span representing the execution of the query +- Nested queries support: Queries executed within another query (like calling a pgsql function) will be represented by its own top level query and subsequent spans +- Trigger support: Before and After triggers will be represented by their own top level queries + +## Tracecontext Propagation + +A query with SQLcommenter can propagate a trace context to the database. It will have the following format: +```sql +/*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ select 1; + +-- SQLCommenter may be added to the end +select 1 /*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ +``` + +The traceparent fields are detailed in [w3c's trace-context](https://www.w3.org/TR/trace-context/#traceparent-header-field-values) + +``` +00000000000000000000000000000009: trace id +0000000000000005: parent id +01: trace flags (01 == sampled) +``` + +When a query with SQLCommenter is detected, the trace context is extracted and used by pg_tracing. + +## Sampling + +Spans will only be generated for sampled queries. A query is sampled if: +- It has a trace context propagated throught SQLCommenter with the sampled flag enabled and it passes the `pg_tracing.caller_sample_rate` +- It has no SQLCommenter but the query randomly passes the global `pg_tracing.sample_rate` + +With the default values `pg_tracing.caller_sample_rate = 1` and `pg_tracing.sample_rate = 0`, only queries with a trace context and a sampled flag `sampled := 01` will be sampled, effectively offloading sampling decision to the callers. + +## Views + +### pg_tracing_consume_spans/pg_tracing_peek_spans + +The spans generated by the module are made available via a view named `pg_tracing_consume_spans`. This view contains one row for each generated span. The `pg_tracing_peek_spans` view provides the same output as `pg_tracing_consume_spans`. `pg_tracing_peek_spans` doesn't delete read spans. + +| Column | Type | Description | +| --- | --- | --- +| `traceid` | char(32) | Trace identifier of the span | +| `parent_id` | char(16) | Identifier of the span | +| `queryid` | bigint | Hash code to identify identical normalized queries, 0 if `query_id` is disabled | +| `span_type` | text | Type of span (`Planner`, `Executor`...) | +| `span_operation` | text | Name of the span's operation | +| `span_start` | timestamp with time zone | Start time of the span | +| `span_end` | timestamp with time zone | End time of the span | +| `sql_error_code` | char(5) | Error code of the query. `00000` for a succesful query | +| `userid` | oid | OID of user who executed the statement | +| `dbid` | oid | OID of database in which the statement was executed | +| `pid` | integer | Pid of the backend process that processed the query | +| `subxact_count` | smallint | Active count of backend's subtransaction when the span was created | +| `plan_startup_cost` | float8 | Estimated cost before fetching any tuples by the query planner | +| `plan_total_cost` | double precision | Estimated total cost by the query planner | +| `plan_rows` | double precision | Estimated number of row plan is expected to emit | +| `plan_width` | integer | Estimated row width in bytes by the query planner | +| `rows` | bigint | Number of tuples processed | +| `nloops` | bigint | Number of cycles for this node | +| `shared_blks_hit` | bigint | Total number of shared block cache hits by the node | +| `shared_blks_read` | bigint | Total number of shared blocks read by the node | +| `shared_blks_dirtied` | bigint | Total number of shared blocks dirtied by the node | +| `shared_blks_written` | bigint | Total number of shared blocks written by the node | +| `local_blks_hit` | bigint | Total number of local block cache hits by the node | +| `local_blks_read` | bigint | Total number of local blocks read by the node | +| `local_blks_dirtied` | bigint | Total number of local blocks dirtied by the node | +| `local_blks_written` | bigint | Total number of local blocks written by the node | +| `blk_read_time` | double precision | Time spent reading blocks in milliseconds | +| `blk_write_time` | double precision | Time spent writing blocks in milliseconds | +| `temp_blks_read` | bigint | Total number of temp blocks read by the node | +| `temp_blks_written` | bigint | Total number of temp blocks written by the node | +| `temp_blk_read_time` | double precision | Total time the node spent reading temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) | +| `temp_blk_write_time` | double precision | Total time the node spent writing temporary file blocks, in milliseconds (if `track_io_timing` is enabled, otherwise zero) | +| `wal_records` | bigint | Total number of WAL records generated by the node | +| `wal_fpi` | bigint | Total number of WAL full page images generated by the node | +| `wal_bytes` | numeric | Total amount of WAL generated by the node in bytes | +| `jit_functions` | bigint | Total number of functions JIT-compiled by the node | +| `jit_generation_time` | double precision | Total time spent by the node on generating JIT code, in milliseconds | +| `jit_inlining_time` | double precision | Total time spent by the node on inlining functions, in milliseconds | +| `jit_optimization_time` | double precision | Total time spent by the node on optimizing, in milliseconds | +| `startup` | bigint | Time to the first tuple in nanoseconds | +| `parameters` | text | Value of the query's parameters | + +## Functions + +### pg_tracing_info + +The statistics of the pg_tracing extension itself are tracked and made available through `pg_tracing_info`. This function will only return a single row. + +| Column | Type | Description | +| --- | --- | --- +| `traces` | bigint | Total number of traces captured | +| `spans` | bigint | Total number of spans captured | +| `dropped_spans` | bigint | Total number of spans dropped due to exceeding `pg_tracing.max_span` spans | +| `failed_truncates` | bigint | Total number of times the module couldn't truncate the query file due to conflict lock on pg_tracing's text file | +| `last_consume` | timestamp with time zone | Time at which spans were last consumed | +| `stats_reset` | timestamp with time zone | Time at which all statistics in the `pg_tracing_info` view were last reset | + + +### pg_tracing_reset() + +Discards all statistics gathered so far by `pg_tracing`. Span buffer is not emptied by this function. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`. + +### pg_tracing_spans(boolean) + +The `pg_tracing_consume_spans` and `pg_tracing_peek_spans` views are defined in terms of a function named `pg_tracing_spans`. It is possible for clients to call the `pg_tracing_spans` function directly. By calling `pg_tracing_spans` with consume false, spans won't be removed from the span buffer. With consume true, read spans will be deleted. By default, this function can only be executed by superusers. Access may be granted to others using `GRANT`. + +## Parameters + +### pg_tracing.max_span (integer) + +Specifies the maximum number of spans stored by the extension. If more spans are generated, the span buffer will be emptied if `pg_tracing.buffer_mode` is set to `drop_on_full`. If `pg_tracing.buffer_mode` is set to `keep_on_full`, the new spans will be dropped and tracing will be aborted. The default value is 5000. This parameter can only be set at server start. + +> [!NOTE] +> A span takes around 370 bytes of memory so 5000 spans will use 1.7MB while 10000 spans will use 3.5MB. You can use the following query to get the exact size used by the pg_tracing's spans: +```sql +SELECT + (SELECT setting from pg_settings where name='pg_tracing.max_span') AS max_span, + pg_size_pretty(size) AS total_span_size, + pg_size_pretty(size/(SELECT setting from pg_settings + WHERE name='pg_tracing.max_span')::int) + AS size_per_span +FROM pg_shmem_allocations +WHERE name ='PgTracing Spans'; +``` + +### pg_tracing.buffer_mode (enum) + +Controls span buffer's behaviour when `pg_tracing.max_span` spans is reached. If `keep_on_full`, the existing buffer is kept while new spans are dropped. If `drop_on_full`, the existing buffer is dropped and new spans are added. The default value is `keep_on_full`. + +### pg_tracing.max_parameter_size (integer) + +Controls the maximum size of the parameter string. The default value is 1024. + +### pg_tracing.track (enum) + +Controls which statements should be generated. Specify `top` to only generate spans for top-level statements (those issued directly by clients), `all` to also generate spans for nested statements (such as statements invoked within functions), or `none` to disable span generation. The default value is `all`. + +### pg_tracing.track_utility (boolean) + +Controls whether spans should be generated for utility statements. Utility commands are all those other than `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `TABLE`, and `MERGE`. The default value is `on`. + +### pg_tracing.sample_rate (real) + +Controls the fraction of statements that generate spans. Statements with tracecontext propagated with SQLCommenter and sampled flag enabled are not impacted by this parameter. For traces with nested statements, either all will be explained or none. The default value is 0. + +### pg_tracing.caller_sample_rate (real) + +Controls the fraction of statements with SQLCommenter tracecontext and an enabled sampled flag that will generate spans. The default value is 1. + +### pg_tracing.export_parameters (boolean) + +Controls whether the query's parameters should be exported in spans metadata. The default value is `on`.