pg_tracing extension allows to generate spans from with a PostgreSQL instance, providing informations like:
- Top Level Query: A span representing the executed query with the query text and parameters used
- Planner: A span representing the time spent planning the query
- ExecutorRun: A span representing the execution of the query
- Nested queries support: Queries executed within another query (like calling a pgsql function) will be represented by its own top level query and subsequent spans
- Trigger support: Before and After triggers will be represented by their own top level queries
SQLcommenter or pg_tracing.trace_context
can be used to propagate a trace context to the database. It will have the following format:
/*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/ select 1;
-- SQLCommenter may be added to the end
select 1 /*dddbs='postgres.db',traceparent='00-00000000000000000000000000000009-0000000000000005-01'*/
-- Propagation through trace_context GUC
BEGIN;
SET LOCAL pg_tracing.trace_context='traceparent=''00-00000000000000000000000000000005-0000000000000005-01''';
UPDATE pgbench_accounts SET abalance=1 where aid=1;
COMMIT;
The traceparent fields are detailed in w3c's trace-context
00000000000000000000000000000009: trace id
0000000000000005: parent id
01: trace flags (01 == sampled)
When a query with a trace context is detected, it is extracted and used by pg_tracing.
Spans will only be generated for sampled queries. A query is sampled if:
- It has a trace context with the sampled flag enabled and it passes the
pg_tracing.caller_sample_rate
- It has no trace context but the query randomly passes the global
pg_tracing.sample_rate
With the default values pg_tracing.caller_sample_rate = 1
and pg_tracing.sample_rate = 0
, only queries with a trace context and a sampled flag sampled := 01
will be sampled, effectively offloading sampling decision to the callers.
The spans generated by the module are made available via a view named pg_tracing_consume_spans
. This view contains one row for each generated span. The pg_tracing_peek_spans
view provides the same output as pg_tracing_consume_spans
. pg_tracing_peek_spans
doesn't delete read spans.
Column | Type | Description |
---|---|---|
traceid |
char(32) | Trace identifier of the span |
parent_id |
char(16) | Identifier of the span |
queryid |
bigint | Hash code to identify identical normalized queries, 0 if query_id is disabled |
span_type |
text | Type of span (Planner , Executor ...) |
span_operation |
text | Name of the span's operation |
span_start |
timestamp with time zone | Start time of the span |
span_end |
timestamp with time zone | End time of the span |
sql_error_code |
char(5) | Error code of the query. 00000 for a succesful query |
userid |
oid | OID of user who executed the statement |
dbid |
oid | OID of database in which the statement was executed |
pid |
integer | Pid of the backend process that processed the query |
subxact_count |
smallint | Active count of backend's subtransaction when the span was created |
plan_startup_cost |
float8 | Estimated cost before fetching any tuples by the query planner |
plan_total_cost |
double precision | Estimated total cost by the query planner |
plan_rows |
double precision | Estimated number of row plan is expected to emit |
plan_width |
integer | Estimated row width in bytes by the query planner |
rows |
bigint | Number of tuples processed |
nloops |
bigint | Number of cycles for this node |
shared_blks_hit |
bigint | Total number of shared block cache hits by the node |
shared_blks_read |
bigint | Total number of shared blocks read by the node |
shared_blks_dirtied |
bigint | Total number of shared blocks dirtied by the node |
shared_blks_written |
bigint | Total number of shared blocks written by the node |
local_blks_hit |
bigint | Total number of local block cache hits by the node |
local_blks_read |
bigint | Total number of local blocks read by the node |
local_blks_dirtied |
bigint | Total number of local blocks dirtied by the node |
local_blks_written |
bigint | Total number of local blocks written by the node |
blk_read_time |
double precision | Time spent reading blocks in milliseconds |
blk_write_time |
double precision | Time spent writing blocks in milliseconds |
temp_blks_read |
bigint | Total number of temp blocks read by the node |
temp_blks_written |
bigint | Total number of temp blocks written by the node |
temp_blk_read_time |
double precision | Total time the node spent reading temporary file blocks, in milliseconds (if track_io_timing is enabled, otherwise zero) |
temp_blk_write_time |
double precision | Total time the node spent writing temporary file blocks, in milliseconds (if track_io_timing is enabled, otherwise zero) |
wal_records |
bigint | Total number of WAL records generated by the node |
wal_fpi |
bigint | Total number of WAL full page images generated by the node |
wal_bytes |
numeric | Total amount of WAL generated by the node in bytes |
jit_functions |
bigint | Total number of functions JIT-compiled by the node |
jit_generation_time |
double precision | Total time spent by the node on generating JIT code, in milliseconds |
jit_inlining_time |
double precision | Total time spent by the node on inlining functions, in milliseconds |
jit_optimization_time |
double precision | Total time spent by the node on optimizing, in milliseconds |
startup |
bigint | Time to the first tuple in nanoseconds |
parameters |
text[] | Value of the query's parameters |
deparse_info |
text | Information extracted from deparsing a plan node |
The statistics of the pg_tracing extension itself are tracked and made available through pg_tracing_info
. This function will only return a single row.
Column | Type | Description |
---|---|---|
processed_traces |
bigint | Total number of traces processed |
processed_spans |
bigint | Total number of spans processed |
dropped_traces |
bigint | Total number of traces dropped due to exceeding pg_tracing.max_span spans |
dropped_spans |
bigint | Total number of spans dropped due to exceeding pg_tracing.max_span spans |
otel_sent_spans |
bigint | Total number of spans dropped successfully sent to the otel collector |
otel_failures |
bigint | Total number of failures to send spans to the otel collector |
last_consume |
timestamp with time zone | Time at which spans were last consumed |
stats_reset |
timestamp with time zone | Time at which all statistics in the pg_tracing_info view were last reset |
Discards all statistics gathered so far by pg_tracing
. Span buffer is not emptied by this function. By default, this function can only be executed by superusers. Access may be granted to others using GRANT
.
The pg_tracing_consume_spans
and pg_tracing_peek_spans
views are defined in terms of a function named pg_tracing_spans
. It is possible for clients to call the pg_tracing_spans
function directly. By calling pg_tracing_spans
with consume false, spans won't be removed from the span buffer. With consume true, read spans will be deleted. By default, this function can only be executed by superusers. Access may be granted to others using GRANT
.
Controls span buffer's behaviour when pg_tracing.max_span
spans is reached. If keep_on_full
, the existing buffer is kept while new spans are dropped. If drop_on_full
, the existing buffer is dropped and new spans are added. The default value is keep_on_full
.
Controls the fraction of statements with SQLCommenter tracecontext and an enabled sampled flag that will generate spans. The default value is 1.
Controls whether the query plan should be deparsed. Deparsing plan allows to add more details in span's name. A BitmapHeapScan
node may have Recheck Cond: (a = 1)
as deparsing information. The default value is on
.
Controls whether the query's parameters should be exported in spans metadata. The default value is on
.
Restrict sampling to the provided queryIds. An empty value won't filter any queries. The default value is empty.
Controls the maximum size of the parameter string. The default value is 1024.
Specifies the maximum number of spans stored by the extension. If more spans are generated, the span buffer will be emptied if pg_tracing.buffer_mode
is set to drop_on_full
. If pg_tracing.buffer_mode
is set to keep_on_full
, the new spans will be dropped and tracing will be aborted. The default value is 5000. This parameter can only be set at server start.
Note
A span takes around 370 bytes of memory so 5000 spans will use 1.7MB while 10000 spans will use 3.5MB. You can use the following query to get the exact size used by the pg_tracing's spans:
SELECT
(SELECT setting from pg_settings where name='pg_tracing.max_span') AS max_span,
pg_size_pretty(size) AS total_span_size,
pg_size_pretty(size/(SELECT setting from pg_settings
WHERE name='pg_tracing.max_span')::int)
AS size_per_span
FROM pg_shmem_allocations
WHERE name ='PgTracing Spans';
Controls if spans should be generated from the executed query plan. The default value is true.
Controls the fraction of statements that generate spans. Statements with tracecontext propagated with SQLCommenter and sampled flag enabled are not impacted by this parameter. For traces with nested statements, either all will be explained or none. The default value is 0.
Controls whether spans should be generated for workers created by parallel queries. The default value is true.
Controls which statements should be generated. Specify top
to only generate spans for top-level statements (those issued directly by clients), all
to also generate spans for nested statements (such as statements invoked within functions), or none
to disable span generation. The default value is all
.
Controls whether spans should be generated for utility statements. Utility commands are all those other than SELECT
, INSERT
, UPDATE
, DELETE
, TABLE
, and MERGE
. The default value is on
.
Controls the fraction of statements that generate spans. Statements with tracecontext propagated with SQLCommenter and sampled flag enabled are not impacted by this parameter. For traces with nested statements, either all will be explained or none. The default value is 0.
Controls the fraction of statements with SQLCommenter tracecontext and an enabled sampled flag that will generate spans. The default value is 1.
URL of the otel collector to send spans to. Example: 'http://127.0.0.1:4318/v1/traces'. This parameter can only be set at server start. The default value is NULL.
Interval in milliseconds between upload of spans to the otel collector. This parameter can only be set at server start.
Service name to set in traces sent to otel collector. This parameter can only be set at server start.
Maximum time in milliseconds to connect to the otel collector. This includes DNS resolution and protocol handshake. This parameter can only be set at server start.