From 06a67d41240bf06245431441cd9af9a6bcd0a031 Mon Sep 17 00:00:00 2001 From: Enrico Zimuel Date: Mon, 2 Dec 2024 15:49:51 +0100 Subject: [PATCH] Added docs + spans --- docs/attributes-registry/db.md | 48 ++++++-- docs/database/README.md | 1 + docs/database/vector.md | 197 +++++++++++++++++++++++++++++++++ model/database/registry.yaml | 9 ++ model/database/spans.yaml | 78 +++++++++++++ 5 files changed, 325 insertions(+), 8 deletions(-) create mode 100644 docs/database/vector.md diff --git a/docs/attributes-registry/db.md b/docs/attributes-registry/db.md index cbde20d431..3a2b040da6 100644 --- a/docs/attributes-registry/db.md +++ b/docs/attributes-registry/db.md @@ -30,9 +30,10 @@ This group defines the attributes used to describe telemetry in the context of d | `db.operation.parameter.` | string | A database operation parameter, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [5] | `someval`; `55` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.query.summary` | string | Low cardinality representation of a database query text. [6] | `SELECT wuser_table`; `INSERT shipping_details SELECT orders`; `get user by id` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.query.text` | string | The database query being executed. [7] | `SELECT * FROM wuser_table where username = ?`; `SET mykey ?` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.record.id` | string | The ID of the record [8] | `1`; `5c56c793-69f3-4fbf-87e6-c4bf54c28c26` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.response.returned_rows` | int | Number of rows returned by the operation. | `10`; `30`; `1000` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.response.status_code` | string | Database response status code. 
[8] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.system` | string | The database management system (DBMS) product as identified by the client instrumentation. [9] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.response.status_code` | string | Database response status code. [9] | `102`; `ORA-17002`; `08P01`; `404` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.system` | string | The database management system (DBMS) product as identified by the client instrumentation. [10] | `other_sql`; `adabas`; `cache` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1] `db.collection.name`:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. @@ -78,11 +79,13 @@ For batch operations, if the individual operations are known to have the same qu Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. This attribute has stability level RELEASE CANDIDATE. -**[8] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. +**[8] `db.record.id`:** This can also be the ID of the vector, in the case of a vector database. + +**[9] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. 
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. This attribute has stability level RELEASE CANDIDATE. -**[9] `db.system`:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge. +**[10] `db.system`:** The actual DBMS may differ from the one identified by the client. For example, when using PostgreSQL client libraries to connect to a CockroachDB, the `db.system` is set to `postgresql` based on the instrumentation's best knowledge. This attribute has stability level RELEASE CANDIDATE. --- @@ -195,12 +198,12 @@ This group defines attributes for Azure Cosmos DB. | `db.cosmosdb.client_id` | string | Unique Cosmos client instance id. | `3ba4827d-4422-483f-b59f-85b74211c11d` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.cosmosdb.connection_mode` | string | Cosmos client connection mode. | `gateway`; `direct` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.cosmosdb.consistency_level` | string | Account or request [consistency level](https://learn.microsoft.com/azure/cosmos-db/consistency-levels). | `Eventual`; `ConsistentPrefix`; `BoundedStaleness`; `Strong`; `Session` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.cosmosdb.regions_contacted` | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. 
[10] | `["North Central US", "Australia East", "Australia Southeast"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.cosmosdb.regions_contacted` | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. [11] | `["North Central US", "Australia East", "Australia Southeast"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.cosmosdb.request_charge` | double | Request units consumed for the operation. | `46.18`; `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.cosmosdb.request_content_length` | int | Request payload size in bytes. | | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `db.cosmosdb.sub_status_code` | int | Cosmos DB sub status code. | `1000`; `1002` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -**[10] `db.cosmosdb.regions_contacted`:** Region name matches the format of `displayName` in [Azure Location API](https://learn.microsoft.com/rest/api/subscription/subscriptions/list-locations?view=rest-subscription-2021-10-01&tabs=HTTP#location) +**[11] `db.cosmosdb.regions_contacted`:** Region name matches the format of `displayName` in [Azure Location API](https://learn.microsoft.com/rest/api/subscription/subscriptions/list-locations?view=rest-subscription-2021-10-01&tabs=HTTP#location) --- @@ -230,9 +233,38 @@ This group defines attributes for Elasticsearch. | Attribute | Type | Description | Examples | Stability | |---|---|---|---|---| | `db.elasticsearch.node.name` | string | Represents the human-readable identifier of the node/instance to which a request was routed. | `instance-0000000001` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `db.elasticsearch.path_parts.` | string | A dynamic value in the url path. 
[11] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.elasticsearch.path_parts.` | string | A dynamic value in the url path. [12] | `db.elasticsearch.path_parts.index=test-index`; `db.elasticsearch.path_parts.doc_id=123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[12] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. + +## Search attributes + +This group defines attributes for Search. + +| Attribute | Type | Description | Examples | Stability | +|---|---|---|---|---| +| `db.search.similarity_metric` | string | The metric used in similarity search. | `cosine` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +--- -**[11] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. +`db.search.similarity_metric` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `cosine` | The cosine metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `dot` | The dot product metric. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `euclidean` | The euclidean distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `manhattan` | The Manhattan distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +## Db Vector Attributes + +This group defines attributes for vector databases. + +| Attribute | Type | Description | Examples | Stability | +|---|---|---|---|---| +| `db.vector.dimension_count` | int | The dimension of the vector. | `3` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.vector.field_name` | string | The name of the field that stores the vector. | `vector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `db.vector.query.top_k` | int | The top-k most similar vectors returned by a query. | `5` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | ## Deprecated Database Attributes diff --git a/docs/database/README.md b/docs/database/README.md index 361708dbe7..dc24befff8 100644 --- a/docs/database/README.md +++ b/docs/database/README.md @@ -55,5 +55,6 @@ Technology specific semantic conventions are defined for the following databases * [MSSQL](mssql.md): Semantic Conventions for *MSSQL*. * [Redis](redis.md): Semantic Conventions for *Redis*. * [SQL](sql.md): Semantic Conventions for *SQL* databases. +* [Vector DB](vector.md): Semantic Conventions for *Vector* databases. [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/database/vector.md b/docs/database/vector.md new file mode 100644 index 0000000000..b24ed81ca3 --- /dev/null +++ b/docs/database/vector.md @@ -0,0 +1,197 @@ + + +# Semantic Conventions for Vector databases + +**Status**: [Experimental][DocumentStatus] + +The Semantic Conventions for Vector databases describe how the common [Database Semantic Conventions](database-spans.md) apply to Vector databases. 
+ +The following database systems (defined in the [`db.system`](./database-spans.md#notes-and-well-known-identifiers-for-dbsystem) set) are known to support vectors: + +- `cosmosdb` (technical preview) +- `elasticsearch` +- `postgresql` (using [pgvector](https://github.com/pgvector/pgvector)) +- `redis` + +Many other database systems support vector search. +Instrumentations applied to generic vector databases SHOULD adhere to vector db semantic conventions. + +## Span Name + +The **span name** follows the [general database span name guidelines](database-spans.md#name) with the endpoint identifier stored in `db.operation.name`, and the index stored in `db.collection.name`. + +## Attributes + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The operation to be performed on the vector database (e.g. build an index/collection) [1] | `build`; `insert`; `search`; `delete` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`http.request.method`](/docs/attributes-registry/http.md) | string | HTTP request method. [2] | `GET`; `POST`; `HEAD` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`url.full`](/docs/attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [3] | `http://localhost:19530/v2/vectordb/entities/search` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.record.id`](/docs/attributes-registry/db.md) | string | the ID of the record (e.g. the ID of the vector) [4] | `1`; `5c56c793-69f3-4fbf-87e6-c4bf54c28c26` | `Conditionally Required` If available. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the vector database. [5] | `200`; `201`; `429` | `Conditionally Required` If response was received. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.search.similarity_metric`](/docs/attributes-registry/db.md) | string | specify the metric used in similarity search (e.g. cosine) [6] | `cosine`; `dot`; `euclidean`; `manhattan` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.vector.field_name`](/docs/attributes-registry/db.md) | string | the name field of the vector embedding [7] | `image_vector`; `embedding_field` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.vector.query.top_k`](/docs/attributes-registry/db.md) | int | the top-k most similar vectors returned by a query [8] | `10` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [9] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [10] | `80`; `8080`; `443` | `Conditionally Required` [11] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The index or data stream against which the query is executed. 
[12] | `my_index`; `index1, index2` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [13] | `2`; `3`; `4` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [14] | `"{\"collectionName\":\"my_collection\", \"data\": [[-5, 9, -12]], \"limit\": 5, \"annsField\": \"vector\"}"` | `Recommended` [15] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.vector.dimension_count`](/docs/attributes-registry/db.md) | int | the dimension of the vector (e.g. 1536) [16] | `3`; `1536` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [17] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +**[1] `db.operation.name`:** The `db.operation.name` SHOULD match the endpoint identifier provided in the request (e.g. build, insert, update, delete, search). + +**[2] `http.request.method`:** HTTP request method value SHOULD be "known" to the instrumentation. +By default, this convention defines "known" methods as the ones listed in [RFC9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-methods) +and the PATCH method defined in [RFC5789](https://www.rfc-editor.org/rfc/rfc5789.html). + +If the HTTP request method is not known to instrumentation, it MUST set the `http.request.method` attribute to `_OTHER`. + +If the HTTP instrumentation could end up converting valid HTTP request methods to `_OTHER`, then it MUST provide a way to override +the list of known HTTP methods. 
If this override is done via environment variable, then the environment variable MUST be named +OTEL_INSTRUMENTATION_HTTP_KNOWN_METHODS and support a comma-separated list of case-sensitive known HTTP methods +(this list MUST be a full override of the default known method, it is not a list of known methods in addition to the defaults). + +HTTP method names are case-sensitive and `http.request.method` attribute value MUST match a known HTTP method name exactly. +Instrumentations for specific web frameworks that consider HTTP methods to be case insensitive, SHOULD populate a canonical equivalent. +Tracing instrumentations that do so, MUST also set `http.request.method_original` to the original value. + +**[3] `url.full`:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment +is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. + +`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. +In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. + +`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). + +Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it. 
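
Review note: the `url.full` redaction rules in this note can be sketched roughly as follows. This is only an illustration, not part of the convention: the helper name `redact_url` is hypothetical, exact case-sensitive key matching is an assumption, and so is the handling of a user name without a password.

```python
from urllib.parse import urlsplit, urlunsplit

# Query-string keys whose values this note says SHOULD be redacted by default.
REDACTED_QUERY_KEYS = {"AWSAccessKeyId", "Signature", "sig", "X-Goog-Signature"}


def redact_url(url: str) -> str:
    """Return `url` with credentials and well-known signature values redacted."""
    parts = urlsplit(url)

    # Replace any `username:password@` prefix of the authority component.
    netloc = parts.netloc
    if "@" in netloc:
        userinfo, host = netloc.rsplit("@", 1)
        # Assumption: a user name without a password is redacted on its own.
        placeholder = "REDACTED:REDACTED" if ":" in userinfo else "REDACTED"
        netloc = f"{placeholder}@{host}"

    # Keep every query key, but replace the value of the sensitive ones.
    pairs = []
    for pair in parts.query.split("&") if parts.query else []:
        key, sep, _value = pair.partition("=")
        if sep and key in REDACTED_QUERY_KEYS:
            pair = f"{key}=REDACTED"
        pairs.append(pair)

    return urlunsplit(
        (parts.scheme, netloc, parts.path, "&".join(pairs), parts.fragment)
    )
```

The query string is rewritten by hand rather than re-encoded so that values that are not redacted stay byte-for-byte as the application sent them.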
+ +![Experimental](https://img.shields.io/badge/-experimental-blue) +Query string values for the following keys SHOULD be redacted by default and replaced by the +value `REDACTED`: + +* [`AWSAccessKeyId`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`Signature`](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#RESTAuthenticationQueryStringAuth) +* [`sig`](https://learn.microsoft.com/azure/storage/common/storage-sas-overview#sas-token) +* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls) + +This list is subject to change over time. + +When a query string value is redacted, the query string key SHOULD still be preserved, e.g. +`https://www.example.com/path?color=blue&sig=REDACTED`. + +**[4] `db.record.id`:** Some vector databases identify a vector using an ID. + +**[5] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. +Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +This attribute has stability level RELEASE CANDIDATE. + +**[6] `db.search.similarity_metric`:** Some vector databases allow specifying the similarity search during a search, while others only allow it when creating an index or collection. + +**[7] `db.vector.field_name`:** Some vector databases use a field name to store the vector. + +**[8] `db.vector.query.top_k`:** The top-k parameter is usually specified when executing a vector search (i.e. query) + +**[9] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. 
+When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. +Instrumentations SHOULD document how `error.type` is populated. + +**[10] `server.port`:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[11] `server.port`:** If using a port other than the default port for this DBMS and if `server.address` is set. + +**[12] `db.collection.name`:** The query may target multiple indices or data streams, in which case it SHOULD be a comma separated list of those. If the query doesn't target a specific index, this field MUST NOT be set. + +**[13] `db.operation.batch.size`:** Operations are only considered batches when they contain two or more operations, and so `db.operation.batch.size` SHOULD never be `1`. +This attribute has stability level RELEASE CANDIDATE. + +**[14] `db.query.text`:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). +For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable. +Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. +This attribute has stability level RELEASE CANDIDATE. 
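
Review note: the batch rule for `db.query.text` in note [14] could be implemented along these lines. This is a minimal sketch. The function name `batch_query_text` is hypothetical, and it assumes the individual query texts have already been sanitized.

```python
from typing import Sequence


def batch_query_text(query_texts: Sequence[str], separator: str = "; ") -> str:
    """Build the `db.query.text` value for a batch of already-sanitized queries."""
    if not query_texts:
        raise ValueError("a batch needs at least one query text")

    # Same text for every operation: report it once.
    first = query_texts[0]
    if all(text == first for text in query_texts):
        return first

    # Otherwise concatenate; callers may pass a database-specific separator.
    return separator.join(query_texts)
```

For example, `batch_query_text(["SET a ?", "SET b ?"])` yields `SET a ?; SET b ?`.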
+ +**[15] `db.query.text`:** Should be collected by default for search-type queries and only if there is sanitization that excludes sensitive information and vector, if the size is too long. + +**[16] `db.vector.dimension_count`:** The dimension of a vector is typically defined when building an index/collection. + +**[17] `server.address`:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +The following attributes can be important for making sampling decisions +and SHOULD be provided **at span creation time** (if provided at all): + +* [`db.collection.name`](/docs/attributes-registry/db.md) +* [`db.operation.name`](/docs/attributes-registry/db.md) +* [`db.query.text`](/docs/attributes-registry/db.md) +* [`http.request.method`](/docs/attributes-registry/http.md) +* [`server.address`](/docs/attributes-registry/server.md) +* [`server.port`](/docs/attributes-registry/server.md) +* [`url.full`](/docs/attributes-registry/url.md) + +--- + +`db.search.similarity_metric` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `cosine` | The cosine metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `dot` | The dot product metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `euclidean` | The euclidean distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `manhattan` | The Manhattan distance metric. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +--- + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +--- + +`http.request.method` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | Any HTTP method that the instrumentation has no prior knowledge of. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `CONNECT` | CONNECT method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `DELETE` | DELETE method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `GET` | GET method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `HEAD` | HEAD method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `OPTIONS` | OPTIONS method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `PATCH` | PATCH method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `POST` | POST method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `PUT` | PUT method. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| `TRACE` | TRACE method. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + + + + + + +## Example + +| Key | Value | +|:------------------------------|:--------------------------------------------------------------------------------------------------------------| +| Span name | `"search my_collection"` | +| `db.operation.name` | `"search"` | +| `server.address` | `"localhost"` | +| `server.port` | `19530` | +| `http.request.method` | `"POST"` | +| `url.full` | `"http://localhost:19530/v2/vectordb/entities/search"` | +| `db.collection.name` | `"my_collection"` | +| `db.query.text` | `"{\"collectionName\":\"my_collection\", \"data\": [[-5, 9, -12]], \"limit\": 5, \"annsField\": \"vector\"}"` | +| `db.search.similarity_metric` | `"cosine"` | +| `db.vector.query.top_k` | `5` | +| `db.vector.field_name` | `"vector"` | + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/model/database/registry.yaml b/model/database/registry.yaml index 4c9e72854d..9de5534a65 100644 --- a/model/database/registry.yaml +++ b/model/database/registry.yaml @@ -57,6 +57,15 @@ groups: This attribute has stability level RELEASE CANDIDATE. examples: ["findAndModify", "HMSET", "SELECT"] + - id: db.record.id + type: string + stability: experimental + brief: > + The ID of the record + note: > + This can also be the ID of the vector, in the case of a vector database. + examples: + ["1", "5c56c793-69f3-4fbf-87e6-c4bf54c28c26"] - id: db.query.text type: string stability: experimental # RELEASE CANDIDATE diff --git a/model/database/spans.yaml b/model/database/spans.yaml index df0a993543..466f568c9f 100644 --- a/model/database/spans.yaml +++ b/model/database/spans.yaml @@ -764,3 +764,81 @@ groups: - ref: db.cosmosdb.regions_contacted requirement_level: conditionally_required: If available. 
+ + - id: span.db.vector.client + type: span + stability: experimental + span_kind: client + extends: trace.db.common.minimal + brief: > + Attributes for vector databases + attributes: + - ref: http.request.method + sampling_relevant: true + requirement_level: required + - ref: url.full + sampling_relevant: true + requirement_level: required + examples: [ 'http://localhost:19530/v2/vectordb/entities/search' ] + - ref: db.operation.name + requirement_level: required + note: > + The `db.operation.name` SHOULD match the endpoint identifier provided in the request + (e.g. build, insert, update, delete, search). + brief: The operation to be performed on the vector database (e.g. build an index/collection) + examples: ['build', 'insert', 'search', 'delete'] + - ref: db.query.text + sampling_relevant: true + requirement_level: + recommended: > + Should be collected by default for search-type queries and only if there is sanitization that excludes + sensitive information and vector, if the size is too long. + examples: ['"{\"collectionName\":\"my_collection\", \"data\": [[-5, 9, -12]], \"limit\": 5, \"annsField\": \"vector\"}"'] + - ref: db.collection.name + sampling_relevant: true + requirement_level: recommended + brief: The index or data stream against which the query is executed. + note: > + The query may target multiple indices or data streams, in which case it SHOULD be a comma separated list of those. + If the query doesn't target a specific index, this field MUST NOT be set. + examples: ['my_index', 'index1, index2'] + - ref: db.search.similarity_metric + brief: specify the metric used in similarity search (e.g. cosine) + note: > + Some vector databases allow specifying the similarity search during a search, while others only allow it when creating an index or collection. + requirement_level: + conditionally_required: If available. + examples: ['cosine', 'dot', 'euclidean', 'manhattan'] + - ref: db.record.id + brief: the ID of the record (e.g. 
the ID of the vector) + note: > + Some vector databases identify a vector using an ID. + requirement_level: + conditionally_required: If available. + examples: ['1', '5c56c793-69f3-4fbf-87e6-c4bf54c28c26'] + - ref: db.vector.field_name + brief: the name field of the vector embedding + note: > + Some vector databases use a field name to store the vector. + requirement_level: + conditionally_required: If available. + examples: ['image_vector', 'embedding_field'] + - ref: db.vector.dimension_count + brief: the dimension of the vector (e.g. 1536) + note: > + The dimension of a vector is typically defined when building an index/collection. + requirement_level: recommended + examples: [3, 1536] + - ref: db.vector.query.top_k + brief: the top-k most similar vectors returned by a query + note: > + The top-k parameter is usually specified when executing a vector search (i.e. query) + requirement_level: + conditionally_required: If available. + examples: [10] + - ref: db.response.status_code + brief: > + The HTTP response code returned by the vector database. + examples: ['200', '201', '429'] + requirement_level: + conditionally_required: If response was received.
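
Review note: to make the `span.db.vector.client` group above concrete, here is a rough sketch of how a client instrumentation might assemble the span name and attributes for a search call. It is an illustration under assumptions, not the intended implementation: the helper `vector_search_span` and its `default_port` parameter are hypothetical, and no OpenTelemetry SDK call is used, so the result is a plain dictionary.

```python
from typing import Any, Dict, Optional, Tuple


def vector_search_span(
    collection: str,
    url: str,
    method: str,
    *,
    top_k: Optional[int] = None,
    similarity_metric: Optional[str] = None,
    field_name: Optional[str] = None,
    dimension_count: Optional[int] = None,
    record_id: Optional[str] = None,
    status_code: Optional[int] = None,
    server_address: Optional[str] = None,
    server_port: Optional[int] = None,
    default_port: Optional[int] = None,  # hypothetical: the DBMS default port
) -> Tuple[str, Dict[str, Any]]:
    """Return (span name, attributes) for a vector search operation."""
    # Required attributes, plus the collection the search targets.
    attrs: Dict[str, Any] = {
        "db.operation.name": "search",
        "http.request.method": method,
        "url.full": url,
        "db.collection.name": collection,
    }

    # "If available" attributes are simply omitted when unknown.
    optional = {
        "db.vector.query.top_k": top_k,
        "db.search.similarity_metric": similarity_metric,
        "db.vector.field_name": field_name,
        "db.vector.dimension_count": dimension_count,
        "db.record.id": record_id,
    }
    attrs.update({key: value for key, value in optional.items() if value is not None})

    # `db.response.status_code` is a string attribute, set once a response arrives.
    if status_code is not None:
        attrs["db.response.status_code"] = str(status_code)

    # `server.port` only when `server.address` is set and the port is not the default.
    if server_address is not None:
        attrs["server.address"] = server_address
        if server_port is not None and server_port != default_port:
            attrs["server.port"] = server_port

    return f"search {collection}", attrs
```

Calling it with `collection="my_collection"`, `method="POST"`, `top_k=5`, `similarity_metric="cosine"` and `field_name="vector"` gives a result shaped like the Example table in `docs/database/vector.md`.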