diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md
index 0dc935e462..67f5d8cc55 100644
--- a/docs/attributes-registry/gen-ai.md
+++ b/docs/attributes-registry/gen-ai.md
@@ -17,8 +17,9 @@ This document defines the attributes used to describe telemetry in the context o
 | Attribute | Type | Description | Examples | Stability |
 | ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- |
 | `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| `gen_ai.operation.name` | string | The name of the operation being performed. [2] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [3] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `gen_ai.evaluation.score` | double | The score calculated by the evaluator for the GenAI response. [2] | `0.42` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `gen_ai.operation.name` | string | The name of the operation being performed. [3] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [4] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.request.frequency_penalty` | double | The frequency penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.request.max_tokens` | int | The maximum number of tokens the model generates for a request. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.request.model` | string | The name of the GenAI model a request is being made to. | `gpt-4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
@@ -30,18 +31,20 @@ This document defines the attributes used to describe telemetry in the context o
 | `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]`; `["stop", "length"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [4] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [5] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.usage.input_tokens` | int | The number of tokens used in the GenAI input (prompt). | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `gen_ai.usage.output_tokens` | int | The number of tokens used in the GenAI response (completion). | `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 
 **[1]:** It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
 
-**[2]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.
+**[2]:** Semantic conventions describing GenAI evaluation telemetry SHOULD document the scoring system and method used to calculate the score.
 
-**[3]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
+**[3]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.
 
-**[4]:** The `gen_ai.system` describes a family of GenAI models with specific model identified
+**[4]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
+
+**[5]:** The `gen_ai.system` describes a family of GenAI models with specific model identified
 by `gen_ai.request.model` and `gen_ai.response.model` attributes.
 
 The actual GenAI product may differ from the one identified by the client.
diff --git a/docs/gen-ai/gen-ai-evaluation-events.md b/docs/gen-ai/gen-ai-evaluation-events.md
new file mode 100644
index 0000000000..e2655d7fc1
--- /dev/null
+++ b/docs/gen-ai/gen-ai-evaluation-events.md
@@ -0,0 +1,50 @@
+
+
+
+# Semantic Conventions for GenAI evaluation events
+
+**Status**: [Experimental][DocumentStatus]
+
+Each evaluation event defines a common way to report an evaluation score and the context for this specific evaluation method.
+
+## Naming pattern
+
+The evaluation events follow `gen_ai.evaluation.{evaluation method}` naming pattern.
+For example, evaluations that are common across different GenAI models and framework tooling, such as user feedback should be reported as `gen_ai.evaluation.user_feedback`.
+
+GenAI vendor-specific evaluation events SHOULD follow `gen_ai.{gen_ai.system}.evaluation.{evaluation method}` pattern.
+
+## User feedback evaluation
+
+The user feedback evaluation event SHOULD be captured if and only if user provided a reaction to GenAI model response.
+It SHOULD, when possible, be parented to the GenAI span describing such response.
+
+
+
+
+
+
+
+
+The event name MUST be `gen_ai.evaluation.user_feedback`.
+
+| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
+|---|---|---|---|---|---|
+| [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`gen_ai.evaluation.score`](/docs/attributes-registry/gen-ai.md) | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. | `0.42` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+
+
+
+
+
+
+
+The user feedback event body has the following structure:
+
+| Body Field | Type | Description | Examples | Requirement Level |
+|---|---|---|---|---|
+| `comment` | string | Additional details about the user feedback | `"I did not like it"` | `Opt-in` |
+
+[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
\ No newline at end of file
diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md
index 0a3eec44b4..59d7fbc524 100644
--- a/docs/gen-ai/gen-ai-spans.md
+++ b/docs/gen-ai/gen-ai-spans.md
@@ -175,4 +175,4 @@ The event name MUST be `gen_ai.content.completion`.
-[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
+[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
\ No newline at end of file
diff --git a/model/gen-ai/events.yaml b/model/gen-ai/events.yaml
new file mode 100644
index 0000000000..94281587ff
--- /dev/null
+++ b/model/gen-ai/events.yaml
@@ -0,0 +1,43 @@
+groups:
+  - id: gen_ai.content.prompt
+    name: gen_ai.content.prompt
+    stability: experimental
+    type: event
+    brief: >
+      In the lifetime of an GenAI span, events for prompts sent and completions received
+      may be created, depending on the configuration of the instrumentation.
+    attributes:
+      - ref: gen_ai.prompt
+        requirement_level:
+          conditionally_required: if and only if corresponding event is enabled
+        note: >
+          It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
+
+  - id: gen_ai.content.completion
+    name: gen_ai.content.completion
+    type: event
+    stability: experimental
+    brief: >
+      In the lifetime of an GenAI span, events for prompts sent and completions received
+      may be created, depending on the configuration of the instrumentation.
+    attributes:
+      - ref: gen_ai.completion
+        requirement_level:
+          conditionally_required: if and only if corresponding event is enabled
+        note: >
+          It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
+
+  - id: gen_ai.evaluation.user_feedback
+    name: gen_ai.evaluation.user_feedback
+    type: event
+    stability: experimental
+    brief: >
+      This event describes the evaluation of GenAI response based on the user feedback.
+    attributes:
+      - ref: gen_ai.response.id
+        requirement_level: required
+      - ref: gen_ai.evaluation.score
+        brief: >
+          Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction.
+        note: ""
+        requirement_level: recommended
diff --git a/model/gen-ai/registry.yaml b/model/gen-ai/registry.yaml
index 5b3d1cff79..fe35a66216 100644
--- a/model/gen-ai/registry.yaml
+++ b/model/gen-ai/registry.yaml
@@ -1,6 +1,7 @@
 groups:
   - id: registry.gen_ai
     type: attribute_group
+    stability: experimental
     display_name: GenAI Attributes
     brief: >
       This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses.
@@ -148,8 +149,18 @@ groups:
           If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.
+      - id: gen_ai.evaluation.score
+        stability: experimental
+        type: double
+        brief: The score calculated by the evaluator for the GenAI response.
+        note: >
+          Semantic conventions describing GenAI evaluation telemetry SHOULD document
+          the scoring system and method used to calculate the score.
+        examples: [0.42]
+
   - id: registry.gen_ai.openai
     type: attribute_group
+    stability: experimental
     display_name: OpenAI Attributes
     brief: >
       Thie group defines attributes for OpenAI.
diff --git a/model/gen-ai/spans.yaml b/model/gen-ai/spans.yaml
index d634d94473..7a4382bf33 100644
--- a/model/gen-ai/spans.yaml
+++ b/model/gen-ai/spans.yaml
@@ -58,32 +58,6 @@ groups:
       - gen_ai.content.prompt
       - gen_ai.content.completion
 
-  - id: gen_ai.content.prompt
-    name: gen_ai.content.prompt
-    type: event
-    brief: >
-      In the lifetime of an GenAI span, events for prompts sent and completions received
-      may be created, depending on the configuration of the instrumentation.
-    attributes:
-      - ref: gen_ai.prompt
-        requirement_level:
-          conditionally_required: if and only if corresponding event is enabled
-        note: >
-          It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
-
-  - id: gen_ai.content.completion
-    name: gen_ai.content.completion
-    type: event
-    brief: >
-      In the lifetime of an GenAI span, events for prompts sent and completions received
-      may be created, depending on the configuration of the instrumentation.
-    attributes:
-      - ref: gen_ai.completion
-        requirement_level:
-          conditionally_required: if and only if corresponding event is enabled
-        note: >
-          It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
-
   - id: trace.gen_ai.client
     extends: trace.gen_ai.client.common
     brief: >