GenAI: define conventions for embeddings operations #1603

Merged (12 commits) on Nov 27, 2024

@trentm trentm commented Nov 21, 2024

Refs: #1174


Many LLMs support an Embeddings API, for example:

This proposal defines OpenTelemetry semantic conventions to use for instrumenting Embeddings API client usage.

Overview

A span will be created for Embeddings API calls. The only differences from existing chat spans are:

  • A new embeddings value is added for attribute gen_ai.operation.name.
  • A new gen_ai.request.encoding_formats attribute is defined. It is only relevant for embeddings operations.
  • Many of the "recommended" span attributes do not apply to embeddings operations. Perhaps that is fine as currently defined. (Open question: should a "note: ..." or an "if applicable." qualifier be added to the brief for those span attributes?)
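
As a rough sketch of the span shape described above (not code from this PR), the span name and attributes could be assembled as follows. The helper name `embeddingsSpanData` and the `request`/`response` shapes are assumptions for illustration, loosely modeled on the OpenAI embeddings request (`model`, `encoding_format`) and response (`model`, `usage.prompt_tokens`).

```javascript
// Hypothetical helper: builds the span name and attributes for one
// Embeddings API call. Not part of any OTel or openai package.
function embeddingsSpanData(request, response) {
  const operationName = "embeddings";
  const attributes = {
    "gen_ai.operation.name": operationName,
    "gen_ai.request.model": request.model,
    "gen_ai.system": "openai",
  };
  // The attribute is an array, even though this assumed request shape
  // carries a single encoding format.
  if (request.encoding_format !== undefined) {
    attributes["gen_ai.request.encoding_formats"] = [request.encoding_format];
  }
  if (response.model !== undefined) {
    attributes["gen_ai.response.model"] = response.model;
  }
  // Embeddings produce no output tokens, so only input tokens are recorded.
  if (response.usage && response.usage.prompt_tokens !== undefined) {
    attributes["gen_ai.usage.input_tokens"] = response.usage.prompt_tokens;
  }
  return { name: `${operationName} ${request.model}`, attributes };
}

const span = embeddingsSpanData(
  { model: "text-embedding-ada-002", encoding_format: "float" },
  { model: "text-embedding-ada-002", usage: { prompt_tokens: 9, total_tokens: 9 } }
);
console.log(span.name, span.attributes);
```

A real instrumentation would pass these values to the tracer when starting a client-kind span; that wiring is omitted here.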

No new events are proposed. See "Notes" below.

For metrics:

  • The existing gen_ai.client.token.usage metric applies, with the note that output tokens do not apply for embeddings operations, so the metric would only be recorded with the gen_ai.token.type: 'input' attribute (as already specified).
  • The existing gen_ai.client.operation.duration metric applies as currently specified.
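
The metric behavior above can be sketched as plain data, without depending on a metrics SDK. The helper `embeddingsMetricPoints` and its parameter names are hypothetical; the sketch assumes the duration is supplied in seconds and that the caller already has the input token count.

```javascript
// Hypothetical helper: returns the metric data points for one embeddings call.
function embeddingsMetricPoints({ system, requestModel, inputTokens, durationSeconds }) {
  const common = {
    "gen_ai.operation.name": "embeddings",
    "gen_ai.system": system,
    "gen_ai.request.model": requestModel,
  };
  const points = [
    { metric: "gen_ai.client.operation.duration", value: durationSeconds, attributes: common },
  ];
  // Only an 'input' point is produced: embeddings have no output tokens.
  if (inputTokens !== undefined) {
    points.push({
      metric: "gen_ai.client.token.usage",
      value: inputTokens,
      attributes: { ...common, "gen_ai.token.type": "input" },
    });
  }
  return points;
}

const points = embeddingsMetricPoints({
  system: "openai",
  requestModel: "text-embedding-ada-002",
  inputTokens: 9,
  durationSeconds: 0.55,
});
console.log(points);
```

In an actual instrumentation, each point would presumably be recorded on the corresponding instrument rather than returned as an object.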

Example span

This is an example embeddings operation span using the openai client library and the OTel JS ConsoleSpanExporter:

{
  traceId: '1dea787b9cd7bb895aee5bb74090610d',
  parentId: undefined,
  traceState: undefined,
  name: 'embeddings text-embedding-ada-002',
  id: '65aafaf145cf535a',
  kind: 2,
  timestamp: 1732229356238000,
  duration: 551555.417,
  attributes: {
    'gen_ai.operation.name': 'embeddings',
    'gen_ai.request.model': 'text-embedding-ada-002',
    'gen_ai.system': 'openai',
    'gen_ai.request.encoding_formats': [ 'float' ],
    'gen_ai.response.model': 'text-embedding-ada-002',
    'gen_ai.usage.input_tokens': 9
  },
  status: { code: 0 },
  events: [],
  links: []
}

Notes

This section contains supporting notes and reasoning for some of the proposed values.

  • A span attribute for the dimensions parameter (in the OpenAI API) was considered, but dropped as likely not being useful. Happy to revisit that if others know of a reasonable use case.

  • A (log) event to record the input strings to the Embeddings API call is not being proposed. The reasoning is that API calls for embeddings are expected to be higher volume, and the input strings less valuable to application observability than chat content, so the cost-benefit ratio is much less favorable. If a good use case is presented for optionally recording Embeddings input strings, then this can be revisited.

  • Recording Embeddings API response vectors in telemetry is not being proposed. Vectors are large and would not be useful for observability.

  • The operation name embeddings was selected. Other possible options considered:

    1. embeddings
    2. embedding
    3. embed

    My inclination is embeddings to match the (OpenAI) API name: embeddings.create.
    Cohere's API name is embed.
    LangTrace uses embed.
    Current semconv values are chat (e.g. for openai.chat.completions.create()) and text_completion (for the deprecated openai.completions.create()).

  • The Cohere and Anthropic APIs have a request attribute input_type, for creating embeddings for inputs other than text. It might be worth adding this as well. I have not proposed it yet because I have only prototyped with OpenAI.

@trentm trentm marked this pull request as ready for review November 22, 2024 17:22
@trentm trentm requested review from a team as code owners November 22, 2024 17:22

@lmolkova lmolkova left a comment

Left a few minor comments, looks great otherwise!

@trentm trentm requested a review from lmolkova November 22, 2024 20:36

@lmolkova lmolkova left a comment

Looks great!

trentm commented Nov 22, 2024

^^ the check failure is:

   ERROR: 1 dead links found in ./docs/system/system-metrics.md !
  [✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0
make: *** [Makefile:72: markdown-link-check] Error 1

I'm guessing it was a fluke server error response from that site. The URL exists for me. The check passes for me locally as well:

% make markdown-link-check
semantic-conventions2@ /Users/trentm/tm/semantic-conventions2
└── [email protected]

I don't have permissions to re-run that workflow run.

@lmolkova

the link check is not required and oracle blog is the usual offender. We need another approval to merge though. @open-telemetry/semconv-genai-approvers ptal!

@karthikscale3 karthikscale3 left a comment

Thanks for starting this

@lmolkova lmolkova merged commit 0c17ad5 into open-telemetry:main Nov 27, 2024
14 checks passed
@trentm trentm deleted the tm-genai-embeddings branch November 27, 2024 19:17
@axiomofjoy

Just wanted to add some thoughts on observability workflows for embeddings since they are important to our product at Arize. We use both embedding vectors and their associated text for debugging and troubleshooting workflows. Because embedding vectors map semantically similar content to nearby vectors, you can use them for observability workflows such as:

  • clustering to find semantically meaningful groups of user queries, and then measuring performance or computing evaluation metrics on those clusters to gain a granular understanding of your application's performance on a meaningful subset of data
  • embedding-based similarity search to find semantically similar examples to a chosen example, e.g., for debugging or curating datasets for fine-tuning
  • understanding how query and corpus distributions differ (e.g., to identify out-of-distribution queries that likely do not have answers in the embedded corpus)

These workflows require both embedding vectors and associated content (text, image, etc.), which we include as semantic conventions in the OpenInference spec.
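
To make the similarity-search workflow concrete, here is a minimal sketch using cosine similarity over recorded (text, vector) pairs. The function names, the record shape, and the three-dimensional toy vectors are all made up for illustration; real embedding vectors have far more dimensions, and this is not how any particular product implements the search.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k records most similar to the query vector, best first.
function mostSimilar(queryVector, records, k) {
  return records
    .map((r) => ({ text: r.text, score: cosineSimilarity(queryVector, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: invented texts and vectors, not real embeddings.
const records = [
  { text: "reset my password", vector: [0.9, 0.1, 0.0] },
  { text: "forgot login credentials", vector: [0.8, 0.2, 0.1] },
  { text: "pricing for teams", vector: [0.0, 0.1, 0.9] },
];
const neighbors = mostSimilar([1.0, 0.0, 0.0], records, 2);
console.log(neighbors.map((n) => n.text));
```

The sketch shows why both pieces are needed: the vectors drive the ranking, while the associated text is what a person actually inspects when debugging.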

I've included a few resources below for additional context.

embeddings.mp4

xrmx added a commit to elastic/elastic-otel-python-instrumentations that referenced this pull request Nov 28, 2024