prompt_tokens_details not being populated correctly in response model #1252

Closed
hnalla opened this issue Dec 12, 2024 · 1 comment · Fixed by #1254
Labels
bug Something isn't working

Comments

@hnalla
Contributor

hnalla commented Dec 12, 2024

  • [x] This is actually a bug report.
  • [ ] I am not getting good LLM Results
  • [x] I have tried asking for help in the community on discord or discussions and have not received a response.
  • [x] I have tried searching the documentation and have not found an answer.

What Model are you using?

  • [x] gpt-3.5-turbo
  • [x] gpt-4-turbo
  • [x] gpt-4
  • [ ] Other (please specify)

Describe the bug
I'm unable to see how many tokens were cached when I use client.chat.completions.create_with_completion: completion.usage.prompt_token_details always comes back with audio_tokens and cached_tokens set to 0.

To Reproduce

import os
from time import sleep

import instructor
from openai import OpenAI
from pydantic import BaseModel


class FakeClass(BaseModel):
    insight: str


# Assumes OPENAI_API_KEY is set in the environment
client = instructor.from_openai(OpenAI(api_key=os.environ["OPENAI_API_KEY"]))
# Repeat the string so the system prompt is long enough for OpenAI prompt caching to kick in
fake_str = "This is a Test string" * 1000

for x in range(0, 3):
    sleep(1)
    insight, completion = client.chat.completions.create_with_completion(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": fake_str,
            },
            {"role": "user", "content": "Test User Prompt"},
        ],
        max_retries=1,
        response_model=FakeClass,
    )
 
    # Always prints {'audio_tokens': 0, 'cached_tokens': 0}, even when tokens were cached
    print(completion.usage.prompt_token_details.model_dump())

I get this:

{'audio_tokens': 0, 'cached_tokens': 0}

Expected behavior
I expect to see how many tokens have been cached. I know OpenAI is caching the prompt, because the same request reports cached tokens when I call the OpenAI client directly, without wrapping it in Instructor:

open_ai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

for x in range(0, 3):
    sleep(1)
    chat_completion = open_ai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": fake_str,
            },
            {"role": "user", "content": "Test User Prompt"},
        ],
    )

    cached_tokens = chat_completion.usage.prompt_tokens_details.cached_tokens
    if cached_tokens == 0:
        print("No cached tokens")
    else:
        print(f"Tokens used: {cached_tokens}")
        print(chat_completion.usage.prompt_tokens_details.model_dump())

I get this output:

Tokens used: 4864
{'audio_tokens': 0, 'cached_tokens': 4864}


@github-actions github-actions bot added the bug Something isn't working label Dec 12, 2024
@hnalla
Contributor Author

hnalla commented Dec 12, 2024

This is happening due to a typo in this line:

prompt_token_details = PromptTokensDetails(audio_tokens=0, cached_tokens=0)

It's currently:

total_usage = CompletionUsage(
    completion_tokens=0, prompt_tokens=0, total_tokens=0,
    completion_tokens_details=CompletionTokensDetails(audio_tokens=0, reasoning_tokens=0),
    prompt_token_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0),
)

It's supposed to be:

total_usage = CompletionUsage(
    completion_tokens=0, prompt_tokens=0, total_tokens=0,
    completion_tokens_details=CompletionTokensDetails(audio_tokens=0, reasoning_tokens=0),
    prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0),
)

Notice the typo: prompt_token_details vs prompt_tokens_details.
I have tested this change locally and it fixes the issue.

Supporting docs from OpenAI: https://github.com/openai/openai-python/blob/6e1161bc3ed20eef070063ddd5ac52fd9a531e88/src/openai/types/completion_usage.py#L53
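
For context, here is a minimal sketch of why the misspelled keyword goes unnoticed. The Usage model below is a hypothetical stand-in, not the real openai CompletionUsage configuration: with Pydantic's default extra="ignore", an unknown keyword argument is silently dropped, so the correctly spelled prompt_tokens_details field just keeps its default and the cached-token counts never land on it.

from typing import Optional

from pydantic import BaseModel


class PromptTokensDetails(BaseModel):
    audio_tokens: int = 0
    cached_tokens: int = 0


class Usage(BaseModel):
    # Hypothetical stand-in for the usage model; Pydantic's default is extra="ignore"
    prompt_tokens: int
    prompt_tokens_details: Optional[PromptTokensDetails] = None


usage = Usage(
    prompt_tokens=100,
    prompt_token_details=PromptTokensDetails(cached_tokens=4864),  # typo: token vs tokens
)
print(usage.prompt_tokens_details)  # None -- the misspelled kwarg was silently ignored

The actual openai models may configure extra fields differently (for example, storing unknown inputs as extras instead of dropping them), but either way the correctly named field never receives the values, which matches the all-zero output in the report above.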

@hnalla hnalla changed the title from "prompt_tokens_details not being populated correctly in reponse model" to "prompt_tokens_details not being populated correctly in response model" Dec 12, 2024
hnalla added a commit to hnalla/instructor that referenced this issue Dec 12, 2024
@hnalla hnalla mentioned this issue Dec 12, 2024
devin-ai-integration bot added a commit that referenced this issue Dec 15, 2024
This fixes #1252 by properly preserving the prompt_tokens_details
information from the OpenAI response in the returned model.

- Added test to verify token caching behavior
- Modified process_response to preserve usage information

Link to Devin run: https://app.devin.ai/sessions/d34daab99304486baa9643600abeef15

Co-Authored-By: [email protected] <[email protected]>
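
For reference, a rough sketch of the kind of accumulation the fix makes possible. This is not instructor's actual process_response, and the helper name below is hypothetical; it assumes a running total_usage initialised with the corrected keyword as shown in the comment above, and uses the openai.types.completion_usage classes linked earlier.

from openai.types.completion_usage import CompletionUsage


def add_attempt_usage(total: CompletionUsage, attempt: CompletionUsage) -> None:
    """Fold one retry attempt's usage into the running total (hypothetical helper)."""
    total.completion_tokens += attempt.completion_tokens
    total.prompt_tokens += attempt.prompt_tokens
    total.total_tokens += attempt.total_tokens
    # With the corrected spelling, the cached-token count is carried over as well
    if attempt.prompt_tokens_details and total.prompt_tokens_details:
        total.prompt_tokens_details.cached_tokens = (
            (total.prompt_tokens_details.cached_tokens or 0)
            + (attempt.prompt_tokens_details.cached_tokens or 0)
        )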
@jxnl jxnl closed this as completed in f736fc1 Dec 16, 2024