
fix: chat API logprobs format #1788

Open · domdomegg wants to merge 2 commits into main
Conversation

@domdomegg commented Oct 6, 2024

Summary

The OpenAI-compatible server should match the response structure of the OpenAI API for chat completions. Unfortunately, there is a discrepancy in the logprobs format: we currently return the logprobs structure of the (text) completions API rather than that of the chat completions API.
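To make the mismatch concrete, here are the two shapes side by side (abridged, written as Python literals; field names per the OpenAI API reference):

# Completions API shape: what the server currently returns under "logprobs".
completions_style = {
    "tokens": ["The"],
    "token_logprobs": [-0.008],
    "top_logprobs": [{"The": -0.008, "Paris": -5.32}],
    "text_offset": [0],
}

# Chat completions API shape: what chat clients expect.
chat_style = {
    "content": [
        {
            "token": "The",
            "logprob": -0.008,
            "bytes": None,
            "top_logprobs": [
                {"token": "The", "logprob": -0.008, "bytes": None},
            ],
        },
    ],
    "refusal": None,
}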

This PR:

  • updates the types to match the OpenAI API
  • adds a function _convert_text_completion_logprobs_to_chat, used in chat completion responses to convert the logprobs to the new API format (a rough sketch of the idea follows this list)
  • updates the documentation on running the server locally, as I discovered it was outdated when I went to test things out
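For illustration, here is a minimal sketch of what such a conversion could look like, assuming the completions-style input shape shown above; this is not necessarily the exact implementation in the PR:

from typing import Any, Dict, List, Optional

def _convert_text_completion_logprobs_to_chat(
    logprobs: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    # Preserve a null logprobs field when logprobs were not requested.
    if logprobs is None:
        return None
    tokens = logprobs["tokens"]
    token_logprobs = logprobs["token_logprobs"]
    # top_logprobs may be absent when the client did not ask for alternatives.
    tops = logprobs.get("top_logprobs") or [None] * len(tokens)
    content: List[Dict[str, Any]] = []
    for token, logprob, top in zip(tokens, token_logprobs, tops):
        content.append({
            "token": token,
            "logprob": logprob,
            "bytes": None,
            "top_logprobs": [
                {"token": t, "logprob": lp, "bytes": None}
                for t, lp in (top or {}).items()
            ],
        })
    # Chat completions nest per-token entries under "content" and add "refusal".
    return {"content": content, "refusal": None}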

Issues fixed

Fixes #1787 (server: chat completions returns wrong logprobs model)

@domdomegg (Author)
Demo

Request
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "logprobs": true,
  "top_logprobs": 10,
  "max_tokens": 5
}
Response
{
  "id": "chatcmpl-1898ccce-2bf6-431c-b9e0-2a82e90a9604",
  "object": "chat.completion",
  "created": 1728184671,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "The capital of France is",
        "role": "assistant"
      },
      "logprobs": {
        "content": [
          {
            "token": "The",
            "logprob": -0.008244173601269722,
            "bytes": null,
            "top_logprobs": [
              {
                "token": "The",
                "logprob": -0.008244173601269722,
                "bytes": null
              },
              {
                "token": "Paris",
                "logprob": -5.3227219581604,
                "bytes": null
              },
              {
                "token": "Sure",
                "logprob": -5.770838260650635,
                "bytes": null
              },
              {
                "token": "Answer",
                "logprob": -9.54023265838623,
                "bytes": null
              },
              {
                "token": "Yes",
                "logprob": -9.896768569946289,
                "bytes": null
              },
              {
                "token": "France",
                "logprob": -10.62641429901123,
                "bytes": null
              },
              {
                "token": " The",
                "logprob": -11.367059707641602,
                "bytes": null
              },
              {
                "token": "According",
                "logprob": -11.45943546295166,
                "bytes": null
              },
              {
                "token": "**",
                "logprob": -11.586193084716797,
                "bytes": null
              },
              {
                "token": " Paris",
                "logprob": -11.59852409362793,
                "bytes": null
              }
            ]
          },
          {
            "token": " capital",
            "logprob": -0.0005453529884107411,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " capital",
                "logprob": -0.0005453529884107411,
                "bytes": null
              },
              {
                "token": " Capital",
                "logprob": -7.571288108825684,
                "bytes": null
              },
              {
                "token": " city",
                "logprob": -11.57780647277832,
                "bytes": null
              },
              {
                "token": " current",
                "logprob": -12.473557472229004,
                "bytes": null
              },
              {
                "token": " correct",
                "logprob": -12.674555778503418,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.77519416809082,
                "bytes": null
              },
              {
                "token": " answer",
                "logprob": -12.833593368530273,
                "bytes": null
              },
              {
                "token": " French",
                "logprob": -13.656529426574707,
                "bytes": null
              },
              {
                "token": " Paris",
                "logprob": -13.73013687133789,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -13.916248321533203,
                "bytes": null
              }
            ]
          },
          {
            "token": " of",
            "logprob": -0.019254328683018684,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " of",
                "logprob": -0.019254328683018684,
                "bytes": null
              },
              {
                "token": " city",
                "logprob": -3.9625728130340576,
                "bytes": null
              },
              {
                "token": " and",
                "logprob": -10.33055305480957,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.015106201171875,
                "bytes": null
              },
              {
                "token": " is",
                "logprob": -12.049043655395508,
                "bytes": null
              },
              {
                "token": " City",
                "logprob": -12.161520957946777,
                "bytes": null
              },
              {
                "token": " o",
                "logprob": -12.770393371582031,
                "bytes": null
              },
              {
                "token": " cities",
                "logprob": -14.372736930847168,
                "bytes": null
              },
              {
                "token": " của",
                "logprob": -14.63923454284668,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.65132999420166,
                "bytes": null
              }
            ]
          },
          {
            "token": " France",
            "logprob": -0.0000252720492426306,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " France",
                "logprob": -0.0000252720492426306,
                "bytes": null
              },
              {
                "token": " the",
                "logprob": -11.084362030029297,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.06197738647461,
                "bytes": null
              },
              {
                "token": "France",
                "logprob": -12.9952974319458,
                "bytes": null
              },
              {
                "token": " French",
                "logprob": -13.759483337402344,
                "bytes": null
              },
              {
                "token": " is",
                "logprob": -15.239158630371094,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -15.40572452545166,
                "bytes": null
              },
              {
                "token": " france",
                "logprob": -15.767807960510254,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -16.346908569335938,
                "bytes": null
              },
              {
                "token": " Frankreich",
                "logprob": -17.035612106323242,
                "bytes": null
              }
            ]
          },
          {
            "token": " is",
            "logprob": -0.000060437283536884934,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " is",
                "logprob": -0.000060437283536884934,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -9.920828819274902,
                "bytes": null
              },
              {
                "token": ",",
                "logprob": -12.151354789733887,
                "bytes": null
              },
              {
                "token": " was",
                "logprob": -13.53709602355957,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.004632949829102,
                "bytes": null
              },
              {
                "token": " in",
                "logprob": -14.70918083190918,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -14.768845558166504,
                "bytes": null
              },
              {
                "token": " the",
                "logprob": -14.776985168457031,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.940979957580566,
                "bytes": null
              },
              {
                "token": " Is",
                "logprob": -14.942352294921875,
                "bytes": null
              }
            ]
          }
        ],
        "refusal": null
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 5,
    "total_tokens": 34
  }
}
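
For reference, the request above can be reproduced against a locally running server with the openai Python client. This is a hedged sketch; the base_url and api_key values are placeholders for a local deployment, not something this PR defines:

from openai import OpenAI

# Point the client at the local OpenAI-compatible server
# (host, port, and key are assumptions; adjust for your setup).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    logprobs=True,
    top_logprobs=10,
    max_tokens=5,
)
print(response.choices[0].logprobs)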

Docs change under review:

- uvicorn --factory llama.server:app --host ${HOST} --port ${PORT}
+ python llama_cpp/server --model ${MODEL}
@lukestanley (Contributor) commented Oct 14, 2024
Why this change? It seems unrelated to logprobs. @domdomegg

@domdomegg (Author)

I don't believe the current instructions work, and this is how I got it working.

Happy to split this out into a separate PR, or to be corrected if the uvicorn command does in fact work.
