Token exceeded for LibreChat but not when directly invoking the same model #4259
Replies: 3 comments 2 replies
-
Which model, via a custom endpoint? The system has a specific list of models and uses a default context window if it's not recognized. There will be a way to set the max context via config, but for now you can use the parameters for this.
-
@danny-avila I am using […]. Sharing the librechat yaml:
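For reference, a custom endpoint entry in librechat.yaml generally follows the shape below. This is only a sketch: the endpoint name, baseURL, environment variable, and model identifier are placeholders rather than the poster's actual values.

```yaml
# Illustrative librechat.yaml custom endpoint (placeholder values throughout).
version: 1.0.5            # schema version; use the one matching your LibreChat release
cache: true
endpoints:
  custom:
    - name: "my-endpoint"                # display name shown in the UI
      apiKey: "${MY_ENDPOINT_API_KEY}"   # resolved from the .env file
      baseURL: "https://example.com/v1"
      models:
        default: ["my-model-id"]         # model identifiers exposed by this endpoint
        fetch: false
      titleConvo: true
      titleModel: "my-model-id"
```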
-
That specific model should have its context recognized, so it's odd that it isn't for you; it is recognized here even when I use the exact model identifier. To understand what's going on, can you share your debug logs? If you are using docker, they are saved in the mounted logs directory. Reproduce the issue, then check the latest debug logs (files starting with "debug" in the name).
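As a minimal sketch of where those files end up, assuming the stock docker-compose.yml layout (the exact host and container paths may differ by deployment and LibreChat version):

```yaml
# Assumed volume mapping in docker-compose.yml; verify against your own compose file.
services:
  api:
    volumes:
      - ./logs:/app/api/logs   # debug-*.log files then appear under ./logs on the host
```

With a mapping like this, the newest file under ./logs whose name starts with "debug" is the one to share.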
-
What happened?
Token exceeded for LibreChat but not when directly invoking the model with the same prompt
I have found an issue: I get the following error with LibreChat, but when I directly invoke the same model via Streamlit + AWS Lambda I get a proper response.
{"level":"error","message":"[handleAbortError] AI response error; aborting request: Prompt token count of 37165 exceeds max token count of 4095.","stack":"Error: Prompt token count of 37165 exceeds max token count of 4095.\n at OpenAIClient.handleContextStrategy (/app/api/app/clients/BaseClient.js:374:13)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async OpenAIClient.buildMessages (/app/api/app/clients/OpenAIClient.js:559:61)
Steps to Reproduce
I am working on internal two-hour transcript data, so I can't share it here.
What browsers are you seeing the problem on?
No response
Relevant log output
Screenshots
No response
Code of Conduct