CLI deployment: AOAI Quota limit #425
ArpitaisAn0maly
started this conversation in
General
Replies: 1 comment
-
The AOAI service applies additional rate limits on top of per-model token limits for each model deployment per region. Tokens Per Minute (TPM) is a configurable limit, set per model per region within the API, that represents a best prediction of your expected token usage over time. If you get an insufficient-quota error, you need to reduce the TPM of an existing deployment of the model before you can create a new deployment, or try deploying to a different region. You can edit the TPM in AI Studio: for each of your existing deployments, edit it and slide the capacity slider down.
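Since the thread is about CLI deployment, the same steps can be sketched with the Azure CLI. This is a sketch only: the resource name, resource group, region, model version, and capacity values below are hypothetical placeholders, not values from this thread, and `deployment create` acts as an upsert here (re-running it with a lower `--sku-capacity` reduces the existing deployment's TPM).

```shell
# Sketch (placeholder names/values): inspect regional quota usage, then
# lower an existing deployment's TPM before creating a new deployment.

# 1. See how much OpenAI quota is already consumed in the target region.
az cognitiveservices usage list --location eastus \
    --query "[?contains(name.value, 'OpenAI')]" -o table

# 2. Re-deploy an existing deployment with a smaller capacity.
#    --sku-capacity is in thousands of TPM (e.g. 120 => 120K TPM).
az cognitiveservices account deployment create \
    --name my-aoai-resource \
    --resource-group my-rg \
    --deployment-name gpt-35-turbo-16k \
    --model-name gpt-35-turbo-16k \
    --model-version "0613" \
    --model-format OpenAI \
    --sku-name Standard \
    --sku-capacity 120
```

Freeing capacity this way (or deleting an unused deployment) makes room within the region's quota for the new deployment.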
-
How do I fix this error at deployment time?
InsufficientQuota - The specified capacity '720' of account deployment is bigger than available capacity '###' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo-16K'.
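To make sense of the numbers in that error: the quantities are capacity units in thousands of TPM, so the requested '720' means 720K tokens per minute, and the deployment fails because the region's remaining quota for the model is smaller. A small sketch of the arithmetic, using hypothetical quota figures (the `regional_quota` and `existing_usage` values are assumptions, since the thread redacts the available capacity as '###'):

```shell
# All values are capacity units = thousands of TPM.
requested_capacity=720   # from the error: 720 => 720,000 TPM requested
regional_quota=300       # hypothetical total quota for the model in this region
existing_usage=240       # hypothetical capacity already held by deployments

# The '###' in the error message is quota minus current usage.
available=$(( regional_quota - existing_usage ))
echo "available capacity: $available"

# Capacity to free (or request elsewhere) before the deployment can succeed.
echo "must free at least: $(( requested_capacity - available ))"
```

With these placeholder numbers, only 60 units are free, so the 720-unit request fails until 660 units are released from existing deployments or the deployment targets a region with more headroom.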