CLI deployment: AOAI Quota limit #425
ArpitaisAn0maly
started this conversation in
General
Replies: 1 comment
-
The AOAI service applies additional rate limits on top of per-model token limits for each model deployment per region. Tokens Per Minute (TPM) is a configurable limit, set per model per region within the API, that represents a best prediction of your expected token usage over time. If you get an insufficient-quota error, you need to reduce the TPM of an existing deployment of the model before you can create a new deployment, or try deploying to a different region. You can edit the TPM in AI Studio: for each of your existing deployments, edit it and slide the capacity slider down.
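Since the thread is about CLI deployment, the same steps can be sketched with the Azure CLI. This is a sketch only: the resource name, resource group, region, model version, and capacity values below are hypothetical placeholders, not values from this thread, and `deployment create` acts as an upsert here (re-running it with a lower `--sku-capacity` reduces the existing deployment's TPM).

```shell
# Sketch (placeholder names/values): inspect regional quota usage, then
# lower an existing deployment's TPM before creating a new deployment.

# 1. See how much OpenAI quota is already consumed in the target region.
az cognitiveservices usage list --location eastus \
    --query "[?contains(name.value, 'OpenAI')]" -o table

# 2. Re-deploy an existing deployment with a smaller capacity.
#    --sku-capacity is in thousands of TPM (e.g. 120 => 120K TPM).
az cognitiveservices account deployment create \
    --name my-aoai-resource \
    --resource-group my-rg \
    --deployment-name gpt-35-turbo-16k \
    --model-name gpt-35-turbo-16k \
    --model-version "0613" \
    --model-format OpenAI \
    --sku-name Standard \
    --sku-capacity 120
```

Freeing capacity this way (or deleting an unused deployment) makes room within the region's quota for the new deployment.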
-
How do I fix this error at deployment time?
InsufficientQuota - The specified capacity '720' of account deployment is bigger than available capacity '###' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo-16K'.
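To make sense of the numbers in that error: the quantities are capacity units in thousands of TPM, so the requested '720' means 720K tokens per minute, and the deployment fails because the region's remaining quota for the model is smaller. A small sketch of the arithmetic, using hypothetical quota figures (the `regional_quota` and `existing_usage` values are assumptions, since the thread redacts the available capacity as '###'):

```shell
# All values are capacity units = thousands of TPM.
requested_capacity=720   # from the error: 720 => 720,000 TPM requested
regional_quota=300       # hypothetical total quota for the model in this region
existing_usage=240       # hypothetical capacity already held by deployments

# The '###' in the error message is quota minus current usage.
available=$(( regional_quota - existing_usage ))
echo "available capacity: $available"

# Capacity to free (or request elsewhere) before the deployment can succeed.
echo "must free at least: $(( requested_capacity - available ))"
```

With these placeholder numbers, only 60 units are free, so the 720-unit request fails until 660 units are released from existing deployments or the deployment targets a region with more headroom.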