[Bugfix] Guard for negative counter metrics to prevent crash #10430
Signed-off-by: Travis Johnson <[email protected]>
Not sure if it's worth adding a test to …
Let's fix #6325 in another PR.
I'm not sure how it happens, but we have observed crashes when running vLLM in online mode because a negative value is passed to increment a Prometheus counter.
This PR checks the increment value before passing it to the Prometheus client, so a negative value is skipped instead of crashing the server. The root cause of the negative value still needs more investigation.
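A minimal sketch of this kind of guard, assuming a small Python helper that wraps every counter increment. `log_counter` and `StandInCounter` are hypothetical names used only for illustration, and this is not the PR's actual diff. The stand-in class mimics the one behavior that matters here: `prometheus_client`'s `Counter.inc()` raises `ValueError` when given a negative amount.

```python
import logging

logger = logging.getLogger(__name__)


class StandInCounter:
    """Hypothetical stand-in for a prometheus_client Counter.

    Like Counter.inc(), it raises ValueError on a negative amount.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def inc(self, amount: float = 1) -> None:
        if amount < 0:
            raise ValueError("Counters can only be incremented by non-negative amounts.")
        self.value += amount

    def __repr__(self) -> str:
        return f"StandInCounter({self.name!r})"


def log_counter(counter, amount: float) -> bool:
    """Increment `counter` by `amount` unless `amount` is negative.

    Returns True if the counter was incremented, False if the value was skipped.
    """
    if amount < 0:
        # Counters are monotonic, so a negative delta points to an upstream bug.
        # Log it and skip the update instead of letting ValueError crash the server.
        logger.warning("Skipping negative increment of %g to %s", amount, counter)
        return False
    counter.inc(amount)
    return True
```

Calling `log_counter(counter, -3)` logs a warning and leaves the counter unchanged, while a direct `counter.inc(-3)` would raise. Skipping rather than clamping to zero keeps the metric honest while the cause of the negative value is investigated.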
FIX #6642
#6325 is related and shows the same error.