[V1] [6/N] API Server: Better Shutdown #11586

robertgshaw2-neuralmagic · 2024-12-28T16:10:00Z

SUMMARY:

Handle errors in the background process from AsyncLLM
Handle errors in the output_handler loop
Right now, if any error occurs, we log the exception and kill the whole process tree. NOTE: as a follow-up, we should look into more graceful handling of EngineErrors, so that we return a better message to client applications.
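
The fail-fast behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `run_output_handler`, `step`, `on_fatal`, and `kill_process_group` are hypothetical names, and killing the POSIX process group is only a stand-in for "kill the whole process tree".

```python
import asyncio
import logging
import os
import signal

logger = logging.getLogger(__name__)


def kill_process_group() -> None:
    """Hypothetical stand-in for killing the process tree (POSIX only).

    Sends SIGKILL to every process in the caller's process group,
    including the caller itself.
    """
    os.killpg(os.getpgrp(), signal.SIGKILL)


async def run_output_handler(step, on_fatal=kill_process_group) -> None:
    """Call `step` in a loop; on any exception, log it and call `on_fatal`.

    `step` is an assumed async callable representing one iteration of the
    output handler loop. Cancellation (asyncio.CancelledError) is not an
    Exception subclass, so it passes through untouched.
    """
    try:
        while True:
            await step()
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("output handler hit an exception")
        on_fatal()
```

Passing `on_fatal` as a parameter keeps the sketch testable: a test can substitute a callback that records the call instead of killing anything.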

Pros/Cons of current design

The benefit of this design is that it is very unlikely to leave any hanging or dangling resources
The downside of this design is that we cannot return clear error codes to client code

Follow Up

Explore whether we can propagate the exception back to the API Server
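
One possible shape for this follow-up, sketched under assumptions: if each request has its own `asyncio.Queue`, the handler loop could push the failure into every queue so that each waiting request raises it instead of the process dying. `EngineError`, `output_handler`, `generate`, and `step` are hypothetical names for illustration and are not taken from this PR.

```python
import asyncio


class EngineError(RuntimeError):
    """Hypothetical error type that the API layer could map to a response."""


async def output_handler(queues, step) -> None:
    """Route outputs to per-request queues; on failure, notify every request.

    `queues` maps request id -> asyncio.Queue. `step` is an assumed async
    callable returning a list of (request_id, output) pairs.
    """
    try:
        while True:
            for request_id, output in await step():
                queues[request_id].put_nowait(output)
    except Exception as exc:
        err = EngineError("engine loop failed")
        err.__cause__ = exc  # keep the original exception for debugging
        for queue in queues.values():
            queue.put_nowait(err)


async def generate(queue):
    """Yield outputs for one request; re-raise an error pushed by the handler."""
    while True:
        item = await queue.get()
        if isinstance(item, Exception):
            raise item
        yield item
```

With this shape, the server code that iterates over `generate` could catch `EngineError` and return an error status to the client, which is the trade-off the Pros/Cons section points at.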

github-actions · 2024-12-28T16:10:12Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

robertgshaw2-neuralmagic · 2024-12-28T17:34:09Z

vllm/engine/protocol.py

@@ -29,11 +29,6 @@ class EngineClient(ABC):
    def is_running(self) -> bool:
        ...

-    @property


note: this property is not used anywhere

simon-mo · 2024-12-29T05:58:37Z

vllm/v1/engine/async_llm.py

@@ -273,10 +292,12 @@ async def _run_output_handler(self):

                # 4) Abort any requests that finished due to stop strings.
                await self.engine_core.abort_requests_async(reqs_to_abort)
+                raise ValueError("my error!")


remove this?

simon-mo · 2024-12-29T05:58:54Z

vllm/v1/engine/async_llm.py

-            raise e
+        except Exception:
+            traceback = get_exception_traceback()
+            logger.error("EngineCore hit an exception: %s", traceback)


logger.error should automatically print traceback?

yep, thanks
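
A note on the standard-library behavior behind this exchange: in Python's `logging` module, a plain `logger.error(...)` call logs only the message, while `logger.exception(...)` or `logger.error(..., exc_info=True)` attaches the active traceback. Whether vLLM's own logger setup changes this is not shown in this thread. The small demo below uses an in-memory stream and a hypothetical logger name.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("traceback_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False  # keep the demo output out of the root logger

try:
    raise ValueError("my error!")
except Exception:
    # Message only: no traceback is attached.
    logger.error("plain error")
    # Traceback attached explicitly via exc_info.
    logger.error("error with exc_info", exc_info=True)
    # Shorthand for error(..., exc_info=True) inside an except block.
    logger.exception("exception call")

output = stream.getvalue()
print(output)
```

Only the last two calls produce a "Traceback (most recent call last):" section in the captured output.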

tlrmchlsmth

looks like there's some debug cruft that snuck in during acbe6e3

tlrmchlsmth · 2024-12-29T18:49:52Z

vllm/engine/llm_engine.py

@@ -616,6 +616,7 @@ def _add_processed_request(
            decoder_inputs = processed_inputs
            encoder_inputs = None

+        print(f"{decoder_inputs=}")


remove this?

tlrmchlsmth · 2024-12-29T18:50:30Z

vllm/entrypoints/openai/serving_completion.py

@@ -105,13 +105,16 @@ async def create_completion(

            tokenizer = await self.engine_client.get_tokenizer(lora_request)

+            print(f"{request.prompt=}")


tlrmchlsmth · 2024-12-29T18:50:34Z

vllm/entrypoints/openai/serving_completion.py

+            print(f"{request_prompts=}")
+            print(f"{engine_prompts=}")


tlrmchlsmth · 2024-12-29T18:50:40Z

vllm/transformers_utils/tokenizer_group/tokenizer_group.py

+        print(f"{prompt=}")
        ret = tokenizer.encode(prompt)
+        print(f"{ret=}")


robertgshaw2-neuralmagic · 2024-12-29T22:57:49Z

Whoops, will fix

better shutdown

5d61d13

robertgshaw2-neuralmagic requested review from WoosukKwon, njhill, ywang96, comaniac and alexm-neuralmagic as code owners December 28, 2024 16:10

mergify bot added the frontend label Dec 28, 2024

shutdown on error

8f22b4b

robertgshaw2-neuralmagic commented Dec 28, 2024

robertgshaw2-neuralmagic added 4 commits December 28, 2024 17:34

remove EngineDeadError

577fe7f

remove unnessary changes

3ded2a6

formatting

c4d8782

spurious change

39b28b9

robertgshaw2-neuralmagic changed the title ~~better shutdown~~ [V1] [6/N] API Server: Better Shutdown Dec 28, 2024

robertgshaw2-neuralmagic added 2 commits December 28, 2024 17:55

update

43cf6e7

cleanup

5450350

robertgshaw2-neuralmagic assigned simon-mo and WoosukKwon Dec 28, 2024

formatted

7c5b564

simon-mo approved these changes Dec 29, 2024

use logger.error directly

acbe6e3

robertgshaw2-neuralmagic requested review from zhuohan123 and youkaichao as code owners December 29, 2024 13:11

passing

5087194

robertgshaw2-neuralmagic enabled auto-merge (squash) December 29, 2024 13:22

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 29, 2024

tlrmchlsmth reviewed Dec 29, 2024

tlrmchlsmth disabled auto-merge December 29, 2024 18:52

robertgshaw2-neuralmagic added 2 commits December 30, 2024 13:53

fix

492084b

fix

0b22987

robertgshaw2-neuralmagic enabled auto-merge (squash) December 30, 2024 13:54

robertgshaw2-neuralmagic merged commit 5886aa4 into main Dec 30, 2024
54 checks passed

robertgshaw2-neuralmagic deleted the sigquit-handling branch December 30, 2024 15:51

xcnick pushed a commit to xcnick/vllm that referenced this pull request Dec 31, 2024

[V1] [6/N] API Server: Better Shutdown (vllm-project#11586)

a3a52f9

Signed-off-by: xcnick <[email protected]>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 31, 2024

[V1] [6/N] API Server: Better Shutdown (vllm-project#11586)

577c4ab

houseroad added a commit to houseroad/vllm that referenced this pull request Jan 1, 2025

Revert "[V1] [6/N] API Server: Better Shutdown (vllm-project#11586)"

c3aba2a

This reverts commit 5886aa4.
