Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[V1] [6/N] API Server: Better Shutdown #11586

Merged
merged 13 commits into from
Dec 30, 2024
Merged

Conversation

robertgshaw2-neuralmagic
Copy link
Collaborator

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic commented Dec 28, 2024

SUMMARY:

  • Handle errors in background process from AsyncLLM
  • Handle error in output_handler loop
  • Right now, if any errors occur, we log the exception and kill the whole process tree. NOTE: as a follow up, we should look into a more graceful handling of EngineErrors, where we return a better message to client applications.

Pros/Cons of current design

  • The benefit of the design is that it is very unlikely to have any hanging or dangling resources
  • The negative of the design is that we cannot return clear error codes to client code

Follow Up

  • Explore whether we can propagate the exception back to the API Server

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Dec 28, 2024
@@ -29,11 +29,6 @@ class EngineClient(ABC):
def is_running(self) -> bool:
...

@property
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this property is not sure anywhere

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic changed the title better shutdown [V1] [6/N] API Server: Better Shutdown Dec 28, 2024
@@ -273,10 +292,12 @@ async def _run_output_handler(self):

# 4) Abort any requests that finished due to stop strings.
await self.engine_core.abort_requests_async(reqs_to_abort)
raise ValueError("my error!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my b

raise e
except Exception:
traceback = get_exception_traceback()
logger.error("EngineCore hit an exception: %s", traceback)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.error should automatically print traceback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, thanks

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 29, 2024
Copy link
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there's some debug cruft that snuck in during acbe6e3

@@ -616,6 +616,7 @@ def _add_processed_request(
decoder_inputs = processed_inputs
encoder_inputs = None

print(f"{decoder_inputs=}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this?

@@ -105,13 +105,16 @@ async def create_completion(

tokenizer = await self.engine_client.get_tokenizer(lora_request)

print(f"{request.prompt=}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 116 to 117
print(f"{request_prompts=}")
print(f"{engine_prompts=}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 60 to 62
print(f"{prompt=}")
ret = tokenizer.encode(prompt)
print(f"{ret=}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@tlrmchlsmth tlrmchlsmth disabled auto-merge December 29, 2024 18:52
@robertgshaw2-neuralmagic
Copy link
Collaborator Author

Whoops, will fix

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic merged commit 5886aa4 into main Dec 30, 2024
54 checks passed
@robertgshaw2-neuralmagic robertgshaw2-neuralmagic deleted the sigquit-handling branch December 30, 2024 15:51
xcnick pushed a commit to xcnick/vllm that referenced this pull request Dec 31, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 31, 2024
houseroad added a commit to houseroad/vllm that referenced this pull request Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
frontend ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants