Feat/evict req on client disconnect streaming case #223

bhimrazy · 2024-08-26T16:22:21Z

Before submitting

Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

How does this PR impact the user?

As a user, I want the server to stop processing a request when its client disconnects before the response has finished. This PR tracks disconnected requests (specifically in non-batched streaming mode) and stops the corresponding running tasks, saving resources and freeing capacity for other requests.

What does this PR do?

Partially fixes #165.

Handles client request disconnection in streaming mode (non-batch).
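
A minimal sketch of one way the server side could notice a disconnect, written with plain asyncio rather than taken from this PR's diff. It assumes the web framework cancels the streaming coroutine when the client goes away; `stream_response`, `queue`, and `evicted` are hypothetical names used only for illustration.

```python
import asyncio


async def stream_response(uid, queue, evicted):
    """Yield items from `queue` until a None sentinel arrives.

    If the consuming task is cancelled (assumed here to be how a client
    disconnect surfaces), flag `uid` in `evicted` so a worker can stop early.
    """
    try:
        while True:
            item = await queue.get()
            if item is None:  # sentinel: the worker finished normally
                return
            yield item
    except asyncio.CancelledError:
        evicted[uid] = True  # hypothetical stand-in for request_evicted_status
        raise  # let the cancellation propagate as usual
```

The flag is only set on cancellation, so a stream that ends with the sentinel leaves `evicted` untouched.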

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2024-08-26T16:26:55Z

Codecov Report

Attention: Patch coverage is 85.18519% with 4 lines in your changes missing coverage. Please review.

Project coverage is 95%. Comparing base (44e0fe9) to head (49bed55).

Additional details and impacted files

@@         Coverage Diff         @@
##           main   #223   +/-   ##
===================================
- Coverage    95%    95%   -0%     
===================================
  Files        18     18           
  Lines      1173   1185   +12     
===================================
+ Hits       1112   1122   +10     
- Misses       61     63    +2

aniketmaurya

Great work, @bhimrazy! Looking good.

src/litserve/server.py

aniketmaurya

I will run some perf tests for streaming before merging this PR!

bhimrazy · 2024-09-01T17:47:05Z

Hi @aniketmaurya,

I’ve made some modifications to check at specific intervals rather than on each output, hoping this approach might minimize any impact.

However, if this PR is affecting performance, I’m more than willing to close it. We can then explore alternative solutions that might be more effective.

lantiga · 2024-09-10T20:16:19Z

Hey @bhimrazy thanks for the patience, we're going to benchmark soon!

bhimrazy · 2024-09-21T08:14:11Z

src/litserve/loops.py

+            check_interval = 50
+            for index, y_enc in enumerate(y_enc_gen):
+                if index % check_interval == 0 and request_evicted_status.get(uid):
+                    request_evicted_status.pop(uid)
+                    break


Checking request_evicted_status for each token appears to have a significant impact, reducing performance from 3600 to around 3100. However, it may not be necessary to perform this check on every token.

Adding a check interval reduces the overhead and brings the performance closer to that of the main branch, but it still doesn't feel like an ideal solution.
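
A self-contained sketch of the interval check from the diff above, reduced to a generator so it can be run in isolation. `stream_until_evicted` is a hypothetical name, and a plain dict stands in for `request_evicted_status`, whose real type is not shown in this excerpt.

```python
def stream_until_evicted(uid, outputs, evicted, check_interval=50):
    """Yield items from `outputs`, stopping once `uid` is flagged in `evicted`.

    The shared mapping is consulted only on every `check_interval`-th item,
    so a disconnect is noticed up to `check_interval - 1` items late.
    """
    for index, item in enumerate(outputs):
        if index % check_interval == 0 and evicted.get(uid):
            evicted.pop(uid)  # clear the flag so the entry does not linger
            break
        yield item
```

With `check_interval=50`, a request flagged at item 60 keeps streaming until item 100. This is the trade-off discussed here: fewer lookups in exchange for a slower reaction.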

aniketmaurya · 2024-09-21

Thanks for your patience with the PR and for checking the speed issue, @bhimrazy 🙌.

Yeah, and when the time-to-first-token is large but the rest of the token stream is fast, it doesn't help much.

This is with just a single worker. With multiple workers the impact might be even larger.

lantiga · 2024-09-21

I think the overall design is correct; we are just way too aggressive in checking the distributed dict, and we get into contention problems.

One alternative that could reduce contention is taking a snapshot of the disconnected dictionary in every worker loop: instead of a managed dict, use a shared value that the server publishes and that each worker reads as a whole periodically (every N seconds; we don't need a thread, we just check the time at every loop). This way every worker has a semi-up-to-date local dictionary that it can check as often as we want.

Having semi-up-to-date info on who disconnected every N seconds is totally fine; we don't need to react immediately.

This design also helps with ignoring items in the queue that come from clients that have disconnected. For those we necessarily have to check at every request. If the local dictionary is not up to date we'll run some requests for nothing, but that's ok. One caveat is making sure that, in this case, the responses don't accumulate in the response dictionary on the webserver process (let's remember about this).
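
The snapshot idea can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the PR: `DisconnectSnapshot`, `fetch`, and `refresh_seconds` are hypothetical names, and how the server actually publishes the shared value is left open (here it is just a callable).

```python
import time


class DisconnectSnapshot:
    """Worker-local copy of the disconnected request ids.

    The copy is refreshed at most once every `refresh_seconds` by comparing
    timestamps on each lookup, so no background thread is needed.
    """

    def __init__(self, fetch, refresh_seconds=1.0, clock=time.monotonic):
        self._fetch = fetch  # hypothetical: returns the server-published ids
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._local = frozenset()
        self._last_refresh = None

    def is_disconnected(self, uid):
        now = self._clock()
        stale = (
            self._last_refresh is None
            or now - self._last_refresh >= self._refresh_seconds
        )
        if stale:
            # One read of the shared value replaces many per-token lookups.
            self._local = frozenset(self._fetch())
            self._last_refresh = now
        return uid in self._local
```

A worker could call `is_disconnected(uid)` for every token and for every item it takes off the queue; between refreshes the lookup only touches local memory, at the cost of reacting up to `refresh_seconds` late.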

bhimrazy · 2024-09-22

Thank you, @lantiga, for the valuable insights. This approach seems promising. I'll take some time to study the concept and work on the implementation shortly.

bhimrazy added 5 commits August 26, 2024 18:29

chore: Add SimpleDelayedStreamAPI for delayed streaming of output

d613565

add test_stream_client_disconnection

371cf56

add request_evicted_status param to run_streaming_loop

9e7f841

update test_stream_client_disconnection

7ce49ac

adds functionality to evict the request if disconnected before completion

56c8587

bhimrazy requested review from lantiga, aniketmaurya, awaelchli and Andrei-Aksionov as code owners August 26, 2024 16:22

bhimrazy marked this pull request as draft August 26, 2024 16:22

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

f5961c4

bhimrazy mentioned this pull request Aug 26, 2024

Feat: Evict requests if the client has disconnected #208

Closed

8 tasks

bhimrazy marked this pull request as ready for review August 26, 2024 16:31

aniketmaurya added 2 commits August 26, 2024 19:19

update exception

9330997

fix test

1f0bfe5

aniketmaurya approved these changes Aug 26, 2024

Update src/litserve/server.py

d41db3c

aniketmaurya self-requested a review August 26, 2024 18:35

aniketmaurya reviewed Aug 26, 2024

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

4344720

aniketmaurya self-requested a review August 26, 2024 18:39

bhimrazy and others added 8 commits August 27, 2024 17:01

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

f177fcb

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

4e5045a

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

1d4677c

reverted changes to new updates

ca6fbc2

update

e61cdab

update

6c2e0c6

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

6668cc8

[pre-commit.ci] auto fixes from pre-commit.com hooks

3448ef3

for more information, see https://pre-commit.ci

chore: Add test for streaming client disconnection

2cfd68e

bhimrazy marked this pull request as draft August 31, 2024 19:39

bhimrazy added 3 commits September 1, 2024 01:42

handle client disconnection streaming nonbatched case

c95ee45

chore: Optimize streaming loop performance by checking for client disconnection at certain regular intervals

bac5534

chore: Update streaming loop to include request eviction status

f08ed4b

bhimrazy marked this pull request as ready for review September 1, 2024 17:41

bhimrazy marked this pull request as draft September 4, 2024 19:35

bhimrazy and others added 10 commits September 21, 2024 13:00

Merge branch 'main' into feat/evict-req-on-client-disconnect-streaming-case

e060e39

[pre-commit.ci] auto fixes from pre-commit.com hooks

2b11fc7

for more information, see https://pre-commit.ci

Refactor inference_worker function to remove optional parameters and improve code readability

5323c51

update

4368b57

update

611a751

add missing param

5cc0f77

add missing param

8d4a05d

add missing param for run streaming loop

bd68b6c

test by removing the check interval

56f1076

so there is performance drop with this check,

49bed55

check interval 10 is passing. Increasing to 50.
