add continuous batching loop 1/n #387

Merged
merged 20 commits into main from aniket/cont-batch-loop on Dec 11, 2024

Conversation

@aniketmaurya (Collaborator) commented Dec 6, 2024

What does this PR do?

Allow users to quickly write a customizable continuous batching loop.

This is one of the first PRs adding a simple continuous batching loop; the API is subject to change while we are on 0.2.6.dev*. I will follow up with multiple PRs to clean this up and add tests.
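
To make the loop contract concrete, here is a minimal illustrative sketch (not part of this PR; the EchoLoop class and its pending dict are hypothetical) of a custom loop that streams back one character of the prompt per step, using only the add_request/step/Output surface demonstrated in the vLLM example below:

from litserve.loops import ContinuousBatchingLoop, Output
from litserve.utils import LitAPIStatus


class EchoLoop(ContinuousBatchingLoop):
    """Toy loop: each step emits the next character of every pending prompt."""

    def __init__(self, max_sequence_length: int = 2048):
        super().__init__(max_sequence_length)
        self.pending = {}  # uid -> iterator over the prompt's characters

    def add_request(self, uid, request, lit_api, lit_spec) -> None:
        super().add_request(uid, request, lit_api, lit_spec)
        self.pending[uid] = iter(request["prompt"])

    def step(self, prev_outputs, lit_api, lit_spec):
        outputs = []
        for uid, chars in list(self.pending.items()):
            try:
                # Emit one character per step for each in-flight request.
                outputs.append(Output(uid=uid, output=next(chars), status=LitAPIStatus.OK))
            except StopIteration:
                # Signal end of stream and retire the request from the batch.
                outputs.append(Output(uid=uid, output="", status=LitAPIStatus.FINISH_STREAMING))
                del self.pending[uid]
        return outputs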

Example with vLLM engine

from vllm import LLMEngine, EngineArgs, SamplingParams
from litserve.loops import ContinuousBatchingLoop, Output
from litserve.utils import LitAPIStatus
import litserve as ls

class LitVLLM(ls.LitAPI):
    def setup(self, device):
        engine_args = EngineArgs(model="meta-llama/Llama-3.2-1B", device=device, max_model_len=2048)
        self.engine = LLMEngine.from_engine_args(engine_args)

    def decode_request(self, request: dict) -> dict:
        return request

    def predict(self, x):
        # Generation happens in the custom loop's step(); predict just forwards the request.
        yield x

    def encode_response(self, x: str):
        yield x


class vLLMLoop(ContinuousBatchingLoop):
    def __init__(self, max_sequence_length: int = 2048):
        super().__init__(max_sequence_length)
        self.uids = {}  # Maps vLLM request_id (str) to original uid

    def add_request(self, uid: str, request, lit_api: LitVLLM, lit_spec) -> None:
        super().add_request(uid, request, lit_api, lit_spec)
        request_id = str(uid)
        sampling_params = SamplingParams(
            temperature=request.get("temperature", 0.0),
            max_tokens=request.get("max_tokens", 10)
        )
        lit_api.engine.add_request(
            request_id=request_id,
            prompt=request["prompt"],
            params=sampling_params
        )
        self.uids[request_id] = uid

    def step(self, prev_outputs, lit_api: LitVLLM, lit_spec):
        # Advance the vLLM engine by one scheduling step and collect newly generated tokens.
        request_outputs = lit_api.engine.step()
        outputs = []
        
        for request_output in request_outputs:
            if request_output.request_id not in self.uids:
                continue

            original_uid = self.uids[request_output.request_id]
            token_id = request_output.outputs[0].token_ids[-1]
            output = lit_api.engine.get_tokenizer().decode([token_id])
            outputs.append(Output(
                uid=original_uid,
                output=output,
                status=LitAPIStatus.OK
            ))

            if request_output.finished:
                outputs.append(Output(
                    uid=original_uid,
                    output="",
                    status=LitAPIStatus.FINISH_STREAMING
                ))
                del self.uids[request_output.request_id]

        return outputs

if __name__ == "__main__":
    loop = vLLMLoop()
    api = LitVLLM()
    server = ls.LitServer(api, loop=loop, max_batch_size=4, stream=True, timeout=-1)
    server.run()

Result of 4 concurrent requests:

(Screen recording: Screen.Recording.2024-12-10.at.1.12.23.PM.mov)
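
For reference, a minimal client sketch that fires 4 concurrent streaming requests (assuming the server above is running locally on LitServe's default port 8000 and /predict route; the stream_request helper is illustrative):

import threading

import requests


def stream_request(prompt: str) -> None:
    # Stream chunks from the server as they are produced.
    with requests.post(
        "http://localhost:8000/predict",
        json={"prompt": prompt, "max_tokens": 10},
        stream=True,
    ) as resp:
        for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
            print(f"{prompt!r}: {chunk}")


threads = [threading.Thread(target=stream_request, args=(f"Hello {i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()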
Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 20.96774% with 98 lines in your changes missing coverage. Please review.

Project coverage is 88%. Comparing base (4a41f7f) to head (191da7b).
Report is 1 commit behind head on main.

Additional details and impacted files
@@         Coverage Diff          @@
##           main   #387    +/-   ##
====================================
- Coverage    93%    88%    -5%     
====================================
  Files        25     25            
  Lines      1635   1756   +121     
====================================
+ Hits       1520   1548    +28     
- Misses      115    208    +93     

@aniketmaurya changed the title from "[wip] add continuous batching loop" to "add continuous batching loop 1/n" on Dec 10, 2024
Review threads on src/litserve/loops.py (resolved).
@ali-alshaar7 (Contributor) left a comment:

LGTM!

@aniketmaurya merged commit 75c6d0e into main on Dec 11, 2024
21 checks passed
@aniketmaurya deleted the aniket/cont-batch-loop branch on December 11, 2024 at 12:54