
Add loop.pre_setup to allow fine-grained LitAPI validation based on inference loop #393

Merged
aniketmaurya merged 4 commits into main from aniket/pre-setup-loop on Dec 11, 2024

Conversation

aniketmaurya (Collaborator) commented Dec 11, 2024

What does this PR do?

This PR allows fine-grained validation of a LitAPI based on the inference loop in use.

Default loops keep the same validation as today, while new inference loops such as continuous batching can add extra checks (for example, verifying that a lit_api.predict_start method exists) without forcing a single global validation structure.
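
As an illustrative sketch (the exact pre_setup signature below is an assumption based on this PR's description, not the finalized API), a custom loop could hook in its own checks like this:

from litserve import LitAPI
from litserve.loops import ContinuousBatchingLoop

class MyBatchingLoop(ContinuousBatchingLoop):
    def pre_setup(self, lit_api: LitAPI, spec=None):
        # Run the default validation, then add a loop-specific requirement.
        super().pre_setup(lit_api, spec)
        if not hasattr(lit_api, "predict_start"):
            raise ValueError("MyBatchingLoop requires LitAPI to implement predict_start")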

For example, the following code fails under the current global validation, which forces predict to be implemented as a generator:

from vllm import LLMEngine, EngineArgs, SamplingParams
from litserve.loops import ContinuousBatchingLoop, Output
from litserve.utils import LitAPIStatus
import litserve as ls

class LitVLLM(ls.LitAPI):
    def setup(self, device):
        engine_args = EngineArgs(model="meta-llama/Llama-3.2-1B", device=device, max_model_len=2048)
        self.engine = LLMEngine.from_engine_args(engine_args)

    def decode_request(self, request: dict) -> dict:
        return request
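
    # Note: no `predict` generator is implemented; the current global validation
    # rejects this API even though the custom loop below never calls predict.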

class vLLMLoop(ContinuousBatchingLoop):
    def __init__(self, max_sequence_length: int = 2048):
        super().__init__(max_sequence_length)
        self.uids = {}  # Maps vLLM request_id (str) to original uid

    def add_request(self, uid: str, request, lit_api: LitVLLM, lit_spec) -> None:
        super().add_request(uid, request, lit_api, lit_spec)
        request_id = str(uid)
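        # Build per-request sampling parameters from the decoded request dict.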
        sampling_params = SamplingParams(
            temperature=request.get("temperature", 0.0),
            max_tokens=request.get("max_tokens", 10)
        )
        lit_api.engine.add_request(
            request_id=request_id,
            prompt=request["prompt"],
            params=sampling_params
        )
        self.uids[request_id] = uid

    def step(self, prev_outputs, lit_api: LitVLLM, lit_spec):
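        # Advance the vLLM engine by one scheduling step; this returns the
        # newest outputs for all in-flight requests.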
        request_outputs = lit_api.engine.step()
        outputs = []

        for request_output in request_outputs:
            if request_output.request_id not in self.uids:
                continue

            original_uid = self.uids[request_output.request_id]
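            # Decode only the most recently generated token so the response streams incrementally.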
            token_id = request_output.outputs[0].token_ids[-1]
            output = lit_api.engine.get_tokenizer().decode([token_id])
            outputs.append(Output(
                uid=original_uid,
                output=output,
                status=LitAPIStatus.OK
            ))

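            # When vLLM marks the request finished, emit an end-of-stream marker and stop tracking it.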
            if request_output.finished:
                outputs.append(Output(
                    uid=original_uid,
                    output="",
                    status=LitAPIStatus.FINISH_STREAMING
                ))
                del self.uids[request_output.request_id]

        return outputs

if __name__ == "__main__":
    loop = vLLMLoop()
    api = LitVLLM()
    server = ls.LitServer(api, loop=loop, max_batch_size=4, stream=True, timeout=-1)
    server.run()

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@aniketmaurya changed the title from "Sanitize LitAPI with loop.pre_setup to allow fine-grained LitAPI validation based on inference loop" to "Add loop.pre_setup to allow fine-grained LitAPI validation based on inference loop" on Dec 11, 2024

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 74.41860% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88%. Comparing base (75c6d0e) to head (75aac9a).
Report is 1 commit behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #393   +/-   ##
===================================
  Coverage    88%    88%           
===================================
  Files        25     25           
  Lines      1756   1765    +9     
===================================
+ Hits       1548   1558   +10     
+ Misses      208    207    -1     

Review threads: src/litserve/api.py (outdated, resolved), src/litserve/loops.py (3 threads, resolved)
@aniketmaurya enabled auto-merge (squash) on December 11, 2024 16:31
@aniketmaurya merged commit 35129f7 into main on Dec 11, 2024
21 checks passed
@aniketmaurya deleted the aniket/pre-setup-loop branch on December 11, 2024 17:07