
Heterogeneous parallel processing to avoid CPU & GPU Idle time #258

Draft
wants to merge 47 commits into base: main
Changes from 40 commits
Commits
47 commits
6dff1e1
Heterogeneous computing feature added
Usama3059 Aug 31, 2024
5ee1b08
Update .gitignore
Usama3059 Aug 31, 2024
2463ab3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 31, 2024
1c253e0
Merge branch 'Lightning-AI:main' into main
Usama3059 Sep 2, 2024
39e5515
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 2, 2024
4eead85
create separate process workers for CPU & GPU
Usama3059 Sep 3, 2024
a7b5dac
separate workers for CPU & GPU test
Usama3059 Sep 3, 2024
bbce8a4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2024
1e1dc09
call via api added
Usama3059 Sep 3, 2024
7b73459
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 3, 2024
cf0ff2f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2024
c02f759
Added preprocess workers, for streaming wip
Usama3059 Sep 7, 2024
0cf8847
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2024
2128aa9
added for streaming, testing in progress
Usama3059 Sep 7, 2024
131924d
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 7, 2024
d485f90
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2024
fd6779b
Added API for preprocess workers
Usama3059 Sep 8, 2024
d371aaf
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 8, 2024
42dbf63
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 8, 2024
168d254
Chore: combined loops funcs
Usama3059 Sep 14, 2024
28c9dfb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 14, 2024
36939a1
Merge branch 'Lightning-AI:main' into main
Usama3059 Sep 14, 2024
3b8fd1e
Fix: changes in test_loops.py
Usama3059 Sep 14, 2024
f68823c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 14, 2024
9903d8d
Test: start working on tests
Usama3059 Sep 14, 2024
e5a6586
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 14, 2024
d182ece
Tests: without preprocess checks
Usama3059 Sep 14, 2024
58a61a6
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 14, 2024
e3a2345
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 14, 2024
e3db002
Tests: initial test added
Usama3059 Sep 16, 2024
9f8865d
Refactor: refactor tests_func
Usama3059 Sep 16, 2024
2cf80b1
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 16, 2024
2ecf163
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 16, 2024
5c4cd3d
Fix: changed process spawn
Usama3059 Sep 16, 2024
84d417f
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra
Usama3059 Sep 16, 2024
d1e59e3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 16, 2024
7e78a4e
Refactor: changed ready_to_inference name
Usama3059 Sep 16, 2024
e4cd2e7
Merge branch 'main' of https://github.com/Usama3059/LitServe-extra in…
Usama3059 Sep 16, 2024
430081d
Update src/litserve/server.py
aniketmaurya Sep 16, 2024
1a5ca56
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 16, 2024
0bdb93b
Merge branch 'Lightning-AI:main' into main
Usama3059 Sep 17, 2024
9e1725f
Tests: added flow with both workers
Usama3059 Sep 17, 2024
2be7c99
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 17, 2024
7883038
Refactor: clean tests
Usama3059 Sep 17, 2024
95cecbc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 17, 2024
451b79d
Update .gitignore
Usama3059 Sep 17, 2024
68593a1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 17, 2024
7 changes: 7 additions & 0 deletions .gitignore
@@ -130,3 +130,10 @@ venv.bak/
lightning_logs/
MNIST
.DS_Store


src/litserve/server.log
src/client.py
src/start_server.py
src/litserve/start_server.py
src/litserve/client.py
75 changes: 75 additions & 0 deletions src/litserve/api.py
@@ -54,6 +54,81 @@ def batch(self, inputs):

return inputs

def preprocess(self, x, **kwargs):
"""Preprocess the input data before passing it to the model for inference.

The `preprocess` function handles necessary transformations (e.g., data normalization,
tokenization, feature extraction, or image resizing) before sending the data to
the model for prediction.

Args:
x: Input data, either a single instance or a batch, depending on the model’s requirements.
kwargs: Additional arguments for specific preprocessing tasks.

Returns:
Preprocessed data in a format compatible with the model's `predict` function.

Usage:
- Separate Workers for Preprocessing and Inference: If the preprocessing step is
computationally intensive, it runs on dedicated process workers so that it does not
block the main prediction flow. The preprocessed data is passed via a queue to the
inference workers, so both stages can work in parallel.
- Performance Optimization: By decoupling preprocessing and inference, the system
can handle more requests concurrently, reducing latency and improving throughput.
For example, while one request is being preprocessed, another can be running
inference, overlapping the time spent in the two stages.

Example:
Consider batch_size = 1, 3 requests, 1 preprocess worker, and 1 inference worker:
Preprocessing takes 4s and Inference takes 2s.

1. Without Separate Preprocessing Workers (Sequential):
Each request runs preprocessing and inference back to back on the same worker,
so the next request cannot start until the previous one has finished.

Request 1: |-- Preprocess --|-- Inference --|
Request 2:                                  |-- Preprocess --|-- Inference --|
Request 3:                                                                   |-- Preprocess --|-- Inference --|

Total time: (4s + 2s) * 3 = 18s

2. With Separate Preprocessing Workers (Concurrent):
While the inference worker handles request N, the preprocess worker already
starts preprocessing request N+1, so the two stages overlap.

Request 1: |-- Preprocess --|-- Inference --|
Request 2:                  |-- Preprocess --|-- Inference --|
Request 3:                                   |-- Preprocess --|-- Inference --|

Total time: 4s + 4s + 4s + 2s = 14s

When to Override:
- When preprocessing is time-consuming: If your preprocessing step involves heavy
computations (e.g., applying complex filters, large-scale image processing, or
extensive feature extraction), you should override `preprocess` to run it separately
from inference. This is especially important when preprocessing and inference both
take considerable time, as overlapping the two processes improves throughput.

- High-latency pipelines: If both preprocessing and inference take significant time
(e.g., several seconds), running them concurrently can substantially reduce latency
and improve performance. For example, for image segmentation models or NLP models
that require heavy tokenization, separating the two stages is highly effective.

- Less effective for fast models: If both preprocessing and inference take only a
few milliseconds each, the benefit of separating them into parallel processes may
be minimal. In such cases, the overhead of managing multiple workers and queues may
outweigh the performance gain.

- Dynamic workloads: If your workload fluctuates or you expect periods of high
demand, decoupling preprocessing from inference allows you to scale each stage
independently by adding more workers based on the current system load.

"""
pass

@abstractmethod
def predict(self, x, **kwargs):
"""Run the model on the input and return or yield the output."""