
[platforms] absorb worker cls difference into platforms folder #10555

Merged
merged 12 commits into vllm-project:main on Nov 22, 2024

Conversation

youkaichao
Member

@youkaichao youkaichao commented Nov 21, 2024

part of #9268

Every platform should specify the worker class inside its own code.

In addition, the default value is "auto", which allows users to specify custom classes for extensibility (something I'm working on as part of RLHF support).
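
As a rough illustration of the pattern (a minimal sketch with simplified names, not the PR's exact code): each platform resolves "auto" to its native worker, and any other value is treated as a user-supplied dotted class path.

    from dataclasses import dataclass
    import importlib


    @dataclass
    class ParallelConfig:  # simplified stand-in for vLLM's config object
        worker_cls: str = "auto"  # "auto" = let the platform decide


    class CudaPlatformSketch:
        @classmethod
        def check_and_update_config(cls, parallel_config: ParallelConfig) -> None:
            # The platform fills in its native worker only when the user
            # left the default sentinel in place.
            if parallel_config.worker_cls == "auto":
                parallel_config.worker_cls = "vllm.worker.worker.Worker"


    def resolve_worker_cls(path: str):
        # Import "pkg.module.Class" and return the class object, so a
        # custom worker can be loaded without vLLM knowing it up front.
        module_name, _, class_name = path.rpartition(".")
        return getattr(importlib.import_module(module_name), class_name)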

Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of these by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao youkaichao changed the title [platforms] refactor worker class specification [platforms] absorb worker cls difference into platforms folder Nov 21, 2024
Collaborator

@comaniac comaniac left a comment


LGTM

@comaniac comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 21, 2024
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
@njhill
Member

njhill commented Nov 22, 2024

Thanks @youkaichao! I am also reviewing this now...

Member

@njhill njhill left a comment


@youkaichao I think this change is probably ok if we're going to continue iterating on the architecture, but it doesn't seem like the right end design to me.

I've already been thinking that we need to overhaul the executor hierarchy/abstractions a bit (hoping we can do this as part of v1), and that may be part of why this doesn't sit right.

In particular, we have one or more executor classes per platform, so the executors are already abstracting over the platforms in a way. But then there's a parallel Platform abstraction. I think we should get rid of the platform-specific executors, i.e. no ray_*pu_executor.pys. Possibly the platform-specific aspects could be a mix-in.

It also feels a bit wrong to me to update the config objects in-place since these might be created/ "owned" by the user.

Also, wdyt about changing this field to custom_worker_cls: Optional[str] = None? Since it's a very specialized option, I would consider it more as overriding vLLM's native behaviour, so it's not so much an "auto" thing.
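
For concreteness, the suggested alternative would read roughly like this (a sketch; the field name comes from the comment above, the surrounding structure is assumed):

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class ParallelConfig:  # simplified stand-in, as in the PR description
        # None = use vLLM's native worker for the platform; a dotted
        # class path overrides it. Contrast with worker_cls = "auto".
        custom_worker_cls: Optional[str] = None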

Review comments were left on:
  • vllm/executor/cpu_executor.py
  • vllm/executor/ray_gpu_executor.py (two threads)
  • vllm/platforms/cuda.py
  • vllm/executor/ray_hpu_executor.py
  • vllm/executor/ray_tpu_executor.py (two threads)
@youkaichao
Member Author

I think we should get rid of the platform-specific executors. I.e. no ray_*pu_executor.pys. Possibly the platform-specific aspects could be a mix-in.

I strongly agree. We should have only {single worker executor, ray executor, mp executor}, and they should be able to initialize various workers (see the sketch below).

I think this PR is a tiny step towards that direction.
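
A sketch of that end state (illustrative names only, not vLLM's actual classes): a single platform-agnostic executor that instantiates whichever worker class the config names, so no per-platform ray_*_executor.py files are needed.

    import importlib


    class GenericExecutorSketch:
        """Illustrative only: one executor shared by all platforms. It
        dynamically imports whatever worker class the config names
        instead of hard-coding a platform-specific worker."""

        def __init__(self, worker_cls_path: str):
            module_name, _, class_name = worker_cls_path.rpartition(".")
            worker_cls = getattr(importlib.import_module(module_name), class_name)
            # Real vLLM workers take config arguments; omitted in this sketch.
            self.worker = worker_cls()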

@youkaichao
Member Author

It also feels a bit wrong to me to update the config objects in-place since these might be created/ "owned" by the user.

What's the concern here? The problem is that we don't have a scratch space for the model to store information, and right now we use vllm_config heavily to store per-model global state.

@njhill
Member

njhill commented Nov 22, 2024

It also feels a bit wrong to me to update the config objects in-place since these might be created/ "owned" by the user.

what's the concern here? the problem is we don't have a scratch space for the model to store some information, and right now we use vllm_config a lot, to store per-model basis global variables.

Yeah, it's kind of a more general point than this PR ... like you say, two things are being conflated a bit. Ideally the config should be treated as read-only, I think (it could have been passed in by the user), and model-global mutable state should be kept separate.

Perhaps something like:

from dataclasses import dataclass

@dataclass
class ModelState:
    config: VllmConfig

    # ... mutable per-model state, kept separate from the read-only config

@youkaichao
Member Author

@dataclass
class ModelState:
    config: VllmConfig

    # ...

@njhill this makes sense, but you need to figure out where to store it. All of the classes, including the engine, executor, worker, model runner, and model, need to access it. And you cannot use module-level global state, because people can create multiple LLM objects in the same process.
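
An illustrative sketch of the constraint being described (class names simplified): the config object has to be threaded through every layer by hand, which a module-level global cannot replicate once two engines live in one process.

    class Worker:
        def __init__(self, vllm_config: dict):
            self.vllm_config = vllm_config


    class Executor:
        def __init__(self, vllm_config: dict):
            self.vllm_config = vllm_config
            self.worker = Worker(vllm_config)


    class Engine:
        def __init__(self, vllm_config: dict):
            self.vllm_config = vllm_config
            self.executor = Executor(vllm_config)


    # Two engines in the same process each carry their own config; a
    # module-level global could only hold one of them at a time.
    engine_a = Engine({"model": "model-a"})
    engine_b = Engine({"model": "model-b"})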

@youkaichao
Member Author

wdyt about changing this field to be custom_worker_cls: Optional[str] = None?

I use a single worker_cls because it will be printed out during init.

Printing:

worker_cls = "vllm.worker.worker.Worker"
custom_worker_cls = "whatever.user.provide"

looks less clear than:

worker_cls = "auto"

and

worker_cls = "whatever.user.provide"

Member

@njhill njhill left a comment


We can address the other suggestions as follow-on refactoring

@youkaichao youkaichao merged commit a111d01 into vllm-project:main Nov 22, 2024
48 of 51 checks passed
@youkaichao youkaichao deleted the worker_cls branch November 22, 2024 05:00
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
…project#10555)

Signed-off-by: youkaichao <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
hijkzzz pushed a commit to OpenRLHF/OpenRLHF that referenced this pull request Nov 28, 2024
`worker_module_name` and `worker_class_name` are no longer supported.

Refer to vllm-project/vllm#10555

Signed-off-by: Hollow Man <[email protected]>
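
A hypothetical migration sketch for downstream code: the OpenRLHF module path below is illustrative, and `worker_cls` as an LLM keyword argument is assumed from this PR.

    # Before (no longer supported after vllm-project/vllm#10555):
    #   worker_module_name="openrlhf.trainer.ray.vllm_worker_wrap",
    #   worker_class_name="WorkerWrap",
    from vllm import LLM  # assumes a vLLM version that includes this PR

    llm = LLM(
        model="facebook/opt-125m",
        worker_cls="openrlhf.trainer.ray.vllm_worker_wrap.WorkerWrap",  # illustrative path
    )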
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)
3 participants