
Loading models from an S3 location instead of local path #3090

Open
simon-mo opened this issue Feb 28, 2024 Discussed in #3072 · 6 comments

@simon-mo
Collaborator

Discussed in #3072

Originally posted by petrosbaltzis February 28, 2024
Hello,

The vLLM library can load the model and the tokenizer either from a local folder or directly from Hugging Face, for example:

["python", "-m", "vllm.entrypoints.openai.api_server", \
"--host=0.0.0.0", \
"--port=8080", \
"--model=<local_path>", \
"--tokenizer=<local_path>",
]

I wonder whether this functionality could be extended to support S3 locations, so that when we initialize the API server we can pass an S3 URI instead:

["python", "-m", "vllm.entrypoints.openai.api_server", \
"--host=0.0.0.0", \
"--port=8080", \
"--model=<s3://bucket/prefix>", \
"--tokenizer=<s3://bucket/prefix>",
]

Petros
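
For context, a common interim workaround (not part of the proposal above) is to copy the weights from S3 to local disk first and point --model at the local copy. A minimal sketch, assuming the AWS CLI is available; the bucket and paths are illustrative:

# Illustrative workaround: sync the model files from S3 to a local directory,
# then start the server against the local path.
aws s3 sync s3://my-bucket/my-model /models/my-model
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8080 \
    --model /models/my-model \
    --tokenizer /models/my-model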

@ywang96
Member

ywang96 commented Feb 29, 2024

Similar to what @ikalista mentioned in the original discussion, IMO a better way is to mount model storage into the container for model loading, unless we want to rewrite the model loader to "stream" directly from S3 into the GPU buffer, as Anyscale did.
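
One hypothetical way to do the mounting (the thread does not specify a mechanism; s3fs-fuse is just one option, and the bucket and paths are illustrative) is to FUSE-mount the bucket and point vLLM at the mount point:

# Sketch: mount the bucket with s3fs-fuse (assumes credentials in ~/.passwd-s3fs),
# then load the model from the mounted path.
s3fs my-bucket /mnt/models -o passwd_file=${HOME}/.passwd-s3fs
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8080 \
    --model /mnt/models/my-model \
    --tokenizer /mnt/models/my-model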

@drawnwren

Sorry to bump an old issue here, but does this mean that --download-dir does not load weights? The docs say "Directory to download and load the weights, default to the default cache dir of huggingface.", which makes me think that when I specify --download-dir s3://my-bucket the bucket is used as a cache. But this issue makes me think that my interpretation is incorrect?

@ashvinnihalani

@ywang96 Is anybody working on the direct model loading? Do we have a benchmark comparing mounting against loading directly into memory? Happy to work on this if nobody else is.

@ywang96
Member

ywang96 commented Sep 24, 2024

@ywang96 Is anybody working on the direct model loading? Do we have a benchmark comparing mounting against loading directly into memory? Happy to work on this if nobody else is.

Not to my knowledge. Feel free to work on this, and thanks for your interest!

@samos123
Contributor

samos123 commented Nov 7, 2024

@ashvinnihalani are you still working on this? This would also be helpful for loading large models in environments where local disk space isn't sufficient.

The issue with mounting object storage is that it requires the platform operator to provide it. For example, in certain K8s setups the user deploying vLLM may not have the permissions required to mount object storage in their container.

So that's why this would be a very valuable feature.

@omer-dayan
Contributor

omer-dayan commented Nov 17, 2024

Hey,
At RunAI we have published an open-source tool to stream model weights from an object store like S3 to GPU memory, called the RunAI Model Streamer (https://github.com/run-ai/runai-model-streamer).

The Streamer provides two main advantages:

  1. Concurrent reads from storage
  2. Integration with object storage such as S3

You can read further in the whitepaper: https://pages.run.ai/hubfs/PDFs/White%20Papers/Model-Streamer-Performance-Benchmarks.pdf

We have proposed a way to integrate it into vLLM.
#10192
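
As a rough sketch of how the proposed integration might be invoked (the load-format name is taken from the linked PR and may change; the bucket path is illustrative):

# Sketch only: stream weights directly from S3 via the RunAI Model Streamer,
# per the integration proposed in #10192.
python -m vllm.entrypoints.openai.api_server \
    --model s3://my-bucket/my-model \
    --load-format runai_streamer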
