Develop #9
Conversation
Install python 3.10.12 to match cluster's python 3.10 version
…gularity container, but multi-node hosting broken
…v to singularity container, updated flags to named arguments
…me location, merge launch server scripts, remove accidentally added quantization example
Feature/CLI
… latest vLLM updates, updated README accordingly
…ingle node, updated get_model_dir logic to remove company names
…t max_num_seqs to address KV cache error
…move old VLM options, update vllm to 0.5.4
…ding inference requests
…pdate model family names, add list all available models command, update launch command WIP
…ll model name, added option for log directory, move default log directory to home folder, added reason var for PENDING status
Dockerfile (outdated diff)
@@ -54,19 +54,19 @@ RUN python3.10 -m pip install --upgrade pip
RUN python3.10 -m pip install poetry

# Clone the repository
-RUN git clone https://github.com/VectorInstitute/vector-inference /vec-inf
+RUN git clone -b develop https://github.com/VectorInstitute/vector-inference /vec-inf
Why are we cloning here? Is there a reason we build from source? If so, we should have something more robust than develop. If not, we should simply install the pip package.
Good catch, forgot to update this
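For context, a minimal sketch of the pip-based alternative suggested above; it assumes the package is published on PyPI as `vec-inf` (as described in this PR) and that the surrounding Dockerfile stays as shown:

```dockerfile
# Sketch only: install the released package instead of cloning a moving branch.
# Assumes the CLI is published on PyPI as `vec-inf`.
RUN python3.10 -m pip install vec-inf
```

Pinning a version (e.g. `vec-inf==x.y.z`) would also make the image reproducible, which addresses the robustness concern raised above.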
    is_flag=True,
    help='Output in JSON string',
)
def launch(
Is it still possible for someone to run their own model weights, assuming they conform to an architecture supported by vLLM? Also, what about passing in additional config for launch (max token length, etc.)? Is that still possible?
If the architecture is supported by vLLM, then yes, as long as they supply values for all the optional arguments. As for max tokens, that is set on the chat/completions API endpoint, not at model launch.
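For reference, max tokens is a per-request parameter on the OpenAI-compatible chat/completions endpoint that vLLM serves; a hedged example follows, where the host, port, and model name are placeholders rather than values from this repo:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 256
      }'
```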
Sorry, not max tokens, I meant max_model_len. Can that be set via the launch command?
Yes, that's an optional argument.
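To illustrate, a hedged sketch of passing that optional argument (and custom weights) to the launch command; the flag names below are assumptions based on this thread, not confirmed against the final CLI:

```bash
# Sketch only: launch custom weights that use a vLLM-supported architecture,
# overriding the maximum model (context) length.
# `--model-weights-parent-dir` and `--max-model-len` are assumed flag names.
vec-inf launch <model-name> \
    --model-weights-parent-dir /path/to/weights \
    --max-model-len 8192
```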
This doesn't need to happen in this PR, but can we fix the checks that are failing?
Let me see if I can quickly fix that; I've tried previously but didn't quite finish it.
PR Type
[Feature]
Short Description
Added the `vec-inf` CLI and uploaded it to PyPI.
Tests Added
N/A
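A minimal usage sketch based on this description; the PyPI package name and the `list`/`launch` subcommands are taken from the commits and discussion above, but the exact invocations are assumptions:

```bash
# Install the CLI from PyPI (package name assumed to be `vec-inf`).
pip install vec-inf

# List available models, then launch one (subcommands per the commit messages).
vec-inf list
vec-inf launch <model-name>
```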