Develop #9
Conversation
Install python 3.10.12 to match cluster's python 3.10 version
…gularity container, but multi-node hosting broken
…v to singularity container, updated flags to named arguments
…me location, merge launch server scripts, remove accidentally added quantization example
Feature/CLI
… latest vLLM updates, updated README accordingly
…ingle node, updated get_model_dir logic to remove company names
…t max_num_seqs to address KV cache error
…move old VLM options, update vllm to 0.5.4
…ding inference requests
…pdate model family names, add list all available models command, update launch command WIP
…ll model name, added option for log directory, move default log directory to home folder, added reason var for PENDING status
Dockerfile (outdated diff)
@@ -54,19 +54,19 @@ RUN python3.10 -m pip install --upgrade pip
RUN python3.10 -m pip install poetry

# Clone the repository
-RUN git clone https://github.com/VectorInstitute/vector-inference /vec-inf
+RUN git clone -b develop https://github.com/VectorInstitute/vector-inference /vec-inf
Why are we cloning here? Is there a reason we build from source? If so, we should have something more robust than develop. If not, we should simply install the pip package.
Good catch, forgot to update this
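For context, a minimal sketch of the pip-based alternative suggested above; it assumes the package is published on PyPI as `vec-inf` (as described in this PR) and that the surrounding Dockerfile stays as shown:

```dockerfile
# Sketch only: install the released package instead of cloning a moving branch.
# Assumes the CLI is published on PyPI as `vec-inf`.
RUN python3.10 -m pip install vec-inf
```

Pinning a version (e.g. `vec-inf==x.y.z`) would also make the image reproducible, which addresses the robustness concern raised above.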
    is_flag=True,
    help='Output in JSON string',
)
def launch(
Is it still possible for someone to run their own model weights, assuming they conform to an architecture supported by vLLM? Also, what about passing in additional config for launch (max token length, etc.)? Is that still possible?
If the architecture is supported by vLLM, then yes, as long as they supply values for all the optional arguments. As for max tokens, that is set on the chat/completions API endpoint, not at model launch.
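For reference, max tokens is a per-request parameter on the OpenAI-compatible chat/completions endpoint that vLLM serves; a hedged example follows, where the host, port, and model name are placeholders rather than values from this repo:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 256
      }'
```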
Sorry, not max tokens, I meant max_model_len. Can that be set via the launch command?
Yes, that's an optional argument.
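To illustrate, a hedged sketch of passing that optional argument (and custom weights) to the launch command; the flag names below are assumptions based on this thread, not confirmed against the final CLI:

```bash
# Sketch only: launch custom weights that use a vLLM-supported architecture,
# overriding the maximum model (context) length.
# `--model-weights-parent-dir` and `--max-model-len` are assumed flag names.
vec-inf launch <model-name> \
    --model-weights-parent-dir /path/to/weights \
    --max-model-len 8192
```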
This doesn't need to happen in this PR, but can we fix the checks that are failing?
Let me see if I can quickly fix that; I've tried previously but didn't quite finish it.
PR Type
[Feature]
Short Description
Added the `vec-inf` CLI and uploaded it to PyPI.
Tests Added
N/A
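A minimal usage sketch based on this description; the PyPI package name and the `list`/`launch` subcommands are taken from the commits and discussion above, but the exact invocations are assumptions:

```bash
# Install the CLI from PyPI (package name assumed to be `vec-inf`).
pip install vec-inf

# List available models, then launch one (subcommands per the commit messages).
vec-inf list
vec-inf launch <model-name>
```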