
Develop #9 (Merged)

XkunW merged 60 commits into main from develop on Aug 29, 2024
Conversation

@XkunW (Contributor) commented Aug 27, 2024

PR Type

[Feature]

Short Description

  • Created the vec-inf CLI and uploaded it to PyPI
  • Added support for the Llama-3.1, gemma-2, and phi-3 model families, plus other model variants

Tests Added

N/A
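For orientation, a rough usage sketch of the CLI described above. The pip package name follows the description; the subcommands are inferred from the launch command and the "list all available models" commit below, and the model name is illustrative, so exact spellings may differ from the released CLI:

pip install vec-inf                          # install the CLI from PyPI
vec-inf list                                 # list available models (per the commit log below)
vec-inf launch Meta-Llama-3.1-8B-Instruct    # launch a supported model variant (name illustrative)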

XkunW and others added 30 commits, July 23, 2024 14:58
  • Install python 3.10.12 to match cluster's python 3.10 version
  • …gularity container, but multi-node hosting broken
  • …v to singularity container, updated flags to named arguments
  • …me location, merge launch server scripts, remove accidentally added quantization example

XkunW and others added 22 commits, July 30, 2024 13:59
  • … latest vLLM updates, updated README accordingly
  • …ingle node, updated get_model_dir logic to remove company names
  • …pdate model family names, add list all available models command, update launch command WIP
  • …ll model name, added option for log directory, move default log directory to home folder, added reason var for PENDING status
@XkunW requested a review from @jwilles on August 27, 2024 15:54
Dockerfile (Outdated)

@@ -54,19 +54,19 @@ RUN python3.10 -m pip install --upgrade pip
 RUN python3.10 -m pip install poetry

 # Clone the repository
-RUN git clone https://github.com/VectorInstitute/vector-inference /vec-inf
+RUN git clone -b develop https://github.com/VectorInstitute/vector-inference /vec-inf
jwilles:

Why are we cloning here? Is there a reason we build from source? If so, we should have something more robust than develop. If not, we should simply install the pip package.

XkunW (Contributor, Author):

Good catch, I forgot to update this.
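For reference, the fix suggested above would be a one-line change in the Dockerfile; a sketch assuming the package is published on PyPI under the name vec-inf, per the PR description:

# Install the released package from PyPI instead of cloning a branch
RUN python3.10 -m pip install vec-inf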

vec-inf CLI launch command definition (excerpt):

    is_flag=True,
    help='Output in JSON string',
)
def launch(
jwilles:

Is it still possible for someone to run their own model weights, assuming the model conforms to an architecture supported by vLLM? Also, what about passing additional config to launch (max token length, etc.)? Is that still possible?

XkunW (Contributor, Author):

If the architecture is supported by vLLM, then yes, as long as they supply values for all the optional arguments. Regarding max tokens, that is set in the chat/completions API endpoint, not at model launch.

jwilles:

Sorry, not max tokens, I meant max_model_len. Can that be set via the launch command?

XkunW (Contributor, Author):

Yes, that's an optional argument.
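To illustrate the thread above: with max_model_len exposed as an optional launch argument, overriding it would look roughly like this. The flag spelling and model name are assumptions based on the discussion, not verified against the released CLI:

# Override the model's context window at launch time; flag name assumed from the thread above
vec-inf launch Meta-Llama-3.1-8B-Instruct --max-model-len 8192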

@jwilles commented Aug 29, 2024

Doesn't need to happen in this PR, but can we fix the checks that are failing?

@XkunW (Contributor, Author) commented Aug 29, 2024

> Doesn't need to happen in this PR, but can we fix the checks that are failing?

Lemme see if I can quickly fix that; I've tried previously but didn't quite finish it.

@XkunW merged commit 156dfa5 into main on Aug 29, 2024 (0 of 3 checks passed)