Feature implementation for issue #136: support VLM requests for the /chat/completions API #154
Conversation
zhycheng614
commented
Oct 8, 2024
•
edited
- Updates the existing /chat/completions API to be compatible with the OpenAI API (a client-side sketch follows this list)
- Supports streaming
- Supports running a model from a local path via the --local_path and --model_type flags
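For reference, here is a minimal client-side sketch of what a streaming VLM request against the OpenAI-compatible endpoint could look like. The base URL, port, model identifier, and image path are assumptions for illustration only, not values taken from this PR.

```python
# Sketch of a client call to the OpenAI-compatible /chat/completions
# endpoint described above. Base URL, model name, and image path are
# placeholder assumptions; substitute the values your local server uses.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed-for-a-local-server",
)

# Encode a local image as a data URL, following the OpenAI vision message format.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

stream = client.chat.completions.create(
    model="any-local-vlm",  # hypothetical model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    stream=True,  # streaming is supported per the description above
)

# Print streamed tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```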
```json
{
  "model": "anything",
```
Please use an actual model name; developers may want to copy the snippet verbatim, and it should just work.
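For instance, the documented request body could carry a copy-pasteable model identifier instead of "anything"; the name below is hypothetical and only illustrates the reviewer's point.

```python
# Hypothetical request body addressing the review comment above:
# the model field names a concrete model rather than "anything".
request_body = {
    "model": "llava-v1.6-vicuna-7b",  # hypothetical identifier, for illustration
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "stream": False,
}
```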
```
@@ -15,13 +15,41 @@ def run_ggml_inference(args):
    if model_type:
        run_type = ModelType[model_type].value

def choose_file(local_path, file_type):
    """ Helper function for Multimodal inference only: select the model and projector ggufs from the local_path. """
```
Why don't we split the projector and model files into two arguments? We might want to add a `--proj` or `-p` flag for the projector. What if I mistakenly pass the projector path in the first position?
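A sketch of the split being proposed here, with flag names, defaults, and help text as assumptions rather than the PR's actual interface:

```python
import argparse

# Illustrative CLI split: an explicit projector flag removes the risk of
# passing the projector path in the model's position.
parser = argparse.ArgumentParser()
parser.add_argument("--local_path", type=str,
                    help="path to the model gguf file or directory")
parser.add_argument("-p", "--proj", type=str, default=None,
                    help="path to the multimodal projector gguf file")
parser.add_argument("--model_type", type=str, default=None,
                    help="model type used to resolve the run type")
args = parser.parse_args()

print(args.local_path, args.proj, args.model_type)
```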
Please confirm this UX with product @alanzhuly.