Feature implementation for issue #136: support VLM requests for the /chat/completions API #154
Conversation
zhycheng614
commented
Oct 8, 2024
•
edited
- Updates the existing /chat/completions API to be compatible with the OpenAI API (a client-side sketch follows this list)
- Supports streaming
- Supports running a model from a local path via the --local_path and --model_type flags
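For reference, here is a minimal client-side sketch of what a streaming VLM request against the OpenAI-compatible endpoint could look like. The base URL, port, model identifier, and image path are assumptions for illustration only, not values taken from this PR.

```python
# Sketch of a client call to the OpenAI-compatible /chat/completions
# endpoint described above. Base URL, model name, and image path are
# placeholder assumptions; substitute the values your local server uses.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed-for-a-local-server",
)

# Encode a local image as a data URL, following the OpenAI vision message format.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

stream = client.chat.completions.create(
    model="any-local-vlm",  # hypothetical model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    stream=True,  # streaming is supported per the description above
)

# Print streamed tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```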
```json
{
  "model": "anything",
```
Please use an actual model name; developers may want to copy the snippet verbatim, and it should just work.
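For instance, the documented request body could carry a copy-pasteable model identifier instead of "anything"; the name below is hypothetical and only illustrates the reviewer's point.

```python
# Hypothetical request body addressing the review comment above:
# the model field names a concrete model rather than "anything".
request_body = {
    "model": "llava-v1.6-vicuna-7b",  # hypothetical identifier, for illustration
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "stream": False,
}
```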
```
@@ -15,13 +15,41 @@ def run_ggml_inference(args):
    if model_type:
        run_type = ModelType[model_type].value

def choose_file(local_path, file_type):
    """ Helper function for Multimodal inference only: select the model and projector ggufs from the local_path. """
```
Why don't we split the projector and model files into two arguments? We might want to add a `--proj` or `-p` flag for the projector. What if I mistakenly pass the projector path in the first position?
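A sketch of the split being proposed here, with flag names, defaults, and help text as assumptions rather than the PR's actual interface:

```python
import argparse

# Illustrative CLI split: an explicit projector flag removes the risk of
# passing the projector path in the model's position.
parser = argparse.ArgumentParser()
parser.add_argument("--local_path", type=str,
                    help="path to the model gguf file or directory")
parser.add_argument("-p", "--proj", type=str, default=None,
                    help="path to the multimodal projector gguf file")
parser.add_argument("--model_type", type=str, default=None,
                    help="model type used to resolve the run type")
args = parser.parse_args()

print(args.local_path, args.proj, args.model_type)
```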
Please confirm this UX with product @alanzhuly.