Based on that issue I've compiled a list of features that are used in the server.

**Llama server features**

To see all supported options and server endpoints you can look at the server's documentation, which explains what the server supports and how to configure it for your use case.

**Feature list**

I'll compile a list of llama features that are used in the server and how far we've integrated them into our plugin. As a result we'll have a list of features to implement in order to be feature-comparable when we implement our own server. The features fall into four scopes:

- Model
- Instance specific
- Session specific
- Server specific

*(edited with prio notes)*
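To make the scopes concrete, here is a minimal sketch of how such a checklist could be modeled in code. Every name below is hypothetical; nothing here is part of the plugin, it only illustrates the four scopes:

```cpp
#include <string>
#include <vector>

// Hypothetical scopes mirroring the list above.
enum class FeatureScope {
    Model,    // a property of the model file itself
    Instance, // set per loaded instance (e.g. context size)
    Session,  // set per inference session (e.g. sampling params)
    Server,   // global server behavior (e.g. endpoints)
};

enum class IntegrationStatus { NotStarted, Partial, Done };

struct Feature {
    std::string name;
    FeatureScope scope;
    IntegrationStatus status; // how far the plugin has integrated it
};

// Example entries only; the real names (and their statuses) would come
// from the llama.cpp server documentation and the prio notes.
const std::vector<Feature> featureList = {
    {"context size", FeatureScope::Instance, IntegrationStatus::Done},
    {"sampling parameters", FeatureScope::Session, IntegrationStatus::Partial},
};
```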
-
OpenAI has models which can do embeddings and text completion. In our API this can be solved by creating a new Model for each operation. Maybe we solve this at the loader level, by creating different Models/Instances for each type of model that supports specific operations.
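A rough sketch of the loader-level variant, assuming hypothetical types and capability checks (none of this is the plugin's actual API):

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// One instance type per operation family (hypothetical).
struct EmbeddingsInstance { std::string model; };
struct CompletionInstance { std::string model; };

// Assumed capability queries; a real loader would read this from the
// model's metadata rather than returning a constant.
bool supportsEmbeddings(const std::string& /*model*/) { return true; }
bool supportsCompletion(const std::string& /*model*/) { return true; }

struct Loader {
    // The loader, not the Model, decides which operations are available
    // and hands out a dedicated instance per operation.
    std::unique_ptr<EmbeddingsInstance> loadForEmbeddings(const std::string& model) {
        if (!supportsEmbeddings(model))
            throw std::runtime_error("model does not support embeddings");
        return std::make_unique<EmbeddingsInstance>(EmbeddingsInstance{model});
    }
    std::unique_ptr<CompletionInstance> loadForCompletion(const std::string& model) {
        if (!supportsCompletion(model))
            throw std::runtime_error("model does not support text completion");
        return std::make_unique<CompletionInstance>(CompletionInstance{model});
    }
};
```

This would keep each instance type from exposing ops it can't serve; the capability check happens once, at load time.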
-
Currently we support a single rudimentary op: `run`, which generates up to `max_tokens` tokens. Obviously this is not enough for a chat (or at least would make a chat clunky).
What other ops do we need?
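For discussion, here is one possible shape of a chat-oriented op set. None of these ops exist yet; the names are placeholders:

```cpp
#include <string>
#include <vector>

struct ChatMessage {
    std::string role; // "system", "user" or "assistant"
    std::string text;
};

// Placeholder interface sketching candidate ops around the existing `run`.
struct ChatSession {
    virtual ~ChatSession() = default;

    // The op we have today: generate up to maxTokens tokens.
    virtual std::string run(int maxTokens) = 0;

    // Candidate additions for a less clunky chat:
    virtual void pushPrompt(const std::string& text) = 0; // feed input without generating
    virtual std::string nextToken() = 0;                  // streaming, one token at a time
    virtual void reset() = 0;                             // start a fresh conversation
    virtual std::vector<ChatMessage> history() const = 0; // inspect the context so far
};
```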