Replies: 4 comments
-
Hey @xKwan! I'm here to help with your issue. I can assist with bug fixes, answer questions, and guide you to become a contributor. Let's tackle this challenge together!

To resolve the KeyError: 'text', here is a potential fix for the complete method:

```python
def complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
    kwargs = kwargs if kwargs else {}
    params = {**self._model_kwargs, **kwargs}

    # build sampling parameters
    sampling_params = dict(**params)
    sampling_params["prompt"] = prompt

    response = post_http_request(self.api_url, sampling_params, stream=False)
    output = get_response(response)

    # Ensure the response contains the 'text' key
    if 'text' not in output[0]:
        raise KeyError("The response does not contain the 'text' key")

    return CompletionResponse(text=output[0]['text'])
```

This code checks whether the 'text' key is present in the parsed response and raises a clearer error when it is missing. Additionally, ensure that your VllmServer is set up correctly:

```python
from llama_index.llms.vllm import VllmServer


def messages_to_prompt(messages):
    prompt = "\n".join([str(x) for x in messages])
    return f"<s>[INST] {prompt} [/INST] </s>\n"


def completion_to_prompt(completion):
    return f"<s>[INST] {completion} [/INST] </s>\n"


llm = VllmServer(
    api_url="http://localhost:8000/v1/completions",
    max_new_tokens=256,
    temperature=0.1,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

llm.complete("What is a black hole?")
```

Ensure that the api_url points at your running vLLM server.
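If the error persists, a quick way to see what the server actually returns is to post to the completions endpoint directly and print the raw JSON. This is a minimal sketch, assuming the OpenAI-compatible server from the issue below is running on localhost:8000; the model value is a placeholder for whatever was passed to --model:

```python
import json

import requests

# Hit the completions endpoint directly and dump the raw JSON, so you can see
# whether the payload carries a top-level "text" key or an OpenAI-style "choices" list.
payload = {
    "model": "/path/to/my/local/model",  # placeholder: the value passed to --model
    "prompt": "What is a black hole?",
    "max_tokens": 16,
}
resp = requests.post("http://localhost:8000/v1/completions", json=payload, timeout=60)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

If the JSON shows a "choices" list rather than a top-level "text" key, that mismatch is what the KeyError is complaining about.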
-
Did you find a solution? I have the same issue.
-
I followed your suggestion and get the same error; the traceback still ends at line 9 in get_response. My script imports VllmServer, overrides complete, and defines messages_to_prompt, completion_to_prompt, and a main() entry point guarded by if __name__ == '__main__':, but the KeyError: 'text' persists.
-
I tried to rewrite the function in the VllmServer class to solve this problem:
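A hypothetical rewrite along these lines, assuming the OpenAI-compatible /v1/completions endpoint from the issue below, would read the completion from choices[0]["text"] instead of a top-level "text" key. The class name, the direct requests call, and the default model path are illustrative assumptions, not the actual code used here:

```python
from typing import Any

import requests
from llama_index.core.llms import CompletionResponse
from llama_index.llms.vllm import VllmServer


class OpenAICompatVllmServer(VllmServer):
    """Illustrative subclass for vLLM's OpenAI-compatible /v1/completions route,
    which nests the generated text under choices[0]["text"]."""

    def complete(
        self, prompt: str, formatted: bool = False, **kwargs: Any
    ) -> CompletionResponse:
        payload = {
            "model": "/path/to/my/local/model",  # placeholder: the value passed to --model
            "prompt": prompt,
            "max_tokens": self.max_new_tokens,
            "temperature": self.temperature,
            **kwargs,
        }
        response = requests.post(self.api_url, json=payload, timeout=60)
        response.raise_for_status()
        data = response.json()
        # OpenAI-style payload: the completion text lives under choices[0]["text"]
        return CompletionResponse(text=data["choices"][0]["text"])
```

A subclass like this can be instantiated with the same api_url, messages_to_prompt, and completion_to_prompt arguments shown in the first reply.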
Output:
-
Issue:
I want to serve an LLM application in production, so I am hosting the LLM with vLLM and connecting my documents to it with LlamaIndex. When I tried a sample inference, I got KeyError: 'text'.
Library versions used:
vllm: 0.4.0.post1
llama_index: 0.10.42
llama_index.llms.vllm: 0.1.7
Server Setup:
I installed vLLM and started a vLLM server with the following command in the terminal:

```shell
python3 -m vllm.entrypoints.openai.api_server --model=/path/to/my/local/model --dtype=float16 --tensor-parallel-size=8 --quantization=awq --gpu-memory-utilization=0.7
```
It is hosted on localhost:8000.
I did a sanity check with a curl command:
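A rough Python equivalent of that check, assuming the OpenAI-compatible server exposes the standard /v1/models listing (a sketch, not the exact curl call used):

```python
import requests

# Confirm the OpenAI-compatible server is up by listing the served models;
# the response should include the path passed to --model.
resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())
```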
Application Setup:
I followed the reference guide here:
https://docs.llamaindex.ai/en/stable/api_reference/llms/vllm/#llama_index.llms.vllm.VllmServer
KeyError: 'text'