NeuralChat

A customizable framework to create your own LLM-driven AI apps within minutes

🌟RESTful API   |   🔥Features   |   💻Examples   |   📖Notebooks

Introduction

NeuralChat is a powerful and flexible open framework that empowers you to effortlessly create LLM-centric AI applications, including chatbots and copilots.


NeuralChat is under active development. APIs are subject to change.

Installation

NeuralChat is part of Intel Extension for Transformers, so install Intel Extension for Transformers first by following its installation guide. Then install the additional dependencies for NeuralChat according to your device:

# For CPU device
pip install -r requirements_cpu.txt

# For HPU device
pip install -r requirements_hpu.txt

# For XPU device
pip install -r requirements_xpu.txt

# For CUDA device
pip install -r requirements.txt

Getting Started

OpenAI-Compatible RESTful APIs

NeuralChat provides OpenAI-compatible RESTful APIs for LLM inference, so you can use NeuralChat as a drop-in replacement for the OpenAI APIs. The NeuralChat service is also accessible through the OpenAI client library, curl commands, and the requests library. See neuralchat_api.md.

Launch OpenAI-compatible Service

NeuralChat launches a chatbot service using Intel/neural-chat-7b-v3-1 by default. You can customize the chatbot service by configuring the YAML file.

You can start the NeuralChat server either using the shell command or Python code.

Using Shell Command:

neuralchat_server start --config_file ./server/config/neuralchat.yaml

Using Python Code:

from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
server_executor = NeuralChatServerExecutor()
server_executor(config_file="./server/config/neuralchat.yaml", log_file="./neuralchat.log")

Access the Service

Once the service is running, it exposes an OpenAI-compatible endpoint at /v1/chat/completions. You can access the endpoint in any of the following ways.

Using OpenAI Client Library

from openai import OpenAI
# Point the client at the local NeuralChat service. Replace 'your_api_key' as
# needed; a local NeuralChat service typically does not validate it, but the
# client requires a non-empty value. The client appends the route (e.g.
# /chat/completions) to base_url, so base_url must end at /v1.
api_key = 'your_api_key'
base_url = 'http://127.0.0.1:80/v1'
client = OpenAI(api_key=api_key, base_url=base_url)
response = client.chat.completions.create(
      model="Intel/neural-chat-7b-v3-1",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
      ]
)
print(response)

Using Curl

curl http://127.0.0.1:80/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}
    ]
    }'

Using Python Requests Library

import requests
url = 'http://127.0.0.1:80/v1/chat/completions'
data = {
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
}
# json= serializes the payload and sets the Content-Type header automatically
response = requests.post(url, json=data)
print(response.json())

Langchain Extension APIs

Intel Extension for Transformers provides a comprehensive suite of LangChain-based extension APIs, including advanced retrievers, embedding models, and vector stores. These enhancements extend the capabilities of the original LangChain API and boost overall performance; the extension is specifically tailored to improve the functionality and performance of retrieval-augmented generation (RAG).

Vector Stores

We introduce enhanced vector store operations that let users adjust and fine-tune settings even after the chatbot has been initialized, offering a more adaptable and user-friendly experience. LangChain users can adopt the optimized vector stores simply by replacing the original Chroma API from LangChain.

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
# Optimized drop-in replacement for LangChain's Chroma vector store
from intel_extension_for_transformers.langchain.vectorstores import Chroma
retriever = VectorStoreRetriever(vectorstore=Chroma(...))
retrievalQA = RetrievalQA.from_llm(llm=HuggingFacePipeline(...), retriever=retriever)
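For a fuller picture, here is a minimal end-to-end sketch of the same pattern. The sample texts, the BAAI/bge-base-en-v1.5 embedding model, and the HuggingFacePipeline settings are illustrative assumptions rather than part of the official example, and the sketch presumes the optimized Chroma keeps LangChain's standard from_texts constructor:

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
from intel_extension_for_transformers.langchain.vectorstores import Chroma

# Illustrative corpus and embedding model -- replace with your own data.
texts = [
    "Intel Xeon Scalable Processors are server CPUs.",
    "NeuralChat builds LLM-driven chatbots and copilots.",
]
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

# Build the optimized vector store with the familiar LangChain constructor.
vectorstore = Chroma.from_texts(texts=texts, embedding=embeddings)
retriever = VectorStoreRetriever(vectorstore=vectorstore)

# Any LangChain-compatible LLM works here; a local HF pipeline is one option.
llm = HuggingFacePipeline.from_model_id(
    model_id="Intel/neural-chat-7b-v3-1",
    task="text-generation",
)
qa = RetrievalQA.from_llm(llm=llm, retriever=retriever)
print(qa.run("Tell me about Intel Xeon Scalable Processors."))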

Retrievers

We provide optimized retrievers such as VectorStoreRetriever and ChildParentRetriever to efficiently handle vector store operations, ensuring optimal retrieval performance. ChildParentRetriever indexes small child chunks for precise matching while returning their larger parent chunks, so the model receives fuller context.

from intel_extension_for_transformers.langchain.retrievers import ChildParentRetriever
from langchain.vectorstores import Chroma
retriever = ChildParentRetriever(
    vectorstore=Chroma(documents=child_documents),
    parentstore=Chroma(documents=parent_documents),
    search_type=xxx,
    search_kwargs={...},
)
docs = retriever.get_relevant_documents("Intel")

Please refer to this documentation for more details.

Customizing the NeuralChat Service

Users have the flexibility to customize the NeuralChat service by making modifications in the YAML configuration file. Detailed instructions can be found in the documentation.
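For a rough sense of that file's shape, here is a minimal sketch of customizing and launching the service from Python. The model_name_or_path and tasks_list fields are the ones this README references; the remaining keys (host, port, device) are assumptions to verify against the shipped ./server/config/neuralchat.yaml:

# Write a minimal config and point the server at it. Keys other than
# model_name_or_path and tasks_list are assumed; consult
# ./server/config/neuralchat.yaml for the authoritative schema.
config = """\
host: 0.0.0.0
port: 80
model_name_or_path: "Intel/neural-chat-7b-v3-1"
device: "cpu"
tasks_list: ['textchat']
"""
with open("my_neuralchat.yaml", "w") as f:
    f.write(config)

from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
server_executor = NeuralChatServerExecutor()
server_executor(config_file="./my_neuralchat.yaml", log_file="./neuralchat.log")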

Supported Models

NeuralChat supports a wide range of generative Transformer models available in Hugging Face Transformers. The following models have been validated within NeuralChat for both inference and fine-tuning, across tasks spanning text generation (completions), text generation (chat completions), summarization, and code generation; task coverage varies by model:
Intel/neural-chat-7b-v1-1
Intel/neural-chat-7b-v3-1
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-70b-chat-hf
EleutherAI/gpt-j-6b
mosaicml/mpt-7b-chat
mistralai/Mistral-7B-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
upstage/SOLAR-10.7B-Instruct-v1.0
THUDM/chatglm2-6b
THUDM/chatglm3-6b
Qwen/Qwen-7B
microsoft/phi-2
bigcode/starcoder
codellama/CodeLlama-7b-hf
codellama/CodeLlama-34b-hf
Phind/Phind-CodeLlama-34B-v2
Salesforce/codegen2-7B
ise-uiuc/Magicoder-S-CL-7B

Modify the model_name_or_path parameter in the YAML configuration file to load different models.

Rich Plugins

NeuralChat includes support for various plugins that enhance its capabilities, such as the multimodal plugins exposed through the RESTful APIs described below.

Multimodal APIs

In addition to the text-based chat RESTful API, NeuralChat's RESTful API lineup includes several plugin-backed endpoints that help users build multimodal applications. NeuralChat supports the following RESTful APIs:

Tasks List      RESTful APIs
textchat        /v1/chat/completions, /v1/completions
voicechat       /v1/audio/speech, /v1/audio/transcriptions, /v1/audio/translations
retrieval       /v1/rag/create, /v1/rag/append, /v1/rag/upload_link, /v1/rag/chat
codegen         /v1/code_generation, /v1/code_chat
text2image      /v1/text2image
image2image     /v1/image2image
faceanimation   /v1/face_animation
finetune        /v1/finetune

Modify the tasks_list parameter in the YAML configuration file to enable the RESTful APIs your project needs.
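For example, with voicechat enabled in tasks_list, the transcription endpoint could be exercised as below. The request schema is not documented in this README, so the multipart fields, the 'sample.wav' file, and the 'openai/whisper-small' model name are assumptions modeled on the OpenAI audio API:

import requests

# Hypothetical call to the voicechat transcription endpoint. The multipart
# fields below assume the OpenAI audio API schema; adjust them if the actual
# NeuralChat schema differs.
url = 'http://127.0.0.1:80/v1/audio/transcriptions'
with open('sample.wav', 'rb') as audio_file:
    response = requests.post(
        url,
        files={'file': ('sample.wav', audio_file, 'audio/wav')},
        data={'model': 'openai/whisper-small'},  # may be optional on a local server
    )
print(response.json())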