- Search Assistant
- Overview
- Before you begin
- Deploy the starter kit GUI
- Workflow overview
- Customizing the starter kit
- Third-party tools and data sources
This AI Starter Kit is an example of a semantic search workflow built with the SambaNova platform that answers your questions using Google search results as the source. This kit includes:
- A configurable SambaNova Cloud or SambaStudio connector for running inference on a model trained and deployed on SambaNova hardware.
- A configurable integration with a third-party vector database.
- An implementation of the semantic search workflow and prompt construction strategies.
- Configurable integrations with multiple SERP APIs.
- A strategy for an instant question-search-answer workflow.
- A strategy for a query-search-web-crawl-answer workflow.
This example is ready to use:
- Run the model following the steps in Before you begin and Deploy the starter kit GUI.
- Learn how the model works and look at resources in Workflow overview.
- Customize the model to meet your organization's needs by looking at the Customizing the starter kit section.
Clone the starter kit repo.
git clone https://github.com/sambanova/ai-starter-kit.git
The next step is to set up your environment variables to use one of the inference models available from SambaNova. You can obtain a free API key through SambaNova Cloud. Alternatively, if you are a current SambaNova customer, you can deploy your models using SambaStudio.
- SambaNova Cloud (Option 1): Follow the instructions here to set up your environment variables. Then, in the config file, set the llm api variable to "sncloud" and set the select_expert config depending on the model you want to use.
- SambaStudio (Option 2): Follow the instructions here to set up your endpoint and environment variables. Then, in the config file, set the llm api variable to "sambastudio", and set the CoE and select_expert configs if you are using a CoE endpoint.
You have the following options to set up your embedding model:
- CPU embedding model (Option 1): In the config file, set the variable type in embedding_model to "cpu".
- SambaStudio embedding model (Option 2): To increase inference speed, you can use a SambaStudio embedding model endpoint instead of the default (CPU) Hugging Face embedding. Follow the instructions here to set up your endpoint and environment variables. Then, in the config file, set the variable type in embedding_model to "sambastudio", and set the batch_size, coe, and select_expert configs according to your SambaStudio endpoint.
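As a quick sanity check, the sketch below loads the environment file and prints the relevant sections of config.yaml. It is not part of the kit, and the SAMBANOVA_API_KEY variable name is an assumption; use whatever name your .env file actually defines.

```python
# Minimal sketch: confirm the API key is exported and inspect config.yaml.
import os

import yaml
from dotenv import load_dotenv

load_dotenv("../.env")  # the .env file lives in the ai-starter-kit root directory
print("API key present:", bool(os.getenv("SAMBANOVA_API_KEY")))  # assumed variable name

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config["llm"])              # e.g. api: "sncloud", select_expert: ...
print(config["embedding_model"])  # e.g. type: "cpu" or "sambastudio"
```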
We recommend that you run the starter kit in a virtual environment or use a container. We also recommend using Python >= 3.10 and < 3.12.
If you want to use a virtualenv or conda environment:
-
Install and update pip.
cd ai-starter-kit/search_assistant
python3 -m venv search_assistant_env
source search_assistant_env/bin/activate
pip install -r requirements.txt
-
Set the SERP tool to use. This kit provides three SERP tool options: SerpAPI, Serper, and OpenSERP.
-
For SerpAPI and Serper: Create an account and follow the instructions to get your API key. Then, add the key (SERPER_API_KEY or SERPAPI_API_KEY) to the environment variables file in the root repo directory, ai-starter-kit/.env.
-
For OpenSERP: Follow the Docker usage instructions.
You don't need to set up all of these tools; configuring just one is enough to run the kit. Each tool has its own pros and cons.
-
Run the following command:
streamlit run streamlit/app.py --browser.gatherUsageStats false
You should see the following application user interface:
If you want to use Docker:
-
Update the SAMBASTUDIO_KEY, SNAPI, and SNSDK args in the docker-compose.yaml file.
Run the command:
docker-compose up --build
You will be prompted to open the link (http://localhost:8501/) in your browser, where you will see the Streamlit page shown in the screenshot above.
After the GUI is up and running, you can start making selections in the left pane of the GUI.
- Select the Search Tool to use. That's the tool that will search the internet.
- Select the search engine you want to use for retrieval.
- Set the maximum number of search results to retrieve.
- Select the method for retrieval:
- Search and answer: Performs a search for each query you pass to the search assistant and uses the search result snippets to provide an answer.
- Search and Scrape Sites: Asks you for an initial query, searches for it, and scrapes the resulting sites. The method creates a vector database from the results. You can then ask other questions related to your initial query, and the method uses the stored information to provide an answer.
- Click the Set button to start asking questions!
This AI starter kit implements two distinct workflows, each with a series of operations.
-
Search: Use the SERP tool to retrieve the search results, and use the snippets of the organic search results (Serper, OpenSERP) or the raw knowledge graph (SerpAPI) as context.
-
Answer: Call the LLM using the retrieved information as context to answer your question.
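A minimal sketch of this question-search-answer flow, assuming the SerpAPI Python client (the google-search-results package); the llm object stands in for whatever LLM wrapper the kit is configured with, and max_results is an illustrative parameter:

```python
# Sketch of the search-and-answer workflow: fetch organic-result snippets
# from SerpAPI and pass them to an LLM as context.
import os

from serpapi import GoogleSearch  # pip install google-search-results


def search_snippets(query: str, max_results: int = 5) -> list[str]:
    search = GoogleSearch({"q": query, "api_key": os.environ["SERPAPI_API_KEY"]})
    organic = search.get_dict().get("organic_results", [])[:max_results]
    return [r.get("snippet", "") for r in organic]


def answer(query: str, llm) -> str:
    context = "\n".join(search_snippets(query))
    prompt = f"Use the following context to answer the question.\n{context}\nQuestion: {query}"
    return llm.invoke(prompt)  # llm: any LangChain-style LLM/chat object
```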
-
Search: Use the SERP tool to retrieve the search results and get the links of the organic search results.
-
Website crawling: Scrape the HTML from the websites using the LangChain AsyncHtmlLoader, which is built on top of the requests and aiohttp Python packages.
-
Document parsing: Document transformers are tools used to transform and manipulate documents. They take in structured documents as input and apply transformations to extract specific information or modify the documents' content. Document transformers can perform tasks such as extracting properties, generating summaries, translating text, filtering redundant documents, and more. Transformers process many documents efficiently and can be used to preprocess data before further analysis or to generate new versions of the documents with desired modifications.
Depending on the information you need to extract from websites, this step might require some customization.
- The LangChain document transformer html2text is used to extract plain, clean text from the HTML documents.
- Other document transformers, like the BeautifulSoup transformer, are available for plain text extraction from HTML and are included in the LangChain package.
If you want to retrieve remote files, this starter kit includes extra file type loading functionality. You can activate or deactivate these loaders by listing the file types in the extra_loaders parameter of the config file. Currently, remote PDF loading is available.
-
Data splitting: Because of token limits in LLMs, after the data has been parsed and its content extracted, you need to split it into chunks of text to be embedded and stored in a vector database. The size of each chunk depends on the context (sequence) length offered by the model. Generally, larger context lengths result in better performance. The method used to split text also has an impact on performance (for instance, making sure there are no word breaks, sentence breaks, etc.). The downloaded data is split using RecursiveCharacterTextSplitter.
-
Data embedding: For each chunk of text from the previous step, we use an embedding model to create a vector representation of it. These embeddings are used in the storage and retrieval of the most relevant content given a user's query. The split text is embedded using HuggingFaceInstructEmbeddings.
For more information about what embeddings are, click here.
-
Embedding storage: Embeddings for each chunk, along with the content and relevant metadata (such as the source website), are stored in a vector database. The embedding acts as the index in the database. In this starter kit, we store information with each entry, which can be modified to suit your needs. Several vector database options are available, each with its own pros and cons. This starter kit uses FAISS as the vector database because it's a free, open-source option with straightforward setup, but it can easily be updated to use another database if desired. In terms of metadata, the website source is attached to the embeddings stored during web scraping. A minimal end-to-end sketch of these ingestion steps follows this list.
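The sketch below strings together the crawling, parsing, splitting, embedding, and storage steps described above, assuming a recent langchain / langchain-community install (import paths vary by version) and an example embedding model name:

```python
# Sketch: crawl pages, convert HTML to text, split, embed, and store in FAISS.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.vectorstores import FAISS

urls = ["https://example.com/some-search-result"]        # links from the search step
docs = AsyncHtmlLoader(urls).load()                       # website crawling
docs = Html2TextTransformer().transform_documents(docs)   # document parsing
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1200, chunk_overlap=240                    # values mirror config.yaml
).split_documents(docs)                                   # data splitting
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large"                  # example model name
)                                                         # data embedding
vectordb = FAISS.from_documents(chunks, embeddings)       # embedding storage
```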
This workflow is an example of leveraging data stored in a vector database along with a large language model to enable retrieval-based Q&A of your data. This method is called Retrieval Augmented Generation (RAG). The steps are:
-
Embed query: The first step is to convert a user-submitted query into a common representation (an embedding) for subsequent use in identifying the most relevant stored content. Because of this, we recommend using the same embedding model for ingestion and for querying. In this sample, the query text is embedded using HuggingFaceInstructEmbeddings, the same model used in the ingestion workflow.
-
Retrieve relevant content: Next, we use the embeddings representation of the query to make a retrieval request from the vector database, which returns relevant entries (content). The vector database acts as a retriever for fetching relevant information from the database.
Find more information about embeddings and their retrieval here.
Find more information about Retrieval Augmented Generation with LangChain here.
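A minimal sketch of this retrieval step, reusing the vectordb object from the ingestion sketch above; the llm object and the prompt wiring are simplified stand-ins for the chain defined in src/search_assistant.py:

```python
# Sketch: embed the query, retrieve relevant chunks, and answer with the LLM.
retriever = vectordb.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 4, "score_threshold": 0.5},  # mirrors the retrieval config
)
question = "What do the scraped sources say about my initial query?"
relevant_docs = retriever.invoke(question)           # the query is embedded internally
context = "\n\n".join(doc.page_content for doc in relevant_docs)
prompt = f"Use the following context to answer.\n{context}\nQuestion: {question}"
answer = llm.invoke(prompt)                          # llm: the configured LLM wrapper
```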
You can customize this starter kit based on your use case.
You can modify or change the behavior of the search step by including your custom method in the SearchAssistant class. Your method must be able to receive a query, have a do_analysis flag, and return a result and a list of retrieved URLs.
This modification can be done in the following location:
file: src/search_assistant.py
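A hypothetical sketch of what such a method could look like; the method and helper names below are illustrative placeholders rather than existing kit functions, and only the signature (query in, do_analysis flag, result plus URL list out) follows the requirement above:

```python
# Hypothetical custom search method for the SearchAssistant class.
def custom_search(self, query: str, do_analysis: bool = True):
    # my_search_backend is an illustrative placeholder for your own search call.
    links, snippets = my_search_backend(query)
    result = self.analyze(snippets) if do_analysis else snippets  # analyze: placeholder
    return result, links  # a result and the list of retrieved URLs
```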
Different packages are available to crawl websites and extract their content. This starter kit uses the AsyncHtmlLoader. LangChain also includes a couple of other HTML loaders that can be used.
This modification can be done in the following location:
file: src/search_assistant.py
function:
load_htmls
The maximum number of sites in the scraping method is set to 20 scraped sites, but you can modify that limit and the web crawling behavior in the following locations:
file: config.yaml
web_crawling:
    "max_depth": 2
    "max_scraped_websites": 20
file: src/search_assistant.py
function: web_crawl
Depending on the loader used for scraping the sites, you may want to use a transformation method to clean up the downloaded documents. You can do that in the following location:
file: src/search_assistant.py
function:
clean_docs
LangChain provides several document transformers that you can use.
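For example, a sketch of an alternative cleanup step using LangChain's BeautifulSoupTransformer instead of html2text (the tag selection is illustrative):

```python
# Sketch: keep only the text from selected HTML tags when cleaning documents.
from langchain_community.document_transformers import BeautifulSoupTransformer

bs_transformer = BeautifulSoupTransformer()
clean_docs = bs_transformer.transform_documents(
    docs,                                     # documents returned by the HTML loader
    tags_to_extract=["p", "li", "h1", "h2"],  # illustrative tag selection
)
```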
You can experiment with different ways of splitting the data, such as splitting by tokens or using context-aware splitting for code or markdown files. LangChain provides several examples of different kinds of splitting here.
You can customize the RecursiveCharacterTextSplitter used by this starter kit in the kit's src file by changing the chunk_size and chunk_overlap parameters.
- For LLMs with a long sequence length, try using a larger value of chunk_size to provide the LLM with broader context and improve performance.
- The chunk_overlap parameter is used to maintain continuity between different chunks.
This modification can be done in the following location:
file: config.yaml
retrieval:
    "chunk_size": 1200
    "chunk_overlap": 240
    "db_type": "faiss"
    "k_retrieved_documents": 4
    "score_treshold": 0.5
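As an example of the token-based splitting mentioned above, a sketch using the tiktoken-backed constructor (the chunk sizes are illustrative and the tiktoken package must be installed):

```python
# Sketch: split by tokens rather than characters so chunk sizes track the
# tokenizer more closely.
from langchain.text_splitter import RecursiveCharacterTextSplitter

token_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512,    # illustrative token budget per chunk
    chunk_overlap=64,  # illustrative overlap to keep continuity between chunks
)
chunks = token_splitter.split_documents(docs)
```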
Several open source embedding models are available on HuggingFace. This leaderboard ranks these models based on the Massive Text Embedding Benchmark (MTEB). Several of these models are available on SambaStudio and can be used or further fine-tuned on specific datasets to improve performance.
This modification can be done in the following location:
file: ../vectordb/vector_db.py
function:
load_embedding_model
Find more information about the usage of SambaStudio hosted embedding models in the section Use Sambanova's LLMs and Embeddings Langchain wrappers here.
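For instance, a sketch of swapping in a different open-source model from the MTEB leaderboard (the model name is only an example):

```python
# Sketch: use a different Hugging Face embedding model in load_embedding_model.
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",           # example model from the leaderboard
    encode_kwargs={"normalize_embeddings": True},  # common setting for bge-style models
)
```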
Customize the search assistant to use a different vector database to store the embeddings generated by the embedding model. The LangChain vector stores documentation provides a broad collection of vector stores that are easy to integrate.
This modification can be done in the following location:
file: ../vectordb/vector_db.py
function:
create_vector_store
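For example, a sketch of switching from FAISS to Chroma (the persist directory is illustrative):

```python
# Sketch: store the embeddings in Chroma instead of FAISS.
from langchain_community.vectorstores import Chroma

vectordb = Chroma.from_documents(
    chunks,                           # split documents from the ingestion step
    embeddings,                       # the embedding model in use
    persist_directory="./chroma_db",  # illustrative on-disk location
)
```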
Similar to the vector stores, a wide collection of retriever options is also available. This starter kit uses the vector store as a retriever, but it can be enhanced and customized, as shown in some of the examples here.
This modification can be done in the following location:
file: config.yaml
"db_type": "chroma"
"k_retrieved_documents": 3
"score_treshold": 0.6
and
file: src/search_assistant.py
function:
retrieval_qa_chain
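For example, a sketch of swapping the plain similarity retriever for maximal-marginal-relevance (MMR) retrieval, which reduces redundant chunks in the context (the values are illustrative):

```python
# Sketch: build an MMR retriever from the vector store instead of plain similarity.
retriever = vectordb.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20},  # illustrative values
)
docs = retriever.invoke("your follow-up question")
```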
You can further customize the model itself.
The starter kit uses the SN LLM model, which can be further fine-tuned to improve response quality.
To train a model in SambaStudio, learn how to:
You can modify the parameters for calling the model, such as the temperature and maximum generation tokens, in the config.yaml file.
Prompting has a significant effect on the quality of LLM responses. Prompts can be further customized to improve the overall quality of the responses from the LLMs. For example, in this starter kit, the following prompt was used to generate a response from the LLM, where question is the user query and context contains the documents retrieved by the search engine.
template: |
<s>[INST] <<SYS>>\nUse the following pieces of context to answer the question at the end.
If the answer is not in context for answering, say that you don't know, don't try to make up an answer or provide an answer not extracted from provided context.
Cross check if the answer is contained in provided context. If not, then say \"I do not have information regarding this.\"\n
context
{context}
end of context
<</SYS>>/n
Question: {question}
Helpful Answer: [/INST]
Those modifications can be done in the following locations:
Prompt: retrieval Q&A chain
Prompt: Serpapi search and answer chain
Prompt: Serper search and answer chain
Prompt: OpenSerp search and answer chain
Learn more about prompt engineering here
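A sketch of wrapping a customized template string as a LangChain PromptTemplate before handing it to the Q&A chain (the template text here is a shortened, illustrative variant of the kit's prompt):

```python
# Sketch: build a PromptTemplate from a customized template string.
from langchain.prompts import PromptTemplate

template = (
    "<s>[INST] <<SYS>>\nUse the following pieces of context to answer the "
    "question at the end. If the answer is not in the context, say that you "
    "don't know.\n{context}\n<</SYS>>\n\nQuestion: {question}\n"
    "Helpful Answer: [/INST]"
)
prompt = PromptTemplate.from_template(template)  # infers {context} and {question}
```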
All the packages/tools are listed in the requirements.txt file in the project directory.