Ignite 2024 Search Demo with Azure Cosmos DB

This repository contains a Python Streamlit application that lets users search for movies using semantic search powered by OpenAI embeddings and Azure Cosmos DB. The application demonstrates vector search, full text search, text ranking, and hybrid search. It creates three containers: one with no vector index, one with a quantized flat (qflat) index, and one with a DiskANN index.

Features

- Integration with Azure Cosmos DB for storing and querying data.
- Semantic search for movies using OpenAI embeddings.
- Full text search for movies.
- Hybrid search combining semantic and full text search.
- Text ranking for search results.
- Support for different indexes (No Index, Qflat Index, DiskANN Index).
- Interactive UI built with Streamlit.
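
To make the hybrid search feature more concrete: Azure Cosmos DB is documented as merging the vector ranking and the full text ranking with Reciprocal Rank Fusion (RRF). The sketch below illustrates that idea in plain Python. It is not code from this repository; the function name, the sample ids, and the constant `k=60` (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full text search.
vector_hits = ["m1", "m2", "m3"]
text_hits = ["m3", "m1", "m4"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that rank well in both lists (here `m1` and `m3`) rise above documents that appear in only one, which is the behavior hybrid search aims for.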

Prerequisites

- Azure Cosmos DB account with the NoSQL API.
- Azure OpenAI account.

Run the application locally

Clone the repository and navigate to the folder:

git clone https://github.com/AzureCosmosDB/BRK193-Ignite2024.git
cd BRK193-Ignite2024/cosmos-search-demo

Create a file named ".env" in the "app" folder and set the required environment variables for Azure Cosmos DB and Azure OpenAI:
```
AZURE_OPENAI_APIKEY=
AZURE_OPENAI_ENDPOINT=
AZURE_COSMOSDB_ENDPOINT=
AZURE_COSMOSDB_KEY=
```
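
A quick way to catch a blank or missing setting before starting the app is a small check like the one below. This is a hedged sketch, not part of the repository: it assumes the values have already been loaded into the process environment (for example by a dotenv loader, which is an assumption about how the app reads the file), and `missing_settings` is a hypothetical helper name.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "AZURE_OPENAI_APIKEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_COSMOSDB_ENDPOINT",
    "AZURE_COSMOSDB_KEY",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with a hypothetical, partially filled configuration.
sample = {"AZURE_OPENAI_APIKEY": "abc", "AZURE_OPENAI_ENDPOINT": "  "}
print("Missing:", missing_settings(sample))
```

Calling `missing_settings()` with no argument checks the real environment instead of the sample dictionary.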
Install the required packages:
```
pip install -r requirements.txt
```
Run the Streamlit application:
```
streamlit run src/app/cosmos-app.py
```

Deploy the application to Azure with VS Code

Install the Azure App Service extension:
- Open the Extensions view by clicking the square Extensions icon in the Activity Bar.
- Search for "Azure App Service" and click on the Install button.
Deploy the application:
- Create an Azure Web App on a Linux App Service plan (choose the B1 SKU or higher) with Python 3.10.
- In the App Service, go to Configuration and paste the following into the Startup Command field:
```
python -m pip install -r requirements.txt
python -m streamlit run src/app/cosmos-app.py --server.port 8000 --server.address 0.0.0.0
```
- Press Ctrl + Shift + P and select "Azure App Service: Deploy to Web App":
  1. Select this folder on your machine
  2. Select subscription
  3. Select the Azure Web App you created above.
- Wait until the app is deployed (this can take up to 5 minutes).

Loading vectors into the containers

The app creates the containers with the required vector and full text search policies and indexes. You need to load the data into the containers separately.
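
For orientation, the container policies mentioned above might look roughly like the dictionaries built below. This is a sketch under stated assumptions, not the app's actual configuration: the `/embedding` and `/text` paths follow the field names this README uses, while the cosine distance function, the `float32` data type, the default of 1536 dimensions, the `en-US` language, and the helper name `build_policies` are all assumptions. Whether the app's "no index" container also carries a full text index is not stated here, so that is assumed as well.

```python
def build_policies(index_type=None, dimensions=1536):
    """Return (vector_embedding_policy, full_text_policy, indexing_policy).

    index_type is None (no vector index), "quantizedFlat", or "diskANN".
    dimensions must match the embedding model in use (1536 is assumed).
    """
    vector_embedding_policy = {
        "vectorEmbeddings": [{
            "path": "/embedding",
            "dataType": "float32",          # assumption
            "distanceFunction": "cosine",   # assumption
            "dimensions": dimensions,
        }]
    }
    full_text_policy = {
        "defaultLanguage": "en-US",
        "fullTextPaths": [{"path": "/text", "language": "en-US"}],
    }
    indexing_policy = {
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular range index.
        "excludedPaths": [{"path": "/embedding/*"}],
        "fullTextIndexes": [{"path": "/text"}],
    }
    if index_type is not None:
        indexing_policy["vectorIndexes"] = [
            {"path": "/embedding", "type": index_type}
        ]
    return vector_embedding_policy, full_text_policy, indexing_policy


# One policy set per container variant described in this README.
variants = {name: build_policies(kind) for name, kind in
            [("no_index", None), ("qflat", "quantizedFlat"), ("diskann", "diskANN")]}
```

In recent versions of the azure-cosmos Python SDK, dictionaries of this shape are passed when creating a container; check the SDK version you have installed for the exact keyword arguments.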
The data-loader.py script is provided in the src/data folder. You can run it against any data, as long as the input is a JSON array of documents that each have a unique id field and a field (of any name) containing the text to vectorize. You can also use an existing vectorized field, or re-embed the text field with OpenAI embeddings if necessary. The command below loads the MovieLens dataset into three containers in a database called ignite2024demo (created by the Streamlit app above). It re-embeds the overview field and renames it to text, names the new embedding field embedding to match the Streamlit app, and discards the existing vector field.
```
python src/data/data-loader.py --text_field_name "overview" --path_to_json_array "https://raw.githubusercontent.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/refs/heads/main/DataSet/Movies/MovieLens-4489-256D.json" --database_name "ignite2024demo" --concurrency 20 --vector_field_name "vector" --re_embed True
```
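
The per-document reshaping that the command above describes can be sketched as follows. This is an illustration of the described behavior, not the contents of data-loader.py: `prepare_document` and `fake_embed` are hypothetical names, and the stand-in embedding function exists only so the sketch runs offline (the real script would call Azure OpenAI).

```python
def prepare_document(doc, text_field, old_vector_field, embed):
    """Return a copy of doc with 'text' and 'embedding' fields.

    The source text field is renamed to 'text', the pre-existing vector
    field is dropped, and a fresh embedding is stored under 'embedding'.
    embed is any callable mapping a string to a list of floats.
    """
    prepared = {key: value for key, value in doc.items()
                if key not in (text_field, old_vector_field)}
    prepared["text"] = doc[text_field]
    prepared["embedding"] = embed(doc[text_field])
    return prepared


def fake_embed(text):
    # Stand-in so the sketch runs offline: NOT a real embedding model.
    return [float(len(text)), float(text.count(" "))]


# Hypothetical input document in the shape the README describes.
movie = {"id": "1", "title": "Toy Story",
         "overview": "A cowboy doll", "vector": [0.1, 0.2]}
prepared = prepare_document(movie, "overview", "vector", fake_embed)
print(prepared)
```

Passing the embedding function in as an argument keeps the reshaping logic testable without network access or credentials.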
