diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 8233daea..23ccf27a 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -44,9 +44,10 @@ repos: - id: nb-check name: nb-check entry: resources/nb-check.py - language: system + language: python files: \.ipynb$ exclude: notebooks/notebook-style-guide/notebook.ipynb + additional_dependencies: [nbformat==5.10.4] - id: nb-meta-check name: nb-meta-check entry: resources/nb-meta-check.py diff --git a/notebooks/atlas-and-kai/notebook.ipynb b/notebooks/atlas-and-kai/notebook.ipynb index fed7e6f5..0a6aea23 100644 --- a/notebooks/atlas-and-kai/notebook.ipynb +++ b/notebooks/atlas-and-kai/notebook.ipynb @@ -1,8 +1,8 @@ { "cells": [ { + "id": "50bd6467", "cell_type": "markdown", - "id": "48a6458f-75ed-4a6c-aaa8-184bb9edfb75", "metadata": {}, "source": [ "
Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "{success_msg}
\n", "\n", " | \n", " |
\n", " | \n", " |
Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Action Required
Be sure to select the {{S2_DATABASE_NAME}} database from the drop-down list at the top of this notebook. It updates the connection_url which is used by the %%sql magic command and SQLAlchemy to connect to the selected database.
Notes
All Kafka configurations in the pipeline, such as 'client.id', are supported since version 8.1.35.
The schema registry mapping section should be updated according to your schema registry in the 'table column name' <- 'schema registry field name' format.
If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "\n", " |
\n", " | \n", " |
If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "Select the database from the drop-down menu at the top of this notebook.
\n", "Select the database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by SQLAlchemy to make connections to the selected database.
\n", "The following code will only work on the Standard Tier at this time.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "[CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/create-pipeline/)
command to load data into the video_games
table. The CREATE PIPELINE
command may take around 30 seconds to run."
- ]
+ ],
+ "id": "e00df72a"
},
{
"cell_type": "code",
"execution_count": 2,
- "id": "fbd119f0-7700-4494-9efa-b999606ba4dd",
"metadata": {},
"outputs": [],
"source": [
@@ -143,60 +142,60 @@
"LINES TERMINATED BY '\\r\\n';\n",
"\n",
"START PIPELINE wiki_pipeline FOREGROUND;"
- ]
+ ],
+ "id": "3fbf7fdd"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "37554805-ce14-49c5-ba2e-5b010023c1b6",
"metadata": {},
"source": [
"Verify the data was loaded using the query below. Wait for the pipeline to finish before running the COUNT
query."
- ]
+ ],
+ "id": "59459056"
},
{
"cell_type": "code",
"execution_count": 3,
- "id": "692fd8c1-a867-46bf-b64b-68f6389b0992",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT COUNT(*)\n",
"FROM video_games;"
- ]
+ ],
+ "id": "83bda5b0"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "7ef5f8bd-10d9-4013-b4ce-a1f97132bc1a",
"metadata": {},
"source": [
"There should be 40,027 rows in the video_games
table when the PIPELINE
is finished."
- ]
+ ],
+ "id": "894ab109"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "b2fc1603-26f8-4e6e-ad35-bcf338f89612",
"metadata": {},
"source": [
"## 3. Create a full-text and a vector index."
- ]
+ ],
+ "id": "5bdeb5e5"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "51a47f76-e158-4bd1-ad71-d031dfa45f63",
"metadata": {},
"source": [
"Use the following SQL to create full-text and vector indexes on the video_games
table. Indexes can improve query performance on large vector data sets. Refer to [Vector Indexing](https://docs.singlestore.com/cloud/vectors/vector-indexing) for more information on vector indexes and [CREATE TABLE](https://docs.singlestore.com/studio-redir/create-table/)
Wait for the ALTER TABLE
commands to finish before running the OPTIMIZE
command."
- ]
+ ],
+ "id": "cf077539"
},
{
"cell_type": "code",
"execution_count": 5,
- "id": "9254cf36-c1ee-4c3f-8304-4e9d166b2f09",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"OPTIMIZE TABLE video_games FULL;"
- ]
+ ],
+ "id": "ad07912c"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "981b4ba7-109c-418b-ab39-1df64426a1f2",
"metadata": {},
"source": [
"## 4. Similarity search.\n",
@@ -240,12 +239,12 @@
"To find the most similar vectors in a query vector, use an ORDER BY\u2026 LIMIT\u2026
query. The ORDER BY
command will arrange the vectors by their similarity score produced by a vector similarity function, with the closest matches at the top.\n",
"\n",
"The SQL below finds three paragraphs that are the most similar to the first paragraph about Mario Kart, a semantic similarity search for information about Mario Kart."
- ]
+ ],
+ "id": "164c2cf0"
},
{
"cell_type": "code",
"execution_count": 6,
- "id": "97142354-ead5-4165-b8b8-03d55972ccbf",
"metadata": {},
"outputs": [],
"source": [
@@ -258,23 +257,23 @@
"FROM video_games\n",
"ORDER BY score DESC\n",
"LIMIT 3;"
- ]
+ ],
+ "id": "54acdc4b"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "7220f9af-7a0c-4142-ace1-32102bedf869",
"metadata": {},
"source": [
"## 5. Hybrid search.\n",
"\n",
"Hybrid Search combines multiple search methods in one query and blends full-text search (which finds keyword matches) and vector search (which finds semantic matches) allowing search results to be (re-)ranked by a score that combines full-text and vector rankings."
- ]
+ ],
+ "id": "637a88b9"
},
{
"cell_type": "code",
"execution_count": 7,
- "id": "c846a3b0-5477-4f73-9a7e-bd935717dcf0",
"metadata": {},
"outputs": [],
"source": [
@@ -305,30 +304,30 @@
"FROM fts FULL OUTER JOIN vs ON fts.id = vs.id\n",
"ORDER BY score DESC\n",
"LIMIT 5;"
- ]
+ ],
+ "id": "33a0455f"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "4ef8d384-a2dd-442b-ba9c-3b0d36429c2c",
"metadata": {},
"source": [
"## 6. Clean up."
- ]
+ ],
+ "id": "f9539b11"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "86d4270b-0471-42b9-896c-d9cc70957633",
"metadata": {},
"source": [
"The command below will drop the table created as part of this notebook. Dropping this table will allow you to rerun the notebook from the beginning."
- ]
+ ],
+ "id": "cbd6de3f"
},
{
"cell_type": "code",
"execution_count": 8,
- "id": "87482b82-ab10-4471-854a-71734b9c2d4a",
"metadata": {},
"outputs": [],
"source": [
@@ -336,11 +335,12 @@
"DROP PIPELINE wiki_pipeline;\n",
"\n",
"DROP TABLE video_games;"
- ]
+ ],
+ "id": "4c842df6"
},
{
+ "id": "1093f40b",
"cell_type": "markdown",
- "id": "cbf78a0b-cd8d-47d5-a369-f69653f69092",
"metadata": {},
"source": [
"
If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select the pdf_db database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "This notebook creates a pipeline; data may take up to 1 minute to populate.
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "Make sure to select the demo_database database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select your database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select the stage-loader workspace from the drop-down menu at the top of this notebook.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "Select the database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by SQLAlchemy to make connections to the selected database.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "Select the database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by SQLAlchemy to make connections to the selected database.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "Action Required
Make sure to select the siem_log_kafka_demo database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by the %%sql magic command and SQLAlchemy to make connections to the selected database.
This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "For this tutorial, we recommend using a workspace of size S4 to ingest data faster and to see the gains you can get from a distributed architecture.
\n", "{success_msg}
\n", "To get the external files, please add s2.q4cdn.com to the notebook Firewall.
\n", "Restart the kernel if importing umap raises an error.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "\n", "select * from information_schema.MV_BACKUP_HISTORY where STATUS = 'Success' and DATABASE_NAME = {database_name} order by BACKUP_ID desc\n", "" - ] + ], + "id": "23b493f3" }, { "attachments": {}, @@ -77,7 +79,8 @@ "
\n",
" SELECT * from information_schema.MV_BACKUP_HISTORY\n",
"
"
- ]
+ ],
+ "id": "5cf876e2"
},
{
"attachments": {},
@@ -85,7 +88,8 @@
"metadata": {},
"source": [
"### Imports"
- ]
+ ],
+ "id": "3c0750f6"
},
{
"cell_type": "code",
@@ -99,7 +103,8 @@
"\n",
"import singlestoredb as s2\n",
"from IPython.display import display, HTML"
- ]
+ ],
+ "id": "69b323c6"
},
{
"attachments": {},
@@ -107,7 +112,8 @@
"metadata": {},
"source": [
"### Variables"
- ]
+ ],
+ "id": "69c02a1e"
},
{
"cell_type": "code",
@@ -121,7 +127,8 @@
"aws_session_token = ''\n",
"target_db_name = None\n",
"backup_id = None"
- ]
+ ],
+ "id": "db3341f9"
},
{
"attachments": {},
@@ -129,7 +136,8 @@
"metadata": {},
"source": [
"### Functions to display various alerts"
- ]
+ ],
+ "id": "0fb9a0d5"
},
{
"cell_type": "code",
@@ -189,7 +197,8 @@
" {success_msg}
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you want to use a dedicated workspace, select the workspace and database directly from the drop-down menu at the top, then follow the steps below to create the database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "This tutorial is meant for Standard & Premium Workspaces. You can't run this with a Free Starter Workspace due to restrictions on Storage. Create a Workspace using +group in the left nav & select Standard for this notebook. Gallery notebooks tagged with \"Starter\" are suitable to run on a Free Starter Workspace
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", " \n", "" - ] + ], + "id": "fc8b28e8" }, { "cell_type": "code", "execution_count": 10, - "id": "0e91592f-4856-4cab-b15e-23585f551ab3", "metadata": {}, "outputs": [], "source": [ "shared_tier_check = %sql show variables like 'is_shared_tier'\n", "if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n", " %sql DROP DATABASE IF EXISTS semantic_search;" - ] + ], + "id": "10aae5a1" }, { + "id": "60d17a89", "cell_type": "markdown", - "id": "a6829f66-b37e-493d-9631-6da519140485", "metadata": {}, "source": [ "\n", diff --git a/notebooks/semantic-search-with-openai-embedding-creation/notebook.ipynb b/notebooks/semantic-search-with-openai-embedding-creation/notebook.ipynb index 0fd6c313..28a6d830 100644 --- a/notebooks/semantic-search-with-openai-embedding-creation/notebook.ipynb +++ b/notebooks/semantic-search-with-openai-embedding-creation/notebook.ipynb @@ -1,8 +1,8 @@ { "cells": [ { + "id": "abecce4f", "cell_type": "markdown", - "id": "8e19358e-22e8-406c-ae17-d916db889313", "metadata": {}, "source": [ "You will have to update your notebook's firewall settings to include *.*.openai.com in order to get embeddings from OpenAI APIs.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "3. Drop the blob column.
\n", "4. Rename the new vector column to the old blob column name. This will ensure any previous queries will still work, or at least require fewer changes.\n", "
" - ] + ], + "id": "ca8ad8bd" }, { "cell_type": "code", "execution_count": 19, - "id": "233bfe2c-e99c-4dd3-bf56-eafa923ba4d8", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT VECTOR_NUM_ELEMENTS(word_embeddings) FROM words_table LIMIT 1;" - ] + ], + "id": "f6e449f9" }, { "cell_type": "code", "execution_count": 20, - "id": "b467da39-b9c4-4576-828e-135520713907", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "ALTER TABLE words_table ADD COLUMN emb2 vector(384) AFTER word_embeddings;\n", "UPDATE words_table SET emb2=word_embeddings;" - ] + ], + "id": "d58eefb6" }, { "cell_type": "code", "execution_count": 21, - "id": "4288cee5-3dc0-4108-a272-287a1ffbbb01", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT word, emb2, JSON_ARRAY_UNPACK(word_embeddings) FROM words_table LIMIT 1;" - ] + ], + "id": "d40fe9ce" }, { "cell_type": "code", "execution_count": 22, - "id": "bf42a048-0a42-496d-8925-fbc6125316b4", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "ALTER TABLE words_table DROP COLUMN word_embeddings;\n", "ALTER TABLE words_table CHANGE emb2 word_embeddings;" - ] + ], + "id": "5bffb8aa" }, { "cell_type": "code", "execution_count": 23, - "id": "fef19d53-3f4e-4e75-a1ca-8328a5ae29a2", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "DESC words_table;" - ] + ], + "id": "60a3d4bd" }, { "attachments": {}, "cell_type": "markdown", - "id": "d906c5e5-bd11-4c4f-ba13-8b4a2a09c9d7", "metadata": {}, "source": [ "## 11. 
Semantic Search of the word -sunshine 🌞 using Infix Operator\n", @@ -608,12 +607,12 @@ "Performing a semantic search for the word 'sunshine' to find contextually similar or related words and phrases based on their semantic meanings rather than exact lexical matches.\n", "\n", "The infix operators `<*>` and `<->` can be used to facilitate DOT_PRODUCT and EUCLIDEAN_DISTANCE operations, respectively, providing a more concise query syntax compared to using the existing built-in functions such as DOT_PRODUCT(a, b) and EUCLIDEAN_DISTANCE(a, b)." - ] + ], + "id": "02364df2" }, { "cell_type": "code", "execution_count": 24, - "id": "77bb3be5-83c0-4413-87fa-36fc245856d7", "metadata": {}, "outputs": [], "source": [ @@ -623,21 +622,21 @@ " FROM words_table\n", " ORDER BY score desc\n", " LIMIT 3;" - ] + ], + "id": "98a39ef5" }, { "attachments": {}, "cell_type": "markdown", - "id": "9bd7668c-a84d-42ff-962b-ead151d81b9a", "metadata": {}, "source": [ "## Clean up" - ] + ], + "id": "99b34b0a" }, { "attachments": {}, "cell_type": "markdown", - "id": "72e79695-e9a0-4e09-817e-8963a9dcd340", "metadata": {}, "source": [ "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "TEXT
column and a vector embedding of that comment stored as a VECTOR
([Vector Type](https://docs.singlestore.com/cloud/vectors/vector-type)) column. [Working with Vector Data](https://docs.singlestore.com/cloud/vectors/working-with-vector-data/) provides more details on this example and information about similarity search over vectors."
- ]
+ ],
+ "id": "f785b548"
},
{
"cell_type": "code",
"execution_count": 1,
- "id": "3dc4c365-1832-4525-bf6f-a53b77e6d6af",
"metadata": {},
"outputs": [],
"source": [
@@ -100,12 +99,12 @@
" comment TEXT,\n",
" comment_embedding VECTOR(4) NOT NULL,\n",
" category VARCHAR(256));"
- ]
+ ],
+ "id": "99d2e315"
},
{
"cell_type": "code",
"execution_count": 2,
- "id": "41bba8dc-a558-4e32-b484-be7321d3497f",
"metadata": {},
"outputs": [],
"source": [
@@ -120,41 +119,41 @@
" (3, \"The B24 restaurant salad bar is quite good.\",\n",
" '[0.1, 0.15, 0.37, 0.05]',\n",
" \"Food\");"
- ]
+ ],
+ "id": "f52bcffc"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "50a64717-fca6-4a58-8afa-5301a65be8f2",
"metadata": {},
"source": [
"### Verify the data was loaded"
- ]
+ ],
+ "id": "3ac554e1"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "4da232c3-7349-4785-99ea-042974597bf7",
"metadata": {},
"source": [
"Use the following SQL to view the data in the comments
table."
- ]
+ ],
+ "id": "67e9630b"
},
{
"cell_type": "code",
"execution_count": 3,
- "id": "e2b7e8f2-101f-447f-887a-a86c0e963aff",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM comments;"
- ]
+ ],
+ "id": "ee3acd15"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "981b4ba7-109c-418b-ab39-1df64426a1f2",
"metadata": {},
"source": [
"## 3. Search based on vector similarity.\n",
@@ -162,12 +161,12 @@
"To find the most similar vectors in a query vector, use an ORDER BY\u2026 LIMIT\u2026
query. The ORDER BY
command will sort the vectors by a similarity score produced by a vector similarity function, with the closest matches at the top.\n",
"\n",
"The SQL below sets up a query vector, then uses the DOT_PRODUCT
infix operator (<\\*>
) to find the two vectors that are most similar to the query vector."
- ]
+ ],
+ "id": "faa052dd"
},
{
"cell_type": "code",
"execution_count": 4,
- "id": "9ad316ae-495d-4e7f-a508-f84d7af20432",
"metadata": {},
"outputs": [],
"source": [
@@ -179,12 +178,12 @@
" FROM comments\n",
" ORDER BY score DESC\n",
" LIMIT 2;"
- ]
+ ],
+ "id": "1f2b57e4"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "7220f9af-7a0c-4142-ace1-32102bedf869",
"metadata": {},
"source": [
"## 4. Search using metadata filtering.\n",
@@ -192,12 +191,12 @@
"When building vector search applications, you may wish to filter on the fields of a record, with simple filters or via joins, in addition to applying vector similarity operations.\n",
"\n",
"The following query combines the use of an ORDER BY ... LIMIT
query and a metadata filter on category. This query will filter to find all comments in the category \"Food\"
and then calculate the score for each of those and rank in descending order."
- ]
+ ],
+ "id": "066fdd44"
},
{
"cell_type": "code",
"execution_count": 5,
- "id": "c846a3b0-5477-4f73-9a7e-bd935717dcf0",
"metadata": {},
"outputs": [],
"source": [
@@ -210,64 +209,64 @@
" WHERE category = \"Food\"\n",
" ORDER BY score DESC\n",
" LIMIT 3;"
- ]
+ ],
+ "id": "22679a21"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "c2010186-159d-4968-a9a3-fc285ab5a3cd",
"metadata": {},
"source": [
"## 5. Create and use a vector index.\n",
"\n",
"The command below creates a vector index on the comment_embedding
field of the comments
table."
- ]
+ ],
+ "id": "db0f41ad"
},
{
"cell_type": "code",
"execution_count": 6,
- "id": "87ab9e1b-d7ed-455e-b3f8-0691034436de",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"ALTER TABLE comments ADD VECTOR INDEX ivf(comment_embedding)\n",
"INDEX_OPTIONS '{\"index_type\":\"IVF_FLAT\"}';"
- ]
+ ],
+ "id": "de0c6f3f"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "9b8dd26d-edc9-4bec-b7f7-b1a393d459d7",
"metadata": {},
"source": [
"Optionally optimize the table for best performance."
- ]
+ ],
+ "id": "16ae9f59"
},
{
"cell_type": "code",
"execution_count": 7,
- "id": "089d7fc3-e23b-4b33-92cd-4ccba9121336",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"OPTIMIZE TABLE comments FULL;"
- ]
+ ],
+ "id": "ae4c1b16"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "6c0bfb7f-4831-4436-bed0-ef30a5a30e00",
"metadata": {},
"source": [
"The following query will use the vector index. Vector indexes can be used to improve performance of queries over large vector data sets. Refer to [Vector Indexing](https://docs.singlestore.com/cloud/vectors/vector-indexing/) for information on creating and using vector indexes."
- ]
+ ],
+ "id": "cddc5974"
},
{
"cell_type": "code",
"execution_count": 8,
- "id": "5327b465-0191-455d-a800-67e8ad403df6",
"metadata": {},
"outputs": [],
"source": [
@@ -279,23 +278,23 @@
" FROM comments\n",
" ORDER BY score DESC\n",
" LIMIT 2;"
- ]
+ ],
+ "id": "24fc1b33"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "cec7c7db-fb6a-4a22-8ebe-38c077f7ae70",
"metadata": {},
"source": [
"## 6. Check that your query is using a vector index.\n",
"\n",
"The EXPLAIN
command can be used to see the query plan and verify that the vector index is being used. In the example below, you can see INTERNAL_VECTOR_SEARCH
in the ColumnStoreFilter
row. This tells you that the vector index is being used."
- ]
+ ],
+ "id": "85a6bba3"
},
{
"cell_type": "code",
"execution_count": 9,
- "id": "e4fcced9-b650-4786-9a3d-3f8e2ac4fad1",
"metadata": {},
"outputs": [],
"source": [
@@ -308,33 +307,34 @@
" FROM comments\n",
" ORDER BY score DESC\n",
" LIMIT 2;"
- ]
+ ],
+ "id": "258a9714"
},
{
"attachments": {},
"cell_type": "markdown",
- "id": "08034846-168c-4547-abbb-72e10d9629e2",
"metadata": {},
"source": [
"## 7. Clean up.\n",
"\n",
"The command below will drop the table created as part of this notebook. Dropping this table will allow you to rerun the notebook from the beginning."
- ]
+ ],
+ "id": "e2a0af68"
},
{
"cell_type": "code",
"execution_count": 10,
- "id": "57290d8e-98d4-4ea8-b290-a925f5ba9bee",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE comments;"
- ]
+ ],
+ "id": "663219d6"
},
{
+ "id": "aca52f19",
"cell_type": "markdown",
- "id": "b87169e1-aa2c-4364-bc4b-86ca97ef24fa",
"metadata": {},
"source": [
"\n",
diff --git a/notebooks/singlestore-april-challenge-haiku-ascii/notebook.ipynb b/notebooks/singlestore-april-challenge-haiku-ascii/notebook.ipynb
index 5412bdc9..2783eb21 100644
--- a/notebooks/singlestore-april-challenge-haiku-ascii/notebook.ipynb
+++ b/notebooks/singlestore-april-challenge-haiku-ascii/notebook.ipynb
@@ -1,8 +1,8 @@
{
"cells": [
{
+ "id": "0c39a476",
"cell_type": "markdown",
- "id": "574444dd-dce1-4658-b7a0-74d4ed39620c",
"metadata": {},
"source": [
"If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "Make sure to select 'BankingAnalytics' database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "If you have a Free Starter Workspace deployed already, select the database from drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
\n", "Make sure to select the vector_data database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by the %%sql magic command and SQLAlchemy to make connections to the selected database.
\n", "If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
\n", "