Merge pull request #1553 from vespa-engine/aressem/add-podman-cmd
Add alternative podman command for memory info
kkraune authored Oct 30, 2024
2 parents 9eb85b4 + 4b0a224 commit 79c7f4c
Showing 21 changed files with 293 additions and 262 deletions.
149 changes: 75 additions & 74 deletions billion-scale-image-search/README.md

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions billion-scale-vector-search/README.md
@@ -1,4 +1,3 @@

<!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.-->

<picture>
@@ -7,21 +6,21 @@
<img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
</picture>

# SPANN Billion Scale Vector Search

This sample application demonstrates how to represent *SPANN* (Space Partitioned ANN) using Vespa.ai.

The *SPANN* approach for approximate nearest neighbor search is described in
[SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search](https://arxiv.org/abs/2111.08566).

SPANN uses a hybrid combination of graph and inverted index methods for approximate nearest neighbor search.

This sample app demonstrates how the `SPANN` algorithm can be represented using Vespa.
See the [Billion-scale vector search using hybrid HNSW-IF](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/) blog post for details.
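
As a purely illustrative sketch (the actual retrieval in this app is implemented by the custom Java searchers described in the blog post, and the schema, field and tensor names below are assumptions), the building block is a `nearestNeighbor` query, which the hybrid HNSW-IF approach extends with disk-based posting lists:
<pre>
$ vespa query \
    'yql=select * from vector where {targetHits:10}nearestNeighbor(vector, q)' \
    'input.query(q)=[1, 2, 3]' \
    'hits=10'
</pre>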

The steps below demonstrate the functionality using a smaller subset of the 1B vector dataset, suitable
for reproducing on a laptop.

**Requirements:**

@@ -30,17 +29,19 @@ for reproducing on a laptop.
for details and troubleshooting
* Alternatively, deploy using [Vespa Cloud](#deployment-note)
* Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
* Architecture: x86_64 or arm64
* [Homebrew](https://brew.sh/) to install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html), or download
a Vespa CLI release from [GitHub releases](https://github.com/vespa-engine/vespa/releases).
* <a href="https://openjdk.org/projects/jdk/17/" data-proofer-ignore>Java 17</a> installed.
* Python3 and numpy to process the vector dataset
* [Apache Maven](https://maven.apache.org/install.html) - this sample app uses custom Java components and Maven is used
to build the application.

Verify Docker Memory Limits:
<pre>
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
</pre>
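
If the reported memory is too low, it can usually be raised. With Docker Desktop this is done under Settings -> Resources; with Podman, a hedged example of resizing the VM (size in MB, machine must be stopped first) is:
<pre>
$ podman machine stop
$ podman machine set --memory 8192
$ podman machine start
</pre>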

Install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html):
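With Homebrew this is typically the following (see the linked Vespa CLI documentation for other install options):
<pre>
$ brew install vespa-cli
</pre>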
@@ -78,8 +79,8 @@ $ vespa clone billion-scale-vector-search myapp && cd myapp


## Download Vector Data
This sample app uses the Microsoft SPACEV vector dataset from
https://big-ann-benchmarks.com/.

It uses the first 10M vectors of the 100M slice sample.
This sample file is about 1GB (10M vectors):
@@ -88,7 +89,7 @@ $ curl -L -o spacev10m_base.i8bin \
https://data.vespa-cloud.com/sample-apps-data/spacev10m_base.i8bin
</pre>

Generate the feed file for the first 10M vectors from the 100M sample.
This step creates two feed files:

* `graph-vectors.jsonl`
@@ -103,7 +104,7 @@ $ python3 src/main/python/create-vespa-feed.py spacev10m_base.i8bin
</pre>


## Build and deploy Vespa app
Build the sample app:
<pre data-test="exec" data-test-expect="BUILD SUCCESS" data-test-timeout="300">
$ mvn clean package -U
@@ -145,7 +146,7 @@ $ curl -L -o spacev10m_gt100.i8bin \
</pre>

Note: initially, the routine above used the query file from https://comp21storage.blob.core.windows.net/publiccontainer/comp21/spacev1b/query.i8bin,
but the link no longer works.

Run first 1K queries and evaluate recall@10. A higher number of clusters gives higher recall:
<pre data-test="exec">
42 changes: 22 additions & 20 deletions commerce-product-ranking/README.md
@@ -6,24 +6,24 @@
<img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
</picture>

# Vespa Product Ranking

This sample application demonstrates how to improve Product Search with Learning to Rank (LTR).

Blog post series:

* [Improving Product Search with Learning to Rank - part one](https://blog.vespa.ai/improving-product-search-with-ltr/)
This post introduces the dataset used in this sample application and several baseline ranking models.
* [Improving Product Search with Learning to Rank - part two](https://blog.vespa.ai/improving-product-search-with-ltr-part-two/)
This post demonstrates how to train neural methods for search ranking. The neural training routine is found in this
[notebook](https://github.com/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/train_neural.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/train_neural.ipynb).
* [Improving Product Search with Learning to Rank - part three](https://blog.vespa.ai/improving-product-search-with-ltr-part-three/)
This post demonstrates how to train GBDT methods for search ranking. The model also uses neural signals as features. See notebooks:
[XGBoost](https://github.com/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/Train-xgboost.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/Train-xgboost.ipynb) and
[LightGBM](https://github.com/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/Train-lightgbm.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/sample-apps/blob/master/commerce-product-ranking/notebooks/Train-lightgbm.ipynb)

This work uses the largest product relevance dataset released by Amazon:

@@ -35,27 +35,29 @@ This work uses the largest product relevance dataset released by Amazon:
> Each query-product pair is accompanied by additional information.
> The dataset is multilingual, as it contains queries in English, Japanese, and Spanish.
The dataset is found at [amazon-science/esci-data](https://github.com/amazon-science/esci-data).
The dataset is released under the [Apache 2.0 license](https://github.com/amazon-science/esci-data/blob/main/LICENSE).

## Quick start

The following is a quick start recipe for getting started with this application.

* [Docker](https://www.docker.com/) Desktop installed and running. 6 GB available memory for Docker is recommended.
Refer to [Docker memory](https://docs.vespa.ai/en/operations-selfhosted/docker-containers.html#memory)
for details and troubleshooting
* Alternatively, deploy using [Vespa Cloud](#deployment-note)
* Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
* Architecture: x86_64 or arm64
* [Homebrew](https://brew.sh/) to install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html), or download
a Vespa CLI release from [GitHub releases](https://github.com/vespa-engine/vespa/releases).
* zstd: `brew install zstd`
* Python3 with `requests`, `pyarrow` and `pandas` installed

Validate Docker resource settings; the minimum is 6 GB:
<pre>
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
</pre>

Install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html):
@@ -94,7 +96,7 @@ $ curl -L -o application/models/title_ranker.onnx \

See [scripts/export-bi-encoder.py](scripts/export-bi-encoder.py) and
[scripts/export-cross-encoder.py](scripts/export-cross-encoder.py) for how
to export models from PyTorch to ONNX format.

Deploy the application:
<pre data-test="exec" data-test-assert-contains="Success">
@@ -113,7 +115,7 @@ It is possible to deploy this app to

## Run basic system test

This step is optional, but it indexes two
documents and runs a query [test](https://docs.vespa.ai/en/reference/testing.html).

<pre data-test="exec" data-test-assert-contains="Success">
Expand All @@ -130,9 +132,9 @@ $ zstdcat sample-data/sample-products.jsonl.zstd | vespa feed -
</pre>


## Evaluation

Evaluate the `semantic-title` rank profile using the evaluation
script ([scripts/evaluate.py](scripts/evaluate.py)).

Install requirements:
@@ -145,15 +147,15 @@ pip3 install numpy pandas pyarrow requests
$ python3 scripts/evaluate.py \
--endpoint http://localhost:8080/search/ \
--example_file sample-data/test-sample.parquet \
  --ranking semantic-title
</pre>

[evaluate.py](scripts/evaluate.py) runs all the queries in the test split using the `--ranking` `<rank-profile>`
and produces a `<ranking>.run` file with the top ranked results.
This file is in the format that `trec_eval` expects.
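
For reference, a TREC run file has one whitespace-separated line per retrieved document: query id, the literal `Q0`, document id, rank, score and a run tag. The values below are illustrative only:
<pre>
535 Q0 B08PB9TTKT 1 142.37 semantic-title
535 Q0 B07XJ8C8F5 2 139.02 semantic-title
</pre>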

<pre data-test="exec" data-test-assert-contains="B08PB9TTKT">
$ cat semantic-title.run
</pre>

Example ranking produced by Vespa using the `semantic-title` rank-profile for query 535:
@@ -196,10 +198,10 @@ Run evaluation:
$ trec_eval test.qrels semantic-title.run -m 'ndcg.1=0,2=0.01,3=0.1,4=1'
</pre>

This particular product ranking for the query produces an NDCG score of 0.7046.
Note that the `sample-data/test-sample.parquet` file only contains one query.
To get the overall score, one must compute all the NDCG scores of all queries in the
test split and report the *average* NDCG score.
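
A sketch for getting both per-query and averaged numbers in one pass (assuming the qrels and run files cover all queries) is `trec_eval`'s `-q` flag, which prints each query's NDCG followed by the overall average:
<pre>
$ trec_eval -q test.qrels semantic-title.run -m 'ndcg.1=0,2=0.01,3=0.1,4=1'
</pre>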

Note that the evaluation uses custom NDCG label gains:

@@ -231,7 +233,7 @@ $ docker rm -f vespa
</pre>


## Full evaluation

Download a pre-processed feed file with all (1,215,854) products:

@@ -240,21 +242,21 @@ $ curl -L -o product-search-products.jsonl.zstd \
https://data.vespa-cloud.com/sample-apps-data/product-search-products.jsonl.zstd
</pre>

This step is resource intensive as the semantic embedding model encodes
the product title and description into the dense embedding vector space.

<pre>
$ zstdcat product-search-products.jsonl.zstd | vespa feed -
</pre>

Evaluate the `hybrid` baseline rank profile using the evaluation
script ([scripts/evaluate.py](scripts/evaluate.py)).

<pre>
$ python3 scripts/evaluate.py \
--endpoint http://localhost:8080/search/ \
--example_file "https://github.com/amazon-science/esci-data/blob/main/shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true" \
  --ranking semantic-title
</pre>

For Vespa Cloud deployments we need to pass the certificate and the private key.
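A hedged illustration (the file names assume the data-plane key pair generated by `vespa auth cert`, and the evaluation script may expect different option names) is to pass the pair to curl when talking to the cloud endpoint:
<pre>
$ ENDPOINT=https://your-cloud-endpoint
$ curl --cert data-plane-public-cert.pem --key data-plane-private-key.pem \
    "$ENDPOINT/search/?query=test"
</pre>
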
31 changes: 16 additions & 15 deletions custom-embeddings/README.md
@@ -1,4 +1,3 @@

<!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.-->

<picture>
@@ -9,9 +8,9 @@

# Customizing Frozen Data Embeddings in Vespa

This sample application demonstrates how to adapt frozen embeddings from foundational
embedding models.
Frozen data embeddings from foundational models are an emerging industry practice for reducing the complexity of maintaining and versioning embeddings. The frozen data embeddings are re-used for various tasks, such as classification, search, or recommendations.

Read the [blog post](https://blog.vespa.ai/).

@@ -25,12 +24,12 @@ The following is a quick start recipe on how to get started with this application
* Alternatively, deploy using [Vespa Cloud](#deployment-note)
* Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
* Architecture: x86_64 or arm64
* [Homebrew](https://brew.sh/) to install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html), or download
a Vespa CLI release from [GitHub releases](https://github.com/vespa-engine/vespa/releases).

Validate Docker resource settings; the minimum is 4 GB:
<pre>
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
</pre>

Install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html):
@@ -61,7 +62,7 @@ Download this sample application:
$ vespa clone custom-embeddings my-app && cd my-app
</pre>

Download a frozen embedding model file; see
[text embeddings made easy](https://blog.vespa.ai/text-embedding-made-simple/) for details:
<pre data-test="exec">
$ mkdir -p models
Expand All @@ -71,7 +72,7 @@ $ curl -L -o models/tokenizer.json \
$ curl -L -o models/frozen.onnx \
https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx

$ cp models/frozen.onnx models/tuned.onnx
</pre>

In this case, we re-use the frozen model as the tuned model to demonstrate functionality.
@@ -95,36 +96,36 @@ vespa document ext/3.json

## Query and ranking examples

We demonstrate using the Vespa CLI; use `-v` to see the curl equivalent using the HTTP API.

### Retrieve all documents with the `unranked` rank profile
<pre data-test="exec" data-test-assert-contains='"totalCount": 3'>
vespa query 'yql=select * from doc where true' \
'ranking=unranked'
</pre>
Notice the `relevance`, which is assigned by the rank-profile.

### Using the frozen query tower
<pre data-test="exec" data-test-assert-contains='"totalCount": 3'>
vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(frozen, "space contains many suns")'
</pre>

### Using the tuned query tower
<pre data-test="exec" data-test-assert-contains='"totalCount": 3'>
vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(tuned, "space contains many suns")'
</pre>
In this case, the tuned model is equivalent to the frozen query tower that was used for document embeddings.

### Using the simple weight transformation query tower
<pre data-test="exec" data-test-assert-contains='"totalCount": 3'>
vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(tuned, "space contains many suns")' \
'ranking=simple-similarity'
</pre>
This invokes the `simple-similarity` ranking model, which performs the query transformation
to the tuned embedding.

### Using the Deep Neural Network similarity
<pre data-test="exec" data-test-assert-contains='"totalCount": 3'>
Expand All @@ -134,12 +135,12 @@ vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embeddin
</pre>

Note that this just demonstrates the functionality; the custom similarity model is
initialized from random weights.

### Dump all embeddings
This is useful for training routines, getting the frozen document embeddings out of Vespa:
<pre>
vespa visit --field-set "[all]" > ../vector-data.jsonl
</pre>
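
To turn the dump into a plain (id, embedding) file for a training routine, a small post-processing step can be added. This is a sketch that assumes `jq` is installed and that the embedding field is named `embedding`; the exact tensor JSON layout may differ:
<pre>
jq -c '{id: .id, embedding: .fields.embedding}' ../vector-data.jsonl > ../embeddings.jsonl
</pre>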

### Get a specific document and its embedding(s):
3 changes: 2 additions & 1 deletion examples/document-processing/README.md
@@ -1,4 +1,3 @@

<!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->

<picture>
@@ -38,6 +37,8 @@ Refer to [Docker memory](https://docs.vespa.ai/en/operations-selfhosted/docker-c
for details and troubleshooting:
<pre>
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
</pre>


3 changes: 2 additions & 1 deletion examples/generic-request-processing/README.md
@@ -1,4 +1,3 @@

<!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->

<picture>
@@ -24,6 +23,8 @@ Refer to [Docker memory](https://docs.vespa.ai/en/operations-selfhosted/docker-c
for details and troubleshooting:
<pre>
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
</pre>

**Check-out, compile and run:**
