Merge pull request #30 from zilliztech/update_missing_diagrams_2
Update missing diagrams
fzliu authored Dec 6, 2022
2 parents 8598343 + 0605b45 commit b9b1e14
Showing 1 changed file with 2 additions and 7 deletions.
9 changes: 2 additions & 7 deletions codelabs/get-started-with-vector-db-6/index.md
@@ -49,12 +49,7 @@ From the above example, we can see that scalar quantization has reduced the tota
## Scalar quantization
Duration: 3

-So how exactly does scalar quantization work? Let's first take a look at the indexing process, i.e. turning floating point vectors into integer vectors. For each vector dimension, scalar quantization takes the maximum and minimum value of that particular dimension as seen across the entire database, and uniformly splits that dimension into bins across its entire range:
-
-<div align="center">
-<img align="center" src="">
-</div>
-<p style="text-align:center"><sub>Scalar quantization, visualized.</sub></p>
+So how exactly does scalar quantization work? Let's first take a look at the indexing process, i.e. turning floating point vectors into integer vectors. For each vector dimension, scalar quantization takes the maximum and minimum value of that particular dimension as seen across the entire database, and uniformly splits that dimension into bins across its entire range.

Let's try writing that in code. We'll first generate a dataset of a thousand 128D floating point vectors sampled from a multivariate distribution. Since this is a toy example, I'll be sampling from a Gaussian distribution; in practice, actual embeddings are rarely Gaussian distributed unless added as a constraint when training the model (such as in variational autoencoders):
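
The codelab's own code for this step is collapsed in this diff view, so the following is only a minimal sketch of the per-dimension min/max binning described above, not the author's implementation. The function name `scalar_quantize`, the choice of 256 bins, and the use of the standard-library `random` module are all assumptions made for illustration.

```python
import random


def scalar_quantize(vectors, n_bins=256):
    """Map every float to an integer bin ID in [0, n_bins - 1], per dimension.

    Hypothetical helper for illustration only.
    """
    dims = len(vectors[0])
    # Minimum and maximum of each dimension, as seen across the whole dataset.
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    # Width of one bin; fall back to 1.0 for a constant dimension
    # so we never divide by zero.
    steps = [((hi - lo) / n_bins) or 1.0 for lo, hi in zip(mins, maxs)]
    quantized = [
        # The per-dimension maximum would land in bin n_bins, so clamp it.
        [min(int((x - lo) / step), n_bins - 1)
         for x, lo, step in zip(v, mins, steps)]
        for v in vectors
    ]
    return quantized, mins, steps


# Toy dataset: a thousand 128D vectors with Gaussian-distributed values.
random.seed(0)
dataset = [[random.gauss(0.0, 1.0) for _ in range(128)] for _ in range(1000)]
quantized, mins, steps = scalar_quantize(dataset)
```

With 256 bins each value fits in a single byte. An approximate float can be recovered as `mins[d] + (q + 0.5) * steps[d]`, i.e. the midpoint of bin `q` in dimension `d`.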

@@ -180,7 +175,7 @@ This might sound complex, but it becomes much easier to understand if we break i
3) With all centroids computed, we'll replace all subvectors in the original dataset with the ID of its closest centroid.

<div align="center">
-<img align="center" src="">
+<img align="center" src="./pic/product_quantization.png">
</div>
<p style="text-align:center"><sub>Product quantization, visualized.</sub></p>
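
Only step 3 of the procedure is visible in this hunk, so the sketch below assumes the usual product quantization recipe for the earlier steps: split every vector into equal-length subvectors, then run k-means separately on the subvectors at each position. It is a toy, standard-library illustration rather than the codelab's code; `kmeans`, `product_quantize`, and all parameter values are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def product_quantize(vectors, n_subvectors=4, n_centroids=16):
    """Replace each subvector with the ID of its closest centroid."""
    sub_dim = len(vectors[0]) // n_subvectors
    codebooks = []
    codes = [[] for _ in vectors]
    for m in range(n_subvectors):
        # All subvectors at position m, one per original vector.
        subs = [v[m * sub_dim:(m + 1) * sub_dim] for v in vectors]
        centroids = kmeans(subs, n_centroids)
        codebooks.append(centroids)
        for i, s in enumerate(subs):
            codes[i].append(
                min(range(n_centroids), key=lambda c: sq_dist(s, centroids[c]))
            )
    return codes, codebooks


# Toy data: 200 vectors of 8 dimensions, i.e. 4 subvectors of 2 dimensions each.
random.seed(0)
vectors = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
codes, codebooks = product_quantize(vectors)
```

Each original vector is now stored as `n_subvectors` small integers, and `codebooks[m][codes[i][m]]` gives the centroid that approximates subvector `m` of vector `i`.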
