Docs: Feedback for Vector Collection #1235

Open
naadhira opened this issue Aug 2, 2024 · 4 comments

@naadhira
Contributor

naadhira commented Aug 2, 2024

Hi, I have some feedback about this page

Using a vector collection within a pipeline is not included in the documentation. We need to add the following examples:

  1. Using a pipeline to create/update a VectorCollection. This is covered in the vector search tutorial, but we need an example of it on this page as well.

  2. Using a pipeline for the similarity search. In this example, the client can be any HZ client. The search string is ingested into the pipeline, which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.

Happy to do the editing/review once the code is in place...

@k-jamroz
Contributor

k-jamroz commented Aug 2, 2024

Jet bindings for vector collection are documented under Jet:

See the original PR: #1125

There are no links from the data structure description, but we do not have such links to the Jet docs for IMap or other data structures either.

@k-jamroz
Contributor

k-jamroz commented Aug 2, 2024

> Using a pipeline for the similarity search.

This feels more like a tutorial. A basic search invocation from a pipeline is shown in https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector#searching-in-vector-collection

> In this example, the client can be any HZ client. The search string is ingested into the pipeline,

I do not know of any easy way to send input from an HZ client directly to a Jet pipeline. Observables work the other way around: the client can get data produced by the pipeline. For input, however, you need to use Kafka, an IMap journal, etc. There is no ITopic source. In my examples I used sockets, but they are a bit problematic and require permissions in the cloud.

At least that is the situation for a streaming pipeline. For a batch pipeline this can be organized differently, but you would have to submit the job many times (e.g. once for each query), which is doable but IMO inconvenient and inefficient.

> which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.

We had examples of embedding creation in Jet pipelines as part of demos. I have not yet checked whether they ended up in published tutorials.

@naadhira
Contributor Author

naadhira commented Aug 2, 2024 via email

@k-jamroz
Contributor

k-jamroz commented Aug 2, 2024

I agree that it is currently not easy to discover that you can use vector collections in Jet.


3 participants