Docs: Feedback for Vector Collection #1235

naadhira · 2024-08-02T00:12:21Z

Hi, I have some feedback about this page

Using a vector collection within a pipeline is not included in the documentation. We need to add the following examples:

- Using a pipeline to create or update a VectorCollection. This is covered in the vector search tutorial, but we also need an example of it on this page.
- Using a pipeline for the similarity search. In this example, the client can be any HZ client. The search string is ingested into the pipeline, which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.
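To make the second flow concrete, here is a minimal plain-Java sketch of the stages it names (embed the search string, run a similarity search, return the top results). It deliberately does not use the Hazelcast or Jet APIs. The class `SimilaritySearchSketch`, the toy letter-count `embed` function, and the brute-force cosine `search` are all hypothetical stand-ins for a real embedding model and a real VectorCollection search, so treat it as an illustration of the data flow only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilaritySearchSketch {

    // Hypothetical stand-in for an embedding model: counts the letters a-z.
    static float[] embed(String text) {
        float[] v = new float[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                v[c - 'a'] += 1f;
            }
        }
        return v;
    }

    // Cosine similarity; returns 0 when either vector is all zeros.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force top-k search (assumption: a real vector index would replace this).
    static List<String> search(Map<String, float[]> index, String query, int k) {
        float[] q = embed(query);
        List<String> keys = new ArrayList<>(index.keySet());
        // Sort by descending similarity to the embedded query.
        keys.sort(Comparator.comparingDouble((String key) -> -cosine(q, index.get(key))));
        return keys.subList(0, Math.min(k, keys.size()));
    }

    public static void main(String[] args) {
        // "Ingestion" stage: embed each document and store it under its text.
        Map<String, float[]> index = new LinkedHashMap<>();
        for (String doc : List.of("vector search", "map journal", "kafka source")) {
            index.put(doc, embed(doc));
        }
        // "Query" stage: only the search string comes from the client.
        System.out.println(search(index, "vector searching", 2));
    }
}
```

In a documentation example, each method here would correspond to a pipeline stage running inside the cluster, with the client supplying only the search string.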

Happy to do the editing/review once the code is in place...

k-jamroz · 2024-08-02T14:11:14Z

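Jet bindings for vector collection are documented under Jet:

- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/legacy-file-connector#fvecs-and-ivecs
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/file-connector#fvecs-and-ivecs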

See the original PR: #1125

There are no links from the data structure description, but we do not have such links to the Jet docs for IMap or other data structures either.

k-jamroz · 2024-08-02T14:20:08Z

> Using a pipeline for the similarity search.

This feels more like a tutorial. A basic search invocation from a pipeline is shown in https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector#searching-in-vector-collection

> In this example, the client can be any HZ client. The search string is ingested into the pipeline,

I do not know of any easy way to send input from an HZ client directly to a Jet pipeline. Observables work the other way around: the client can get data produced by the pipeline. But as input you need to use Kafka, an IMap journal, etc. There is no ITopic source. In my examples I used sockets, but they are a bit problematic and require permissions in the cloud.

At least that is the situation if you think about a streaming pipeline. For a batch pipeline this can be organized differently, but you would have to submit the job many times (e.g. once for each query), which is doable but IMO inconvenient and inefficient.

> which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.

We had examples of embedding creation in Jet pipelines as part of demos. I have not yet checked whether they ended up in published tutorials.

naadhira · 2024-08-02T14:32:50Z

Then we should add cross-references and do the restructuring I suggested in Slack. There are use cases for both methods of ingestion, and that should be discussed under the data structure itself. The way it's set up now, you don't even know that using Jet is an option, except in the tutorial.


On Fri, Aug 2, 2024, 7:11 AM Krzysztof Jamróz wrote:

> Jet bindings for vector collection are documented under Jet:
>
> - https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector
> - https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/legacy-file-connector#fvecs-and-ivecs
> - https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/file-connector#fvecs-and-ivecs
>
> See the original PR: #1125
>
> There are no links from data structure description, but we do not have such links to Jet docs also for IMap or other data structure.


k-jamroz · 2024-08-02T14:34:57Z

I agree that it is currently not easy to discover that you can use vector collections in Jet.

naadhira assigned vbekiaris and k-jamroz Aug 2, 2024
