Prototype search pipelines #97

Closed
Tracked by #80 ...
msfroh opened this issue Feb 6, 2023 · 2 comments · Fixed by opensearch-project/OpenSearch#7135
Labels: feature (introduce a net new unit of functionality of a software system that satisfies a requirement), Search

Comments

@msfroh
Collaborator

msfroh commented Feb 6, 2023

I'm going to put together a scrappy first implementation of search pipelines.

This first implementation will largely be a copy/paste from ingest pipelines.

I think it should be a good conversation-starter about whether/how to share implementation with ingest pipelines. Depending on where I get with this task, it may be throwaway learning code or it may be the first draft of what we eventually want to merge.

This task should be moved to the OpenSearch core project, but I'm creating it here as a placeholder.

The goals for my prototype include:

  1. Should be able to CRUD search pipelines, with persistence in cluster state.
  2. Should be able to invoke a named search pipeline from a search request. (The named search pipeline might include a dummy "hello world" processor that e.g. adds a field to the first hit in the response or adds a filter to an incoming query.)
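
As a rough illustration of these two goals (not the actual prototype), the sketch below keeps named pipelines in a plain dictionary standing in for cluster state, then applies one by name to a response. `PipelineStore`, `add_hello_field`, the pipeline name, and the dict-shaped response are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a response is a dict holding a "hits" list of dicts.
Response = Dict[str, list]
ResponseProcessor = Callable[[Response], Response]


class PipelineStore:
    """In-memory stand-in for pipelines persisted in cluster state."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, List[ResponseProcessor]] = {}

    def put(self, name: str, processors: List[ResponseProcessor]) -> None:
        # Covers both "create" and "update".
        self._pipelines[name] = list(processors)

    def get(self, name: str) -> List[ResponseProcessor]:
        return self._pipelines[name]

    def delete(self, name: str) -> None:
        del self._pipelines[name]

    def run(self, name: str, response: Response) -> Response:
        # Apply each processor of the named pipeline in order.
        for processor in self.get(name):
            response = processor(response)
        return response


def add_hello_field(response: Response) -> Response:
    """Dummy "hello world" processor: adds a field to the first hit."""
    hits = [dict(hit) for hit in response["hits"]]  # copy, do not mutate input
    if hits:
        hits[0]["greeting"] = "hello world"
    return {**response, "hits": hits}


store = PipelineStore()
store.put("hello_pipeline", [add_hello_field])
result = store.run("hello_pipeline", {"hits": [{"_id": "1"}, {"_id": "2"}]})
```

Only the first hit gains the extra field here. A request-side "hello world" processor (for example, one that adds a filter to the incoming query) would follow the same shape with a request in place of the response.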

Features that can come later (but before "release") include:

  1. Setting a default search pipeline for an index.
  2. Specifying an ad hoc search pipeline as part of a search request.
  3. Availability of a "standard" set of search pipeline processors (similar to ingest-common).
  4. Support for BracketProcessor (processors that modify both request and response, with state carried from request time to response time).
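
To make the last item more concrete, here is a minimal sketch of what "state carried from request time to response time" could look like. It is an assumption about the shape of the idea, not the planned interface: `OversampleBracketProcessor`, its method names, and `fake_search` are hypothetical, and requests and responses are plain dicts.

```python
from typing import Any, Dict, Tuple

Request = Dict[str, Any]
Response = Dict[str, Any]


class OversampleBracketProcessor:
    """Hypothetical bracket processor: asks for extra hits, then trims back."""

    def __init__(self, factor: int) -> None:
        self.factor = factor

    def process_request(self, request: Request) -> Tuple[Request, Dict[str, Any]]:
        # Record the caller's requested size so the response side can use it.
        original_size = request.get("size", 10)
        modified = {**request, "size": original_size * self.factor}
        return modified, {"original_size": original_size}

    def process_response(self, response: Response, state: Dict[str, Any]) -> Response:
        # Use the state captured at request time to trim the hit list.
        return {**response, "hits": response["hits"][: state["original_size"]]}


def fake_search(request: Request) -> Response:
    """Stand-in for the real search: returns exactly `size` hits."""
    return {"hits": [{"_id": str(i)} for i in range(request["size"])]}


processor = OversampleBracketProcessor(factor=3)
modified_request, state = processor.process_request({"query": "shoes", "size": 2})
raw_response = fake_search(modified_request)
final_response = processor.process_response(raw_response, state)
```

Returning the state from the request step, rather than storing it on the processor, keeps a single processor instance safe to share across concurrent searches in this sketch.
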
msfroh added the feature label Feb 6, 2023
msfroh self-assigned this Feb 6, 2023
msfroh moved this from 🆕 New to Now(This Quarter) in Search Project Board Feb 6, 2023
msfroh mentioned this issue Feb 6, 2023
macohen removed the untriaged label Feb 6, 2023
macohen moved this from Now(This Quarter) to 🏗 In progress in Search Project Board Feb 6, 2023
@Jeevananthan-23

Hi @msfroh, I would like to understand the concept behind the pipelines. Is it similar to a Redis pipeline, which optimizes round-trip times by batching requests on the client-side socket, sending them to the server without waiting for the replies, and finally reading all the replies in a single step?

Is this the design model of the ingest and search pipelines?

Thanks in advance!

@msfroh
Collaborator Author

msfroh commented Feb 23, 2023

Hi @Jeevananthan-23, the motivation is about providing a (relatively) lightweight way to modify behavior of searches at the cluster level, since that may make more sense than modifying behavior at the application layer.

For example, ingest pipelines provide a way of manipulating incoming documents on the cluster before they're sent for indexing. You could just modify the documents before sending them to the cluster in the first place, but maybe that's not convenient (like maybe you have multiple applications sending documents). Also, you get the open-source benefit where one person can write a useful ingest pipeline processor and share it with other OpenSearch users, who don't need to modify any of their applications' indexing code.

On the search side, the specific thing we've been trying to tackle is final-stage rerankers (which is what we've been covering in https://github.com/opensearch-project/search-processor), where you want to run the collated search results through an external reranker to get more relevant results than you could get through term frequency-based relevance alone. You could send the results you get back from OpenSearch to the external reranker, but by letting the cluster drive the transformation you don't need to modify your search application. More importantly, one person can build and release a search pipeline processor that integrates with an external reranker, and many users can benefit without each having to modify their search application.

Inspired by ingest pipelines, we realized that the "functional operator" model (where an ingest pipeline processor is effectively a function that takes an IngestDocument and returns an IngestDocument) is pretty powerful. We can similarly define a couple of interfaces that operate on SearchRequest and SearchResponse.
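
A minimal sketch of that functional model, under stated assumptions: plain dicts stand in for the SearchRequest and SearchResponse classes, and `compose`, `add_in_stock_filter`, `rerank_by_popularity`, and the `filters`/`popularity` fields are hypothetical names chosen only for illustration. The local sort merely stands in for a call to an external reranker.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# Hypothetical dict stand-ins, not the real OpenSearch classes.
SearchRequest = Dict[str, Any]
SearchResponse = Dict[str, Any]

RequestProcessor = Callable[[SearchRequest], SearchRequest]
ResponseProcessor = Callable[[SearchResponse], SearchResponse]


def compose(processors: List[Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Chain processors left to right into a single function."""
    return lambda value: reduce(lambda acc, proc: proc(acc), processors, value)


def add_in_stock_filter(request: SearchRequest) -> SearchRequest:
    # Request processor: returns a new request with one more filter.
    filters = request.get("filters", []) + [{"term": {"in_stock": True}}]
    return {**request, "filters": filters}


def rerank_by_popularity(response: SearchResponse) -> SearchResponse:
    # Response processor: reorders hits; a real one might call an external service.
    hits = sorted(response["hits"], key=lambda hit: hit["popularity"], reverse=True)
    return {**response, "hits": hits}


request_pipeline = compose([add_in_stock_filter])
response_pipeline = compose([rerank_by_popularity])

request = request_pipeline({"query": "shoes"})
response = response_pipeline(
    {"hits": [{"_id": "a", "popularity": 1}, {"_id": "b", "popularity": 5}]}
)
```

Because every processor has the same input and output type, a pipeline is just an ordered list of such functions, and an empty list behaves as the identity.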

The linked RFC goes into much more detail, but I hope the above is a useful summary.

3 participants