
[FEATURE] Enhance ML Inference Search Request Processor to carry over the query metadata fields. #2841

Open
mingshl opened this issue Aug 20, 2024 · 3 comments
Labels: enhancement (New feature or request), Priority-Medium

mingshl (Collaborator) commented Aug 20, 2024

Is your feature request related to a problem?
Currently, when rewriting the query type in the ML Inference Search Request Processor, users can build a new query using the query_template parameter when configuring the processor, for example rewriting a neural search query into a knn query.

However, the query metadata fields in the search request body will be ignored.

For example, configure an ML Inference Search Request Processor with a Cohere embedding model (cohere.ai/v1/embed), as below:

PUT /_search/pipeline/my_pipeline_neural_search
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "K7WVcZEBXV92Z6odCZGJ",
        "query_template": """{
                              "query": {
                                "knn": {
                                  "review_embedding": {
                                    "vector": ${modelPredictionOutcome},
                                    "k": 5
                                  }
                                }
                              }
                            }""",
        "function_name": "REMOTE",
        "input_map": [
          {
            "texts": "query.neural.review_embedding.query_text"
          }
        ],
        "output_map": [
          {
            "modelPredictionOutcome": "embeddings[0]"
          }
        ],
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}
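To make the configuration above concrete, here is a rough sketch of what the processor does with input_map, output_map, and query_template. The helper function and variable names are illustrative only, not the actual plugin code, and the embedding vector is a made-up stand-in for the model output:

```python
import json

def get_by_path(obj, dotted_path):
    """Walk a nested dict following a dotted path such as
    'query.neural.review_embedding.query_text'."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj

# Incoming search request body (from the example below).
request_body = {
    "query": {
        "neural": {
            "review_embedding": {"query_text": "good review", "k": 5}
        }
    }
}

# input_map: read the model input field out of the request.
model_input = get_by_path(request_body, "query.neural.review_embedding.query_text")

# output_map: suppose the embedding model returned this vector.
model_prediction_outcome = [0.1, 0.2, 0.3]

# query_template: substitute ${modelPredictionOutcome} and parse the result.
query_template = """{
  "query": {
    "knn": {
      "review_embedding": {"vector": ${modelPredictionOutcome}, "k": 5}
    }
  }
}"""
rewritten = json.loads(
    query_template.replace("${modelPredictionOutcome}",
                           json.dumps(model_prediction_outcome))
)
```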

A common use case is to call the query with a query string:

GET /review_string_index/_search?search_pipeline=my_pipeline_neural_search
{
  "query": {
    "neural": {
      "review_embedding": {
        "query_text": "good review",
        "k": 5
      }
    }
  }
}

and it will be rewritten to:

GET /review_string_index/_search 
{
  "query": {
    "knn": {
      "review_embedding": {
        "vector": "<model inference vector>",
        "k": 5
      }
    }
  }
}

However, if I add the metadata field _source to the search request body, for example,

GET /review_string_index/_search?search_pipeline=my_pipeline_neural_search
{
  "_source": {
    "excludes": [
      "review_embedding"
    ]
  },
  "query": {
    "neural": {
      "review_embedding": {
        "query_text": "good review",
        "model_id": "K7WVcZEBXV92Z6odCZGJ",
        "k": 5
      }
    }
  }
}

it will still be rewritten to the same knn query, and the metadata field _source will be ignored:

GET /review_string_index/_search 
{
  "query": {
    "knn": {
      "review_embedding": {
        "vector": "<model inference vector>",
        "k": 5
      }
    }
  }
}
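This matches the behavior described above: the rendered query_template becomes the entire new request body, so any top-level fields of the original request that the template does not mention (here, _source) are simply discarded. A minimal sketch of that effect (illustrative, not the plugin's actual code):

```python
import json

# Original request body, including the _source metadata field.
original_request = {
    "_source": {"excludes": ["review_embedding"]},
    "query": {
        "neural": {
            "review_embedding": {"query_text": "good review", "k": 5}
        }
    },
}

# The query_template only describes a "query" section.
template = ('{"query": {"knn": {"review_embedding": '
            '{"vector": ${modelPredictionOutcome}, "k": 5}}}}')

# Rendering the template produces the WHOLE new body, so _source is lost.
new_body = json.loads(
    template.replace("${modelPredictionOutcome}", json.dumps([0.1, 0.2, 0.3]))
)
```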

What solution would you like?
Maybe add a parameter to opt in to carrying over query fields other than the query string, including _source, sort, search_after, etc.
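One possible shape for this carry-over (the flag name and the field list here are assumptions, not a committed design): when the opt-in flag is set, copy the metadata fields from the original request body into the body produced by query_template, unless the template already set them:

```python
# Hypothetical carry-over: the field list and flag are illustrative only.
CARRY_OVER_FIELDS = ["_source", "sort", "search_after", "size", "from"]

def rewrite_with_carry_over(original_body: dict, template_body: dict,
                            carry_over: bool = True) -> dict:
    """Merge selected metadata fields from the original request into the
    template-generated request body."""
    new_body = dict(template_body)
    if carry_over:
        for field in CARRY_OVER_FIELDS:
            # Keep the original value unless the template set one itself.
            if field in original_body and field not in new_body:
                new_body[field] = original_body[field]
    return new_body

original = {
    "_source": {"excludes": ["review_embedding"]},
    "query": {"neural": {"review_embedding": {"query_text": "good review",
                                              "k": 5}}},
}
rewritten = {
    "query": {"knn": {"review_embedding": {"vector": [0.1, 0.2, 0.3], "k": 5}}}
}
merged = rewrite_with_carry_over(original, rewritten)
# merged now keeps "_source" alongside the rewritten knn query.
```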


wrigleyDan commented

I think that would be a very useful feature and would love to see that implemented.

I'm currently trying to use the ml_inference processor in a hybrid search scenario, which is giving me a hard time, and I wonder if implementing this feature request would let me overcome the obstacles.

For context, I specify the hybrid search query together with the normalization-processor as part of a search_pipeline in the query_template of the ml_inference processor:

PUT _search/pipeline/ml_inference_pipeline
{
  "description": "search with predictions",
  "request_processors": [
    {
      "ml_inference": {
        "function_name": "remote",
        "full_response_path": true,
        "model_id": model_id,
        "model_input": """{ "parameters": {"input": "${input_map.features}"}}""",
        "query_template": """{
  "_source": {
    "excludes": [
      "title_embedding"
    ],
    "includes": "product_title"
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "multi_match": {
            "type": "best_fields",
            "fields": [
              "product_id^100",
              "product_bullet_point^3",
              "product_color^2",
              "product_brand^5",
              "product_description",
              "product_title^10"
            ],
            "operator": "and",
            "query": "iphone"
          }
        },
        {
          "neural": {
            "title_embedding": {
              "query_text": "iphone",
              "k": 50
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "track_total_hits": true,
  "search_pipeline": {
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "l2"
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [
                ${keywordness},
                ${neuralness}
              ]
            }
          }
        }
      }
    ]
  }
}""",
        "input_map": [
          {
            "features": "query.term.features.value"
          }
        ],
        "output_map": [
          {
            "neuralness": "$.inference_results[0].output[0].dataAsMap.neuralness",
            "keywordness": "$.inference_results[0].output[0].dataAsMap.keywordness"
          }
        ],
        "ignore_missing": false,
        "ignore_failure": false
      }
    },
    {
      "neural_query_enricher": {
        "description": "one of many search pipelines for experimentation",
        "default_model_id": "i6jHTZMBflg_ePyfu9EK",
        "neural_field_default_id": {
            "title_embeddings": "i6jHTZMBflg_ePyfu9EK"
          }
      }
    }
  ]
}
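For reference, when the normalization-processor does run, l2 normalization divides each subquery's scores by the l2 norm of that subquery's score list, and arithmetic_mean combination takes the weighted mean across subqueries, so the final scores should land in a small bounded range rather than values like -9549512000. A sketch of that math, assuming the documented technique semantics (this is not the plugin's exact code, and the scores and weights are made up):

```python
import math

def l2_normalize(scores):
    """l2 normalization: divide each score by the l2 norm of the list."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm else scores

def arithmetic_mean(sub_scores, weights):
    """Weighted arithmetic mean of one document's per-subquery scores."""
    return sum(w * s for w, s in zip(weights, sub_scores)) / sum(weights)

# Two subqueries (lexical multi_match and neural), scores for three docs each.
bm25 = l2_normalize([80.0, 40.0, 10.0])
neural = l2_normalize([0.9, 0.7, 0.2])

# ${keywordness} and ${neuralness} stand-ins; illustrative values only.
weights = [0.3, 0.7]
combined = [arithmetic_mean([b, n], weights) for b, n in zip(bm25, neural)]
# Every combined score ends up within [0, 1].
```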

Calling it returns a hit list with scores that are not normalized.
Query:

POST ecommerce/_search?search_pipeline=ml_inference_pipeline

{
  "query": {
    "term": {
      "features": {
        "value": "2, 0, 169, 1.1657, 8.58744, 0.67777"
      }
    }
  }
}

Response:

{
  "took": 74,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 15129,
      "relation": "eq"
    },
    "max_score": 80.15181,
    "hits": [
      {
        "_index": "...",
        "_id": "...",
        "_score": -9549512000,
        "_source": {...}
      }
    ]
  }
}

Or am I missing something? When the hybrid search query is part of the query_template, the normalization-processor appears to be ignored. My gut feeling is that this happens because we are defining a search_pipeline (with phase_results_processors) inside a search_pipeline (with the ml_inference processor).

mingshl (Collaborator, Author) commented Dec 4, 2024

@wrigleyDan this is a pretty complicated use case.

You create a search pipeline that has an ML inference request processor and a neural_query_enricher request processor; this should be fine.

But within the ML inference request processor, you also rewrite the query with an inner search pipeline that has a normalization-processor, which is a phase_results_processor. I have to debug a bit within the ML inference request processor to confirm whether the inner pipeline is actually created; I am not sure about this point.

Can you tell me which model you are using, and provide one dummy document so I can reproduce your case?

wrigleyDan commented

@mingshl thanks for getting back to me!

FYI: I opened a feature request today and briefly discussed it in today's search community meeting: opensearch-project/OpenSearch#16775 (comment)

Seems like there is a path moving forward, which is referenced in the comments.
