Add verbose pipeline parameter to output each processor's execution details #16843

Open: wants to merge 16 commits into base: main

junweid62 commented Dec 13, 2024

Jan 16 revision:

  1. Add two new fields: status and error
    • status: the execution status of the processor, for example SUCCESS or FAIL.
    • error: the error message or details if the processor fails during execution. If no error occurs, this field is empty.
  2. When verbose pipeline mode is enabled, an error in a processor does not interrupt the search; the error is recorded and the search continues.

When verbose pipeline mode is enabled and a processor encounters an error, the search is not interrupted. Instead, the error is recorded in that processor's execution details (ProcessorExecutionDetail), and the rest of the search proceeds as normal. For example, a failed rename_field step is reported as:

{
    "processor_name": "rename_field",
    "duration_millis": 0,
    "status": "fail",
    "error": "Document with id 1 is missing field messag123e",
    "input_data": [
        {
            "_index": "my_index",
            "_id": "1",
            "_score": 1.0,
            "_source": {
                "message": "This is a public message",
                "visibility": "public"
            }
        },
        {
            "_index": "my_index",
            "_id": "2",
            "_score": 1.0,
            "_source": {
                "message": "This is a private message",
                "visibility": "private"
            }
        }
    ],
    "output_data": null
}
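The record-and-continue behavior can be sketched in plain Java. This is a minimal, hypothetical model rather than the PR's code: `VerboseRunner`, `Detail`, and `NamedProcessor` are made-up names, each processor is a `UnaryOperator` over a list of hits, and it assumes (the PR text does not say so) that a failed processor's input is passed unchanged to the next processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical model of verbose-mode execution; not OpenSearch's implementation.
final class VerboseRunner {

    enum Status { SUCCESS, FAIL }

    /** One entry per processor, mirroring the fields shown in the example output. */
    record Detail(String processorName, long durationMillis, Status status,
                  String error, Object inputData, Object outputData) {}

    /** A named processor that transforms a list of hits (each hit modeled as a Map). */
    record NamedProcessor(String name, UnaryOperator<List<Map<String, Object>>> fn) {}

    record Result(List<Map<String, Object>> hits, List<Detail> details) {}

    static Result run(List<NamedProcessor> processors, List<Map<String, Object>> hits) {
        List<Detail> details = new ArrayList<>();
        List<Map<String, Object>> current = hits;
        for (NamedProcessor p : processors) {
            long start = System.nanoTime();
            try {
                List<Map<String, Object>> out = p.fn().apply(current);
                long took = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                details.add(new Detail(p.name(), took, Status.SUCCESS, null, current, out));
                current = out;
            } catch (RuntimeException e) {
                long took = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                // Record the failure instead of aborting; output_data stays null and,
                // by assumption, the unchanged input flows on to the next processor.
                details.add(new Detail(p.name(), took, Status.FAIL, e.getMessage(), current, null));
            }
        }
        return new Result(current, details);
    }
}
```

Running a processor that throws followed by one that succeeds yields two details: the first with status FAIL, its error message, and null output (as in the JSON above), the second with status SUCCESS.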

Description

Related RFC: #16705

This PR adds a verbose mode to OpenSearch search pipelines to make search request and response transformations easier to trace and debug. As pipelines accumulate more processors, it becomes harder to tell what each one did; verbose mode reports each processor's execution details alongside the search response.

  1. Adds Verbose Mode for Search Pipelines:

    • Introduces the verbose_pipeline parameter for search requests (defaults to false).
    • Tracks each processor’s input, output, execution time, and status (success/failure).
    • Reports how data flows through the request and response processors.
  2. Improves Pipeline Debugging:

    • Captures step-by-step data transformations applied by each processor.
    • Includes execution metadata, such as elapsed time (duration_millis), in the search response for easier analysis.
  3. Supports All Pipeline Configurations:

    • Works with:
      • Default pipelines configured at the index level.
      • Pipelines explicitly specified in the search request.
      • Ad-hoc pipelines defined inline.
  4. Test Framework Enhancements:

    • Added tests for verbose mode to ensure correct functionality and compatibility.
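As a rough usage sketch, the snippet below only builds request URIs for a verbose search and sends nothing. It assumes the flag is passed as a `verbose_pipeline=true` query parameter next to the existing `search_pipeline` parameter; `VerboseSearchRequests` is a hypothetical helper. Passing `null` for the pipeline id models the index-default and ad hoc (inline, request-body) configurations, where no pipeline name appears in the URL.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; parameter names assume the REST form described in the PR.
final class VerboseSearchRequests {

    /**
     * Builds the target URI for a verbose search on one index.
     * pipelineId == null: rely on the index default pipeline or an inline pipeline
     * in the request body; otherwise select the named pipeline explicitly.
     */
    static URI verboseSearchUri(String baseUrl, String index, String pipelineId) {
        StringBuilder query = new StringBuilder("verbose_pipeline=true");
        if (pipelineId != null) {
            query.append("&search_pipeline=")
                 .append(URLEncoder.encode(pipelineId, StandardCharsets.UTF_8));
        }
        return URI.create(baseUrl + "/" + index + "/_search?" + query);
    }
}
```

Because verbose mode only makes sense when some pipeline applies, the null case assumes the index has a default pipeline or the request body defines one inline.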

Example output with the filter_query request processor and the rename_field and sort response processors:

{
    "took": 29,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.9116078,
        "hits": [
            {
                "_index": "my_index",
                "_id": "2",
                "_score": 0.9116078,
                "_source": {
                    "notification": "This is a public message 2",
                    "visibility": "public"
                }
            }
        ]
    },
    "processor_result": [
        {
            "processor_name": "filter_query",
            "duration_millis": 0,
            "input_data": {
                "query": {
                    "bool": {
                        "must": [
                            {
                                "match": {
                                    "message": {
                                        "query": "This is a public message 1",
                                        "operator": "OR",
                                        "prefix_length": 0,
                                        "max_expansions": 50,
                                        "fuzzy_transpositions": true,
                                        "lenient": false,
                                        "zero_terms_query": "NONE",
                                        "auto_generate_synonyms_phrase_query": true,
                                        "boost": 1.0
                                    }
                                }
                            }
                        ],
                        "adjust_pure_negative": true,
                        "boost": 1.0
                    }
                },
                "search_pipeline": "my_pipeline"
            },
            "output_data": {
                "query": {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "match": {
                                                "message": {
                                                    "query": "This is a public message 1",
                                                    "operator": "OR",
                                                    "prefix_length": 0,
                                                    "max_expansions": 50,
                                                    "fuzzy_transpositions": true,
                                                    "lenient": false,
                                                    "zero_terms_query": "NONE",
                                                    "auto_generate_synonyms_phrase_query": true,
                                                    "boost": 1.0
                                                }
                                            }
                                        }
                                    ],
                                    "adjust_pure_negative": true,
                                    "boost": 1.0
                                }
                            }
                        ],
                        "filter": [
                            {
                                "term": {
                                    "visibility": {
                                        "value": "public",
                                        "boost": 1.0
                                    }
                                }
                            }
                        ],
                        "adjust_pure_negative": true,
                        "boost": 1.0
                    }
                },
                "search_pipeline": "my_pipeline"
            }
        },
        {
            "processor_name": "rename_field",
            "duration_millis": 0,
            "input_data": [
                {
                    "_index": "my_index",
                    "_id": "2",
                    "_score": 0.9116078,
                    "_source": {
                        "message": "This is a public message 2",
                        "visibility": "public"
                    }
                }
            ],
            "output_data": [
                {
                    "_index": "my_index",
                    "_id": "2",
                    "_score": 0.9116078,
                    "_source": {
                        "notification": "This is a public message 2",
                        "visibility": "public"
                    }
                }
            ]
        },
        {
            "processor_name": "sort",
            "duration_millis": 0,
            "input_data": [
                {
                    "_index": "my_index",
                    "_id": "2",
                    "_score": 0.9116078,
                    "_source": {
                        "notification": "This is a public message 2",
                        "visibility": "public"
                    }
                }
            ],
            "output_data": [
                {
                    "_index": "my_index",
                    "_id": "2",
                    "_score": 0.9116078,
                    "_source": {
                        "notification": "This is a public message 2",
                        "visibility": "public"
                    }
                }
            ]
        }
    ]
}
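Reading the example: rename_field moved `message` to `notification` in the hit's `_source`, and sort left the single hit unchanged. The rename step can be sketched on a plain `Map` as below. `RenameFieldSketch` is hypothetical, and its error text only imitates the failure message in the status/error example; it is not the processor's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a field rename on one hit's _source.
final class RenameFieldSketch {

    /** Returns a copy of source with field `from` renamed to `to`; the input map is untouched. */
    static Map<String, Object> rename(String docId, Map<String, Object> source,
                                      String from, String to) {
        if (!source.containsKey(from)) {
            // Imitates the shape of the error shown in the verbose failure example.
            throw new IllegalArgumentException(
                "Document with id " + docId + " is missing field " + from);
        }
        Map<String, Object> copy = new LinkedHashMap<>(source);
        copy.put(to, copy.remove(from));
        return copy;
    }
}
```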

Related Issues

Resolves #14745

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the enhancement, Other, Priority-High, Search, and v2.19.0 labels Dec 13, 2024
junweid62 added the backport 2.x label Dec 13, 2024
junweid62 force-pushed the search-pipline-execution branch from d5a2c4c to d931750 December 13, 2024 00:26
Junwei Dai added 8 commits January 15, 2025 13:57
2. use existing xcontentUtil to read
3. move processor execution key to ProcessorExecutionDetail
junweid62 force-pushed the search-pipline-execution branch from a805cd9 to 021ecec January 15, 2025 22:00
Contributor

✅ Gradle check result for 021ecec: SUCCESS

Member

owaiskazi19 left a comment

Good start. A few questions:

  1. Search pipelines can be used in three ways: as a default pipeline, as a temporary pipeline defined in the search request, and through the query parameter. Before applying the verbose pipeline parameter, we should check whether a search pipeline is applied to the search request. You mentioned this check in resolvePipeline in the RFC, but I don't see it in the PR.

  2. Once the check is in place, we need three more tests for verbose pipeline execution, one for each of the ways above.

We are touching a few core search files. @msfroh, can you take a look at it too?

@@ -302,6 +305,9 @@ public SearchSourceBuilder(StreamInput in) throws IOException {
if (in.getVersion().onOrAfter(Version.V_2_18_0)) {
searchPipeline = in.readOptionalString();
}
if (in.getVersion().onOrAfter(Version.CURRENT)) {
Member

This is the usual BWC dance for such cases; we don't need an issue to track it. @junweid62 can take care of it by following these steps:

  1. Do this on main with onOrAfter(Version.V_3_0_0). Get it merged.
  2. You'll need a manual backport to 2.x, where you do onOrAfter(Version.V_2_19_0). Don't get it merged right away.
  3. Before merging the backport to 2.x, open another PR on main to change it to onOrAfter(Version.V_2_19_0).
  4. Merge the backport PR.
  5. Merge the main version update PR.
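The reason for this sequence is that the writer and the reader must gate the new field on the same version; otherwise one side reads bytes the other never wrote. Below is a self-contained sketch using `java.io` streams, where `WireVersion` and `VerboseFlagCodec` are hypothetical stand-ins for OpenSearch's `Version`, `StreamInput`, and `StreamOutput`, and the gate shows the final state after step 3 (`V_2_19_0`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a node's wire version; only the ordering matters here.
record WireVersion(int id) {
    static final WireVersion V_2_18_0 = new WireVersion(2_18_00);
    static final WireVersion V_2_19_0 = new WireVersion(2_19_00);

    boolean onOrAfter(WireVersion other) {
        return id >= other.id;
    }
}

// Hypothetical codec for the verbose flag, gated on the peer's version.
final class VerboseFlagCodec {

    /** Writes the flag only if the peer's version knows about it. */
    static byte[] write(boolean verbose, WireVersion peer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (peer.onOrAfter(WireVersion.V_2_19_0)) {
                out.writeBoolean(verbose);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the flag under the same condition; older peers never sent it, so default to false. */
    static boolean read(byte[] data, WireVersion peer) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return peer.onOrAfter(WireVersion.V_2_19_0) && in.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```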

return;
}

if (data instanceof List) {
Member

We could also have a case where data is an instance of Map. Also, did we try all the core processors to find such cases?
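One way to cover the `Map` case is to branch on every shape the captured data can take. This hypothetical `ExecutionDataCopier` assumes the purpose of the branch is to snapshot processor input and output, so that later processors mutating the same objects cannot change what was recorded; lists and maps are copied recursively and other values are passed through.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: snapshot processor input/output so later mutation doesn't alter the record.
final class ExecutionDataCopier {

    static Object snapshot(Object data) {
        if (data == null) {
            return null;
        }
        if (data instanceof List<?> list) {
            List<Object> copy = new ArrayList<>(list.size());
            for (Object item : list) {
                copy.add(snapshot(item)); // recurse into nested lists and maps
            }
            return copy;
        }
        if (data instanceof Map<?, ?> map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                copy.put(e.getKey(), snapshot(e.getValue()));
            }
            return copy;
        }
        // Other values (assumed immutable, e.g. strings and numbers) are returned as-is.
        return data;
    }
}
```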

*/
@SuppressWarnings("unchecked")
public List<ProcessorExecutionDetail> getProcessorExecutionDetails() {
Object details = attributes.get(PROCESSOR_EXECUTION_DETAILS_KEY);
Member

Why keep it as an Object here instead of List<ProcessorExecutionDetail>, since that's how we added it in the first place? That would remove the check below.

Author

The reason I originally used an Object check was to account for potential flexibility in how values might be stored in attributes. Since attributes is a generic map, I wanted to ensure that even if PROCESSOR_EXECUTION_DETAILS_KEY was accidentally associated with a non-List value, the code would handle it gracefully without throwing a ClassCastException. The instanceof check provided that extra layer of safety.

However, after reviewing the usage of PROCESSOR_EXECUTION_DETAILS_KEY, I confirmed that it is always associated with a List, so I simplified the code as suggested.
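A sketch of one typed alternative: keeping the details in a dedicated `List` field, rather than an `Object` in the generic attributes map, removes both the unchecked cast and the `instanceof` check. `TrackingContext` and `ProcessorDetail` are hypothetical names, not the PR's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical request-scoped context that keeps execution details in a typed field.
final class TrackingContext {

    record ProcessorDetail(String processorName, long durationMillis) {}

    private final List<ProcessorDetail> details = new ArrayList<>();

    void addDetail(ProcessorDetail detail) {
        details.add(detail);
    }

    /** No cast or instanceof needed: the field already has the right type. */
    List<ProcessorDetail> getProcessorExecutionDetails() {
        return Collections.unmodifiableList(details);
    }
}
```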

Member

ohltyler commented Jan 16, 2025

Tracks each processor’s input, output, execution time, and status (success/failure).

Let's make sure to include the status, plus any associated error message at a per-processor granularity as well.

Junwei Dai added 3 commits January 16, 2025 15:44
2. refactor error message

Contributor

❌ Gradle check result for 7276e6c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

2. Removed redundant logic for a cleaner, simpler implementation.

junweid62 requested a review from cwperks as a code owner January 21, 2025 22:51
junweid62 (Author)

Hi @msfroh, thank you for your detailed feedback. I have addressed all your comments and made the suggested changes. Please take a look and let me know if any further adjustments are needed.

Contributor

❌ Gradle check result for 2c5759d: FAILURE

Contributor

❌ Gradle check result for 2c16992: FAILURE

Contributor

✅ Gradle check result for 42e50d0: SUCCESS

Member

owaiskazi19 left a comment

Almost there! Good work @junweid62

import org.opensearch.core.action.ActionListener;

/**
* Wrapper for SearchRequestProcessor to track execution details.
Member

Suggested change
* Wrapper for SearchRequestProcessor to track execution details.
* Wrapper for SearchRequestProcessor to track execution details.
*
* @opensearch.internal

expectThrows(
IllegalArgumentException.class,
() -> searchPipelineService.resolvePipeline(searchRequest, indexNameExpressionResolver)
);
Member

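Add an assertion for the message as well, capturing the exception that expectThrows returns:

IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> searchPipelineService.resolvePipeline(searchRequest, indexNameExpressionResolver));
assertTrue(e.getMessage(), e.getMessage().contains("The 'verbose pipelines' option requires a search pipeline to be defined."));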

* @param wrappedProcessor the actual processor to be wrapped
*/
public TrackingSearchResponseProcessorWrapper(SearchResponseProcessor wrappedProcessor) {
if (wrappedProcessor == null) {
Member

Could we add a test for the null case here?
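A sketch of the wrapper pattern together with the requested null-case test. `ResponseProcessor`, `TrackingWrapper`, and `Execution` are hypothetical, simplified types (the real `SearchResponseProcessor` has a different signature), and the exception message is made up. The constructor rejects a null delegate; `process` snapshots the input, delegates, and records the output and elapsed milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified processor interface (not OpenSearch's SearchResponseProcessor).
interface ResponseProcessor {
    String type();
    List<String> process(List<String> hits);
}

final class TrackingWrapper implements ResponseProcessor {

    record Execution(String processorName, long durationMillis, List<String> input, List<String> output) {}

    private final ResponseProcessor wrapped;
    private final List<Execution> executions = new ArrayList<>();

    TrackingWrapper(ResponseProcessor wrapped) {
        if (wrapped == null) {
            throw new IllegalArgumentException("Wrapped processor cannot be null.");
        }
        this.wrapped = wrapped;
    }

    @Override
    public String type() {
        return wrapped.type();
    }

    @Override
    public List<String> process(List<String> hits) {
        long start = System.nanoTime();
        List<String> input = List.copyOf(hits); // snapshot before the delegate runs
        List<String> output = wrapped.process(hits);
        long took = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        executions.add(new Execution(wrapped.type(), took, input, List.copyOf(output)));
        return output;
    }

    List<Execution> executions() {
        return List.copyOf(executions);
    }
}
```

The null-case test then only needs to construct the wrapper with null and expect an IllegalArgumentException; in the OpenSearch test framework that would typically be an expectThrows call, like the resolvePipeline test earlier in this review.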

@@ -156,7 +160,7 @@ void transformRequest(SearchRequest request, ActionListener<SearchRequest> reque
long took = TimeUnit.NANOSECONDS.toMillis(relativeTimeSupplier.getAsLong() - start);
afterRequestProcessor(processor, took);
onRequestProcessorFailed(processor);
if (processor.isIgnoreFailure()) {
if (processor.isIgnoreFailure() || r.source().verbosePipeline()) {
Member

Why this extra check for verbosePipeline?

return pipeline.transformResponseListener(this, ActionListener.wrap(response -> {
// Extract processor execution details
List<ProcessorExecutionDetail> details = requestContext.getProcessorExecutionDetails();
logger.info("it is going to be executed in [{}]", details);
Member

Leftover?

@@ -426,6 +425,9 @@ public PipelinedRequest resolvePipeline(SearchRequest searchRequest, IndexNameEx
pipeline = pipelineHolder.pipeline;
}
}
if (searchRequest.source() != null && searchRequest.source().verbosePipeline() && pipeline.getId().equals(NOOP_PIPELINE_ID)) {
Member

Suggested change
if (searchRequest.source() != null && searchRequest.source().verbosePipeline() && pipeline.getId().equals(NOOP_PIPELINE_ID)) {
if (searchRequest.source() != null && searchRequest.source().verbosePipeline() && NOOP_PIPELINE_ID.equals(pipelineId) == false) {

@@ -426,6 +425,9 @@ public PipelinedRequest resolvePipeline(SearchRequest searchRequest, IndexNameEx
pipeline = pipelineHolder.pipeline;
}
}
if (searchRequest.source() != null && searchRequest.source().verbosePipeline() && pipeline.getId().equals(NOOP_PIPELINE_ID)) {
throw new IllegalArgumentException("The 'verbose pipelines' option requires a search pipeline to be defined.");
Member

Suggested change
throw new IllegalArgumentException("The 'verbose pipelines' option requires a search pipeline to be defined.");
throw new IllegalArgumentException("The 'verbose pipeline' option requires a search pipeline to be defined.");
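The check and the requested message assertion can be exercised together in a small hypothetical reduction of `resolvePipeline`. `PipelineResolver` is made up, and the no-op id `"_none"` is an assumption about the value of `NOOP_PIPELINE_ID`. It rejects verbose mode only when no real pipeline resolves, and uses the singular wording from the suggestion.

```java
// Hypothetical reduction of the verbose-mode check in resolvePipeline.
final class PipelineResolver {

    // Assumption: the id used when no search pipeline applies.
    static final String NOOP_PIPELINE_ID = "_none";

    static String resolve(String pipelineId, boolean verbosePipeline) {
        // Treat "no pipeline" (null) the same as the no-op pipeline.
        String id = pipelineId == null ? NOOP_PIPELINE_ID : pipelineId;
        if (verbosePipeline && NOOP_PIPELINE_ID.equals(id)) {
            throw new IllegalArgumentException(
                "The 'verbose pipeline' option requires a search pipeline to be defined.");
        }
        return id;
    }
}
```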

* @opensearch.internal
* @since 2.19.0
*/
@PublicApi(since = "2.19.0")
Member

This should be an internal API, which is what L293 states.

Development

Successfully merging this pull request may close these issues.

[Feature Request] Support a verbose/debugging param in search pipelines
7 participants