Disallow removing some metadata fields by remove ingest processor #10895
Signed-off-by: Gao Binlong <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status: checks if related components are compatible with change a12f312
Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git]
Skipped components:
Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/sql.git]
Signed-off-by: Gao Binlong <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
modules/ingest-common/src/main/java/org/opensearch/ingest/common/RemoveProcessor.java
@gaobinlong a few minor comments but LGTM otherwise, thank you! |
Signed-off-by: Gao Binlong <[email protected]>
❌ Gradle check result for 3cb50e8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Gao Binlong <[email protected]>
All Gradle checks have now passed, but the code coverage check has not; that is because of the randomization in our test code.
@gaobinlong we would need documentation update for this change as well, could you please create the issue at https://github.com/opensearch-project/documentation-website/issues ? |
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10895-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a0b016bf154cf765483f38c4c7f135ae972004c2
# Push it to GitHub
git push --set-upstream origin backport/backport-10895-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the |
@gaobinlong and we need manual backport to |
…ensearch-project#10895) * Ignore metadata fields when removing fields by remove ingest processor Signed-off-by: Gao Binlong <[email protected]> * Modify change log Signed-off-by: Gao Binlong <[email protected]> * Throw exception when removing some metadata fields Signed-off-by: Gao Binlong <[email protected]> * Format the code Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Fix test failure Signed-off-by: Gao Binlong <[email protected]> * Add skip config in yml test file Signed-off-by: Gao Binlong <[email protected]> * Improve test coverage Signed-off-by: Gao Binlong <[email protected]> * Optimize some code Signed-off-by: Gao Binlong <[email protected]> --------- Signed-off-by: Gao Binlong <[email protected]> (cherry picked from commit a0b016b)
…t processor (#11311) * Disallow removing some metadata fields by remove ingest processor (#10895) * Ignore metadata fields when removing fields by remove ingest processor Signed-off-by: Gao Binlong <[email protected]> * Modify change log Signed-off-by: Gao Binlong <[email protected]> * Throw exception when removing some metadata fields Signed-off-by: Gao Binlong <[email protected]> * Format the code Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Fix test failure Signed-off-by: Gao Binlong <[email protected]> * Add skip config in yml test file Signed-off-by: Gao Binlong <[email protected]> * Improve test coverage Signed-off-by: Gao Binlong <[email protected]> * Optimize some code Signed-off-by: Gao Binlong <[email protected]> --------- Signed-off-by: Gao Binlong <[email protected]> (cherry picked from commit a0b016b) * Remove duplicated changelog Signed-off-by: Gao Binlong <[email protected]> --------- Signed-off-by: Gao Binlong <[email protected]>
…ensearch-project#10895) * Ignore metadata fields when removing fields by remove ingest processor Signed-off-by: Gao Binlong <[email protected]> * Modify change log Signed-off-by: Gao Binlong <[email protected]> * Throw exception when removing some metadata fields Signed-off-by: Gao Binlong <[email protected]> * Format the code Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Remove calling toLowerCase() Signed-off-by: Gao Binlong <[email protected]> * Fix test failure Signed-off-by: Gao Binlong <[email protected]> * Add skip config in yml test file Signed-off-by: Gao Binlong <[email protected]> * Improve test coverage Signed-off-by: Gao Binlong <[email protected]> * Optimize some code Signed-off-by: Gao Binlong <[email protected]> --------- Signed-off-by: Gao Binlong <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Description
Removing metadata fields such as _index, _version, and _id can cause unexpected results, such as an OpenSearch process crash or a null_pointer_exception, so the remove ingest processor should check for these fields to prevent that from happening.
In the current implementation, the enriched document IngestDocument contains these metadata fields: _index, _version, _id, _version_type, and _routing (see OpenSearch/server/src/main/java/org/opensearch/ingest/IngestDocument.java, line 83 in 5bd413c). In most cases these metadata fields are used internally, but they can currently be removed by the remove ingest processor. Removing them is meaningless because they have corresponding query parameters in the index API, and the remove ingest processor should not disrupt that API. Users who want to omit those fields should omit the corresponding query parameters of the index API. The remove ingest processor should focus only on the source fields of incoming documents, which keeps the responsibilities of the index API and the remove ingest processor clearly separated.
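The kind of check described above can be sketched as follows. This is a minimal illustration only, not the code in RemoveProcessor.java: the class MetadataGuardSketch and the helpers isMetadataField and removeField are hypothetical names, the document is modeled as a plain Map rather than an IngestDocument, and the error message wording is an assumption. The sketch rejects the removal with an exception; silently skipping the metadata field instead would be a one-line change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: guards metadata fields against removal.
// Not the actual OpenSearch implementation.
final class MetadataGuardSketch {

    // The metadata field names listed in the description above.
    static final Set<String> METADATA_FIELDS =
        Set.of("_index", "_version", "_id", "_version_type", "_routing");

    static boolean isMetadataField(String field) {
        return METADATA_FIELDS.contains(field);
    }

    // Removes a source field, refusing to touch metadata fields.
    // The document is modeled as a plain map (an assumption of this sketch).
    static void removeField(Map<String, Object> document, String field) {
        if (isMetadataField(field)) {
            // Assumed message wording, for illustration only.
            throw new IllegalArgumentException("cannot remove metadata field [" + field + "]");
        }
        document.remove(field);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_index", "logs");
        doc.put("_id", "1");
        doc.put("message", "hello");

        // A regular source field is removed as usual.
        removeField(doc, "message");
        System.out.println(doc.containsKey("message")); // prints "false"

        // A metadata field is rejected and left in place.
        try {
            removeField(doc, "_index");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "cannot remove metadata field [_index]"
        }
    }
}
```

Keeping the field names in a single set means the guard stays a constant-time lookup and the list of protected fields can be adjusted in one place.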
There are two options for preventing the removal of metadata fields in the remove ingest processor. One is to throw an illegal_argument_exception with an error message like "cannot remove metadata fields..."; the other is to ignore the metadata fields when removing fields from the source document. In the long run, we may support an exclude parameter for the remove processor; in that case, metadata fields that don't match the exclude pattern should not be removed but ignored. To keep this behavior consistent, I prefer the option of ignoring all metadata fields.
Related Issues
#10732
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.