[BUG] Unable to delete an index when a merge is ongoing #168

Closed
kotwanikunal opened this issue Jul 18, 2024 · 0 comments
Labels
bug Something isn't working untriaged


kotwanikunal commented Jul 18, 2024

Describe the bug

OpenSearch merges an index's Lucene segments in the background, and also allows the user to force merge the segments on demand.

If a user tries to delete an index while a merge is ongoing, the deletion fails and the node drops out of the cluster, because the delete operation could not be performed on that node.

Related component

Indexing

To Reproduce

  1. Create a multi-node OpenSearch 2.15 cluster using the installation guide (https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/).
  2. Run an ingestion load using opensearch-benchmark workloads.
    Sample command:
      opensearch-benchmark execute-test \
        --target-hosts='http://<host>:9200' \
        --workload=nyc_taxis \
        --include-tasks=create-index,index \
        --pipeline=benchmark-only \
        --kill-running-processes &
  3. Start a force merge on the index: curl -X POST 'http://localhost:9200/nyc_taxis/_forcemerge?max_num_segments=1'
  4. While the merge is ongoing, try to delete the index: curl -X DELETE 'http://localhost:9200/nyc_taxis'

The delete operation fails.
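
For scripted runs, steps 3 and 4 can be approximated with a small Java program. This is only a sketch: the class name `MergeDeleteRepro`, the default base URL, and the 5-second delay between the two calls are assumptions, not part of the original report, and it uses only the JDK's `java.net.http` client (Java 11+).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper that issues the force merge and delete calls from steps 3 and 4. */
final class MergeDeleteRepro {

    /** Builds POST {baseUrl}/{index}/_forcemerge?max_num_segments=1. */
    static HttpRequest forceMerge(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_forcemerge?max_num_segments=1"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    /** Builds DELETE {baseUrl}/{index}. */
    static HttpRequest deleteIndex(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index)).DELETE().build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200"; // assumed default
        HttpClient client = HttpClient.newHttpClient();

        // Start the force merge without waiting for its response.
        CompletableFuture<HttpResponse<String>> merge =
            client.sendAsync(forceMerge(baseUrl, "nyc_taxis"), HttpResponse.BodyHandlers.ofString());

        // Assumed delay so that the merge is in flight when the delete arrives.
        Thread.sleep(5_000);

        HttpResponse<String> delete =
            client.send(deleteIndex(baseUrl, "nyc_taxis"), HttpResponse.BodyHandlers.ofString());
        System.out.println("delete: " + delete.statusCode() + " " + delete.body());
        System.out.println("merge: " + merge.join().body());
    }
}
```

The delay may need tuning; the delete only exercises the bug if it arrives while the merge is still running.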

Expected behavior

  • A delete operation should take precedence over a merge operation: it should stop any ongoing merges on the index, ideally by cancelling them.
    • Merge operations could be wrapped with CancellableThreads or executed as CancellableTask instances so that the merge can be interrupted.

Possible solutions

Short term solution
This problem is very similar to the long-running operation of snapshotting an index, which already has protections in the core code base (the snapshot check in deleteIndices, shown below).

Following the same approach, the proposed solution is to reject index deletions while a merge operation is ongoing.
This safeguards against node drops by restricting deletes on indices that are being merged.

/**
 * Delete some indices from the cluster state.
 */
public ClusterState deleteIndices(ClusterState currentState, Set<Index> indices) {
    final Metadata meta = currentState.metadata();
    final Set<Index> indicesToDelete = new HashSet<>();
    final Map<Index, DataStream> backingIndices = new HashMap<>();
    for (Index index : indices) {
        ...
    }

    // Check if index deletion conflicts with any running snapshots
    Set<Index> snapshottingIndices = SnapshotsService.snapshottingIndices(currentState, indicesToDelete);
    if (snapshottingIndices.isEmpty() == false) {
        throw new SnapshotInProgressException(
            "Cannot delete indices that are being snapshotted: "
                + snapshottingIndices
                + ". Try again after snapshot finishes or cancel the currently running snapshot."
        );
    }

    // Proposed: check if index deletion conflicts with any running merges.
    // mergingIndices(...) and MergeInProgressException are part of this proposal and do not exist yet.
    Set<Index> mergingIndices = clusterService.mergingIndices(currentState, indicesToDelete);
    if (mergingIndices.isEmpty() == false) {
        throw new MergeInProgressException(
            "Cannot delete indices that are being merged: "
                + mergingIndices
                + ". Try again after merge finishes."
        );
    }
    ...
    ...
}
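
To illustrate the guard outside of the OpenSearch code base, here is a minimal self-contained sketch. `MergeGuard`, its methods, and this `MergeInProgressException` are hypothetical stand-ins that use plain index names; how running merges would actually be tracked inside OpenSearch is left open by the proposal.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical exception mirroring the proposed MergeInProgressException. */
class MergeInProgressException extends RuntimeException {
    MergeInProgressException(String message) {
        super(message);
    }
}

/** Hypothetical tracker of running merges per index; not an OpenSearch class. */
final class MergeGuard {
    // Number of merges currently running for each index name.
    private final Map<String, Integer> runningMerges = new HashMap<>();

    synchronized void mergeStarted(String index) {
        runningMerges.merge(index, 1, Integer::sum);
    }

    synchronized void mergeFinished(String index) {
        // Returning null removes the entry once the last merge has finished.
        runningMerges.computeIfPresent(index, (name, count) -> count > 1 ? count - 1 : null);
    }

    /** Returns the subset of indicesToDelete that currently have a merge running. */
    synchronized Set<String> mergingIndices(Set<String> indicesToDelete) {
        Set<String> conflicts = new HashSet<>(indicesToDelete);
        conflicts.retainAll(runningMerges.keySet());
        return conflicts;
    }

    /** Rejects the deletion if any requested index is being merged. */
    void ensureDeletable(Set<String> indicesToDelete) {
        Set<String> conflicts = mergingIndices(indicesToDelete);
        if (conflicts.isEmpty() == false) {
            throw new MergeInProgressException(
                "Cannot delete indices that are being merged: " + conflicts + ". Try again after merge finishes."
            );
        }
    }
}
```

With this shape, a caller would invoke `ensureDeletable` before removing the indices, in the same place the snapshot check runs today.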

Long term solution
The long-term solution would require the merge operation to follow the constructs of a CancellableTask, so that whenever a deletion request comes in, the ongoing merge operation can be cancelled, since deletion takes precedence over a merge.
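
The cooperative-cancellation idea can be sketched as follows. This is not the OpenSearch CancellableTask API: `CancellableMerge` and `DeleteWithCancel` are hypothetical classes, and the merge work is simulated with a short sleep per segment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a merge that checks a cancellation flag between units of work. */
final class CancellableMerge implements Runnable {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final int totalSegments;
    private volatile int mergedSegments = 0;

    CancellableMerge(int totalSegments) {
        this.totalSegments = totalSegments;
    }

    void cancel() {
        cancelled.set(true);
    }

    int mergedSegments() {
        return mergedSegments;
    }

    @Override
    public void run() {
        for (int i = 0; i < totalSegments; i++) {
            if (cancelled.get()) {
                return; // stop at the next safe point instead of finishing the merge
            }
            mergeOneSegment();
            mergedSegments = i + 1;
        }
    }

    /** Simulated unit of merge work. */
    private void mergeOneSegment() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            cancelled.set(true); // treat interruption as cancellation
        }
    }
}

/** Hypothetical delete path: cancel the merge first, then proceed with the deletion. */
final class DeleteWithCancel {
    /** Returns true once the merge thread has stopped and the index could be removed. */
    static boolean deleteIndex(CancellableMerge merge, Thread mergeThread) {
        merge.cancel();
        try {
            mergeThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // At this point no merge is touching the index files, so deletion can proceed.
        return true;
    }
}
```

The design choice here is that the delete never waits for the merge to complete, only for it to reach its next cancellation check.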

Additional Details

Logs

On merge segments call
 
 curl -X POST 'http://localhost:9200/nyc_taxis/_forcemerge?max_num_segments=1'
{"_shards":{"total":1,"successful":0,"failed":1,"failures":[{"shard":0,"index":"nyc_taxis","status":"INTERNAL_SERVER_ERROR","reason":{"type":"i_o_exception","reason":"background merge hit exception: _ig(9.10.0):c29067:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721264002631}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4n _4r(9.10.0):C35941387:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721261478494, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyjh _5v(9.10.0):c2363928:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721261688310, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0ky2q _if(9.10.0):c4326041:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263965491, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4o _ie(9.10.0):c177225:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721264002428}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4m _9a(9.10.0):C36063101:[diagnostics={source=merge, 
os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721262288898, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kys2 _id(9.10.0):c333281:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263998156}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4k _ds(9.10.0):c1615049:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263093453, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyqu _ic(9.10.0):c484293:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721264000846}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4l _dz(9.10.0):c3495612:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263114700, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyt8 _ib(9.10.0):c92550:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263956858}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] 
:id=dffgu016a0oidvb3yk3s0kz46 _eh(9.10.0):C36347450:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263177380, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4q _i5(9.10.0):c3256642:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263889150, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz41 _fc(9.10.0):c3298467:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263346662, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyvt _fj(9.10.0):c515:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263389585}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyvn _i2(9.10.0):c31185:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263889224}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz3n _i1(9.10.0):c5932:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, 
timestamp=1721263888998}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz3h _fw(9.10.0):c5368186:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263463843, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyyd _i0(9.10.0):c21002:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263889101}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz3i _ga(9.10.0):c3201567:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263582161, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyzm _gj(9.10.0):c3847:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263600548}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kyyo _hz(9.10.0):c32751:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263887299}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz3a _h1(9.10.0):c69409:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, 
java.runtime.version=21.0.3+9-LTS, timestamp=1721263687976}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz0a _hx(9.10.0):c108946:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263888034}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz39 _hw(9.10.0):c40099:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263877309}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz30 _ha(9.10.0):c5952700:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263763897, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz2r _hb(9.10.0):c10048:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263766097}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz19 _hv(9.10.0):c9810968:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263852980, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz4p _hc(9.10.0):c137015:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, 
java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263779623}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1j _hl(9.10.0):c3458020:[diagnostics={source=merge, os.arch=amd64, java.runtime.version=21.0.3+9-LTS, mergeFactor=10, os=Linux, java.vendor=Eclipse Adoptium, os.version=6.1.96-102.177.amzn2023.x86_64, timestamp=1721263782050, mergeMaxNumSegments=-1, lucene.version=9.10.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz2d _hi(9.10.0):c1751:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263781575}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1t _he(9.10.0):c39256:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263777862}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1k _hh(9.10.0):c27820:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263781950}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1s _hf(9.10.0):c58239:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263781804}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1r _hg(9.10.0):c66650:[diagnostics={source=flush, lucene.version=9.10.0, os.version=6.1.96-102.177.amzn2023.x86_64, os.arch=amd64, os=Linux, java.vendor=Eclipse 
Adoptium, java.runtime.version=21.0.3+9-LTS, timestamp=1721263782346}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_COMPRESSION}] :id=dffgu016a0oidvb3yk3s0kz1y into _ih [maxNumSegments=1] [ABORTED]","caused_by":{"type":"i_o_exception","reason":"Merge aborted."}}}]}}
 
 
 
On delete index call
 
[~]$ curl -X DELETE 'http://localhost:9200/nyc_taxis'
curl -X PUT 'http://localhost:9200/target_index2'
curl -X DELETE 'http://localhost:9200/target_index2'
{"acknowledged":false}{"error":{"root_cause":[{"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (create-index [target_index2], cause [api]) within 30s"}],"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (create-index [target_index2], cause [api]) within 30s"},"status":503}{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [target_index2]","index":"target_index2","resource.id":"target_index2","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [target_index2]","index":"target_index2","resource.id":"target_index2","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}


@kotwanikunal kotwanikunal added bug Something isn't working untriaged labels Jul 18, 2024