
[Performance] Possible 2.x search regression related to mandatory soft deletes #7621

Open · jainankitk opened this issue May 18, 2023 · 5 comments
Labels: discuss, Indexing, Performance

Comments

@jainankitk (Collaborator)

Describe the bug
Soft deletes have been enforced from 2.x without any option to disable it through #1903. Enabling soft deletes can leave segments with large number of deleted documents impacting search performance.

A workaround is to lower the default flush threshold from 512MB to a much smaller value, such as 1MB, so that documents are marked eligible for being expunged sooner. The soft deletes retention policy tries to retain all deleted documents above a sequence number, which is the minimum of the global checkpoint (dependent on all follower shard copies) and the local checkpoint of the shard's safe (durable) commit. The latter does not get updated when flushes are infrequent, causing the policy to keep retaining the deleted documents.
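
For illustration, such a settings update might look like the following (a sketch: the index name is a placeholder, and index.translog.flush_threshold_size is a dynamic index setting):

curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.translog.flush_threshold_size": "1mb"
}'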

To Reproduce
Steps to reproduce the behavior:
The behavior can be reproduced using an update-heavy workload where the document size is not very high. OpenSearch flushes will not get triggered, because the default flush threshold requires the translog to reach 512MB in size before the local checkpoint of the shard's safe commit is updated.
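
One illustrative way to observe the symptom while the workload runs is the cat segments API, which reports per-segment deleted-document counts (the index name is a placeholder):

curl "localhost:9200/_cat/segments/my-index?v&h=segment,docs.count,docs.deleted,size"

Segments whose docs.deleted keeps growing without being merged away are the ones the retention policy is holding on to.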

Expected behavior
Since this can impact search performance, and updating the flush threshold can have other consequences, we should provide an option to disable soft deletes, similar to OpenSearch 1.x.
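
For reference, a minimal sketch of what OpenSearch 1.x allowed at index creation time (index.soft_deletes.enabled is a static setting, so it can only be set when the index is created; the index name is a placeholder):

curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.soft_deletes.enabled": false
  }
}'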


@jainankitk added the bug and untriaged labels May 18, 2023
@andrross added the Indexing label May 30, 2023
@anasalkouz (Member)

Hey @jainankitk, thanks for reporting this. I would like to know more about how OpenSearch would handle deletes if soft deletes were disabled. My understanding is that soft deletes are required because of the immutable nature of Lucene segment files.

@nknize any thoughts?

@nknize (Collaborator) commented Jun 6, 2023

> @nknize any thoughts?

Yes, it was a conscious decision to reduce storage cost and improve the performance of peer recoveries, and I have yet to see a detailed, reproducible scenario with the open source OpenSearch distribution that substantiates the need. So I don't suggest we do this without strong justification. Even then, I'd look at other mechanisms before unilaterally rolling back mandatory soft deletes.

> Enabling soft deletes can leave segments with a large number of deleted documents, impacting search performance.

@jainankitk can you post a reproducible benchmark, including segment sizes, geometries, merge policy, etc., for further investigation? Full details are needed to determine this isn't a red herring.

Out of curiosity, what is the behavior when you lower index.soft_deletes.retention_lease.period? This controls how long deletes are retained before being merged away. In the scenario where you're experiencing this behavior, did you trace-log the merges before and after adjusting the retention lease? What does the segment geometry look like? Can you provide specific trace log / cat API output clearly showing the causality?
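
For concreteness, a minimal sketch of that experiment (the index name and period value are placeholders, not recommendations; the retention lease period is a dynamic index setting):

curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.soft_deletes.retention_lease.period": "1h"
}'

curl "localhost:9200/_cat/segments/my-index?v"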

@nknize added the discuss label Jun 6, 2023
@nknize changed the title from "[BUG] Soft deletes enforced from Opensearch 2.x" to "[Performance] Possible 2.x search regression related to mandatory soft deletes" Jun 6, 2023
@nknize added the Performance label and removed the bug label Jun 6, 2023
@nknize (Collaborator) commented Jun 6, 2023

Removing the bug label since this is related to a performance concern / discussion.

@backslasht (Contributor)

Reporting a slightly different side effect (regression) of soft deletes, on indexing latency.

Observed that p100 indexing latency goes as high as 10+ seconds with soft deletes enabled, whereas the p100 latency was approx. 300 ms with soft deletes disabled. A thread dump revealed that the write thread is blocked on the IndexWriter object, which is held by the refresh thread. The thread dumps below are from OpenSearch 1.3, but I validated that the same issue exists in the latest OpenSearch version (2.11).

"opensearch[027d7adff6d2636bf859b1546e6ca1f9][write][T#48]" #596 daemon prio=5 os_prio=0 cpu=644849.64ms elapsed=166954.02s tid=0x00007f29dc160830 nid=0x71f waiting for monitor entry  [0x00007f23076f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.lucene.index.IndexWriter.getNextMerge(IndexWriter.java:2365)
        - waiting to lock <0x00000010ea798fb0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.access$800(IndexWriter.java:219)
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.getNextMerge(IndexWriter.java:5898)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:533)
        - locked <0x0000001296256ed0> (a org.opensearch.index.engine.InternalEngine$EngineMergeScheduler)
        at org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2293)

"opensearch[027d7adff6d2636bf859b1546e6ca1f9][refresh][T#1]" #501 daemon prio=5 os_prio=0 cpu=123408204.42ms elapsed=167420.28s tid=0x00007f2bc00076d0 nid=0x94b7 runnable  [0x00007f2891cde000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.writeValues(Lucene80DocValuesConsumer.java:198)
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addNumericField(Lucene80DocValuesConsumer.java:126)
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:109)
        at org.apache.lucene.index.ReadersAndUpdates.handleDVUpdates(ReadersAndUpdates.java:367)
        - locked <0x00000012320bf4c0> (a org.apache.lucene.index.ReadersAndUpdates)
        at org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:579)
        - locked <0x00000012320bf4c0> (a org.apache.lucene.index.ReadersAndUpdates)
        at org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:228)
        at org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3629)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:624)
        - locked <0x00000010ea798fb0> (a org.apache.lucene.index.IndexWriter)
        - locked <0x0000001260d37da8> (a java.lang.Object)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)

Further inspection revealed that the refresh thread is writing doc values for the soft delete field.

method=org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.writeValues location=AtExit
ts=2023-10-16 10:33:41; [cost=2398.32054ms] result=@ArrayList[
    @Object[][
        @FieldInfo[
            name=@String[__soft_deletes],
            number=@Integer[18],
            docValuesType=@DocValuesType[

Digging further into why the doc value writes take time, it showed that the min, max and gcd computation for the soft delete field consumes the time. Though the soft delete field value is hard-coded to 1 in https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/lucene/Lucene.java#L980, it still consumes time as it iterates over millions of documents.

`---ts=2023-10-16 09:33:45;thread_name=opensearch[7cdd429cbfcbc2619f34618050b2f9cf][refresh][T#1];id=de;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@5c29bfd
    `---[83073.891311ms] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer:writeValues()
        +---[0.00% 0.023908ms ] org.apache.lucene.codecs.DocValuesProducer:getSortedNumeric() #170
        +---[0.00% 5.92E-4ms ] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:<init>() #172
        +---[0.00% 6.97E-4ms ] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:<init>() #173
        +---[8.20% min=3.55E-4ms,max=16.545724ms,total=6809.57583ms,count=15417420] org.apache.lucene.index.SortedNumericDocValues:nextDoc() #176
        +---[7.71% min=3.4E-4ms,max=17.220172ms,total=6406.568143ms,count=15417419] org.apache.lucene.index.SortedNumericDocValues:docValueCount() #177
        +---[7.76% min=3.44E-4ms,max=16.878148ms,total=6446.381854ms,count=15417419] org.apache.lucene.index.SortedNumericDocValues:nextValue() #178
        +---[8.39% min=3.69E-4ms,max=17.628196ms,total=6971.536636ms,count=15417419] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:update() #191
        +---[8.34% min=3.69E-4ms,max=16.76535ms,total=6928.286565ms,count=15417419] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:update() #192
        +---[7.28% min=3.2E-4ms,max=17.074409ms,total=6047.507578ms,count=15417418] org.apache.lucene.util.MathUtil:gcd() #187
        +---[0.00% min=5.13E-4ms,max=0.00148ms,total=0.682019ms,count=941] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:nextBlock() #194
        +---[0.00% 8.53E-4ms ] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:finish() #207
        +---[0.00% 6.76E-4ms ] org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer$MinMaxTracker:finish() #208
        +---[0.00% 9.59E-4ms ] org.apache.lucene.store.IndexOutput:getFilePointer() #226
        +---[0.00% 0.002238ms ] org.apache.lucene.store.IndexOutput:writeLong() #227
        +---[0.00% 0.008232ms ] org.apache.lucene.codecs.DocValuesProducer:getSortedNumeric() #228
        +---[0.41% 342.714972ms ] org.apache.lucene.codecs.lucene80.IndexedDISI:writeBitSet() #229
        +---[0.00% 6.18E-4ms ] org.apache.lucene.store.IndexOutput:getFilePointer() #230
        +---[0.00% 0.001522ms ] org.apache.lucene.store.IndexOutput:writeLong() #230
        +---[0.00% 5.47E-4ms ] org.apache.lucene.store.IndexOutput:writeShort() #231
        +---[0.00% 7.11E-4ms ] org.apache.lucene.store.IndexOutput:writeByte() #232
        +---[0.00% 7.69E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #235
        +---[0.00% 0.0011ms ] org.apache.lucene.store.IndexOutput:writeInt() #241
        +---[0.00% 5.87E-4ms ] org.apache.lucene.store.IndexOutput:writeByte() #277
        +---[0.00% 6.13E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #278
        +---[0.00% 6.04E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #279
        +---[0.00% 5.44E-4ms ] org.apache.lucene.store.IndexOutput:getFilePointer() #280
        +---[0.00% 6.09E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #281
        +---[0.00% 5.28E-4ms ] org.apache.lucene.store.IndexOutput:getFilePointer() #288
        +---[0.00% 5.99E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #288
        `---[0.00% 5.77E-4ms ] org.apache.lucene.store.IndexOutput:writeLong() #289

As the soft delete field is a numeric doc values field (https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/lucene/Lucene.java#L980), each doc value write involving the soft delete field computes the min, max and gcd over all the soft-deleted documents in the segment (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L203-L230), and hence as the number of soft-deleted documents in a segment grows, the time taken to write a new soft-deleted document increases proportionally.
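
To make the cost concrete, the hot path in the trace above boils down to a loop of the following shape (a simplified sketch paraphrasing the linked Lucene consumer, not the actual code):

import java.io.IOException;

import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.MathUtil;

// Sketch of the per-flush pass over the __soft_deletes doc values: even though
// every value is the constant 1, the consumer still visits every soft-deleted
// document in the segment to derive the min, max and gcd used for encoding.
static void scanSoftDeletes(SortedNumericDocValues values) throws IOException {
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE, gcd = 0;
    for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = values.nextDoc()) {
        for (int i = 0, n = values.docValueCount(); i < n; i++) {
            long v = values.nextValue();
            min = Math.min(min, v);      // MinMaxTracker.update in the trace
            max = Math.max(max, v);
            gcd = MathUtil.gcd(gcd, v);  // MathUtil.gcd in the trace
        }
    }
    // O(number of soft-deleted docs) work on every flush of doc values updates
}

So the per-refresh cost is linear in the number of soft-deleted documents already in the segment, which matches the ~15.4 million nextDoc/nextValue calls in the trace.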

Mitigation
Setting index.merge.policy.deletes_pct_allowed to 5% (the lowest allowed) did accelerate the merging of segments with a high number of deleted documents and consequently helped reduce the p100 latency to approx. 3 seconds (from 10+ seconds), but that is still 10x higher than the p100 (300 ms) observed with soft deletes disabled.
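
For reference, this mitigation is a dynamic settings update of the following shape (the index name is a placeholder):

curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.merge.policy.deletes_pct_allowed": 5
}'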

Questions

  1. Does this provide enough evidence to re-enable the option to disable soft deletes?
  2. Alternatively, can the soft delete field be changed to a non-numeric field type, avoiding this computation and thus regaining the performance?

@backslasht (Contributor)

@nknize - Thoughts?
