[Remote Store] Root cause deleted segment files during remote uploads #11025
Comments
There are two places where the local segment files map is updated in RemoteStoreRefreshListener:

- OpenSearch/server/src/main/java/org/opensearch/index/shard/ReleasableRetryableRefreshListener.java, Lines 59 to 68 in c132db9

Flow for runAfterRefreshWithPermit:

- OpenSearch/server/src/main/java/org/opensearch/index/shard/ReleasableRetryableRefreshListener.java, Lines 147 to 169 in c132db9
- OpenSearch/server/src/main/java/org/opensearch/index/shard/RemoteStoreRefreshListener.java, Lines 209 to 220 in c132db9

Flow for runAfterRefreshExactlyOnce:

- OpenSearch/server/src/main/java/org/opensearch/index/shard/RemoteStoreRefreshListener.java, Lines 123 to 140 in c132db9
We need to fix this.
The underlying map is backed by a ConcurrentHashMap. If both executions are happening simultaneously, it should still be idempotent and should not lead to any entry going missing from the map.
Agree, but the modifications to the map are not happening concurrently; they happen some millis apart. The problem is that the upload is still in progress for the previous refresh's segment files, which are removed from the map because a newer refresh flow has already updated it. A ConcurrentHashMap would ensure no two threads modify it at the same time, but we are actually modifying the map a few millis apart. Check out the logs from the integ test where I saw this issue: #9774 (comment)
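The point above can be sketched without any thread interleaving at all: each map operation below is individually thread-safe, yet the uploader's file still disappears from the tracker when a newer refresh replaces the contents. All class and method names here are illustrative, not the actual OpenSearch tracker API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the tracker's local-files map. ConcurrentHashMap makes
// each individual operation atomic, but a wholesale replacement triggered by a
// newer refresh can still drop entries that an in-flight upload depends on.
class LocalFilesTrackerSketch {
    private final Map<String, Long> latestLocalFiles = new ConcurrentHashMap<>();

    // Called on every refresh: replaces the tracked view of local segment files.
    void replaceWith(Map<String, Long> current) {
        latestLocalFiles.keySet().retainAll(current.keySet());
        latestLocalFiles.putAll(current);
    }

    boolean contains(String file) {
        return latestLocalFiles.containsKey(file);
    }
}
```

For example, an upload of `_0.si` starts after the first refresh; a second refresh (after a merge) replaces the map with only `_1.si`, and the uploader now finds `_0.si` missing even though no two threads ever touched the map at the same instant.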
The case you mentioned can happen if a retry is ongoing that coincides with a refresh/flush. On the solution part, let's not introduce more synchronisation or lock usage; we can deep clone the segment tracker and use the clone throughout the method's lifecycle.
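The deep-clone idea could look roughly like this: the upload method takes an immutable copy of the tracker's local-files map once at entry and uses only the copy afterwards, so later refreshes cannot change what the upload sees. This is a sketch under the assumption that the tracker exposes its map; the helper name is hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UploadSnapshotSketch {
    // Hypothetical helper: copy the tracker's local-files map at the start of
    // the upload method, detaching it from mutations by later refreshes.
    static Map<String, Long> snapshotForUpload(Map<String, Long> trackerLocalFiles) {
        return Map.copyOf(trackerLocalFiles); // immutable snapshot
    }
}
```

The trade-off, as noted below, is extra copying per refresh, but it avoids introducing any new locks.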
Ideally, refreshes and the afterRefresh listeners are not supposed to be executed concurrently. But with the retry path in OpenSearch/server/src/main/java/org/opensearch/index/shard/ReleasableRetryableRefreshListener.java, Lines 124 to 131 in c132db9, that concurrency becomes possible.
Even after the upload is completed, there is another place where we use the local files reference from the segment tracker, which could give incorrect results due to the concurrency allowed with retries: OpenSearch/server/src/main/java/org/opensearch/index/shard/RemoteStoreRefreshListener.java, Lines 293 to 303 in c132db9
Yeah, not going to introduce any more locks as they are not needed. But we need to understand if we can skip the call at OpenSearch/server/src/main/java/org/opensearch/index/shard/RemoteStoreRefreshListener.java, Lines 123 to 140 in c132db9.
We anyway update this map in syncSegments. The only caveat I can see is that the bytes-lag computation would be delayed a bit. Is there any other use of this call? The deep-clone solution would complicate the code further; ideally, the segment tracker should support addition of new files and removal of files as they are uploaded, instead of only supporting a replacement of the local files map.
If any upload takes a lot of time, the lag would be incorrect and it would be difficult to debug issues. We need this at both places to ensure that local refreshes show up ASAP and, if a retry is happening, that the local state correctly reflects any refreshes performed since the original retry.
The local map contains the state as it is on local. The approach you mentioned, about removing files once uploaded, can lead to memory-leak situations, as we have seen that the segments file can reference file names that are not present locally in certain circumstances. As discussed, we can create a clone in the method's scope to ensure that the segment tracker is populated correctly.
@linuxpi @gbbafna - the issue mentioned in the description is different from what we have discussed above. The issue that @gbbafna has mentioned happens when the SegmentInfos returns a segment file that does not exist locally. This leads to two problems: 1) during the upload, the segment file is missing and the upload fails; 2) after the upload fails, we try to update the segment tracker with the failed-upload details, and they do not exist. We still need to handle these two cases - just FYI.
I think it's the same issue that @gbbafna has mentioned. The files being uploaded would still not be deleted; the only issue was that the local segments map in the segment tracker was getting updated prematurely.
We should have a separate issue for this, I believe, as GatedCloseable is not able to guarantee the file won't be removed.
This is due to the same issue we are trying to solve here. If the local files map had contained the files that were being uploaded, it would not have failed.
sounds good. raised a draft PR #11896 |
@gbbafna @ashking94 @sachinpkale Looks like the issue still persists. Re-opening
Describe the bug
During remote store uploads, we do

```java
GatedCloseable<SegmentInfos> segmentInfosGatedCloseable = indexShard.getSegmentInfosSnapshot();
```

which makes sure that the files we are uploading are not deleted or merged away. However, we have seen in the async flow that multiple refreshes can happen while the remote store refresh listener is still uploading older files. When it tries to update the tracker, it sees that the file has been deleted from local.
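A minimal sketch of the GatedCloseable pattern referred to above: while the wrapper is open it pins the snapshot against release, but it says nothing about keeping the tracker's local-files map in sync, which is the gap this issue describes. This mirrors the shape, not the actual code, of OpenSearch's `GatedCloseable<SegmentInfos>`:

```java
import java.io.Closeable;

// Simplified stand-in for OpenSearch's GatedCloseable<T>: holds a reference
// and a release action to run when the caller is done with the snapshot.
final class GatedCloseableSketch<T> implements Closeable {
    private final T ref;
    private final Runnable onClose;

    GatedCloseableSketch(T ref, Runnable onClose) {
        this.ref = ref;
        this.onClose = onClose;
    }

    T get() { return ref; }

    @Override
    public void close() { onClose.run(); } // releases the pin on the snapshot
}
```

A caller would typically use it in a try-with-resources block so the pinned snapshot is released even if the upload throws.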
Stack Trace
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The locked files should not get deleted