[BUG] org.opensearch.operateAllIndices.DestructiveOperationsIT.testOpenIndexDefaultBehaviour is flaky #11275

cwperks · 2023-11-20T15:36:17Z

Describe the bug

Failure seen on test run https://build.ci.opensearch.org/job/gradle-check/30215/console

To Reproduce

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.operateAllIndices.DestructiveOperationsIT.testOpenIndexDefaultBehaviour" -Dtests.seed=56F5521F6E40D4B4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-SD -Dtests.timezone=Canada/Saskatchewan -Druntime.java=21

SwethaGuptha · 2024-04-19T06:05:20Z

This test case performs three actions: it creates indices, closes them, and then re-opens them to validate that the indices were opened successfully.

From the Gradle report, this looks like a shard-management issue: the indices were closed successfully, but re-opening the index failed with a ShardLockObtainFailedException. The shard lock for [index1][0] was still held by the closing shard when the starting shard's 5000ms wait for the lock timed out.

 1> [2023-11-20T09:02:33,914][INFO ][o.o.c.m.MetadataIndexStateService] [node_s2] completed closing of indices [index1, 1index]

  1> [2023-11-20T09:02:39,050][WARN ][o.o.i.c.IndicesClusterStateService] [node_s1] [index1][0] marking and sending shard failed due to [failed to create shard]
  1> java.io.IOException: failed to obtain in-memory shard lock
  1> 	at org.opensearch.index.IndexService.createShard(IndexService.java:539) ~[main/:?]
  1> 	at org.opensearch.indices.IndicesService.createShard(IndicesService.java:1002) ~[main/:?]
  1> 	at org.opensearch.indices.IndicesService.createShard(IndicesService.java:211) ~[main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:672) [main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:649) [main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:294) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:608) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:595) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:563) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:486) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:188) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:852) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [main/:?]
  1> 	at java.****/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
  1> 	at java.****/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
  1> 	at java.****/java.lang.Thread.run(Thread.java:1583) [?:?]
  1> Caused by: org.opensearch.env.ShardLockObtainFailedException: [index1][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [5110ms]
  1> 	at org.opensearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:851) ~[main/:?]
  1> 	at org.opensearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:748) ~[main/:?]
  1> 	at org.opensearch.index.IndexService.createShard(IndexService.java:476) ~[main/:?]
  1> 	... 16 more
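
The race described above can be illustrated with a minimal, self-contained sketch. This is not the OpenSearch implementation: `ShardLockSketch`, `closeShardAsync`, `startShard`, and `reopenAfterClose` are hypothetical names, and a plain `java.util.concurrent.Semaphore` stands in for the in-memory shard lock. It only shows the pattern in the log: a "closing shard" holder that keeps the lock longer than the "starting shard" waiter is willing to wait causes the start to fail.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single-permit semaphore models one shard's in-memory lock.
public class ShardLockSketch {
    private final Semaphore lock = new Semaphore(1);

    // Simulates "closing shard": takes the lock now, releases it after holdMillis
    // on a background thread (a semaphore permit may be released by any thread).
    Thread closeShardAsync(long holdMillis) throws InterruptedException {
        lock.acquire();
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(holdMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.release();
            }
        });
        closer.start();
        return closer;
    }

    // Simulates "starting shard": waits up to timeoutMillis for the lock.
    // Returns false where the real code would raise a lock-obtain failure.
    boolean startShard(long timeoutMillis) throws InterruptedException {
        boolean acquired = lock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.release();
        }
        return acquired;
    }

    // Close the shard, then immediately try to start it again.
    public static boolean reopenAfterClose(long holdMillis, long timeoutMillis)
            throws InterruptedException {
        ShardLockSketch shard = new ShardLockSketch();
        Thread closer = shard.closeShardAsync(holdMillis);
        boolean started = shard.startShard(timeoutMillis);
        closer.join();
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        // Close holds the lock longer than the start timeout: start fails.
        System.out.println("slow close: started=" + reopenAfterClose(500, 100));
        // Close releases well within the start timeout: start succeeds.
        System.out.println("fast close: started=" + reopenAfterClose(50, 2000));
    }
}
```

In the failing run, the analogous numbers are a 5000ms acquire timeout against a closing-shard hold of at least 5110ms, so the re-open only succeeds when the close finishes releasing the lock in time.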

cwperks added the bug and untriaged labels Nov 20, 2023

cwperks mentioned this issue Nov 20, 2023: Bump jackson and jackson_databind from 2.15.2 to 2.16.0 #11273 (Merged)

peternied added the flaky-test label and removed the untriaged label Nov 30, 2023

ankitkala assigned gauravruhela Dec 16, 2023

ankitkala added the Cluster Manager label Dec 16, 2023

rwali-aws unassigned gauravruhela Mar 27, 2024

rwali-aws added ShardManagement:Resiliency and removed Cluster Manager labels Apr 19, 2024

github-project-automation bot added this to Cluster Manager Project Board and Shard Management Project Board Apr 19, 2024

github-project-automation bot moved this to 🆕 New in Shard Management Project Board Apr 19, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Apr 19, 2024

rwali-aws removed this from Cluster Manager Project Board Apr 19, 2024
