[BUG] org.opensearch.operateAllIndices.DestructiveOperationsIT.testOpenIndexDefaultBehaviour is flaky #11275

Open
cwperks opened this issue Nov 20, 2023 · 1 comment
Labels
bug (Something isn't working), flaky-test (Random test failure that succeeds on second run), ShardManagement:Resiliency

Comments

@cwperks
Member

cwperks commented Nov 20, 2023

Describe the bug

Failure seen on test run https://build.ci.opensearch.org/job/gradle-check/30215/console

To Reproduce

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.operateAllIndices.DestructiveOperationsIT.testOpenIndexDefaultBehaviour" -Dtests.seed=56F5521F6E40D4B4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-SD -Dtests.timezone=Canada/Saskatchewan -Druntime.java=21

@cwperks added the bug and untriaged labels on Nov 20, 2023
@peternied added the flaky-test label and removed the untriaged label on Nov 30, 2023
@SwethaGuptha
Contributor

This test case performs three actions: it creates indices, closes them, and then reopens them to validate that the indices were opened successfully.

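From the Gradle report, this looks like a shard-management issue: the indices were closed successfully, but reopening the index failed with a ShardLockObtainFailedException. The lock for [index1][0] was still held for [closing shard] when the [starting shard] attempt timed out after 5000ms.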

 1> [2023-11-20T09:02:33,914][INFO ][o.o.c.m.MetadataIndexStateService] [node_s2] completed closing of indices [index1, 1index]

  1> [2023-11-20T09:02:39,050][WARN ][o.o.i.c.IndicesClusterStateService] [node_s1] [index1][0] marking and sending shard failed due to [failed to create shard]
  1> java.io.IOException: failed to obtain in-memory shard lock
  1> 	at org.opensearch.index.IndexService.createShard(IndexService.java:539) ~[main/:?]
  1> 	at org.opensearch.indices.IndicesService.createShard(IndicesService.java:1002) ~[main/:?]
  1> 	at org.opensearch.indices.IndicesService.createShard(IndicesService.java:211) ~[main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:672) [main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:649) [main/:?]
  1> 	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:294) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:608) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:595) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:563) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:486) [main/:?]
  1> 	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:188) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:852) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [main/:?]
  1> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
  1> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
  1> 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
  1> Caused by: org.opensearch.env.ShardLockObtainFailedException: [index1][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [5110ms]
  1> 	at org.opensearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:851) ~[main/:?]
  1> 	at org.opensearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:748) ~[main/:?]
  1> 	at org.opensearch.index.IndexService.createShard(IndexService.java:476) ~[main/:?]
  1> 	... 16 more
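
To make the timing in the trace above easier to reason about, here is a minimal, self-contained sketch of the suspected race. It is not OpenSearch code: `ShardLockSketch`, `closeShardAsync`, `openShard`, and `closeThenOpen` are hypothetical names, and a plain `Semaphore` stands in for the in-memory shard lock. The only assumption taken from the log is that a close still holding the lock can outlast the timeout of the subsequent open.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative model only: none of these names are OpenSearch classes.
class ShardLockSketch {

    // One permit stands in for the in-memory lock of a single shard.
    private final Semaphore lock = new Semaphore(1);
    // Why the lock is currently held, echoing "lock already held for [...]".
    private volatile String holder = "none";

    // Takes the lock for a "close" lasting closeMillis and releases it
    // from a background thread once the simulated close has finished.
    void closeShardAsync(long closeMillis) {
        lock.acquireUninterruptibly();
        holder = "closing shard";
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(closeMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                holder = "none";
                lock.release();
            }
        });
        closer.setDaemon(true);
        closer.start();
    }

    // Tries to take the lock for "starting shard"; returns false on timeout.
    boolean openShard(long timeoutMillis) {
        try {
            if (!lock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                System.out.println("timed out after [" + timeoutMillis
                        + "ms], lock already held for [" + holder + "]");
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        holder = "starting shard";
        return true;
    }

    // Runs close-then-open on a fresh lock and reports whether the open succeeded.
    static boolean closeThenOpen(long closeMillis, long openTimeoutMillis) {
        ShardLockSketch shard = new ShardLockSketch();
        shard.closeShardAsync(closeMillis);
        return shard.openShard(openTimeoutMillis);
    }

    public static void main(String[] args) {
        // Close outlasts the open timeout: the open fails, as in the log.
        System.out.println("slow close: opened=" + closeThenOpen(500, 100));
        // Close finishes well inside the timeout: the open succeeds.
        System.out.println("fast close: opened=" + closeThenOpen(50, 1000));
    }
}
```

In this model the open fails only when the close holds the lock longer than the open is willing to wait, which would match the 5110ms lock age against the 5000ms timeout in the log. Whether the real fix belongs in the test (for example, waiting for the close to fully release shard resources before reopening) or in shard management is not something this sketch can answer.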
