Initializing the security index times out when re-using storage with OpenSearch 2.14.0 #326

Closed
reneradoi opened this issue Jun 10, 2024 · 5 comments
Labels: bug

Comments

@reneradoi
Contributor

Steps to reproduce

  • Create a cluster of 3 units.
  • Remove the application.
  • Deploy again with 1 unit and re-attach the storage.

Expected behavior

Newly deployed application unit starts correctly.

Actual behavior

The unit does not start up. Instead, it hangs in the status "Initializing the security index...".

Versions

OpenSearch 2.14.0 (snap revision 51)

Log output

From the server log:

[2024-06-10T07:09:40,716][INFO ][o.o.a.c.ADDataMigrator   ] [opensearch-3] Start migrating AD data
[2024-06-10T07:09:40,716][INFO ][o.o.a.c.ADDataMigrator   ] [opensearch-3] AD job index doesn't exist, no need to migrate
[2024-06-10T07:09:40,716][INFO ][o.o.a.c.ADClusterEventListener] [opensearch-3] Init AD version hash ring successfully
[2024-06-10T07:09:40,731][INFO ][o.o.g.GatewayService     ] [opensearch-3] recovered [5] indices into cluster_state
[2024-06-10T07:09:40,734][WARN ][o.o.o.i.ObservabilityIndex] [opensearch-3] message: index [.opensearch-observability/lao7cXi9SzCfWdPsE-qpdA] already exists
[2024-06-10T07:09:40,735][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Loading builtin types!
[2024-06-10T07:09:40,738][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Indexing [429] fieldMappingDocs from logTypes: 24
[2024-06-10T07:09:40,771][WARN ][o.o.c.r.a.AllocationService] [opensearch-3] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2024-06-10T07:09:40,777][WARN ][o.o.s.SecurityAnalyticsPlugin] [opensearch-3] Failed to initialize LogType config index and builtin log types
[2024-06-10T07:09:40,863][INFO ][o.o.p.PluginsService     ] [opensearch-3] PluginService:onIndexModule index:[.plugins-ml-config/uYso9eSvSRy3MvABKkgmCg]
[2024-06-10T07:09:41,310][WARN ][o.o.c.r.a.AllocationService] [opensearch-3] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2024-06-10T07:09:41,631][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Loading builtin types!
[2024-06-10T07:09:41,632][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Indexing [429] fieldMappingDocs from logTypes: 24
[2024-06-10T07:09:41,634][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Loading builtin types!
[2024-06-10T07:09:41,635][INFO ][o.o.s.l.LogTypeService   ] [opensearch-3] Indexing [429] fieldMappingDocs from logTypes: 24
[2024-06-10T07:09:41,637][INFO ][o.o.s.i.DetectorIndexManagementService] [opensearch-3] info deleteOldIndices
[2024-06-10T07:09:41,641][INFO ][o.o.s.i.DetectorIndexManagementService] [opensearch-3] No Old Correlation Indices to delete
[2024-06-10T07:10:20,678][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [opensearch-3] Failure No shard available for [org.opensearch.action.get.MultiGetShardRequest@28fa55ab] retrieving configuration for [ACTIONGROUPS, ALLOWLIST, AUDIT, CONFIG, INTERNALUSERS, NODESDN, ROLES, ROLESMAPPING, TENANTS, WHITELIST] (index=.opendistro_security)
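
The final ERROR line reports that no shard copy of .opendistro_security was available when the security configuration was read. One way to confirm this might be to inspect the shard states of that index. The following is a minimal Python sketch, assuming the JSON output of `GET _cat/shards/.opendistro_security?format=json` has already been fetched and parsed into a list of dicts. The helper name `unavailable_shards` and the sample rows are hypothetical and were not captured from this cluster.

```python
from typing import Iterable


def unavailable_shards(rows: Iterable[dict], index: str) -> list[dict]:
    """Return the rows for `index` whose shard copy is not in state STARTED."""
    return [
        row for row in rows
        if row.get("index") == index and row.get("state") != "STARTED"
    ]


# Hypothetical sample, shaped like the JSON output of the _cat/shards API.
SAMPLE_ROWS = [
    {"index": ".opendistro_security", "shard": "0", "prirep": "p",
     "state": "UNASSIGNED", "node": None},
    {"index": ".plugins-ml-config", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "opensearch-3"},
]

if __name__ == "__main__":
    for row in unavailable_shards(SAMPLE_ROWS, ".opendistro_security"):
        print(row["index"], row["shard"], row["prirep"], row["state"])
```

If the primary shard of the security index shows up here, the read in ConfigurationLoaderSecurity7 has no copy to read from, which would be consistent with the error above.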

Additional context

There was a change to the way the security index is initialized in OpenSearch 2.14.0, see here.

@reneradoi added the bug label on Jun 10, 2024

@phvalguima
Contributor

@reneradoi if the application is going to be removed anyway, could deploying just a single unit reproduce your issue as well?

@reneradoi
Contributor Author

Some new details on this issue.

Error from the OpenSearch server logfile:

[2024-06-29T14:47:57,204][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-6] cluster-manager not discovered or elected yet, an election requires a node with id [qRqeY8p5QJmJLPyRCZjVPA], have discovered [{opensearch-6}{DCFTCopZQy6IxSEfTcElOQ}{NKaiIbJnSAadazKYJhorkg}{10.26.48.242}{10.26.48.242:9300}{coordinating_onlydimml}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 127.0.0.1:9300] from hosts providers and [{opensearch-6}{DCFTCopZQy6IxSEfTcElOQ}{NKaiIbJnSAadazKYJhorkg}{10.26.48.242}{10.26.48.242:9300}{coordinating_onlydimml}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 3, last-accepted version 153 in term 3

This can be reproduced by removing the OpenSearch application and re-using the storage, more concretely by attaching a non-cluster-leader disk to a new unit. The root cause is as follows: if the cluster manager node is removed first when the application is removed, this can result in a split-brain situation where the shards formerly assigned to that node are no longer assigned to a new node, because only two nodes remain for voting and they don't reach consensus. If this state persists until the application is finally removed, the cluster can't recover from it.
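
To illustrate why the election in the log line above cannot succeed, here is a minimal Python sketch of a majority check. It is a simplification and an assumption on my side: the function name `has_quorum` is hypothetical, and the real election logic in OpenSearch is more involved. The two node ids are copied from the WARN line above.

```python
def has_quorum(voting_config: set[str], discovered: set[str]) -> bool:
    """True if a strict majority of the voting configuration was discovered."""
    present = voting_config & discovered
    return 2 * len(present) > len(voting_config)


# "an election requires a node with id [...]" in the log line
REQUIRED = {"qRqeY8p5QJmJLPyRCZjVPA"}
# opensearch-6, the only node listed under "have discovered [...]"
DISCOVERED = {"DCFTCopZQy6IxSEfTcElOQ"}

if __name__ == "__main__":
    # The required node is not among the discovered nodes.
    print(has_quorum(REQUIRED, DISCOVERED))             # False
    # Hypothetical: the required node is reachable again.
    print(has_quorum(REQUIRED, DISCOVERED | REQUIRED))  # True
```

Under this simplification, the re-attached disk still names a node that no longer exists as the required voter, so the new unit keeps waiting for it and no cluster manager is elected.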

There's another issue related to this: #327

This issue should also be resolved once #327 is resolved.

@phvalguima
Contributor

Hi @reneradoi can we close this issue?

@reneradoi
Contributor Author

No longer relevant.
