[DPE-4557] fix timeout when initializing the security index (#321)
## Issue
#320

## Solution
When the last unit in a cluster stops, it adds a `voting_config_exclusion`
for itself, but nothing removes that exclusion afterwards (because all
units are already stopped), so it persists on disk. If the storage is
then reused and a new first unit starts, that unit may be unable to
become cluster manager because quorum cannot be reached.

Therefore, the last unit to stop should not add a voting exclusion.
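The failure mode described above can be illustrated with a toy model. This is a heavily simplified, hypothetical simulation: the class `ToyCluster`, its methods, and the rule "at least one running, non-excluded node is needed to elect a cluster manager" are assumptions for illustration, not OpenSearch's actual coordination algorithm. In the model, a stopping unit's exclusion is cleaned up while another unit is still running, but it persists with the storage when the last unit stops.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model of a cluster; NOT OpenSearch's coordination code.

    `voting_exclusions` stands in for the exclusions that are persisted to
    disk together with the node's storage.
    """

    running: set[str] = field(default_factory=set)
    voting_exclusions: set[str] = field(default_factory=set)

    def stop(self, unit: str, skip_exclusion_if_last: bool) -> None:
        others = self.running - {unit}
        if others or not skip_exclusion_if_last:
            # mirrors `opensearch_exclusions.add_current()` in the charm
            self.voting_exclusions.add(unit)
        self.running.discard(unit)
        if others:
            # simplification: a unit that is still up cleans up the exclusion
            self.voting_exclusions.discard(unit)

    def start(self, unit: str) -> None:
        # storage (and therefore any persisted exclusion) is reused
        self.running.add(unit)

    def can_elect_cluster_manager(self) -> bool:
        # toy rule: at least one running node must still be allowed to vote
        return bool(self.running - self.voting_exclusions)


def restart_last_unit(skip_exclusion_if_last: bool) -> bool:
    """Stop the only unit, start it again on the same storage, try to elect."""
    cluster = ToyCluster(running={"opensearch-0"})
    cluster.stop("opensearch-0", skip_exclusion_if_last)
    cluster.start("opensearch-0")
    return cluster.can_elect_cluster_manager()


print(restart_last_unit(skip_exclusion_if_last=False))  # prints False (old behaviour)
print(restart_last_unit(skip_exclusion_if_last=True))   # prints True (new behaviour)
```

With the old behaviour, the restarted unit finds its own ID in the persisted exclusions and, in this model, no election is possible; skipping the exclusion for the last unit avoids that.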
reneradoi authored Jun 7, 2024
1 parent b86bbac commit f97f015
Showing 1 changed file with 12 additions and 4 deletions:
`lib/charms/opensearch/v0/opensearch_base_charm.py`
```diff
@@ -1042,10 +1042,18 @@ def _stop_opensearch(self, *, restart=False) -> None:
         self.status.set(WaitingStatus(ServiceIsStopping))
 
         if self.opensearch.is_node_up():
-            # TODO: we should probably NOT have any exclusion on restart
-            # https://chat.canonical.com/canonical/pl/bgndmrfxr7fbpgmwpdk3hin93c
-            # 1. Add current node to the voting + alloc exclusions
-            self.opensearch_exclusions.add_current()
+            try:
+                nodes = self._get_nodes(True)
+                # do not add exclusions if it's the last unit to stop
+                # otherwise cluster manager election will be blocked when starting up again
+                # and re-using storage
+                if len(nodes) > 1:
+                    # TODO: we should probably NOT have any exclusion on restart
+                    # https://chat.canonical.com/canonical/pl/bgndmrfxr7fbpgmwpdk3hin93c
+                    # 1. Add current node to the voting + alloc exclusions
+                    self.opensearch_exclusions.add_current()
+            except OpenSearchHttpError:
+                logger.debug("Failed to get online nodes, voting and alloc exclusions not added")
 
         # TODO: should block until all shards move addressed in PR DPE-2234
 
```
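To check the new control flow in isolation, the guard can be pulled out into a plain function and exercised with `unittest.mock`. This is a sketch under assumptions: `add_exclusions_unless_last`, its boolean return value, and the local `OpenSearchHttpError` class are hypothetical stand-ins, not part of the charm; the function only mirrors the `try` / `if len(nodes) > 1` logic from the diff.

```python
import logging
from unittest import mock

logger = logging.getLogger(__name__)


class OpenSearchHttpError(Exception):
    """Local stand-in for the charm's OpenSearchHttpError (assumption)."""


def add_exclusions_unless_last(get_nodes, exclusions) -> bool:
    """Mirror of the new guard in `_stop_opensearch`, extracted for testing.

    `get_nodes` and `exclusions` are hypothetical stand-ins for
    `self._get_nodes` and `self.opensearch_exclusions`; the boolean return
    value exists only to make the outcome easy to check.
    """
    try:
        nodes = get_nodes(True)
        # do not add exclusions if this is the last unit to stop
        if len(nodes) > 1:
            exclusions.add_current()
            return True
    except OpenSearchHttpError:
        logger.debug("Failed to get online nodes, voting and alloc exclusions not added")
    return False


# two online nodes: the stopping unit is excluded, as before the fix
several = mock.Mock()
assert add_exclusions_unless_last(lambda _: ["node-0", "node-1"], several)
several.add_current.assert_called_once_with()

# last online node: no exclusion is added
last = mock.Mock()
assert not add_exclusions_unless_last(lambda _: ["node-0"], last)
last.add_current.assert_not_called()

# node list unavailable: the error is logged and no exclusion is added
failing = mock.Mock()
broken_get_nodes = mock.Mock(side_effect=OpenSearchHttpError())
assert not add_exclusions_unless_last(broken_get_nodes, failing)
failing.add_current.assert_not_called()
```

Keeping the `OpenSearchHttpError` handler around the node lookup means a failed lookup falls back to "no exclusion", which matches the log message in the diff.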