hook "leader-elected" fails when adding a unit after scale down to zero units #306

reneradoi · 2024-05-23T07:21:46Z

Steps to reproduce

juju add-model opensearch
# apply the kernel parameters required for opensearch
juju model-config --file ./cloudinit-userdata.yaml
juju create-storage-pool opensearch-storage lxd volume-type=standard
juju deploy opensearch -n 2 --channel 2/edge --storage opensearch-data=opensearch-storage,1G,1
juju deploy self-signed-certificates
juju config self-signed-certificates ca-common-name="CN_CA"
juju relate self-signed-certificates opensearch
juju remove-unit opensearch/1
juju remove-unit opensearch/0
juju add-unit opensearch --attach-storage=opensearch-data/0
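For scripting the last step across several detached volumes, here is a minimal sketch that builds one `juju add-unit` invocation per storage ID. The helper name `add_unit_commands` is hypothetical, and the second storage ID is only an example. The sketch prints the commands rather than running them.

```python
import shlex
from typing import Iterable, List


def add_unit_commands(app: str, storage_ids: Iterable[str]) -> List[List[str]]:
    """Build one `juju add-unit` argv per storage volume to re-attach."""
    return [
        ["juju", "add-unit", app, f"--attach-storage={storage_id}"]
        for storage_id in storage_ids
    ]


# Print the commands for review; hand each argv to subprocess.run to execute it.
for argv in add_unit_commands("opensearch", ["opensearch-data/0", "opensearch-data/1"]):
    print(shlex.join(argv))
```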

Expected behavior

The newly added unit should start up without error.

Actual behavior

$ juju status --storage
Model  Controller  Cloud/Region         Version  SLA          Timestamp
dev    opensearch  localhost/localhost  3.1.8    unsupported  06:52:18Z

App                       Version  Status  Scale  Charm                     Channel  Rev  Exposed  Message
opensearch                         active      1  opensearch                           1  no       
self-signed-certificates           active      1  self-signed-certificates  stable    72  no       

Unit                         Workload  Agent  Machine  Public address  Ports  Message
opensearch/2*                error     idle   5        10.27.170.244          hook failed: "leader-elected"
self-signed-certificates/0*  active    idle   2        10.27.170.141          

Machine  State    Address        Inst id        Base          AZ  Message
2        started  10.27.170.141  juju-622e8b-2  [email protected]      Running
5        started  10.27.170.244  juju-622e8b-5  [email protected]      Running

Storage Unit  Storage ID         Type        Pool                Mountpoint                   Size     Status    Message
              opensearch-data/1  filesystem  opensearch-storage                               1.0 GiB  detached  
opensearch/2  opensearch-data/0  filesystem  opensearch-storage  /var/snap/opensearch/common  1.0 GiB  attached

Versions

Operating system: Ubuntu 24.04 LTS, Ubuntu 22.04 LTS
Juju CLI: 3.1.8-genericlinux-amd64
Juju agent: 3.1.8
Charm revision: 47
LXD: 5.21.1 LTS

Log output

unit-opensearch-2: 06:53:05 ERROR unit.opensearch/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-2/charm/./src/charm.py", line 267, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 302, in _on_leader_elected
    self._put_or_update_internal_user_leader(user)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 1244, in _put_or_update_internal_user_leader
    self.user_manager.update_user_password(user, hashed_pwd)
  File "/var/lib/juju/agents/unit-opensearch-2/charm/lib/charms/opensearch/v0/opensearch_users.py", line 268, in update_user_password
    resp = self.opensearch.request(
  File "/var/lib/juju/agents/unit-opensearch-2/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 266, in request
    raise OpenSearchHttpError(
charms.opensearch.v0.opensearch_exceptions.OpenSearchHttpError: HTTP error self.response_code=None
self.response_text='Host 10.27.170.244:9200 and alternative_hosts: [] not reachable.'
unit-opensearch-4: 06:53:06 ERROR juju.worker.uniter.operation hook "leader-elected" (via hook dispatching script: dispatch) failed: exit status 1
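
The error carries response_code=None, so the request failed before any HTTP response arrived: nothing was accepting connections on 10.27.170.244:9200 when the hook ran. A minimal sketch of a TCP probe that can confirm this from the unit; `is_reachable` is a hypothetical helper and not part of the charm library.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or routing failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable("10.27.170.244", 9200)` with the address from the log above would show whether the OpenSearch port is open at the time of the failure.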

Additional context

I assume the issue is with security_index_initialised, which is no longer in the peer data:

$ jhack show-relation opensearch:opensearch-peers opensearch:opensearch-peers
                                                                                             relation data v0.6                                                                                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ peer relation (id: 2) ┃ opensearch                                                                                                                                                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ type                  │ peer                                                                                                                                                                             │
│ interface             │ opensearch_peers                                                                                                                                                                 │
│ model                 │ the current model                                                                                                                                                                │
│ relation ID           │ 2                                                                                                                                                                                │
│ endpoint              │ opensearch-peers                                                                                                                                                                 │
│ leader unit           │ 2                                                                                                                                                                                │
├───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ application data      │ ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │
│                       │ │                                                                                                                                                                              │ │
│                       │ │  admin_user_initialized                     True                                                                                                                             │ │
│                       │ │  allocation-exclusions-to-delete            ,opensearch-2                                                                                                                    │ │
│                       │ │  delete-voting-exclusions                   True                                                                                                                             │ │
│                       │ │  deployment-description                     {"config": {"cluster_name": "opensearch-attz", "init_hold": false, "roles": [], "data_temperature": null}, "start":              │ │
│                       │ │                                             "start-with-generated-roles", "pending_directives": [], "typ": "main-orchestrator", "app": "opensearch", "state": {"value":      │ │
│                       │ │                                             "active", "message": ""}, "promotion_time": 1716446675.797672}                                                                   │ │
│                       │ │  opensearch:app:admin-password              secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7ebls8c16j9paghi7g                                                               │ │
│                       │ │  opensearch:app:admin-password-hash         secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7ebls8c16j9paghi80                                                               │ │
│                       │ │  opensearch:app:app-admin                   secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7eblc8c16j9paghi50                                                               │ │
│                       │ │  opensearch:app:kibanaserver-password       secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7eblk8c16j9paghi6g                                                               │ │
│                       │ │  opensearch:app:kibanaserver-password-hash  secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7eblk8c16j9paghi70                                                               │ │
│                       │ │  opensearch:app:monitor-password            secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7ec248c16j9paghib0                                                               │ │
│                       │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ unit data             │ ╭─ opensearch/opensearch/2 ──────────────────────────────────────────────────────────────────────────────╮                                                                       │
│                       │ │                                                                                                        │                                                                       │
│                       │ │  opensearch:unit:2:unit-http       secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7eevc8c16j9paghic0  │                                                                       │
│                       │ │  opensearch:unit:2:unit-transport  secret://d95bf0dc-53cc-4a8c-8f9e-538bd7622e8b/cp7eevc8c16j9paghibg  │                                                                       │
│                       │ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                                                       │
└───────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

This is where an adjustment might be necessary: https://github.com/canonical/opensearch-operator/blob/main/lib/charms/opensearch/v0/opensearch_base_charm.py#L271
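
A minimal sketch of the kind of guard this could mean, assuming the handler should not call the REST API while the node is down, even when security_index_initialised is missing but admin_user_initialized is still set. `LeaderAction`, `decide_leader_action` and the plain mapping standing in for the application peer data are all hypothetical. This is not the charm's actual code or the fix that was eventually merged.

```python
from enum import Enum
from typing import Mapping


class LeaderAction(Enum):
    INITIALISE = "initialise"  # fresh deployment: set up users from scratch
    UPDATE = "update"          # cluster was initialised before and is reachable
    DEFER = "defer"            # earlier state exists, but the node is not up yet


def decide_leader_action(app_data: Mapping[str, str], node_up: bool) -> LeaderAction:
    """Pick what a leader-elected handler should do (hypothetical logic)."""
    # Assumption: either flag means an earlier leader already initialised the cluster.
    previously_initialised = (
        app_data.get("security_index_initialised") == "True"
        or app_data.get("admin_user_initialized") == "True"
    )
    if not previously_initialised:
        return LeaderAction.INITIALISE
    if not node_up:
        # Calling the REST API now would fail with "not reachable".
        return LeaderAction.DEFER
    return LeaderAction.UPDATE
```

With the peer data shown above (admin_user_initialized set, security_index_initialised absent) and the node still down, this returns `LeaderAction.DEFER` instead of attempting the password update.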

github-actions · 2024-05-23T07:22:04Z

https://warthogs.atlassian.net/browse/DPE-4415

## Issue

When attaching an existing storage to a new unit, 2 issues happen:

- Snap install failed because of permissions / ownership of directories
- snap_common gets completely deleted

## Solution

- bump snap version, use the fixed one (the fixed revision is 47, this is already outdated as a newer version of the snap is already available and merged to main prior to this PR)
- enhance test coverage for integration tests

## Integration Testing

Tests for attaching existing storage can be found in integration/ha/test_storage.py. There are now three test cases:

1. test_storage_reuse_after_scale_down: remove one unit from the deployment, afterwards add a new one re-using the storage from the removed unit. Check if the continuous writes are ok and a testfile that was created initially is still there.
2. test_storage_reuse_after_scale_to_zero: remove both units from the deployment, keep the application, add two new units using the storage again. Check the continuous writes.
3. test_storage_reuse_in_new_cluster_after_app_removal: from a cluster of three units, remove all of them and remove the application. Deploy a new application (with one unit) to the same model, attach the storage, then add two more units with the other storage volumes. Check the continuous writes.

## Other Issues

- As part of this PR, another issue is addressed: #306. It is resolved with this commit: 19f843c
- Furthermore, problems with acquiring the OpenSearch lock are worked around with this PR, especially when the shards for the locking index within OpenSearch are not assigned to a new primary when removing the former primary. This was also reported in #243 and will be further investigated in #327.

reneradoi · 2024-06-11T13:00:27Z

Resolved with #272

reneradoi added the bug label May 23, 2024

reneradoi self-assigned this May 23, 2024

reneradoi mentioned this issue May 30, 2024, in "[DPE-2119] fix issues and tests when reusing storage" #272 (merged)

reneradoi closed this as completed Jun 11, 2024
