Update opensearch to 2.9.0 #299
Cancelled CI, need image build first. Image build running in https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/5810024081/job/15750069870 edit: building image
Image build running in https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/5811371127/job/15754445881 Built image openhpc-230809-1602-2250239e
2250239 to 884df2a
LGTM
a40a0ed to 2937725
I've checked that upgrading a cluster from current
I also then reimaged the cluster again (at 2937725) to check the case where the
Note that document IDs are not Slurm job IDs (but are stable):
See comment in environments/common/files/filebeat/filebeat.yml for why they're not actual job IDs.
LGTM
Updates opensearch to v2.9.0, required as opensearch 2.4.0 fails* on podman v4.4.1.
Also:
Pulls container before starting systemd service to eliminate unit startup timeouts on slow networks
Refactors role to provide separate install & runtime task books for later speed optimisation.
Changes filebeat configuration to derive opensearch document IDs from the Slurm job ID; this prevents duplicate records after an image-based upgrade where filebeat ingests the same records from slurm/sacct again. Note that when upgrading a cluster, opensearch data from before this PR (with unsafe document IDs) will be archived to /var/lib/state/opensearch/data-$TIMESTAMP. Filebeat will then reingest all jobs within the last year from slurm/sacct.
Reviewed relevant changelogs for any changes of significance
Checked that this works when performing image-based upgrades
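
To illustrate why stable document IDs prevent duplicates on reingest, here is a minimal sketch. It is not the filebeat configuration from this PR: the field names (`JobID`, `Submit`), the hashing scheme and the in-memory `index` dict are all assumptions made for illustration. The point is only that an ID computed deterministically from the record overwrites the existing document instead of creating a second one.

```python
import hashlib


def document_id(record, key_fields=("JobID", "Submit")):
    """Return a stable ID computed from selected fields of a job record.

    The field names are hypothetical; the real fields are whatever the
    filebeat configuration selects.
    """
    joined = "|".join(f"{name}={record[name]}" for name in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def ingest(index, records):
    """Store each record under its stable ID; a repeat overwrites in place."""
    for record in records:
        index[document_id(record)] = record


# Made-up sample records standing in for sacct output.
records = [
    {"JobID": "101", "Submit": "2023-08-01T10:00:00", "State": "COMPLETED"},
    {"JobID": "102", "Submit": "2023-08-01T10:05:00", "State": "FAILED"},
]

index = {}
ingest(index, records)  # first ingest
ingest(index, records)  # reingest after an image-based upgrade
print(len(index))       # still 2 documents, not 4
```

With randomly generated IDs, the second `ingest` call would have added two more documents, which is the duplication this PR avoids.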
* Container startup fails with
Actual problem is that /sys/fs/cgroup gets mounted twice inside the container with podman v4.4.1; opensearch 2.4.0 cannot tolerate this.
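
One way to confirm a double mount like this is to look for repeated mount points in `/proc/self/mountinfo` from inside the container. The sketch below is a diagnostic aid only, not something this PR adds; the helper names and the sample text are made up for illustration. It relies on the mount point being the fifth whitespace-separated field of each mountinfo line.

```python
from collections import Counter


def mount_point_counts(mountinfo_text):
    """Count how many times each mount point appears in mountinfo text."""
    counts = Counter()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            counts[fields[4]] += 1  # fifth field is the mount point
    return counts


def duplicated_mount_points(mountinfo_text):
    """Return the sorted mount points that are mounted more than once."""
    counts = mount_point_counts(mountinfo_text)
    return sorted(path for path, n in counts.items() if n > 1)


# Made-up sample in mountinfo format, not captured from a real container.
SAMPLE = """\
100 90 0:30 / / rw,relatime - overlay overlay rw
101 100 0:31 / /sys/fs/cgroup ro,nosuid - cgroup2 cgroup2 rw
102 100 0:31 / /sys/fs/cgroup rw,nosuid - cgroup2 cgroup2 rw
103 100 0:32 / /proc rw,nosuid - proc proc rw
"""

print(duplicated_mount_points(SAMPLE))
```

Inside a running container, the same function can be given the contents of `/proc/self/mountinfo` instead of `SAMPLE`.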