Improve backup-node_configs.sh to be aware of xcat status of node #60

Open
billglick opened this issue Apr 16, 2024 · 1 comment

Comments

@billglick

We ran into an issue today where a node had been partially deployed during or after some hardware diagnostics, and the daily backup-node_configs.sh script backed up incorrect, broken Puppet client configs and SSL certificates. Subsequent reboots of this server then caused the node-config restore to put the broken Puppet configs and SSL certificates back in place.

A couple of recommendations to resolve this and make it easier to recover a node from this issue:

  1. Update the backup-node_configs.sh script to only back up a node if xCAT reports the node's status as booted. If a node is in a sit-and-spin type postscript, its status should instead be postbooting, and if a postscript exits with an error code, its status should be failed. So checking for a status of booted should ensure we only take backups when the node is in the expected state.
  2. Would it be possible to keep one old version of each node's various node_configs backup files on the xCAT server? I don't have a proposed solution to implement this, but being able to at least see what changed with a node-configs backup file in the last few days would be helpful, as well as being able to restore old versions of the node-configs backup files.
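
A rough sketch of how both recommendations might look in shell, not the actual backup-node_configs.sh code. The function names (node_status, node_is_booted, rotate_backup) and the ".prev" suffix are hypothetical, and it assumes `nodels <node> nodelist.status` prints a line of the form `<node>: <status>`; if the output differs on our xCAT version, the parsing would need adjusting.

```shell
#!/bin/bash
# Sketch only: helper names and the ".prev" suffix are hypothetical.

# Print the xCAT status of a node (e.g. booted, postbooting, failed).
# Assumes `nodels <node> nodelist.status` prints "<node>: <status>".
node_status() {
    nodels "$1" nodelist.status 2>/dev/null | awk -F': *' 'NR == 1 { print $2 }'
}

# Succeed only when xCAT reports the node as booted.
# An empty or unknown status (e.g. nodels failed) counts as "not booted".
node_is_booted() {
    [ "$(node_status "$1")" = "booted" ]
}

# Keep one previous copy of a backup file before it gets overwritten.
rotate_backup() {
    local file="$1"
    if [ -f "$file" ]; then
        cp -p -- "$file" "${file}.prev"
    fi
}

# Possible use inside the per-node backup loop (variables are placeholders):
#   if node_is_booted "$node"; then
#       rotate_backup "$backup_file"
#       ... existing backup steps for "$node" ...
#   else
#       echo "skipping $node: xCAT status is not booted" >&2
#   fi
```

Skipping a node leaves its last good backup untouched, so a node that is mid-deployment or stuck in a postscript would not overwrite a known-good copy.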
@billglick

I'm somewhat surprised we hadn't run into this issue before. A couple of other examples of when this might be an issue include:

  • rolling reboots of a cluster
  • scheduled, automated reboots of nodes within a cluster (e.g. weekly, monthly, etc.) that might take place close to the time the daily backup-node_configs.sh run is scheduled
