Revisit ironic UID and GID assignment #582
Comments
It should be noted that those UID/GID values are also present in BMO manifests, docs, guides, and ironic-standalone-operator. An auto-generated UID/GID is a bit problematic for this reason, and we would probably get off much easier by just picking a UID/GID in a high range. Picking a UID/GID under 1024 in the first place was kind of asking for trouble later. Just for completeness' sake, though:
If this needs fixing in the short term, I say picking a high-range number is the right choice for now. Long term, ironic-standalone-operator can let it be auto-generated, as it is the sole controller for Ironic.
@tuminoid thanks for the reply. I'm aware of the various places where the ironic UID/GID are used; that's why I opted to open an issue instead of going for a direct change.
Is this possibly the reason I'm seeing an issue with the ironic-ipa-downloader container, e.g. something to do with the securityContext of the pod? If I spin up the pod, it runs to completion and is healthy. However, if the pod is killed and lands on the same node, the init will fail.
Edit: It looks like the /shared folder is root:root, but all subfolders are ironic:ironic, so there's probably something going on with my security contexts when using BMO.
Is /shared a host directory for you? I think you need privileged containers for that to work.
I don't want to derail the issue, since this seems like a "my cluster" problem. It's not a host directory; it's running Longhorn CSI as its provisioner, with a PVC as RWX, fstype ext4. So my guess is that Longhorn is not respecting securityContext.fsGroup: 994 for some reason.
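One way to confirm the ownership mismatch described above is to list every path under the volume whose group differs from the expected fsGroup. The following is only a minimal sketch: the function name `find_gid_mismatches` and the injectable `get_gid` parameter are hypothetical, and 994 is simply the GID quoted in this thread.

```python
import os


def find_gid_mismatches(root, expected_gid, get_gid=None):
    """Return paths (root included) whose group ID differs from expected_gid.

    get_gid is a hypothetical hook that makes the check testable without
    root privileges; by default the GID is read with os.lstat.
    """
    if get_gid is None:
        get_gid = lambda path: os.lstat(path).st_gid

    mismatches = []
    if get_gid(root) != expected_gid:
        mismatches.append(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if get_gid(path) != expected_gid:
                mismatches.append(path)
    return mismatches
```

Run inside the pod as `find_gid_mismatches("/shared", 994)`, this would return just `/shared` in the situation reported above, where the top-level folder is root:root while its subfolders are ironic:ironic.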
Issue with that is in k8s pod's
In general, the UID/GID change is a pain, as the Ironic manifests are in BMO, which leads to coupling and means people need to adapt the manifests if they use "non-coupled" versions of BMO/Ironic together. Not a blocker by any means, but a major annoyance. We have more and more cases coming up where having the Ironic manifests in BMO is really painful, and while our chosen solution is to wait for the Ironic Standalone Operator, I'm starting to feel we need to address this sooner than that. But that is a discussion we can have in another issue or meeting.
Note: the operator needs fixing too: https://github.com/metal3-io/ironic-standalone-operator/blob/main/pkg/ironic/containers.go#L22-L23
Unfortunately, it will not solve the problem of hardcoding the UID/GID when building ironic-image: https://github.com/metal3-io/ironic-image/blob/main/prepare-image.sh#L29-L30
/triage accepted
/kind bug
A wild idea: we can use a privileged init container to make upgrades possible without breakage.
State 1: ironic-image has OLD_ID, BMO has OLD_ID. We add an init container whose only job is to take the new IDs via environment variables and change the ownership of the files we care about.
State 2: ironic-image has OLD_ID, BMO scripts tell the init container to re-own files to NEW_ID and then deploy Ironic with NEW_ID. At this point, we have an upgrade path. We only need to tell the users to upgrade ironic-image first.
State 3: ironic-image has NEW_ID, BMO scripts tell the init container to re-own files to NEW_ID and then deploy Ironic with NEW_ID. After this, we can drop the workaround.
State 4: ironic-image has NEW_ID, BMO scripts deploy Ironic with NEW_ID.
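The re-owning job of such an init container could be sketched roughly as follows. This is an illustration only, not the actual implementation: the environment variable names `NEW_UID` and `NEW_GID`, the function names, and the injectable `chown` parameter are all assumptions made for the example.

```python
import os


def ids_from_env(environ=os.environ):
    """Read the target IDs from the environment.

    NEW_UID and NEW_GID are hypothetical variable names.
    """
    return int(environ["NEW_UID"]), int(environ["NEW_GID"])


def reown_tree(root, new_uid, new_gid, chown=os.lchown):
    """Change ownership of root and everything below it; return the paths.

    os.lchown changes the link itself rather than its target, so dangling
    symlinks do not abort the walk. Changing ownership to another user
    requires privileges, hence the privileged init container.
    """
    changed = []
    chown(root, new_uid, new_gid)
    changed.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            chown(path, new_uid, new_gid)
            changed.append(path)
    return changed
```

In the init container this would amount to `reown_tree("/shared", *ids_from_env())`, with `/shared` standing in for whichever files actually need re-owning.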
I am a bit confused about "state 1": if the BMO scripts still have the old IDs in the security context, how would the init container help? I mean, it would re-own things just fine, but then the security context coming from the BMO kustomize would still want to run the containers with the old IDs.
Currently the ironic user and group have a fixed UID and GID assigned: the UID is 997 and the GID is 994.
Unfortunately, as the number of default system services grows, these numbers can easily conflict with those of other already installed services; for example, in the CentOS Stream 10 image they are assigned by default to chrony and dockerroot.
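A conflict of this kind can be detected by checking whether the wanted ID already belongs to a different name in the image's passwd or group database. The sketch below is a minimal illustration working on passwd(5)/group(5)-style text, where the third colon-separated field is the numeric ID in both formats; the function names are hypothetical.

```python
def parse_ids(text):
    """Map numeric ID -> name from passwd(5)- or group(5)-style text."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Field 0 is the name, field 2 the UID (passwd) or GID (group).
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        ids[int(fields[2])] = fields[0]
    return ids


def find_conflict(text, wanted_id, wanted_name):
    """Return the name already holding wanted_id, or None if it is free
    or already owned by wanted_name."""
    owner = parse_ids(text).get(wanted_id)
    if owner is not None and owner != wanted_name:
        return owner
    return None
```

Feeding it the contents of `/etc/passwd` with 997 and of `/etc/group` with 994 from the target base image would show whether the fixed IDs are already taken there.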
We have at least two options: