FATAL: pgnodemx: expected 1, got 0, lines from file /sys/fs/cgroup/user.slice/user-1000.slice/session-c3.scope/cgroup.controllers #26
Comments
Is this only happening on 17?
The most likely situation I've found where this file could be empty is a Docker environment set up as rootless. That seems more likely to have changed recently than Keith's guess that it's PG17. pgnodemx expects the file to contain exactly one line, and here it found none.
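As a rough illustration of the failure mode (a Python sketch, not pgnodemx's actual C code), the snippet below reads a cgroup v2 `cgroup.controllers` file, which normally holds one line of space-separated controller names, and returns an empty list when the file has no lines. The function names and the sample file contents are assumptions made for the example.

```python
import os
import tempfile


def read_controllers(path):
    """Return controller names from a cgroup v2 cgroup.controllers file.

    Returns an empty list when the file has no non-blank lines, which is
    the situation reported in this issue.
    """
    with open(path, encoding="ascii") as fh:
        lines = [line for line in fh.read().splitlines() if line.strip()]
    if not lines:
        return []
    # The file is expected to hold one line of space-separated names.
    return lines[0].split()


def demo():
    """Exercise both cases with temporary files (sample contents are made up)."""
    with tempfile.TemporaryDirectory() as tmp:
        populated = os.path.join(tmp, "populated")
        empty = os.path.join(tmp, "empty")
        with open(populated, "w", encoding="ascii") as fh:
            fh.write("cpu io memory pids\n")
        open(empty, "w").close()
        return read_controllers(populated), read_controllers(empty)
```

A caller can then decide what an empty list means (warn, disable stats, or fail) instead of having the reader abort on a line-count mismatch.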
And in a rootless environment, not only can you sometimes not monitor those things, you cannot even necessarily see what could be monitored. Our code expects the things to monitor to vary, but no one considered an environment so locked down that the list was empty. I'm not sure that just degrading this to a WARNING gets us to an ideal place. The whole point of pgnodemx is to collect data like this, so if there's nothing there to collect, there's nothing for the program to do. Poking around at what's happening and what some other projects do, there seem to be a few equally sensible options, each with good and bad implications.
We should probably provide a simple solution that doesn't punish Christoph for being the one to spot the problem; have our build/package group set up our own rootless test environment to do further development; and then do the work to document The Right Way for CI testing that packagers should adopt. And if that goes well, then maybe we start removing ways to bypass the testing. (I hope I'm not wrong about the root cause altogether, because that would mean I just wasted a lot of typing.)
Thanks for the investigation. In fact, the test never worked before. At the moment the problem isn't critical for me; the CI tests I care about are running on apt.postgresql.org, and there they work. The CI pipeline on salsa.debian.org is just a nice extra to have even more package-related checks running. What made me write the issue is that it seems to prevent startup. Wouldn't it make more sense to throw an ERROR only when someone queries the stats? Not starting up could be a bad time bomb: perhaps people upgrade the kernel or some kernel settings change, and then on the next reboot or crash, the database suddenly doesn't start anymore.
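The "warn at startup, error only at query time" idea could look roughly like the following. This is a Python sketch of the pattern only, not a proposal for the actual C implementation; `CgroupStats`, `ControllersUnavailable`, and the logger name are hypothetical.

```python
import logging
import os
import tempfile

log = logging.getLogger("cgroup_sketch")  # hypothetical logger name


class ControllersUnavailable(RuntimeError):
    """Raised at query time when no controllers could be read."""


class CgroupStats:
    def __init__(self, path):
        self.path = path
        self.controllers = None  # None means "nothing usable was found"

    def startup(self):
        """Probe once at startup; log a warning on failure instead of aborting."""
        try:
            with open(self.path, encoding="ascii") as fh:
                names = fh.read().split()
        except OSError as exc:
            log.warning("cannot read %s: %s", self.path, exc)
            return
        if not names:
            log.warning("%s is empty; cgroup stats disabled", self.path)
            return
        self.controllers = names

    def query(self):
        """Only a caller that actually asks for stats sees an error."""
        if self.controllers is None:
            raise ControllersUnavailable(f"no controllers listed in {self.path}")
        return list(self.controllers)


def demo():
    """Start up against an empty file; startup survives, the query fails."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cgroup.controllers")
        open(path, "w").close()
        stats = CgroupStats(path)
        stats.startup()
        try:
            stats.query()
        except ControllersUnavailable:
            return "startup ok, query failed"
        return "startup ok, query ok"
```

With this shape, a kernel or permission change can no longer keep the server from starting; it only affects sessions that call the stats functions.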
Additional context appreciated. Knowing we're not causing you a serious CI issue is a relief. I think the question you're right to raise is: what about the person who starts their database without the stats there, and then someone fixes the problem by granting the right permissions? Shouldn't the stats then start to work? They might not even be able to manually restart their pgnodemx if it gave up and died. Since the implications of these rootless changes slipped by as something no one ever considered before, I think we need a little design review session that reconsiders error handling for a few of these use cases. Maybe even adjust our idle/sleeping behavior. Thanks for the input, we'll tag the issue when we do something about it.
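One way to let the stats "start to work" after permissions are fixed is to re-read the file on every query rather than caching the startup result. The Python sketch below shows that pattern under stated assumptions; `ReprobingStats` is a hypothetical name, and a real implementation would likely want to rate-limit the re-reads.

```python
import os
import tempfile


class ReprobingStats:
    """Re-reads the controllers file on each query instead of caching failure."""

    def __init__(self, path):
        self.path = path

    def query(self):
        try:
            with open(self.path, encoding="ascii") as fh:
                names = fh.read().split()
        except OSError:
            names = []  # unreadable is treated the same as empty
        if not names:
            raise RuntimeError(f"no controllers available in {self.path}")
        return names


def demo():
    """Query fails while the file is empty, then succeeds once it is filled."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cgroup.controllers")
        open(path, "w").close()
        stats = ReprobingStats(path)
        try:
            stats.query()
            first = "ok"
        except RuntimeError:
            first = "failed"
        # Simulate someone fixing the environment; contents are made up.
        with open(path, "w", encoding="ascii") as fh:
            fh.write("cpu io\n")
        return first, stats.query()
```

No restart is needed here: the same object that failed earlier returns data as soon as the file has content.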
In Debian's CI environment, the pgnodemx regression tests fail:
https://salsa.debian.org/postgresql/pgnodemx/-/jobs/6281060
Perhaps that problem should be a WARNING instead of preventing startup?