nginx_stage nginx_clean cleans up active PUN #3111
Comments
Thank you for the very detailed description and debugging. I'll have to dig in myself and see what's what.
We thought a bit more about how this could be solved.
Some additional checks if the As the This is just an idea though, there could always be some edge cases with how Passenger behaves that I am not aware of.
It seems like
What's the difference between these environments?
We haven't been able to track down or pinpoint any meaningful difference that would explain this. For every difference between ood-testing and ood-future, some other environment has the same config or setup and does not show the problem. In the environment where the problem occurs (ood-testing), sandbox apps are enabled, while in ood-future they are not; however, we have another environment with sandbox apps enabled where the problem does not occur.
This is causing more problems than initially thought, so I have investigated further. In some environments, this bug causes active PUNs to be force-cleaned, while in others PUNs are never cleaned (they are always considered active). I have not yet found any environment where nginx_clean works as I would expect.
The difference that causes this is whether the nginx worker process has With 3.0.3 RPMs and a The key thing here, however, is that none of these environments behave correctly, since the user's activity does not seem to be reflected at all in the open file descriptors (other than when running the shell app).
I'll have to look at where all
I think a naive implementation may be something like this. Attached to that class, I don't think you'd need to pass user into the method; I just made this while hacking around.

def sessions(user)
  # List the command line of every process owned by the user and keep
  # only Passenger application processes (e.g. "Passenger RubyApp: ...").
  `ps -o cmd -u #{user}`.split("\n").select do |command|
    command.match?(/Passenger [\w]+App:/)
  end.count
end

nginx processes will always be running, but the Passenger processes only start once you have an active session. Passenger apps stop themselves, but nginx does not, hence this nginx_clean action. So my theory is: if you have Passenger processes (that haven't stopped themselves), you have active sessions. If you don't have any Passenger processes (because they've stopped themselves), we can go ahead and stop nginx too.
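A slightly more testable variant of the sessions(user) idea above could separate parsing the ps output from running ps. This is only a sketch under stated assumptions, not the nginx_stage code: count_passenger_apps and PASSENGER_APP_PATTERN are hypothetical names, and the "Passenger <Name>App:" process-title format is assumed from the regex in the snippet. It also swaps the backticks for Open3.capture2 with an argument list, so the user name is passed to ps directly instead of being interpolated into a shell command.

```ruby
require "open3"

# Assumed title format of Passenger application processes, taken from the
# regex in the snippet above (e.g. "Passenger RubyApp: ..." or "Passenger NodeApp: ...").
PASSENGER_APP_PATTERN = /Passenger \w+App:/

# Hypothetical helper: count Passenger app processes in `ps -o cmd` output.
# It only takes a string, so it can be tested against captured output.
def count_passenger_apps(ps_output)
  ps_output.each_line.count { |line| line.match?(PASSENGER_APP_PATTERN) }
end

# Run ps without a shell: each argument is passed to ps as-is.
def sessions(user)
  stdout, status = Open3.capture2("ps", "-o", "cmd", "-u", user)
  # ps typically exits non-zero when no processes match; treat that as no sessions.
  return 0 unless status.success?

  count_passenger_apps(stdout)
end
```

Because count_passenger_apps works on plain text, it can be checked against ps output captured from an affected environment without running anything as that user.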
It seems like that implementation would work well enough. From what I've seen, the PUNs behave according to the theory you've described. There is, however, a difference between the core Passenger processes (core and watchdog), which do not seem to shut down, and the Passenger apps, i.e. dashboard or shell, which do get cleaned up. Once the Passenger apps have been cleaned up, it should be safe to kill the core Passenger processes and nginx as well. We also looked at some potential alternatives, but using One thing to consider is that the
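The distinction above between long-lived Passenger core/watchdog processes and the Passenger apps that exit on their own could be expressed as a classification step followed by a cleanup decision. This is a hypothetical sketch: classify, process_summary and pun_cleanable? are illustrative names, and the process titles ("Passenger core", "Passenger watchdog", "Passenger <Name>App: ...", "nginx: ...") are assumptions based on this discussion rather than a documented format.

```ruby
# Hypothetical helper: map one `ps -o cmd` line to a process category.
# The title formats matched here are assumptions.
def classify(cmd)
  case cmd
  when /\APassenger \w+App:/ then :app
  when /\APassenger (core|watchdog)\b/ then :passenger_core
  when /\Anginx: / then :nginx
  else :other
  end
end

# Count processes per category (Enumerable#tally requires Ruby 2.7+).
def process_summary(ps_output)
  ps_output.each_line.map { |line| classify(line.strip) }.tally
end

# Core/watchdog and nginx processes stay up on their own, so they are ignored:
# the PUN is a cleanup candidate only when no Passenger app processes remain.
def pun_cleanable?(ps_output)
  process_summary(ps_output).fetch(:app, 0).zero?
end
```

With this split, a PUN that still has Passenger core and watchdog processes, but no app processes, would be treated as idle, which matches the behaviour described above.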
Thanks for fixing this! We tried out the new @johrstrom is this something that could be backported to the 3.1.x branch? We're interested in taking the fixed version into production as soon as possible, since we have currently disabled the old version entirely.
Yes, this'll be in 3.1.5.
Just to update: 3.1.5 is in the
We are having a strange bug in one of our OOD environments, where active PUNs are being cleaned up when running

/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean

This happens even when, for example, a file transfer (using Rclone in this case) is in progress and the user's browser is actively polling its progress. The cleanup then aborts the file transfer. The OOD versions we are using are 3.0.1 and 3.0.2.
We run OOD in a container, and even though two of the environments are running exactly the same image, it only happens in one of them.
Having the terminal (shell app) open in OOD prevents the PUN from being cleaned up.
With the dashboard open and an active transfer using Rclone, nginx_clean and the lsof command it uses give the following output:

Here the PUN is considered inactive by nginx_stage. Process 2708 is "nginx: master process (robkarls)" and 2734 is "nginx: worker process".

In the other OOD environment, where it is working as expected, this is the output:

Process 758 is "nginx: master process (robkarls)".

After opening the shell app in both of these environments, these are the outputs from the commands:
The logic for counting active sessions seems to happen here. In the lsof output from the problematic OOD environment (ood-testing), without the shell app open, the inode 412849896 occurs twice, which means it is filtered out by .select{|k,v| v.size == 1}, which seems wrong. The active session count in that case is 0. Later, when the shell app is open, the inode 413408799 occurs only once, passes the check, and is counted as an active session.

There seem to be a few things that could be wrong here, although I am not sure exactly what. Is the nginx worker process supposed to have the socket open at all, given that this happens only in the problematic environment? The comment on that line of code also seems outdated: these sockets are held open by the root nginx process, not by the Apache proxy as the comment seems to indicate. Is the logic still up to date after 7 years?
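To make the inode filtering described above concrete, here is a simplified model of it. It is not the nginx_stage source: active_sessions is a hypothetical name, and each record is assumed to be a [pid, inode] pair already parsed from lsof. It assumes, as the question about the worker process suggests, that the two entries for inode 412849896 belong to the master (2708) and the worker (2734); the PID attached to the shell-app inode 413408799 is illustrative.

```ruby
# Simplified model (not the nginx_stage code): records are [pid, inode] pairs
# for the Unix sockets lsof reports for one PUN.
def active_sessions(records)
  records
    .group_by { |_pid, inode| inode }              # inode => all records holding it
    .select { |_inode, entries| entries.size == 1 } # same filter as `v.size == 1`
    .size                                          # surviving inodes = "sessions"
end

# Dashboard + Rclone transfer: master and worker share one socket inode.
without_shell = [[2708, 412849896], [2734, 412849896]]
# Shell app open: one extra inode held by a single process (PID illustrative).
with_shell = without_shell + [[2708, 413408799]]
```

Under this model, a socket held by both the master and the worker drops out of the count, so active_sessions(without_shell) is 0 and active_sessions(with_shell) is 1, matching the behaviour observed in ood-testing.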