Inexplicable 503 and 504 errors #497
Comments
@minusdavid what version of Loris are you running? There's been at least one commit recently (8855cc9) that helped reduce our exceptions in production. Please do post any stack traces you're able to get.
@bcail We're running 2.2.0, which is quite long in the tooth now. We're planning to switch over to Python 3 anyway, so we expect to upgrade very soon. I notice that the latest release is 2.3.3 from June 2018 (https://github.com/loris-imageserver/loris/releases), but there has been a lot of work done since then. Could we get a new release posted or at least tagged?
Happy to do the packaging work on my end, but I'd just like to know some boundaries for stability : ).
@minusdavid see #498.
You're a champion, @bcail
How did you go with your crashes, @minusdavid? Did they improve after an upgrade? We don't see freezes ourselves, but we do get random 5xx responses from time to time. We added a workaround in nginx to re-request the same image in a different format if the upstream server (uwsgi+iiif) returned an error. After an upgrade (to 2.3.3) the 5xx responses stopped happening but a corrupted image was produced instead, so we upgraded again (to 3.0) and it's much better behaved. I went to remove the workaround, but it turns out we're still getting the occasional random 5xx response behind the scenes.
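For anyone curious, the fallback behaves roughly like the sketch below. This is written in Python rather than the actual nginx config, purely to illustrate the idea; the base URL and identifier are hypothetical.

```python
# Rough Python sketch of the nginx fallback described above, not the actual
# config: if one image format comes back with a 5xx, retry the same request
# in the next format. The IIIF base URL below is hypothetical.
import requests

IIIF_BASE = "https://iiif.example.org"  # hypothetical Loris endpoint

def fetch_image(identifier, region="full", size="full", rotation="0",
                quality="default", formats=("jpg", "png")):
    """Try each format in turn, falling back whenever the server returns a 5xx."""
    response = None
    for fmt in formats:
        url = "%s/%s/%s/%s/%s/%s.%s" % (
            IIIF_BASE, identifier, region, size, rotation, quality, fmt)
        response = requests.get(url, timeout=30)
        if response.status_code < 500:
            return response
    return response  # every format returned a 5xx
```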
@lsh-0 we never did the upgrade as the funding ran out, so that Loris server still has lots of issues unfortunately. I hope one day that some money comes in and we do the upgrade though.
@lsh-0 Glad things are looking better with the upgrade to 3.0. I would highly recommend that anyone getting 500 errors make sure they're running 3.0. Since you're still getting the occasional 5xx response, could you please post any errors or stack traces from your logs? Thanks.
We've started running Loris in production, and we're noticing occasional 503 and 504 errors.
It's mostly just a feeling at this point, but I think Loris (running under mod_wsgi in our case) is hanging, and the logs don't clearly say why. I'm mostly wondering if other people are having this issue. I think I saw @alexwlchan describe something similar on the WellcomeTrust GitHub, although I think he uses uwsgi and Nginx instead of mod_wsgi under Apache. The mod_wsgi author blames this sort of scenario on the application itself.
In test environments, and for the majority of the time in production, the Loris servers (two round-robin load-balanced servers, each running 5 single-threaded processes*) manage very well. But on occasion they seem to freeze up. At the moment it gets bad enough roughly every 12 hours that we need to kill the machine and bring up a new one.
The freezing doesn't happen during periods of high load either. The servers seem fine during their busiest times. It's actually often during the quietest times that we get the worst performance.
I'm going to try adding stack dump code to the WSGI file as per the mod_wsgi author's advice, but I'm wondering if others are having these same problems.
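For reference, here's a minimal sketch of the kind of stack-dump hook I have in mind, not necessarily the mod_wsgi author's exact recipe. It assumes the daemon processes are allowed to register signal handlers (e.g. `WSGIRestrictSignal Off` in the Apache config).

```python
# Minimal sketch: dump every thread's stack to stderr (which ends up in the
# Apache error log) when the daemon process receives SIGUSR1. Assumes the
# Apache config permits signal handlers in daemon processes.
import signal
import sys
import threading
import traceback

def dump_stacks(signum, frame):
    """Write the current stack of every thread so we can see where a hung worker is stuck."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, stack in sys._current_frames().items():
        lines.append("\n--- thread %s (%s) ---\n" % (names.get(thread_id, "?"), thread_id))
        lines.extend(traceback.format_stack(stack))
    sys.stderr.write("".join(lines))
    sys.stderr.flush()

signal.signal(signal.SIGUSR1, dump_stacks)
```

With that in the WSGI file, a `kill -USR1 <pid>` against a hung daemon process should write all thread stacks to the error log without killing the worker.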
* I used to run 10 processes with 15 threads, as that was the default configuration, but that seemed even worse.