Inexplicable 503 and 504 errors #497

Open · minusdavid opened this issue Jan 7, 2020 · 8 comments

@minusdavid commented Jan 7, 2020

We've started running Loris in production, and we're starting to notice occasional 503 and 504 errors.

It's mostly just a feeling at this point, but I think Loris (in this case mod_wsgi) is hanging, and the logs don't clearly say why. I'm mostly wondering if other people are having this issue. I think I saw @alexwlchan describing something similar on the Wellcome Trust GitHub, although I believe he uses uwsgi and Nginx instead of mod_wsgi in Apache. The mod_wsgi author blames this sort of scenario on the app.

In test environments, and for the majority of the time in production, the Loris servers (two round-robin load-balanced servers, each running 5 single-threaded processes*) manage very well. But on occasion they seem to freeze up. At the moment it gets bad enough roughly every 12 hours that we need to kill the machine and bring up a new one.

The freezing doesn't happen during periods of high load either. The servers seem fine during their busiest times. It's actually often during the quietest times that we get the worst performance.

I'm going to try adding stack dump code to the WSGI file as per the mod_wsgi author's advice (a rough sketch of what I have in mind is below), but I'm wondering if others are having these same problems.

* I used to run 10 processes with 15 threads each, as that was the default configuration, but that seemed even worse.
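
For anyone curious, here's a minimal sketch of the kind of stack-dump hook I have in mind, using the standard-library faulthandler module rather than whatever exact snippet the mod_wsgi author suggests (assumes Python 3 on a POSIX system; the existing Loris WSGI setup is left untouched):

```python
# Sketch only: dump every thread's stack to the Apache error log when the
# daemon process receives SIGUSR1, so a hung worker can be inspected
# without restarting it. faulthandler and signal are standard library.
import faulthandler
import signal
import sys

faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# ...the existing contents of the Loris WSGI file (the `application` object)
# stay exactly as they are.
```

Then `kill -USR1 <pid>` against a stuck daemon process should write tracebacks for all threads to the error log.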

@bcail (Contributor) commented Jan 7, 2020

@minusdavid what version of Loris are you running? There's been at least one commit recently (8855cc9) that helped reduce our exceptions in production.

Please do post any stack traces you're able to get.

@minusdavid (Author)

@bcail We're running 2.2.0, which is quite long in the tooth now. We're planning to switch over to Python 3 anyway, so we intend to upgrade very soon.

I notice that the latest release is 2.3.3 from June 2018 (https://github.com/loris-imageserver/loris/releases), but there has been a lot of work done since then.

Could we get a new release posted or at least tagged?

@minusdavid (Author)

Happy to do the packaging work on my end, but I'd just like to know some boundaries for stability :).

@bcail (Contributor) commented Jan 9, 2020

@minusdavid see #498.

@minusdavid (Author)

You're a champion, @bcail

@lsh-0 commented May 8, 2020

How did you go with your crashes, @minusdavid? Did they improve after an upgrade?

We don't see freezes ourselves, but we do get random 5xx responses from time to time. We added a workaround in nginx to re-request the same image in a different format if the upstream server (uwsgi + iiif) returned an error. After an upgrade to 2.3.3 the 5xx responses stopped, but corrupted images were being produced instead, so we upgraded again (to 3.0) and it's much better behaved.
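
Roughly, the fallback behaves like the sketch below; the real workaround is nginx configuration, so this Python version (and the IIIF URL pattern in it) is purely illustrative:

```python
# Illustrative only -- the actual workaround lives in nginx config, not code.
# If the image server answers with a 5xx, re-request the image in a
# different format before giving up.
import requests

def fetch_with_fallback(base_url, identifier, fmt="jpg", fallback_fmt="png"):
    def url(f):
        # Hypothetical IIIF Image API request for the full-size image.
        return f"{base_url}/{identifier}/full/full/0/default.{f}"

    resp = requests.get(url(fmt))
    if 500 <= resp.status_code < 600:
        resp = requests.get(url(fallback_fmt))
    resp.raise_for_status()
    return resp.content
```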

I went to remove the workaround but it turns out we're still getting the occasional random 5xx response behind the scenes.

@minusdavid (Author)

@lsh-0 We never did the upgrade, as the funding ran out, so that Loris server unfortunately still has lots of issues. I hope some money comes in one day so we can do the upgrade.

@bcail (Contributor) commented May 8, 2020

@lsh-0 Glad things are looking better with the upgrade to 3.0. I'd highly recommend that anyone getting 500 errors make sure they're running 3.0. Since you're still getting the occasional 5xx response, could you please post any errors or stack traces from your logs? Thanks.
