
Memory not released #344

Open
Zabrane opened this issue Dec 19, 2020 · 12 comments
Zabrane commented Dec 19, 2020

Hi guys,

I'm facing the same issue using Hitch 1.7.0 on Ubuntu 20.04 LTS.
While stress testing (with vegeta) our backend app which sits behind Hitch, we noticed that Hitch's memory never gets released back to the system.

This is Hitch's memory usage before starting the benchmark (using ps_mem.py to track memory usage)

 Private  +   Shared  =  RAM used       Program
 5.2 MiB +   1.7 MiB =   6.9 MiB       hitch (10)

And this is Hitch's memory usage after the benchmark finished:

 Private  +   Shared  =  RAM used       Program
2.51 GiB +   192.1 MiB =   2.7 GiB       hitch (10)

The memory has still not been released (24 hours later).
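
For what it's worth, each worker's residency can also be read straight from /proc (a quick sketch, independent of ps_mem.py):

$ for p in $(pgrep hitch); do grep -H VmRSS /proc/$p/status; done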

My config:

  1. Ubuntu 20.04 LTS
  2. Hitch 1.7.0
  3. OpenSSL 1.1.1f
  4. GCC 9.3.0
  5. Only one SSL certificate
daghf (Member) commented Dec 21, 2020

Hi @Zabrane

Thanks for the report, I will take a look.

Could you share some details of the benchmark you ran? Is this a handshake-oriented or a throughput-oriented test? HTTP keep-alive? Number of clients/request rate?

Also, is there anything else special about your config? Could you perhaps share your hitch command line and hitch.conf?

Zabrane (Author) commented Dec 22, 2020

Hi @daghf

Thanks for taking the time to look at this.
Here are the steps to reproduce the issue:

  1. Install Express to run the Node.js backend sample server (file srv.js.zip):
$ unzip -a srv.js.zip
$ npm install express
$ node srv.js
::: listening on http://localhost:7200/
  2. Use the latest Hitch 1.7.0 with the following hitch.conf (point pem-file to yours).
     We were able to reproduce this memory issue from version 1.5.0 through 1.7.0.
## Listening
frontend   = "[0.0.0.0]:8443"
## https://ssl-config.mozilla.org/
ciphers    = "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384"

tls-protos = TLSv1.2

## ALPN protocol negotiation (HTTP/1.1 only)
alpn-protos = "http/1.1"

## Send traffic to the backend without the PROXY protocol
backend        = "[127.0.0.1]:7200"
write-proxy-v1 = off
write-proxy-v2 = off
write-ip       = off

## List of PEM files, each with key, certificates and dhparams
pem-file = "hitch.pem"

## Set to the number of CPU cores
workers = 10
backlog = 1024
keepalive = 30

## Logging / Verbosity
quiet = on
log-filename = "/dev/null"

## Automatic OCSP staple retrieval
ocsp-verify-staple = off
ocsp-dir = ""

Then, run it:

$ hitch -V
hitch 1.7.0
$ hitch --config=./hitch.conf 
  3. Check that the pieces are connected correctly:
$ curl -k -D- -q -sS "https://localhost:8443/" --output /dev/null
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
Content-Length: 6604
Date: Tue, 22 Dec 2020 12:01:33 GMT
Connection: keep-alive
  4. Get the vegeta binary for your distribution. No need to compile it; prebuilt releases are available on its GitHub releases page.

Finally, run it like this:

$ echo "GET https://localhost:8443/" | vegeta attack -insecure -header 'Connection: keep-alive' -timeout=2s -rate=1000 -duration=1m | vegeta encode | vegeta report
Requests      [total, rate, throughput]         60000, 1000.02, 1000.02
Duration      [total, attack, wait]             59.999s, 59.999s, 219.979µs
Latencies     [min, mean, 50, 90, 95, 99, max]  165.935µs, 262.688µs, 230.6µs, 333.352µs, 375.975µs, 502.351µs, 16.373ms
Bytes In      [total, mean]                     396240000, 6604.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:60000
Error Set:

During the stress test with vegeta, check hitch memory usage (top, htop or ps_mem):

$ sudo su
root$ ps_mem.py -p `pgrep -d, hitch | sed -e 's|,$||'`
root$ watch -n 3 "ps_mem.py -p `pgrep -d, hitch | sed -e 's|,$||'`"
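
If ps_mem.py is not handy, plain ps gives a rough per-worker view (a sketch; RSS here counts shared pages per process, so the sum overstates actual usage):

$ watch -n 3 'ps -C hitch -o pid,rss,vsz,comm'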

You can set vegeta's -duration option to a larger value (e.g. 15m) to see the effect on Hitch's memory usage.

Please let me know if you need anything else.

NOTE: on macOS, top shows that Hitch 1.7.0 uses only 2 workers even though workers is set to 10.

Zabrane (Author) commented Jan 6, 2021

Hi @daghf and Happy New Year.

Any update on this :-) ?

daghf (Member) commented Jan 18, 2021

Hi @Zabrane

I haven't had any luck reproducing this.

Even after setting up something identical to yours (Ubuntu 20.04, GCC 9.3, OpenSSL 1.1.1f) and running vegeta against your Express backend, I still did not see memory usage creep much above 50 MB.

I did find a few small, inconsequential memory leaks related to config file updates, which I fixed in a commit I just pushed. However, these are not the kind of leaks that would cause memory usage to grow with traffic or running time.
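
For anyone who wants to chase this locally, one way to see where a worker's heap grows is to run Hitch in the foreground under valgrind's massif during the vegeta run; a rough sketch, not something tested in this thread:

$ valgrind --tool=massif --trace-children=yes hitch --config=./hitch.conf
# ...run the benchmark, stop hitch, then inspect one worker's profile, e.g.:
$ ms_print massif.out.12345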

Zabrane (Author) commented Feb 3, 2021

@daghf thanks for your time looking at this issue.

We are still seeing this behaviour in 2 different products behind Hitch. It's a bit sad you weren't able to reproduce it.

One last question before I close this issue, if you don't mind: if the backend server decides to close the connection after servicing some requests, will Hitch reopen it immediately, or will it wait until a new client connection is established?

Thanks

robinbohnen commented

We have the same problem here; Hitch was taking up to 24 GB of RAM until it was killed by the kernel (Out of memory: Kill process # (hitch) score 111 or sacrifice child).
This only seems to have started happening after our latest update to 1.7.0. Not sure what version we were running before.

Zabrane (Author) commented Feb 25, 2021

@robinbohnen thanks for confirming the issue. We still suffer from the memory problem, and the current workaround is to manually kill/restart Hitch (yes, a hack, with the bad consequence of dropping connections).

We are considering switching to stunnel 5.58, HAProxy 2.3 or Envoy 1.17.

Caveat: the stunnel link is from an old blog post benchmarking against stud (Hitch's ancestor), but we were able to reproduce those numbers (even better ones) as of today:
(attached screenshot: benchmark results)

gquintard (Contributor) commented

@Zabrane, since we are having trouble reproducing the issue, could you try sharing a docker-compose or Vagrant file so we can look at it locally? Is there anything special about your certificates (large numbers, lots of intermediate CAs, uncommon options, etc.)?

Zabrane (Author) commented Feb 25, 2021

@gquintard we use 1 certificate and 1 CA, as explained above. Unfortunately, we don't rely on Docker for our services. It took us 6 weeks to be able to report the issue here (getting approval from the business; we work for a private bank).

@robinbohnen could you please shed more light on your config?

robinbohnen commented

We have about 3500 Let's Encrypt certificates served by Hitch, and we don't use Docker either.

dridi (Member) commented Feb 25, 2021

I think what @gquintard was asking is rather: can you reproduce this behavior in a Docker or Vagrant (or other) setup that we could duplicate on our end to try to observe it as well?

vcabbage commented

FWIW, we observed something similar. In our case we had 300-500K concurrent connections; when the connection count dropped, RSS continued to increase until stabilizing at around 90 GB.

After trying a variety of adjustments, we ended up loading jemalloc via LD_PRELOAD. With that change, RSS became much more correlated with the number of connections (26-44 GB).

I don't have a firm explanation, but it does remind me a bit of this post, where it's theorized that the excess memory usage of libc malloc came from fragmentation caused by multithreading. I'm not sure whether that would apply to Hitch.
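
For reference, minimal sketches of the two usual mitigations when glibc malloc fragmentation is suspected (the jemalloc path assumes Ubuntu 20.04's libjemalloc2 package; MALLOC_ARENA_MAX was not tested in this thread):

# Preload jemalloc so Hitch's workers allocate through it:
$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 hitch --config=./hitch.conf

# Or cap the number of glibc malloc arenas to limit per-thread fragmentation:
$ MALLOC_ARENA_MAX=2 hitch --config=./hitch.conf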
