Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connections: expiration_interval as select timeout #352

Closed

Conversation

liamstask
Copy link
Contributor

@liamstask liamstask commented Dec 20, 2020

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • πŸ“‹ docs update
  • πŸ“‹ tests/coverage improvement
  • πŸ“‹ refactoring
  • πŸ’₯ other

πŸ“‹ What is the related issue number (starting with #)

#311 made a change to process connections as they become active, rather than polling. One of the motivations was to ensure that the latency with which connections are processed is not tied to the select timeout, as it was originally - see #305

I originally had this change in #311 and intended to submit with it, but must have somehow dropped it during editing/rebasing.

Also includes a minor change to avoid an unnecessary call to time.time() when handling expirations.

❓ What is the current behavior? (You can also link to an open issue here)

❓ What is the new behavior (if this is a feature change)?

πŸ“‹ Other information:

πŸ“‹ Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

cheroot/connections.py Outdated Show resolved Hide resolved
@liamstask liamstask force-pushed the connections-select-timeout branch from 1162281 to 52cb7ec Compare January 2, 2021 19:03
@liamstask
Copy link
Contributor Author

@mar10 any chance you can re-run the benchmarking from #305 to validate this change? πŸ™

@liamstask liamstask marked this pull request as ready for review January 4, 2021 19:43
@mar10
Copy link
Contributor

mar10 commented Jan 4, 2021

mar10 any chance you can re-run the benchmarking from #305 to validate this change? πŸ™

Sure:
2021-01-04, PC, Windows 10:

  • Cheroot 8.5.1 Executed 16,660 activities
  • Cheroot 8.4.8 Executed 16,364 activities
  • Cheroot 8.3.1 Executed 2,928 activities
  • Cheroot 8.2.1 Executed 2,828 activities
  • Cheroot 8.1.0 Executed 2,960 activities
  • Cheroot 8.0.0: Executed 17,092 activities
  • gevent 20.12.1: Executed 15,420 activities
Click to expand details
> stressor -Vv
stressor/0.5.1-a1 Python/3.9.1(64 bit) Windows-10-10.0.19041-SP0
> wsgidav -Vv
WsgiDAV/3.0.5-a4 Python/3.9.1(64 bit) Windows-10-10.0.19041-SP0

Cheroot 8.5.1

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:24:12
  End:      2021-01-04 21:24:42
Run time 30.281 sec, net: 4:58.43 min.
Executed 16,660 activities in 4,175 sequences, using 10 parallel sessions.
Sequence duration: 0.007253 sec average.
             rate: 8,273 sequences per minute (per user: 827.3).
Activity rate:     550.2 activities per second (per user: 55.02).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 4,155, min: 0.004999 sec, avg: 0.011 sec, max: 0.035 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,155, min: 0.007987 sec, avg: 0.019 sec, max: 0.046 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,155, min: 0.004061 sec, avg: 0.018 sec, max: 0.048 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Cheroot 8.4.8

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:22:12
  End:      2021-01-04 21:22:42
Run time 30.265 sec, net: 4:58.40 min.
Executed 16,364 activities in 4,101 sequences, using 10 parallel sessions.
Sequence duration: 0.007380 sec average.
             rate: 8,130 sequences per minute (per user: 813.0).
Activity rate:     540.7 activities per second (per user: 54.07).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 4,081, min: 0.004999 sec, avg: 0.012 sec, max: 0.042 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,081, min: 0.003999 sec, avg: 0.019 sec, max: 0.059 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,081, min: 0.007000 sec, avg: 0.018 sec, max: 0.072 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Cheroot 8.3.1

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:20:48
  End:      2021-01-04 21:21:19
Run time 30.453 sec, net: 5:01.63 min.
Executed 2,928 activities in 742 sequences, using 10 parallel sessions.
Sequence duration: 0.041 sec average.
             rate: 1,462 sequences per minute (per user: 146.2).
Activity rate:     96.15 activities per second (per user: 9.615).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 722, min: 0.006999 sec, avg: 0.085 sec, max: 0.155 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 722, min: 0.008998 sec, avg: 0.106 sec, max: 0.147 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 722, min: 0.009990 sec, avg: 0.100 sec, max: 0.139 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Cheroot 8.2.1

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:18:48
  End:      2021-01-04 21:19:18
Run time 30.547 sec, net: 5:01.84 min.
Executed 2,828 activities in 717 sequences, using 10 parallel sessions.
Sequence duration: 0.043 sec average.
             rate: 1,408 sequences per minute (per user: 140.8).
Activity rate:     92.58 activities per second (per user: 9.258).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 697, min: 0.007997 sec, avg: 0.089 sec, max: 0.143 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 697, min: 0.014 sec, avg: 0.108 sec, max: 0.148 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 697, min: 0.013 sec, avg: 0.105 sec, max: 0.146 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Cheroot 8.1.0

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:17:07
  End:      2021-01-04 21:17:37
Run time 30.485 sec, net: 5:01.40 min.
Executed 2,960 activities in 750 sequences, using 10 parallel sessions.
Sequence duration: 0.041 sec average.
             rate: 1,476 sequences per minute (per user: 147.6).
Activity rate:     97.1 activities per second (per user: 9.71).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 730, min: 0.005002 sec, avg: 0.083 sec, max: 0.141 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 730, min: 0.012 sec, avg: 0.105 sec, max: 0.156 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 730, min: 0.011 sec, avg: 0.099 sec, max: 0.138 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Cheroot 8.0.0

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:13:26
  End:      2021-01-04 21:13:56
Run time 30.281 sec, net: 4:58.11 min.
Executed 17,092 activities in 4,283 sequences, using 10 parallel sessions.
Sequence duration: 0.007070 sec average.
             rate: 8,487 sequences per minute (per user: 848.7).
Activity rate:     564.4 activities per second (per user: 56.44).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 4,263, min: 0.002996 sec, avg: 0.010 sec, max: 0.031 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,263, min: 0.004000 sec, avg: 0.018 sec, max: 0.058 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 4,263, min: 0.002999 sec, avg: 0.017 sec, max: 0.045 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

gevent 20.12.1

Compared to gevent as Wsgi-Server

$ pip install gevent
$ wsgidav --root tests/stressor/htdocs/ --host 127.0.0.1 --port 8082 --auth anonymous --no-config --server gevent -q
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-04 21:43:29
  End:      2021-01-04 21:43:59
Run time 30.266 sec, net: 4:58.30 min.
Executed 15,420 activities in 3,865 sequences, using 10 parallel sessions.
Sequence duration: 0.007831 sec average.
             rate: 7,662 sequences per minute (per user: 766.2).
Activity rate:     509.5 activities per second (per user: 50.95).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 3,845, min: 0.005235 sec, avg: 0.018 sec, max: 0.033 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 3,845, min: 0.006213 sec, avg: 0.019 sec, max: 0.039 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 3,845, min: 0.002064 sec, avg: 0.019 sec, max: 0.036 sec
Result: Ok.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

@webknjaz
Copy link
Member

webknjaz commented Jan 5, 2021

@mar10 that log looks like you were testing the released versions but not this PR branch. Is that right?

@liamstask rebasing the PR should pull-in the pytest-related fixes should get more jobs green.

@liamstask liamstask force-pushed the connections-select-timeout branch from 52cb7ec to 2af88bb Compare January 5, 2021 04:31
@liamstask
Copy link
Contributor Author

rebased

@mar10
Copy link
Contributor

mar10 commented Jan 5, 2021

@mar10 that log looks like you were testing the released versions but not this PR branch. Is that right?

that's true.
I just tried with

pip install git+https://github.com/cherrypy/cheroot.git@refs/pull/352/head

and now see errors (PUT timeouts after 0.5 seconds, which is a threshold I set in my test).
I will look into it a bit more, when I find some time.

@webknjaz webknjaz added bug Something is broken enhancement Improvement help wanted Somebody help us, please! labels Jan 5, 2021
@mar10
Copy link
Contributor

mar10 commented Jan 7, 2021

I removed the timeout check and re-run the test again (30 seconds, 10 parallel users).

With this PR I see that most requests seem to take ~0.5 seconds.
(There are no errors and some requests are fast, as seen in the 'min' column):
cheroot_pr352_timeout_05

I searched cheroot for '0.5' and found server.expiration_interval = 0.5

I then re-run with server.expiration_interval = 0.7 and get this result.
Note that Max column is now at ~0.7, so there is a clear evidence that this setting is related:

cheroot_pr352_timeout_07

For reference, here are the results with cheroot 8.5.1:
cheroot_851

@webknjaz
Copy link
Member

webknjaz commented Jan 8, 2021

Do you think this setting should be a part of the public API? As in, should users be expected to tweak it per their needs similar to the amount threads?

@webknjaz webknjaz requested a review from jaraco January 8, 2021 01:26
@liamstask
Copy link
Contributor Author

hm, no - this is not expected behavior, as the selector should return immediately once an entry becomes ready to read.

will see if i can reproduce and understand what might be responsible.

@webknjaz
Copy link
Member

webknjaz commented Jan 8, 2021

@liamstask do you want to mark this PR as a Draft or do you want to work on a separate one?

Also, is this PR considered a feature or a bugfix? Depending on this, I may consider still publishing it under 8.x (it's still in sync with the master but 9.x is supposed to be Python 3-only).

@liamstask liamstask marked this pull request as draft January 9, 2021 22:00
@liamstask
Copy link
Contributor Author

marking as draft until we understand the nature of this issue - depending on what we find, it may make sense to open another PR.

and hm - i think we could probably consider this PR a feature, as the current behavior isn't a problem necessarily.

but i do wonder if we should see even lower latency than the 0.01 minimum in the numbers above if we were running as expected, since it should really just be limited by the time it takes to wake up from the selector, which should be shorter than that.

@webknjaz
Copy link
Member

it should really just be limited by the time it takes to wake up from the selector, which should be shorter than that.

Hm... I'm curious if that's a side effect of the GIL allowing only one thread at a time.

@liamstask
Copy link
Contributor Author

i am not immediately able to reproduce. i've locally added the following to test_wsgi.py and run via tox -- cheroot/test/test_wsgi.py::test_response_latency:

def test_response_latency(simple_wsgi_server):
    import statistics
    from timeit import default_timer as timer

    session = Session(base_url=simple_wsgi_server['url'])
    pooled = requests.adapters.HTTPAdapter(
        pool_connections=1, pool_maxsize=1000,
    )
    session.mount('http://', pooled)

    def do_request():
        start = timer()
        resp = session.get('info')
        resp.raise_for_status()
        return (timer() - start)

    with ThreadPoolExecutor(max_workers=10) as pool:
        tasks = [
            pool.submit(do_request)
            for _ in range(1000)
        ]
        durations = [task.result() for task in tasks]

    print("done.")
    print("min: {}".format(min(durations)))
    print("mean: {}".format(statistics.mean(durations)))
    print("max: {}".format(max(durations)))
    print("stdev: {}".format(statistics.stdev(durations)))
    print("variance: {}".format(statistics.variance(durations)))

    assert False  # dump stdout to console

and varied expiration_interval to see if the resulting durations appear to be related to that timeout, and at least from an initial unscientific take, they do not appear to be:

expiration_interval = 0.01
min: 0.013620100999999885
mean: 0.034936168017999994
max: 0.24456356999999995
stdev: 0.014986918771672504
variance: 0.00022460773426870968

expiration_interval = 0.5
min: 0.013251182999999944
mean: 0.033912222677999995
max: 0.23156637300000016
stdev: 0.013886721685052835
variance: 0.00019284103915811663

expiration_interval = 1.0
min: 0.01254098800000003
mean: 0.033798407693999985
max: 0.2403869270000003
stdev: 0.014740100437246145
variance: 0.000217270560900104

expiration_interval = 5.0
min: 0.01327589600000012
mean: 0.034817831576
max: 0.25336556600000026
stdev: 0.01470904346964532
variance: 0.00021635595979191567

expiration_interval = 10.0
min: 0.012837041000000049
mean: 0.034476956264000005
max: 0.23227233400000014
stdev: 0.014599182796090602
variance: 0.0002131361383136678

These all look fairly consistent, within a margin of jitter, regardless of the timeout, as would be expected.

This is on my local machine - macOS, python3 - I wonder if there could be some unexpected platform-specific Windows behavior in the selector module?

@mar10 are you able to run your test on any other platforms to get another data point?

@liamstask
Copy link
Contributor Author

@mar10 if it's not straightforward to run your previous tests on another platform, it would still be useful to run the quick test I posted above on your platform, if possible, to help understand where this divergence might be coming from.

@webknjaz if you have a moment to run the test i posted above, or recommend another strategy, that would be helpful too. thanks!

@webknjaz
Copy link
Member

Here's what I get on my Gentoo Linux laptop (P1 Gen2) under Python 3.9:

# 0.5
min: 0.0061725289560854435
mean: 0.03574465919518843
max: 0.1078346953727305
stdev: 0.01949915910102327
variance: 0.0003802172056470187

# 0.01
min: 0.002142445184290409
mean: 0.0351595618231222
max: 0.1128864698112011
stdev: 0.01742951475130368
variance: 0.00030378798446591257

# 1.0
min: 0.005018966272473335
mean: 0.032176667262334376
max: 1.0674880309961736
stdev: 0.03516322005452696
variance: 0.0012364520446030871

# 5.0
min: 0.006421465892344713
mean: 0.03070156146818772
max: 0.08956986200064421
stdev: 0.01611755839335192
variance: 0.00025977568856310894

# 10.0
min: 0.00646551325917244
mean: 0.027719526661094278
max: 1.0390562019310892
stdev: 0.0465421165957302
variance: 0.002166168617210545

@mar10
Copy link
Contributor

mar10 commented Jan 19, 2021

On an older macbook pro i don't see a performance drop like on windows

cheroot 8.5.1

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-19 19:46:46
  End:      2021-01-19 19:47:16
Run time 30.281 sec, net: 4:58.38 min.
Executed 8,744 activities in 2,196 sequences, using 10 parallel sessions.
Sequence duration: 0.014 sec average.
             rate: 4,351 sequences per minute (per user: 435.1).
Activity rate:     288.8 activities per second (per user: 28.88).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 2,176, min: 0.012 sec, avg: 0.025 sec, max: 0.052 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 2,176, min: 0.011 sec, avg: 0.034 sec, max: 0.065 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 2,176, min: 0.007975 sec, avg: 0.030 sec, max: 0.072 sec
Result: Ok. ✨ 🍰 ✨
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

cheroot 8.5.2.dev8+g2af88bb8

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Stressor scenario 'test_rw' finished.
  Tag:      'n.a.'
  Base URL: http://127.0.0.1:8082
  Start:    2021-01-19 19:50:09
  End:      2021-01-19 19:50:39
Run time 30.289 sec, net: 4:58.44 min.
Executed 8,544 activities in 2,146 sequences, using 10 parallel sessions.
Sequence duration: 0.014 sec average.
             rate: 4,251 sequences per minute (per user: 425.1).
Activity rate:     282.1 activities per second (per user: 28.21).
3 monitored activities:
  - /main/1/GetRequest(/private/test.html)
    n: 2,126, min: 0.011 sec, avg: 0.026 sec, max: 0.070 sec
  - /main/2/PutRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 2,126, min: 0.013 sec, avg: 0.035 sec, max: 0.083 sec
  - /main/3/GetRequest(/temp/wsgidav_test_file~$(session_id).txt)
    n: 2,126, min: 0.006162 sec, avg: 0.030 sec, max: 0.074 sec
Result: Ok. ✨ 🍰 ✨
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

@liamstask
Copy link
Contributor Author

@mar10 thank you. 2 other thoughts/requests to test on Windows, if/when you have time:

@mar10
Copy link
Contributor

mar10 commented Jan 24, 2021

Sorry for being so unresponsive, I will try to address the this week...

@MichaIng
Copy link
Contributor

MichaIng commented Jul 23, 2021

As of #378 and cherrypy/cherrypy#1908, hopefully we can finish this. Indeed select() should immediately return as fast as a connection is ready and the timeout only have an effect on idle situations: https://docs.python.org/3/library/selectors.html?highlight=selector#selectors.BaseSelector.select

So it would be good to know whether the issue that select() seems to wait for the timeout regardless is still present on the Windows system(s) it showed up before. I'm willing to try replicating it here, but not sure how @mar10 exactly did the tests?
Another test could be to add some debug time output before and after the select() call, to verify that it's really that one and not something in the expire method or elsewhere (although I cannot see any way this is the case).

@mar10
Copy link
Contributor

mar10 commented Jul 26, 2021

So it would be good to know whether the issue that select() seems to wait for the timeout regardless is still present on the Windows system(s) it showed up before. I'm willing to try replicating it here, but not sure how @mar10 exactly did the tests?

I ran a test tool I wrote against a WebDAV server that uses Cheroot (among others) as WSGI server.
This is the test suite. I can help running it, or repeat the test myself if it helps.

@MichaIng
Copy link
Contributor

MichaIng commented Jul 27, 2021

Not sure how to use tox, it always misses some config file... I ran #352 (comment) now via py -m pytest -k test_response_latency test_wsgi.py on current Windows 10 21H1 and Python 3.9.6:

Unmodified cheroot 8.5.2

min: 0.011346500000000148
mean: 0.0310407035
max: 0.10277959999999997
stdev: 0.01078381715002545
variance: 0.00011629071232518302

min: 0.011402799999999935
mean: 0.0309045372
max: 0.6013285
stdev: 0.02022699558124092
variance: 0.0004091313502435398

cheroot 8.5.2 with timeout=10

  • ... never finishes in reasonable time

cheroot 8.5.2 with timeout=0.1

min: 0.012011800000000017
mean: 0.1067621261
max: 0.5968496999999999
stdev: 0.02901034300752099
variance: 0.0008416000014140222

min: 0.01785569999999992
mean: 0.10330534
max: 0.14574089999999984
stdev: 0.023048536605140002
variance: 0.0005312350396384786
  • The error is there, no question. As this looks wrong to me, probably something to report at Python Windows guys?
  • Same behaviour with timeout=None or not definit it at all. It seems that a request simply does not trigger select(). But since connections are possible, when using a timeout, they seem to be successfully registered with the selector.

Replaced self.socket.settimeout(1) with self.socket.setblocking(False) #352 (comment)

min: 0.009096400000000004
mean: 0.10749972850000002
max: 0.15891839999999924
stdev: 0.019692086392565155
variance: 0.00038777826649224974

min: 0.01853629999999984
mean: 0.1093117002
max: 0.6129125
stdev: 0.0229338039484658
variance: 0.0005259593635466656
  • No difference.

@MichaIng
Copy link
Contributor

Maybe this is related?

Probably someone has an idea how to narrow it down to the naked selector's select() method, with a single socket, a single select() call without timeout, then registering the socket and see whether select() returns or not.

@Blindfreddy
Copy link

I noticed that the CPU utilisation on a raspberry pi 1st generation drops from ~20% to ~10% with cheroot 8.5.2 (or maybe a little earlier), but only when settings 'server.thread_pool' : 1 and/or 'engine.autoreload.on' : False are set before quickstart. This effectively removes the original problem / bug I lodged in #1908 and taken over in #378. So something else must have changed in the meantime.....

@MichaIng
Copy link
Contributor

MichaIng commented Aug 18, 2021

It makes sense that the load is reduced when the server threads are reduced, but also 10% idle load is much too high and limiting the threads has negative impact on concurrent connection performance likely. With this commit is should be possible to reduce idle load to 1/5, i.e. something around 5% in your case, without affecting concurrent connection performance.

Autoreload is a CherryPy feature, not a cheroot one, and it is disabled by default, isn't it? However, it is another (unrelated) layer of idle load of course, when CherryPy regularly checks for file changes.

So while your settings changes are workarounds, it doesn't mean that cheroot performs well or that this PR looses its reason.

@Blindfreddy
Copy link

"It makes sense that the load it reduced when the server threads are reduced" - well, previously that never made any difference, so something else has happened in the meantime which practically removes the problem. Nevertheless, agreed that this PR is still worthwhile having.

@MichaIng
Copy link
Contributor

Would it be an option to add a workaround for Windows, to either set a lower expiration_interval or a e.g. hardcoded timeout=0.1 only on Windows?

I currently do not find the time to nail it down to selector.select simply not returning when a new socket is ready as it should, but at least the way cheroot has it implemented, this is obviously the case. Would be nice to have a minimal example to replicate the issue independently of cheroot to have a chance this getting fixed or worked around upstream.

@Blindfreddy
Copy link

I don't use Windows so I can't comment on the suggested workaround.

@MichaIng
Copy link
Contributor

MichaIng commented Oct 22, 2021

Basically I mean to leave things for Windows as they are currently, until we find a better solution, but apply this PR/change to all other systems, where it clearly enhances things. Observing idle CPU load with and without this PR should help to build an opinion whether it's worth it on non-Windows systems, and request performance on Windows and non-Windows systems are well covered by the tests above.

I'm just not sure how to reliably check whether it's a Windows platform or not in Python?

I try to push this topic since the high CPU idle load on small SBCs is currently really a deal breaker for one of our projects, where we're forced to use CherryPy < 9.0 because of this, which has other downsides, obviously.

@Blindfreddy
Copy link

Understood and makes sense - and I don't use Windows, so fine for me.

@MichaIng
Copy link
Contributor

MichaIng commented Nov 23, 2021

To go on, I opened a new PR: #401
It's the same as this one, but caps the select() timeout to 0.05 seconds on Windows. Whether this is a sufficiently small cap or not can be discussed, but I think it is small enough to not be recognised by humans.

@webknjaz
Copy link
Member

webknjaz commented Jan 3, 2022

Maybe this is related?

Probably someone has an idea how to narrow it down to the naked selector's select() method, with a single socket, a single select() call without timeout, then registering the socket and see whether select() returns or not.

FWIW since that has mentions of select([], [], [], timeout=xxx) on windows becoming time.sleep(timeout=xxx), assuming that it's the case, we could try having a counter in cheroot.connections._ThreadsafeSelector for register()/unregister() and based on that, skip the actual calls to the underlying select().

@MichaIng
Copy link
Contributor

MichaIng commented Jan 3, 2022

This PR has been superseded by #401, which was merged now. @mar10 would be great if you could rerun your benchmarks on Windows to verify that the timeout of now 0.05 vs 0.01 seconds does not significantly reduce the handled connections per seconds respectively significantly increases average "processing" time per connection.

Not sure whether the idle CPU usage can be reasonably compared on common (relatively powerful) Windows systems, but on an embedded system/SBC this should be visible, as tested on cherrypy/cherrypy#1908 and #378.

This PR can hence be closed. Further discussion, when required, make sense in the merged PR πŸ™‚.

@MichaIng
Copy link
Contributor

MichaIng commented Jan 3, 2022

@webknjaz
Ah sorry I missed your comment.

FWIW since that has mentions of select([], [], [], timeout=xxx) on windows becoming time.sleep(timeout=xxx)

As far as I understand, the selectors class select() method is a shortcut/wrapper of the low level select module's select([], [], [], timeout=xxx) method. So yeah when this turned into a sleep, then it explains the behaviour pretty well. Though I wonder how long it sleeps when no timeout is given, since it is per docs optional. Now I wonder whether we have tested to not use/pass any timeout on Windows πŸ€”.

we could try having a counter in cheroot.connections._ThreadsafeSelector for register()/unregister() and based on that, skip the actual calls to the underlying select().

You mean, instead of "waiting" for a new connection via select(), to check whether there is any connection registered via counter and in case process it? But then again the counter needs to be checked regularly. Or can register() or put() probably trigger the connection processing so that the loop is not required (but only one for expiring connections)? I lack a bit of insights but does put() currently trigger the select() return event (on non-Windows) via:

            self._selector.register(
                conn.socket.fileno(), selectors.EVENT_READ, data=conn,
            )

@jaraco
Copy link
Member

jaraco commented May 20, 2023

This PR can hence be closed.

You got it.

@jaraco jaraco closed this May 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken enhancement Improvement help wanted Somebody help us, please!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants