
Reading from in_stream is extremely slow by default #774

Open
atleta opened this issue Dec 16, 2020 · 10 comments · May be fixed by #983

Comments

atleta commented Dec 16, 2020

I've run into this using fabric2, but it seems to be an invoke issue: run passes data from in_stream to the process extremely slowly, at about 90 CPS (yes, characters per second). So if I enable echo_stdin, it looks like a screen from your typical hacker movie (only without the chirping).

Now, while writing this bug report, I found a hint in the code (and then in the documentation) that this is intentional: input_sleep is set to 0.01 on invoke.Runner. But this seems like a pretty surprising default and, at least relative to its effect and surprise factor, not a very well documented one. Also, probably my fault, but I did not find a way to set or disable it; after skimming the documentation, I have no idea which key to set.

I've figured out how to set echo_stdin via environment variables when running things from fabric, but the same scheme did not work for input_sleep, which means I wasn't able to guess the correct key. I've tried runners.local.input_sleep and run.input_sleep. (So I ended up importing the class and assigning 0 to the attribute, but that's pretty lame.)

Actually, what's the point of this at all? If no input is ready, the read blocks anyway; if input is ready, why slow things down?

chrboe commented Apr 19, 2021

Did you end up solving this in a halfway convenient way?

@bitprophet any comment on the default behavior?

@vlcinsky

Regarding the reason, the comment in the code says:

                # Take a nap so we're not chewing CPU.
                time.sleep(self.input_sleep)
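For context, that sleep sits inside a per-character polling loop. A simplified sketch (not Invoke's actual implementation; ready and read_one are hypothetical stand-ins) shows both why the nap exists and why it caps throughput:

```python
import time

def poll_loop(ready, read_one, input_sleep=0.01):
    # Simplified sketch of a stdin-polling loop: if a character is
    # available, forward it; either way, nap afterwards so an idle
    # loop does not spin at 100% CPU.
    out = []
    while True:
        if ready():
            ch = read_one()
            if ch is None:  # EOF: stop polling
                return out
            out.append(ch)
        # "Take a nap so we're not chewing CPU."
        time.sleep(input_sleep)
```

Because the loop sleeps on every iteration even when data is available, input_sleep=0.01 caps throughput at roughly 100 characters per second, which matches the ~90 CPS reported above.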

@vlcinsky

How to set input_sleep to 0?

A somewhat surprising one-liner (with a side effect)

Set c.config.runners.local.input_sleep to 0 on the current context, as in the following example:

from invoke import task


@task
def check_config(c):
    c.config.runners.local.input_sleep = 0
    with open("docker-compose.yml", "rb") as in_stream:
        # c.run("docker-compose -f docker-compose.yml config")
        c.run("docker-compose -f - config", in_stream=in_stream)

But be aware: this changes a class attribute (not an instance attribute), so the value also applies to every subsequent use of the Local runner.
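The class-versus-instance distinction can be demonstrated in isolation. This is a toy model (the real classes live in invoke.runners; the assignment stands in for the config tweak above):

```python
class Runner:
    input_sleep = 0.01  # class-level default, shared by all instances

class Local(Runner):
    pass

# Invoke's config exposes the Local class itself under runners.local,
# so assigning through the config mutates the class attribute:
Local.input_sleep = 0

a, b = Local(), Local()
print(a.input_sleep, b.input_sleep)  # both print 0, not just one instance
```

Every Local created afterwards picks up the new value, which is exactly why the slow/fast behavior below depends on task ordering.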

Given a tasks.py as follows:

from invoke import task


@task
def check_config(c):
    c.config.runners.local.input_sleep = 0
    with open("docker-compose.yml", "rb") as in_stream:
        # c.run("docker-compose -f docker-compose.yml config")
        c.run("docker-compose -f - config", in_stream=in_stream)

@task
def check_sleepy(c):
    with open("docker-compose.yml", "rb") as in_stream:
        # c.run("docker-compose -f docker-compose.yml config")
        c.run("docker-compose -f - config", in_stream=in_stream)

the timing will depend on the order of task execution.

Fast:

$ invoke check_config check_sleepy

and slow (check_sleepy runs first, before the attribute is changed, so it uses the default input_sleep):

$ invoke check_sleepy check_config

For this reason it seems safer to set the value at module level, which is the least surprising:

Less surprising method

from invoke.runners import Local

Local.input_sleep = 0

(this is the method @atleta described above)

from invoke import task

from invoke.runners import Local

Local.input_sleep = 0


@task
def check_config(c):
    # c.config.runners.local.input_sleep = 0
    with open("docker-compose.yml", "rb") as in_stream:
        # c.run("docker-compose -f docker-compose.yml config")
        c.run("docker-compose -f - config", in_stream=in_stream)


@task
def check_sleepy(c):
    with open("docker-compose.yml", "rb") as in_stream:
        # c.run("docker-compose -f docker-compose.yml config")
        c.run("docker-compose -f - config", in_stream=in_stream)

@bitprophet (Member)

It sounds like the right fix for this is to make the sleep occur only when in_stream is sys.stdin (or, if possible, some slightly less naive test, but that's all that comes to mind offhand 😐). It's definitely required in that case: without it, Invoke and friends use 99% CPU waiting around for the user to type.

But if one is replaying something via a custom input stream, the sleep is definitely not needed or, clearly, wanted! I'd accept a patch that does this (with the usual caveat that it needs tests, etc.).
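A sketch of what that conditional might look like (hypothetical, not the actual patch; feed_input, stream, and forward are stand-ins, not Invoke's API):

```python
import sys
import time

def feed_input(stream, forward, input_sleep=0.01, chunk_size=1000):
    # Hypothetical sketch of the suggestion above: only nap between
    # reads when the stream is interactive stdin; custom streams
    # (files, BytesIO, etc.) are drained at full speed.
    interactive = stream is sys.stdin
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # EOF
        forward(data)
        if interactive:
            time.sleep(input_sleep)  # avoid spinning while the user types
```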

asqui commented Aug 10, 2022

invoke/invoke/runners.py

Lines 59 to 60 in 267182b

read_chunk_size = 1000
input_sleep = 0.01

If the purpose of the sleep is to "take a nap so we're not chewing CPU", should it be conditional upon the previous read having returned no data? That way we only sleep while waiting for more data; when data is available, we read all of it as fast as possible, with at most one "sleepless" read each time we reach the end of the available data.

Also, a 1000-byte buffer seems small. How about something like io.DEFAULT_BUFFER_SIZE (= 8192) as the default for read_chunk_size?
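Combined, the two suggestions might look like this sketch (hypothetical; feed_input, stream, forward, and done are illustrative stand-ins, not Invoke's API):

```python
import io
import time

def feed_input(stream, forward, done, input_sleep=0.01,
               chunk_size=io.DEFAULT_BUFFER_SIZE):
    # Sketch of the suggestion above: drain available data as fast as
    # possible, and only nap after a read that returned nothing, i.e.
    # while genuinely waiting for more input to arrive.
    while not done():
        data = stream.read(chunk_size)
        if data:
            forward(data)  # data was ready: loop again immediately
        else:
            time.sleep(input_sleep)  # nothing to read yet: nap briefly
```

With this shape, a pre-populated stream is consumed in a handful of large reads with zero sleeps, while an interactive stdin still idles cheaply between keystrokes.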

meshy commented Jan 20, 2024

I have encountered this too.

I managed to speed things up a lot by using unittest.mock, but it was still not fast. Changing read_chunk_size also made a bit of a difference.

I like @asqui's idea that this should only sleep when the buffer is empty.

with (
    mock.patch("time.sleep", new=lambda _: None),
    mock.patch.object(ctx.runners.local, "read_chunk_size", new=io.DEFAULT_BUFFER_SIZE),
):
    ctx.run(...)

meshy added a commit to meshy/invoke that referenced this issue Jan 20, 2024
This change means that we don't delay reading the input stream when
there is still data available to read from it.

This provides a significant speed improvement to scripts which are
passing a populated stream of data, rather than awaiting user input from
stdin.

See pyinvoke#774
@meshy linked a pull request Jan 20, 2024 that will close this issue
meshy commented Jan 20, 2024

I've opened a PR which I hope will address this issue: #983

asqui commented Jan 22, 2024

Nice, thanks for this change!

Is it worth reframing the PR as "Speed up reading from in_stream" and also including the increase of read_chunk_size to io.DEFAULT_BUFFER_SIZE, since you reported that this too made a bit of a difference?

meshy commented Jan 22, 2024

No worries!

Is it worth reframing the PR as "Speed up reading from in_stream"

I'm happy to rename it -- I think that's a better name. Thank you.

and also including the increase of read_chunk_size to io.DEFAULT_BUFFER_SIZE, since you reported that this too made a bit of a difference?

In my case I didn't see any difference once the sleep was avoided. The time went from about a minute to faster than I could blink, and I forgot all about that setting.

In applications with larger buffers I can imagine it making a difference though, so I'll include it anyway!

EDIT: It does indeed make a minor difference in my case, though it's harder to notice because the sleep makes such a huge difference.

dtenenba commented Mar 27, 2024

@atleta (or anyone else): did you ever figure out how to work around this issue in a Fabric connection?

EDIT: Answering my own question. After creating my Fabric connection (conn), but before using it for anything, I ran:

conn.config.runners.remote.input_sleep = 0
