Congestion window grows without bound when CPU-bound #1126

Open
Ralith opened this issue May 22, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@Ralith (Collaborator) commented May 22, 2021

If throughput is limited by CPU resources, then every non-app-limited transmit will, once ACKed, cause the congestion window to grow. If CPU resources abruptly increase (e.g. due to reduced contention from other tasks and/or processes) then we risk flooding the link. The congestion controller interface/implementation should be modified such that success with N in-flight bytes is interpreted as license to grow the congestion window by at most a constant multiple of N.
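One possible reading of that proposal, as a minimal sketch (the type, constant, and `on_ack` signature below are hypothetical and do not mirror quinn's actual congestion-controller trait): on each ACK, allow normal growth, but never let the window exceed a constant multiple of the bytes that were actually in flight.

```rust
/// Illustrative only; names and signature are assumptions for this sketch.
const GROWTH_CLAMP: u64 = 2; // the "constant multiple of N" from above

struct ClampedController {
    window: u64, // congestion window in bytes
}

impl ClampedController {
    /// `bytes_acked`: newly acknowledged bytes.
    /// `bytes_in_flight`: bytes in flight when the acked data was sent (the N above).
    fn on_ack(&mut self, bytes_acked: u64, bytes_in_flight: u64) {
        // Ordinary growth (slow-start style, purely for illustration)...
        let grown = self.window + bytes_acked;
        // ...clamped so that success with N in-flight bytes only licenses a
        // window of roughly GROWTH_CLAMP * N. Never shrink here; the clamp
        // only suppresses growth the sender has not validated.
        self.window = self.window.max(grown.min(GROWTH_CLAMP * bytes_in_flight));
    }
}
```

With a clamp like this, an abrupt increase in available CPU can at most double the previously validated in-flight volume per round, rather than releasing a window that was inflated while CPU-bound.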

Ralith added the bug label on May 22, 2021
@Matthias247 (Contributor) commented

When implementing #1127 I noticed that cwnd growth depends heavily on the setting of the new MAX_TRANSMIT_DATAGRAMS constant. When it is configured lower, CWND is more likely to grow towards infinity, since a single poll will never yield all available datagrams, which leaves app_limited set to false and allows CWND to grow. If we increase the value and all datagrams are polled, then app_limited might get set to true, which stops CWND from growing. However, that also means higher latencies.

Some examples:

MAX_TRANSMIT_DATAGRAMS = 20:

path: PathStats {
        rtt: 389.489µs,
        cwnd: 536692490,

MAX_TRANSMIT_DATAGRAMS = 30:

path: PathStats {
        rtt: 478.052µs,
        cwnd: 2821703,

MAX_TRANSMIT_DATAGRAMS = 50:

path: PathStats {
        rtt: 646.804µs,
        cwnd: 1481010,

MAX_TRANSMIT_DATAGRAMS = 100:

path: PathStats {
        rtt: 533.493µs,
        cwnd: 1476074,

While this benchmark makes it look like 30 would be a better value than 20, keep in mind that I'm running this on a desktop with extremely high single-threaded performance; pretty much any other machine would have a harder time producing 30 packets per poll and would see higher latencies.

Overall I think the way app_limited is set at the moment is not optimal, and the strategy needs to be improved.
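To make the interaction concrete, here is a hedged sketch (the function, its parameters, and its return shape are invented for illustration; this is not quinn's poll_transmit) of how a per-poll batch cap alone can decide whether the sender looks app-limited:

```rust
/// Hypothetical illustration, not quinn's code: `queued` is how many
/// datagrams the connection could send right now; `max_transmit_datagrams`
/// is the per-poll batch cap.
fn poll_transmit_sketch(queued: usize, max_transmit_datagrams: usize) -> (usize, bool) {
    let yielded = queued.min(max_transmit_datagrams);
    // Only when the whole queue fits into one batch does the sender look
    // app-limited. If the cap truncates the batch, app_limited stays false
    // and CWND keeps growing, even though the real bottleneck is the CPU
    // that cannot produce packets any faster.
    let app_limited = yielded == queued;
    (yielded, app_limited)
}
```

With a small cap and a consistently deep queue, app_limited is effectively never true, which matches the runaway cwnd in the MAX_TRANSMIT_DATAGRAMS = 20 run above.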

@Ralith (Collaborator, Author) commented Oct 30, 2021

What if we maintain a separate cpu_limited flag, set it whenever we stop calling poll_transmit early, and unset it whenever poll_transmit is fully drained?
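A minimal sketch of that bookkeeping, assuming hypothetical names (this is not an existing quinn type or API):

```rust
/// Hypothetical tracker for the proposed flag.
struct CpuLimitedTracker {
    cpu_limited: bool,
}

impl CpuLimitedTracker {
    /// Call after every poll_transmit-style batch.
    /// `stopped_early`: we quit before draining because the per-call cap was hit.
    /// `drained`: the connection had nothing left to send.
    fn after_poll(&mut self, stopped_early: bool, drained: bool) {
        if stopped_early {
            // The shortfall was ours (CPU/batching), not the network's,
            // so this round should not count as validation for growth.
            self.cpu_limited = true;
        } else if drained {
            self.cpu_limited = false;
        }
    }

    /// Allow cwnd growth only when neither the app nor the CPU is the limit.
    fn may_grow_cwnd(&self, app_limited: bool) -> bool {
        !app_limited && !self.cpu_limited
    }
}
```

The flag would then gate the existing growth path the same way app_limited already does.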

@Matthias247 (Contributor) commented

Interesting thought. But it kind of relies on the application calling poll_transmit again as early as possible, and not delaying it for other reasons, right? The quinn wrapper does, but it would be some kind of implicit contract (similar to the read/EAGAIN/epoll ones).

The other question is whether this would already be improved if #1131 gets addressed by merging tasks. I'm not sure; I haven't thought too deeply about this.

@Ralith (Collaborator, Author) commented Nov 4, 2021

> Interesting thought. But it kind of relies on the application calling poll_transmit again as early as possible, and not delaying it for other reasons, right?

So long as the polling pattern is consistent, I think the behavior would be correct wrt. the amount of CPU time being given. If polling is arbitrarily erratic I don't think we can hope to guarantee good behavior regardless.

> The other question is whether this would already be improved if #1131 gets addressed by merging tasks.

I don't think so; the fundamental problem is that the current construction of app_limited assumes we'll always be able to drain the connection's logical queues in a timely fashion, and when we're CPU bound we by definition can't, regardless of how work is distributed. Performance might get better, but the design flaw stands.
