Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sim): correct Waiting state comparison in NodeHolder::ready() #2263

Merged
merged 1 commit into from
Dec 8, 2024

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Dec 2, 2024

A Node (e.g. a Client, Server or TailDrop router) can be in 3 states:

enum NodeState {
    /// The node just produced a datagram.  It should be activated again as soon as possible.
    Active,
    /// The node is waiting.
    Waiting(Instant),
    /// The node became idle.
    Idle,
}

NodeHolder::ready() determines whether a Node is ready to be processed again. When NodeState::Waiting, it should only be ready when t <= now, i.e. the waiting time has passed, not t >= now.

impl NodeHolder {
    fn ready(&self, now: Instant) -> bool {
        match self.state {
            Active => true,
            Waiting(t) => t <= now, // not >=
            Idle => false,
        }
    }
}

The previous behavior lead to wastefull non-ready Nodes being processed and thus a large test runtime when e.g. simulating a gbit connection (#2203).


Time to simulate a 100 MB transfer on a 1 Gbit/s 100ms connection on the Simulator on my 12th Gen Intel(R) Core(TM) i9-12900HK:

  • Before 40.51s
  • After 17.84s

Copy link

codecov bot commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.39%. Comparing base (9143d73) to head (a34bf86).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2263   +/-   ##
=======================================
  Coverage   95.39%   95.39%           
=======================================
  Files         113      113           
  Lines       36683    36683           
=======================================
  Hits        34993    34993           
  Misses       1690     1690           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mxinden mxinden marked this pull request as ready for review December 2, 2024 13:38
A `Node` (e.g. a `Client`, `Server` or `TailDrop` router) can be in 3 states:

``` rust
enum NodeState {
    /// The node just produced a datagram.  It should be activated again as soon as possible.
    Active,
    /// The node is waiting.
    Waiting(Instant),
    /// The node became idle.
    Idle,
}
```

`NodeHolder::ready()` determines whether a `Node` is ready to be processed
again. When `NodeState::Waiting`, it should only be ready when `t <= now`, i.e.
the waiting time has passed, not `t >= now`.

``` rust
impl NodeHolder {
    fn ready(&self, now: Instant) -> bool {
        match self.state {
            Active => true,
            Waiting(t) => t <= now, // not >=
            Idle => false,
        }
    }
}
```

The previous behavior lead to wastefull non-ready `Node`s being processed and
thus a large test runtime when e.g. simulating a gbit
connection (mozilla#2203).
Copy link

github-actions bot commented Dec 2, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

github-actions bot commented Dec 2, 2024

Benchmark results

Performance differences relative to 9143d73.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [15.886 µs 15.935 µs 15.989 µs]
       change: [-0.2499% +0.1716% +0.5815%] (p = 0.42 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low mild
1 (1.00%) high mild
15 (15.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [4.0536 ms 4.0844 ms 4.1363 ms]
       change: [-0.2256% +0.6255% +1.6841%] (p = 0.34 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [27.015 µs 27.065 µs 27.122 µs]
       change: [-0.6229% -0.1736% +0.2939%] (p = 0.46 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) low severe
2 (2.00%) low mild
14 (14.00%) high severe

decode 1048576 bytes, mask 7f: Change within noise threshold.
       time:   [6.8665 ms 6.8688 ms 6.8728 ms]
       change: [-0.9667% -0.6462% -0.3770%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [12.410 µs 12.442 µs 12.482 µs]
       change: [-0.4219% +0.0663% +0.5419%] (p = 0.81 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low mild
2 (2.00%) high mild
11 (11.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [3.1739 ms 3.1963 ms 3.2315 ms]
       change: [-0.2974% +0.5213% +1.7319%] (p = 0.37 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [98.613 ns 98.886 ns 99.168 ns]
       change: [-0.7567% -0.2733% +0.1859%] (p = 0.26 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [116.68 ns 117.05 ns 117.45 ns]
       change: [-0.5081% -0.1467% +0.2187%] (p = 0.43 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
1 (1.00%) high mild
11 (11.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [116.54 ns 117.20 ns 117.96 ns]
       change: [-0.3348% +0.3767% +1.1458%] (p = 0.32 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
2 (2.00%) low severe
2 (2.00%) low mild
13 (13.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.879 ns 98.038 ns 98.206 ns]
       change: [-0.1132% +0.7299% +1.6890%] (p = 0.12 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [112.37 ms 112.50 ms 112.72 ms]
       change: [+0.6527% +0.9012% +1.1450%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [5.5368 µs 5.6953 µs 5.8600 µs]
       change: [-15.821% -4.4750% +3.8702%] (p = 0.59 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [28.715 ms 29.227 ms 29.734 ms]
       change: [+2.9643% +6.7465% +10.620%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: 💚 Performance has improved.
       time:   [30.376 ms 30.977 ms 31.563 ms]
       change: [-16.650% -12.676% -8.3336%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [24.925 ms 25.468 ms 26.001 ms]
       change: [-2.5376% +1.1989% +5.1343%] (p = 0.54 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: 💚 Performance has improved.
       time:   [28.577 ms 29.436 ms 30.291 ms]
       change: [-34.368% -30.635% -26.773%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
       time:   [881.60 ms 890.41 ms 899.32 ms]
       thrpt:  [111.19 MiB/s 112.31 MiB/s 113.43 MiB/s]
change:
       time:   [-6.3342% -4.9615% -3.4397%] (p = 0.00 < 0.05)
       thrpt:  [+3.5622% +5.2205% +6.7625%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [316.24 ms 319.33 ms 322.48 ms]
       thrpt:  [31.010 Kelem/s 31.316 Kelem/s 31.622 Kelem/s]
change:
       time:   [-1.0352% +0.3673% +1.7207%] (p = 0.60 > 0.05)
       thrpt:  [-1.6915% -0.3660% +1.0460%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [33.557 ms 33.747 ms 33.956 ms]
       thrpt:  [29.450  elem/s 29.632  elem/s 29.800  elem/s]
change:
       time:   [-1.4170% -0.5950% +0.2552%] (p = 0.16 > 0.05)
       thrpt:  [-0.2546% +0.5986% +1.4374%]

Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.6435 s 1.6624 s 1.6816 s]
       thrpt:  [59.465 MiB/s 60.153 MiB/s 60.847 MiB/s]
change:
       time:   [-1.9051% -0.2441% +1.4652%] (p = 0.78 > 0.05)
       thrpt:  [-1.4441% +0.2447% +1.9421%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing MTU Mean [ms] Min [ms] Max [ms]
gquiche gquiche 1504 514.8 ± 9.6 500.9 529.0
neqo gquiche reno on 1504 765.5 ± 20.6 740.5 807.7
neqo gquiche reno 1504 764.0 ± 18.5 745.8 809.5
neqo gquiche cubic on 1504 777.2 ± 15.4 745.6 794.7
neqo gquiche cubic 1504 767.2 ± 11.3 749.8 785.7
msquic msquic 1504 97.0 ± 7.7 90.5 115.7
neqo msquic reno on 1504 220.0 ± 14.9 198.1 246.6
neqo msquic reno 1504 214.3 ± 13.0 200.6 240.6
neqo msquic cubic on 1504 211.8 ± 12.1 195.6 236.6
neqo msquic cubic 1504 210.3 ± 13.6 194.8 240.0
gquiche neqo reno on 1504 688.2 ± 89.4 551.5 810.6
gquiche neqo reno 1504 685.9 ± 97.7 547.2 837.6
gquiche neqo cubic on 1504 694.4 ± 98.9 566.7 879.4
gquiche neqo cubic 1504 698.6 ± 90.7 567.5 835.8
msquic neqo reno on 1504 536.3 ± 103.4 470.7 740.2
msquic neqo reno 1504 476.2 ± 7.4 467.3 489.4
msquic neqo cubic on 1504 493.0 ± 7.4 480.7 500.0
msquic neqo cubic 1504 482.5 ± 8.0 472.6 496.6
neqo neqo reno on 1504 511.8 ± 30.6 461.5 563.8
neqo neqo reno 1504 517.6 ± 32.6 460.4 554.2
neqo neqo cubic on 1504 553.9 ± 18.6 520.8 580.5
neqo neqo cubic 1504 541.6 ± 31.6 481.2 570.1

⬇️ Download logs

Copy link
Collaborator

@larseggert larseggert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your explanation make sense, but suggest for @martinthomson to take a look, as he is more familiar with the simulator.

@larseggert
Copy link
Collaborator

Is there any chance this might fix #2162?

@mxinden
Copy link
Collaborator Author

mxinden commented Dec 5, 2024

Looking at the logs of a recent failure, it seems like in the latter half of the test execution, only the Drop node is being skipped.

2024-11-18T08:42:02.5636603Z 7s500ms DEBUG [idle_timeout_crazy_rtt] skipping drop

Thus, I don't think this patch will fix #2162 unfortunately.

@martinthomson martinthomson added this pull request to the merge queue Dec 8, 2024
Merged via the queue into mozilla:main with commit a87b379 Dec 8, 2024
62 of 63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants