
Add benchmark for the number of minimum cpu cores #5127

Merged
merged 15 commits into from
Sep 5, 2024


alexggh
Contributor

@alexggh alexggh commented Jul 24, 2024

Fixes: #5122.

This PR extends the existing single-core benchmark_cpu so that it also builds a score for the entire processor, by spawning EXPECTED_NUM_CORES (8) threads and averaging their throughput.

This is better than simply checking the number of cores, because it also covers multi-tenant environments where the OS sees a high number of available CPUs, but the machine has to share them with its neighbours, so its total throughput does not satisfy the minimum requirements.
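A minimal sketch of the idea, assuming only the standard library: each of the 8 threads runs the same single-core workload at the same time, and the per-thread throughputs are averaged. The FNV-1a loop is a hypothetical stand-in for the BLAKE2-256 hashing the real benchmark uses, and `single_core_throughput` / `parallel_score` are illustrative names, not functions from `sc-sysinfo`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real single-core workload (the PR hashes with
/// BLAKE2-256). Here an FNV-1a pass over `buf` is repeated until `budget`
/// elapses. Returns the throughput in bytes per second.
fn single_core_throughput(buf: &[u8], budget: Duration) -> f64 {
    let start = Instant::now();
    let mut processed: u64 = 0;
    let mut acc: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    while start.elapsed() < budget {
        for &b in buf {
            acc ^= u64::from(b);
            acc = acc.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a prime
        }
        processed += buf.len() as u64;
    }
    // Keep the optimizer from discarding the hashing loop.
    let _ = std::hint::black_box(acc);
    processed as f64 / start.elapsed().as_secs_f64()
}

/// Runs the workload on `threads` threads at the same time and averages the
/// per-thread throughput. If the OS exposes fewer real cores than `threads`,
/// or the cores are shared with noisy neighbours, each thread gets less CPU
/// time and the average drops below the single-core score.
fn parallel_score(threads: usize, buf_len: usize, budget: Duration) -> f64 {
    let per_thread: Vec<f64> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                s.spawn(move || {
                    let buf = vec![0xAB_u8; buf_len];
                    single_core_throughput(&buf, budget)
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("benchmark thread panicked"))
            .collect()
    });
    per_thread.iter().sum::<f64>() / per_thread.len() as f64
}

fn main() {
    const EXPECTED_NUM_CORES: usize = 8;
    const MIB: f64 = 1024.0 * 1024.0;
    let budget = Duration::from_millis(200);
    let buf = vec![0xAB_u8; 32 * 1024];

    let single = single_core_throughput(&buf, budget);
    let parallel = parallel_score(EXPECTED_NUM_CORES, buf.len(), budget);
    println!("single-core: {:.2} MiB/s", single / MIB);
    println!("parallel-{}: {:.2} MiB/s", EXPECTED_NUM_CORES, parallel / MIB);
}
```

On a machine with at least 8 uncontended cores the two numbers should be close, which matches the observation later in this thread that the parallel score on the reference hardware was almost identical to the single-core one.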

TODO

  • Obtain reference values on the reference hardware.

@alexggh alexggh requested a review from koute as a code owner July 24, 2024 14:50
@alexggh alexggh requested review from sandreim and eskimor July 24, 2024 14:50
@alexggh alexggh added the T0-node This PR/Issue is related to the topic “node”. label Jul 24, 2024
@alexggh alexggh requested a review from ggwpez July 24, 2024 14:58
Signed-off-by: Alexandru Gheorghe <[email protected]>
@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6802189

Contributor

@sandreim sandreim left a comment


A PRDoc is needed for operators. Otherwise LGTM!

polkadot/node/service/src/lib.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
@ggwpez
Member

ggwpez commented Jul 25, 2024

I ran this 5 times on the reference hardware and it is very consistent at 1022 MiBs for the new BLAKE2 parallel metric. Kind of expected on a homogeneous system 😆

@ggwpez
Member

ggwpez commented Jul 25, 2024

PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers:
[Screenshot: 2024-07-25 12:55:47]

If we update them now, then we basically raise the requirement from the GCP default to something better. Personally I think we can keep it for now, since node operators rely on this and use it quite often.

@alexggh
Contributor Author

alexggh commented Jul 25, 2024

> PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers:

You mean our current reference hardware is not the same as the machines we generated reference_hardware.json from?

> If we update them now, then we basically raise the requirement from the GCP default to something better. Personally I think we can keep it for now, since node operators rely on this and use it quite often.

Yeah, I concur that we probably don't want to change the values for the existing benchmarks, although I'm not sure where I should generate the reference value for the new one I want to add.

@ggwpez
Member

ggwpez commented Jul 25, 2024

> You mean our current reference hardware is not the same as the machines we generated reference_hardware.json from?

Yes. We migrated the servers in the meantime, but the definition of the reference hardware in the wiki is still the same, so I think we should keep the values.

> Yeah, I concur that we probably don't want to change the values for the existing benchmarks, although I'm not sure where I should generate the reference value for the new one I want to add.

Given that the multi-core score was pretty much identical to the single-thread score on the reference hardware, I think it's fine to use the same value in the JSON config file. Any concerns about that?

@alexggh
Contributor Author

alexggh commented Jul 25, 2024

> Yeah, I concur that we probably don't want to change the values for the existing benchmarks, although I'm not sure where I should generate the reference value for the new one I want to add.

> Given that the multi-core score was pretty much identical to the single-thread score on the reference hardware, I think it's fine to use the same value in the JSON config file. Any concerns about that?

Yeah, that should be fine as well, although I will try to see if I can get my hands on this type of machine.

@ggwpez
Member

ggwpez commented Jul 25, 2024

You can request access by opening an issue here: https://github.com/paritytech/devops/issues. This is the machine: https://github.com/paritytech/devops/pull/3210

@alexggh
Contributor Author

alexggh commented Jul 30, 2024

> PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers: [Screenshot: 2024-07-25 12:55:47]

> If we update them now, then we basically raise the requirement from the GCP default to something better. Personally I think we can keep it for now, since node operators rely on this and use it quite often.

I did some measurements and spelunking on the following hardware types, and as far as I can tell the speed-up in the benchmark actually comes from the SIMD optimisations we introduced in the benchmarked functions (Blake2 and SR25519-Verify) without updating the reference performance.

Hardware type 1

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 1.00 GiBs   | 783.27 MiBs | ✅ Pass (131.3 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 1.00 GiBs   | 783.27 MiBs | ✅ Pass (131.1 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 701.33 KiBs | 560.67 KiBs | ✅ Pass (125.1 %) |
|----------+-----------------------+-------------+-------------+-------------------|

Without SIMD optimisations

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 805.98 MiBs | 783.27 MiBs | ✅ Pass (102.9 %) |
|----------+-----------------------+-------------+-------------+-------------------|

Hardware type 2

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 969.47 MiBs | 783.27 MiBs | ✅ Pass (123.8 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 966.19 MiBs | 783.27 MiBs | ✅ Pass (123.4 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 583.56 KiBs | 560.67 KiBs | ✅ Pass (104.1 %) |
|----------+-----------------------+-------------+-------------+-------------------|

Without SIMD optimisations

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 782.02 MiBs | 783.27 MiBs | ✅ Pass ( 99.8 %) |
|----------+-----------------------+-------------+-------------+-------------------|

Hardware type 3

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 1.03 GiBs   | 783.27 MiBs | ✅ Pass (134.3 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 1.02 GiBs   | 783.27 MiBs | ✅ Pass (133.8 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 655.97 KiBs | 560.67 KiBs | ✅ Pass (117.0 %) |
|----------+-----------------------+-------------+-------------+-------------------|
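The Result column in these tables appears to be the score divided by the minimum, e.g. 805.98 / 783.27 ≈ 102.9 % (rows shown in GiBs differ slightly because the displayed score is rounded). A minimal sketch of that arithmetic follows. The 99.8 % row is still reported as a pass, which suggests the check allows some tolerance below the minimum; the 10 % used here is an assumption for illustration, not a value confirmed by this thread.

```rust
/// Score as a percentage of the minimum, i.e. the number shown in the
/// Result column. Units cancel, so both values can be in MiB/s.
fn percent_of_minimum(score: f64, minimum: f64) -> f64 {
    score / minimum * 100.0
}

/// Pass/fail rule with a tolerance expressed as a fraction (0.10 = 10 %).
/// ASSUMPTION: the tolerance value is illustrative; it only shows how a
/// 99.8 % result could still be reported as a pass.
fn passes(score: f64, minimum: f64, tolerance: f64) -> bool {
    score >= minimum * (1.0 - tolerance)
}

fn main() {
    // BLAKE2-256 scores (MiB/s) from the second machine, with and without SIMD.
    let minimum = 783.27;
    let rows = [("master", 969.47), ("without SIMD", 782.02)];
    for (label, score) in rows {
        println!(
            "{}: {:.1} % of minimum, pass = {}",
            label,
            percent_of_minimum(score, minimum),
            passes(score, minimum, 0.10)
        );
    }
    // prints:
    // master: 123.8 % of minimum, pass = true
    // without SIMD: 99.8 % of minimum, pass = true
}
```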

Consequences

Every weight update we did after the Blake2 optimisation got merged (March 2023) benefits from the 20-30% CPU speed-up, but every validator that uses the benchmarks to determine whether their machine is within parameters gets a false OK, because the reference values have not been increased. That means the weights are potentially underestimated by around 20-30%.
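A rough worked version of this estimate, under two simplifying assumptions: a weight is the execution time measured on the weight-generation machine, and execution time scales inversely with the BLAKE2-256 score. The reference score of 1.00 GiBs ≈ 1024 MiB/s is taken from the benchmark results reported for #5196.

```latex
\frac{s_{\text{ref}}}{s_{\min}} \approx \frac{1024\ \text{MiB/s}}{783.27\ \text{MiB/s}} \approx 1.31
\qquad
t_{\text{val}} \approx t_{\text{ref}} \cdot \frac{s_{\text{ref}}}{s_{\min}} \approx 1.31\, t_{\text{ref}}
\qquad
\frac{t_{\text{ref}}}{t_{\text{val}}} \approx \frac{1}{1.31} \approx 0.76
```

So a validator sitting exactly at the minimum takes about 31% longer than the weight assumes; put differently, the weight covers only about 76% of the actual execution time, an underestimate of roughly 24%, which is within the 20-30% range quoted above.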

What next?

From my perspective we have a few options here:

  1. We increase the reference hardware benchmarks to reflect the optimisations. The unfortunate immediate consequence is that every validator that is around the baseline will fail the check; judging by https://telemetry.polkadot.io/ these numbers don't seem negligible, and people will probably not be happy to get this all of a sudden.
    Kusama: [telemetry screenshot, 2024-07-30 16:49:20]
    Polkadot: [telemetry screenshot, 2024-07-30 16:49:01]

  2. Regenerate the weights that got the speed-up, as well as future weights, on slower hardware closer to the baseline.
  3. Do nothing now, since this might not be a problem yet (it's been like that for 1.5 years), and just use the values with the speed-up for the newly introduced parallel benchmark BLAKE2-256-Parallel-8, which isn't planned to be enforced right away. In practice, this forces validators to slowly converge to hardware where the single-core BLAKE2-256 score is also in sync with the reference hardware we use for generating the weights.

@ggwpez @koute @PierreBesson, thoughts?

@ggwpez
Member

ggwpez commented Jul 30, 2024

Then I think we should bump the numbers. Otherwise we have silently and accidentally reduced the single-core requirements by merging these dependency updates without updating them.

Good find, thanks for investigating!

@alvicsam
Contributor

alvicsam commented Aug 1, 2024

For transparency, CI is still using GCP machines and we are not planning to change that at least until we finish the CI migration.

@alexggh
Contributor Author

alexggh commented Aug 1, 2024

> For transparency, CI is still using GCP machines and we are not planning to change that at least until we finish the CI migration.

@alvicsam The updated numbers in #5196 are pretty similar between GCP and Scaleway. Overall, the conclusion is that the speed-up did not come from changing cloud providers, but simply from the optimisations made to the code since the reference numbers were last updated.

github-merge-queue bot pushed a commit that referenced this pull request Aug 7, 2024
…5196)

Since `May 2023`, after the
paritytech/substrate#13548 optimization,
`Blake2256` is about 30% faster. That means there is a difference of
~30% between the benchmark values we ask validators to run against and
the machine we use for generating the weights. So if all validators
just barely pass the benchmarks, our weights are potentially
underestimated by ~20%, so let's bring these two in sync.

The same thing happened when we merged
#2524 in `Nov 2023`: SR25519-Verify
became about 10-15% faster.

## Results

Generated on machine from here:
paritytech/devops#3210
```
+----------+----------------+--------------+-------------+-------------------+
| Category | Function       | Score        | Minimum     | Result            |
+============================================================================+
| CPU      | BLAKE2-256     | 1.00 GiBs    | 783.27 MiBs | ✅ Pass (130.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| CPU      | SR25519-Verify | 637.62 KiBs  | 560.67 KiBs | ✅ Pass (113.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| Memory   | Copy           | 12.19 GiBs   | 11.49 GiBs  | ✅ Pass (106.1 %) |
```

Discovered and discussed here:
#5127 (comment)

## Downsides

Machines that barely passed the benchmark will suddenly find themselves
below the benchmark, but since that is just a warning and everything
else continues as before, it shouldn't be too impactful. It also gives
validators the information they need to become compliant, since they
actually aren't compliant relative to the weights in use.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
polkadot/node/service/src/lib.rs (resolved)
substrate/client/sysinfo/src/lib.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
Member

@ggwpez ggwpez left a comment


Going to approve now so feel free to merge.

@alexggh
Contributor Author

alexggh commented Aug 15, 2024

> Going to approve now so feel free to merge.

Thanks, my plan is to merge this only after https://polkadot.subsquare.io/referenda/1051 passes and I have made the necessary updates to https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware.

dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Aug 28, 2024
substrate/client/sysinfo/src/lib.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
substrate/client/sysinfo/src/sysinfo.rs (outdated, resolved)
@alexggh
Contributor Author

alexggh commented Sep 5, 2024

Referendum passed: https://polkadot.subsquare.io/referenda/1051, and the wiki page was updated with w3f/polkadot-wiki#6202, so I'm merging it.

@alexggh alexggh enabled auto-merge September 5, 2024 12:08
@alexggh alexggh added this pull request to the merge queue Sep 5, 2024
Merged via the queue into master with commit a947cb8 Sep 5, 2024
162 of 198 checks passed
@alexggh alexggh deleted the alexggh/increase_cpu_score branch September 5, 2024 13:09
alexggh added a commit that referenced this pull request Sep 6, 2024
ggwpez pushed a commit that referenced this pull request Sep 9, 2024
…5613)

This backports #5127 to
the stable branch.

Unfortunately https://polkadot.subsquare.io/referenda/1051 passed after
the cut-off deadline and I missed the window of getting this PR merged.

The change itself is super low-risk: it just prints a new message
telling validators that, starting with January 2025, the required
minimum number of hardware cores will be 8. I see value in getting
this in front of the validators as soon as possible.

Since we have not released yet and it does not invalidate any QA we
have already done, it should be painless to include it in the current release.

(cherry picked from commit a947cb8)
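The backport only adds a printed message: a machine that misses the new parallel score gets a warning and the node keeps running. A minimal sketch of that warning-only pattern follows; `check_parallel_score`, the `CheckResult` type, the message wording and the example numbers are all hypothetical and are not taken from `sc-sysinfo`.

```rust
/// Outcome of comparing one benchmark score with its reference minimum.
#[derive(Debug, PartialEq)]
enum CheckResult {
    Pass,
    /// Carries the human-readable warning that would be printed.
    Warn(String),
}

/// Hypothetical check: compares a parallel BLAKE2 score (MiB/s) with the
/// reference minimum and returns a warning instead of an error, so the
/// caller can keep running.
fn check_parallel_score(score_mib_s: f64, minimum_mib_s: f64, cores: usize) -> CheckResult {
    if score_mib_s >= minimum_mib_s {
        CheckResult::Pass
    } else {
        CheckResult::Warn(format!(
            "parallel score {:.2} MiB/s is below the minimum {:.2} MiB/s; \
             a future requirement expects {} cores",
            score_mib_s, minimum_mib_s, cores
        ))
    }
}

fn main() {
    match check_parallel_score(700.0, 1024.0, 8) {
        CheckResult::Pass => println!("hardware check passed"),
        CheckResult::Warn(msg) => eprintln!("WARNING: {msg}"),
    }
    // Startup continues regardless of the check result.
    println!("node starting...");
}
```

Returning a value instead of an error keeps the policy decision (warn now, possibly enforce later) with the caller.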
btwiuse pushed a commit to btwiuse/substrate-benchmark-machine that referenced this pull request Dec 8, 2024
Labels
T0-node This PR/Issue is related to the topic “node”.
Development

Successfully merging this pull request may close these issues.

Update polkadot CPU score to reflect 8 cores are minimum required
6 participants