
chore(bench): improve heuristic to run throughput benchmarks #1868

Open · wants to merge 1 commit into base: main
Conversation

@soonum (Contributor) commented Dec 13, 2024

This changes the way we define the number of elements to load in the throughput pipeline.
Light operations need more elements to saturate a backend. Conversely, heavy operations require fewer elements to run in a reasonable time.
This improvement dramatically decreases the total benchmark duration.

This has been tested with the following operations:

  • bitand (light operation)
  • add (mid operation)
  • div_rem (heavy operation)

Saturation was tested successfully on the following backends:

  • CPU
  • GPU

@cla-bot cla-bot bot added the cla-signed label Dec 13, 2024
@soonum soonum self-assigned this Dec 13, 2024
@soonum (Contributor, Author) commented Dec 13, 2024

I need a review of the design before extending it to all the other benchmark functions that handle throughput variants.

@IceTDrinker (Member) commented

What's the idea of the heuristic?

Load depending on how many PBS there are in an operation?

@soonum (Contributor, Author) commented Dec 13, 2024

> What's the idea of the heuristic? Load depending on how many PBS there are in an operation?

Yes, this is the idea. The load is computed as the number of available threads divided by the number of PBS needed for one operation.
This value is then used as a coefficient within the previous implementation. Typically the coefficient is greater than 1.0 for quick operations, if the machine is big enough, and below 1.0 for slow operations.
That way, the number of elements to process during the throughput benchmark is generated dynamically.
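The heuristic described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual tfhe-rs implementation: the function and parameter names are assumptions, and the real code plugs the coefficient into the pre-existing element-count computation.

```rust
/// Hypothetical sketch of the loading heuristic: scale a baseline element
/// count by (available threads / PBS per operation), so light operations
/// (few PBS) get many elements and heavy operations get fewer.
fn throughput_num_elements(num_threads: usize, pbs_per_op: usize, base_elements: usize) -> usize {
    // Coefficient > 1.0 for light ops on a big enough machine,
    // < 1.0 for heavy ops needing many PBS per operation.
    let coefficient = num_threads as f64 / pbs_per_op as f64;
    // Scale the baseline and round up so at least one element is loaded.
    ((base_elements as f64) * coefficient).ceil() as usize
}
```

For example, with 64 threads, a light op costing 1 PBS would be loaded with 64x the baseline, while a heavy op costing 256 PBS would get a quarter of it.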

@IceTDrinker (Member) commented

> Yes, this is the idea. The load is computed as the number of available threads divided by the number of PBS needed for one operation. This value is then used as a coefficient within the previous implementation. Typically the coefficient is greater than 1.0 for quick operations, if the machine is big enough, and below 1.0 for slow operations. That way, the number of elements to process during the throughput benchmark is generated dynamically.

Did you check that the throughput measured with the new loading factor is similar to the old one?

@soonum (Contributor, Author) commented Dec 13, 2024

Yes I've just checked and they are the same 🎉

@soonum soonum force-pushed the dt/bench/throughput_heuristic branch 8 times, most recently from 1fb02c8 to d3593b0 Compare December 20, 2024 09:48
@soonum (Contributor, Author) commented Dec 20, 2024

get_pbs_count() does not increment the atomic counter on GPU.
Thus I cannot test this new implementation on GPU for now.
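For context, a PBS counter of the kind discussed here can be implemented with a shared atomic. The sketch below is an assumption about the general shape, not the actual tfhe-rs pbs-stats internals; the static name and `record_pbs` helper are illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative global counter; the real implementation lives behind the
// "pbs-stats" feature in tfhe-rs and may differ.
static PBS_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Called from every thread that performs a PBS. Many threads bumping a
/// single shared counter is also why enabling the stats can cost CPU
/// performance, as noted later in this thread.
fn record_pbs() {
    PBS_COUNT.fetch_add(1, Ordering::Relaxed);
}

/// Read the number of PBS recorded so far.
fn get_pbs_count() -> usize {
    PBS_COUNT.load(Ordering::Relaxed)
}
```

If the GPU path never calls the increment, the counter stays at zero there, which matches the symptom described above.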

@IceTDrinker (Member) commented

> get_pbs_count() does not increment the atomic counter on GPU. Thus I cannot test this new implementation on GPU for now.

Use the CPU stats; they should be similar, I would think.

@soonum soonum force-pushed the dt/bench/throughput_heuristic branch 8 times, most recently from 603cd8b to dc59088 Compare January 6, 2025 14:52
@soonum soonum marked this pull request as ready for review January 6, 2025 15:19
@soonum (Contributor, Author) commented Jan 6, 2025

The PBS count for GPU is fixed. Ready for review now.

@agnesLeroy (Contributor) left a comment


Hey! Thanks a lot @soonum! I only have some minor fixes 🙂 do you know how long the throughput benches take now?

tfhe/benches/integer/bench.rs Show resolved Hide resolved
tfhe/benches/integer/bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/bench.rs Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Outdated Show resolved Hide resolved
tfhe/benches/integer/signed_bench.rs Outdated Show resolved Hide resolved
@soonum (Contributor, Author) commented Jan 7, 2025

> Hey! Thanks a lot @soonum! I only have some minor fixes 🙂 do you know how long the throughput benches take now?

Yes, for Cuda benchmarks we're now down to 46 minutes for de-duplicated operations on 64 bits.
I'll launch a run over all precisions on the default ops today.

@soonum soonum force-pushed the dt/bench/throughput_heuristic branch 3 times, most recently from 0a7b264 to 4c0162f Compare January 9, 2025 10:27
This is done to load the backend with enough elements to saturate it,
while avoiding long execution times for heavy operations like
multiplication or division.
@soonum soonum force-pushed the dt/bench/throughput_heuristic branch from 4c0162f to edb6501 Compare January 9, 2025 10:30
@soonum soonum requested a review from agnesLeroy January 9, 2025 11:11
@IceTDrinker (Member) left a comment


I saw I was still supposed to review something here?

Btw @soonum, I just thought about something: enabling the PBS stats can have an impact on CPU performance, since we update a single counter from many threads. So I guess there is a need (only for CPU) to do a first pass measuring the PBS counts for all ops and precisions, then relaunch the throughput benchmarks loading those data from a file, to avoid the counter having an adverse effect on the measurements.

We must not launch the latency PBS benchmarks with the pbs-stats feature.
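The two-pass scheme suggested above could look something like the following sketch. The file format and both function names are assumptions for illustration; the actual benchmark harness may store the counts differently.

```rust
use std::collections::HashMap;
use std::fs;

/// First pass (with pbs-stats enabled): dump per-operation PBS counts
/// to a file, one "op_name count" pair per line.
fn save_pbs_counts(path: &str, counts: &HashMap<String, usize>) -> std::io::Result<()> {
    let body: String = counts
        .iter()
        .map(|(op, n)| format!("{op} {n}\n"))
        .collect();
    fs::write(path, body)
}

/// Second pass (stats disabled): reload the counts so the throughput
/// run can size its load without the shared-counter overhead.
fn load_pbs_counts(path: &str) -> std::io::Result<HashMap<String, usize>> {
    Ok(fs::read_to_string(path)?
        .lines()
        .filter_map(|line| {
            let (op, n) = line.split_once(' ')?;
            Some((op.to_string(), n.parse().ok()?))
        })
        .collect())
}
```

The measured throughput pass then never touches the atomic counter, which is the adverse effect the comment is worried about.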
