
Pinning a source of randomness in server-side aggregation. #278

Merged · 13 commits · Nov 11, 2024

Conversation

emersodb
Collaborator

@emersodb emersodb commented Nov 6, 2024

Adds sorting of client results on the server side prior to aggregation. This reduces randomness in results, since NumPy floating-point addition is not exactly associative.

PR Type

"Fix" (not really but reducing randomness is good)

Short Description

Clickup Ticket(s): N/A

A source of randomness appeared in some of our experiments when using more than 2 clients, despite pinning random seeds for Python, NumPy, and PyTorch. One culprit for this non-determinism was the server-side weight aggregation. It turns out that NumPy floating-point addition is not associative (https://stackoverflow.com/questions/69616727/why-does-computing-mean-with-numpy-meana-axis-10-differs-from-computing-mea). That is, the order in which you add up floats affects the rounding error in the result. Because client results are not strictly ordered (they arrive in the order their messages happen to be processed by the server), the order in which weights are added together can change from run to run.
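The non-associativity is easy to demonstrate in a few lines (a standalone illustration, not code from this PR):

```python
# Floating-point addition is not associative: each addition rounds to the
# nearest representable double, so grouping the same three values
# differently can yield results that differ in the last bits.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds to 0.30000000000000004
right = a + (b + c)  # 0.2 + 0.3 rounds to 0.5 exactly

print(left == right)  # False
```

This is exactly why summing the same client weights in a different order produces slightly different aggregates.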

My initial approach was to sort the client results seen by the server by client ID (CID). However, these CIDs are generated deep within Flower via uuid and are, therefore, very hard to pin. They fluctuate each run, which makes sorting by them useless for preserving summation order. Instead, I introduced a pseudo-sorting approach that should work in deterministic settings. It's not my favorite idea, because it introduces some computational overhead, but it works.
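For illustration, one way to get a run-stable ordering without relying on CIDs is to derive the sort key from the weight values themselves: in a deterministic run, each client produces bitwise-identical weights every time, so a content hash is reproducible even when the uuid-based CIDs are not. The sketch below is hypothetical (the names `deterministic_sort_key` and `aggregate` are made up, not the code in this PR):

```python
import hashlib
from typing import List, Tuple

import numpy as np


def deterministic_sort_key(weights: List[np.ndarray]) -> str:
    # Hash the raw bytes of a client's weight arrays. In a deterministic
    # run each client yields identical bytes every round, so this key is
    # stable across runs, unlike Flower's uuid-based CIDs.
    digest = hashlib.sha256()
    for layer in weights:
        digest.update(np.ascontiguousarray(layer).tobytes())
    return digest.hexdigest()


def aggregate(
    client_results: List[Tuple[List[np.ndarray], int]]
) -> List[np.ndarray]:
    # Sort client (weights, num_examples) pairs by content hash before
    # summing, fixing the addition order regardless of message arrival order.
    ordered = sorted(client_results, key=lambda cr: deterministic_sort_key(cr[0]))
    total_examples = sum(n for _, n in ordered)
    return [
        np.sum([w[i] * n / total_examples for w, n in ordered], axis=0)
        for i in range(len(ordered[0][0]))
    ]
```

The trade-off mentioned above is visible here: hashing every client's full weight payload each round is the extra computation the sort key costs.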

If someone has a less invasive/better idea, I'm very much open to it.

Tests Added

Tested by hand on some of our trajectories to verify that the code here eliminates the numerical fluctuations. This does not, however, guarantee that there are no other sources of randomness.

@sanaAyrml
Collaborator

sanaAyrml commented Nov 8, 2024

On second thought should we also add

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

to set_all_random_seeds in random.py?
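For context, a seed-pinning helper incorporating those cuDNN flags might look like the sketch below (hypothetical; the actual `set_all_random_seeds` in random.py may differ):

```python
import os
import random

import numpy as np


def set_all_random_seeds(seed: int = 42) -> None:
    # Pin the seed sources we control: Python's random, NumPy, hash
    # randomization, and (if installed) PyTorch on CPU and GPU.
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Proposed additions: force cuDNN to pick deterministic kernels
        # and disable its benchmarking autotuner, which can otherwise
        # select different (non-deterministic) algorithms per run.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # torch not installed; Python/NumPy seeds are still pinned
```

Note that even with all of this pinned, the aggregation-order issue this PR addresses remains a separate source of non-determinism.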

@emersodb
Collaborator Author

On second thought should we also add

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

to set_all_random_seeds in random.py?

It's actually included in a different PR: See https://github.com/VectorInstitute/FL4Health/pull/251/files (random.py there). Sorry for the confusion.

@emersodb emersodb merged commit 23b4224 into main Nov 11, 2024
6 checks passed
@emersodb emersodb deleted the dbe/fixing_some_numpy_randomness_in_aggregation branch November 11, 2024 17:07