generated from VectorInstitute/aieng-template
Pinning a source of randomness in server-side aggregation. #278
Merged
emersodb
merged 13 commits into
main
from
dbe/fixing_some_numpy_randomness_in_aggregation
Nov 11, 2024
sanaAyrml
reviewed
Nov 6, 2024
jewelltaylor
reviewed
Nov 7, 2024
jewelltaylor
reviewed
Nov 7, 2024
sanaAyrml
approved these changes
Nov 8, 2024
On second thought should we also add
to |
It's actually included in a different PR: See https://github.com/VectorInstitute/FL4Health/pull/251/files (random.py there). Sorry for the confusion.
Adds sorting of results on the server side prior to aggregation. This reduces randomness in results caused by floating-point addition in numpy not being exactly associative.
PR Type
"Fix" (not really but reducing randomness is good)
Short Description
Clickup Ticket(s): N/A
A source of randomness appeared in some of our experiments when using more than 2 clients, even though we pin the random seeds for Python, NumPy, and PyTorch. One culprit for this non-determinism was found in the server-side weight aggregation. Floating-point addition, including NumPy's, is not associative (https://stackoverflow.com/questions/69616727/why-does-computing-mean-with-numpy-meana-axis-10-differs-from-computing-mea). That is, the order in which floats are added changes how rounding errors accumulate, so the same values summed in a different order can give slightly different results. Client results are not strictly ordered on the server (they are ordered by when each client's message is processed), so the order in which weights are added together can change from run to run.
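To make the effect concrete, here is a minimal sketch (not code from this PR) that sums the same three single-element "client updates" in two different arrival orders. The helper `sum_in_order` and the values are made up for illustration; float32 is used only because it makes the rounding visible with small numbers.

```python
import numpy as np


def sum_in_order(arrays, order):
    """Accumulate arrays one at a time, in the given order of indices."""
    total = np.zeros_like(arrays[0])
    for idx in order:
        total = total + arrays[idx]
    return total


# Three hypothetical client updates (float32, one element each).
clients = [
    np.array([1e8], dtype=np.float32),
    np.array([-1e8], dtype=np.float32),
    np.array([1.0], dtype=np.float32),
]

# Arrival order 0, 1, 2: the large values cancel first, so the 1.0 survives.
sum_a = sum_in_order(clients, [0, 1, 2])

# Arrival order 1, 2, 0: the 1.0 is absorbed by -1e8 (float32 spacing there is 8).
sum_b = sum_in_order(clients, [1, 2, 0])

print(sum_a, sum_b)  # [1.] [0.]
```

Real weight tensors differ by far less than this, but the mechanism is the same: a different arrival order gives a different rounding path.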
My initial approach was to sort the client results seen by the server by client ID (CID). However, these CIDs are generated deep within Flower using uuid and are, therefore, very hard to pin. They change on every run, which makes sorting by them useless for preserving summation order. Instead, I introduced a pseudo-sorting approach that should work in deterministic settings. It's not my favorite idea, because it introduces some computational overhead, but it works.
If someone has a less invasive/better idea, I'm very much open to it.
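As a rough illustration of what a pseudo sort could look like, the sketch below orders client results by a key computed from their contents rather than from their CIDs. This is an assumption-laden example, not necessarily what this PR implements: `pseudo_sort_key`, `aggregate`, and the `(weights, num_examples)` tuple layout are hypothetical names chosen for the example.

```python
import numpy as np


def pseudo_sort_key(result):
    """Hypothetical content-based key: sample count plus the sum of all weights.

    It depends only on what the client sent, not on its CID or arrival time.
    """
    weights, num_examples = result
    weight_sum = sum(float(np.sum(layer, dtype=np.float64)) for layer in weights)
    return weight_sum + num_examples


def aggregate(results):
    """Weighted average of client weights, summed in pseudo-sorted order."""
    # sorted() is stable, so results with equal keys keep their arrival order.
    # That is the case this sketch does not protect against.
    ordered = sorted(results, key=pseudo_sort_key)
    total_examples = sum(num_examples for _, num_examples in ordered)
    aggregated = []
    for layer_idx in range(len(ordered[0][0])):
        layer_sum = np.zeros_like(ordered[0][0][layer_idx])
        for weights, num_examples in ordered:
            layer_sum = layer_sum + weights[layer_idx] * num_examples
        aggregated.append(layer_sum / total_examples)
    return aggregated


# Four hypothetical clients, each with two float32 layers and a sample count.
results = [
    ([np.full((3,), 0.1 * (i + 1), dtype=np.float32),
      np.full((2, 2), 0.01 * (i + 1), dtype=np.float32)], 100 * (i + 1))
    for i in range(4)
]

baseline = aggregate(results)
reordered = aggregate(results[::-1])  # same results, reversed arrival order
identical = all(np.array_equal(x, y) for x, y in zip(baseline, reordered))
```

The overhead mentioned above shows up here as the extra pass over every weight array needed to compute the key before any aggregation happens.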
Tests Added
Tested by hand on some of our trajectories to guarantee that the code here fixes the numerical fluctuations. This does not, however, guarantee that there are no other sources of randomness.