The DiLoCo paper by Arthur Douillard et al. explores the non-i.i.d. data regime in comparison to data parallelism. Could you kindly confirm if OpenDiLoCo supports this setup? If it does, could you please provide guidance on how to configure such an experiment? If not, would you recommend an efficient way to organize it in a similar manner to the approach described in the paper?
Thank you!
It seems that in the original paper they built the non-i.i.d. setting by running k-means clustering on last-layer features from a model. I'm not sure whether they disclose which model this is, but I imagine it doesn't matter too much; if I were to guess, they probably used BERT.
You can then have the different workers load different datasets, each one loading a different cluster split from the k-means.
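To make the idea concrete, here is a minimal sketch of the split, not an OpenDiLoCo feature. It assumes you have already computed one feature vector per training example with some pretrained model (the toy `features` below stand in for those), and that the number of clusters equals the number of workers. The names `dist2`, `kmeans`, and `shard_by_cluster` are hypothetical helpers, and the small deterministic k-means is only a stand-in for whatever clustering library you prefer.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster id per feature vector."""
    # Deterministic farthest-point initialization (an assumption made for
    # reproducibility, not something taken from the paper).
    centroids = [list(features[0])]
    while len(centroids) < k:
        far = max(features, key=lambda f: min(dist2(f, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each example goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(f, centroids[c])) for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def shard_by_cluster(labels, num_workers):
    """Map worker rank -> indices of the examples in that worker's cluster.

    Assumes cluster ids are in range(num_workers), i.e. one cluster per worker.
    """
    shards = {rank: [] for rank in range(num_workers)}
    for idx, lab in enumerate(labels):
        shards[lab].append(idx)
    return shards


# Toy stand-in for last-layer features: two well-separated groups.
features = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [10.0, 10.1], [10.1, 10.0], [10.0, 10.0],
]
num_workers = 2
labels = kmeans(features, k=num_workers)
shards = shard_by_cluster(labels, num_workers)
print(shards)  # {0: [0, 1, 2], 1: [3, 4, 5]}
```

Each worker would then build its training set from `shards[rank]` only, for example by wrapping the full dataset in `torch.utils.data.Subset(dataset, shards[rank])`. How the rank reaches the training script depends on your launch setup, so treat that part as something to wire up yourself.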