Support for non-i.i.d. data regime in OpenDiLoCo experiments #37

gaseln · 2024-11-26T18:30:08Z

Hello again!

The DiLoCo paper by Arthur Douillard et al. explores the non-i.i.d. data regime in comparison to data parallelism. Could you kindly confirm if OpenDiLoCo supports this setup? If it does, could you please provide guidance on how to configure such an experiment? If not, would you recommend an efficient way to organize it in a similar manner to the approach described in the paper?

Thank you!

Jackmin801 · 2024-12-01T16:54:22Z

It seems that in the original paper they created the non-i.i.d. split by running k-means clustering on last-layer features from a model. I'm not sure whether they disclose which model that is, but I imagine it doesn't matter too much; if I were to guess, they probably used BERT.

You can then have the different workers load different datasets, each one loading a different cluster split from the k-means.
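
A rough sketch of that idea, not something OpenDiLoCo ships: it assumes you have already computed one feature vector per document (for example from a BERT-style encoder) and only shows the clustering and the per-worker split. The names `kmeans` and `shard_by_cluster` are hypothetical helpers, the farthest-first initialisation is just a deterministic stand-in chosen for this example, and in practice you would likely use a library k-means instead of this pure-Python one.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster id per feature vector.

    Assumption: `features` is a list of equal-length lists of floats,
    e.g. precomputed last-layer embeddings, one per document.
    """
    # Deterministic farthest-first initialisation (an illustrative choice):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids picked so far.
    centroids = [list(features[0])]
    while len(centroids) < k:
        farthest = max(
            features,
            key=lambda f: min(squared_distance(f, c) for c in centroids),
        )
        centroids.append(list(farthest))

    assignments = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def shard_by_cluster(assignments, num_workers):
    """Map each worker rank to the document indices it should load.

    With k == num_workers every worker gets exactly one cluster, which
    gives the non-i.i.d. split discussed above.
    """
    shards = defaultdict(list)
    for idx, cluster in enumerate(assignments):
        shards[cluster % num_workers].append(idx)
    return dict(shards)


# Toy example: two well-separated groups of 2-D "features".
features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
assignments = kmeans(features, k=2)
shards = shard_by_cluster(assignments, num_workers=2)
print(shards)
```

Each worker would then build its dataset from only the indices in `shards[rank]`, so the cluster split is fixed before training starts.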
