
Equal distribution of particles to MPI processors #1231

Closed
wants to merge 5 commits

Conversation


@JakeNewmanUEA commented Aug 31, 2022

When running simulations in MPI mode that involve the repeated release of small quantities of particles (e.g. 100 particles across 10 MPI processors), we would occasionally receive the error 'Cannot initialise with fewer particles than MPI processors'. The cause appeared to be the way K-means distributes the particles among the MPI processors: K-means does not return clusters of a fixed or minimum size, so some MPI processors can easily be allocated fewer than the minimum required number of particles, especially when the number of particles is small and the number of MPI processors approaches the maximum for that particle count (e.g. 10 MPI processors is the maximum for 100 particles, since 100 particles / 10 processors = 10 particles per processor).

To overcome this, we distribute the available particles equally among the MPI processors using NumPy's linspace method. We create a linearly spaced array of the same length as the number of particles, with values ranging from 0 to the number of MPI processors, and round it down so that each value becomes an integer index of an MPI processor. As before, the user must still request no more MPI processors than there are particles, but this change ensures that satisfying that minimum is always sufficient.

This change has only been tested for collectionsoa.py but not collectionaos.py.
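
For illustration, a minimal sketch of the linspace-based assignment described above (the function name, the `endpoint=False` detail, and the flooring via `np.floor` are assumptions for this sketch, not necessarily the exact code in the PR):

```python
import numpy as np

def equal_rank_assignment(npart, mpi_size):
    """Assign each of `npart` particles an MPI rank in [0, mpi_size)."""
    # Linearly spaced values from 0 (inclusive) to mpi_size (exclusive);
    # flooring them yields integer rank indices, with each rank receiving
    # either floor(npart / mpi_size) or ceil(npart / mpi_size) particles.
    return np.floor(np.linspace(0, mpi_size, npart, endpoint=False)).astype(int)

# Example: 100 particles over 10 ranks gives exactly 10 particles per rank.
print(np.bincount(equal_rank_assignment(100, 10)))  # [10 10 10 10 10 10 10 10 10 10]
```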

Added some stuff
Related to pull request OceanParcels#1231. This change has only been tested for collectionsoa.py but not collectionaos.py.

@erikvansebille
Member

Thanks for creating this PR, @JakeNewmanUEA. Interesting issue that you found, although I wonder what use cases would warrant distributing 100 particles across 10 cores?

Your solution is in principle elegant, but the problem is that it doesn't take into account the efficiency of the spatial distribution of particles per processor. Since MPI is often run together with FieldSet chunking, it is highly beneficial to put particles that are spatially close together on the same processor, so as to minimise the number of chunks that each processor has to load.

See also the Load balancing section of the MPI documentation. Note that actual 'on-the-fly' load balancing is still a future development, but the K-means distribution is a way to at least do some load balancing at the initial distribution of particles over the processors.

So can you come up with a solution that is spatially-aware?

Following pull request OceanParcels#1231: reinstated K-means as the method of grouping particle locations. Rather than using the cluster assignments directly, we find the X nearest coordinates to each cluster centroid, where X is the minimum number of particles required to distribute the particles equally among MPI processors. The result is an (approximately) equal number of particles assigned to each MPI processor, with particles grouped according to their spatial distribution.
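
A rough sketch of this balanced K-means assignment (the helper name, the use of scikit-learn's KMeans, and the handling of leftover particles are illustrative assumptions, not the PR's actual diff):

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_kmeans_assignment(coords, mpi_size):
    """coords: (npart, 2) array of particle lon/lat; returns an MPI rank per particle."""
    npart = coords.shape[0]
    per_rank = npart // mpi_size  # X: number of particles guaranteed to each rank
    centroids = KMeans(n_clusters=mpi_size, n_init=10).fit(coords).cluster_centers_

    ranks = np.full(npart, -1, dtype=int)
    unassigned = np.arange(npart)
    for rank, centroid in enumerate(centroids):
        # Greedily take the X still-unassigned particles closest to this centroid.
        dists = np.linalg.norm(coords[unassigned] - centroid, axis=1)
        take = unassigned[np.argsort(dists)[:per_rank]]
        ranks[take] = rank
        unassigned = np.setdiff1d(unassigned, take)
    # Any remainder (when npart is not divisible by mpi_size) is spread round-robin.
    for i, p in enumerate(unassigned):
        ranks[p] = i % mpi_size
    return ranks
```

Because each rank draws in turn from the pool of still-unassigned particles, the grouping for the later centroids is only approximately spatial, which matches the 'approximately equal' caveat in the commit message above.
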
@JakeNewmanUEA
Author

I've reinstated the K-means allocation, but it now assigns an equal number of particles to each centroid. This is probably still not ideal, but hopefully it is better than having no spatial awareness at all or an imbalanced particle allocation. At least it still overcomes the original issue I described (my colleague has offered to elaborate on the use case soon).

@erikvansebille
Member

Thanks for this update, @JakeNewmanUEA. But I now wonder whether it would not be more efficient, and easier, to implement another spatial clustering algorithm altogether, one that seeks roughly equal cluster sizes.

This is not my expertise, but from this introduction website, it seems that the SKATER algorithm could be a good candidate? It's implemented in pygeoda, which would then become a dependency for Parcels.

Do you have experience with this? Would you like to give it a try and see whether pygeoda.skater() works for your case?

@erikvansebille
Member

This has now been fixed/implemented in #1414 and #1424, so I am closing this PR.
