
Equal distribution of particles to MPI processors #1231

Closed
wants to merge 5 commits

Conversation


@JakeNewmanUEA commented Aug 31, 2022

When running simulations in MPI mode that involve the repeated release of small quantities of particles (e.g. 100 particles across 10 MPI processors), we would occasionally receive the error 'Cannot initialise with fewer particles than MPI processors'. The cause appeared to be the way K-means distributes the particles among the MPI processors: K-means does not return clusters of a fixed or minimum size, so some MPI processors can easily be allocated fewer than the minimum required number of particles, especially when the number of particles is small and the number of MPI processors approaches the maximum for that particle count (e.g. 10 MPI processors is the maximum for 100 particles, since 100 particles / 10 processors = 10 particles per processor).

To overcome this, we distribute the available particles equally among the MPI processors using NumPy's linspace method. We create a linearly spaced array of the same length as the number of particles, with values ranging from 0 to the number of MPI processors, and round it down so that each value becomes an integer index of an MPI processor. As before, the user must still request no more MPI processors than there are particles, but this change ensures that satisfying that minimum is always sufficient.

This change has only been tested for collectionsoa.py but not collectionaos.py.
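
For illustration, a minimal sketch of the linspace-based assignment described above (the function name, the `endpoint=False` detail, and the flooring via `np.floor` are assumptions for this sketch, not necessarily the exact code in the PR):

```python
import numpy as np

def equal_rank_assignment(npart, mpi_size):
    """Assign each of `npart` particles an MPI rank in [0, mpi_size)."""
    # Linearly spaced values from 0 (inclusive) to mpi_size (exclusive);
    # flooring them yields integer rank indices, with each rank receiving
    # either floor(npart / mpi_size) or ceil(npart / mpi_size) particles.
    return np.floor(np.linspace(0, mpi_size, npart, endpoint=False)).astype(int)

# Example: 100 particles over 10 ranks gives exactly 10 particles per rank.
print(np.bincount(equal_rank_assignment(100, 10)))  # [10 10 10 10 10 10 10 10 10 10]
```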

Added some stuff
Related to pull request OceanParcels#1231. This change has only been tested for collectionsoa.py but not collectionaos.py.

@erikvansebille
Member

Thanks for creating this PR, @JakeNewmanUEA. Interesting issue that you found, although I wonder what use cases would warrant distributing 100 particles across 10 cores?

Your solution is in principle elegant, but the problem is that it doesn't take into account the efficiency of the spatial distribution of particles per processor. Since MPI is often run together with FieldSet chunking, it is highly beneficial to put particles that are spatially close together on the same processor, so as to minimise the number of chunks that each processor has to load.

See also the Load balancing section of the MPI documentation. Note that actual 'on-the-fly' load balancing is still a future development, but the K-means distribution is a way to at least do some load balancing at the initial distribution of particles over the processors.

So can you come up with a solution that is spatially-aware?

Following pull request OceanParcels#1231: reinstated K-means as the method of grouping particle locations. Rather than using the cluster assignments directly, we find the X nearest coordinates to each cluster centroid, where X is the minimum number of particles required to distribute the particles equally among MPI processors. The result is an (approximately) equal number of particles assigned to each MPI processor, with particles grouped according to their spatial distribution.
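
A rough sketch of this balanced K-means assignment (the helper name, the use of scikit-learn's KMeans, and the handling of leftover particles are illustrative assumptions, not the PR's actual diff):

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_kmeans_assignment(coords, mpi_size):
    """coords: (npart, 2) array of particle lon/lat; returns an MPI rank per particle."""
    npart = coords.shape[0]
    per_rank = npart // mpi_size  # X: number of particles guaranteed to each rank
    centroids = KMeans(n_clusters=mpi_size, n_init=10).fit(coords).cluster_centers_

    ranks = np.full(npart, -1, dtype=int)
    unassigned = np.arange(npart)
    for rank, centroid in enumerate(centroids):
        # Greedily take the X still-unassigned particles closest to this centroid.
        dists = np.linalg.norm(coords[unassigned] - centroid, axis=1)
        take = unassigned[np.argsort(dists)[:per_rank]]
        ranks[take] = rank
        unassigned = np.setdiff1d(unassigned, take)
    # Any remainder (when npart is not divisible by mpi_size) is spread round-robin.
    for i, p in enumerate(unassigned):
        ranks[p] = i % mpi_size
    return ranks
```

Because each rank draws in turn from the pool of still-unassigned particles, the grouping for the later centroids is only approximately spatial, which matches the 'approximately equal' caveat in the commit message above.
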
@JakeNewmanUEA
Author

I've reinstated the K-means allocation, but it now assigns an equal number of particles to each centroid. This is probably still not ideal, but hopefully it is better than having no spatial awareness at all or an imbalanced particle allocation. At least it still overcomes the original issue I described (my colleague has offered to elaborate on the use case soon).

@erikvansebille
Member

Thanks for this update, @JakeNewmanUEA. But I now wonder whether it would not be more efficient, and easier, to implement another spatial clustering algorithm altogether, one that seeks roughly equal cluster sizes.

This is not my expertise, but from this introduction website, it seems that the SKATER algorithm could be a good candidate? It's implemented in pygeoda, which would then become a dependency for Parcels.

Do you have experience with this? Would you like to give it a try and see whether pygeoda.skater() works for your case?

@erikvansebille
Member

This has now been fixed/implemented in #1414 and #1424, so I am closing this PR.
