Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations method weighted #23

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

CarlosPenaDePedro
Copy link

@CarlosPenaDePedro CarlosPenaDePedro commented Nov 22, 2024

This pull request highlights changes that can improve the performance of the code. The first change involves iterating over sparse matrices using indexes instead of iterators, which allows for better compiler optimizations and is generally more suited for structures with a known size, such as vectors.

The second change introduces OpenMP parallelization for certain independent sections of the code, such as the treatment of rows, to further improve performance.

These changes are applied only to specific parts of the code relevant to our current interests. However, we understand that if these changes are accepted, a major refactor would be required to standardize their usage across all parts of the code that rely on this iterator structure. Such a refactor would represent a long-term development effort.

We also acknowledge that the iterator structure may have a specific purpose and might not be replaced. Our goal is to showcase the observed improvements on MN5 and explain how we achieved them.

We applied these changes to the ForceMissing calculation (Section 3) and the MissingValueTreatment, specifically the MissingIfHeaviestMissing case (Section 9).

Other relevant sections include:

  1. Section 8, which corresponds to the matrix copy and represents the major bottleneck, as we could not modify it within MIR. Perhaps the eckit copy constructor could be revised to optimize this process.
  2. Section 10, which covers the interpolation calculation itself. This section is already parallelized using OpenMP due to ECKIT_OMP and MIR_OMP being enabled.

This table presents the average burst time for the MIR Method Weighted sections across different servers during an output step of Tco1279-eORCA12 simulations using the DE340 output plans. The data focuses exclusively on the ocean servers, showcasing only eORCA12 data with missing values.

This is the base case.
Histogram base case

Using indexes instead of iterators

  • Section 3: 5.4x Average burst time speedup respect base
  • Section 9: 2.85x Average burst time speedup respect base

Index_refactor

Using indexes and 16 OMP threads

  • Section 3: 29.42x Average burst time speedup respect base
  • Section 9: 18.97x Average burst time speedup respect base

Index_refactor+OMP

@FussyDuck
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Carlos seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines 435 to 441
std::vector<size_t> forceMissing; // reserving size unnecessary (not the general case)
{
auto begin = W.begin(0);
auto end(begin);
for (size_t r = 0; r < W.rows(); r++) {
if (begin == (end = W.end(r))) {
forceMissing.push_back(r);
}
begin = end;
#pragma omp parallel for reduction(vec_merge_sorted:forceMissing)
for (size_t r = 0; r < W.rows(); ++r) {
if (W.outer()[r] == W.outer()[r + 1]) {
forceMissing.push_back(r);
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part corresponds to Section 3. If I am not mistaken, this calculation could be performed during matrix creation, as the original empty rows in the weight matrix are part of the geometry. A vector containing the indices of these empty rows could be stored as a private member and cached. This approach could save significant time, especially with larger matrices, where traversing all rows is costly. For example, in the base case profiling with eORCA12, each MIR API call takes approximately 40 ms.

@CarlosPenaDePedro CarlosPenaDePedro marked this pull request as ready for review November 22, 2024 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants