bugfix: DIV_BY_ZERO in MatchingFunctionSpacePartitionerLonLatPolygon #244

Merged
merged 1 commit into from
Nov 22, 2024

@wdeconinck wdeconinck commented Nov 22, 2024

What happened

FE_DIV_BY_ZERO was reported by @cducher with the following reproducer:

/*
To run:
mpirun -np 2 --use-hwthread-cpus $exe_dir/atlas_bug.x
*/
#include <vector>

#include "atlas/library.h"
#include "atlas/parallel/mpi/mpi.h"
#include "atlas/meshgenerator.h"
#include "atlas/grid.h"
#include "atlas/grid/Grid.h"
#include "atlas/grid/StructuredGrid.h"
#include "atlas/grid/detail/grid/Gaussian.h"
#include "atlas/grid/Distribution.h"
#include "atlas/grid/Vertical.h"
#include "atlas/functionspace/StructuredColumns.h"
#include "atlas/functionspace/NodeColumns.h"
#include "atlas/field/Field.h"

int gridSizeX_{32};  // grid size X
int gridSizeY_{64};  // grid size Y
int nHalo_{3};       // N halo points
int nLevels_{19};

atlas::functionspace::StructuredColumns setupFunctionSpace() {

    long n_procs = atlas::mpi::comm().size();

    // ------------------ grid ------------------
    std::vector<long> gaussian_x(gridSizeX_,gridSizeY_);
    atlas::StructuredGrid grid = atlas::grid::detail::grid::reduced_gaussian( gaussian_x );
    atlas::grid::Distribution distribution(grid, 
                                           atlas::util::Config("type", "checkerboard") | 
                                           atlas::util::Config("bands", n_procs));

    // ----------- function space ---------------
    return atlas::functionspace::StructuredColumns(grid, 
                                                    distribution, 
                                                    atlas::util::Config("halo", nHalo_) | 
                                                    atlas::util::Config("levels", nLevels_) );

}

int main(int argc, char* argv[]) {

    atlas::initialize(argc,argv);

    atlas::functionspace::StructuredColumns fs_ = setupFunctionSpace();

    atlas::Field field_dummy = fs_.createField<double>(atlas::option::name("dummy") | 
                                                        atlas::option::variables(1) | 
                                                        atlas::option::levels(nLevels_));

    // target mesh and function space
    atlas::RectangularLonLatDomain rd({140,50}, {-45,45});
    atlas::Grid areaGrid(fs_.grid(), rd); 

    atlas::MeshGenerator outputMeshGen_("structured");
    // Floating point exception
    atlas::Mesh outputMesh_ = outputMeshGen_.generate(
        areaGrid, atlas::grid::MatchingPartitioner(fs_)
    );

    atlas::finalize();
    return 0;
}

Environment

This was run on our HPC2020 system with the relevant modules intel/2021.4 and hpcx-openmpi/2.9.0 loaded.
The reproducer is run on e.g. the ecinteractive queue with mpirun -np 2 --use-hwthread-cpus.
This seems to be equivalent to MPI_SIZE=2 and OMP_NUM_THREADS=2.

Reason

When the MatchingFunctionSpacePartitionerLonLatPolygon is used for small grids with OpenMP,
we encounter a DIV_BY_ZERO.
This happens because the integer chunk_size = grid.size() / (1000 * num_threads) evaluates to zero
whenever grid.size() is smaller than 1000 * num_threads.
In that case chunk_size needs to be set to grid.size() instead.
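
For illustration, the guard described above can be sketched as follows. This is a minimal standalone sketch, not the code from the merged commit: the helper names safe_chunk_size and num_chunks are hypothetical, and the sketch assumes that chunk_size is later used as a divisor (for example to count chunks), which is where a zero value would trigger the error.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the Atlas API): compute the chunk size
// as described in this PR, falling back to the full grid size when the
// integer division truncates to zero.
std::size_t safe_chunk_size(std::size_t grid_size, std::size_t num_threads) {
    std::size_t chunk_size = grid_size / (1000 * num_threads);
    if (chunk_size == 0) {
        // Small grid: process everything in a single chunk.
        chunk_size = grid_size;
    }
    return chunk_size;
}

// Hypothetical consumer: number of chunks needed to cover the grid.
// Assumption: the real code divides by chunk_size in a similar way.
std::size_t num_chunks(std::size_t grid_size, std::size_t num_threads) {
    const std::size_t chunk_size = safe_chunk_size(grid_size, num_threads);
    if (chunk_size == 0) {
        return 0;  // empty grid, nothing to do
    }
    return (grid_size + chunk_size - 1) / chunk_size;  // ceiling division
}
```

With 2 threads, any grid with fewer than 2000 points takes the fallback branch and ends up as a single chunk, so the divisor is never zero for a non-empty grid.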

@wdeconinck wdeconinck merged commit fc1cf50 into develop Nov 22, 2024
169 checks passed