Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support MPI_Aint_add and MPI_Aint_diff #44

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

spoutn1k
Copy link
Contributor

@spoutn1k spoutn1k commented Dec 8, 2022

Hey again,

Sorry for the multiple PRs, but I felt this one is a feature integration rather than the bug fix of my other one, thus deserving a separate PR.

This PR is meant to fix #18 by introducing support of the MPI 3.1 functions MPI_Aint_add and MPI_Aint_diff.

The encountered issue was that OpenMPI's implementation of those functions is a macro, making the symbols absent from the library. This means that when running with preload using --from MPICH/Intel --to OpenMPI, the application crashes with an undefined symbol.

This is resolved by overriding the symbols and preventing the ASM router from doing its work, and rather implementing the OpenMPI macro directly.

Addition to the header

I added the definition found in the MPI3.1 standard of the two functions to interface/interface_utils/include/mpi.h. I have no experience with fortran so I decided not to touch things on the fortran side.

Symbol contents

I implemented the overrides by making sure they matched the output of OpenMPI's mpicc. For the following code:

#include <mpi.h>

MPI_Aint get_diff(MPI_Aint lhs, MPI_Aint rhs) {
  return MPI_Aint_diff(lhs, rhs);
}

MPI_Aint get_sum(MPI_Aint lhs, MPI_Aint rhs) { return MPI_Aint_add(lhs, rhs); }

We get the following assembly, compiled using mpicc -c -g -O0:

$ objdump --disassemble=get_diff aint.o 

aint.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <get_diff>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
   8:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
   c:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx
  10:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
  14:	48 29 c2             	sub    %rax,%rdx
  17:	48 89 d0             	mov    %rdx,%rax
  1a:	5d                   	pop    %rbp
  1b:	c3                   	ret

$ objdump --disassemble=get_sum aint.o 

aint.o:     file format elf64-x86-64


Disassembly of section .text:

000000000000001c <get_sum>:
  1c:	55                   	push   %rbp
  1d:	48 89 e5             	mov    %rsp,%rbp
  20:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  24:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
  28:	48 8b 55 f0          	mov    -0x10(%rbp),%rdx
  2c:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  30:	48 01 d0             	add    %rdx,%rax
  33:	5d                   	pop    %rbp
  34:	c3                   	ret

So simple additions and subtractions. This is reflected in the overrides:
https://github.com/spoutn1k/wi4mpi/blob/96e458a0733c14ca80156db7ed0e39aa37fe2da1/src/common/override.c#L340-L357

When this applies

The override is enabled only when OpenMPI is used as target, not when any other combination of library is used. The override directives #define MPI_AINT_XXX_OVERRIDE are defined in the following files:

  • interface/header/_OMPI/app_mpi.h
  • preload/header/INTEL_OMPI/app_mpi.h
  • preload/header/MPICH_OMPI/app_mpi.h

What's missing

This PR is my first real code contribution to the project, and I am unsure about certain things:

  • The overrides do not map the MPI_Aint variables. This seemed not to be an issue experimentally, but it might be necessary if they stray further between vendors ? I could not find a aint_a2r or aint_r2a, so I just skipped that step. Let me know if I should look into implementing that.
  • The fortran headers have been left completely untouched. I have the header definitions from the standard (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf), but no idea where to insert them.

@spoutn1k
Copy link
Contributor Author

spoutn1k commented Dec 8, 2022

Hmm for some reason github added my merge of master as individual commits ... Does anyone know how to fix that ?

@adrien-cotte
Copy link
Contributor

Awesome work, as always.
But, why not putting #define MPI_AINT_XXX_OVERRIDE into override.h ?

@spoutn1k
Copy link
Contributor Author

spoutn1k commented Dec 8, 2022

Thank you :)

The idea is to only override the symbols when necessary. Putting the define in override.h would override the symbols project-wide, while this solution only overrides symbols when the target library is missing them (OpenMPI here).

That way, if another library decides to change the implementation of the symbol, it will be respected.

@ducrotv
Copy link
Contributor

ducrotv commented Sep 12, 2023

This PR is a good fix for this issue. After review we decided to fix this another way: we will add these functions to the generated ones. Thanks again for this work, it will serve as a base to our implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing MPI_Aint_add function
3 participants