Add alltoallv_validation #31

Open: wants to merge 1 commit into base: master
16 changes: 16 additions & 0 deletions alltoallv_validation/.gitignore
@@ -0,0 +1,16 @@
Makefile
aclocal.m4
autom4te.cache
config
config.log
config.status
configure
src/Makefile
src/stamp-h1
src/test_config.h
**.in
**~
**.o
src/alltoallv_ddt
src/sanity
src/.deps
12 changes: 12 additions & 0 deletions alltoallv_validation/Makefile.am
@@ -0,0 +1,12 @@
# -*- makefile -*-
#
# Copyright (c) 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# $HEADER$
#

ACLOCAL_AMFLAGS = -I config

# All test programs are built in the src subdirectory

SUBDIRS = src
169 changes: 169 additions & 0 deletions alltoallv_validation/README.md
@@ -0,0 +1,169 @@
# Alltoallv Validation of complex datatypes

This test validates the data delivered by the alltoallv collective across a
variety of datatype configurations, from basic types to nested derived datatypes.

The test proceeds as follows:
- Create a datatype.
- Determine its packed size, and allocate both packed and unpacked buffers to
hold the send data.
- Fill the packed buffer with a test pattern, then copy it into the unpacked
send buffer with a sendrecv from an `MPI_BYTE` buffer to the test datatype.
- Perform the alltoallv collective.
- Transfer the received data back into packed format.
- Verify the packed contents against the data known to have been sent.
- Verify that no buffer under-runs or over-runs occurred by checking guard
bytes around the buffers.
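
The round trip above can be illustrated without MPI. In this minimal sketch a plain strided copy stands in for the test datatype (the real harness uses a sendrecv between an `MPI_BYTE` buffer and the datatype), and the alltoallv exchange itself is omitted. The guard value, guard length, and all helper names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical guard value and guard length (not taken from the harness).
constexpr std::uint8_t kGuard = 0xA5;
constexpr std::size_t kGuardLen = 16;

// Fill the packed buffer with a position-dependent test pattern.
void fill_pattern(std::vector<std::uint8_t>& packed) {
    for (std::size_t i = 0; i < packed.size(); ++i)
        packed[i] = static_cast<std::uint8_t>(i * 7 + 3);
}

// Stand-in for unpacking into a non-contiguous datatype: packed byte i lands
// at offset i * stride, just after a leading guard region.
void unpack_strided(const std::vector<std::uint8_t>& packed,
                    std::vector<std::uint8_t>& unpacked, std::size_t stride) {
    for (std::size_t i = 0; i < packed.size(); ++i)
        unpacked[kGuardLen + i * stride] = packed[i];
}

// Inverse of unpack_strided: gather the strided bytes back into packed form.
void pack_strided(const std::vector<std::uint8_t>& unpacked,
                  std::vector<std::uint8_t>& packed, std::size_t stride) {
    for (std::size_t i = 0; i < packed.size(); ++i)
        packed[i] = unpacked[kGuardLen + i * stride];
}

// True if the leading and trailing guard regions still hold kGuard.
bool guards_intact(const std::vector<std::uint8_t>& unpacked) {
    for (std::size_t i = 0; i < kGuardLen; ++i) {
        if (unpacked[i] != kGuard) return false;                      // under-run
        if (unpacked[unpacked.size() - 1 - i] != kGuard) return false;  // over-run
    }
    return true;
}

// One pack -> unpack -> pack round trip; true if data and guards validate.
bool round_trip(std::size_t n, std::size_t stride) {
    assert(n >= 1 && stride >= 1);
    std::vector<std::uint8_t> packed(n), check(n);
    // Data spans (n - 1) * stride + 1 bytes, bracketed by two guard regions.
    std::vector<std::uint8_t> unpacked((n - 1) * stride + 1 + 2 * kGuardLen, kGuard);
    fill_pattern(packed);
    unpack_strided(packed, unpacked, stride);
    pack_strided(unpacked, check, stride);
    return check == packed && guards_intact(unpacked);
}
```

Because the gaps between strided elements are also pre-filled with the guard value, a stricter variant could check those gaps too; this sketch only checks the leading and trailing regions.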

Validation is the only purpose of this test. It should not be used for
performance timing, as many extra memory copies and assignments are performed.
No timing is printed.

The code is written in C++ solely to access a predictable random number
generator. All MPI calls use the C interface.
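
As background on that choice: the C++ standard specifies `std::mt19937` exactly, down to requiring that the 10000th output of a default-constructed engine be 4123659995, so a given seed produces the same raw sequence with any conforming standard library. Distributions such as `std::uniform_int_distribution` are not specified algorithmically, so only the raw engine output carries that guarantee. The helpers below are hypothetical illustrations, not the harness's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw n raw 32-bit outputs from a std::mt19937 seeded with `seed`.
std::vector<std::uint32_t> draw(std::uint32_t seed, std::size_t n) {
    std::mt19937 gen(seed);
    std::vector<std::uint32_t> out(n);
    for (auto& v : out) v = static_cast<std::uint32_t>(gen());
    return out;
}

// The standard pins the engine down: the 10000th output of a
// default-constructed mt19937 (default seed 5489) must be 4123659995.
bool engine_matches_standard() {
    std::mt19937 gen;
    gen.discard(9999);
    return gen() == 4123659995u;
}
```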

## Test Overview

Tests are organized into four complexity levels.

### Level 1

Level 1 types are basic MPI types such as `MPI_CHAR`, `MPI_REAL`, and
`MPI_INT64_T`. The list is not exhaustive: only 9 types are used. Each Level 1
test sends and receives the same datatype, so Level 1 executes 9 tests.

### Level 2

Level 2 types are collections of Level 1 types. There are 7 Level 2 types, built
in various configurations including:

- increasing the count, using the same type
- contiguous and non-contiguous vectors
- contiguous and non-contiguous vectors with negative stride
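
To illustrate what a negative stride does to a vector layout, this sketch computes the byte displacements that `MPI_Type_vector(count, blocklength, stride, oldtype)` describes, where block `i` starts `i * stride` elements from the first, along with the resulting lower bound and extent. It assumes a 4-byte element type with a zero lower bound and no alignment padding (as for `MPI_INT` on common platforms); the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Byte displacement of every element in a vector layout with the semantics of
// MPI_Type_vector(count, blocklength, stride, oldtype); stride may be negative.
std::vector<std::int64_t> vector_displacements(int count, int blocklength,
                                               int stride, std::int64_t elem_extent) {
    std::vector<std::int64_t> disps;
    for (int i = 0; i < count; ++i)
        for (int j = 0; j < blocklength; ++j)
            disps.push_back((std::int64_t{i} * stride + j) * elem_extent);
    return disps;
}

struct Bounds { std::int64_t lb, ub, extent; };

// Lower bound, upper bound and extent, assuming the element type has lb == 0
// and no alignment padding.
Bounds layout_bounds(const std::vector<std::int64_t>& disps, std::int64_t elem_extent) {
    assert(!disps.empty());
    const std::int64_t lo = *std::min_element(disps.begin(), disps.end());
    const std::int64_t hi = *std::max_element(disps.begin(), disps.end());
    return Bounds{lo, hi + elem_extent, hi + elem_extent - lo};
}
```

With `count = 3`, `blocklength = 2`, and `stride = -4`, the first element sits at displacement 0, but the lower bound is -32 bytes and the extent is 40 bytes, while only 24 bytes of data are packed. A send buffer for such a type therefore needs valid memory before the address handed to MPI.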

All Level 2 types are mutually compatible, so every combination of the above is
used as a send/receive pair. With 7 types, Level 2 executes 7 × 7 = 49 tests.

All Level 2 types are built from the same basic datatype (`MPI_INT`).

Note that each item of these types is a vector, so setting `--item-count` to 10
actually sends 10 vectors, each containing 12 elements of the basic type.

### Level 3

Level 3 types combine two different Level 1 types, `MPI_INT` and `MPI_CHAR`.
They are created with `MPI_Type_create_struct` in various orders and
configurations, including:
- contiguous and non-contiguous in-order elements
- contiguous and non-contiguous reverse-order elements
- negative lower bounds
- padding in extents

There are 6 Level 3 types. Like the Level 2 types, they are all mutually
compatible, so 6 × 6 = 36 tests are executed.

### Level 4

There are two hand-made Level 4 types. Each combines several layers of Level 2
and Level 3 types into vectors of mixed types with various paddings and
spacings; see the source code for details. The two types are not compatible with
each other, so only 2 tests are executed.

Again, note that these constructed types are fairly large themselves (hundreds
of bytes), so setting a high `--item-count` can result in longer runtimes.

### Total

As of the initial version of this program, there were 96 tests (9 + 49 + 36 + 2).
When all ranks send and receive a count of 1 for a single iteration, each rank
sends and receives approximately 2.7 KB of data over the full test battery.
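
The 96 total is the sum over the four levels. The following sketch models that bookkeeping under one stated assumption: mutually compatible levels (2 and 3) cross every send type with every receive type, while the others (1 and 4) pair each type only with itself. `Level`, `level_pairs`, and `total_tests` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical description of one complexity level.
struct Level { int ntypes; bool cross_compatible; };

// Enumerate (send type, recv type) index pairs for one level.
std::vector<std::pair<int, int>> level_pairs(const Level& lvl) {
    assert(lvl.ntypes >= 0);
    std::vector<std::pair<int, int>> pairs;
    for (int s = 0; s < lvl.ntypes; ++s) {
        if (!lvl.cross_compatible) {  // each type only exchanged with itself
            pairs.emplace_back(s, s);
            continue;
        }
        for (int r = 0; r < lvl.ntypes; ++r) pairs.emplace_back(s, r);
    }
    return pairs;
}

// Total number of tests across all levels.
int total_tests(const std::vector<Level>& levels) {
    int total = 0;
    for (const auto& lvl : levels) total += static_cast<int>(level_pairs(lvl).size());
    return total;
}
```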

With the default options the data volume is larger but still manageable: 32
ranks on a single host complete the test in less than 5 seconds, and most ranks
send about 630 KB.

# Compile
```
$ ./autogen.sh && ./configure && make

$ mpirun -n 13 src/alltoallv_ddt
Rank 0 sent 254104 bytes, and received 265152 bytes.
[OK] All tests passsed. Executed 96 tests with seed 0 with 13 total ranks).

```

# Usage
```
Test alltoallv using various ddt's and validate results.
This test uses pseudo-random sequences from C++'s mt19937 generator.
The test (but not necessarily the implementation) is deterministic
when the options and number of ranks remain the same.
Options:
[-s|--seed <seed>] Change the seed to shuffle which datapoints are exchanged
[-c|--item-count <citems>] Each rank will create <citems> to consider for exchange (default=10).
[-i|--prob-item <prob>] Probability that rank r will send item k to rank q. (0.50)
[-r|--prob-rank <prob>] Probability that rank r will send anything to rank q. (0.90)
[-w|--prob-world <prob>] Probability that rank r will do anything at all. (0.95)
[-t|--iters <iters>] The number of iterations to test each dtype.
[-o|--only <high,low>] Only execute a specific test signified by the pair high,low.
[-v|--verbose=level ] Set verbosity during execution (0=quiet (default). 1,2,3: loud).
[-h|--help] Print this help and exit.
[-z|--verbose-rank] Only the provided rank will print. Default=0. ALL = -1.
```

Some recommended test cases:
```
# no ranks exchange any data
alltoallv_ddt -w 0

# same as alltoall: all ranks exchange same amount of data
alltoallv_ddt -w 1 -r 1 -i 1

# perform a different test each time you run, or repeat the same test:
alltoallv_ddt -s $RANDOM
alltoallv_ddt -s 1234
```
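
The recommended cases follow from how the `-w`, `-r`, and `-i` probabilities compose. The sketch below assumes they act as nested, independent decisions drawn from a per-rank `std::mt19937` stream. That composition, the per-rank seeding, and the names `ExchangeOptions`, `chance`, and `send_counts` are assumptions for illustration, not the harness's actual logic.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical option bundle mirroring -w, -r, -i and -c (defaults from the usage text).
struct ExchangeOptions {
    double prob_world = 0.95;  // -w: does this rank send anything at all?
    double prob_rank = 0.90;   // -r: does it send anything to peer q?
    double prob_item = 0.50;   // -i: does it send item k to peer q?
    int item_count = 10;       // -c: items considered per peer
};

// Portable Bernoulli draw: maps one 32-bit engine output into [0, 1).
// p == 0 never succeeds and p == 1 always succeeds.
bool chance(std::mt19937& gen, double p) {
    return gen() / 4294967296.0 < p;
}

// Number of items `rank` sends to each of `nranks` peers under the assumed model.
std::vector<int> send_counts(int rank, int nranks, std::uint32_t seed,
                             const ExchangeOptions& opt) {
    assert(nranks > 0);
    std::mt19937 gen(seed + static_cast<std::uint32_t>(rank));  // per-rank stream
    std::vector<int> counts(nranks, 0);
    if (!chance(gen, opt.prob_world)) return counts;  // rank sits out entirely
    for (int q = 0; q < nranks; ++q) {
        if (!chance(gen, opt.prob_rank)) continue;    // skip this peer
        for (int k = 0; k < opt.item_count; ++k)
            if (chance(gen, opt.prob_item)) ++counts[q];
    }
    return counts;
}
```

Under this model `-w 0` produces no traffic, `-w 1 -r 1 -i 1` has every rank send all `--item-count` items to every peer (the alltoall-like case), and the defaults give an expected 10 × 0.95 × 0.90 × 0.50 ≈ 4.3 items per rank pair. Comparing a raw engine output against the probability, instead of using `std::bernoulli_distribution`, keeps the draws identical across standard library implementations.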

Note that because alltoallv is a heavyweight collective and every message is
validated, use caution when running with large numbers of ranks, large item
counts, or many iterations.

# Debugging

In the case of a data validation failure, re-run the harness on only the failing
test (using `--only`) and increase the verbosity up to 3. You may also need to
select which rank prints with `-z`.

For example, at verbosity 0 we only learn that validation failed on rank 1, not
which test failed.

```
mpirun -n 2 src/alltoallv_ddt -z 1 -v 3 -w 1
Rank 1 failed to validate data!
ERROR: Validation failed on rank 1!
```

Directing output to that rank with `-z` (or to all ranks with `-z -1`) and
raising the verbosity (here to 3) reveals additional details, including which
test failed and which part of the buffer is affected:

```
$ mpirun -n 2 src/alltoallv_ddt -z 1 -v 3 -w 1
--- Starting test 2,1. Crossing 0 x 0
Rank 1 failed to validate data!
0010: 42-42 99-43 44-44 45-45 46-46 47-47 48-48 49-49 50-50 51-51 -- CORRUPT
0020: 52-52 53-53 54-54 55-55 56-56 57-57 58-58 59-59 60-60 61-61 -- VALID
ERROR: Validation failed on rank 1!
```

Each line begins with a buffer offset, in base 10, relative to the packed
representation of the datatype. In each pair, the first number is the received
value and the second is the expected value. To limit output, consecutive CORRUPT
lines after the first are skipped and only the next VALID line is printed, so
the output always appears to alternate between CORRUPT and VALID.
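
The skipping rule can be expressed compactly. This is a hypothetical reconstruction of the report format that reproduces the example output above, not the harness's code: rows hold 10 values, a row is printed only when it starts or ends a run of corrupt rows, and fully valid stretches are silent. The function name `report` and its signature are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Build report lines of the form "OFFS: got-want ... -- CORRUPT|VALID".
// Only the first row of each corrupt run and the first valid row after it
// are kept; runs of valid rows produce no output.
std::vector<std::string> report(const std::vector<int>& got,
                                const std::vector<int>& want,
                                std::size_t width = 10) {
    assert(got.size() == want.size() && width > 0);
    std::vector<std::string> lines;
    bool in_corrupt_run = false;
    for (std::size_t row = 0; row < got.size(); row += width) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%04zu:", row);  // zero-padded base-10 offset
        std::string line = buf;
        bool corrupt = false;
        for (std::size_t i = row; i < got.size() && i < row + width; ++i) {
            std::snprintf(buf, sizeof buf, " %d-%d", got[i], want[i]);
            line += buf;
            corrupt = corrupt || got[i] != want[i];
        }
        line += corrupt ? " -- CORRUPT" : " -- VALID";
        if (corrupt != in_corrupt_run) lines.push_back(line);  // run starts or ends
        in_corrupt_run = corrupt;
    }
    return lines;
}
```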
1 change: 1 addition & 0 deletions alltoallv_validation/autogen.sh
@@ -0,0 +1 @@
autoreconf -ivf
152 changes: 152 additions & 0 deletions alltoallv_validation/configure.ac
@@ -0,0 +1,152 @@
# -*- shell-script -*-
#
# Copyright (c) 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# $HEADER$
#
# modified from ompi-tests/cxx-test-suite's autoconf by Luke Robison 2024.

#
# Init autoconf
#

AC_PREREQ([2.63])
AC_INIT([alltoallv_validation], [1.0], [[email protected]], [openmpi-cxx-test-suite])
AC_CONFIG_AUX_DIR([config])
AC_CONFIG_MACRO_DIR([config])

#
# Announce what we are configuring
#

echo "Configuring alltoallv_validation test suite"

AM_INIT_AUTOMAKE([1.10 foreign dist-bzip2 no-define])

# If Automake supports silent rules, enable them.
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])

# Set up the src/test_config.h header (generated via AC_CONFIG_HEADERS below)

AH_TOP([/* -*- c -*-
*
* ompitest configuration header file.
*
* Function: - OS, CPU and compiler dependent configuration
*/

#ifndef OMPITEST_CONFIG_H
#define OMPITEST_CONFIG_H
])
AH_BOTTOM([#endif /* OMPITEST_CONFIG_H */])

#
# This is useful later
#

AC_CANONICAL_HOST
AC_DEFINE_UNQUOTED(OMPITEST_ARCH, "$host",
[Architecture that we are compiled for])

#
# We always want debugging flags
#
CXXFLAGS="$CXXFLAGS -g"
CFLAGS="$CFLAGS -g"

#
# Get various programs
# C compiler - bias towards mpicc
#

if test "$CC" != ""; then
BASE="`basename $CC`"
else
BASE=
fi
if test "$BASE" = "" -o "$BASE" = "." -o "$BASE" = "cc" -o \
"$BASE" = "gcc" -o "$BASE" = "xlc" -o \
"$BASE" = "icc" -o "$BASE" = "pgcc"; then
AC_CHECK_PROG(HAVE_MPICC, mpicc, yes, no)
if test "$HAVE_MPICC" = "yes"; then
CC=mpicc
export CC
fi
fi

CFLAGS_save="$CFLAGS"
AC_PROG_CC
CFLAGS="$CFLAGS_save"

#
# Get various programs
# C++ compiler - bias towards mpic++, with fallback to mpiCC
#

if test "$CXX" != ""; then
BASE="`basename $CXX`"
else
BASE=
fi
if test "$BASE" = "" -o "$BASE" = "." -o "$BASE" = "CC" -o \
"$BASE" = "g++" -o "$BASE" = "c++" -o "$BASE" = "xlC" -o \
"$BASE" = "icpc" -o "$BASE" = "pgCC"; then
AC_CHECK_PROG(HAVE_MPICPP, mpic++, yes, no)
if test "$HAVE_MPICPP" = "yes"; then
CXX=mpic++
export CXX
else
AC_CHECK_PROG(HAVE_MPICXX, mpiCC, yes, no)
if test "$HAVE_MPICXX" = "yes"; then
CXX=mpiCC
export CXX
fi
fi
fi

CXXFLAGS_save="$CXXFLAGS"
AC_PROG_CXX
CXXFLAGS="$CXXFLAGS_save"

#
# Find ranlib
#

AC_PROG_RANLIB

#
# Ensure that we can compile and link an MPI program
#

# See if we can find <mpi.h>
AC_CHECK_HEADER([mpi.h], [],
[AC_MSG_WARN([Cannot find mpi.h])
AC_MSG_ERROR([Cannot continue])
])

#
# See if we can find the symbol MPI_Init. Be a little smart and use
# AC_CHECK_FUNC if we're using mpicc, or AC_CHECK_LIB otherwise.
# Aborts if MPI_Init is not found.
#
base=`basename $CC`
bad=0
AS_IF([test "$base" = "mpicc"],
[AC_CHECK_FUNC([MPI_Init], [], [bad=1])],
[AC_CHECK_LIB([mpi], [MPI_Init], [], [bad=1])])

AS_IF([test "$bad" = "1"],
[AC_MSG_WARN([Cannot link against MPI_Init])
AC_MSG_ERROR([Cannot continue])
])

#
# Party on
#

AC_CONFIG_HEADERS([src/test_config.h])
AC_CONFIG_FILES([
Makefile
src/Makefile
])
AC_OUTPUT
16 changes: 16 additions & 0 deletions alltoallv_validation/src/Makefile.am
@@ -0,0 +1,16 @@
# -*- makefile -*-
#

bin_PROGRAMS = \
alltoallv_ddt \
sanity

alltoallv_ddt_SOURCES = \
$(common_sources) \
alltoallv_ddt.cpp

sanity_SOURCES = \
$(common_sources) \
sanity.cpp

common_sources = typemap.c