IMB-NBC and IMB-IO #51
Open
shruticd opened this issue Nov 9, 2023 · 3 comments

shruticd commented Nov 9, 2023

In IMB-NBC, I get an integrity check failure in Ireduce_scatter on a standard Intel Omni-Path setup.
In IMB-IO, P_Write_Shared, P_IWrite_Shared, P_Read_Shared, P_IRead_Shared, C_Read_Shared, and C_IRead_Shared fail with either a segmentation fault or an integrity check failure.
Can you please tell me why this is happening?

JuliaRS (Contributor) commented Jan 7, 2024

@shruticd hi,

Please give me more information:

  1. Which MPI did you use?
  2. How did you run the benchmarks?
  3. Please attach the full output log.

shruticd (Author) commented Jan 15, 2024

@JuliaRS hi,
I used MVAPICH2 2.3.7 with psm2 and IMB v2021.7.
The command I used: mpirun -n 2 ./IMB-IO C_Read_Shared
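
Collected from the calling sequences in the logs below (and assuming the same mpirun -n 2 launch for each case), the failing runs are roughly:

    mpirun -n 2 ./IMB-NBC Ireduce_scatter
    mpirun -n 2 ./IMB-IO C_Read_Shared
    mpirun -n 2 ./IMB-IO P_IREAD_Shared
    mpirun -n 2 ./IMB-IO P_IWrite_shared
    mpirun -n 2 ./IMB-IO P_READ_Shared
    mpirun -n 2 ./IMB-IO P_Write_shared
    mpirun -n 2 ./IMB-IO C_IRead_Shared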

NBC - Ireduce_Scatter


Intel(R) MPI Benchmarks 2018, MPI-NBC part

Date : Mon Jan 15 12:10:29 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-NBC Ireduce_scatter

Minimum message length in bytes: 0
Maximum message length in bytes: 4194304

MPI_Datatype : MPI_BYTE
MPI_Datatype for reductions : MPI_FLOAT
MPI_Op : MPI_SUM

List of Benchmarks to run:

Ireduce_scatter


Benchmarking Ireduce_scatter
processes = 2

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000         0.63         0.30         0.30         0.00         0.00

1: Error Ireduce_scatter_pure,size = 4,sample #0
Process 1: Got invalid buffer:
Buffer entry: 0.000000
pos: 0
Process 1: Expected buffer:
Buffer entry: 0.300000
        4         1000         1.87         1.00         0.85         0.00         0.00
Application error code 1 occurred
application called MPI_Abort(MPI_COMM_WORLD, 16) - process 1
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 16) - process 1

IO - C_Read_shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:14:51 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO C_Read_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

C_Read_Shared


Benchmarking C_Read_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         5.48         5.48         5.48         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe51b7bc20, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe51b7bc20, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

IO - P_IRead_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:13:58 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_IREAD_Shared
Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_IRead_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed
performance of 745.98 MFlops


Benchmarking P_IRead_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      3401.78         0.46      1030.45         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7fff82c51f60, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7fff82c51f60, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

IO - P_IWrite_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:12:38 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_IWrite_shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_IWrite_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed
performance of 753.20 MFlops


Benchmarking P_IWrite_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      1081.32         1.53      1004.70         0.00         0.00

[mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 340197 RUNNING AT shrestha1.cdac.in
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

IO - P_Read_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:13:33 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_READ_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_Read_Shared


Benchmarking P_Read_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         0.46         0.46         0.46         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe753f1000, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe753f1000, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

IO - P_Write_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:12:11 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_Write_shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_Write_Shared


Benchmarking P_Write_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         1.14         1.14         1.14         0.00         0.00

[mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 339463 RUNNING AT shrestha1.cdac.in
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

IO - C_IRead_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part

Date : Mon Jan 15 12:30:04 2024
Machine : x86_64
System : Linux
Release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO C_IRead_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

C_IRead_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed
performance of 740.11 MFlops


Benchmarking C_IRead_Shared
processes = 1
( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      1016.93         5.48       987.29         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffd7ac58ae0, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffd7ac58ae0, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

JuliaRS (Contributor) commented Jul 8, 2024

@shruticd did you try running with the environment variable FI_PROVIDER=tcp?
It might be a provider problem.
I checked the same benchmarks with Intel MPI and they work.
Also, you can try IMB 2021.8.
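
A minimal sketch of that suggestion (assuming the same launch command as above, and that the MPI library in use selects its fabric through libfabric's FI_PROVIDER variable) would be:

    FI_PROVIDER=tcp mpirun -n 2 ./IMB-IO C_Read_Shared

or, equivalently, exporting it before the run:

    export FI_PROVIDER=tcp
    mpirun -n 2 ./IMB-IO C_Read_Shared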
