This is my OpenACC port of the TeaLeaf mini-app.
This particular version of TeaLeaf is written in Fortran and is derived from https://github.com/UK-MAC/TeaLeaf_ref. The code has been ported to many different programming models, but an OpenACC version turned out to be missing. I ported it as a candidate for an ongoing benchmark development effort.
Note that TeaLeaf is also available in C: https://github.com/UoB-HPC/TeaLeaf
Hardware: dual Xeon E5-2640 v2 + Tesla K20
Software: PGI/19.4 + OpenMPI/3.1.4
CPU Rpeak: about 0.25 Tflops
GPU Rpeak: about 1.1 Tflops
GPU/CPU Rpeak ratio: about 4x
A sampling profile for the serial run is provided in profile.txt.
MPI 1 rank:
real 1m13.791s
user 1m12.640s
sys 0m0.228s
MPI 16 ranks:
real 0m7.688s
user 1m59.308s
sys 0m0.820s
OpenACC single K20:
real 0m13.389s
user 0m5.536s
sys 0m2.320s
MPI 16 ranks:
real 5m21.711s
user 85m23.308s
sys 0m7.508s
OpenACC single K20 (GPU memory high-water mark ~2 GB):
real 1m37.851s
user 0m58.652s
sys 0m27.748s
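
The wall-clock ("real") times listed above can be turned into speedups with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the TeaLeaf code. The labels "first set" and "second set", the dictionary keys, and the helper names `to_seconds` and `speedup` are assumptions introduced here for illustration; the timings themselves are copied from the listings.

```python
def to_seconds(t):
    """Convert a `time`-style string such as '1m13.791s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def speedup(baseline, candidate):
    """Ratio of baseline wall time to candidate wall time (>1 means faster)."""
    return to_seconds(baseline) / to_seconds(candidate)


# "real" times copied from the listings above; the set labels are hypothetical.
runs = {
    "first set": {
        "mpi_1": "1m13.791s",
        "mpi_16": "0m7.688s",
        "openacc_k20": "0m13.389s",
    },
    "second set": {
        "mpi_16": "5m21.711s",
        "openacc_k20": "1m37.851s",
    },
}

first = runs["first set"]
second = runs["second set"]

# Speedups relative to the serial (1 rank) run of the first set.
k20_vs_serial = speedup(first["mpi_1"], first["openacc_k20"])
mpi16_vs_serial = speedup(first["mpi_1"], first["mpi_16"])

# Single K20 compared against 16 MPI ranks, for both sets.
k20_vs_mpi16_first = speedup(first["mpi_16"], first["openacc_k20"])
k20_vs_mpi16_second = speedup(second["mpi_16"], second["openacc_k20"])

print(f"K20 vs 1 rank (first set):     {k20_vs_serial:.2f}x")
print(f"16 ranks vs 1 rank (first set): {mpi16_vs_serial:.2f}x")
print(f"K20 vs 16 ranks (first set):    {k20_vs_mpi16_first:.2f}x")
print(f"K20 vs 16 ranks (second set):   {k20_vs_mpi16_second:.2f}x")
```

With the numbers above, the single K20 is roughly 5.5x faster than the 1-rank run in the first set but slower than 16 ranks there, while in the second set it is roughly 3.3x faster than 16 ranks, which is in the neighbourhood of the GPU/CPU Rpeak ratio of about 4x.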