SYCL-For-CUDA-Examples/examples/distrib_batch_gemm at master · codeplaysoftware/SYCL-For-CUDA-Examples

Makefile
README.md
distributed-batch-gemm.cpp
main.cpp
vadd_cuda.cu
vadd_sycl.cpp

Distributed Batch GEMM example

This example shows how to integrate MPI calls within the SYCL DAG using Host Tasks to distribute a Batch GEMM across MPI processes.
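One way to picture the distribution step is as a split of the batch index range across ranks. The sketch below is an illustration only and is not taken from distributed-batch-gemm.cpp: the names BatchRange and local_batch_range are hypothetical, it contains no MPI or SYCL calls, and it assumes the GEMMs in the batch are independent so that each process can own a contiguous slice.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of batch entries owned by one process.
// Hypothetical helper type, not part of the example's sources.
struct BatchRange {
  std::size_t begin;
  std::size_t end;
};

// Splits `batch_size` independent GEMMs across `num_procs` processes.
// When the batch does not divide evenly, the first
// (batch_size % num_procs) ranks each receive one extra entry.
inline BatchRange local_batch_range(std::size_t batch_size, int num_procs,
                                    int rank) {
  assert(num_procs > 0 && rank >= 0 && rank < num_procs);
  const std::size_t p = static_cast<std::size_t>(num_procs);
  const std::size_t r = static_cast<std::size_t>(rank);
  const std::size_t base = batch_size / p;   // entries every rank gets
  const std::size_t extra = batch_size % p;  // leftover entries
  const std::size_t begin = r * base + (r < extra ? r : extra);
  const std::size_t count = base + (r < extra ? 1 : 0);
  return {begin, begin + count};
}
```

In an MPI program the arguments num_procs and rank would typically come from MPI_Comm_size and MPI_Comm_rank, and each process would run GEMM only on the entries in its own range.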

Requisites

The provided Makefile assumes that the MPICXX compiler wrapper points to the DPCPP compiler with CUDA support. This requires the MPI implementation to have been built with, or configured to use, the DPCPP compiler. The MPI implementation must also have been built with CUDA support (typically called "CUDA-aware MPI").

The example uses the SYCL-BLAS library to call the GEMM routine. SYCL-BLAS should be compiled with the DPCPP compiler to target the CUDA backend. The following command line is used to build the SYCL-BLAS library:

cmake -GNinja ../ -DTARGET=NVIDIA_GPU -DSYCL_COMPILER=dpcpp -DBLAS_DATA_TYPES=float -DGEMM_VECTORIZATION_SUPPORT=ON -DBLAS_ENABLE_TESTING=OFF -DENABLE_EXPRESSION_TESTS=OFF -DBLAS_ENABLE_BENCHMARK=OFF -DBLAS_VERIFY_BENCHMARK=OFF -DBLAS_BUILD_SAMPLES=OFF
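For checking the output of the library's GEMM routine on small inputs, a plain host-side reference can be handy. The sketch below is not the SYCL-BLAS API: reference_gemm is a hypothetical helper that computes the standard GEMM operation C = alpha * A * B + beta * C in float (matching the BLAS_DATA_TYPES=float setting above), and it assumes column-major, tightly packed matrices with no transposition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM: C = alpha * A * B + beta * C.
// Assumptions: column-major storage, no transposition, and tightly packed
// matrices (leading dimension equals the number of rows).
// A is m x k, B is k x n, C is m x n.
inline void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                           float alpha, const std::vector<float>& a,
                           const std::vector<float>& b, float beta,
                           std::vector<float>& c) {
  assert(a.size() == m * k);
  assert(b.size() == k * n);
  assert(c.size() == m * n);
  for (std::size_t j = 0; j < n; ++j) {      // column of C
    for (std::size_t i = 0; i < m; ++i) {    // row of C
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {  // inner dimension
        acc += a[i + p * m] * b[p + j * k];
      }
      c[i + j * m] = alpha * acc + beta * c[i + j * m];
    }
  }
}
```

For a batch, the same function can be applied once per batch entry. Results from a GPU GEMM generally need to be compared with a small tolerance rather than for exact equality, since the order of floating-point accumulation may differ.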

Compilation

If MPICXX points to DPC++ with CUDA support and it is on the PATH, running "make" should build the program.

Execution

The Makefile contains a target that runs the example with two processes:

make run

The target assumes mpirun is on the PATH.
