This git repository contains the SyCLDslash package. SyCLDslash depends on QDP++ for its testing reference implementations. To build this code it is recommended to perform a recursive checkout of the repository
github.com:bjoo/SyCLDslashWorkspace.git
which contains QDP++ and its dependencies.
In addition, SyCLDslash depends on googletest and includes it as a sub-module. It is highly recommended that this library be checked out using the --recursive option to Git so that the googletest submodule is checked out. Alternatively, consider checking out SyCLDslashWorkspace.
This code builds with CMake.
Useful CMake options are:
- -DQDPXX_DIR=<qdp-install-location>/share : points to the share directory of an installation of QDP++, which contains a FindQDPXX.cmake file.
- -DMG_FORTRANLIKE_COMPLEX=ON : enables interleaved (RIRIRIRI) storage for complex numbers in vectors. Setting it to OFF selects a more vector-oriented storage order of (RRRR...)(IIII...).
- -DMG_USE_NEIGHBOR_TABLE=OFF : if set to ON, this would use a precomputed neighbor lookup table. This option is not currently implemented properly: neighbor indices are always computed explicitly, so it should remain OFF.
- -DMG_USE_LAYOUT_LEFT=OFF : if set to ON, selects left-index-fastest indexing for View objects; if set to OFF, uses right-index-fastest indexing, following Kokkos.
- -DMG_USE_COMPUTE_CPP=ON : set this to ON if using the ComputeCpp compiler from Codeplay, and specify the ComputeCpp install directory using -DComputeCpp_DIR. This enables the extra CMake logic needed to drive their compiler chain.
- -DComputeCpp_DIR=<path> : set this to the install location of the ComputeCpp compiler, if it is being used.
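To make the storage-order and layout options more concrete, the sketch below computes flat memory offsets for each choice. It is illustrative only and does not use the actual SyCLDslash View classes: the function names (interleaved_offset, split_offset, layout_left_offset, layout_right_offset) and the two-index array shape are hypothetical, chosen just to show the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: these are NOT part of SyCLDslash.

// Offset of one component of a complex number inside a vector of n complex
// numbers stored interleaved as R I R I ... (MG_FORTRANLIKE_COMPLEX=ON).
// reim is 0 for the real part and 1 for the imaginary part.
constexpr std::size_t interleaved_offset(std::size_t lane, std::size_t reim,
                                         std::size_t /*n*/) {
  return 2 * lane + reim;
}

// Same component when the vector is stored as (R R R R ...)(I I I I ...),
// i.e. all n real parts first, then all n imaginary parts.
constexpr std::size_t split_offset(std::size_t lane, std::size_t reim,
                                   std::size_t n) {
  return reim * n + lane;
}

// Flat offset of element (i, j) in an n0 x n1 array when the LEFT index
// runs fastest (stride 1 in i).
constexpr std::size_t layout_left_offset(std::size_t i, std::size_t j,
                                         std::size_t n0, std::size_t /*n1*/) {
  return i + n0 * j;
}

// Flat offset of element (i, j) when the RIGHT index runs fastest
// (stride 1 in j).
constexpr std::size_t layout_right_offset(std::size_t i, std::size_t j,
                                          std::size_t /*n0*/, std::size_t n1) {
  return j + n1 * i;
}
```

For example, with a vector of 4 complex numbers, the imaginary part of lane 2 sits at offset 5 in the interleaved order and at offset 6 in the split order.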
Once building is complete, the executables will be in build/build_sycl_dslash/tests. Two executables are of primary interest: test_dslash_sycl and test_dslash_sycl_vperf.
The test_dslash_sycl app performs some unit testing of the dslash operator for a selection of vector lengths.
The performance test is test_dslash_sycl_vperf. It first applies dslash once so that all the components are JIT-compiled, and then applies dslash again to time a single application. It uses this timing to choose a number of iterations corresponding to roughly 5 seconds of runtime, or a minimum of 1 iteration if a single application takes longer than 5 seconds. It then performs 5 timing loops.
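The iteration-count selection described above can be sketched as follows. This is a minimal illustration, not the code in test_dslash_sycl_vperf: the names time_once and choose_iterations are hypothetical, and rounding down to a whole number of iterations is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helpers: these are NOT taken from test_dslash_sycl_vperf.

// Time a single call of f and return the elapsed wall-clock time in seconds.
template <typename F>
double time_once(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Pick an iteration count so that iterations * single_apply_seconds is
// roughly target_seconds, rounding down (an assumption), but never fewer
// than 1 iteration.
inline long long choose_iterations(double single_apply_seconds,
                                   double target_seconds = 5.0) {
  assert(single_apply_seconds > 0.0);
  const auto iters =
      static_cast<long long>(target_seconds / single_apply_seconds);
  return std::max(iters, 1LL);
}
```

With these helpers, a warm-up call would be followed by something like `choose_iterations(time_once(apply_dslash))`, where apply_dslash stands in for whatever callable applies the operator. A single application taking 0.5 seconds gives 10 iterations, and one taking 7 seconds gives 1.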
- The code has been tested only with the Intel LLVM/SyCL compiler, which is still in development, on a system running an OpenCL runtime. It occasionally crashes OpenCL with the error message:
  OpenCL API failed. OpenCL API returns: -34 (CL_INVALID_CONTEXT)
  This issue still needs to be tracked down.
- It remains to be worked out how to run this code with other SyCL compilers, e.g. Codeplay's.
The Jefferson Lab License under which this workspace is distributed is in the file LICENSE. Licenses for dependencies are included in their source code and/or in the LICENSES directory.