CLOC Compiler and Sample SDK
CLOC (CL Offline Compiler) is a script that lets developers take advantage of HSA accelerators (such as GPUs) by writing host code (running on the CPU) and a kernel (running on the HSA accelerator). To keep things simple, kernels are written in an OpenCL-like language and compiled into HSAIL/BRIG, which the host program can then load and launch on the device. The goal is to encourage developers to write simple applications that show the power of HSA features on AMD platforms.
The HSA Foundation has released the CLOC tool, the binaries required to build HSAIL/BRIG files from kernels, and a set of samples.
- Download the CLOC utility from https://github.com/HSAFoundation/CLOC
- Download the High Level Compiler (HLC) components. There are two options:
  a. Open source version: git clone https://github.com/HSAFoundation/HSAIL-HLC-Development
  b. Closed source version: git clone https://github.com/HSAFoundation/HSAIL-HLC-Stable
- Download HSAIL-Tools: git clone https://github.com/HSAFoundation/HSAIL-Tools. Follow its build instructions to build hsailasm. This tool is required to disassemble the BRIG generated by CLOC.
- Set the environment variables for CLOC.
- export HSA_LLVM_PATH=HSAIL-HLC-Development/bin (or HSAIL-HLC-Stable/bin)
- export HSA_LIBHSAIL_PATH=HSAIL-Tools/libHSAIL/build_linux
- Compile a cl kernel file.
- Generate kernel.brig: ./cloc kernel.cl
- Generate kernel.hsail: ./cloc -hsail kernel.cl
The OpenCL High Level Compiler (HLC) generates HSAIL for OpenCL kernels. The HLC can generate code in large mode (64-bit) or small mode (32-bit). Each OpenCL type is mapped to an HSAIL type, and all HSAIL types generated by the HLC are naturally aligned. The general mapping rules are:
- OpenCL local/global pointers and size_t are mapped to kernarg_u32 in small mode and kernarg_u64 in large mode.
- OpenCL char, int, float and double are mapped to kernarg_u8, kernarg_u32, kernarg_f32 and kernarg_f64 respectively.
- OpenCL vectors: the mapping depends on whether HSAIL-HLC-Stable or HSAIL-HLC-Development is used.
- HSAIL-HLC-Stable: vectors are flattened. A float16 is flattened into 16 separate HSAIL arguments of type kernarg_f32.
- HSAIL-HLC-Development: vectors are converted to HSAIL vectors. A float16 is converted to kernarg_f32 arg[16].
- OpenCL images are converted to the HSAIL types kernarg_rwimg or kernarg_roimg. The size of the image is 48 bytes and its alignment in the kernel argument buffer is 16 bytes.
- OpenCL samplers are converted to kernarg_samp. The size of the sampler is 32 bytes and its alignment in the kernel argument buffer is 32 bytes.
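The mapping rules above can be turned into a small host-side helper that works out where each argument would sit in the kernel argument buffer. The following is only a sketch: it assumes arguments are packed in declaration order with padding to each argument's alignment, and the names `ArgKind`, `argInfo`, `alignUp` and `layoutOffsets` are hypothetical and not part of CLOC, OKRA or the HSA runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument kinds, following the mapping rules in the text.
enum class ArgKind { Char, Int, Float, Double, Pointer, Image, Sampler };

struct ArgInfo {
    std::size_t size;   // bytes occupied in the kernel argument buffer
    std::size_t align;  // required alignment in bytes
};

// Size and alignment of one argument. largeMode selects 64-bit pointers.
inline ArgInfo argInfo(ArgKind kind, bool largeMode) {
    switch (kind) {
        case ArgKind::Char:    return ArgInfo{1, 1};    // kernarg_u8
        case ArgKind::Int:     return ArgInfo{4, 4};    // kernarg_u32
        case ArgKind::Float:   return ArgInfo{4, 4};    // kernarg_f32
        case ArgKind::Double:  return ArgInfo{8, 8};    // kernarg_f64
        case ArgKind::Pointer: return largeMode ? ArgInfo{8, 8} : ArgInfo{4, 4};
        case ArgKind::Image:   return ArgInfo{48, 16};  // sizes quoted in the text
        case ArgKind::Sampler: return ArgInfo{32, 32};
    }
    return ArgInfo{0, 1};  // not reached for valid ArgKind values
}

// Round offset up to the next multiple of align (align must be a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// Byte offset of every argument when packed in order with alignment padding
// (the packing order is an assumption of this sketch).
inline std::vector<std::size_t> layoutOffsets(const std::vector<ArgKind>& args,
                                              bool largeMode) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (ArgKind kind : args) {
        const ArgInfo info = argInfo(kind, largeMode);
        cursor = alignUp(cursor, info.align);
        offsets.push_back(cursor);
        cursor += info.size;
    }
    return offsets;
}
```

For example, two pointer arguments would land at offsets 0 and 8 in large mode, and at 0 and 4 in small mode.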
Extra arguments in HSAIL-HLC-Stable
Additional arguments: The OpenCL HLC/runtime handles global offsets and other features by always adding six additional arguments at the beginning of the argument list. These additional arguments are of type size_t and are normally set by the OpenCL runtime. A user who dispatches the kernel directly must set the first three of them to zero if global offsets are not wanted in the calculation.
Vectors such as float4 are flattened, i.e. they are broken down into 4 separate arguments. Hence, the user must take care to pass 4 separate arguments.
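As an illustration of the two points above, the sketch below shows what a large-mode (64-bit) argument block for a three-pointer kernel could look like with the six leading arguments zeroed, and how a float4 would be passed as four scalars. The struct `VectorAddArgsStable` and both helper functions are hypothetical names chosen for this example. The layout is an assumption based on the description above, not taken from the CLOC sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical large-mode argument block for a kernel taking three global
// pointers, compiled with HSAIL-HLC-Stable (six extra leading arguments).
struct VectorAddArgsStable {
    std::uint64_t extra[6];  // added by the HLC; zero when offsets are unused
    std::uint64_t a;         // global int *a
    std::uint64_t b;         // global int *b
    std::uint64_t sum;       // global int *sum
};

// The user arguments start after 6 * 8 bytes of extra arguments.
static_assert(offsetof(VectorAddArgsStable, a) == 48,
              "user arguments follow the six extra arguments");

inline VectorAddArgsStable makeVectorAddArgs(const int* a, const int* b, int* sum) {
    assert(a != nullptr && b != nullptr && sum != nullptr);
    VectorAddArgsStable args{};  // value-initialization zeroes every field
    args.a = reinterpret_cast<std::uintptr_t>(a);
    args.b = reinterpret_cast<std::uintptr_t>(b);
    args.sum = reinterpret_cast<std::uintptr_t>(sum);
    return args;
}

// A flattened float4 is passed as four separate scalar arguments.
inline void pushFlattenedFloat4(std::vector<float>& scalarArgs, const float (&v)[4]) {
    for (float component : v) {
        scalarArgs.push_back(component);
    }
}
```

Using fixed-width 64-bit fields keeps the layout the same regardless of the host compiler's pointer size. A small-mode build would need 32-bit fields instead.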
The CLOC/examples/hsa directory contains examples that use CLOC + HSA runtime to dispatch kernels to the GPU. The CLOC/examples/okra directory contains examples that use CLOC + OKRA runtime to dispatch kernels. To build and execute the examples, you need to download the following GitHub repositories:
- git clone https://github.com/HSAFoundation/HSA-Runtime-AMD
- git clone https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD
- git clone https://github.com/HSAFoundation/okra
Set the environment variables
- HSA_RUNTIME_PATH= Path to HSA-Runtime-AMD
- HSA_KMT_PATH= Path to HSA-Drivers-Linux-AMD/libhsakmt
- HSA_OKRA_PATH= Path to okra
Building HSA example
- cd CLOC/examples/hsa && make all && make test
Building OKRA examples
- cd CLOC/examples/okra && make all && make test
Note: If HSAIL-HLC-Stable is used as the high level compiler, run "make all CFLAGS=-DDUMMY_ARGS=1" instead of "make all".
Suppose you wish to write an HSA program using CLOC and a host C++ program to compute the sum of two vectors of numbers. This operation is inherently parallel: the addition of corresponding vector elements can be performed in parallel by individual GPU threads. This is a classic case in which the power of GPU compute can be utilized.
The kernel code:
The first step is to write the kernel using OpenCL. This would be:
    kernel void test(global int *a, global int *b, global int *sum) {
        int id = get_global_id(0);
        sum[id] = a[id] + b[id];
    }
As we can see, each work-item of the above kernel adds one pair of elements from the two input vectors and stores the result in a third vector. The host creates these three vectors and passes their pointers to the kernel.
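To make the per-work-item behaviour concrete, here is a minimal CPU-only sketch that emulates the kernel sequentially. It is not how the kernel runs on an HSA device. The function names `vectorAddWorkItem` and `vectorAddReference` are made up for this illustration, and the loop index stands in for get_global_id(0).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the kernel for one work-item; id plays the role of get_global_id(0).
inline void vectorAddWorkItem(std::size_t id, const int* a, const int* b, int* sum) {
    sum[id] = a[id] + b[id];
}

// Sequentially emulates a 1-D range with one work-item per vector element.
inline std::vector<int> vectorAddReference(const std::vector<int>& a,
                                           const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> sum(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        vectorAddWorkItem(id, a.data(), b.data(), sum.data());
    }
    return sum;
}
```

On a device the iterations of this loop are independent work-items, which is why the operation parallelizes so well. A reference like this can also serve as the expected result when testing the device output.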
We can use the CLOC utility to compile this kernel into the HSAIL/BRIG format.
amd@msdnkv69:~/CLOC/CLOC/example/vector_copy$ ./cloc -p ~/Prakash/BenTest/bin/D2/ -hsail vector_copy.cl
Info: Version: 0.6
Info: OpenCL file: /home/amd/CLOC/CLOC/example/vector_copy/vector_copy.cl
Info: Output file: /home/amd/CLOC/CLOC/example/vector_copy/vector_copy.hsail
Info: Run date: Wed Jul 30 16:29:48 IST 2014
Info: Compile(clc) cl --> bc ...
Info: Disassmbl(llvm-as) bc --> ll ...
Info: Link(llvm-link) bc --> lnkd.bc ...
Info: Optimize(opt) lnkd.bc --> opt.bc ...
Info: llc arch=hsail opt.bc --> brig ...
Info: hsailasm brig --> vector_copy.hsail ...
Info: Done
Now we have the HSAIL program for this kernel, which can be used on the host side.
Host Program:
Let us look at the host side of this program. This example uses the OKRA interface to the HSAIL RT. The OKRA API is just a layer over the HSAIL RT APIs for simpler usage. The host program has to do the following:
- Create the OKRA context:
    OkraContext *context = OkraContext::Create();
    if (context == NULL) { cout << "...unable to create context\n"; exit(-1); }
- Create the "kernel" from the HSAIL source that we generated in the previous section:
    string sourceFileName = "vector_copy.hsail";
    char* vcopySource = buildStringFromSourceFile(sourceFileName);
    OkraContext::Kernel *kernel = context->createKernel(vcopySource, "&run");
    if (kernel == NULL) { cout << "...unable to create kernel\n"; exit(-1); }
- Create and register the vectors that are used for computing the vector addition:
    context->registerArrayMemory(vecA, NUMELEMENTS * sizeof(float));
    context->registerArrayMemory(vecB, NUMELEMENTS * sizeof(float));
    context->registerArrayMemory(vecS, NUMELEMENTS * sizeof(float));
- Set the arguments to the kernel. The pointers to the three arrays must be pushed in the same order as in the kernel:
    kernel->clearArgs();
    kernel->pushPointerArg(vecA);
    kernel->pushPointerArg(vecB);
    kernel->pushPointerArg(vecS);
- Set the "NDRange" of the kernel using the global and local dimensions of the kernel execution:
    size_t globalDims[] = {NUMELEMENTS};
    size_t localDims[] = {NUMELEMENTS};
    kernel->setLaunchAttributes(1, globalDims, localDims); // 1 dimension
- Dispatch the kernel synchronously to the device and wait for the device to complete execution:
    kernel->dispatchKernelWaitComplete();
- Finally, check the results of the kernel execution.
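The final checking step is not shown above. One possible way to do it, sketched here under the assumption that the host can read all three arrays after the dispatch returns, is to recompute each sum on the CPU and report the first mismatch. `firstMismatch` is a hypothetical helper, not an OKRA function. It is a template because the host snippets size the arrays with sizeof(float) while the kernel is declared with int.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first element where sum[i] != a[i] + b[i],
// or -1 when all n elements match.
template <typename T>
std::ptrdiff_t firstMismatch(const T* a, const T* b, const T* sum, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && sum != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        if (sum[i] != a[i] + b[i]) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

A host program could call `firstMismatch(vecA, vecB, vecS, NUMELEMENTS)` after the dispatch and treat any result other than -1 as a failure.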
The complete C++ program for this sample can be seen in the SDK supplied along with this CLOC utility.
This CLOC utility illustrates how to write HSA programs whose kernels are written in an OpenCL-like language. It also demonstrates that HSA is a standalone stack and does not require the runtime of any other language (such as the OpenCL runtime). One only needs to compile the kernel source into an HSAIL program. If one can write the HSAIL program directly, then one does not even need the device compiler.
However, this simple utility cannot address all the features of OpenCL, such as images, global offsets, and global variables. Supporting these features requires the OpenCL runtime.