
CLOC Compiler and Sample SDK

Prakash Raghavendra edited this page Jul 30, 2014 · 26 revisions

Introduction

CLOC is a script that helps developers take advantage of HSA accelerators (such as GPUs). The developer writes the host code, which runs on the CPU, and a kernel, which runs on the HSA accelerator. To keep things simple, kernels are written in an OpenCL-like language and compiled into HSAIL/BRIG, which the host program then loads and launches on the device. The aim is to encourage developers to write simple applications that show the power of HSA features on AMD platforms.

The HSA Foundation has released the CLOC tool, together with the binaries required to build HSAIL/BRIG files from kernels, and a set of samples.

How to use the CLOC utility to run complete HSA programs

To build a complete HSA program using the CLOC tool, follow these steps:

  1. Download the CLOC utility from the HSA Foundation's GitHub repository: https://github.com/HSAFoundation/CLOC
  2. Write the host program in C++. You can call the HSAIL Runtime APIs directly to create the kernel, set its parameters, and launch it. Alternatively, you can use the OKRA interface, a layer over the HSAIL runtime that abstracts away the lower-level details of its APIs. The OKRA interface can be downloaded from https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device
  3. Create the kernel HSAIL/BRIG using the CLOC script.
  4. Build the host program using a native C++ compiler (GCC) and link it against the OKRA runtime library.

An Example

Let us consider an example:

Suppose we want to write an HSA program, using CLOC and a C++ host program, that computes the sum of two vectors of numbers. This operation is inherently parallel: each pair of corresponding elements can be added independently by an individual GPU thread. It is a classic case where we can use the power of GPU compute.

The kernel code

The first step is to write the kernel using OpenCL. This would be:

    kernel void test(global int *a, global int *b, global int *sum)
    {
        int id = get_global_id(0);
        sum[id] = a[id] + b[id];
    }

As we can see, the kernel adds the corresponding elements of the two input vectors and stores each sum in a third vector. The host creates these three vectors and passes pointers to them into the kernel.
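To make the per-element behaviour concrete, here is a minimal CPU-only sketch in C++ of what each work-item does. It does not use CLOC, OKRA, or any HSA API. The names `test_work_item` and `vector_add_reference` are hypothetical helpers introduced only for illustration, and the `id` parameter stands in for the value that `get_global_id(0)` would return on the device. A reference like this can be handy for checking the results that come back from the accelerator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for one work-item of the `test` kernel.
// `id` plays the role of get_global_id(0) on the device.
void test_work_item(const int *a, const int *b, int *sum, std::size_t id) {
    sum[id] = a[id] + b[id];
}

// Runs one "work-item" per element, one after another.
// On the accelerator these iterations would execute in parallel.
std::vector<int> vector_add_reference(const std::vector<int> &a,
                                      const std::vector<int> &b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<int> sum(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        test_work_item(a.data(), b.data(), sum.data(), id);
    }
    return sum;
}
```

Calling `vector_add_reference` on the same inputs that the host passes to the kernel gives a vector that the device output can be compared against element by element.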

Host Program

Let us look at the host side of this program. The host program has to