Martin Jambor edited this page May 27, 2015 · 8 revisions

Introduction

  • The HSA-OpenMP work aims to let OpenMP users target HSA devices with minimal effort. This involves a one-time setup of the HSA platform, building OpenMP applications with GCC (from the hsa branch), and running them on an HSA device.
  • NOTE: The initial work started by supporting some of the constructs of the OpenMP 3.1 specification.
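For orientation, here is a minimal sketch of the kind of loop this work targets: a single `omp parallel for` over a simple counted loop with the default schedule. The function name `vector_copy` is hypothetical and is not taken from the samples repository.

```c
/* A simple counted loop under a single "omp parallel for" with the
   default schedule.  Without -fopenmp the pragma is ignored and the
   loop simply runs serially. */
void vector_copy(float *dst, const float *src, int n)
{
#pragma omp parallel for
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Compiled with the hsa-branch GCC and -fopenmp, a loop of this shape is the sort of candidate the compiler tries to turn into an HSA kernel.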

What is HSA

Prerequisites

Hardware

This release is intended for use with any hardware configuration that contains a Kaveri APU. The motherboard must support the FM2+ socket, run the latest BIOS version, and have the IOMMU enabled in the BIOS. The following reference hardware configuration was used for testing:

  • APU: AMD A10-7850K APU
  • Motherboard: ASUS A88X-PRO motherboard (ATX form factor)
  • Memory: G.SKILL Ripjaws X Series 16GB (2 x 8GB) 240-Pin DDR3 SDRAM DDR3 2133
  • No discrete GPU present in the system

Software

OS

The setup has been tested on Ubuntu and openSUSE. To download:

HSA prerequisites

Other Software packages

  • GCC build dependencies. On Ubuntu, run "sudo apt-get build-dep gcc" at the shell prompt
  • The build-essential package. On Ubuntu, run "sudo apt-get install build-essential" at the shell prompt
  • flex, bison, git, subversion, gcc, gcc-c++, make, libelf-dev

GCC

HSA Platform Setup

Set up KFD and firmware for the HSA platform

Download

    $ cd ~ 
    $ git clone https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD.git

From here we can install the new kernel image, set up the HSA KFD (the kernel driver for HSA), and reboot into the new kernel.

Install on Ubuntu

The KFD and firmware for Ubuntu are pre-packaged and available in the HSA-Drivers repository you just cloned:

    $ cd ~/HSA-Drivers-Linux-AMD
    $ sudo dpkg -i kfd-1.2/ubuntu/*.deb

Install on openSUSE Tumbleweed

  • The openSUSE Tumbleweed kernel is recent enough to run HSA (unlike openSUSE 13.2 or older, which would need a kernel upgrade).
  • However, you probably need the Radeon Kaveri firmware; you can get it with something like the following:
    $ wget http://people.freedesktop.org/~gabbayo/kfd-v1.2/radeon_ucode.tar.gz
    $ tar xzf radeon_ucode.tar.gz
    $ sudo cp -iv radeon/kaveri*.bin /lib/firmware/radeon/
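To confirm that the copy step placed the firmware files, you can count the files matching the same pattern used in the `cp` command. This is a hypothetical helper (`count_firmware_files` is not part of any HSA tool) that relies only on POSIX `glob(3)`.

```c
#include <glob.h>
#include <stddef.h>

/* Count the files matching a shell-style pattern such as
   "/lib/firmware/radeon/kaveri*.bin".  Returns 0 when nothing matches
   and -1 on any other glob() error. */
int count_firmware_files(const char *pattern)
{
  glob_t g;
  int rc = glob(pattern, 0, NULL, &g);
  int n;

  if (rc == 0)
    n = (int) g.gl_pathc;       /* number of matching paths */
  else if (rc == GLOB_NOMATCH)
    n = 0;                      /* no firmware files found */
  else
    n = -1;                     /* out of memory or read error */
  globfree(&g);
  return n;
}
```

For example, `count_firmware_files("/lib/firmware/radeon/kaveri*.bin")` returning 0 means the firmware has not been installed yet.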

Create a KFD device and reboot

    $ cd ~/HSA-Drivers-Linux-AMD
    $ echo "KERNEL==\"kfd\", MODE=\"0666\"" | sudo tee /etc/udev/rules.d/kfd.rules
    $ sudo reboot
  • After reboot, 'uname -a' will show something like:
Linux <nodename> 3.19.0-031950-generic #201503241132 SMP Tue Mar 24 11:33:39 IST 2015 x86_64 x86_64 x86_64 GNU/Linux
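The udev rule asks for mode 0666 on the kfd device node. A minimal sketch that checks this after reboot, assuming the node appears as /dev/kfd and is a character device; `device_mode_is_0666` is a hypothetical name.

```c
#include <sys/stat.h>

/* Return 1 if `path` exists, is a character device, and is readable and
   writable by owner, group and others (mode 0666, as the udev rule
   requests); return 0 otherwise. */
int device_mode_is_0666(const char *path)
{
  struct stat st;

  if (stat(path, &st) != 0)
    return 0;                   /* missing or not accessible */
  if (!S_ISCHR(st.st_mode))
    return 0;                   /* not a character device */
  return (st.st_mode & 0666) == 0666;
}
```

Calling `device_mode_is_0666("/dev/kfd")` should return 1 once the rule has taken effect.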

Download HSA Runtime

Now we need a runtime for executing HSAIL code. To get the latest runtime:

    $ cd ~ 
    $ git clone https://github.com/HSAFoundation/HSA-Runtime-AMD

Build and Install GCC

  • Check out the GCC sources from the hsa branch. Create the source, build, and installation directories under a gcc directory. Run download_prerequisites from the top level of the source tree.
    $ mkdir gcc
    $ cd gcc
    $ svn co svn://gcc.gnu.org/svn/gcc/branches/hsa src
    $ cd src
    $ ./contrib/download_prerequisites
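  • Build GCC. Set DESTINATION to the installation directory you want (for example, an install directory under gcc); configure expects an absolute path.
    $ cd ..
    $ mkdir build
    $ cd build
    $ export DESTINATION=/path/to/gcc/install
    $ ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --prefix=$DESTINATION
    $ make
  • Install GCC. This installs GCC into the $DESTINATION directory you specified above.
    $ make install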
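To check that the newly installed compiler actually enables OpenMP, you can compile a small file containing the function below with that compiler and -fopenmp. The `_OPENMP` macro is standard: when OpenMP is enabled it expands to the year and month of the supported specification (for example 201107 for OpenMP 3.1). The function name `openmp_spec_date` is hypothetical.

```c
/* Return the value of _OPENMP (yyyymm of the supported OpenMP
   specification) when compiled with OpenMP enabled, or 0 otherwise. */
long openmp_spec_date(void)
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}
```

A result of 0 means the file was compiled without -fopenmp (or with a compiler lacking OpenMP support).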

Sanity check HSA platform setup

  • Run the kfd_check_installation.sh script from the HSA-Drivers-Linux-AMD repository to test the HSA setup. If it succeeds, the output looks like:
    $ cd ~/HSA-Drivers-Linux-AMD
    $ ./kfd_check_installation.sh
    Kaveri detected:............................Yes
    Kaveri type supported:......................Yes
    Radeon module is loaded:....................Yes
    KFD module is loaded:.......................Yes
    AMD IOMMU V2 module is loaded:..............Yes
    KFD device exists:..........................Yes
    KFD device has correct permissions:.........Yes
    Valid GPU ID is detected:...................Yes
    Can run HSA.................................YES
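Some of these checks can also be reproduced by hand. The sketch below scans a /proc/modules-style stream for a module name. It is not necessarily how kfd_check_installation.sh works; `module_listed` is a hypothetical helper, the module names amdkfd and amd_iommu_v2 are assumptions about the HSA kernel, and a driver built into the kernel (rather than as a module) does not appear in /proc/modules at all.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a module called `name` appears in a /proc/modules-style
   stream (one module per line, module name first, then a space);
   return 0 otherwise. */
int module_listed(FILE *modules, const char *name)
{
  char line[4096];
  size_t len = strlen(name);

  while (fgets(line, sizeof line, modules) != NULL)
    if (strncmp(line, name, len) == 0 && line[len] == ' ')
      return 1;                 /* exact name match, not just a prefix */
  return 0;
}
```

Usage: open /proc/modules with `fopen("/proc/modules", "r")` and call, for example, `module_listed(f, "amdkfd")`, rewinding the stream between lookups.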

Run first OpenMP program on HSA

  • Download the samples
  $ git clone https://github.com/HSAFoundation/HSA-OpenMP-GCC-AMD.git
  • Edit, validate, and source setenv.gcc
  $ cd HSA-OpenMP-GCC-AMD/samples
  $ cat setenv.gcc
  #  ADD INSTRUCTIONS HERE.
  $ source setenv.gcc
  • Build and run vectorCopy
  $ cd vectorCopy
  $ make
  $ ./run.sh
    Vector Copy - Passed
  • Build and run matrixMultiply
  $ cd ../matrixMultiply
  $ make
  $ ./run.sh
    Matrix multiplication  - Passed
  • NOTE1: When the program runs, the HSA run time expects the HSA kernel in an object file in the current working directory that has the same name as the input file, with the suffix changed to .o. If you use LTO, if there is no input file (such as when compiling from standard input), or if the input file name does not contain a dot, the run time expects the HSA ELF sections in a file called hsakernel.o. This is a temporary situation and will be fixed.
  • NOTE2: If you also pass the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix. Search it for optimization notes indicating whether the compiler succeeded in turning OMP loops into kernels stripped of all OMP-generated control flow and suitable for a GPGPU. If it failed for some reason, the note also gives the reason. In the vectorCopy example, it reports success like this:
     omp_veccopy.c:13:12: note: Parallel construct will be turned into an HSA kernel
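The file-name rule in NOTE1 can be written down as a small function. This is a sketch of the rule as described, not code from GCC or the HSA run time; `expected_kernel_object` is hypothetical, and treating only the last path component as the "input file name" (as `gcc -c` does when naming object files) is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Compute the object file name the HSA run time looks for, per NOTE1.
   `input` is the source file given to the compiler (NULL when compiling
   from standard input); `lto` is non-zero when -flto is used.
   Returns 0 on success, -1 if `out` is too small. */
int expected_kernel_object(const char *input, int lto, char *out, size_t outlen)
{
  const char *base, *dot;
  int n;

  if (input == NULL || lto)
    n = snprintf(out, outlen, "hsakernel.o");
  else
    {
      /* Assumption: only the last path component counts. */
      base = strrchr(input, '/');
      base = base ? base + 1 : input;
      dot = strrchr(base, '.');
      if (dot == NULL)
        n = snprintf(out, outlen, "hsakernel.o");   /* no dot in name */
      else                                          /* replace suffix */
        n = snprintf(out, outlen, "%.*s.o", (int) (dot - base), base);
    }
  return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

For instance, compiling omp_veccopy.c without LTO yields omp_veccopy.o, which must then be in the directory you run the program from.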

How to read the textual form of the generated BRIG (kernel)

The HSA Foundation provides tools to assemble (HSAIL to BRIG) and disassemble (BRIG to HSAIL) at https://github.com/HSAFoundation/HSAIL-Tools. Download HSAIL-Tools, build them following the README instructions, and use the disassembler to read the BRIG generated by GCC:

   $ git clone https://github.com/HSAFoundation/HSAIL-Tools
   $ cd HSAIL-Tools/libHSAIL
   $ make -j LLVM_CONFIG=llvm-config-3.2
   $ objcopy -O binary -j .brig omp_veccopy.o omp_veccopy.brig
   $ ./build_linux/hsailasm -disassemble omp_veccopy.brig   # generates omp_veccopy.hsail
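The objcopy step selects the ELF section named .brig from the object file. A minimal sketch that locates such a section by name, assuming a 64-bit ELF file in the host's byte order and using the standard `<elf.h>` types; `find_elf64_section` is a hypothetical name, and unusual layouts such as extended section numbering are not handled.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Find the section called `name` (e.g. ".brig") in a 64-bit ELF file.
   On success, store its file offset and size and return 0; return -1
   if the file cannot be read, is not ELF64, or has no such section. */
int find_elf64_section(const char *path, const char *name,
                       unsigned long *offset, unsigned long *size)
{
  Elf64_Ehdr eh;
  Elf64_Shdr sh, names;
  char sname[64];
  int result = -1;
  FILE *f = fopen(path, "rb");

  if (f == NULL)
    return -1;

  /* Check the ELF magic and class, and that a section-name table exists. */
  if (fread(&eh, sizeof eh, 1, f) != 1
      || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_shstrndx == SHN_UNDEF)
    goto done;

  /* Read the header of the section-name string table. */
  if (fseek(f, (long) (eh.e_shoff + (Elf64_Off) eh.e_shstrndx * eh.e_shentsize),
            SEEK_SET) != 0
      || fread(&names, sizeof names, 1, f) != 1)
    goto done;

  /* Walk the section headers and compare each section's name. */
  for (Elf64_Half i = 0; i < eh.e_shnum && result != 0; i++)
    {
      if (fseek(f, (long) (eh.e_shoff + (Elf64_Off) i * eh.e_shentsize),
                SEEK_SET) != 0
          || fread(&sh, sizeof sh, 1, f) != 1)
        break;

      memset(sname, 0, sizeof sname);   /* keeps sname NUL-terminated */
      if (fseek(f, (long) (names.sh_offset + sh.sh_name), SEEK_SET) != 0
          || fread(sname, 1, sizeof sname - 1, f) == 0)
        break;

      if (strcmp(sname, name) == 0)
        {
          *offset = (unsigned long) sh.sh_offset;
          *size = (unsigned long) sh.sh_size;
          result = 0;
        }
    }

done:
  fclose(f);
  return result;
}
```

For example, `find_elf64_section("omp_veccopy.o", ".brig", &off, &sz)` reports where the BRIG bytes sit inside the object file and how many there are.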

Current Limitations:

Complete support for OpenMP targeting HSA is still ongoing. The current limitations are:

  • Unsupported OpenMP constructs:
    • Non-loop constructs such as "omp sections"
    • Multiple OMP constructs within one OMP parallel construct
    • A parallel construct nested within another parallel construct
    • The dynamic, guided, and runtime schedule kinds
    • collapse > 1
    • Reductions
    • Limited support for OpenMP runtime calls
    • NOTE: If you pass the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix that gives the reason why turning an OMP loop into a kernel failed.
  • Reading or writing, inside a kernel, global variables declared on the host is not supported yet. GCC emits a warning about such global variable accesses, and correctness of the program is not guaranteed in such cases.
  • Register allocation has room for improvement (to reduce spilling).
  • Function calls: all calls in a kernel to functions defined within the same compilation unit get inlined at -O1 and higher. For functions defined in other compilation units, you can use link-time optimization (-flto -flto-partition=none) to inline them.
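A sketch of how the collapse and reduction limitations might be worked around in source, assuming plain row-major C arrays. The function names are hypothetical, and whether the hsa branch turns these particular loops into kernels has not been verified; the .ompexp dump described above is the way to check.

```c
/* Helper defined in the same translation unit as the loop that calls it,
   so that, per the function-call limitation, it can be inlined at -O1
   and higher. */
static float scale(float x, float factor)
{
  return x * factor;
}

/* collapse(2) over rows x cols is not supported yet: flatten the two
   loops into one loop by hand.  Index k covers element
   (k / cols, k % cols). */
void scale_matrix(float *m, int rows, int cols, float factor)
{
  int n = rows * cols;
#pragma omp parallel for
  for (int k = 0; k < n; k++)
    m[k] = scale(m[k], factor);
}

/* reduction(+:sum) is not supported yet: compute per-row sums in a
   parallel loop (each iteration writes only its own slot of row_sums),
   then add the partial results serially on the host. */
float sum_matrix(const float *m, int rows, int cols, float *row_sums)
{
  float total = 0.0f;
#pragma omp parallel for
  for (int r = 0; r < rows; r++)
    {
      float s = 0.0f;
      for (int c = 0; c < cols; c++)
        s += m[r * cols + c];
      row_sums[r] = s;
    }
  for (int r = 0; r < rows; r++)
    total += row_sums[r];
  return total;
}
```

The caller supplies `row_sums` with room for `rows` floats; the serial final loop is cheap compared with the parallel part when rows are long.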