Home
- The HSA-OpenMP work aims to let OpenMP users target HSA devices with minimal effort. This involves a one-time setup of the HSA platform, building OpenMP applications with GCC (from the hsa branch), and running them on an HSA device.
- NOTE: The initial work started by supporting a subset of the constructs in the OpenMP 3.1 specification.
- If you are new to HSA, refer to http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ to familiarize yourself with it.
- Detailed material presented at ISCA: http://www.slideshare.net/hsafoundation/isca-2014-heterogeneous-system-architecture-hsa-architecture-and-algorithms-tutorial
This release is intended for use with any hardware configuration that contains a Kaveri APU. The motherboard must support the FM2+ socket, run the latest BIOS version, and have the IOMMU enabled in the BIOS. The following reference hardware configuration was used for testing:
- APU: AMD A10-7850K APU
- Motherboard: ASUS A88X-PRO motherboard (ATX form factor)
- Memory: G.SKILL Ripjaws X Series 16GB (2 x 8GB) 240-Pin DDR3 SDRAM DDR3 2133
- No discrete GPU present in the system
The setup has been tested on Ubuntu and openSUSE. To download:
- Ubuntu : 14.04 64-bit edition available at http://www.ubuntu.com/download
- openSUSE : Install x86_64 openSUSE Tumbleweed from https://en.opensuse.org/Portal:Tumbleweed
- HSA-enabled kernel image for kfd 1.2, available at https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD (the openSUSE Tumbleweed kernel works too, but you still need to provide the firmware and create the KFD device as described below).
- Build-dependency packages. On Ubuntu, run "sudo apt-get build-dep gcc" at the shell prompt.
- build-essential package. On Ubuntu, run "sudo apt-get install build-essential" at the shell prompt.
- flex, bison, git, gcc, gcc-c++, make, libelf-dev
- GCC with HSA support, available in the 'hsa' branch at http://gcc.gnu.org/svn/gcc/branches/hsa/
There is also a README in the svn repository with step-by-step instructions (Ref: https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup)
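Before starting, it can help to confirm that the command-line tools from the list above are on PATH. The following is a minimal sketch, not part of the original instructions. The helper name `missing_tools` is hypothetical, and the command names in the usage comment are assumptions mapped from the package list (gcc-c++ usually provides `g++`, and `svn` is needed for the checkout later on). libelf-dev is a library, so it cannot be detected this way.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical usage (command names are assumptions, see above):
# missing_tools flex bison git gcc g++ make svn
```

Any name printed by the helper points to a package that still needs to be installed.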
- Unless you run Linux kernel 3.19 or newer, you will most likely need a special HSA-enabled kernel image.
- Please refer to the section "Installing and configuring the kernel" in https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD for up-to-date kfd installation instructions.
$ cd ~
$ git clone https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD.git
From here we can install the new image, set up the HSA KFD (the driver for HSA), and reboot into the new kernel.
The KFD and firmware for Ubuntu are pre-packaged and available in the HSA-Drivers repository cloned above.
$ cd ~/HSA-Drivers-Linux-AMD
$ sudo dpkg -i kfd-1.2/ubuntu/*.deb
- The openSUSE Tumbleweed kernel is new enough to run HSA (as opposed to openSUSE 13.2 or older, which would need a kernel upgrade).
- However, you probably still need the radeon Kaveri firmware. Get it with something like the following:
$ wget http://people.freedesktop.org/~gabbayo/kfd-v1.2/radeon_ucode.tar.gz
$ tar xzf radeon_ucode.tar.gz
$ sudo cp -iv radeon/kaveri*.bin /lib/firmware/radeon/
$ cd ~/HSA-Drivers-Linux-AMD
$ echo "KERNEL==\"kfd\", MODE=\"0666\"" | sudo tee /etc/udev/rules.d/kfd.rules
$ sudo reboot
- After reboot, 'uname -a' will show something like:
Linux <nodename> 3.19.0-031950-generic #201503241132 SMP Tue Mar 24 11:33:39 IST 2015 x86_64 x86_64 x86_64 GNU/Linux
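The two conditions described above (a kernel of version 3.19 or newer, and a KFD device with mode 0666 as set by the udev rule) can be checked from a script. This is only a sketch under assumptions. The helper names `version_ge` and `has_mode` are hypothetical, the device path `/dev/kfd` is assumed from the udev rule's `KERNEL=="kfd"`, and the code relies on GNU `sort -V` and GNU `stat -c`.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the file at $1 exists and has the octal permission bits $2
# (for example 666). Uses GNU stat's -c '%a' format.
has_mode() {
    [ -e "$1" ] && [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Hypothetical usage after the reboot (the /dev/kfd path is an assumption):
# version_ge "$(uname -r)" 3.19 && echo "kernel is 3.19 or newer"
# has_mode /dev/kfd 666 && echo "KFD device is world readable and writable"
```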
Now we need a runtime for executing HSAIL code. To get the latest runtime:
$ cd ~
$ git clone https://github.com/HSAFoundation/HSA-Runtime-AMD
- Pull the GCC sources from the hsa branch. Create the source, build, and installation directories under the gcc directory.
$ mkdir gcc
$ cd gcc
$ svn co svn://gcc.gnu.org/svn/gcc/branches/hsa src
- Pull the mpc, mpfr, and gmp prerequisites required for the GCC build. If you still face issues building GCC, refer to the exhaustive list of prerequisites at https://gcc.gnu.org/install/prerequisites.html
$ cd src
$ ./contrib/download_prerequisites
- Build GCC.
$ cd ..
$ mkdir build
$ cd build
$ ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --prefix=$DESTINATION
$ make
- Install GCC. This installs gcc in the directory you passed as $DESTINATION above.
$ make install
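Once installed, the new compiler and its runtime libraries have to be found ahead of the system ones. The sketch below shows one way to do that for the current shell. It is not taken from the samples' setenv.gcc. The helper name `use_gcc_prefix` is hypothetical, and the `lib64` directory is an assumption that holds for typical x86_64 builds of GCC.

```shell
# Prepend the bin and library directories of a GCC installation prefix ($1)
# to PATH and LD_LIBRARY_PATH for the current shell.
# Assumption: 64-bit runtime libraries were installed under <prefix>/lib64.
use_gcc_prefix() {
    local prefix
    prefix=$1
    PATH="$prefix/bin:$PATH"
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Hypothetical usage, with the same prefix that was passed to configure:
# use_gcc_prefix "$DESTINATION"
# gcc --version
```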
- Run the kfd_check_installation.sh script, shipped with the HSA-enabled kernel image, to test the HSA setup. If successful, the output will look like:
$ cd ~/HSA-Drivers-Linux-AMD
$ ./kfd_check_installation.sh
Kaveri detected:............................Yes
Kaveri type supported:......................Yes
Radeon module is loaded:....................Yes
KFD module is loaded:.......................Yes
AMD IOMMU V2 module is loaded:..............Yes
KFD device exists:..........................Yes
KFD device has correct permissions:.........Yes
Valid GPU ID is detected:...................Yes
Can run HSA.................................YES
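When this check is run from a script, the report above can be reduced to a single pass or fail result. The following is a sketch that assumes the output format shown above, with a label, a run of dots, and an answer on each line. The helper name `all_checks_passed` is hypothetical, and treating any answer other than "Yes" as a failure is an assumption, since only the successful output is shown here.

```shell
# Read kfd_check_installation.sh style output on stdin. Succeed only if at
# least one "label....answer" line was seen and every answer is "yes"
# (compared case-insensitively).
all_checks_passed() {
    awk '
        /\.\.+[A-Za-z]+[ \t]*$/ {
            seen = 1
            answer = tolower($0)
            sub(/^.*\.\./, "", answer)    # keep only the text after the dots
            sub(/[ \t]+$/, "", answer)    # drop trailing blanks
            if (answer != "yes") failed = 1
        }
        END {
            if (seen && !failed) exit 0
            exit 1
        }
    '
}

# Hypothetical usage:
# ./kfd_check_installation.sh | all_checks_passed && echo "HSA setup looks fine"
```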
- Download the samples
$ git clone https://github.com/HSAFoundation/HSA-OpenMP-GCC-AMD.git
- Edit, validate, and source setenv.gcc
$ cd HSA-OpenMP-GCC-AMD/samples
$ cat setenv.gcc
# ADD INSTRUCTIONS HERE.
$ source setenv.gcc
- Build and run vectorCopy
$ cd vectorCopy
$ make
$ ./run.sh
Vector Copy - Passed
- Build and run matrixMultiply
$ cd ../matrixMultiply
$ make
$ ./run.sh
Matrix multiplication - Passed
- NOTE1: When executing the program, the HSA runtime expects the HSA kernel in an object file in the current working directory with the same name as the input file, only with the suffix changed to .o. If you use LTO, if there is no input file (such as when compiling from standard input), or if the input file name does not have a dot in it, the runtime expects the HSA ELF sections in a file called hsakernel.o. This is a temporary situation and will be fixed.
- NOTE2: If you also pass the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix, which you can search for optimization notes indicating whether the compiler succeeded in turning OMP loops into kernels stripped of all OMP-generated control flow and suitable for a GPGPU. If it failed, the note also gives the reason. In the vectorCopy example, however, it reports success like this:
omp_veccopy.c:13:12: note: Parallel construct will be turned into an HSA kernel
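The naming rule in NOTE1 can be written down as a small helper, which may be handy in run scripts that need to verify that the kernel object sits next to the executable. This is a sketch of the rule as described in the note, not code from the runtime. The function name `hsa_kernel_object` and the "lto" flag argument are hypothetical, and dropping any leading directory from the input name is an assumption based on the file being looked up in the current working directory.

```shell
# Print the object file name the runtime is described as looking for.
#   $1: compiler input file name (may be empty when there is none)
#   $2: optional literal "lto" when the program was built with LTO
hsa_kernel_object() {
    local input base
    input=$1
    base=${input##*/}          # assumption: only the file name matters
    case $base in
        *.*) ;;                # there is a suffix that can be replaced
        *)   base= ;;          # no dot in the name, or no input file
    esac
    if [ -z "$base" ] || [ "${2:-}" = "lto" ]; then
        printf '%s\n' hsakernel.o
    else
        printf '%s\n' "${base%.*}.o"
    fi
}

# Hypothetical usage in a run script:
# [ -f "$(hsa_kernel_object omp_veccopy.c)" ] || echo "kernel object missing"
```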
The HSA Foundation provides tools to assemble (HSAIL to BRIG) and disassemble (BRIG to HSAIL) at https://github.com/HSAFoundation/HSAIL-Tools. Download HSAIL-Tools, follow the README instructions to build it, and use the disassembler to read the BRIG generated by GCC.
$ git clone https://github.com/HSAFoundation/HSAIL-Tools
$ cd HSAIL-Tools/libHSAIL
$ make -j LLVM_CONFIG=llvm-config-3.2
$ objcopy -O binary -j .brig omp_veccopy.o omp_veccopy.brig
$ ./build_linux/hsailasm -disassemble omp_veccopy.brig
This generates omp_veccopy.hsail.
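When several object files need to be inspected, the objcopy step above can be wrapped in a loop. The sketch below reuses exactly the objcopy options shown above. The helper names `brig_name` and `extract_brig` are hypothetical, and writing each .brig file next to its object file is a choice made for this example.

```shell
# Print the .brig file name that corresponds to an object file name ($1).
brig_name() {
    printf '%s\n' "${1%.o}.brig"
}

# Copy the raw contents of the .brig section out of every object file
# given as an argument, stopping at the first objcopy failure.
extract_brig() {
    local obj
    for obj in "$@"; do
        objcopy -O binary -j .brig "$obj" "$(brig_name "$obj")" || return 1
    done
}

# Hypothetical usage:
# extract_brig omp_veccopy.o
```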
Work on complete support for OpenMP targeting HSA is still ongoing. The current limitations are:
- Unsupported OpenMP constructs:
  - Non-looping constructs like "omp section"
  - Multiple OMP constructs within an OMP parallel construct
  - A parallel construct nested within another parallel construct
  - Schedule kinds dynamic, guided, and runtime
  - Collapse > 1
  - Reductions
- Limited support of OpenMP runtime calls
- NOTE: If you pass the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix. This file gives the reason why turning OMP loops into kernels failed.
- Reading or writing globals declared on the host from within a kernel is not supported yet. GCC emits a warning describing such global variable accesses. Correctness of the program is not guaranteed in such cases.
- There is scope to improve register allocation (and reduce spilling).
- Function calls: All calls in a kernel to functions defined within the same compilation unit get inlined at -O1 and above. Across multiple compilation units, one can use link-time optimization (-flto -flto-partition=none) to inline those functions.
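To see which of these limitations a given source file runs into, the notes in the -fdump-tree-ompexp-details dump mentioned in the list can be pulled out with grep. This is a minimal sketch. The helper names are hypothetical, and the only message text assumed is the "note:" marker and the success wording quoted in the vectorCopy example. The wording of failure notes is not shown in this document, so the first helper simply lists every note.

```shell
# Print, with line numbers, every compiler note recorded in the dump file $1.
omp_kernel_notes() {
    grep -n 'note:' "$1"
}

# Print how many notes in the dump file $1 report a successful conversion,
# matching the message text from the vectorCopy example.
omp_kernel_successes() {
    grep -c 'will be turned into an HSA kernel' "$1"
}

# Hypothetical usage: pass the path of the file with the .ompexp suffix
# that the compiler wrote for your source file.
# omp_kernel_notes path/to/dump.ompexp
```

Notes that appear in the first listing but are not counted by the second are the ones worth reading for the failure reason.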