Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU Programming page #325

Merged
merged 4 commits into from
Aug 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion _sidebar.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
* [Bash](/best_practices/language_guides/bash.md)
* [JavaScript and TypeScript](/best_practices/language_guides/javascript.md)
* [Python](/best_practices/language_guides/python.md)
* [OpenCL and CUDA](/best_practices/language_guides/opencl_cuda.md)
* [GPU](/best_practices/language_guides/opencl_cuda.md)
* [R](/best_practices/language_guides/r.md)
* [C and C++](/best_practices/language_guides/ccpp.md)
* [Fortran](/best_practices/language_guides/fortran.md)
Expand Down
174 changes: 124 additions & 50 deletions best_practices/language_guides/opencl_cuda.md
Original file line number Diff line number Diff line change
@@ -1,69 +1,143 @@
# OpenCL & CUDA
# GPU Programming Languages

*Page maintainer: Alessio Sclocco* [@isazi](https://github.com/isazi)


## Sources for learning
*please add university courses and informative videos*
* Parallel Reduction [[Slides](http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf)]
* GPU Memory bootcamp - Tony Scudiero [[git repo](https://github.com/tscudiero/MemBootcamp)]
* Best Practices [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S5353_Scudiero_Bootcamp1.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5353.html)]
* Beyond the Best Practices [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S5376-Scudiero_Bootcamp2.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5376.html)]
* Collaborative Access Patterns [[Slides](https://github.com/tscudiero/MemBootcamp/blob/master/Slides/S6181-Scudiero_Bootcamp3.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2016/video/s6181-tony-scudiero-bootcamp-3.mp4)]
* CUB: CUDA Collective primitives library [[Git](https://github.com/NVlabs/cub)] [[Slides](http://on-demand.gputechconf.com/gtc/2015/presentation/S5617-Duane-Merrill.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2015/video/S5617.html)]
* Best Practices Guide by PRACE [[HTML](https://prace-ri.eu/training-support/best-practice-guides/best-practice-guide-gpgpu/)] [[PDF](https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_GPGPU.pdf)]

## Learning Resources

* Carpentries GPU Programming course
* [Lesson material](https://carpentries-incubator.github.io/lesson-gpu-programming/)
* Introduction to CUDA C
* [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/cuda_language/Introduction_to_CUDA_C.pptx)
* [Video](http://on-demand.gputechconf.com/gtc/2012/video/S0624-Monday-Introduction-to-CUDA-C.mp4)
* Introduction to OpenACC
* [Slides](http://developer.download.nvidia.com/compute/developertrainingmaterials/presentations/openacc/Introduction_To_OpenACC.pptx)
* Introduction to HIP Programming
* [Video](https://www.youtube.com/watch?v=3ejUwypP0bI)
* SYCL Introduction and Best Practices
* [Video](https://www.youtube.com/watch?v=TbkrODiVDQY)
* CSCS GPU Programming with Julia
* [Course recordings](https://github.com/omlins/julia-gpu-course)

## Documentation
* OpenCL specification [[1.2](https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/)] [[2.0](https://www.khronos.org/registry/cl/sdk/2.0/docs/man/xhtml/)]
* CUDA Toolkit [[latest](http://docs.nvidia.com/cuda/index.html)]
* [CUDA Programming Guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
* [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)


## Source-to-source translation between CUDA and OpenCL
* vtsynergy (https://github.com/vtsynergy)
* This was shown to work on DAS5 after copying /usr/include/limits.h to $PWD and commenting out the lines around # include_next (122-125) :
"cu2cl-tool host_code.cc device_code.cu -- -DGPU_ON -I$PWD:/usr/include -I/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include".
* cutocl (https://github.com/benvanwerkhoven/cutocl)


## Overview of libraries
* OpenCL-based libraries
* [CLBlast](https://github.com/CNugteren/CLBlast)
* [clFFT](https://github.com/clMathLibraries/clFFT)
* CUDA-based libraries
* CUDA
* [C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
* [Runtime API](https://docs.nvidia.com/cuda/cuda-runtime-api/)
* [Driver API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html)
* [Fortran programming guide](https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html)
* HIP
* [Kernel language syntax](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html)
* [Runtime API](https://rocm.docs.amd.com/projects/HIP/en/latest/.doxygen/docBin/html/modules.html)
* SYCL
* [Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html)
* [Reference guide](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf)
* OpenCL
* [Guide](https://github.com/KhronosGroup/OpenCL-Guide)
* [API](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html)
* [OpenCL C specification](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html)
* [Reference guide](https://www.khronos.org/files/opencl30-reference-guide.pdf)
* OpenACC
* [Programming guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC_Programming_Guide_0_0.pdf)
* [Reference guide](https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf)
* OpenMP
* [Reference guide](https://www.openmp.org/wp-content/uploads/OpenMPRef-5.0-111802-web.pdf)

## Overview of Libraries

* CUDA
* [cuBLAS](http://docs.nvidia.com/cuda/cublas/index.html)
* [NVBLAS](http://docs.nvidia.com/cuda/nvblas/index.html)
* [cuFFT](http://docs.nvidia.com/cuda/cufft/index.html)
* [cuGRAPH](https://docs.rapids.ai/api/cugraph/stable/)
* [cuRAND](http://docs.nvidia.com/cuda/curand/index.html)
* [cuSPARSE](http://docs.nvidia.com/cuda/cusparse/index.html)
* HIP
* [hipBLAS](https://rocm.docs.amd.com/projects/hipBLAS/en/latest/index.html)
* [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html)
* [hipRAND](https://rocm.docs.amd.com/projects/hipRAND/en/latest/index.html)
* [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/index.html)
* SYCL
* [OneAPI BLAS](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/blas-routines.html#GUID-F277361F-49BA-437B-8100-3D2B6BBC3CC1)
* [OneAPI FFT](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/fourier-transform-functions.html#GUID-BB8891E9-D3D3-40B9-BBB1-2390C4340CDA)
* [OneAPI sparse](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/sparse-blas-routines.html#GUID-7123D31B-9C7F-4BA7-9792-02A417293E60)
* [OneAPI random number generators](https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-0/random-number-generators.html#GUID-FFC80D12-C323-4A9F-83E3-D0ACDB686876)
* OpenCL
* [CLBlast](https://github.com/CNugteren/CLBlast)
* [clFFT](https://github.com/clMathLibraries/clFFT)

## Source-to-source Translation

* CUDA to HIP
* [hipify](https://github.com/ROCm/HIPIFY)
* CUDA to SYCL
* [SYCLomatic](https://github.com/oneapi-src/SYCLomatic)
* CUDA to OpenCL
* [cutocl](https://github.com/benvanwerkhoven/cutocl)

## Foreign Function Interfaces

* C++
* CUDA
* [cudawrappers](https://github.com/nlesc-recruit/cudawrappers)
* OpenCL
* [CLHPP](https://github.com/KhronosGroup/OpenCL-CLHPP)
* Python
* CUDA
* [PyCuda](https://mathema.tician.de/software/pycuda/)
* [CuPy](https://cupy.dev/)
* [cuda-python](https://nvidia.github.io/cuda-python/)
* HIP
* [PyHIP](https://github.com/jatinx/PyHIP)
* SYCL
* [dpctl](https://github.com/IntelPython/dpctl)
* OpenCL
* [PyOpenCL](https://mathema.tician.de/software/pycuda/)
* Julia
* CUDA
* [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)
* HIP
* [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
* SYCL
* [oneAPI.jl](https://github.com/JuliaGPU/oneAPI.jl)
* Java
* CUDA
* [JCuda](http://www.jcuda.org/)
* OpenCL
* [JOCL](http://www.jocl.org/)

## High-Level Abstractions

* C++
* [Kokkos](https://github.com/kokkos/kokkos)
* [Raja](https://github.com/LLNL/RAJA)
* Python
* [Numba](https://numba.pydata.org/)
* [pykokkos](https://github.com/kokkos/pykokkos)

## Foreign Function Interfaces for CUDA and OpenCL
* C++: [[Cuda](https://github.com/eyalroz/cuda-api-wrappers/)], [[OpenCL](https://github.com/KhronosGroup/OpenCL-CLHPP)]
* Python: [[PyCuda](https://mathema.tician.de/software/pycuda/)], [[PyOpenCL](https://mathema.tician.de/software/pycuda/)]
* Java: [[JCuda](http://www.jcuda.org/)], [[JOCL](http://www.jocl.org/)]

## Debugging and Profiling Tools

## Testing
* Unit Testing
* Example of a unit test for CUDA kernel using the [Kernel Tuner](https://github.com/benvanwerkhoven/kernel_tuner/blob/master/examples/cuda/test_vector_add.py)
* [comparing floating-point results](http://docs.nvidia.com/cuda/floating-point/index.html)
* CUDA
* [Nsight Systems](https://developer.nvidia.com/nsight-systems)
* [Nsight Compute](https://developer.nvidia.com/nsight-compute)
* [CUDA-GDB](http://docs.nvidia.com/cuda/cuda-gdb/index.html)
* [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/index.html)
* HIP
* [omniperf](https://github.com/AMDResearch/omniperf)
* [rocprof](https://github.com/ROCm/rocprofiler)
* SYCL
* [oneprof](https://github.com/intel/pti-gpu/tree/master/tools/oneprof)
* [onetrace](https://github.com/intel/pti-gpu/tree/master/tools/onetrace)

## Performance Optimization

## Debugging and Profiling Tools
* [Nvidia Visual Profiler](https://developer.nvidia.com/nvidia-visual-profiler) [[User Guide](http://docs.nvidia.com/cuda/profiler-users-guide)]
* [CUDA-GDB](http://docs.nvidia.com/cuda/cuda-gdb/index.html)
* [compute-sanitizer](https://docs.nvidia.com/cuda/compute-sanitizer/index.html)
* [PRACE best practice guide on modern accelerators](https://zenodo.org/records/5839488)
* [CUDA best practices](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
* [OneAPI SYCL best practices](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-2/optimize-your-sycl-applications.html)

## Auto-tuning

## Performance Optimization
* Resources:
* Better Performance at Lower Occupancy [[Slides](http://www.nvidia.com/content/gtc-2010/pdfs/2238_gtc2010.pdf)] [[Video](http://on-demand.gputechconf.com/gtc/2010/video/S12238-Better-Performance-at-Lower-Occupancy.mp4)]
* [Maxwell Tuning Guide](http://docs.nvidia.com/cuda/maxwell-tuning-guide)
* [Pascal Tuning Guide](http://docs.nvidia.com/cuda/pascal-tuning-guide)

* Generic Auto Tuners:
* [Kernel Tuner](https://github.com/benvanwerkhoven/kernel_tuner) (Python)
* [CLTune](https://github.com/CNugteren/CLTune) (C++)
* Kernel Tuner
* [GitHub repository](https://github.com/KernelTuner/kernel_tuner)
* [Documentation](https://kerneltuner.github.io/kernel_tuner/stable/)
* [Tutorial](https://github.com/KernelTuner/kernel_tuner_tutorial)
Loading