This repository has been archived by the owner on Aug 15, 2023. It is now read-only.

Compiling and running accelerated functions (aka shaders)

Jacek Olszak edited this page Feb 29, 2020 · 50 revisions

This is a draft design of accelerated functions (now known as AcceleratedCommands). The feature has been implemented in v0.6.0. The page is left for future reference.

What is an Accelerated Function?

An Accelerated Function is a special kind of function executed outside the CPU.

The most obvious alternative to the CPU is the GPU installed on a video card. The architecture of such a unit is dramatically different from that of a CPU, and it was designed specifically for image processing (mainly for 3D, but it can also be used in 2D). A video card can efficiently process a large number of pixels: millions of operations each frame are not a problem.

But using a GPU is not the answer to every performance issue. A GPU is efficient when the number of calls to the GPU driver is relatively small and the program uses batch processing extensively. The game should know in advance what it will draw in the next frame, and it should reorder things accordingly to avoid switching shaders and binding textures. This may seriously complicate how the game or the engine is implemented.

There are also a number of cases where the GPU is actually slower than the CPU. The cost of executing a driver call (such as an OpenGL function) may be an order of magnitude higher than directly updating pixels in RAM. This is especially true for small images and textures, and in Pixel Art most (if not all) images are low-res.

But definitely the biggest disadvantage of Accelerated Functions is that you can't write them in the Go programming language. They are also extremely hard to debug, and the source code becomes a mess really quickly.

We want to be able to run Accelerated Functions for:

  • manipulating large images (such as FullHD)
  • running the post-processing on the whole screen
  • all those cases where the CPU can't handle the load anymore

Compiling

Before an Accelerated Function can be used, it must be compiled.

One of the most popular APIs for using video cards is OpenGL. It compiles programs at runtime: you pass the source code as a string and the OpenGL driver compiles it into a form that can later be executed many times (millions of times). OpenGL programs are written in the GLSL language.

Vulkan, on the other hand, is a bleeding-edge API which supports compiling shaders before the game is executed (as you normally do with Go source code). Shaders are compiled into an intermediate code (bytecode), which is then compiled into machine code at runtime. This feature might be useful some day, because it basically allows shader code to be written in any programming language.
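The compile-once, run-many workflow described above could be sketched roughly as follows. This is only an illustration: `Compiler`, `Program` and `fakeCompiler` are hypothetical names, and the fake compiler does not talk to any driver, it merely counts how often it is invoked.

```go
package main

import (
	"errors"
	"fmt"
)

// Program is a hypothetical handle to an already compiled shader.
type Program struct {
	source string
}

// Run stands in for executing the compiled program; it only counts calls.
func (p *Program) Run(counter *int) { *counter++ }

// Compiler is a hypothetical abstraction over a runtime shader compiler,
// such as the one built into an OpenGL driver.
type Compiler interface {
	Compile(source string) (*Program, error)
}

// fakeCompiler pretends to compile GLSL. It performs no real compilation
// and only rejects empty source.
type fakeCompiler struct {
	compilations int
}

// Compile-time check that fakeCompiler satisfies Compiler.
var _ Compiler = (*fakeCompiler)(nil)

func (c *fakeCompiler) Compile(source string) (*Program, error) {
	if source == "" {
		return nil, errors.New("empty shader source")
	}
	c.compilations++
	return &Program{source: source}, nil
}

func main() {
	compiler := &fakeCompiler{}
	// Compile once, at startup.
	program, err := compiler.Compile("color = vec4(1., 1., 1., 1.);")
	if err != nil {
		panic(err)
	}
	// Run many times, for example once per frame.
	runs := 0
	for frame := 0; frame < 1000; frame++ {
		program.Run(&runs)
	}
	fmt.Println(compiler.compilations, runs)
}
```

The point of the split is that the expensive step (compilation) happens once, while the cheap step (running) happens every frame.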

Running

Once the program is compiled, it can be run on an Image, or more specifically on an image.Selection. The most common use case is to modify the whole selection. You basically run the same function on each pixel, from left to right and from top to bottom. Video cards are really good at parallel execution, so they will run this function on hundreds of pixels at the same time. A CPU, on the other hand, can run a similar function on just a few cores (4 to 8 is the usual number of cores in today's computers).

The shader program can take parameters, just like a normal Go function. The same parameter values are used for each pixel.
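To make the execution model concrete, here is a minimal CPU-only sketch of "the same function for every pixel, with the same parameters". It is not Pixiq code: `Color`, `PixelFunc` and `modifyParallel` are hypothetical names, and the rows are spread over as many goroutines as there are CPU cores, which is the limited kind of parallelism a CPU can offer compared to a GPU.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Color is a hypothetical RGBA pixel with 8 bits per channel.
type Color struct{ R, G, B, A uint8 }

// PixelFunc computes the new color of the pixel at (x, y).
// It plays the role of the shader's main function.
type PixelFunc func(x, y int, current Color) Color

// modifyParallel applies f to every pixel of a width*height image stored
// row by row in pixels. Rows are distributed over a fixed number of
// goroutines; each goroutine writes to its own rows only.
func modifyParallel(pixels []Color, width, height int, f PixelFunc) {
	workers := runtime.NumCPU()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(first int) {
			defer wg.Done()
			// This worker handles rows first, first+workers, ...
			for y := first; y < height; y += workers {
				for x := 0; x < width; x++ {
					i := y*width + x
					pixels[i] = f(x, y, pixels[i])
				}
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	const width, height = 4, 3
	pixels := make([]Color, width*height)

	// red is the equivalent of a shader parameter: set once per call,
	// identical for every pixel.
	red := uint8(200)
	setRed := func(x, y int, c Color) Color {
		return Color{R: red, G: c.G, B: c.B, A: 255}
	}

	modifyParallel(pixels, width, height, setRed)
	fmt.Println(pixels[0], pixels[len(pixels)-1])
}
```

Because the function for one pixel never depends on the result for another, the work can be split freely; that independence is what lets a video card process hundreds of pixels at once.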

Example code

Low-level example (simplest possible implementation):

program := gl.Compile("color=vec4(red,1.,1.,1.)")
call := program.NewCall()
call.SetFloat64("red", 1)
selection.Modify(call)

Game code using some library providing accelerated function:

// beginning of the program
var openGL opengl.OpenGL = ...
blend, err := glblender.Compile(openGL)

// some place in game loop:
source := image.Selection(10,10).WithSize(30,20)
target := image.WholeImageSelection()
blend(source).Into(target)

Library code:

package glblender

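import (
  "github.com/jacekolszak/pixiq/glshader"
  "github.com/jacekolszak/pixiq/image"
)

// Compile builds the blend shader once and returns a factory of BlendFunc values.
func Compile(compiler glshader.Compiler) (func(sourceSelection image.Selection) BlendFunc, error) {
  source := glshader.New(compiler)
  source.AddImageSelection("source")
  source.SetMainSource("color = get(source, x, y)")
  program, err := source.Compile()
  if err != nil { .. }
  return func(sourceSelection image.Selection) BlendFunc {
    return BlendFunc{sourceSelection: sourceSelection, program: program}
  }, nil
}

// BlendFunc binds a compiled program to the selection used as its source.
type BlendFunc struct {
  sourceSelection image.Selection
  program         *glshader.Program
}

func (f BlendFunc) Into(target image.Selection) {
  call := f.program.NewCall() // "func" is a reserved word in Go, so the variable is named call
  call.SetImageSelection("source", f.sourceSelection)
  target.Modify(call)
}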

glshader package code - create shader source code using handy API

package glshader

type Compiler interface {
  Compile(source string) (CompiledProgram, error)
}

type CompiledProgram interface {
	NewCall() GLAcceleratedCall // no pointer here to avoid escaping GLAcceleratedCall to heap
}

type GLAcceleratedCall struct {}

func (f GLAcceleratedCall) SetFloat64(name string, value float64) {}

func (c GLAcceleratedCall) Program() *Program {} // used by opengl package to find a compiled program

type GLAcceleratedCall interface {
  SetTexture(name string, selection image.Selection)
  SetInt(name string, val int)
  // and maybe some day in the future to support rotating and scaling with smoothing and anti-aliasing (but then Pixiq will become full blown 2D graphics API)
  SetTransformedTexture(name string, selection image.Selection, matrix TransformationMatrix)
}

func New(compiler Compiler) *ShaderSource {
  return &ShaderSource{
      compiler:     compiler,
      declarations: "",
      main:         "",
      functions:    "",
  }
}

type ShaderSource struct {
  compiler     Compiler
  declarations string
  main         string
  functions    string
}

func (s *ShaderSource) AddImageSelection(name string) {
  // append declaration of texture and 4 uniforms: x,y,w,h
  s.declarations += fmt.Sprintf("%s texture2D\n", name) 
  s.functions += "func get(s selection, x, y int) { .... } "
}

func (s *ShaderSource) SetMainSource(main string) {
  s.main = main
}

func (s *ShaderSource) Compile() (*Program, error) {
  // Compiler.Compile returns two values: the compiled program and an error
  compiledProgram, err := s.compiler.Compile(s.declarations + s.main + s.functions)
  return ...
}

type Program struct { compiledProgram CompiledProgram }

func (p *Program) NewCall() *Call {
  return &Call{compiledCall: p.compiledProgram.NewCall()}
}

type Call struct {compiledCall CompiledCall}

func (c *Call) SetTargetSelection(name string, selection image.Selection) {
  // validate if given parameter was declared image.Selection
  c.compiledCall.SetTexture(name, selection)
}

func (c *Call) Run() {
  c.compiledCall.Run()
}

image package keeps track of what has changed in the image (both in RAM and VRAM)

type AcceleratedCall interface {} // must be empty; the image package just passes it to the accelerated image's Modify method
func (s Selection) Modify(call AcceleratedCall) {
  acceleratedImageSelection := AcceleratedImageSelection{s.x,s.y,s.w,s.h}
  acceleratedImage.Upload(acceleratedImageSelection,s.image.pixels,...) // only if it was changed in memory
  acceleratedImage.Modify(acceleratedImageSelection,call) // this way only the image package can execute the accelerated call
  // mark area as changed on the video card
}

Change in image package to support uploading/downloading of selections

func (s Selection) Upload() {}

type AcceleratedImageSelection struct {
  X,Y,Width,Height int
}

type AcceleratedImage interface {
  Upload(AcceleratedImageSelection, pixels []Color, stride int)
  Download(AcceleratedImageSelection, output []Color, stride int)
  Modify(AcceleratedImageSelection, AcceleratedCall)
}

Implement compiler in OpenGL

  • Use UNPACK_ROW_LENGTH for setting stride.

type glProgramCall struct {}
func (c *glProgramCall) SetTexture(name string, selection image.Selection) {
  selection.Upload()
  gl.BindTexture(textureIds[selection.Image()])
}
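The stride matters because a selection is usually a rectangle inside a larger pixel slice, so consecutive rows of the selection are not adjacent in memory. The sketch below shows the address arithmetic on the CPU side only. It makes no OpenGL calls; `Color` and `copySelection` are hypothetical names used for illustration.

```go
package main

import "fmt"

// Color is a hypothetical packed RGBA pixel.
type Color uint32

// copySelection copies a w*h rectangle whose top-left corner is (x, y)
// out of a larger image in which consecutive rows are stride pixels
// apart. A driver that is told the row length does equivalent arithmetic
// when it reads a sub-rectangle from client memory.
func copySelection(pixels []Color, stride, x, y, w, h int) []Color {
	out := make([]Color, 0, w*h)
	for row := 0; row < h; row++ {
		start := (y+row)*stride + x
		out = append(out, pixels[start:start+w]...)
	}
	return out
}

func main() {
	// A 4x3 image where each pixel stores its own index.
	const stride, height = 4, 3
	pixels := make([]Color, stride*height)
	for i := range pixels {
		pixels[i] = Color(i)
	}
	// Select the 2x2 rectangle whose top-left corner is (1, 1).
	fmt.Println(copySelection(pixels, stride, 1, 1, 2, 2)) // prints [5 6 9 10]
}
```

Passing the stride to the driver avoids making this intermediate copy: the whole image slice can be handed over together with the offset of the first selected pixel.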

image.Image optimization

  • image.Image can store information about which areas of the image have been modified in RAM
  • based on that information it might optimize what is actually uploaded to an AcceleratedImage
  • in the first version we can just store a boolean flag telling whether the image has been changed in RAM or not. If it has been changed, the Upload method is executed; otherwise nothing happens.
  • a game developer might tweak the performance by directly uploading the whole image selection in advance
  • image.Image can also store information about what has been uploaded to the GPU and is potentially stale in RAM
  • the next time the Image.Get method is called, it might download the stale parts of the image
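The boolean-flag variant from the list above could look roughly like the sketch below. It is a simplified illustration, not the Pixiq implementation: the whole image is transferred instead of a selection, and `fakeVRAM`, `NewImage`, `ramDirty` and `ramStale` are hypothetical names.

```go
package main

import "fmt"

// Color is a hypothetical packed RGBA pixel.
type Color uint32

// AcceleratedImage is a hypothetical, simplified view of video memory.
type AcceleratedImage interface {
	Upload(pixels []Color)
	Download(output []Color)
}

// fakeVRAM keeps pixels in an ordinary slice and counts transfers.
type fakeVRAM struct {
	pixels             []Color
	uploads, downloads int
}

func (v *fakeVRAM) Upload(pixels []Color) {
	v.pixels = append(v.pixels[:0], pixels...)
	v.uploads++
}

func (v *fakeVRAM) Download(output []Color) {
	copy(output, v.pixels)
	v.downloads++
}

// Image tracks which copy of the pixels is newer.
type Image struct {
	pixels      []Color
	accelerated AcceleratedImage
	ramDirty    bool // RAM changed since the last upload
	ramStale    bool // video memory changed since the last download
}

// NewImage marks RAM as dirty so that the first Modify always uploads.
func NewImage(size int, accelerated AcceleratedImage) *Image {
	return &Image{pixels: make([]Color, size), accelerated: accelerated, ramDirty: true}
}

// sync downloads the pixels if the copy in RAM is stale.
func (i *Image) sync() {
	if i.ramStale {
		i.accelerated.Download(i.pixels)
		i.ramStale = false
	}
}

// Set modifies a pixel in RAM, first fetching newer data if needed.
func (i *Image) Set(index int, c Color) {
	i.sync()
	i.pixels[index] = c
	i.ramDirty = true
}

// Get reads a pixel, downloading it first if RAM is stale.
func (i *Image) Get(index int) Color {
	i.sync()
	return i.pixels[index]
}

// Modify runs an accelerated operation; it uploads only when RAM changed.
func (i *Image) Modify(run func()) {
	if i.ramDirty {
		i.accelerated.Upload(i.pixels)
		i.ramDirty = false
	}
	run()
	i.ramStale = true
}

func main() {
	vram := &fakeVRAM{}
	img := NewImage(4, vram)
	img.Set(0, 7)
	invert := func() {
		for i := range vram.pixels {
			vram.pixels[i] = ^vram.pixels[i]
		}
	}
	img.Modify(invert) // uploads, because RAM was dirty
	img.Modify(invert) // no upload, RAM is unchanged
	fmt.Println(img.Get(0), vram.uploads, vram.downloads) // prints 7 1 1
}
```

Two flags are enough for the first version; tracking modified areas instead of the whole image would only reduce how much data each transfer moves.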