Options cost estimates

Introduction

To help engineers and technical artists optimize their projects, O3DE has a flexible concept of shader builds based on a tree of "virtual" variants that may or may not physically exist. "Physically" means having bytecode built with hardcoded values for the specific options of that variant. If a variant is not physically present, there is a dynamic fallback where the option remains a variable. The root is the bytecode with all options as variables (unpacked from a bitfield in an auto-generated fallback key, usually a uint4, packed in a constant buffer with the rest of the SRG constants).
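
For intuition, here is a minimal Python sketch (an illustration, not O3DE's actual implementation) of how an option value can be packed into and unpacked from such a bitfield key, using the keyOffset/keySize fields that azslc reports for each option (visible in the JSON output further down this page):

# Hypothetical sketch of fallback-key bit packing; a real key is a uint4
# (128 bits) stored with the other SRG constants.

def pack_option(key, value, key_offset, key_size):
    """Write `value` into the bit range [key_offset, key_offset + key_size)."""
    mask = (1 << key_size) - 1
    key &= ~(mask << key_offset)               # clear the option's bits
    return key | ((value & mask) << key_offset)

def unpack_option(key, key_offset, key_size):
    """Read the option's value back out of the key."""
    return (key >> key_offset) & ((1 << key_size) - 1)

# e.g. o_opacity_mode below has keyOffset=0 and keySize=2 (4 possible values)
key = pack_option(0, 2, key_offset=0, key_size=2)  # OpacityMode::Blended
assert unpack_option(key, key_offset=0, key_size=2) == 2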

Here is the O3DE documentation page about them: https://www.o3de.org/docs/atom-guide/dev-guide/shaders/azsl/shader-variant-options/

The Shader Management Console (SMC) is a tool that allows toying with the presence or absence of physical bytecodes, depending on which options the engineer/designer deems important.

Here is the O3DE wiki user guide for the SMC: https://github.com/o3de/o3de/wiki/%5BShader-Management-Console%5D-User-Guide

To help with this estimation, we optionally use AMD RGA to count the number of registers used by a variant.
Additionally, AZSLc computes an estimate we can call the "impact cost" of an option, with the intention of providing a reasonable default priority order for variant baking: ideally, variant bytecodes with options-as-constants get baked first for the most impactful options.

Gist

As a supporting example, let's take this extract from the enhanced PBR forward pass shader of O3DE:

void ApplyDirectLighting(SurfaceData_EnhancedPBR surface, inout LightingData_BasePBR lightingData, float4 screenUv)
{
    if( IsDirectLightingEnabled() )
    {
        if (o_enableDirectionalLights)
        {
            ApplyDirectionalLights(surface, lightingData, screenUv);
        }
        if (o_enablePunctualLights)
        {
            ApplySimplePointLights(surface, lightingData);
            ApplySimpleSpotLights(surface, lightingData);
        }
        if (o_enableAreaLights)
        {
            ApplyPointLights(surface, lightingData);
            ApplyDiskLights(surface, lightingData);
            ApplyCapsuleLights(surface, lightingData);
            ApplyQuadLights(surface, lightingData);
            ApplyPolygonLights(surface, lightingData);
        }
    }
    else if(IsDebuggingEnabled_PLACEHOLDER() && GetRenderDebugViewMode() == RenderDebugViewMode::CascadeShadows)
    {
        if (o_enableDirectionalLights)
        {
            ApplyDirectionalLights(surface, lightingData, screenUv);
        }
    }
}

The options o_enableDirectionalLights, o_enablePunctualLights, and o_enableAreaLights guard the execution of code blocks; therefore we can reason that the cost of, say, the o_enablePunctualLights option is the cost of ApplySimplePointLights plus the cost of ApplySimpleSpotLights.

This is how AZSLc will estimate the "impact cost" of options.

A case study through manual execution

Let's analyze the practical results we have as of the latest PR for that feature: https://github.com/o3de/o3de-azslc/pull/85

First, we need to find the azslc executable and the .azslin file prepared by the Asset Processor. Let's use the Everything application (the Windows equivalent of Unix's locate).
I got a bunch of matches because I version the binaries for regression testing, but in your case you want the one you just built from the relevant git branch, so it will look like o3de-azslc/build...

Now, about the input shader: you'll first need to have run a project, the Editor or the sample viewer for instance, because we need a cache of assets made by the AP. The AP has shader builders that execute complex preparations prior to azslc invocation, like preprocessing, SRG header injection, etc. The result is a .azslin file; that is what azslc.exe can digest. (Unintuitively, not the source-controlled .azsl files from the git repo: they aren't ripe for compilation; shader building is a long chain of tools.)

Here is how I went about it: use a bit of wildcard, sort by size to get the biggest monster, and pick the DX12, non-custom one.

Shift+right-click -> Copy as path.

And here is the command line in my case:

$ "D:\o3de-azslc\build\win_x64\Release\azslc.exe" "D:\o3de-atom-sampleviewer\Cache\pc\materials\types\enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin" --options

The result will look like:

{
   "ShaderOptions":[
      {
         "costImpact":7,
         "defaultValue":"",
         "keyOffset":0,
         "keySize":2,
         "kind":"user-defined",
         "name":"o_opacity_mode",
         "order":0,
         "range":false,
         "type":"OpacityMode",
         "values":[
            "OpacityMode::Opaque",
            "OpacityMode::Cutout",
            "OpacityMode::Blended",
            "OpacityMode::TintedTransparent"
         ]
      },
      ...

To list only the costs from that output, we can muster a tiny Python filter like so:

import json
import sys

# Parse the JSON emitted by "azslc --options" from stdin,
# then print each option's name and estimated cost impact.
data = json.load(sys.stdin)
for so in data["ShaderOptions"]:
    print(so["name"], " ", so["costImpact"])

Now we can pipe the previous command into that Python filter (paths simplified for readability):

$ azslc.exe enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin --options | py filter.py

result:

o_opacity_mode   7
o_specularF0_enableMultiScatterCompensation   2
o_enableShadows   2489
o_enableDirectionalLights   4192
o_enablePunctualLights   392
o_enableAreaLights   3881
o_enableSphereLights   736
o_enableSphereLightShadows   310
o_enableDiskLights   756
o_enableDiskLightShadows   284
o_enableCapsuleLights   479
o_enableQuadLightLTC   1864
o_enableQuadLightApprox   1843
o_enablePolygonLights   464
o_enableIBL   1
...
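
To turn these numbers into a default baking priority, we can sort the options by decreasing impact; a small variation on the filter above:

import json
import sys

# Print the options ranked from most to least impactful.
data = json.load(sys.stdin)
ranked = sorted(data["ShaderOptions"], key=lambda so: so["costImpact"], reverse=True)
for so in ranked:
    print(f'{so["costImpact"]:6}  {so["name"]}')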

Analysis of the analysis

o_enableIBL

Let's take a look at what's going on with o_enableIBL: why does it cost only 1?
When we search through the .azslin file, we find only two occurrences: the declaration, and this line:

OUT.m_normal.a = EncodeUnorm2BitFlags(o_enableIBL, o_specularF0_enableMultiScatterCompensation);

At this stage, the IBL is only a GBuffer flag, therefore its associated cost is its mere access from the constant buffer (at worst).
This is a specificity that optimizing engineers will have to keep in mind: shaders each do their compartmentalized job, so an option's score doesn't necessarily reflect the whole cost of the feature it names, just the cost in that shader.
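
For intuition, here is a hedged sketch of what such a 2-bit flag encoding could look like (hypothetical, not the actual EncodeUnorm2BitFlags implementation): two booleans are packed into 2 bits and remapped to the [0, 1] unorm range of the GBuffer channel.

# Hypothetical encoding of two boolean flags into a unorm channel.
def encode_unorm_2bit_flags(flag0, flag1):
    bits = (int(flag1) << 1) | int(flag0)
    return bits / 3.0                        # map {0,1,2,3} onto [0, 1]

def decode_unorm_2bit_flags(value):
    bits = round(value * 3.0)
    return bool(bits & 1), bool(bits & 2)

assert decode_unorm_2bit_flags(encode_unorm_2bit_flags(True, False)) == (True, False)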

Lights

How about o_enableDirectionalLights with score 4192, versus o_enableAreaLights with score 3881?
In the previous code snippet we saw that o_enableDirectionalLights covered one call to ApplyDirectionalLights, while o_enableAreaLights covered 5 calls to the point, disk, capsule, quad and polygon evaluators, which ought to be more complex than directional.

Let's use the --verbose flag of azslc to get more data points.

CLI: azslc.exe enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin --options --verbose | grep 'ApplyDirectionalLights\|o_enableDirectionalLights' | grep -v 'seenat\|new'

The greps filter out all the prior semantic analysis and symbol registration messages.
Result:

214: var decl: o_enableDirectionalLights
10449: register func: /ApplyDirectionalLights full identity: /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4)
Analyzing /o_enableDirectionalLights
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) non-memoized. discovering cost
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) call score 2094 added
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) call score 2094 added
/o_enableDirectionalLights final cost 4192
                        "name" : "o_enableDirectionalLights",

The function covered by the option actually accounts for about half of the option's reported cost: 2094. It turns out that a bit below the normal shader execution path, there is a debug else case that doubles the estimated cost by re-calling the Apply function. So we can manually correct the estimate and assume that the "real" score is closer to 2094 than 4192, which makes the order of importance for baking variants: o_enableAreaLights > o_enableDirectionalLights > o_enablePunctualLights.

For science, let's compare it to the AreaLights block with this command:

$ azslc.exe enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin --options --verbose | grep 'o_enableAreaLights\|ApplyPointLights\|ApplyDiskLights\|ApplyCapsuleLights\|ApplyQuadLights\|ApplyPolygonLights' | grep -v 'seenat\|new'

we get:

217: var decl: o_enableAreaLights
9608: register func: /ApplyPointLights full identity: /ApplyPointLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR)
9865: register func: /ApplyCapsuleLights full identity: /ApplyCapsuleLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR)
10783: register func: /ApplyDiskLights full identity: /ApplyDiskLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR)
11551: register func: /ApplyPolygonLights full identity: /ApplyPolygonLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR)
11845: register func: /ApplyQuadLights full identity: /ApplyQuadLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR)
Analyzing /o_enableAreaLights
 /ApplyPointLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) non-memoized. discovering cost
 /ApplyPointLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) call score 767 added
 /ApplyDiskLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) non-memoized. discovering cost
 /ApplyDiskLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) call score 787 added
 /ApplyCapsuleLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) non-memoized. discovering cost
 /ApplyCapsuleLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) call score 510 added
 /ApplyQuadLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) non-memoized. discovering cost
 /ApplyQuadLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) call score 1348 added
 /ApplyPolygonLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) non-memoized. discovering cost
 /ApplyPolygonLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR) call score 463 added
/o_enableAreaLights final cost 3881
                        "name" : "o_enableAreaLights",

767 + 787 + 510 + 1348 + 463 = 3875

The remaining 6 points of the reported 3881 presumably come from the option's own reference sites, which each count for 1 point (see "Under the hood" below).

Under the hood

How does it work?
The cost estimate is a static analysis, essentially a statement counter. It postulates approximations such as 1 statement = 1 cost point.
It then lists all options. For each option, it lists all appearances of that option's identifier (called references, or "seenats") throughout the code. Each apparition site counts for 1 cost point (since it's supposed to be akin to 1 register read or 1 memory load).
On top of that, it judges whether the reference is linked to a sub-tree of code, meaning expressions with a dependent block: for, if, while, do, switch. In that case, the cost of the dependent block is added to the option's score.
It can't distinguish false positives though, such as if (false && o_myOption) {code}: in that case the cost of {code} will still be added to the score of o_myOption. To limit this problem, it is heuristically assumed that the deeper the reference sits in the expression, the less effect it has on the final result, so the reference's level of apparition within the block's controlling expression is restricted to a depth of about 6 AST nodes.
In other words, something like if (a && (b || (c + (o_myOption * 3) ? var1 : var2))) is simply forfeited.
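
To make the scoring rule concrete, here is a minimal Python sketch over a hypothetical mini-AST (the node types and names are illustrative assumptions, not AZSLc's actual data structures):

from dataclasses import dataclass, field

@dataclass
class Stmt:                   # any plain statement: 1 cost point
    uses: set = field(default_factory=set)   # option identifiers it references

@dataclass
class If:                     # a dependent block guarded by a condition
    cond_uses: set            # option identifiers referenced in the condition
    body: list                # statements of the guarded block

def block_cost(stmts):
    """1 statement = 1 cost point; nested guarded blocks count their own statements."""
    return sum(1 + block_cost(s.body) if isinstance(s, If) else 1 for s in stmts)

def option_cost(option, stmts):
    """Each reference (seenat) = 1 point; a reference in a condition also adds
    the cost of the dependent block it guards."""
    score = 0
    for s in stmts:
        if isinstance(s, If):
            if option in s.cond_uses:
                score += 1 + block_cost(s.body)
            score += option_cost(option, s.body)
        elif option in s.uses:
            score += 1
    return score

# e.g. "if (o_enablePunctualLights) { 2 call statements }" scores 1 + 2 = 3
prog = [If({"o_enablePunctualLights"}, [Stmt(), Stmt()])]
assert option_cost("o_enablePunctualLights", prog) == 3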

Function call statements

The exception to the 1 statement = 1 point rule is for embedded function call expressions: when such exist, they are resolved and the call tree is fully evaluated for cost, recursively.
When intrinsic functions are encountered, an exceptional hardcoded cost is applied. Notably, CallShader and TraceRay have a cost of 100, and memory fetch functions such as Sample and Load have a cost of 10.
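
As a sketch of how such a recursive, memoized evaluation could look (the call graph and statement counts are made-up illustrations; only the intrinsic costs quoted above come from this page):

# Hypothetical call-graph cost evaluation with memoization, echoing the
# "non-memoized. discovering cost" messages in the verbose log above.

INTRINSIC_COSTS = {"CallShader": 100, "TraceRay": 100, "Sample": 10, "Load": 10}

# func name -> (own statement count, callees); made-up numbers for illustration
CALL_GRAPH = {
    "ApplyDirectionalLights": (80, ["Sample", "ComputeShadow"]),
    "ComputeShadow": (40, ["Sample"]),
}

_memo = {}

def call_cost(func):
    """Cost of a call: the callee's own statements plus all its callees, recursively."""
    if func in INTRINSIC_COSTS:
        return INTRINSIC_COSTS[func]
    if func in _memo:                 # already discovered: reuse the score
        return _memo[func]
    own, callees = CALL_GRAPH.get(func, (1, []))
    _memo[func] = own + sum(call_cost(c) for c in callees)
    return _memo[func]

# ApplyDirectionalLights: 80 + Sample(10) + ComputeShadow(40 + Sample(10)) = 140
assert call_cost("ApplyDirectionalLights") == 140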