---
In the near term, I propose we take this approach, as it should be the easiest to implement and will address the immediate need.
---
Thanks @santorac for the great writeup! I wanted to add a few more thoughts to the discussion (much of this was brought up internally, but it warrants mention in the broader community as well).

Some thoughts on performance

When specializing shader code to a particular variant, performance characteristics change in discrete chunks. That is, as segments of code are statically unrolled, the shader's performance might not change at all (or might get slightly worse) for a while, until it suddenly becomes markedly better than before. There are myriad reasons why this is the case.
There are several cases where specializing a shader is known to have an appreciable impact. Note that even in the second case, scalarizing the shader can sometimes alleviate register-allocation issues. Every specialized shader that doesn't fall into one of the above categories of improvement (or something similar) actually hurts performance.
What this means is that for shaders that expose tons of shader options poorly, even if we could snap our fingers and instantly generate the full power set of variants, most of those variants would hurt performance rather than help.

Some high level suggestions

It is very difficult for automated systems to discover, in general, the discrete steps at which shader specialization improves performance. Brute force would technically work, but it has enormous costs in shader compilation time, rendering, and metrics gathering. A brute-force approach is also brittle, since any appreciable change to the shader code will likely change the outcome of the search. In my opinion, it is important to first ensure that the shader options employed are options likely to be meaningful to the performance characteristics of the shader (as described above). A number of shader options currently implemented alter the material's behavior in ways that aren't likely important to the shader's performance (they could stay as a runtime branch forever and nobody would notice much). As an alternative to the above, describing an allow-list of shader options that are allowed to be unrolled at build time would effectively partition the set of shader options into those that remain strictly runtime switches and those that may ultimately affect the bound PSO. This list could also annotate a set of options that should always be unrolled. Either way, before investing in more costly/expensive mechanisms for pruning the variant tree, I believe it's important to first make that variant tree as small as possible to start.
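A configuration for such an allow-list might look something like the following sketch. The file shape, key names, and option names here are all hypothetical illustrations, not an existing O3DE schema:

```json
{
    "ShaderOptionUnrollPolicy": {
        "AlwaysUnroll": ["o_enableShadows", "o_enableIBL"],
        "AllowUnroll": ["o_parallaxMapping"],
        "RuntimeOnly": ["o_debugRenderMode"]
    }
}
```

Options in "RuntimeOnly" would never multiply the variant set, while "AlwaysUnroll" options would be baked into every generated variant.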
---
Currently we can have a .shadervariantlist file for each .shader, which simply lists every variant that should be compiled for that shader. The current process is explained pretty well here: https://github.com/o3de/o3de/wiki/%5BAtom%5D-Shader-Management-Console-(SMC)
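For context, a .shadervariantlist is a JSON file along these lines. This sketch is written from memory; see the wiki page above for the authoritative format, and treat the field names as approximate:

```json
{
    "Shader": "Shaders/SomeShader.shader",
    "Variants": [
        {
            "StableId": 1,
            "Options": {
                "o_enableShadows": "true",
                "o_qualityLevel": "Quality::Medium"
            }
        }
    ]
}
```

Each entry pins a concrete value for some subset of the shader's options, and the build produces one compiled variant per entry.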
This process has some gaps and is difficult to use, the workflow will not scale well to dozens of shaders and thousands of variants, and the problem is about to get worse as we introduce the material pipeline feature (see https://github.com/o3de/sig-graphics-audio/blob/main/rfcs/MaterialPipelineAbstraction.md).
In this document we are mainly concerned with material shaders, especially the forward lighting passes, as those tend to have the most VGPR pressure. Different strategies might be necessary for non-material shader variants, or if a deferred render pipeline is being used rather than the forward+ pipeline that ships with o3de.
High Level Approaches
Note there are two families of shader options: material shader options (whose values are driven by each material's properties) and system shader options (whose values are driven by the engine and render pipeline at runtime).
There are three high level approaches that we can take: metrics based, material based, and automatic enumeration.
There are many ways we could combine these different approaches to augment each other and cover a wider range of use cases. For example, the pipeline could start with a .material file to find values for the material shader options, then combine this with metrics about system shader options that have been recently used at runtime, and also enumerate the permutation space for a couple known high-impact shader options.
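Whatever the sources, a combined pipeline ultimately reduces to merging sets of requested option values. A minimal Python sketch of that merge, with illustrative data rather than engine code:

```python
# Each source (materials, runtime metrics, enumeration) yields option
# dicts; combining them is a deduplicating union. Names are illustrative.
def merge_variant_sources(*sources):
    """Union several lists of option dicts, removing duplicates."""
    merged = set()
    for source in sources:
        for options in source:
            # Dicts are unhashable; freeze into a sorted tuple for the set.
            merged.add(tuple(sorted(options.items())))
    return [dict(items) for items in merged]

from_materials = [{"o_enableShadows": "true"}]
from_metrics = [{"o_enableShadows": "true"}, {"o_enableIBL": "true"}]

merged = merge_variant_sources(from_materials, from_metrics)
assert len(merged) == 2  # the duplicate request is collapsed
```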
Keep in mind that the shader variant system supports fallback to a common "root" variant that reads shader option values through a constant buffer. So if a requested shader variant is not found at runtime, the system will always be able to use the root variant to get the exact same visual result, just with potentially lower performance. The shader build pipeline should ensure there are enough baked shader variants available, so the root variant is used infrequently enough to have negligible impact on overall performance.
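To make the fallback behavior concrete, here is a minimal Python sketch of the lookup described above. The names and the exact-match rule are illustrative, not the actual O3DE runtime API:

```python
# A "variant" is identified by the option values baked into it; the root
# variant has nothing baked and reads every option from a constant buffer.
def find_variant(compiled_variants, requested_options):
    """Return a baked variant matching the request, else the root variant."""
    for variant in compiled_variants:
        if variant["baked_options"] == requested_options:
            return variant
    # Fall back to the root variant: same visual result, but every option
    # becomes a runtime branch driven by the constant buffer.
    return next(v for v in compiled_variants if v["baked_options"] == {})

variants = [
    {"name": "root", "baked_options": {}},
    {"name": "shadows_on", "baked_options": {"o_enableShadows": True}},
]

assert find_variant(variants, {"o_enableShadows": True})["name"] == "shadows_on"
assert find_variant(variants, {"o_enableShadows": False})["name"] == "root"
```

The real system can do partial matching against variants that bake only some options, but the root-fallback guarantee is the key property: a missing variant is a performance issue, never a correctness issue.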
Metrics Based
The runtime will collect data about which variants are being requested, including both material shader options and system shader options. The AP (Asset Processor) will compile variants based on the history of what has been requested in the past.
Pros
Cons
Design possibilities
For each requested variant, we would probably want to store:
The bigger question is how the data would be stored:
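As one illustrative possibility (a sketch, not a committed design), the stored data could be as simple as a frequency counter keyed by shader and option values, which the AP then queries for the most-requested variants:

```python
from collections import Counter

def record_request(log, shader_path, requested_options):
    """Record one runtime request for a (shader, option values) pair."""
    # Option dicts are unhashable; freeze them into a sorted tuple key.
    key = (shader_path, tuple(sorted(requested_options.items())))
    log[key] += 1

def variants_to_bake(log, budget):
    """Return the `budget` most frequently requested variants."""
    return [key for key, _count in log.most_common(budget)]

log = Counter()
record_request(log, "StandardPBR.shader", {"o_enableShadows": "true"})
record_request(log, "StandardPBR.shader", {"o_enableShadows": "true"})
record_request(log, "StandardPBR.shader", {"o_enableShadows": "false"})

top = variants_to_bake(log, budget=1)
assert top == [("StandardPBR.shader", (("o_enableShadows", "true"),))]
```

A real implementation would likely also track recency so stale variants age out, but a frequency-plus-budget scheme captures the core idea.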
Material Based
The AP will compile variants based on the shader option permutation that each .material needs.
Pros
Cons
Design possibilities
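One way to picture the material-based approach is the following Python sketch: each .material resolves its properties to concrete shader option values, and the AP bakes one variant per distinct option set. The property names, option names, and mapping mechanism here are made up for illustration:

```python
def options_from_material(material_properties, property_to_option):
    """Map a material's property values to shader option values."""
    return {
        property_to_option[prop]: value
        for prop, value in material_properties.items()
        if prop in property_to_option  # other properties don't affect the PSO
    }

def collect_variants(materials, property_to_option):
    """Deduplicate the option sets needed across all materials in a project."""
    seen = set()
    for props in materials:
        options = options_from_material(props, property_to_option)
        seen.add(tuple(sorted(options.items())))
    return seen

mapping = {"enableShadows": "o_enableShadows"}
materials = [
    {"enableShadows": "true", "baseColor": "red"},
    {"enableShadows": "true", "baseColor": "blue"},  # same option set as above
    {"enableShadows": "false"},
]

variants = collect_variants(materials, mapping)
assert len(variants) == 2  # two distinct option sets, not three materials
```

Note how two materials that differ only in non-option properties collapse into one variant, which is what keeps this approach proportional to distinct option usage rather than material count.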
Automatic Enumeration
The AP can loop through available shader options, exploring the permutation space and automatically selecting permutations to compile. This is best-suited for system options. We usually can't exhaustively enumerate the full permutation space because it's just too large, so we would have to employ one or more ways of filtering variants, through some central project configuration.
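A minimal sketch of filtered enumeration, assuming an allow-list and a variant budget as the filtering mechanisms (both names and the filtering choices are illustrative):

```python
from itertools import islice, product

def enumerate_variants(option_values, allow_list, budget):
    """Yield at most `budget` permutations over the allow-listed options."""
    names = [n for n in option_values if n in allow_list]
    combos = product(*(option_values[n] for n in names))
    for combo in islice(combos, budget):
        yield dict(zip(names, combo))

option_values = {
    "o_enableShadows": ["false", "true"],
    "o_enableIBL": ["false", "true"],
    "o_debugRenderMode": ["false", "true"],  # runtime-only: excluded below
}
allow = {"o_enableShadows", "o_enableIBL"}

variants = list(enumerate_variants(option_values, allow, budget=10))
assert len(variants) == 4  # 2 x 2, well under the budget
assert all("o_debugRenderMode" not in v for v in variants)
```

Even this toy example shows why filtering matters: each additional allow-listed boolean option doubles the permutation count, so the central configuration is what keeps enumeration tractable.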
Pros
Cons
Design possibilities