---
In the near term, I propose we take this approach, as it should be the easiest to implement and will address the immediate need.
---
Thanks @santorac for the great writeup! I wanted to add a few more thoughts to the discussion (much of this was brought up internally, but it warrants mention in the broader community as well).

Some thoughts on performance

When specializing shader code to a particular variant, performance characteristics change in discrete chunks. That is, as segments of code are statically unrolled, the shader's performance might not change at all (or might get slightly worse) for a while, until it suddenly becomes markedly better than before. There are myriad reasons why this is the case.
There are several cases where specializing a shader is known to have an appreciable impact. Note that even in the second case, scalarizing the shader can sometimes alleviate register-allocation issues. Every specialized shader that doesn't fall into one of the above categories of improvement (or something similar) actually hurts performance.
What this means is that for shaders that expose tons of shader options poorly, even if we could snap our fingers and instantly generate the full power set of variants, most of those variants would hurt performance rather than help.

Some high level suggestions

It is very difficult for automated systems to discover, in general, the discrete steps at which shader specialization improves performance. Brute force would technically work, but it has enormous costs in shader compilation time, rendering, and metrics gathering. A brute-force approach is also brittle, since any appreciable change to the shader code will likely change the outcome of the search. In my opinion, it is important to first ensure that the shader options employed are options likely to be meaningful to the performance characteristics of the shader (as described above). A number of shader options currently implemented alter the material's behavior in ways that aren't likely important to the shader's performance (they could stay as a runtime branch forever and nobody would notice much). As an alternative to the above, describing an allow-list of shader options that are allowed to be unrolled at build time would effectively partition the set of shader options into those that remain strictly runtime switches and those that may ultimately affect the bound PSO. This list could also annotate a set of options that should always be unrolled. Either way, before investing in more costly/expensive mechanisms for pruning the variant tree, I believe it's important to first make that variant tree as small as possible to start.
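A configuration for such an allow-list might look something like the following sketch. The file shape, key names, and option names here are all hypothetical illustrations, not an existing O3DE schema:

```json
{
    "ShaderOptionUnrollPolicy": {
        "AlwaysUnroll": ["o_enableShadows", "o_enableIBL"],
        "AllowUnroll": ["o_parallaxMapping"],
        "RuntimeOnly": ["o_debugRenderMode"]
    }
}
```

Options in "RuntimeOnly" would never multiply the variant set, while "AlwaysUnroll" options would be baked into every generated variant.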
---
Currently we can have a .shadervariantlist file for each .shader, which simply lists every variant that should be compiled for that shader. The current process is explained pretty well here: https://github.com/o3de/o3de/wiki/%5BAtom%5D-Shader-Management-Console-(SMC)
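For context, a .shadervariantlist is a JSON file along these lines. This sketch is written from memory; see the wiki page above for the authoritative format, and treat the field names as approximate:

```json
{
    "Shader": "Shaders/SomeShader.shader",
    "Variants": [
        {
            "StableId": 1,
            "Options": {
                "o_enableShadows": "true",
                "o_qualityLevel": "Quality::Medium"
            }
        }
    ]
}
```

Each entry pins a concrete value for some subset of the shader's options, and the build produces one compiled variant per entry.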
This process has some gaps and is difficult to use, the workflow will not scale well to dozens of shaders and thousands of variants, and the problem is about to get worse as we introduce the material pipeline feature (see https://github.com/o3de/sig-graphics-audio/blob/main/rfcs/MaterialPipelineAbstraction.md).
In this document we are mainly concerned with material shaders, especially the forward lighting passes, as those tend to have the most VGPR pressure. Different strategies might be necessary for non-material shader variants, or if a deferred render pipeline is being used rather than the forward+ pipeline that ships with o3de.
High Level Approaches
Note there are two families of shader options: material shader options (whose values are driven by each material's properties) and system shader options (whose values are driven by the engine and render pipeline at runtime).
There are three high level approaches that we can take: metrics based, material based, and automatic enumeration.
There are many ways we could combine these different approaches to augment each other and cover a wider range of use cases. For example, the pipeline could start with a .material file to find values for the material shader options, then combine this with metrics about system shader options that have been recently used at runtime, and also enumerate the permutation space for a couple known high-impact shader options.
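Whatever the sources, a combined pipeline ultimately reduces to merging sets of requested option values. A minimal Python sketch of that merge, with illustrative data rather than engine code:

```python
# Each source (materials, runtime metrics, enumeration) yields option
# dicts; combining them is a deduplicating union. Names are illustrative.
def merge_variant_sources(*sources):
    """Union several lists of option dicts, removing duplicates."""
    merged = set()
    for source in sources:
        for options in source:
            # Dicts are unhashable; freeze into a sorted tuple for the set.
            merged.add(tuple(sorted(options.items())))
    return [dict(items) for items in merged]

from_materials = [{"o_enableShadows": "true"}]
from_metrics = [{"o_enableShadows": "true"}, {"o_enableIBL": "true"}]

merged = merge_variant_sources(from_materials, from_metrics)
assert len(merged) == 2  # the duplicate request is collapsed
```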
Keep in mind that the shader variant system supports fallback to a common "root" variant that reads shader option values through a constant buffer. So if a requested shader variant is not found at runtime, the system will always be able to use the root variant to get the exact same visual result, just with potentially lower performance. The shader build pipeline should ensure there are enough baked shader variants available, so the root variant is used infrequently enough to have negligible impact on overall performance.
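To make the fallback behavior concrete, here is a minimal Python sketch of the lookup described above. The names and the exact-match rule are illustrative, not the actual O3DE runtime API:

```python
# A "variant" is identified by the option values baked into it; the root
# variant has nothing baked and reads every option from a constant buffer.
def find_variant(compiled_variants, requested_options):
    """Return a baked variant matching the request, else the root variant."""
    for variant in compiled_variants:
        if variant["baked_options"] == requested_options:
            return variant
    # Fall back to the root variant: same visual result, but every option
    # becomes a runtime branch driven by the constant buffer.
    return next(v for v in compiled_variants if v["baked_options"] == {})

variants = [
    {"name": "root", "baked_options": {}},
    {"name": "shadows_on", "baked_options": {"o_enableShadows": True}},
]

assert find_variant(variants, {"o_enableShadows": True})["name"] == "shadows_on"
assert find_variant(variants, {"o_enableShadows": False})["name"] == "root"
```

The real system can do partial matching against variants that bake only some options, but the root-fallback guarantee is the key property: a missing variant is a performance issue, never a correctness issue.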
Metrics Based
The runtime will collect data about which variants are being requested, including both material shader options and system shader options. The AP (Asset Processor) will compile variants based on the history of what has been requested in the past.
Pros
Cons
Design possibilities
For each requested variant, we would probably want to store:
The bigger question is how the data would be stored:
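As one illustrative possibility (a sketch, not a committed design), the stored data could be as simple as a frequency counter keyed by shader and option values, which the AP then queries for the most-requested variants:

```python
from collections import Counter

def record_request(log, shader_path, requested_options):
    """Record one runtime request for a (shader, option values) pair."""
    # Option dicts are unhashable; freeze them into a sorted tuple key.
    key = (shader_path, tuple(sorted(requested_options.items())))
    log[key] += 1

def variants_to_bake(log, budget):
    """Return the `budget` most frequently requested variants."""
    return [key for key, _count in log.most_common(budget)]

log = Counter()
record_request(log, "StandardPBR.shader", {"o_enableShadows": "true"})
record_request(log, "StandardPBR.shader", {"o_enableShadows": "true"})
record_request(log, "StandardPBR.shader", {"o_enableShadows": "false"})

top = variants_to_bake(log, budget=1)
assert top == [("StandardPBR.shader", (("o_enableShadows", "true"),))]
```

A real implementation would likely also track recency so stale variants age out, but a frequency-plus-budget scheme captures the core idea.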
Material Based
The AP will compile variants based on the shader option permutation that each .material needs.
Pros
Cons
Design possibilities
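One way to picture the material-based approach is the following Python sketch: each .material resolves its properties to concrete shader option values, and the AP bakes one variant per distinct option set. The property names, option names, and mapping mechanism here are made up for illustration:

```python
def options_from_material(material_properties, property_to_option):
    """Map a material's property values to shader option values."""
    return {
        property_to_option[prop]: value
        for prop, value in material_properties.items()
        if prop in property_to_option  # other properties don't affect the PSO
    }

def collect_variants(materials, property_to_option):
    """Deduplicate the option sets needed across all materials in a project."""
    seen = set()
    for props in materials:
        options = options_from_material(props, property_to_option)
        seen.add(tuple(sorted(options.items())))
    return seen

mapping = {"enableShadows": "o_enableShadows"}
materials = [
    {"enableShadows": "true", "baseColor": "red"},
    {"enableShadows": "true", "baseColor": "blue"},  # same option set as above
    {"enableShadows": "false"},
]

variants = collect_variants(materials, mapping)
assert len(variants) == 2  # two distinct option sets, not three materials
```

Note how two materials that differ only in non-option properties collapse into one variant, which is what keeps this approach proportional to distinct option usage rather than material count.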
Automatic Enumeration
The AP can loop through available shader options, exploring the permutation space and automatically selecting permutations to compile. This is best-suited for system options. We usually can't exhaustively enumerate the full permutation space because it's just too large, so we would have to employ one or more ways of filtering variants, through some central project configuration.
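A minimal sketch of filtered enumeration, assuming an allow-list and a variant budget as the filtering mechanisms (both names and the filtering choices are illustrative):

```python
from itertools import islice, product

def enumerate_variants(option_values, allow_list, budget):
    """Yield at most `budget` permutations over the allow-listed options."""
    names = [n for n in option_values if n in allow_list]
    combos = product(*(option_values[n] for n in names))
    for combo in islice(combos, budget):
        yield dict(zip(names, combo))

option_values = {
    "o_enableShadows": ["false", "true"],
    "o_enableIBL": ["false", "true"],
    "o_debugRenderMode": ["false", "true"],  # runtime-only: excluded below
}
allow = {"o_enableShadows", "o_enableIBL"}

variants = list(enumerate_variants(option_values, allow, budget=10))
assert len(variants) == 4  # 2 x 2, well under the budget
assert all("o_debugRenderMode" not in v for v in variants)
```

Even this toy example shows why filtering matters: each additional allow-listed boolean option doubles the permutation count, so the central configuration is what keeps enumeration tractable.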
Pros
Cons
Design possibilities