- Proposal: 0025
- Author(s): Anupama Chandrasekhar, Mike Apodaca
- Sponsor: Damyan Pepper
- Status: Under Consideration

This specification describes the HLSL and DXIL details for a new `NodeOutputArray` attribute, `[MaxRecordsPerNode(count)]`, that specifies the maximum number of records that can be output to a specific node in a node output array. See the MaxRecordsPerNode specifications for more details.

For `NodeOutputArray`, the node output attribute `[MaxRecords(count)]` specifies the maximum number of records that can be output across the entire node array. This attribute alone is insufficient for determining how records are distributed across an output array. For example, consider an output node array specification of `[MaxRecords(N)][NodeArraySize(N)]`. All N records could be sent to one node in the array, one record could be sent to each of the N nodes in the array, or the records could be spread in an arbitrary fashion across multiple nodes in the array. An implementation cannot distinguish these different use cases.
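
The ambiguity can be illustrated with a minimal sketch. The two producers below declare identical output limits yet distribute their records very differently. The record type `RECORD`, its `value` field, and the node and parameter names are hypothetical, chosen only for illustration.

```hlsl
// Hypothetical record type used only for this illustration.
struct RECORD { uint value; };

// Both producers declare [MaxRecords(4)][NodeArraySize(4)], so the
// declarations alone do not reveal how records are distributed.

[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void AllToOneNode(
    [MaxRecords(4)]
    [NodeArraySize(4)] NodeOutputArray<RECORD> Consumers)
{
    // All 4 records are written to node 0 of the array.
    ThreadNodeOutputRecords<RECORD> recs = Consumers[0].GetThreadNodeOutputRecords(4);
    for (uint i = 0; i < 4; ++i)
    {
        recs.Get(i).value = i;
    }
    recs.OutputComplete();
}

[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void OneToEachNode(
    [MaxRecords(4)]
    [NodeArraySize(4)] NodeOutputArray<RECORD> Consumers)
{
    // One record is written to each of the 4 nodes in the array.
    [unroll] for (uint i = 0; i < 4; ++i)
    {
        ThreadNodeOutputRecords<RECORD> rec = Consumers[i].GetThreadNodeOutputRecords(1);
        rec.Get().value = i;
        rec.OutputComplete();
    }
}
```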
When determining backing store memory requirements, an implementation must assume the worst case of `MaxRecords` records being written to any single node in the output array. However, a common use case is for a small number of records to be written to select nodes in a very large array of nodes. Some implementations can take advantage of this knowledge to significantly reduce the backing store memory requirements while maintaining peak performance.

We propose a new node output attribute called `MaxRecordsPerNode`. This attribute is only required for output node arrays. It specifies the maximum number of records that can be written to any single output node within a node array.
Add a new node output attribute:

Attribute | Required | Description |
---|---|---|
`[MaxRecordsPerNode(count)]` | Y | For `NodeOutputArray`, specifies the maximum number of records that can be output to a node within the array. Exceeding this results in undefined behavior. This attribute can be overridden via the `NumOutputOverrides` / `pOutputOverrides` option when constructing a work graph. This attribute has no impact on existing node output limits. |

This attribute will be required starting with a future Shader Model version. Since requiring it may cause compilation failures with existing Work Graphs, a missing attribute will be reported as a `DefaultError` warning assigned to a warning group named `hlsl-require-max-records-per-node`, which allows a command-line override. When the attribute is omitted, the value of `MaxRecordsPerNode` will be set equal to `MaxRecords`.
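
As a sketch of this fallback, the first declaration below omits the attribute and would be treated like the second, explicit one. The record type `RECORD` and the names `LegacyProducer`, `ExplicitProducer`, and `Consumers` are hypothetical.

```hlsl
// Hypothetical record type used only for this illustration.
struct RECORD { uint value; };

// Existing declaration without MaxRecordsPerNode. With the fallback
// described above, the per-node limit is taken from MaxRecords (16).
[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void LegacyProducer(
    [MaxRecords(16)]
    [NodeArraySize(64)] NodeOutputArray<RECORD> Consumers)
{
    ThreadNodeOutputRecords<RECORD> rec = Consumers[0].GetThreadNodeOutputRecords(1);
    rec.Get().value = 0;
    rec.OutputComplete();
}

// The same limits written out explicitly.
[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void ExplicitProducer(
    [MaxRecords(16)]
    [MaxRecordsPerNode(16)]
    [NodeArraySize(64)] NodeOutputArray<RECORD> Consumers)
{
    ThreadNodeOutputRecords<RECORD> rec = Consumers[0].GetThreadNodeOutputRecords(1);
    rec.Get().value = 0;
    rec.OutputComplete();
}
```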
The compiler will also generate an error if the `MaxRecordsPerNode` value is greater than the `MaxRecords` value in an HLSL shader. Note that `pMaxRecordsPerNode` may override this value, in which case the runtime validates correctness. See the feature spec for more details.
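
For illustration, a declaration of the following shape would be expected to trigger that compile-time error, since the per-node limit exceeds the limit for the whole array. Lowering the per-node value to 4 or less would make it valid. The record type `RECORD` and the names `InvalidProducer` and `Consumers` are hypothetical.

```hlsl
// Hypothetical record type used only for this illustration.
struct RECORD { uint value; };

[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void InvalidProducer(
    [MaxRecords(4)]          // at most 4 records across the whole array
    [MaxRecordsPerNode(8)]   // expected error: 8 is greater than MaxRecords (4)
    [NodeArraySize(16)] NodeOutputArray<RECORD> Consumers)
{
    ThreadNodeOutputRecords<RECORD> rec = Consumers[0].GetThreadNodeOutputRecords(1);
    rec.Get().value = 0;
    rec.OutputComplete();
}
```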
Developer's note: Implementations that do not support this attribute, or that ignore it, will not be functionally impacted.

The following trivial example demonstrates using `MaxRecordsPerNode` for a thread launch node which distributes one record to each node in an array of 64 consumer thread launch nodes.

```hlsl
[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void DispatchNode(
    [MaxRecords(64)]        // a maximum of 64 records are written to the output node array,
    [MaxRecordsPerNode(1)]  // but only 1 record is written to each node in the array
    [NodeArraySize(64)] NodeOutputArray<RECORD> ConsumerNodes )
{
    [unroll] for(uint i = 0; i < 64; ++i)
    {
        ThreadNodeOutputRecords<RECORD> outputRecord = ConsumerNodes[i].GetThreadNodeOutputRecords(1);
        ...
        outputRecord.OutputComplete();
    }
}
```
As mentioned above, some material shading algorithms have a similar pattern: a single node decides which node(s) in a node array (the materials) to execute, where the number of possible materials is large but the number of records submitted to any specific node is small relative to the size of the array.

A new metadata tag is added for `MaxRecordsPerNode`.

Tag | Tag Encoding | Value Type | Default |
---|---|---|---|
`kDxilNodeMaxRecordsPerNodeTag` | 7 | `i32` | Required. See the HLSL Additions section for backward compatibility with older Shader Models. |

The `MaxRecordsPerNode` information will be captured in RDAT. Similar to other node attributes, add an `RDAT::NodeAttribKind` named `MaxRecordsPerNode`.

An alternative considered was to modify the definition of the `MaxRecords` node output attribute:

attribute | required | description |
---|---|---|
`[MaxRecords(count, maxRecordsPerNode)]` | Y (this or below attribute) | Given a `uint count` declaration, the thread group can output `0...count` records to this output. The variant with `maxRecordsPerNode` is required for `NodeOutputArray`, where `count` applies across all the output nodes in the array and `maxRecordsPerNode` specifies the maximum number of records that can be written to a single output node within the array. Exceeding these limits results in undefined behavior. The value of `maxRecordsPerNode` must be less than or equal to the value of `count`. These attributes can be overridden via the `NumOutputOverrides` / `pOutputOverrides` option when constructing a work graph as part of the definition of a node. See Node output limits. |

Note: if the specification is `MaxRecords(count, maxRecordsPerNode)`, then multiple outputs that share a budget using `MaxRecordsSharedWith` must also share the same value for `maxRecordsPerNode`. While in many cases this might be correct, it locks this requirement into the spec and restricts an implementation's ability to distinguish cases where the values differ. We therefore prefer the option of specifying `MaxRecordsPerNode(count)` as a separate attribute.
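
The following sketch shows what the separate attribute permits: two output arrays sharing one record budget while declaring different per-node limits. It assumes `MaxRecordsPerNode` may be combined with `MaxRecordsSharedWith` in this way, which the text above implies but does not show. The record type `RECORD` and the names `Classifier`, `OpaqueMaterials`, and `TransparentMaterials` are hypothetical.

```hlsl
// Hypothetical record type used only for this illustration.
struct RECORD { uint value; };

[Shader("node")]
[NodeLaunch("thread")]
[NodeIsProgramEntry]
void Classifier(
    // Up to 32 records in total may be written across both output arrays.
    [MaxRecords(32)]
    [MaxRecordsPerNode(4)]   // at most 4 records to any single opaque material node
    [NodeArraySize(128)] NodeOutputArray<RECORD> OpaqueMaterials,

    // Shares the 32-record budget declared on OpaqueMaterials.
    [MaxRecordsSharedWith(OpaqueMaterials)]
    [MaxRecordsPerNode(1)]   // at most 1 record to any single transparent material node
    [NodeArraySize(16)] NodeOutputArray<RECORD> TransparentMaterials)
{
    // Write 4 records to one opaque material node.
    ThreadNodeOutputRecords<RECORD> opaqueRecs = OpaqueMaterials[0].GetThreadNodeOutputRecords(4);
    for (uint i = 0; i < 4; ++i)
    {
        opaqueRecs.Get(i).value = i;
    }
    opaqueRecs.OutputComplete();

    // Write 1 record to one transparent material node.
    ThreadNodeOutputRecords<RECORD> transparentRec = TransparentMaterials[0].GetThreadNodeOutputRecords(1);
    transparentRec.Get().value = 0;
    transparentRec.OutputComplete();
}
```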

Alternatively, this attribute could be made optional for maximum backward compatibility, i.e. existing SM6.8 Work Graphs would compile with the newer Shader Model. When `MaxRecordsPerNode` is not specified, its implicit value is equal to `MaxRecords`. This also avoids redundant attribute specifications for those usage models where the values of `MaxRecords` and `MaxRecordsPerNode` are identical. However, for performance reasons, this was made a required attribute with a compiler fallback for backward compatibility.