Construct `OpAccessChains` for `RawBufferLoad` properly, or overhaul the API #4986

devshgraphicsprogramming · 2023-01-31T23:57:04Z

struct Test {
    float4 mem1;
    float mem2;
    int mem3;
};

Glslang with GL_EXT_buffer_reference

%23 = OpAccessChain %_ptr_Uniform__ptr_PhysicalStorageBuffer_Test_r %_ %int_0
%24 = OpLoad %_ptr_PhysicalStorageBuffer_Test_r %23
%27 = OpAccessChain %_ptr_PhysicalStorageBuffer_float %24 %int_0 %int_1
%28 = OpLoad %float %27 Aligned 4

DXC with vk::RawBufferLoad<Test>(addr).mem2

%36 = OpBitcast %_ptr_PhysicalStorageBuffer_Test %35
%37 = OpLoad %Test %36 Aligned 4
%38 = OpCompositeExtract %float %37 1
    OpStore %30 %38

Now imagine I declared a large array for histogram or something as a member variable.

This is probably a dup of some issue (searched but couldn't find it in the ocean of open SPIR-V issues), since I'm certainly not the first person to discover this.

My suggestion to fix this is

namespace vk
{

namespace impl
{

// you'll need to magically generate and instantiate internally within Clang when you encounter it
extern template<typename T, typename... Parents>
struct AccessChain;
/*
{
public:
   // work out the OpAccesChain and then actually emit the OpLoad
   operator T() const;
   
   // all members redeclared but with types exchanged from U to AccessChain<U,T,Parents...>
};
*/

}

template<typename T>
AccessChain<T> RawBufferLoad(uint64_t addr);
}

or whatever else you're able to craft with the AST privilaged access you have, such as working out the OpAccessChain while looking at AST branch the . member operators will emit.

The text was updated successfully, but these errors were encountered:

natevm · 2023-02-01T00:24:56Z

I don’t see what the issue with casting is. It is within the SPIR-V specification that OpBitcast can be used to convert integers into pointers.

Further, I see correct runtime behavior with both raw buffer load and store, on Intel, NVIDIA and AMD GPUs.

Can you articulate what exactly the advantage would be in switching from the bitcast to the OpAccessChain pattern here?

devshgraphicsprogramming · 2023-02-01T01:35:04Z

The OpBitcast isn't an issue, its about the last OpLoad

You will see correct behaviour*, but its up to the Driver SPIR-V to internal IR lowering compiler whether these loads and stores will get optimized.

This is functionally identical to

GLSL

Test* v36;
float v38;

Test v37 = *v36;
v38 = v37.mem2;

HLSL

Test* v24;
float v28;

v28 = v24->mem2;

*Now imagine I wanted to do a read-modify write on a coherent/volatile pointer.
*Also I certainly wouldn't be able to perform any atomics on mem2 with this syntax.

natevm · 2023-02-01T01:49:31Z

Wrt atomics, I’ve outlined in my issue how such behavior would work, and it certainly would work if the SPIR-V spec is to be believed… Simply use the OpAtomicCompareExchange instruction…

With a static single assignment compiler like LLVM, I don’t see why these cases couldn’t be trivially optimized.

devshgraphicsprogramming · 2023-02-01T01:53:04Z

Wrt atomics, I’ve outlined in my issue how such behavior would work, and it certainly would work if the SPIR-V spec is to be believed… Simply use the OpAtomicCompareExchange instruction…

With a single static assignment compiler like LLVM, I don’t see why these cases couldn’t be trivially optimized.

yes they can be optimized, but the SPIR-V can't be, you'll be doing an OpLoad and an OpStore on Test with DXC.

Also for atomics you just simply wouldn't be able to perform one within the confines of the current AST/compiler, without doing your own pointer arithmetic to find the uint address.

Which wont get optimized away if someone throws coherent/volatile into the mix on them.

Anyway I see that this issue should be a part of #4289

natevm · 2023-02-01T01:57:46Z

Currently users do their own arithmetic on addresses with the raw buffer calls, as HLSL does not support pointer types.

I’m confused by what you mean this can’t be done re atomics. I can quite literally emit the SPIR-V described in my issue and the atomic compare and exchange works just fine on a raw device address to an integer.

kevyuu · 2023-02-24T12:12:38Z

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

devshgraphicsprogramming · 2023-02-24T16:58:48Z

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

There's also this #4620

devshgraphicsprogramming · 2023-02-24T17:01:34Z

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change
Devsh-Graphics-Programming/DirectXShaderCompiler@main...greg-lunarg:DirectXShaderCompiler:bda5

Which will actually emit the proper OpAccessChains when compiled to SPIR-V

natevm · 2023-02-24T18:17:23Z

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change Devsh-Graphics-Programming/[email protected]:DirectXShaderCompiler:bda5

Which will actually emit the proper OpAccessChains when compiled to SPIR-V

What’s the proposed syntax change? Doing a quick glance at the link you provided, I can’t seem to find it…

natevm · 2023-02-24T18:18:46Z

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

I’ve ran into a similar issue storing structs of data to memory, but have found it depends on the GPU vendor. Intel Arc properly handles structure storage with the current (non-opchain) approach, but NVIDIA produces the errors described in the issue you mention, and AMD crashes altogether.

I worry opchains aren’t going to fix driver bugs. Maybe storing fields one by one in a struct would solve some of these issues, but I’ve also ran into a crash on AMD when attempting to store float4x4’s and float3x4s.

devshgraphicsprogramming · 2023-02-24T22:45:04Z

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change Devsh-Graphics-Programming/[email protected]:DirectXShaderCompiler:bda5
Which will actually emit the proper OpAccessChains when compiled to SPIR-V

What’s the proposed syntax change? Doing a quick glance at the link you provided, I can’t seem to find it…

I was given some examples privately by the author, but would need to check if its okay to share them.

greg-lunarg · 2023-02-24T23:17:42Z

Yes, by all means, feel free to share. All the samples are sharable.

One is derived from Baldur Karlson's demo of VK_EXT_Buffer_Address in his RenderDoc repo. He used a GLSL shader. I tried to keep my HLSL-based design close to what is in GLSL.

greg-lunarg · 2023-02-24T23:18:51Z

I am close to trying to merge that branch. I just have one more detail to implement: the buffer reference alignment.

devshgraphicsprogramming · 2023-02-24T23:21:11Z

[[vk::buffer_ref(16)]] typedef struct Globals_s
{
      float4 g_vSomeConstantA;
      float4 g_vTestFloat4;
      float4 g_vSomeConstantB;
} Globals_t;

struct TestPushConstant_t
{
      Globals_t m_nBufferDeviceAddress;
};

[[vk::push_constant]] TestPushConstant_t g_PushConstants;

float4 MainPs(void) : SV_Target0
{
      float4 vTest = g_PushConstants.m_nBufferDeviceAddress.g_vTestFloat4;
      return vTest;
}

produces:

; SPIR-V
; Version: 1.6
; Generator: Google spiregg; 0
; Bound: 30
; Schema: 0
               OpCapability Shader
               OpCapability PhysicalStorageBufferAddresses
               OpExtension "SPV_KHR_physical_storage_buffer"
               OpMemoryModel PhysicalStorageBuffer64 GLSL450
               OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_PushConstants
               OpExecutionMode %MainPs OriginUpperLeft
               OpSource HLSL 600
               OpName %type_PushConstant_TestPushConstant_t "type.PushConstant.TestPushConstant_t"
               OpMemberName %type_PushConstant_TestPushConstant_t 0 "m_nBufferDeviceAddress"
               OpName %Globals_s "Globals_s"
               OpMemberName %Globals_s 0 "g_vSomeConstantA"
               OpMemberName %Globals_s 1 "g_vTestFloat4"
               OpMemberName %Globals_s 2 "g_vSomeConstantB"
               OpName %g_PushConstants "g_PushConstants"
               OpName %out_var_SV_Target0 "out.var.SV_Target0"
               OpName %MainPs "MainPs"
               OpName %src_MainPs "src.MainPs"
               OpName %bb_entry "bb.entry"
               OpName %vTest "vTest"
               OpDecorate %out_var_SV_Target0 Location 0
               OpMemberDecorate %Globals_s 0 Offset 0
               OpMemberDecorate %Globals_s 1 Offset 16
               OpMemberDecorate %Globals_s 2 Offset 32
               OpMemberDecorate %type_PushConstant_TestPushConstant_t 0 Offset 0
               OpDecorate %type_PushConstant_TestPushConstant_t Block
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
      %int_1 = OpConstant %int 1
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
  %Globals_s = OpTypeStruct %v4float %v4float %v4float
%_ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PhysicalStorageBuffer %Globals_s
%type_PushConstant_TestPushConstant_t = OpTypeStruct %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PushConstant_type_PushConstant_TestPushConstant_t = OpTypePointer PushConstant %type_PushConstant_TestPushConstant_t
%_ptr_Output_v4float = OpTypePointer Output %v4float
       %void = OpTypeVoid
         %15 = OpTypeFunction %void
         %19 = OpTypeFunction %v4float
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PushConstant %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PhysicalStorageBuffer_v4float = OpTypePointer PhysicalStorageBuffer %v4float
%g_PushConstants = OpVariable %_ptr_PushConstant_type_PushConstant_TestPushConstant_t PushConstant
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
     %MainPs = OpFunction %void None %15
         %16 = OpLabel
         %17 = OpFunctionCall %v4float %src_MainPs
               OpStore %out_var_SV_Target0 %17
               OpReturn
               OpFunctionEnd
 %src_MainPs = OpFunction %v4float None %19
   %bb_entry = OpLabel
      %vTest = OpVariable %_ptr_Function_v4float Function
         %24 = OpAccessChain %_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s %g_PushConstants %int_0
         %25 = OpLoad %_ptr_PhysicalStorageBuffer_Globals_s %24
         %27 = OpAccessChain %_ptr_PhysicalStorageBuffer_v4float %25 %int_1
         %28 = OpLoad %v4float %27 Aligned 16
               OpStore %vTest %28
         %29 = OpLoad %v4float %vTest
               OpReturnValue %29
               OpFunctionEnd

devshgraphicsprogramming · 2023-02-24T23:22:06Z

Here are other samples Greg sent me...

other_samples.zip

devshgraphicsprogramming · 2023-02-24T23:24:13Z

@greg-lunarg do you have any tests & examples of SPIR-V emitted if I try to do something funky like atomics?

P.S. Also where would I put my globallycoherent decoration? on the typedef before the [[vk::buffer_ref]] or somewhere else?

greg-lunarg · 2023-02-25T00:06:09Z

I haven't looked at atomics or StructuredBuffers yet. If you can point me to some sample shaders and what you want to do I might be able to give some ideas.

devshgraphicsprogramming · 2023-09-08T13:39:57Z

I'm going to withdraw my suggestion to fix this with

template<typename T>
AccessChain<T> RawBufferLoad(uint64_t addr);

as it looks pretty ironed out in proposal microsoft/hlsl-specs#84 and we just need the proposal implemented.

devshgraphicsprogramming mentioned this issue Jan 31, 2023

[SPIR-V] Atomics can't be used with vk::RawBuffer commands #4983

Closed

natevm mentioned this issue Feb 1, 2023

Construct OpAccessChains for RawBufferLoad properly, or overhaul the API #4985

Closed

devshgraphicsprogramming mentioned this issue Feb 1, 2023

[SPIR-V] support loading "structured" data from a random device address #4289

Closed

pow2clk added the spirv Work related to SPIR-V label Feb 9, 2023

danginsburg mentioned this issue Mar 9, 2023

WIP: [SPIR-V] Add support for vk::buffer_ref attribute #5089

Closed

sudonatalie assigned sudonatalie and unassigned sudonatalie Aug 10, 2023

devshgraphicsprogramming closed this as completed Sep 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Construct `OpAccessChains` for `RawBufferLoad` properly, or overhaul the API #4986

Construct `OpAccessChains` for `RawBufferLoad` properly, or overhaul the API #4986

devshgraphicsprogramming commented Jan 31, 2023 •

edited

Loading

natevm commented Feb 1, 2023

devshgraphicsprogramming commented Feb 1, 2023 •

edited

Loading

natevm commented Feb 1, 2023 •

edited

Loading

devshgraphicsprogramming commented Feb 1, 2023

natevm commented Feb 1, 2023

kevyuu commented Feb 24, 2023 •

edited

Loading

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

natevm commented Feb 24, 2023

natevm commented Feb 24, 2023 •

edited

Loading

devshgraphicsprogramming commented Feb 24, 2023

greg-lunarg commented Feb 24, 2023

greg-lunarg commented Feb 24, 2023 •

edited

Loading

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

greg-lunarg commented Feb 25, 2023

devshgraphicsprogramming commented Sep 8, 2023

Construct OpAccessChains for RawBufferLoad properly, or overhaul the API #4986

Construct OpAccessChains for RawBufferLoad properly, or overhaul the API #4986

Comments

devshgraphicsprogramming commented Jan 31, 2023 • edited Loading

natevm commented Feb 1, 2023

devshgraphicsprogramming commented Feb 1, 2023 • edited Loading

GLSL

HLSL

natevm commented Feb 1, 2023 • edited Loading

devshgraphicsprogramming commented Feb 1, 2023

natevm commented Feb 1, 2023

kevyuu commented Feb 24, 2023 • edited Loading

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

natevm commented Feb 24, 2023

natevm commented Feb 24, 2023 • edited Loading

devshgraphicsprogramming commented Feb 24, 2023

greg-lunarg commented Feb 24, 2023

greg-lunarg commented Feb 24, 2023 • edited Loading

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

devshgraphicsprogramming commented Feb 24, 2023

greg-lunarg commented Feb 25, 2023

devshgraphicsprogramming commented Sep 8, 2023

Construct `OpAccessChains` for `RawBufferLoad` properly, or overhaul the API #4986

Construct `OpAccessChains` for `RawBufferLoad` properly, or overhaul the API #4986

devshgraphicsprogramming commented Jan 31, 2023 •

edited

Loading

devshgraphicsprogramming commented Feb 1, 2023 •

edited

Loading

natevm commented Feb 1, 2023 •

edited

Loading

kevyuu commented Feb 24, 2023 •

edited

Loading

natevm commented Feb 24, 2023 •

edited

Loading

greg-lunarg commented Feb 24, 2023 •

edited

Loading