Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct OpAccessChains for RawBufferLoad properly, or overhaul the API #4986

Closed
devshgraphicsprogramming opened this issue Jan 31, 2023 · 18 comments
Labels
spirv Work related to SPIR-V

Comments

@devshgraphicsprogramming
Copy link

devshgraphicsprogramming commented Jan 31, 2023

struct Test {
    float4 mem1;
    float mem2;
    int mem3;
};

Glslang with GL_EXT_buffer_reference

%23 = OpAccessChain %_ptr_Uniform__ptr_PhysicalStorageBuffer_Test_r %_ %int_0
%24 = OpLoad %_ptr_PhysicalStorageBuffer_Test_r %23
%27 = OpAccessChain %_ptr_PhysicalStorageBuffer_float %24 %int_0 %int_1
%28 = OpLoad %float %27 Aligned 4

DXC with vk::RawBufferLoad<Test>(addr).mem2

%36 = OpBitcast %_ptr_PhysicalStorageBuffer_Test %35
%37 = OpLoad %Test %36 Aligned 4
%38 = OpCompositeExtract %float %37 1
    OpStore %30 %38

Now imagine I declared a large array for histogram or something as a member variable.

This is probably a dup of some issue (searched but couldn't find it in the ocean of open SPIR-V issues), since I'm certainly not the first person to discover this.

My suggestion to fix this is

namespace vk
{

namespace impl
{

// you'll need to magically generate and instantiate internally within Clang when you encounter it
extern template<typename T, typename... Parents>
struct AccessChain;
/*
{
public:
   // work out the OpAccesChain and then actually emit the OpLoad
   operator T() const;
   
   // all members redeclared but with types exchanged from U to AccessChain<U,T,Parents...>
};
*/

}

template<typename T>
AccessChain<T> RawBufferLoad(uint64_t addr);
}

or whatever else you're able to craft with the AST privilaged access you have, such as working out the OpAccessChain while looking at AST branch the . member operators will emit.

@natevm
Copy link
Contributor

natevm commented Feb 1, 2023

I don’t see what the issue with casting is. It is within the SPIR-V specification that OpBitcast can be used to convert integers into pointers.

Further, I see correct runtime behavior with both raw buffer load and store, on Intel, NVIDIA and AMD GPUs.

Can you articulate what exactly the advantage would be in switching from the bitcast to the OpAccessChain pattern here?

@devshgraphicsprogramming
Copy link
Author

devshgraphicsprogramming commented Feb 1, 2023

The OpBitcast isn't an issue, its about the last OpLoad

You will see correct behaviour*, but its up to the Driver SPIR-V to internal IR lowering compiler whether these loads and stores will get optimized.

This is functionally identical to

GLSL

Test* v36;
float v38;

Test v37 = *v36;
v38 = v37.mem2;

HLSL

Test* v24;
float v28;

v28 = v24->mem2;

*Now imagine I wanted to do a read-modify write on a coherent/volatile pointer.
*Also I certainly wouldn't be able to perform any atomics on mem2 with this syntax.

@natevm
Copy link
Contributor

natevm commented Feb 1, 2023

Wrt atomics, I’ve outlined in my issue how such behavior would work, and it certainly would work if the SPIR-V spec is to be believed… Simply use the OpAtomicCompareExchange instruction…

With a static single assignment compiler like LLVM, I don’t see why these cases couldn’t be trivially optimized.

@devshgraphicsprogramming
Copy link
Author

Wrt atomics, I’ve outlined in my issue how such behavior would work, and it certainly would work if the SPIR-V spec is to be believed… Simply use the OpAtomicCompareExchange instruction…

With a single static assignment compiler like LLVM, I don’t see why these cases couldn’t be trivially optimized.

yes they can be optimized, but the SPIR-V can't be, you'll be doing an OpLoad and an OpStore on Test with DXC.

Also for atomics you just simply wouldn't be able to perform one within the confines of the current AST/compiler, without doing your own pointer arithmetic to find the uint address.

Which wont get optimized away if someone throws coherent/volatile into the mix on them.

Anyway I see that this issue should be a part of #4289

@natevm
Copy link
Contributor

natevm commented Feb 1, 2023

Currently users do their own arithmetic on addresses with the raw buffer calls, as HLSL does not support pointer types.

I’m confused by what you mean this can’t be done re atomics. I can quite literally emit the SPIR-V described in my issue and the atomic compare and exchange works just fine on a raw device address to an integer.

@pow2clk pow2clk added the spirv Work related to SPIR-V label Feb 9, 2023
@kevyuu
Copy link

kevyuu commented Feb 24, 2023

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

@devshgraphicsprogramming
Copy link
Author

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

There's also this #4620

@devshgraphicsprogramming
Copy link
Author

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change
Devsh-Graphics-Programming/DirectXShaderCompiler@main...greg-lunarg:DirectXShaderCompiler:bda5

Which will actually emit the proper OpAccessChains when compiled to SPIR-V

@natevm
Copy link
Contributor

natevm commented Feb 24, 2023

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change Devsh-Graphics-Programming/[email protected]:DirectXShaderCompiler:bda5

Which will actually emit the proper OpAccessChains when compiled to SPIR-V

What’s the proposed syntax change? Doing a quick glance at the link you provided, I can’t seem to find it…

@natevm
Copy link
Contributor

natevm commented Feb 24, 2023

Is this related to this issue ? vk::RawBufferLoad for some reason compiles differently from ByteAddressBuffer.Load in dxc.

I’ve ran into a similar issue storing structs of data to memory, but have found it depends on the GPU vendor. Intel Arc properly handles structure storage with the current (non-opchain) approach, but NVIDIA produces the errors described in the issue you mention, and AMD crashes altogether.

I worry opchains aren’t going to fix driver bugs. Maybe storing fields one by one in a struct would solve some of these issues, but I’ve also ran into a crash on AMD when attempting to store float4x4’s and float3x4s.

@devshgraphicsprogramming
Copy link
Author

There's actually this repo from LunarG which proposes a [in my opinion beautiful] syntax change Devsh-Graphics-Programming/[email protected]:DirectXShaderCompiler:bda5
Which will actually emit the proper OpAccessChains when compiled to SPIR-V

What’s the proposed syntax change? Doing a quick glance at the link you provided, I can’t seem to find it…

I was given some examples privately by the author, but would need to check if its okay to share them.

@greg-lunarg
Copy link
Contributor

Yes, by all means, feel free to share. All the samples are sharable.

One is derived from Baldur Karlson's demo of VK_EXT_Buffer_Address in his RenderDoc repo. He used a GLSL shader. I tried to keep my HLSL-based design close to what is in GLSL.

@greg-lunarg
Copy link
Contributor

greg-lunarg commented Feb 24, 2023

I am close to trying to merge that branch. I just have one more detail to implement: the buffer reference alignment.

@devshgraphicsprogramming
Copy link
Author

[[vk::buffer_ref(16)]] typedef struct Globals_s
{
      float4 g_vSomeConstantA;
      float4 g_vTestFloat4;
      float4 g_vSomeConstantB;
} Globals_t;

struct TestPushConstant_t
{
      Globals_t m_nBufferDeviceAddress;
};

[[vk::push_constant]] TestPushConstant_t g_PushConstants;

float4 MainPs(void) : SV_Target0
{
      float4 vTest = g_PushConstants.m_nBufferDeviceAddress.g_vTestFloat4;
      return vTest;
}

produces:

; SPIR-V
; Version: 1.6
; Generator: Google spiregg; 0
; Bound: 30
; Schema: 0
               OpCapability Shader
               OpCapability PhysicalStorageBufferAddresses
               OpExtension "SPV_KHR_physical_storage_buffer"
               OpMemoryModel PhysicalStorageBuffer64 GLSL450
               OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_PushConstants
               OpExecutionMode %MainPs OriginUpperLeft
               OpSource HLSL 600
               OpName %type_PushConstant_TestPushConstant_t "type.PushConstant.TestPushConstant_t"
               OpMemberName %type_PushConstant_TestPushConstant_t 0 "m_nBufferDeviceAddress"
               OpName %Globals_s "Globals_s"
               OpMemberName %Globals_s 0 "g_vSomeConstantA"
               OpMemberName %Globals_s 1 "g_vTestFloat4"
               OpMemberName %Globals_s 2 "g_vSomeConstantB"
               OpName %g_PushConstants "g_PushConstants"
               OpName %out_var_SV_Target0 "out.var.SV_Target0"
               OpName %MainPs "MainPs"
               OpName %src_MainPs "src.MainPs"
               OpName %bb_entry "bb.entry"
               OpName %vTest "vTest"
               OpDecorate %out_var_SV_Target0 Location 0
               OpMemberDecorate %Globals_s 0 Offset 0
               OpMemberDecorate %Globals_s 1 Offset 16
               OpMemberDecorate %Globals_s 2 Offset 32
               OpMemberDecorate %type_PushConstant_TestPushConstant_t 0 Offset 0
               OpDecorate %type_PushConstant_TestPushConstant_t Block
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
      %int_1 = OpConstant %int 1
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
  %Globals_s = OpTypeStruct %v4float %v4float %v4float
%_ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PhysicalStorageBuffer %Globals_s
%type_PushConstant_TestPushConstant_t = OpTypeStruct %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PushConstant_type_PushConstant_TestPushConstant_t = OpTypePointer PushConstant %type_PushConstant_TestPushConstant_t
%_ptr_Output_v4float = OpTypePointer Output %v4float
       %void = OpTypeVoid
         %15 = OpTypeFunction %void
         %19 = OpTypeFunction %v4float
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PushConstant %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PhysicalStorageBuffer_v4float = OpTypePointer PhysicalStorageBuffer %v4float
%g_PushConstants = OpVariable %_ptr_PushConstant_type_PushConstant_TestPushConstant_t PushConstant
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
     %MainPs = OpFunction %void None %15
         %16 = OpLabel
         %17 = OpFunctionCall %v4float %src_MainPs
               OpStore %out_var_SV_Target0 %17
               OpReturn
               OpFunctionEnd
 %src_MainPs = OpFunction %v4float None %19
   %bb_entry = OpLabel
      %vTest = OpVariable %_ptr_Function_v4float Function
         %24 = OpAccessChain %_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s %g_PushConstants %int_0
         %25 = OpLoad %_ptr_PhysicalStorageBuffer_Globals_s %24
         %27 = OpAccessChain %_ptr_PhysicalStorageBuffer_v4float %25 %int_1
         %28 = OpLoad %v4float %27 Aligned 16
               OpStore %vTest %28
         %29 = OpLoad %v4float %vTest
               OpReturnValue %29
               OpFunctionEnd

@devshgraphicsprogramming
Copy link
Author

Here are other samples Greg sent me...

other_samples.zip

@devshgraphicsprogramming
Copy link
Author

@greg-lunarg do you have any tests & examples of SPIR-V emitted if I try to do something funky like atomics?

P.S. Also where would I put my globallycoherent decoration? on the typedef before the [[vk::buffer_ref]] or somewhere else?

@greg-lunarg
Copy link
Contributor

I haven't looked at atomics or StructuredBuffers yet. If you can point me to some sample shaders and what you want to do I might be able to give some ideas.

@devshgraphicsprogramming
Copy link
Author

I'm going to withdraw my suggestion to fix this with

template<typename T>
AccessChain<T> RawBufferLoad(uint64_t addr);

as it looks pretty ironed out in proposal microsoft/hlsl-specs#84 and we just need the proposal implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spirv Work related to SPIR-V
Projects
None yet
Development

No branches or pull requests

6 participants