Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong LDA instruction generated for addresses passed into functions #21

Open
ghost opened this issue Sep 17, 2015 · 13 comments
Open

Wrong LDA instruction generated for addresses passed into functions #21

ghost opened this issue Sep 17, 2015 · 13 comments

Comments

@ghost
Copy link

ghost commented Sep 17, 2015

I'm using the latest HLC compiler from the branch hsail-review-v2.
Consider the following OpenCL program:

struct Foo
{
    int a;
    int b;
};

volatile int *DbgAddr = (int *)4;

__attribute__((noinline)) void Func(const struct Foo *f)
{
    DbgAddr[0] = f->a;
    DbgAddr[1] = f->b;
}

__kernel void Bug()
{
    struct Foo f = {1, 2};
    Func(&f);
}

This OpenCL program gets compiled into the following HSAIL code:

module &__llvm_hsail_module:1:0:$full:$large:$near;

decl function &Func()(arg_u64 %f);

function &Func()(arg_u64 %f)
{
    // BB#0:
    ld_arg_align(8)_u64 $d0, [%f];
    ld_align(4)_u32 $s0, [$d0];
    st_align(4)_u32 $s0, [4];
    ld_align(4)_u32 $s0, [$d0+4];
    st_align(8)_u32 $s0, [8];
    ret;
};

prog kernel &__OpenCL_Bug_kernel(
    kernarg_u64 %__global_offset_0,
    kernarg_u64 %__global_offset_1,
    kernarg_u64 %__global_offset_2,
    kernarg_u64 %__printf_buffer,
    kernarg_u64 %__vqueue_pointer,
    kernarg_u64 %__aqlwrap_pointer)
{
    align(8) private_u8 %__privateStack[8];
    // BB#0:
    st_private_align(4)_u32 2, [%__privateStack][4];
    st_private_align(8)_u32 1, [%__privateStack];
    lda_private_u32 $s0, [%__privateStack];
    cvt_u64_u32 $d0, $s0;
    {
        arg_u64 %f;
        st_arg_align(8)_u64 $d0, [%f];
        call    &Func () (%f);
    }
    ret;
};

In the kernel __OpenCL_Bug_kernel, the address of of f is obtained by using the lda_private_u32 instruction, which computes an address relative to the private segment. However, in the function Func, a flat ld instruction is used to read from that address.

Shouldn't either be the lda instruction flat, or the ld instruction private, so that the segments they refer to match?

@ghost
Copy link
Author

ghost commented Oct 22, 2015

Bump. Actually, I was using the latest branch hsail-stable-3.7.

@arsenm
Copy link
Contributor

arsenm commented Oct 24, 2015

This program is supposed to be rejected by the frontend and is for me with trunk clang:
/tmp/foo.cl:7:15: error: program scope variable must reside in global or constant address space
volatile int* DbgAddr = (int *)4;
^
1 error generated.

@arsenm
Copy link
Contributor

arsenm commented Oct 24, 2015

If I fix it to be global address space, I see an assert:

0 clang-3.7 0x00000001038984ee llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 46
1 clang-3.7 0x000000010389a199 PrintStackTraceSignalHandler(void_) + 25
2 clang-3.7 0x0000000103896739 llvm::sys::RunSignalHandlers() + 425
3 clang-3.7 0x000000010389a51e SignalHandler(int) + 366
4 libsystem_platform.dylib 0x00007fff93413f1a sigtramp + 26
5 libsystem_platform.dylib 0x00007fff5e688a50 sigtramp + 3408350032
6 clang-3.7 0x000000010262b97e llvm::HSAILAsmPrinter::printInitVarWithAddressPragma(llvm::StringRef, unsigned long long, llvm::MCExpr const
, unsigned int, llvm::raw_ostream&) + 318
7 clang-3.7 0x000000010262c767 llvm::HSAILAsmPrinter::printGVInitialValue(llvm::GlobalValue const&, llvm::Constant const
, llvm::DataLayout const&, llvm::raw_ostream&) + 1223
8 clang-3.7 0x000000010262cd08 llvm::HSAILAsmPrinter::EmitGlobalVariable(llvm::GlobalVariable const_) + 1320
9 clang-3.7 0x00000001026395e2 llvm::HSAILAsmPrinter::EmitStartOfAsmFile(llvm::Module&) + 866
10 clang-3.7 0x000000010442e177 llvm::AsmPrinter::doInitialization(llvm::Module&) + 551
11 clang-3.7 0x000000010308fb33 llvm::FPPassManager::doInitialization(llvm::Module&) + 99
12 clang-3.7 0x0000000103090401 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 1009
13 clang-3.7 0x000000010308fd6b llvm::legacy::PassManagerImpl::run(llvm::Module&) + 347
14 clang-3.7 0x0000000103091431 llvm::legacy::PassManager::run(llvm::Module&) + 33
15 clang-3.7 0x0000000103b64565 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, llvm::raw_pwrite_stream_) + 1893
16 clang-3.7 0x0000000103b63612 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module_, clang::BackendAction, llvm::raw_pwrite_stream_) + 162
17 clang-3.7 0x0000000103e2eacb clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1531
18 clang-3.7 0x00000001052d61ae clang::ParseAST(clang::Sema&, bool, bool) + 1294
19 clang-3.7 0x000000010438b8bf clang::ASTFrontendAction::ExecuteAction() + 511
20 clang-3.7 0x0000000103e2ce6f clang::CodeGenAction::ExecuteAction() + 6047
21 clang-3.7 0x000000010438ae20 clang::FrontendAction::Execute() + 112
22 clang-3.7 0x00000001042e2822 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1602
23 clang-3.7 0x0000000104419710 clang::ExecuteCompilerInvocation(clang::CompilerInstance_) + 4416
24 clang-3.7 0x00000001015837f8 cc1_main(llvm::ArrayRef<char const*>, char const_, void_) + 4968
25 clang-3.7 0x000000010157311c ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) + 476
26 clang-3.7 0x0000000101570c65 main + 3285
27 libdyld.dylib 0x00007fff930935c9 start + 1

@arsenm
Copy link
Contributor

arsenm commented Oct 24, 2015

I have no idea what the expected behavior of pointer initializers with a constant is. I can fix the crash by printing the number in place of a symbol name, but initialization of a pointer is an undocumented extension pragma initvarwithaddress so I'm not sure what it's supposed to do in this case.

@arsenm
Copy link
Contributor

arsenm commented Oct 24, 2015

It seems to work for me emitting ld_privates

module &__llvm_hsail_module:1:0:$full:$large:$near;

prog alloc(agent) global_u32 &DbgAddr = 0;
pragma "initvarwithaddress:&DbgAddr:0:4:0:4";

decl prog function &Func()(arg_u32 %f);

decl prog function &Bug()();

    .data
prog function &Func()(arg_u32 %f)
{

// BB#0:                                // %entry
    ld_arg_align(4)_u32 $s0, [%f];
    ld_private_align(4)_u32 $s1, [$s0];
    ld_global_align(4)_u32  $s2, [&DbgAddr];
    st_private_align(4)_u32 $s1, [$s2];
    ld_private_align(4)_u32 $s0, [$s0+4];
    ld_global_align(4)_u32  $s1, [&DbgAddr];
    st_private_align(4)_u32 $s0, [$s1+4];
    ret;
};

    .text
    .data
prog function &Bug()()
{

    align(8) private_u8 %__privateStack[8];
// BB#0:                                // %entry
    st_private_align(8)_u64 8589934593, [%__privateStack];
    lda_private_u32 $s0, [%__privateStack];
    {
        arg_u32 %f;
        st_arg_align(4)_u32 $s0, [%f];
        call    &Func () (%f);
    }
    ret;
};

@arsenm
Copy link
Contributor

arsenm commented Oct 24, 2015

I've pushed a new hsail-review-v4 branch with the asserts fixed. I've also cherry picked the fix to the hsail-stable-3.7 branch and repeated it for the BRIGAsmPrinter. I'm not seeing the wrong load/store emitted though.

@arsenm arsenm closed this as completed Oct 24, 2015
@ghost
Copy link
Author

ghost commented Oct 26, 2015

Thank you very much for your support Mark!

I'm using the frontend from the bin directory in the master branch of the CLOC repository on Github. The kernel from my original post builds without errors with the -cl-std=CL2.0 flag set. This flag is set by default when using the cloc.sh script to compile kernels. When I use clc2 directly with -cl-std=CL1.2, I'm getting the error you mentioned as well.

IIRC, the reason for using the global variable DbgAddr was the following crash, encountered when I tried to strip down the original OpenCL kernel in which I found the reported bug.

struct Foo
{
    int a;
    int b;
};

__attribute__((noinline))
void Func(const struct Foo *f, volatile __global int *dstMem)
{
    dstMem[0] = f->a;
    dstMem[1] = f->b;
}

__kernel void Bug(volatile __global int *dstMem)
{
    struct Foo f = {1, 2};
    Func(&f, dstMem);
}

When compiling this kernel with cloc.sh I get the following crash:

ERROR:  The following command failed with return code 134.
        ./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc3601/temp.hsail /tmp/cloc3601/temp.opt.bc

llc: /home/dgeier/Documents/swarm64/code/hsa/HLC-HSAIL-Development-LLVM/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3346: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, const llvm::SDNodeFlags*): Assertion `N1.getValueType() == N2.getValueType() && N1.getValueType() == VT && "Binary operator types must match!"' failed.
0  llc             0x0000000002234860 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 59
1  llc             0x0000000002234bb4
2  llc             0x000000000223363f
3  libpthread.so.0 0x00007f6f083f9d10
4  libc.so.6       0x00007f6f0760c267 gsignal + 55
5  libc.so.6       0x00007f6f0760deca abort + 362
6  libc.so.6       0x00007f6f0760503d
7  libc.so.6       0x00007f6f076050f2
8  llc             0x00000000020912e5 llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, llvm::SDNodeFlags const*) + 2285
9  llc             0x00000000017e2b5f llvm::HSAILTargetLowering::getArgStore(llvm::SelectionDAG&, llvm::SDLoc, llvm::EVT, llvm::Type*, unsigned int, llvm::SDValue, llvm::SDValue, llvm::SDValue, unsigned int, llvm::SDValue, llvm::AAMDNodes const&, unsigned long) const + 741
10 llc             0x00000000017e35cb llvm::HSAILTargetLowering::LowerArgument(llvm::SDValue, llvm::SDValue, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const*, llvm::SmallVectorImpl<llvm::ISD::OutputArg> const*, llvm::SDLoc, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>*, unsigned int&, llvm::Type*, unsigned int, char const*, llvm::SDValue, llvm::SmallVectorImpl<llvm::SDValue> const*, bool, llvm::AAMDNodes const&, unsigned long) const + 2131
11 llc             0x00000000017e4c63 llvm::HSAILTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 4339
12 llc             0x00000000020e5d36 llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const + 4450
13 llc             0x00000000020dc470 llvm::SelectionDAGBuilder::lowerInvokable(llvm::TargetLowering::CallLoweringInfo&, llvm::MachineBasicBlock*) + 444
14 llc             0x00000000020dc9ef llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::MachineBasicBlock*) + 951
15 llc             0x00000000020df0d6 llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 2294
16 llc             0x00000000020be464 llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&) + 1256
17 llc             0x00000000020bded7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 155
18 llc             0x000000000210bfc5 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 83
19 llc             0x000000000210f35e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 2866
20 llc             0x000000000210b1ef llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1049
21 llc             0x00000000017d2f97
22 llc             0x0000000001a805ab llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 95
23 llc             0x0000000001e0cc2b llvm::FPPassManager::runOnFunction(llvm::Function&) + 305
24 llc             0x0000000001e0cdca llvm::FPPassManager::runOnModule(llvm::Module&) + 112
25 llc             0x0000000001e0d142
26 llc             0x0000000001e0d822 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 252
27 llc             0x0000000001e0da5b llvm::legacy::PassManager::run(llvm::Module&) + 39
28 llc             0x0000000000cb0376
29 llc             0x0000000000caf30a main + 257
30 libc.so.6       0x00007f6f075f7a40 __libc_start_main + 240
31 llc             0x0000000000cae049 _start + 41
Stack dump:
0.  Program arguments: ./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc3601/temp.hsail /tmp/cloc3601/temp.opt.bc 
1.  Running pass 'Function Pass Manager' on module '/tmp/cloc3601/temp.opt.bc'.
2.  Running pass 'HSAIL DAG->DAG Instruction Selection' on function '@__OpenCL_Bug_kernel'
/opt/amd/bin/cloc.sh: line 333:  3625 Aborted                 (core dumped) $HSA_LLVM_PATH/$CMD_LLC -o $TMPDIR/$FNAME.hsail $TMPDIR/$FNAME.opt.bc

@ghost
Copy link
Author

ghost commented Oct 26, 2015

The reason why the kernel compiles successfully with --cl-std=CL2.0 is, that OpenCL 2.0 supports a generic address space, which changes the requirements for tagging the pointers with __global, __constant and so on. See https://software.intel.com/en-us/articles/the-generic-address-space-in-opencl-20.

@arsenm
Copy link
Contributor

arsenm commented Oct 26, 2015

It does not compile successfully with OpenCL 2.0. Global variables still need to have a specified address space, they can't be generic.

@ghost
Copy link
Author

ghost commented Oct 27, 2015

I pulled your changes, but it doesn't fix the issue for me.

You're right. My argument doesn't hold for variables with program scope. Though, the OpenCL C Specification (Version 2.0) says in section 6.5:

The address space for a variable at program scope or a static variable inside a function can either be __global or __constant, but defaults to __global if not specified.

So it should compile. Anyways,if your compiler doesn't like it, just change the declaration to volatile __global int *DbgAddr = (__global int *)4; or use the front-end (clc2) from the master branch of the CLOC repository.

As I wrote in my previous post, the original version of the kernel exhibiting this bug doesn't use any pointers in program scope, but a pointer passed in as kernel argument (see kernel above). Unfortunately, the code generator crashes when I try to compile this kernel. It would be nice if this bug could also be investigated on.

What about updating the front-end binaries in the CLOC repository?

Thanks for your support!

@arsenm
Copy link
Contributor

arsenm commented Oct 27, 2015

Can you post your IR testcase?

@arsenm arsenm reopened this Oct 27, 2015
@ghost
Copy link
Author

ghost commented Oct 28, 2015

Here you find all the intermediate files of running

  • cloc.sh -ll bug0_bad_seg_in_ld_instr.cl (the bug I reported initially) and
  • cloc.sh -ll bug1_crash.cl (the crash I reported afterwards).

@geidav
Copy link

geidav commented Jan 5, 2016

Any news one this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants