Port to more current rust-nightly #98
base: master
…ust nightly - also port from register_attr to register_tool (approach shamelessly taken from rust-gpu)
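As far as I know, rust-gpu's builder does not ask users to write the tool registration in source. Instead, it injects the crate attributes through rustc's nightly-only -Zcrate-attr flag. Below is a minimal sketch of assembling such flags. It assumes the tool name nvvm_internal, which is a guess, and assumes the builder hands the flags to cargo through CARGO_ENCODED_RUSTFLAGS. The helper names are hypothetical, and this is not cuda_builder's actual code.

```rust
/// Tool namespace to register. ASSUMPTION: the name is illustrative only;
/// substitute whatever namespace the GPU-side attributes actually use.
const TOOL_NAME: &str = "nvvm_internal";

/// Crate-level attributes injected via rustc's nightly-only `-Zcrate-attr` flag,
/// so GPU crates do not have to carry `#![feature(register_tool)]` and
/// `#![register_tool(...)]` in their own source.
fn injected_rustflags(tool: &str) -> Vec<String> {
    vec![
        "-Zcrate-attr=feature(register_tool)".to_string(),
        format!("-Zcrate-attr=register_tool({tool})"),
    ]
}

/// Cargo reads CARGO_ENCODED_RUSTFLAGS as flags separated by the 0x1f byte,
/// which (unlike space-separated RUSTFLAGS) keeps each flag intact.
fn encode_rustflags(flags: &[String]) -> String {
    flags.join("\u{1f}")
}
```

Injecting the attributes from the builder keeps the nightly-specific feature gate in one place. If the GPU crate also declares them itself, the two sets can conflict.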
It seems that with e.g. the following kernel:

```rust
extern crate alloc;

use cuda_std::prelude::*;

#[kernel]
#[allow(improper_ctypes_definitions, clippy::missing_safety_doc)]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = a[idx] + b[idx];
        if idx == 0 {
            cuda_std::println!("Elem 0: {}", *elem);
        }
    }
}
```

The resulting PTX will be invalid with
An offending PTX section looks like this (top section, truncated):

```
.const .align 8 .u8 _ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E[8] = {
    0XFF(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF00(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF0000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF00000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF0000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
    0xFF00000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE)
};

$L__BB6_5:
    mov.u64 %rd112, 0;
    ld.v2.u32 {%r5, %r6}, [%rd108+32];
    ld.u8 %rs3, [%rd108+40];
    st.local.u8 [%rd7+8], %rs3;
    st.local.v2.u32 [%rd7], {%r5, %r6};
    ld.u64 %rd109, [%rd108+24];
    ld.u16 %rs4, [%rd108+16];
    and.b16 %rs2, %rs4, 3;
    setp.eq.s16 %p6, %rs2, 2;
    mov.u64 %rd110, %rd112;
    @%p6 bra $L__BB6_10;
    setp.ne.s16 %p7, %rs2, 1;
    @%p7 bra $L__BB6_9;
    shl.b64 %rd63, %rd109, 4;
    add.s64 %rd64, %rd115, %rd63;
    add.s64 %rd18, %rd64, 8;
    ld.u64 %rd65, [_ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E];
    ld.u64 %rd66, [%rd64+8];
    setp.ne.s64 %p8, %rd66, %rd65;
    mov.u64 %rd110, %rd112;
    @%p8 bra $L__BB6_10;
    ld.u64 %rd68, [%rd18+-8];
    ld.u64 %rd109, [%rd68];
    mov.u64 %rd110, 1;
    bra.uni $L__BB6_10;
```

Both of those references are to the
Even though this error is there, a much more complex example (not using
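The PTX above puts a function address into a .const initializer. In the PTX, the static's initializer is the address of the FnOnce::call_once shim, emitted byte by byte. The likely reason comes from core::fmt in nightlies of that era, as I understand them; this thread does not state it. core tagged "count" arguments, such as the width or precision in {:.*}, by storing a fixed function pointer in the static USIZE_MARKER. It later compared an argument's formatter pointer against that marker. The ld.u64 of USIZE_MARKER followed by setp.ne.s64 above appears to be that comparison. Below is a simplified, hypothetical stable-Rust model of the trick. Arg, ErasedFmt, from_usize and the other names are illustrative and are not core's API.

```rust
use std::marker::PhantomData;

/// Type-erased formatting callback (stand-in for core's formatter fn pointer).
type ErasedFmt = unsafe fn(*const ()) -> String;

unsafe fn usize_fmt(p: *const ()) -> String {
    // SAFETY: callers only pass pointers to a live usize (see `Arg::from_usize`).
    let v = unsafe { *(p as *const usize) };
    v.to_string()
}

unsafe fn i32_fmt(p: *const ()) -> String {
    // SAFETY: callers only pass pointers to a live i32 (see `Arg::from_i32`).
    let v = unsafe { *(p as *const i32) };
    v.to_string()
}

/// The marker: a static holding a function pointer whose *address* acts as a
/// type tag. A static like this is what becomes a global initialized with a
/// function address.
static USIZE_MARKER: ErasedFmt = usize_fmt;

/// A type-erased argument: a borrowed value plus the function that formats it.
struct Arg<'a> {
    value: *const (),
    formatter: ErasedFmt,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> Arg<'a> {
    fn from_usize(v: &'a usize) -> Self {
        // Storing the marker itself is what later lets `as_usize` recognize this argument.
        Arg { value: v as *const usize as *const (), formatter: USIZE_MARKER, _borrow: PhantomData }
    }

    fn from_i32(v: &'a i32) -> Self {
        Arg { value: v as *const i32 as *const (), formatter: i32_fmt, _borrow: PhantomData }
    }

    /// The tag check is a plain integer comparison of function addresses.
    fn as_usize(&self) -> Option<usize> {
        if self.formatter as usize == USIZE_MARKER as usize {
            // SAFETY: only `from_usize` stores USIZE_MARKER, so `value` points at a usize.
            Some(unsafe { *(self.value as *const usize) })
        } else {
            None
        }
    }

    fn render(&self) -> String {
        // SAFETY: each constructor pairs `formatter` with a pointer of the matching type.
        unsafe { (self.formatter)(self.value) }
    }
}
```

Rust makes no promise that distinct functions have distinct addresses. That makes the trick fragile in general, and it forces the backend to emit a data relocation to a function. The PTX above suggests the NVVM path does not handle that relocation correctly.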
@apriori hello there. Just wanted to check in and see if you've had any success on this branch. I have a few open PRs, some of which I am actively using, and I'm thinking about rebasing them onto this branch to gain the benefits of the updated rustc. Do you think it is reasonable to rebase onto this branch?
I am getting an
I moved back to master because I have a few parallel reduction algorithms that make heavy use of shared memory, and I don't want to take the time right now to debug the codegen issue.
TBH, I really hope that @RDambrosio016 (hope all is well) comes back some day. Having to move over to a C++ wrapper pattern, building lots of shared libraries, multi-stage nvcc build pipelines and such... not fun. This framework, on the other hand, already has a lot of work put into it, and keeping it up to date and moving forward is a huge boon to the community. I'm still holding out hope that it will be revitalized soon.
I wish the same, but so far it seems @RDambrosio016 has lost interest or no longer has time. For me, non-trivial programs were working as long as
Anyway, more work needs to happen on this, or this framework will lose touch with rustc development entirely and will not gain wider acceptance either.
Sorry, I've just been really busy with my degree and other things. I think being tied to a different codegen, and especially to libNVVM, is not the way to go for the future. I think adding the required linking logic for nvptx to rustc is much easier and better. I'm doing some experiments to try that.
@RDambrosio016 nice! Hope all is going well with your studies. |
@RDambrosio016 so you would prefer to use rustc's existing nvptx codegen backend?
BTW, something I've done to help mitigate the issue with having to use the older compiler version:
There are a few ways to optimize this. It doesn't need to be an example; there are other ways. Keeping it out of the build.rs of the larger project helps ensure that the Rust toolchain limitation doesn't spread.
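Here is a sketch of one way to keep the pinned toolchain out of the main project's build.rs. It makes two assumptions. First, the PTX is produced by a separate helper crate (a hypothetical ptx-builder directory whose own build step runs cuda_builder). Second, a small standalone binary or script invokes cargo for that crate with the pinned nightly. ptx_build_command is a hypothetical helper, and the toolchain name in the usage is only an example; use whatever the helper crate's rust-toolchain file pins.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a `cargo build` invocation for
/// the PTX helper crate using an explicitly pinned toolchain.
fn ptx_build_command(toolchain: &str, ptx_crate_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd
        // rustup's `+toolchain` shorthand takes precedence over RUSTUP_TOOLCHAIN
        // and rust-toolchain files, so the pinned nightly is used explicitly.
        .arg(format!("+{toolchain}"))
        .arg("build")
        .arg("--release")
        .current_dir(ptx_crate_dir)
        // An inherited RUSTC would make cargo use a different compiler than the pinned one.
        .env_remove("RUSTC");
    cmd
}
```

The explicit +toolchain matters when this runs under another cargo invocation. There, an inherited RUSTUP_TOOLCHAIN would otherwise override the helper crate's rust-toolchain file. The main project then only consumes the emitted PTX file and can stay on a current toolchain.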
@apriori Hello there! I would like to port
Despite what @RDambrosio016 said a few weeks ago about abandoning the NVVM codegen and moving to what's already implemented in
I also heard that NVIDIA might be in the process of updating their tools to a much more recent LLVM version, as even for them it's too difficult to rely on something as old as v7.0.1. This would probably simplify some of the logic implemented in
@dssgabriel interesting, what do you mean by invalid PTX? I was not able to build anything, since it requires a custom linker that doesn't work on Windows (my rustc proposal would put the linking logic in rustc itself). The LLVM PTX backend is mature enough that I would expect it to generate valid code unless rustc is emitting something very suspicious.
Unfortunately no. rustc is a rapidly moving target. I once checked a slightly more recent nightly (after 2022/12/10), and compilation failed.

There are two approaches I would consider "valid":
a) fix the issues in this MR and continue from there

One can and should use rustc_codegen_llvm as a template. Here and there, though, more detailed knowledge of CUDA PTX is required; some solutions I merely guessed, and I bet I got some of them wrong. As far as I know, libNVVM 2.0 is very different from prior versions. I think @RDambrosio016 can comment more on the feasibility of this. I would also prefer to have these efforts more "upstream", but we are kind of lost if upstream rustc is not moving forward with, or improving, the PTX backend.
I cannot comment on this, other than that I never really tried the official rustc PTX backend. Rust-cuda was simply the far more compelling and accessible solution. This is also thanks to @RDambrosio016's good documentation and immediately runnable examples, not to mention all his hard work building what is practically an ecosystem of libraries.
Since the interfacing would still go through libNVVM, I doubt that would have any impact on general accessibility. The developer install experience might improve a bit without the dependency on ancient LLVM versions, but that is about it.
From my experience with rust-cuda so far, single-source is something I would really love to see, but I imagine it's hard with the Rust compilation model, especially with
Is there anything I can do to help? Is it just a matter of putting some line of code in the right place? Can we write a new bindgen wrapper to get low-level access?
add_gpu example working
breaking: cuda_builder. The user no longer has to define the respective cfg_attr section in GPU code; leaving it in GPU code will result in a compile error from cargo.
to be further tested: