-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated extendr to 0.7.0 #1154
base: main
Are you sure you want to change the base?
updated extendr to 0.7.0 #1154
Conversation
Thanks @CGMossa, let us know once this is ready for review |
Please review this PR now, and let me know what you think. We'll publish a patch release based on the out come of this PR. We had to merge a few PRs to make the code nicer here. I think we've accomplished that. |
This PR generates a segfault in some tests. I can reproduce locally, the line that causes it is: expect_grepl_error(pl$head(df$get_column("a"), -2)) (L185, "tests/testthat/test-lazy_functions.R"). I don't know the cause |
Running with a debugger yields:
|
Details
|
> pl$n_unique(pl$all())
polars Expr: dtype_columns([Duration(Nanoseconds)]).n_unique() I don't know what's going on. This time I ran with debug symbols Details
|
I have very little bandwidth to look at this, but it seems to me that the following part is not working correctly. r-polars/src/rust/src/utils/mod.rs Lines 744 to 749 in 471c070
Perhaps the effect of the breaking changes made in extendr on something like TryFrom? |
Maybe the external pointer is not handled properly. It may be related to #1161 |
From the content of the error, I think that what was previously treated as an external pointer was now treated as an environment and was broken. Extendr treats what is defined as a struct in Rust as an external pointer and environment at the same time, right? |
I must say, I don't think we made an error in extendr, and the behavior we are seeing here is most likely due to faulty assumptions on what extendr does.
I think there are plenty of things here, that could have been relayed to us, and that we could have improved, such that r-polars wouldn't resort to all of this complex code working around |
Thanks for the details, I'll try to take a look at this during the week.
Most of the foundations on the Rust side were built by @sorhawell more than 1.5 years ago. Most of what @eitsupi and I did on the Rust side was upgrading |
I am not trying to say that extendr is wrong, just pointing out the part that might be a clue to fixing the broken functions. |
Does r-polars require the ability to deepclone an ExternalPtr? |
@CGMossa I am sorry, but I do not understand the intent of your question. Basically, the structs defined in r-polars are wrappers for polars structs, so there is basically no case where it itself must be copied. |
@CGMossa @JosiahParry I think the problem here (#1154 (comment)) is that the recommended way to convert something like Robj to a Rust struct is unclear. |
I'm a bit busy at the moment. But I will reply to this. And I think you're on to something, because I changed this particular thing.. To resolve some issues. |
Okay, I'll try giving an answer here. The issue is that the ownership semantics between R and Rust was not really adhered to before. Fixing that have solved a long term bug that we had, plus it has enabled us to do provide a functionality for chaining commands... First, we have unified Once you have an
For historic reasons, we do support We've also made Finally, it might have been possible to go from But we added a mechanism to be able to return #[derive(Debug)]
struct Person {
name: String,
age: u32,
}
#[extendr]
impl Person {
fn new() -> Self {
Self { name: "".to_string(), age: 0 }
}
fn older(&mut self, other: Self) -> &Self {
if self.age > other.age {
self
} else {
other
}
}
}
extendr_module! {
mod classes;
impl Person;
} See also the extensive tests of this https://github.com/extendr/extendr/blob/master/tests/extendrtests/src/rust/src/externalptr.rs Again, r-polars and the success of r-polars is a high priority to me. Let me know if I can answer more or contribute more. |
@CGMossa Thanks. I am not familiar with extendr's internal implementation so I am not sure, but I am glad you have identified the problem. Could you make modifications to merge this PR so that it passes all tests? |
I haven't identified the problem. I'm just answered your question. |
Ok, thanks. |
@CGMossa @etiennebacher Any update on this? |
Nope, I don't plan to dedicate time to this since it is low-priority IMO (as you said above this is only bothering us on the R-devel CI for now and you're rewriting using |
I will happily give a status: I have a few maintenance PRs that I had hoped would get merged, so that we could make a patch release of extendr-*+libR-sys. Unfortunately, not all of these PR have merged yet. Regarding fixing all of r-polars issues with extendr: I think it is very reasonable to use extendr with r-polars. I constantly explore some of the pain points listed in this repo, and I've suggested radical changes to extendr to accommodate you guys. These have been rejected internally. I do have a potential very radical idea to bring in multiple impl-blocks to I suspect that it would be easier to identify how to use extendr for r-polars, if r-polars was intentionally written to work with extendr or savvy. I think a lot of complexity are unnecessarily sprinkled around. If those were removed, it would be straight forward to propose PRs to r-polars. For now, we are waiting for me to get more time, to either improve extendr and this PR. I ask you to kindly be patient, as I have most definitely not forgotten about this PR. Or r-polars. Cheers |
Thanks for the response, I'm glad to hear it. I agree that the current r-polars is too complex, and as @etiennebacher pointed out I am in the process of completely rewriting r-polars from scratch, so if no one can maintain the current code base, I think we can stop update the current code without putting any effort into this PR. I have been very busy lately and have not had time to be active on GitHub, but I will resume rewriting r-polars as soon as I have time. |
Hey Guys, I am having a look at this, because I am very interested in getting Many of the failing test mostly are due too different errors, which is not a problem. But the segfaulting behavior is problematic. I am investigating from here:
I have take a look at the entrypoint to the rust code: #[extendr]
pub fn create_cols_from_datatypes(list_of_dtypes: Robj) -> RResult<RPolarsExpr> {
# added debugging print
println!("{list_of_dtypes:#?}");
let dtypes = robj_to!(Vec, PLPolarsDataType, list_of_dtypes)?;
Ok(RPolarsExpr(dsl::dtype_cols(dtypes)))
} The content of r-polars/src/rust/src/utils/mod.rs Lines 744 to 749 in 471c070
So an Any pointers or suggestions where to look for further clues would be greatly appreciated. Kind regards, |
This is exactly what this code previously assumed! r-polars/src/rust/src/utils/mod.rs Lines 744 to 749 in 471c070
So line Though I am still unsure that the changes are sufficient. There might me be more parts of the code that use the previous behavior and expect to return the error to R. So there might be more cases where users can run into segfaults that are not tested for already! To be honest I am not satisfied that the responsibility of checking if the conversion will work onto the libraries building upon |
Hello @ju6ge. I thank you for digging into this, and precisely identifying the problem. I did try to dig, but r-polars is very complicated for me, and I decided instead, to list the things I found important to communicate about instead. I totally agree that extendr should be more like Rust, and less like C. As I said, the desired feature for runtime type checking, is something I have prototyped. See this extendr/extendr#609. Us over at extendr are more than happy to collaborate, and coordinate on features for r-polars. In general, we have benefitted a lot from each other, and long-term wishes from r-polars folks have been successfully implemented by us. And some things are still being discussed and prototyped. From where I'm looking, I think we could definitely benefit from someone willing to communicate with us, and being involved with r-polars. I would also say, the typeid check is not without issues: First, it is a performance issue, since it will be needed to be done every time you invoke an extendr-impl method from R. Second, it is a runtime check. That's not really that beneficial anyways. Lastly, there are some caveats about typeid not being consistent across compilations.. So what happens when you save a Rust object and then want to invoke it again? I don't know. There are too many open questions. But if these can be chased down, and someone can argue for them, of course we'll accommodate that. And there are plenty of experiments and POCs that can be pulled, if there is something of interest. |
@CGMossa no problem. My interest in r-polars is more from a user perspective, in the sense that I am writing a rust library that exports bindings to different languages (r,python and julia). In this context polars is a very interesting provider for a common dataframe implementation. For now I am not deeply involved with r-polars. Though I do intend to dig into stuff and make sure stuff keeps working and is updated if I can find the time.
I would be totally in favor of seeing something like that. The issues is that R is not suited to doing deep type checking and is a dynamic language. If
Well, I think performance at the expense of safety is a lousy argument. If you might crash your program because things are not properly checked than that is a bigger concern than performance. Also I still have to check anyway now, I just have to do in explicitly so nothing was won here anyway in terms of performance. Checking on rust side also is probably faster than from the r side. Yes it is runtime, there is just no way around that with R since R is an interpreted language. I don't think compiled R is an argument here, because R can not take advantage of rust compiler checks anyway. I have not dug into x = Person$new("ju6ge", 9001)
x$name() This translates to the call Person$name <- function() .Call(wrap__RPerson__name, self) Which directly corresponds to #[extendr]
impl Person {
fn name(&mut self) -> String {
…
}
} Why would we run into a type check in this situation? Would it be possible to only run into type_checks when calling I hate that R has the concept of saving your variable session, that is such an anti feature in my opinion. It just leads to people writing broken code, because the forget to declare their variables, because they are present magically in their environment. Also global variables are commonly used. Sorry for the rant. Actually that should be more the problem of I hope my view on things is useful. Kind regards, |
This sounds very good. Truthfully, I think extendr needs more users around it. Just to answer this good point of yours: The issue is not with constructing the #[extendr]
impl Person {
// redacted other impls
fn name(&self) -> &str {
self.name.as_str()
}
} which is invoked in R by way of pub extern "C" fn wrap__Person__name(_self: extendr_api::SEXP) -> extendr_api::SEXP {
let result = unsafe {
std::panic::catch_unwind(std::panic::AssertUnwindSafe(
move || -> std::result::Result<extendr_api::Robj, extendr_api::Error> {
let mut _self_robj = extendr_api::robj::Robj::from_sexp(_self);
Ok(extendr_api::Robj::from(
extendr_api::unwrap_or_throw_error(<&Person>::try_from(&_self_robj)).name(),
))
},
))
};
// redacted treatment of this result object
} (I've edited the version above for clarity, see below for full expansion) Again, I'm willing to put up another PR (I'm a bit booked at the moment, but definitely willing) that has this in it, for actual users, and actual debates and communications on it. Details: Full expansion of `wrap__Person__new````rust #[no_mangle] #[allow(non_snake_case, clippy::not_unsafe_ptr_arg_deref)] pub extern "C" fn wrap__Person__name(_self: extendr_api::SEXP) -> extendr_api::SEXP { use extendr_api::robj::*; let wrap_result_state: std::result::Result< std::result::Result, Box, > = unsafe { std::panic::catch_unwind(std::panic::AssertUnwindSafe( move || -> std::result::Result { let mut _self_robj = extendr_api::robj::Robj::from_sexp(_self); Ok(extendr_api::Robj::from( extendr_api::unwrap_or_throw_error(<&Person>::try_from(&_self_robj)).name(), )) }, )) }; match wrap_result_state { Ok(Ok(zz)) => { return unsafe { zz.get() }; } Ok(Err(conversion_err)) => { let err_string = conversion_err.to_string(); drop(conversion_err); extendr_api::throw_r_error(&err_string); } Err(unwind_err) => { drop(unwind_err); let err_string = { let res = $crate::fmt::format($crate::__export::format_args!( "User function panicked: {}", "name" )); res }; extendr_api::handle_panic(err_string.as_str(), || { #[cold] #[track_caller] #[inline(never)] const fn panic_cold_explicit() -> ! { $crate::panicking::panic_explicit() } panic_cold_explicit(); }); } } { $crate::panicking::panic_fmt($crate::const_format_args!( "internal error: entered unreachable code: {}", $crate::format_args!("internal extendr error, this should never happen.") )); } } ``` Thus, on the comment
The things I listed are not opinions that I myself hold on type-checking |
Thanks for your input, I will definitly consider communicating directly with
Thanks for your explanation, so I guess my assumption was wrong about R handling the passing the data to C directly. So R passes SEXP, and If you find the time to open a new PR on I think the discussion here can be considered complete. Discussion of this particular MR can be moved to #1187. Should we close the MR? |
These are the changes necessary to update r-polars.
I discovered that we could make our macros a bit nicer, so I've done so in:
We will consider a patch release for extendr, where these improvements can be in. We'll just wait a few days / (1-2 weeks) to figure out if there are more patches to be made.
Cheers!
Close #1105