-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial Windows support for Ark #140
Conversation
Figured out where the path <- "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.37.32822\\bin\\Hostx86\\x86"
if (!dir.exists(path)) {
stop("Visual Studio tools directory is incorrect or the tools have not been installed.")
}
# Put the path containing the tools on the PATH.
Sys.setenv(PATH = paste(path, Sys.getenv("PATH"), sep = ";"))
# Find R DLLs.
dlls <- list.files(R.home("bin"), pattern = "dll$", full.names = TRUE)
message("Generating .lib files for DLLs in ", R.home("bin"))
# Generate corresponding 'lib' file for each DLL.
for (dll in dlls) {
# check to see if we've already generated our exports
def <- sub("dll$", "def", dll)
if (file.exists(def))
next
# Call it on R.dll to generate exports.
command <- sprintf("dumpbin.exe /EXPORTS /NOLOGO %s", dll)
message("> ", command)
output <- system(paste(command), intern = TRUE)
# Remove synonyms.
output <- sub("=.*$", "", output)
# Find start, end markers
start <- grep("ordinal\\s+hint\\s+RVA\\s+name", output)
end <- grep("^\\s*Summary\\s*$", output)
contents <- output[start:(end - 1)]
contents <- contents[nzchar(contents)]
# Remove forwarded fields (not certain that this does anything)
contents <- grep("forwarded to", contents, invert = TRUE, value = TRUE, fixed = TRUE)
# parse into a table
tbl <- read.table(text = contents, header = TRUE, stringsAsFactors = FALSE)
exports <- tbl$name
# sort and re-format exports
exports <- sort(exports)
exports <- c("EXPORTS", paste("\t", tbl$name, sep = ""))
# Write the exports to a def file
def <- sub("dll$", "def", dll)
cat(exports, file = def, sep = "\n")
# Call 'lib.exe' to generate the library file.
outfile <- sub("dll$", "lib", dll)
fmt <- "lib.exe /def:%s /out:%s /machine:%s"
cmd <- sprintf(fmt, def, outfile, .Platform$r_arch)
system(cmd)
}
|
ecd242b
to
d292666
Compare
6040061
to
b7f20dc
Compare
b2336c0
to
28dbb53
Compare
18233dd
to
ec160e0
Compare
b2336c0
to
c4c9d5f
Compare
cfg_if::cfg_if! { | ||
if #[cfg(unix)] { | ||
mod unix; | ||
pub use self::unix::*; | ||
} else if #[cfg(windows)] { | ||
mod windows; | ||
pub use self::windows::*; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each crate now has a sys.rs
file that looks like this. Typically the way this works is that, for example, the crate level signals
module will call sys::signals::fn()
for a platform specific version of fn()
that it needs. i.e. handle_interrupt_request()
is one such fn()
.
You typically won't call sys::signals::*
outside of the signals
module. The signals
module itself then either re-exposes functions through pub use
or exposes its own slightly higher level wrappers around the system specific utilities.
So then other files call crate::signals::*
This is modeled after the Rust std library
} | ||
|
||
pub fn listen(&self) { | ||
// TODO: Windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stub for stream capturing on Windows. A big TODO, but not critical to "just make it work"
fn initialize_signal_handlers() { | ||
// Reset the signal block. | ||
// | ||
// This appears to be necessary on macOS; 'sigprocmask()' specifically | ||
// blocks the signals in _all_ threads associated with the process, even | ||
// when called from a spawned child thread. See: | ||
// | ||
// https://github.com/opensource-apple/xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/bsd/kern/kern_sig.c#L1238-L1285 | ||
// https://github.com/opensource-apple/xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/bsd/kern/kern_sig.c#L796-L839 | ||
// | ||
// and note that 'sigprocmask()' uses 'block_procsigmask()' to apply the | ||
// requested block to all threads in the process: | ||
// | ||
// https://github.com/opensource-apple/xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/bsd/kern/kern_sig.c#L571-L599 | ||
// | ||
// We may need to re-visit this on Linux later on, since 'sigprocmask()' and | ||
// 'pthread_sigmask()' may only target the executing thread there. | ||
// | ||
// The behavior of 'sigprocmask()' is unspecified after all, so we're really | ||
// just relying on what the implementation happens to do. | ||
let mut sigset = SigSet::empty(); | ||
sigset.add(SIGINT); | ||
sigprocmask(SigmaskHow::SIG_BLOCK, Some(&sigset), None).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you see a big deletion, it didn't get removed, just moved to the unix/*
equivalent file, i.e. unix/interface.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Unix specific startup code from interface.rs
has been moved here. Not much has changed really. Unix specific callables from the R api are imported at the bottom of here now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually generated bindings for Rstart
using a forked version of libR-sys
I'm not entirely sure how this is going to behave on different versions of R. Hopefully it "just works" since it has C representation and older versions of R simply won't access fields like EmitEmbeddedUTF8
, added in R 4.0.0
(Similar to what RStudio does)
pub fn initialize_signal_handlers() { | ||
// TODO: Windows | ||
} | ||
|
||
pub fn initialize_signal_block() { | ||
// TODO: Windows | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncertain if we actually need these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signal masking/blocking is POSIX only.
We don't need a SIGINT handler since we can set the global pending var from the Control thread right? Technically we don't need this handler on Unix either, but I guess we want to support sending interrupts from e.g. a terminal. Though we could just let the default SIGINT handler from R set up for that purpose?
pub fn interrupts_pending() -> bool { | ||
unsafe { UserBreak == Rboolean_TRUE } | ||
} | ||
|
||
pub fn set_interrupts_pending(pending: bool) { | ||
if pending { | ||
unsafe { UserBreak = Rboolean_TRUE }; | ||
} else { | ||
unsafe { UserBreak = Rboolean_FALSE }; | ||
} | ||
} | ||
|
||
#[link(name = "R", kind = "dylib")] | ||
extern "C" { | ||
static mut UserBreak: libR_shim::Rboolean; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently UserBreak
is the Windows equivalent (ish) to R_pending_interrupts
, so I've created some helpers to wrap these in a platform agnostic way, and we use those everywhere now
let x = MultiByteToWideChar(code_page, flags, x)?; | ||
|
||
// `WC::NoValue` doesn't exist, so we make it unsafely: | ||
// https://github.com/rodrigocfd/winsafe/issues/110 | ||
let flags = unsafe { WC::from_raw(0) }; | ||
let default_char = None; | ||
let used_default_char = None; | ||
|
||
let x = WideCharToMultiByte(CP::UTF8, flags, &x, default_char, used_default_char)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently have a locally patched version of winsafe for rodrigocfd/winsafe#111
It won't actually break anything if you don't have a patched version, console output just won't look quite right (extra spaces and new lines typically)
* | ||
*/ | ||
|
||
pub use sys::line_ending::NATIVE_LINE_ENDING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the opportunity to move this platform specific code into the new folder structure too
With this PR and its positron companion (posit-dev/positron#1900) and following these instructions, I have successfully built ark and used R inside positron! The only coffee table I banged my shins on is that, when you run the The Code Known As # What Davis used (subtle difference in ?version?)
# path <- "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.37.32822\\bin\\Hostx86\\x86"
path <- "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.38.33130\\bin\\Hostx86\\x86" Then it's important to launch RStudio (or what have you) as administrator before you try to execute the code that creates the I also forgot to set the |
The recent work on
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not reviewing the implementation, but rather confirming that I can build ARK on Windows from this branch and, together with posit-dev/positron#1900, can launch a dev build of Positron on Windows, with a functional R interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
crates/ark/src/control.rs
Outdated
// TODO: Windows. | ||
// TODO: Needs to send a SIGINT to the whole process group so that | ||
// processes started by R will also be interrupted. | ||
sys::control::handle_interrupt_request(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This namespace makes me think that it's from the stdlib rather than something from ark, but I guess I just need to get used to it.
crates/ark/src/main.rs
Outdated
let mut sigset = SigSet::empty(); | ||
sigset.add(SIGINT); | ||
sigprocmask(SigmaskHow::SIG_BLOCK, Some(&sigset), None).unwrap(); | ||
signals::initialize_signal_block(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ark::signals::
for this sort of stdlib-sounding namespaces? Same with traps::
and sys::
. We're lucky because ark::
is very short so that might be a good convention for us.
@@ -0,0 +1,133 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These abstractions look good
/// | ||
/// \002\377\376 <text> \003\377\376 | ||
/// | ||
/// strangely, we see these escapes around text that is not UTF-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was an rstudio comment i copied over
pub fn initialize_signal_handlers() { | ||
// TODO: Windows | ||
} | ||
|
||
pub fn initialize_signal_block() { | ||
// TODO: Windows | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signal masking/blocking is POSIX only.
We don't need a SIGINT handler since we can set the global pending var from the Control thread right? Technically we don't need this handler on Unix either, but I guess we want to support sending interrupts from e.g. a terminal. Though we could just let the default SIGINT handler from R set up for that purpose?
One more important thing I just ran into and then @DavisVaughan reminded me, from a slack conversation: This timeout is currently too short for us in the AWS Windows cloud machines (generally very sluggish and laggy for me) and the way you'll find out is you'll get crashes while trying to do normal things, like install a package. Increase from 5 to 5000 to get unstuck for now. |
Very nix specific
Does nothing on Windows
This seems to handle interrupting a for loop and an input request like `readline()`
Also patches over winsafe issue directly in ark, so others get the patch too
Since this seems to be POSIX only, I think
46fbca5
to
466cec2
Compare
Branched from #172
This PR does the initial work to get Windows support up and running enough that I think we can merge it in. That will allow us to then iterate on a few things at the same time - including getting a GitHub Actions workflow that builds an executable for us.
I'd like to take this chance to document some Windows peculiarities that we have to work around. It may be useful for later.
Platform specific folder structure
We now have a decent chunk of platform specific code, so I've introduced a new convention to handle this. Each crate has the following structure:
sys.rs
sys/unix.rs
sys/windows.rs
sys/unix/
sys/windows/
In
sys.rs
, we always do:This allows us to create 2 identical APIs in
sys/unix.rs
andsys/windows.rs
, and then use them in a platform agnostic way withsys::module::fn()
from outside ofsys/
. This is inspired by the Rust std library.Write console buffer on Windows
On Windows, the
write_console()
buffer that R provides us has some weird quirks. In general, the buffer that comes through will be in the system encoding, so we typically need to convert it to UTF-8 to then put it in a rust string and forward it to the frontend. Okay, no problem.The weirdness is that if
EmbeddedUTF8
is set for R, then R surrounds UTF-8 text with a 3-byte escape like\x02\xFF\xFE<text>\x03\xFF\xFE
to "mark" it. There could be more text in the buffer before or after this that is just in the system encoding. We have to break the buffer into the system encoded bits and the UTF-8text
bits, re-encode the system encoded bits to UTF-8, drop the 3-byte escapes, and put everything back together. This is done inconsole_to_utf8()
.Interrupts
On the unix side, we use
R_pending_interrupts
to flag when R should handle an interrupt. On the windows side, this is apparently calledUserBreak
, and is mostly handled the same way, except I don't seeR_suspended_interrupts
for Windows.Global variables from libR-sys
libR-sys exposes both functions and global variables from R for us to use. However, either it, bindgen, or rustc have an issue where global variables bindings are not being written in such a way that they work for Windows. Take
R_NilValue
for example. It is exposed in libR-sys through:And libR-sys sets
println!("cargo:rustc-link-lib=dylib=R");
. Now, AFAICT that should be enough because it tells rustc to link to R dynamically.And it does work fine on Mac, but on Windows global variables MUST be marked with
dllimport
otherwise the linker cannot find them, and this does not seem to be happening with this setup. You must also add:If you do that, then it works right on Windows. This causes quite an issue because we import many global variables from R.
To work around this, I have created a new subcrate, libR-shim, in this PR #172. It re-exports everything we need from libR-sys except for the global variables, which it then manually provides bindings for tagged with the additional
#link[]
attribute. This wasn't fun to do, but it seemed like the best way.Practically, all this means is that we now import
libR_shim::*
instead oflibR_sys::*
, and if we are missing anything in the shim, we have to re-export it before we can use it.libR-sys has been removed as a dependency from ark and harp, so you can't accidentally use
libR_sys::R_NilValue
, which would be bad.R setup
On Unix, we have access to a number of
ptr_*
global variables that we use to set hooks likeReadConsole()
(ptr_R_ReadConsole
) and friends. On Windows, those don't actually exist. They would be declared inRinterface.h
, but they are behind a#ifdef R_INTERFACE_PTRS
, and that is only defined on Unix.https://github.com/wch/r-source/blob/55cd975c538ad5a086c2085ccb6a3037d5a0cb9a/src/include/Rinterface.h#L131
Instead, we set things like this through
Rstart
, a struct that holds startup options that is passed toR_DefParamsEx()
, which sets up some initial options, and thenR_SetParams()
, which assigns the things inRstart
into global variables that R manages from then on.https://github.com/wch/r-source/blob/55cd975c538ad5a086c2085ccb6a3037d5a0cb9a/src/include/R_ext/RStartup.h#L127-L129
The problem is, libR-sys doesn't include
Rstart
in the generated bindings on Windows. They would need to include#include <R_ext/RStartup.h>
in theirwrapper.h
file here (https://github.com/extendr/libR-sys/blob/6fc2b5f2371fe20ec53d2cdf845a6e13dfdd3cfe/wrapper.h#L42), and they would also need to#define Win32
to generate the full Windows bindings for it.For now, I have actually gone and done this manually using a fork of libR-sys. The generated bindings for
Rstart
lives ininterface_types.rs
, which we then use ininterface.rs
. Hopefully we can remove this in the future?DLLs and LIBs
At
cargo build
time, ark / libR-sys needR.lib
to exist atC:\\Program Files\\R\\R-4.3.2\\bin\\x64\\R.lib
. But R doesn't ship with a lib file, it ships withR.dll
. To workaround this, we use a script from RStudio calleddll2lib.R
(see the comment below) which uses Visual Studio utilities to generate theR.lib
from the dll.You typically have to set
R_HOME
beforecargo build
the first time you build the libR-sys dependency, I do that withAfter building it once, you don't need that to iterate on ark itself.
At ark load time, i.e. when the chosen version of R is being initialized,
ark.exe
needs to be able to dynamically link to theR.dll
that goes with the user's R version. Windows uses thePATH
as one place where it looks for libraries to dynamically link against, so in positron-r we addC:\\Program Files\\R\\R-<VERSION>\\bin\\x64
to thePATH
of the subprocess that ark is run in.Still TODO after merging
Create a GitHub Actions workflow for building ark on Windows
ark.exe
needs a Windows Application Manifest so that it can declare UTF-8 as the default codepage. Right now 1252 comes through instead. RStudio actually producesrsession.exe
andrsession-utf8.exe
and ships them both, one has UTF-8 support in the manifest and the other does not. We may have to do the same for ark. produce separate UTF-8 rsession binary for ucrt builds rstudio/rstudio#10524. There is a rust crate to help us with this https://lib.rs/crates/embed-manifest.Stream capturing on Windows, i.e. so
system2()
commands show stdout on the screen. Use https://github.com/rstudio/rstudio/blob/main/src/cpp/core/system/Win32OutputCapture.cpp as a guide.Right now we set the
RSTART_VERSION
to0
inR_DefParamsEx()
, but in theory we should use1
. That may "just work", I haven't fully tested it. It would give us access to some additional hooks in newer versions of R (like a cleanup hook).The positron-r build process needs to utilize
dll2lib.R
during the build process.We need to merge some variation of Barebones work for R discovery on Windows positron#1900, particularly for updating the
PATH
to know about R's bin directory whereR.dll
lives, since thePATH
is how Windows finds runtime dynamic library dependencies (crazy). That PR also has some kind of hardcoded support for detecting an R installation in the first place, which needs to be generalized.Either update to dev winsafe if this is fixed, or write our own wrappers around the bare bones equivalents to avoid the
+1
issue, or manually shorten the string in the meantime if we know this is going to be appended. I don't thinkMultiByteToWideChar
andWideCharToMultiByte
need to add 1 tonum_bytes
rodrigocfd/winsafe#111Outdated notes below, possibly still useful for historical purposes:
I got stuck for awhile at Rust's (rustc's) linking step where it threw out something like: "Can't find
R.lib
"And indeed in the
R_HOME/bin
directory there is anR.dll
but noR.lib
. It seems like in RStudio for Windows they totally regenerate theR.lib
on the fly usingdll2lib.R
?? It uses this script:https://github.com/rstudio/rstudio/blob/d26c0d74ad873a0f48b9153c2b0e56111cae2932/src/cpp/tools/dll2lib.R
It's kind of bonkers, but this works (from Command Prompt):
cd "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\"
VsDevCmd.bat -clean_env -no_logo
VsDevCmd.bat -arch=amd64 -startdir=none -host_arch=amd64 -winsdk=10.0.22000.0 -no_logo
dll2lib.R
in an R session started from that Command Prompt session. Skip theSys.setenv()
line at the beginning.R.lib
(and some others) in yourR_HOME/bin
(These commands were taken from https://github.com/rstudio/rstudio/blob/d26c0d74ad873a0f48b9153c2b0e56111cae2932/package/win32/make-package.bat#L179-L188, I think the idea is that it ends up putting some required locations on the path, i.e. for
dumpbin.exe
,lib.exe
, and maybecl.exe
(a c compiler))That allowed it to find
R.lib
, and now I am stuck with a new linker error:I think we knew this was coming. Things like
ptr_R_WriteConsole
are declared in an R header file, but are only defined in Unix specific files. We define them ourselves at runtime, but this is too late for Rust's linker check, I think 😢On Windows there is no
-undefined dynamic_linking
flag that we can set, so we need to do something else here, like work with theRstart
struct and set these pointers through that (although that is also tricky because it has changed over time, most recently in R 4.2.0, and RStudio does some crazy stuff there to support many versions)