From 80ebea655c3c81aab3346d710126802a2c8343f8 Mon Sep 17 00:00:00 2001
From: Dewey Dunnington
Date: Thu, 12 Oct 2023 22:07:11 -0300
Subject: [PATCH] GH-37945: [R] Update developer documentation (#38220)

### Rationale for this change

Several PRs over the last few months have updated the build system to be more developer-friendly. During this process, it also came to light that we haven't supported the Windows development setup documented here since R 4.1 (released in spring 2021). I had to remove Windows from the `test-r-devdocs` job because the approach used there was not compatible with the `setup-r@v2` action, and the job was failing with the `@v1` action.

### What changes are included in this PR?

- Updated the sections on using pre-built static libraries and bundled builds.
- Removed the Windows section regarding the bundled build. This section would need rewriting to support the last two minor releases of R, but in the meantime I think it is mostly confusing.

### Are these changes tested?

They are documentation changes. They are also slightly optimistic: we can fix problems with the developer setup incrementally between releases, but it's more difficult to update our documentation. This PR documents the intended behaviour after https://github.com/apache/arrow/pull/38236.

### Are there any user-facing changes?

No.
* Closes: #37945

Lead-authored-by: Dewey Dunnington
Co-authored-by: Dewey Dunnington
Co-authored-by: Jacob Wujciak-Jens
Signed-off-by: Dewey Dunnington
---
 r/vignettes/developers/setup.Rmd | 136 ++++++-------------------------
 1 file changed, 26 insertions(+), 110 deletions(-)

diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd
index 479af577aa848..de33e72407792 100644
--- a/r/vignettes/developers/setup.Rmd
+++ b/r/vignettes/developers/setup.Rmd
@@ -38,50 +38,32 @@ set -e
 set -x
 ```
 
-
-```{bash, save=run & windows, hide=TRUE}
-# For some reason CRAN Mirror goes missing in CI
-echo 'options(repos=structure(c(CRAN="https://cloud.r-project.org")))' > $HOME/.Rprofile
-```
-
-Windows and macOS users who wish to contribute to the R package and
-don't need to alter libarrow (Arrow's C++ library) may be able to obtain a
-recent version of the library without building from source.
-
-### Linux
-
-On Linux, you can download a .zip file containing libarrow from the
-[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/).
-
-The directory names correspond to the OpenSSL version the binaries built with:
-- "linux-openssl-1.0" (OpenSSL 1.0)
-- "linux-openssl-1.1" (OpenSSL 1.1)
-- "linux-openssl-3.0" (OpenSSL 3.0)
-
-Version numbers in that repository correspond to dates.
-
-You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled libarrow binary files into it.
-
-### macOS
-On macOS, you can install libarrow using [Homebrew](https://brew.sh/):
-
-```bash
-# For the released version:
-brew install apache-arrow
-# Or for a development version, you can try:
-brew install apache-arrow --HEAD
-```
-
-### Windows
-
-On Windows, you can download a .zip file containing libarrow from the
-[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/).
-
-Version numbers in that repository correspond to dates.
-
-You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing libarrow before installing the arrow R package.
-
-## R and C++
+The Arrow R package is unique compared to other R packages that you may have
+contributed to because it builds on top of the large and feature-rich Arrow C++
+implementation. Because the R package integrates tightly with Arrow C++,
+it typically requires a dedicated copy of the library (i.e., it is usually
+not possible to link to a system version of libarrow during development).
+
+## Option 1: Using nightly libarrow binaries
+
+On Linux, MacOS, and Windows you can use the same workflow you might use for another
+package that contains compiled code (e.g., `R CMD INSTALL .` from
+a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from
+RStudio). If the `arrow/r/libarrow` directory is not populated, the configure script will
+attempt to download the latest nightly libarrow binary, extract it to the
+`arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows`
+directory (Windows), and continue building the R package as usual.
+
+Most of the time, you won't need to update your version of libarrow because
+the R package rarely changes with updates to the C++ library; however, if you
+start to get errors when rebuilding the R package, you may have to remove the
+`libarrow` directory (MacOS, Linux) or `windows` directory (Windows)
+and do a "clean" rebuild. You can do this from a terminal with
+`R CMD INSTALL . --preclean`, from RStudio using the "Clean and Install"
+option from "Build" tab, or using `make clean` if you are using the `Makefile`
+located in the root of the R package.
+
+## Option 2: Use a local Arrow C++ development build
 
 If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source.
 This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
@@ -103,43 +85,6 @@ sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
 brew install cmake openssl
 ```
 
-#### Windows
-
-The package can be built on Windows using [RTools 4](https://cran.r-project.org/bin/windows/Rtools/). It can be built for mingw32 (i386), mingw64 (x64), or ucrt64 (UCRT x64). mingw64 is the recommended 64-bit installation.
-
-Open the corresponding RTools Bash, for example "Rtools MinGW 64-bit" for mingw64.
-
-Install CMake, ccache, and Ninja with:
-
-```{bash, save=run & windows}
-pacman --sync --refresh --noconfirm \
-  ${MINGW_PACKAGE_PREFIX}-{ccache,cmake,ninja,openssl}
-export CMAKE_GENERATOR=Ninja
-```
-
-You will need to add R to your path. For a user-level installation, R will be at something like `~/Documents/R/R-4.1.2/bin`. For a global installation, R will be at something like `/c/Program\ Files/R/R-4.1.2/bin`. The R on your path needs to match the architecture you are compiling for, so if you are compiling on 32-bit specify `.../bin/i386` instead of `.../bin/x64`.
-
-```{bash}
-export PATH=~/Documents/R/R-4.1.2/bin/x64:$PATH
-```
-
-You can install additional dependencies like so. Note that you are limited to the packages in [the RTools repo](https://github.com/r-windows/rtools-packages), which does not contain every dependency used by Arrow.
-
-```{bash, save=run & windows}
-pacman --sync --refresh --noconfirm \
-  ${MINGW_PACKAGE_PREFIX}-boost \
-  ${MINGW_PACKAGE_PREFIX}-brotli \
-  ${MINGW_PACKAGE_PREFIX}-lz4 \
-  ${MINGW_PACKAGE_PREFIX}-protobuf \
-  ${MINGW_PACKAGE_PREFIX}-snappy \
-  ${MINGW_PACKAGE_PREFIX}-thrift \
-  ${MINGW_PACKAGE_PREFIX}-zlib \
-  ${MINGW_PACKAGE_PREFIX}-zstd \
-  ${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp \
-  ${MINGW_PACKAGE_PREFIX}-re2 \
-  ${MINGW_PACKAGE_PREFIX}-libutf8proc
-```
-
 ### Step 2 - Configure the libarrow build
 
 We recommend that you configure libarrow to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of libarrow you may already have installed, and so that you are also able work with more than one version of libarrow (by using different `ARROW_HOME` directories for the different versions).
@@ -158,13 +103,6 @@ export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
 echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
 ```
 
-_Special instructions on Windows:_ You will need to add `$ARROW_HOME/bin` to your `PATH` if you are using dynamic libraries (which is recommended).
-
-```{bash, save=run & windows}
-export PATH=$ARROW_HOME/bin:$PATH
-echo "export PATH=\"$ARROW_HOME/bin:$PATH\"" >> ~/.bash_profile
-```
-
 Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:
 
 ```{bash, save=run & !sys_install}
@@ -197,32 +135,10 @@ cmake \
   ..
 ```
 
-##### Windows
-
-```{bash, save=run & !sys_install & windows}
-cmake \
-  -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-  -DCMAKE_INSTALL_LIBDIR=lib \
-  -DARROW_COMPUTE=ON \
-  -DARROW_CSV=ON \
-  -DARROW_DATASET=ON \
-  -DARROW_EXTRA_ERROR_CONTEXT=ON \
-  -DARROW_FILESYSTEM=ON \
-  -DARROW_MIMALLOC=ON \
-  -DARROW_JSON=ON \
-  -DARROW_PARQUET=ON \
-  -DARROW_WITH_SNAPPY=OFF \
-  -DARROW_WITH_ZLIB=ON \
-  ..
-```
-
 #### {-}
 
 `..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`.
 
-**For Windows**: some options, including `-DARROW_JEMALLOC`, are not supported on Windows.
-
-
 ```{bash, save=run & !sys_install, hide=TRUE}
 # For testing purposes, build with only shared libraries
 cmake \