To install the development version of Polars or develop new features, you must install some tools outside of R.
- rustup, the cross-platform Rust installer.
- The nightly Rust toolchain (required version is recorded in the DESCRIPTION file).
  - On Windows, GNU toolchain is required. For example, rustup toolchain install nightly-2021-10-12-gnu.
- Windows: Make sure the latest version of Rtools is installed and on your PATH.
- macOS: Make sure Xcode is installed.
- Install CMake and add it to your PATH.
- To generate the website locally, install Python (with venv) and the Quarto CLI, and make sure both are on your PATH.
- Install Task, used as a task runner.
Note that the Taskfile.yml in the root directory of the repository provides some useful commands (e.g. task setup-dev to install the required version of the Rust toolchain, dependent R packages, and the Python virtual environment).
If you have access to a Dev Container execution environment such as GitHub Codespaces, you can work within a container that contains all of the above tools.
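Outside a container, it can help to confirm that the command-line tools listed above are visible from R before running any task. The following is a minimal sketch using only base R; the executable names (rustup, cargo, cmake, task, quarto) are assumptions about how each tool is installed and may differ on your system.

```r
# Executable names are assumptions; adjust them for your platform.
required_tools <- c("rustup", "cargo", "cmake", "task")
optional_tools <- c("quarto") # only needed to build the website locally

# Sys.which() returns the full path of each executable, typically "" if not found.
tool_paths <- Sys.which(c(required_tools, optional_tools))

# Treat both "" and NA as "not found", since the indicator depends on the OS.
missing_tools <- names(tool_paths)[is.na(tool_paths) | !nzchar(tool_paths)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All listed tools were found on PATH.")
}
```

This only checks that an executable with that name exists; it does not verify versions or that the correct Rust toolchain is installed.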
For more about Rust code in R packages, see also the hellorust package documentation.
Here are the steps required for an example contribution, where we are implementing the cosine expression:
- Look up the polars.Expr.cos method in py-polars documentation.
- Press the [source] button to see the Python implementation.
- Find the cos py-polars Rust implementation (likely just a simple call to the Rust-Polars API).
- Adapt the Rust part and place it here.
- Adapt the Python frontend syntax to R and place it here. Add the roxygen docs + examples above.
- Notice we use Expr_cos = "use_extendr_wrapper", which means we are using the extendr auto-generated wrapper unmodified.
- Write a test here.
- Run rextendr::document() to recompile and confirm the added method functions as intended, e.g. pl$DataFrame(a = c(0, pi/2, pi, NA_real_))$select(pl$col("a")$cos())
- Run devtools::test(). See below for how to set up your development environment correctly.
Here are some real-world examples of implementations, taken from GitHub pull requests:
- Implementing the $peak_min() and $peak_max() methods for the Expr class: #462
- Implementing the RPolarsSQLContext class and related functions: #457
Each class object's methods are defined in multiple source files as follows:
- Methods implemented in Rust structs are written in the extendr-wrappers.R file automatically generated by extendr. (e.g. RPolarsSQLContext$execute)
- The methods in extendr-wrappers.R are moved to the .pr object by the processing in after-wrappers.R. (e.g. .pr$SQLContext$execute)
- The zzz.R file (named zzz to be last file sourced) replaces the methods in extendr-wrappers.R with functions prefixed with the class name and deletes the original methods. (e.g. SQLContext_execute is replaced with pl$SQLContext$execute)
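The three steps above can be mimicked in plain R to see the overall shape of the mechanism. This is an illustrative sketch only, not the package's actual code: the environments, the stand-in execute method, and the public_methods list are all hypothetical.

```r
# Step 1: a stand-in for a class environment generated in extendr-wrappers.R.
RPolarsSQLContext <- new.env()
RPolarsSQLContext$execute <- function(query) paste("executing:", query)

# Step 2: copy the raw methods into a private `.pr` container.
.pr <- new.env()
.pr$SQLContext <- list2env(as.list(RPolarsSQLContext))

# Step 3: public functions are named with the class prefix and call into `.pr`.
public_methods <- list(
  SQLContext_execute = function(query) .pr$SQLContext$execute(toupper(query))
)

# Delete the original methods, then install the public ones without the prefix.
rm(list = ls(RPolarsSQLContext), envir = RPolarsSQLContext)
for (full_name in names(public_methods)) {
  method_name <- sub("^SQLContext_", "", full_name)
  assign(method_name, public_methods[[full_name]], envir = RPolarsSQLContext)
}

result <- RPolarsSQLContext$execute("select 1")
print(result)
```

The point of the indirection is that the raw, auto-generated method stays reachable through .pr, while the user-facing method can add argument handling (here, a toy toupper() call) on top of it.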
This package uses the roxygen2 package to generate Rd files.
Note that some S3 methods exported in the zzz.R file are not recognized as S3 methods by roxygen2 if the suggested package is not loaded.
For example, to generate the documentation for the nanoarrow::as_nanoarrow_array_stream.RPolarsDataFrame function, the nanoarrow package must be installed and loaded.
s3_register("nanoarrow::as_nanoarrow_array_stream", "RPolarsDataFrame")
If not loaded, the Rd file will be generated as a normal function. This is not intended, so please do not commit the updated Rd file.
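One way to avoid committing a wrongly generated Rd file is to check that the suggested packages are available before regenerating the documentation. The helper below is a hypothetical sketch in base R and is not part of the package; stats is used only as an always-installed stand-in so that the example runs anywhere.

```r
# Hypothetical helper: return the packages from `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# `stats` ships with R, so nothing is reported as missing here.
always_there <- missing_packages("stats")

# The suggested package named in the text above.
not_installed <- missing_packages(c("nanoarrow"))
if (length(not_installed) > 0) {
  message("Install and load before documenting: ",
          paste(not_installed, collapse = ", "))
}
```

Note that requireNamespace() only loads a namespace without attaching it; calling library(nanoarrow) before generating the documentation is the safer choice.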
When updating the Rust Polars crate that the R package depends on, the following steps are required:
- Since the version of the Polars crate is specified by the Git revision, update the rev of all polars-* crates in the src/rust/Cargo.toml file.
- Update the Config/polars/RustToolchainVersion field in the DESCRIPTION file to the version of the Rust toolchain specified in the toolchain.channel field of the rust-toolchain.toml file in the Polars crate Git repository.
- Update the toolchain to the version specified in the DESCRIPTION file.
- Repeat the build, test, and bug fixes of the R package.
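Reading the toolchain version back out of DESCRIPTION can be sketched in base R. The example writes a throwaway DESCRIPTION file so that it is self-contained; the value nightly-2024-01-24 is a made-up placeholder, and the rustup command is only printed, not executed.

```r
# A temporary DESCRIPTION stands in for the real one (all values are placeholders).
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: polars",
  "Version: 0.0.0",
  "Config/polars/RustToolchainVersion: nightly-2024-01-24"
), desc_path)

# read.dcf() parses the DESCRIPTION format; request only the field we need.
field <- "Config/polars/RustToolchainVersion"
toolchain <- unname(read.dcf(desc_path, fields = field)[1, 1])

# On Windows the GNU toolchain is required, so append the "-gnu" suffix.
if (.Platform$OS.type == "windows") {
  toolchain <- paste0(toolchain, "-gnu")
}

install_cmd <- paste("rustup toolchain install", toolchain)
print(install_cmd)
```

In the real repository you would point read.dcf() at the package's own DESCRIPTION file instead of a temporary one.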
Once editing of the Rust library is finished and before the R package release, create a library release on GitHub.
Push a tag (this requires write access to the repository) whose name starts with lib-v (e.g. lib-v0.35.0, where 0.35.0 is matched against the version number in the src/rust/Cargo.toml file). This triggers the GitHub action to build the libraries for all platforms and upload them to the release.
The version number of the Rust library is only used for compatibility with the R package, so any version number different from the previous ones is fine.
However, it is recommended to use the same major / minor version number as the polars crate (rust-polars) for consistency.
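Because the tag is matched against the version in Cargo.toml, it can be worth deriving the tag name from that file rather than typing it by hand. The sketch below uses base R on an in-memory stand-in for Cargo.toml; the package name example-lib and the helper tag_matches_version() are hypothetical.

```r
# Placeholder Cargo.toml content; the package name here is hypothetical.
cargo_toml <- c(
  "[package]",
  'name = "example-lib"',
  'version = "0.35.0"'
)

# Take the first `version = "..."` line and extract the quoted value.
version_line <- grep("^version\\s*=", cargo_toml, value = TRUE)[1]
lib_version <- sub('^version\\s*=\\s*"([^"]+)".*$', "\\1", version_line)

expected_tag <- paste0("lib-v", lib_version)

# Hypothetical helper: does a candidate tag match the Cargo.toml version?
tag_matches_version <- function(tag) identical(tag, expected_tag)

tag_matches_version("lib-v0.35.0") # TRUE
tag_matches_version("lib-v0.34.0") # FALSE
```

For the real file, readLines("src/rust/Cargo.toml") would replace the in-memory vector.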
After creating the release, run the dev/generate-lib-sums.R script to generate tools/lib-sums.tsv, which is used to download the binaries during the source R package installation process:
task build-lib-sums
or
Rscript dev/generate-lib-sums.R
R package releases are made through GitHub pull requests.
- Create a local branch for the release, push it to the remote repository (main repository), then open a pull request to the main branch.
- Bump the R package version with the usethis package.
usethis::use_version()
# Please choose `major`, `minor` or `patch`
- Check the CI status of the pull request.
- Push a tag whose name starts with v (e.g. v0.10.0). This triggers the GitHub action to build the website and create a GitHub release.
- Bump the R package version to "dev version" with the usethis package before merging the pull request.
usethis::use_dev_version()
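To make the version and tag naming in these steps concrete, here is a small base R sketch. It assumes the default behaviour of usethis::use_dev_version(), which appends a .9000 component to the released version; the version 0.10.0 is taken from the example tag above.

```r
release_version <- "0.10.0"

# The release tag is the version prefixed with "v".
release_tag <- paste0("v", release_version)

# Assumed result of usethis::use_dev_version(): a fourth ".9000" component.
dev_version <- paste0(release_version, ".9000")

# package_version() compares versions component by component,
# so the dev version sorts after the release it follows.
is_newer <- package_version(dev_version) > package_version(release_version)
print(c(tag = release_tag, dev = dev_version))
```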
If you experience unexpectedly sluggish performance when using polars in a given IDE, we'd like to hear about it. You can try activating options(polars.debug_polars = TRUE) to profile which methods are being touched (not necessarily run) and how fast. Below is an example of good behavior.
library(polars)
pl$set_options(debug_polars = TRUE)
pl$DataFrame(iris)$select("Species")
#> [TIME? ms]
#> pl$DataFrame() -> [3.257ms]
#> pl$lit() -> [2.721ms]
#> pl$Series() -> [0.2244ms]
#> .pr$RPolarsSeries$new() -> [5.901ms]
#> RPolarsExpr$alias() -> [20.62ms]
#> pl$lit() -> [0.4537ms]
#> pl$Series() -> [0.1681ms]
#> .pr$RPolarsSeries$new() -> [0.4008ms]
#> RPolarsExpr$alias() -> [0.3057ms]
#> pl$lit() -> [0.2573ms]
#> pl$Series() -> [0.1891ms]
#> .pr$RPolarsSeries$new() -> [0.3707ms]
#> RPolarsExpr$alias() -> [0.2408ms]
#> pl$lit() -> [0.3285ms]
#> pl$Series() -> [0.1342ms]
#> .pr$RPolarsSeries$new() -> [0.2878ms]
#> RPolarsExpr$alias() -> [0.2875ms]
#> pl$lit() -> [0.283ms]
#> pl$Series() -> [0.1855ms]
#> .pr$RPolarsSeries$new() -> [9.417ms]
#> RPolarsExpr$alias() -> [0.2825ms]
#> pl$select() -> [0.1724ms]
#> .pr$RPolarsDataFrame$select() -> [45.21ms]
#> RPolarsDataFrame$select() -> [0.2534ms]
#> .pr$RPolarsDataFrame$select() ->
#> [6.062ms]
#> RPolarsDataFrame$print() -> [0.2882ms]
#> .pr$RPolarsDataFrame$print() -> shape: (150, 1)
#> ┌───────────┐
#> │ Species   │
#> │ ---       │
#> │ cat       │
#> ╞═══════════╡
#> │ setosa    │
#> │ setosa    │
#> │ setosa    │
#> │ setosa    │
#> │ …         │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> └───────────┘
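If you want to time individual calls yourself when comparing IDEs, a generic wrapper is enough. The sketch below uses only base R and is not how polars implements its debug option; the timed() helper and its output format are hypothetical and merely mimic the listing above.

```r
# Hypothetical helper: wrap `f` so that each call reports its elapsed time.
timed <- function(name, f) {
  force(name)
  force(f)
  function(...) {
    # system.time() evaluates the assignment in this function's frame.
    elapsed <- system.time(result <- f(...))[["elapsed"]]
    message(sprintf("%s -> [%.4gms]", name, elapsed * 1000))
    result
  }
}

timed_sum <- timed("sum()", sum)
total <- timed_sum(1:10) # emits a timing message; the returned value is unchanged
```

Wrapping the same call in two environments (for example a plain terminal and the IDE console) gives a rough comparison to include in a bug report.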
To speed up the local rextendr::document() or R CMD check, run the following:
source("inst/misc/develop_polars.R")
# Run rextendr::document() + not_cran + load packages + all_features
load_polars()
# Check the package, reusing the previous compilation and protecting it against deletion
check_polars() # assumes the Rust target is at `paste0(getwd(), "/src/rust")`