Development

System requirements

To install the development version of Polars or develop new features, you must install some tools outside of R.

  • rustup, the cross-platform Rust installer.
  • The nightly Rust toolchain (required version is recorded in the DESCRIPTION file).
    • On Windows, the GNU toolchain is required, for example: rustup toolchain install nightly-2021-10-12-gnu.
  • Windows: Make sure the latest version of Rtools is installed and on your PATH.
  • macOS: Make sure Xcode is installed.
  • Install CMake and add it to your PATH.
  • To generate the website locally, install Python (with venv) and the Quarto CLI, and make sure both are on your PATH.
  • Install Task, used as a task runner.
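The required toolchain version mentioned in the list above lives in the Config/polars/RustToolchainVersion field of the DESCRIPTION file. The following is a minimal sketch of how that field could be read from the command line. The sample DESCRIPTION content and the helper name toolchain_version are hypothetical and only serve to make the sketch self-contained. The sketch prints the rustup command instead of running it.

```shell
#!/bin/sh
# Sketch: look up the Rust toolchain version recorded in DESCRIPTION.
# The sample file written below is hypothetical; in the repository you
# would point the function at the real DESCRIPTION file instead.

# Print the value of the Config/polars/RustToolchainVersion field.
toolchain_version() {
  sed -n 's|^Config/polars/RustToolchainVersion:[[:space:]]*||p' "$1"
}

demo_dir=$(mktemp -d)
cat > "$demo_dir/DESCRIPTION" <<'EOF'
Package: polars
Version: 0.0.0
Config/polars/RustToolchainVersion: nightly-2021-10-12
EOF

version=$(toolchain_version "$demo_dir/DESCRIPTION")

# Print the commands rather than running them, so the sketch has no side effects.
echo "rustup toolchain install $version"
# On Windows the GNU toolchain is required, hence the -gnu suffix.
echo "rustup toolchain install $version-gnu"
```

In practice, task setup-dev is the supported way to install the toolchain. A lookup like this is mainly useful for checking which version is expected.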

Note that the Taskfile.yml in the root directory of the repository provides some useful commands (e.g. task setup-dev to install the required version of the Rust toolchain, the dependent R packages, and the Python virtual environment).

If you have access to a Dev Container execution environment such as GitHub Codespaces, you can work within a container that contains all of the above tools.

For more on using Rust code in R packages, see also the hellorust package documentation.

Implementing new functions on the Rust side

Here are the steps required for an example contribution, where we are implementing the cosine expression:

  1. Look up the polars.Expr.cos method in py-polars documentation.
  2. Press the [source] button to see the Python implementation
  3. Find the py-polars Rust implementation of cos (likely just a simple call to the Rust-Polars API).
  4. Adapt the Rust part and place it here.
  5. Adapt the Python frontend syntax to R and place it here. Add the roxygen docs + examples above.
  6. Note that Expr_cos = "use_extendr_wrapper" means we are using the extendr auto-generated wrapper unmodified.
  7. Write a test here.
  8. Run rextendr::document() to recompile and confirm the added method functions as intended, e.g.
pl$DataFrame(a = c(0, pi/2, pi, NA_real_))$select(pl$col("a")$cos())
  9. Run devtools::test(). See below for how to set up your development environment correctly.

Here are some real-world examples of implementations from GitHub pull requests:

  • Implementing the $peak_min() and $peak_max() methods for the Expr class: #462
  • Implementing the RPolarsSQLContext class and related functions: #457

Each class object's methods are defined in multiple source files as follows:

  • Methods implemented in Rust structs are written in the extendr-wrappers.R file automatically generated by extendr. (e.g. RPolarsSQLContext$execute)
  • The methods in extendr-wrappers.R are moved to the .pr object by the processing in after-wrappers.R. (e.g. .pr$SQLContext$execute)
  • The zzz.R file (named zzz so that it is the last file sourced) replaces the methods in extendr-wrappers.R with functions prefixed with the class name and deletes the original methods. (e.g. SQLContext_execute is replaced with pl$SQLContext$execute)

API documentation

This package uses the roxygen2 package to generate Rd files.

Note that some S3 methods exported in the zzz.R file are not recognized as S3 methods by roxygen2 unless the suggested package is loaded.

For example, to generate the documentation for the nanoarrow::as_nanoarrow_array_stream.RPolarsDataFrame function, the nanoarrow package must be installed and loaded.

s3_register("nanoarrow::as_nanoarrow_array_stream", "RPolarsDataFrame")

If the package is not loaded, the Rd file is generated as if the method were a normal function. This is not intended, so please do not commit the updated Rd file.

Updating Rust Polars

When updating the Rust Polars crate that the R package depends on, the following steps are required:

  1. Since the version of the Polars crate is specified by the Git revision, update the rev of all polars-* crates in the src/rust/Cargo.toml file.
  2. Update the Config/polars/RustToolchainVersion field in the DESCRIPTION file to the version of the Rust toolchain specified in the toolchain.channel field of the rust-toolchain.toml file in the Polars crate Git repository.
  3. Update the toolchain to the version specified in the DESCRIPTION file.
  4. Build and test the R package, fix any bugs, and repeat until the tests pass.
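Step 2 above can be sketched as a small script that copies the toolchain.channel value from rust-toolchain.toml into the Config/polars/RustToolchainVersion field of DESCRIPTION. This is only an illustration: both sample files, the version strings in them, and the helper names read_channel and set_toolchain_version are hypothetical.

```shell
#!/bin/sh
# Sketch of step 2: sync DESCRIPTION with the channel in rust-toolchain.toml.
# Both files created below are hypothetical samples.

# Print the quoted channel value from a rust-toolchain.toml file.
read_channel() {
  sed -n 's/^channel[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p' "$1"
}

# Rewrite the Config/polars/RustToolchainVersion field.
# $1 = path to DESCRIPTION, $2 = new toolchain version
set_toolchain_version() {
  tmp=$(mktemp)
  sed "s|^\(Config/polars/RustToolchainVersion:\).*|\1 $2|" "$1" > "$tmp" &&
    mv "$tmp" "$1"
}

work=$(mktemp -d)
cat > "$work/rust-toolchain.toml" <<'EOF'
[toolchain]
channel = "nightly-2024-01-01"
EOF
cat > "$work/DESCRIPTION" <<'EOF'
Package: polars
Config/polars/RustToolchainVersion: nightly-2021-10-12
EOF

channel=$(read_channel "$work/rust-toolchain.toml")
set_toolchain_version "$work/DESCRIPTION" "$channel"

# Show the updated field.
grep '^Config/polars/RustToolchainVersion' "$work/DESCRIPTION"
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.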

Release

Binary library release

Once changes to the Rust library are complete, and before releasing the R package, create a library release on GitHub.

Push a tag whose name starts with lib-v (this requires write access to the repository), e.g. lib-v0.35.0, where 0.35.0 is matched against the version number in the src/rust/Cargo.toml file. This triggers the GitHub action that builds the libraries for all platforms and uploads them to the release.

The version number of the Rust library is only used for compatibility with the R package, so any version number that differs from the previous ones is fine. However, for consistency, it is recommended to use the same major / minor version number as the polars crate (rust-polars).
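Because the tag is matched against the version in src/rust/Cargo.toml, it can help to compare the two locally before pushing. The sketch below shows one way to do that. The sample Cargo.toml, the package name in it, and the helper names cargo_version and tag_matches are hypothetical. It also assumes the first version = "..." line in the file is the package version.

```shell
#!/bin/sh
# Sketch: check that a lib-v tag agrees with the version in Cargo.toml.
# The Cargo.toml written below is a hypothetical sample.

# Print the first version = "..." entry, assumed to be the package version.
cargo_version() {
  sed -n 's/^version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only if the tag equals lib-v followed by the Cargo.toml version.
# $1 = tag name, $2 = path to Cargo.toml
tag_matches() {
  [ "$1" = "lib-v$(cargo_version "$2")" ]
}

work=$(mktemp -d)
cat > "$work/Cargo.toml" <<'EOF'
[package]
name = "r-polars"
version = "0.35.0"
EOF

if tag_matches "lib-v0.35.0" "$work/Cargo.toml"; then
  echo "tag matches Cargo.toml"
else
  echo "tag does not match Cargo.toml"
fi
```

Once the check passes, the tag can be created with git tag and pushed with git push, using whichever remote name points at the main repository.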

After creating the release, run the dev/generate-lib-sums.R script to generate tools/lib-sums.tsv, which is used to download the binaries during the source R package installation process:

task build-lib-sums

or

Rscript dev/generate-lib-sums.R

R package release

R package releases are made through GitHub pull requests.

  1. Create a local branch for the release, push it to the remote repository (main repository), then open a pull request to the main branch.
  2. Bump the R package version with the usethis package.
usethis::use_version()
# Please choose `major`, `minor` or `patch`
  3. Check the CI status of the pull request.
  4. Push a tag whose name starts with v (e.g. v0.10.0). This triggers the GitHub action that builds the website and creates a GitHub release.
  5. Before merging the pull request, bump the R package version to a "dev version" with the usethis package.
usethis::use_dev_version()

Check the performance via debug mode

If you experience unexpectedly sluggish performance when using polars in a given IDE, we'd like to hear about it. You can try setting options(polars.debug_polars = TRUE) to profile which methods are being touched (not necessarily run) and how fast. Below is an example of good behavior.

library(polars)
pl$set_options(debug_polars = TRUE)
pl$DataFrame(iris)$select("Species")
#> [TIME? ms]
#> pl$DataFrame() -> [3.257ms]
#> pl$lit() -> [2.721ms]
#> pl$Series() -> [0.2244ms]
#>    .pr$RPolarsSeries$new() -> [5.901ms]
#> RPolarsExpr$alias() -> [20.62ms]
#> pl$lit() -> [0.4537ms]
#> pl$Series() -> [0.1681ms]
#>    .pr$RPolarsSeries$new() -> [0.4008ms]
#> RPolarsExpr$alias() -> [0.3057ms]
#> pl$lit() -> [0.2573ms]
#> pl$Series() -> [0.1891ms]
#>    .pr$RPolarsSeries$new() -> [0.3707ms]
#> RPolarsExpr$alias() -> [0.2408ms]
#> pl$lit() -> [0.3285ms]
#> pl$Series() -> [0.1342ms]
#>    .pr$RPolarsSeries$new() -> [0.2878ms]
#> RPolarsExpr$alias() -> [0.2875ms]
#> pl$lit() -> [0.283ms]
#> pl$Series() -> [0.1855ms]
#>    .pr$RPolarsSeries$new() -> [9.417ms]
#> RPolarsExpr$alias() -> [0.2825ms]
#> pl$select() -> [0.1724ms]
#>    .pr$RPolarsDataFrame$select() -> [45.21ms]
#> RPolarsDataFrame$select() -> [0.2534ms]
#>    .pr$RPolarsDataFrame$select() ->
#> [6.062ms]
#> RPolarsDataFrame$print() -> [0.2882ms]
#>    .pr$RPolarsDataFrame$print() -> shape: (150, 1)
#> ┌───────────┐
#> │ Species   │
#> │ ---       │
#> │ cat       │
#> ╞═══════════╡
#> │ setosa    │
#> │ setosa    │
#> │ setosa    │
#> │ setosa    │
#> │ …         │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> └───────────┘

Other tips

To speed up the local rextendr::document() or R CMD check, run the following:

source("inst/misc/develop_polars.R")

# runs rextendr::document() + not_cran + load packages + all_features
load_polars()

#to check package + reuses previous compilation in check, protects against deletion
check_polars() #assumes rust target at `paste0(getwd(),"/src/rust")`