Releases: pola-rs/r-polars

v0.18.0

05 Jul 15:02

Breaking changes

  • Updated rust-polars to 0.41.3 (#1147, #1156).
  • In $n_chunks(), the default value of strategy is now "first" (#1137).
  • $sample() for Expr and DataFrame (#1136):
    • the argument frac is renamed fraction;
    • all the arguments except n must be named;
    • for the Expr method only, the first argument is now n (it was already the
      case for the DataFrame method);
    • for the Expr method only, the default value for with_replacement is now
      FALSE (it was already the case for the DataFrame method).
  • $melt() had several changes (#1147):
    • $melt() is renamed $unpivot().
    • Some arguments were renamed: id_vars is now index, value_vars is now
      on.
    • The order of arguments has changed: on is now first, then index. The
      order of the other arguments hasn't changed. Note that on can be unnamed
      but all the other arguments must be named.
  • $pivot() had several changes (#1147):
    • The argument columns is renamed on.
    • The order of arguments has changed: on is now first, then index and
      values. The order of the other arguments hasn't changed. Note that on
      can be unnamed but all the other arguments must be named.
  • In $write_parquet() and $sink_parquet(), the default value of argument
    statistics is now TRUE and can take other values than TRUE/FALSE (#1147).
  • In $dt$truncate() and $dt$round(), the argument offset has been removed.
    Use $dt$offset_by() after those functions instead (#1147).
  • In $top_k() and $bottom_k() for Expr, the arguments nulls_last,
    maintain_order and multithreaded have been removed. If any null values
    are in the top/bottom k values, they will always be positioned last (#1147).
  • $replace() has been split in two functions depending on the desired
    behaviour (#1147):
    • $replace() recodes some values in the column, leaving all other values
      unchanged. Compared to the previous version, it doesn't use the arguments
      default and return_dtype anymore.
    • $replace_strict() replaces all values by different values. If a value
      doesn't have a specific mapping, it is replaced by the default value.
  • $str$concat() is deprecated, use $str$join() (with the same arguments)
    instead (#1147).
  • In pl$date_range() and pl$date_ranges(), the arguments time_unit and
    time_zone have been removed. They were deprecated in previous versions
    (#1147).
  • In $join(), when how = "cross", on, left_on and right_on must be
    NULL (#1147).
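
As a rough migration sketch for the reshaping changes listed above, the following assumes polars >= 0.18.0 is installed. The data frame and its column names are made up for illustration, and the default output column names `variable` and `value` are assumed rather than stated in these notes.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(
  id = c("a", "b"),
  x = c(1, 2),
  y = c(3, 4)
)

# Before v0.18.0: df$melt(id_vars = "id", value_vars = c("x", "y"))
# `on` now comes first and may be unnamed; `index` must be named.
long <- df$unpivot(c("x", "y"), index = "id")

# Before v0.18.0 the `on` argument of $pivot() was called `columns`.
# `on` may be unnamed; `index` and `values` must be named.
wide <- long$pivot("variable", index = "id", values = "value")
```

Here `long` has one row per (id, variable) pair, and `wide` is reshaped back to one row per id.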

New features

  • New method $has_nulls() (#1133).
  • New method $list$explode() (#1139).
  • $over() gains a new argument order_by to specify the order of values
    within each group. This is useful when the operation depends on the order of
    values, such as $shift() (#1147).
  • $value_counts() gains an argument normalize to give relative frequencies
    of unique values instead of their count (#1147).
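
A minimal sketch of the two new arguments above, assuming polars >= 0.18.0. The data and column names are hypothetical, and passing a single expression to order_by is an assumption about the accepted input.

```r
library(polars)

# Hypothetical example data: two groups, rows not sorted by `t`
df <- pl$DataFrame(
  g = c("a", "a", "b", "b"),
  t = c(2, 1, 2, 1),
  v = c(10, 20, 30, 40)
)

# Shift within each group `g`, using `t` to define the order inside the group
out <- df$with_columns(
  prev = pl$col("v")$shift(1)$over("g", order_by = pl$col("t"))
)

# Relative frequencies of the unique values instead of their counts
freq <- df$select(pl$col("g")$value_counts(normalize = TRUE))
```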

Full Changelog: v0.17.0...v0.18.0

lib-v0.41.0

05 Jul 14:18
d67c57d
Pre-release
test: temporarily disable the test of `pl$mem_address` (#1161)

v0.17.0

04 Jun 03:22

Breaking changes

  • Updated rust-polars to an unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
    • In $join(), there is a new argument coalesce and the how options now accept "full" instead of "outer" and "outer_coalesce".
    • $top_k() and $bottom_k() gain three arguments nulls_last, maintain_order and multithreaded.
    • All $rolling_*() functions lose the arguments by, closed and warn_if_unsorted. Rolling computations based on by must be made via the corresponding $rolling_*_by(), e.g. $rolling_mean_by() instead of $rolling_mean(by = ) (#1115).
    • pl$scan_parquet() and pl$read_parquet() gain an argument glob which defaults to TRUE. Set it to FALSE to avoid treating * as a globbing pattern.
    • $is_not_nan() on a null value (NA in R) now returns null. Previously, it returned TRUE.
    • In $reshape(), argument dims is renamed dimensions and there is a new argument nested_type specifying if the output should be of type List or Array.
    • In $value_counts(), all arguments must be named and there is a new argument name to specify the name of the output.
    • In all functions accepting optimization parameters (such as projection_pushdown), there is a new parameter cluster_with_columns to combine sequential independent calls to $with_columns().
    • $str$explode() is removed.
    • The check_sorted argument is removed from $rolling() and $group_by_dynamic(). Sortedness is now verified in a quick manner, so this argument is no longer needed (pola-rs/polars#16494).
    • $name$map() hangs on Linux, so this method is deprecated and its documentation has been removed. Please use other methods like <LazyFrame>$rename(<function>) instead (#1123).
  • As warned in v0.16.0, the order of arguments in pl$Series is changed (#1071). The first argument is now name, and the second argument is values.
  • $to_struct() on an Expr is removed. This method is now only available for Series, DataFrame, and in the $list and $arr subnamespaces. For example, pl$col("a", "b", "c")$to_struct() should be replaced with pl$struct(c("a", "b", "c")) (#1092).
  • pl$Struct() now only accepts named inputs and objects of class RPolarsField. For example, pl$Struct(pl$Boolean) doesn't work anymore and should be named like pl$Struct(a = pl$Boolean) (#1053).
  • In $all() and $any(), the argument drop_nulls is renamed ignore_nulls, and this argument must be named (#1050).
  • New method $struct$with_fields() (#1109) and new function pl$field() to be used in expressions in $struct$with_fields() (#1113).
  • New methods for RPolarsDataType: $is_enum(), $is_categorical(), $is_known(), $is_string(), $contains_views(), $contains_categorical() (#1112).
  • In $dt$combine(), the arguments tm and tu are renamed time and time_unit (#1116).
  • The default value of the rechunk argument of pl$concat() is changed from TRUE to FALSE (#1125).
  • In $rename() for LazyFrame and DataFrame, key-value pairs of names are changed to old_name = "new_name" instead of new_name = "old_name" (#1129).
  • In $rename() for LazyFrame and DataFrame, calling it with no arguments is no longer allowed (#1129).
  • In all $rolling_*() functions, the arguments center and ddof must be named (#1115).
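
A short sketch of a few of the v0.17.0 changes above, assuming polars >= 0.17.0. The data, the new column name `alpha` and the alias `ab` are hypothetical.

```r
library(polars)

# The first argument of pl$Series() is now `name`, the second is `values`
s <- pl$Series(name = "x", values = 1:3)

# Hypothetical example data
df <- pl$DataFrame(a = 1:2, b = 3:4)

# Key-value pairs are now old_name = "new_name"
renamed <- df$rename(a = "alpha")

# Replacement for the removed pl$col("a", "b")$to_struct()
structs <- df$select(pl$struct(c("a", "b"))$alias("ab"))
```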

New features

  • A function can now be specified in $rename() for LazyFrame and DataFrame. This is equivalent to polars.LazyFrame.rename(mapping: Callable[[str], str]) or polars.DataFrame.rename(mapping: Callable[[str], str]) in Python Polars (#1122, #1129).
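
As an illustration of the function form of $rename(), here is a minimal sketch assuming polars >= 0.17.0. The data frame is hypothetical, and the function is assumed to receive one column name at a time, as in the Python signature quoted above.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(a = 1:2, b = 3:4)

# The function maps each existing column name to its new name
upper <- df$rename(\(column_name) toupper(column_name))
```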

Full Changelog: v0.16.4...v0.17.0

lib-v0.40.0

03 Jun 22:56
3e3eece
Pre-release
Add `$rolling_*_by()` expressions (#1115)

Co-authored-by: eitsupi

v0.16.4

08 May 15:41

New features

  • pl$read_ipc() can read a raw vector of Apache Arrow IPC file (#1072).
  • New method <DataFrame>$to_raw_ipc() to serialize a DataFrame to a raw vector of Apache Arrow IPC file format (#1072).
  • New method <LazyFrame>$serialize() to serialize a LazyFrame to a character vector of JSON representation (#1073).
  • New function pl$deserialize_lf() to deserialize a LazyFrame from a character vector of JSON representation (#1073).
  • New methods $str$head() and $str$tail() (#1074).
  • New S3 methods nanoarrow::as_nanoarrow_array_stream() and nanoarrow::infer_nanoarrow_schema() for RPolarsSeries (#1076).
  • New method $dt$is_leap_year() (#1077).
  • as_polars_df() and as_polars_series() support arrow::RecordBatchReader (#1078).
  • New argument experimental for as_polars_df(<ArrowTabular>), as_polars_df(<RecordBatchReader>), as_polars_series(<nanoarrow_array_stream>), and as_polars_df(<nanoarrow_array_stream>) (#1078).
    If experimental = TRUE, these functions use the Arrow C stream interface internally.
    For now, this is slower in the expected use cases, so the default is experimental = FALSE.
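
The serialization round trips above might look like the following sketch, assuming polars 0.16.4 or a later release where these methods keep the behaviour described here. The data is hypothetical.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(x = 1:3, y = c("a", "b", "c"))

# DataFrame -> raw vector in Arrow IPC file format -> DataFrame
raw_ipc <- df$to_raw_ipc()
df2 <- pl$read_ipc(raw_ipc)

# LazyFrame -> JSON representation -> LazyFrame
json <- df$lazy()$select(pl$col("x"))$serialize()
lf2 <- pl$deserialize_lf(json)
res <- lf2$collect()
```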

Full Changelog: v0.16.3...v0.16.4

lib-v0.39.3

08 May 14:31
b42ee0a
Pre-release
feat: import_stream internal method for Series to support Arrow C str…

v0.16.3

03 May 05:31

New features

  • New method <SQLContext>$register_globals() (#1064).
  • New experimental method $sql() for DataFrame and LazyFrame (#1065).
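
A minimal sketch of the experimental $sql() method, assuming a polars build with SQL support enabled. The data is hypothetical, and referring to the calling frame as `self` in the query is an assumption based on Python Polars, not something stated in these notes.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# The calling frame is assumed to be available under the table name `self`
out <- df$sql("SELECT a, b FROM self WHERE a > 1")
```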

Miscellaneous

  • The API documentation website has moved to a new location (#1067, #1068).
    The old website now redirects to the top page of the new website.
    • Old URL: https://rpolars.github.io/
    • New URL: https://pola-rs.github.io/r-polars/

Full Changelog: v0.16.2...v0.16.3

v0.16.2

27 Apr 05:57

New features

  • $cut() and $qcut() to bin continuous values into discrete categories (#1057).
  • pl$scan_parquet() and pl$read_parquet() can read data from the internet by passing a URL as the first argument (#1056, @andyquinterom).
  • pl$scan_parquet() and pl$read_parquet() gain an argument storage_options to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this support is experimental (#1056, @andyquinterom).
  • Add support for the Enum datatype via pl$Enum() (#1061).
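
A small sketch of $cut() and pl$Enum(), assuming polars >= 0.16.2. The data, break points, labels and category names are all hypothetical.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(v = c(1, 5, 10), size = c("s", "m", "s"))

out <- df$with_columns(
  # Two break points give three bins, so three labels are supplied
  bin = pl$col("v")$cut(breaks = c(3, 7), labels = c("low", "mid", "high")),
  # Cast strings to an Enum with a fixed, ordered set of categories
  size_enum = pl$col("size")$cast(pl$Enum(c("s", "m", "l")))
)
```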

Bug fixes

  • In some read/scan functions, downloading files could fail if the URL was too long. This is now fixed (#1049, @DyfanJones).

Full Changelog: v0.16.1...v0.16.2

lib-v0.39.2

27 Apr 04:27
4a81740
Pre-release
ci: exclude R devel on windows from binary library check step (#1062)

v0.16.1

16 Apr 13:21

This is a small hot-fix release that updates the Rust polars dependency to 0.39.1 (#1042).

It also includes a few other updates.

Bug fixes

  • $len() now correctly includes null values in the count (#1044).
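
To illustrate the fix, here is a minimal sketch with hypothetical data that compares $len() with $count(), which skips null values:

```r
library(polars)

# Hypothetical example data with one null value (NA in R)
df <- pl$DataFrame(x = c(1, NA, 3))

out <- df$select(
  n_all = pl$col("x")$len(),        # counts all values, including nulls
  n_non_null = pl$col("x")$count()  # counts only non-null values
)
res <- as.data.frame(out)
```

With this input, `n_all` is 3 and `n_non_null` is 2.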

Other improvements

  • $arr$max() and $arr$min() work without the nightly feature (#1042).

Full Changelog: v0.16.0...v0.16.1