Releases: pola-rs/r-polars

lib-v0.39.1

16 Apr 11:47
bb234ed
lib-v0.39.1 Pre-release
fix: `$len()` should also count `null` values (#1044)

v0.16.0

15 Apr 14:08

Breaking changes

  • Rust polars is updated to 0.39.0 (#937, #1034).

  • R objects inside an R list are now converted to Polars data types via
    as_polars_series() (#1021, #1022, #1023). For example, up to polars 0.15.1,
    a list containing a data.frame with a column of {clock} naive-time class
    was converted to a nested List type of Float64:

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌──────────────────────────┐
    #> │ nested_data              │
    #> │ ---                      │
    #> │ list[list[list[f64]]]    │
    #> ╞══════════════════════════╡
    #> │ [[[2.1475e9], [7305.0]]] │
    #> └──────────────────────────┘

    From 0.16.0, nested types are converted correctly, so the result is
    a List type of Struct type containing a Datetime type:

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌─────────────────────────┐
    #> │ nested_data             │
    #> │ ---                     │
    #> │ list[struct[1]]         │
    #> ╞═════════════════════════╡
    #> │ [{1990-01-01 00:00:00}] │
    #> └─────────────────────────┘
  • Several functions have been rewritten to match the behavior of Python Polars.
    There are four types of changes: i) change in argument names, ii) change in
    the way arguments are passed (named or by position), iii) arguments are removed,
    and iv) change in the default and accepted values. Those are addressed separately
    below.

    1. Change in argument names:

      • In $reshape(), the dims argument is renamed to dimensions (#1019).
      • In pl$read_* and pl$scan_* functions, the first argument is now
        source (#935).
      • In pl$Series(), the argument x is renamed values (#933).
      • In <DataFrame>$write_* functions, the first argument is now file (#935).
      • In <LazyFrame>$sink_* functions, the first argument is now path (#935).
      • In <LazyFrame>$sink_ipc(), the argument memmap is renamed to memory_map (#1032).
      • In <DataFrame>$rolling(), <LazyFrame>$rolling(), <DataFrame>$group_by_dynamic()
        and <LazyFrame>$group_by_dynamic(), the by argument is renamed to
        group_by (#983).
      • In $dt$convert_time_zone() and $dt$replace_time_zone(), the tz
        argument is renamed to time_zone (#944).
      • In $str$strptime(), the argument datatype is renamed to dtype (#939).
      • In $str$to_integer() (renamed from $str$parse_int()), argument radix is
        renamed to base (#1038).
    2. Change in the way arguments are passed:

      • In all input/output functions, all arguments except the first argument
        must be named arguments (#935).

      • In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all
        arguments except index_column must be named arguments (#983).

      • In $unique() for DataFrame and LazyFrame, arguments keep and
        maintain_order must be named (#953).

      • In $bin$decode(), the strict argument must be a named argument (#980).

      • In $dt$replace_time_zone(), all arguments except time_zone must be named
        arguments (#944).

      • In $str$contains(), the arguments literal and strict must be named
        (#982).

      • In $str$contains_any(), the ascii_case_insensitive argument must be
        named (#986).

      • In $str$count_matches(), $str$replace() and $str$replace_all(),
        the literal argument must be named (#987).

      • In $str$strptime(), $str$to_date(), $str$to_datetime(), and
        $str$to_time(), all arguments (except the first one) must be named (#939).

      • In $str$to_integer() (renamed from $str$parse_int()), all arguments
        must be named (#1038).

      • In pl$date_range(), the arguments closed, time_unit, and time_zone
        must be named (#950).

      • In $set_sorted() and $sort_by(), argument descending must be named
        (#1034).

      • In pl$Series(), using positional arguments throws a warning, since the
        argument positions will be changed in the future (#966).

        # polars 0.15.1 or earlier
        # The first argument is `x`, the second argument is `name`.
        pl$Series(1:3, "foo")
        
        # The code above will warn in 0.16.0
        # Use named arguments to silence the warning.
        pl$Series(values = 1:3, name = "foo")
        pl$Series(name = "foo", values = 1:3)
        
        # polars 0.17.0 or later (future version)
        # The first argument is `name`, the second argument is `values`.
        pl$Series("foo", 1:3)

        This warning can also be silenced by replacing pl$Series(<values>, <name>)
        by as_polars_series(<values>, <name>).

    3. Arguments removed:

      • The argument columns in $drop() is removed. $drop() now accepts
        several character scalars, such as $drop("a", "b", "c") (#912).
      • In pl$col(), the name argument is removed, and the ... argument no
        longer accepts a list of characters and RPolarsSeries class objects (#923).
      • In pl$date_range(), the argument explode, which had no effect in
        recent versions, is removed (#950).
    4. Change in argument defaults and accepted values:

      • In pl$Series(), the argument values has a new default value NULL
        (#966).
      • In $unique() for DataFrame and LazyFrame, argument keep has a new
        default value "any" (#953).
      • In rolling aggregation functions (such as $rolling_mean()), the default
        value of argument closed is now NULL. Using closed with a fixed
        window_size now throws an error (#937).
      • In pl$date_range(), the argument end must be specified and the default
        value of interval is changed to "1d". The arguments start and end
        no longer accept numeric values (#950).
      • In pl$scan_parquet(), the default value of the argument rechunk is
        changed from TRUE to FALSE (#1033).
      • In pl$scan_parquet() and pl$read_parquet(), the argument parallel
        only accepts "auto", "columns", "row_groups", and "none".
        Previously, it also accepted upper-case notation of "auto", "columns",
        "none", and "RowGroups" instead of "row_groups" (#1033).
      • In $str$to_integer() (renamed from $str$parse_int()), the default
        value of base is changed from 2 to 10 (#1038).
  • The usage of pl$date_range() to create a range of Datetime data type is
    deprecated. pl$date_range() will always create a range of Date data type
    in the future. Use pl$datetime_range() if you want to create a range of
    Datetime instead (#950).

  • <DataFrame>$get_columns() now returns an unnamed list instead of a named
    list (#991).

  • Removed $argsort() which was an old alias for $arg_sort() (#930).

  • Removed pl$expr_to_r() which was an alias for $to_r() (#938).

  • <Series>$to_r_list() is renamed <Series>$to_list() (#938).

  • Removed <Series>$to_r_vector() which was an old alias for
    <Series>$to_vector() (#938).

  • Removed <Expr>$rep_extend(), which was an experimental method created at the
    early stage of this package and does not exist in other language APIs (#1028).

  • The following deprecated functions are now removed: pl$threadpool_size(),
    <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).

  • In $group_by_dynamic(), the first datapoint is always preserved (#1034).

  • $str$parse_int() is renamed to $str$to_integer() (#1038).
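As a rough migration sketch for two of the changes listed above, the snippet below uses $drop() with separate character scalars and the renamed $str$to_integer() with a named base argument. It assumes the {polars} R package (0.16.0 or later) is installed. The data and the column names a, b, c, binary and decimal are made up for illustration.

```r
# Assumption: the {polars} R package, version 0.16.0 or later, is installed.
# The whole example is skipped when the package is not available.
if (requireNamespace("polars", quietly = TRUE)) {
  pl = polars::pl

  # Hypothetical example data.
  df = pl$DataFrame(
    a = 1:3,
    b = c("101", "110", "111"),
    c = c(TRUE, FALSE, TRUE)
  )

  # $drop() takes column names as separate character scalars;
  # the `columns` argument no longer exists.
  kept = df$drop("a", "c")

  # $str$to_integer() replaces $str$parse_int(). `base` must be named,
  # and its default is 10 instead of 2.
  parsed = kept$select(
    binary = pl$col("b")$str$to_integer(base = 2),
    decimal = pl$col("b")$str$to_integer()
  )
  print(parsed)
}
```

Code written for 0.15.1 that relied on the old default of base = 2 has to pass base = 2 explicitly to get the same result.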

New features

  • New functions:

    • pl$arg_sort_by() (#929).
    • pl$arg_where() to get the indices that match a condition (#922).
    • pl$datetime(), pl$date(), and pl$time() to easily create Expr of class
      datetime, date, and time via columns and literals (#918).
    • pl$datetime_range(), pl$date_ranges() and pl$datetime_ranges() (#950, #962).
    • pl$int_range() and pl$int_ranges() (#968).
    • pl$mean_horizontal() (#959).
    • pl$read_ipc() (#1033).
    • is_polars_dtype() (#927).
  • New methods:

    • <LazyFrame>$to_dot() to print the query plan of a LazyFrame with graphviz
      dot syntax (#928).
    • $clear() for DataFrame, LazyFrame, and Series (#1004).
    • $item() for DataFrame and Series (#992).
    • $select_seq() and $with_columns_seq() for DataFrame and LazyFrame
      (#1003).
    • $arr$to_list() (#1018).
    • $str$extract_groups() (#979).
    • $str$find() (#985).
    • <DataFrame>$write_ipc() (#1032).
    • RPolarsDataType gains several methods to check the datatype, such as
      $is_integer(), $is_null() or $is_list() (#1036).
  • New arguments or argument values:

    • ambiguous can now take the value "null" to convert ambiguous datetimes to
      null values (#937).
    • n in $str$replace() (#987).
    • non_existent in $dt$replace_time_zone() to specify what should happen
      when a datetime doesn't exist.
    • mapping_strategy in $over() (#984, #988).
    • raise_if_undetermined in $meta$output_name() (#961).
    • null_on_oob in $arr$get() and $list$get() to determine what happens
      when the index is out of bounds (#1034).
    • nulls_last, multithreaded, and maintain_order in $sort_by() (#1034).
  • Other:

    • pl$Series() now calls as_polars_series() internally, so it can convert
      more classes to Series properly (#1015).
    • Export the Duration datatype (#955).
    • New active binding <Series>$struct$fields (#1002).
    • All $write_*() and $sink_*() functions now invisibly return the input
      data (#1039).

Bug fixes

  • The join_nulls and ...

lib-v0.39.0

15 Apr 12:21
7cbffaf
lib-v0.39.0 Pre-release
refactor!: `$str$parse_int()` -> `$str$to_integer()` (#1038)

Co-authored-by: Etienne Bacher

v0.15.1

11 Mar 15:16

New features

  • rust-polars is updated to 0.38.2 (#907).
    • Minimum supported Rust version (MSRV) is now 1.76.0.
  • as_polars_df(<nanoarrow_array>) is added (#893).
  • It is now possible to create an empty DataFrame with a specific schema with pl$DataFrame(schema = my_schema) (#901).
  • New arguments dtype and nan_to_null for pl$Series() (#902).
  • New method <DataFrame>$partition_by() (#898).

Bug fixes

  • The default value of the format of $str$strptime() is now correctly set (#892).

Other improvements

  • Performance of as_polars_df(<nanoarrow_array_stream>) is improved (#896).

Full Changelog: v0.15.0...v0.15.1

lib-v0.38.1

11 Mar 14:07
2b3a001
lib-v0.38.1 Pre-release
feat: bump polars to 0.38.2 (#907)

Co-authored-by: Etienne Bacher

v0.15.0

03 Mar 11:45

Breaking changes due to Rust-polars update

  • rust-polars is updated to 0.38.1 (#865, #872).
    • In $pivot(), the arguments aggregate_function, maintain_order, sort_columns and separator must be named. Values passed by position are ignored.
    • In $describe(), the name of the first column changed from "describe" to "statistic".
    • $mod() methods and %% now work correctly and guarantee x == (x %% y) + y * (x %/% y).
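The identity quoted for $mod() and %% can be checked with base R's own floored operators. The sketch below uses plain integer vectors rather than Polars objects, and the values are arbitrary.

```r
# Base R only: check x == (x %% y) + y * (x %/% y) for mixed-sign integers.
x = c(-7L, -1L, 0L, 5L, 7L)
y = c(3L, -3L, 2L, -2L, 4L)

remainder = x %% y   # in R, the remainder takes the sign of the divisor
quotient = x %/% y   # floored integer division

# Rebuild x from the remainder and the quotient.
reconstructed = remainder + y * quotient
stopifnot(identical(reconstructed, x))
```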

Other breaking changes

  • Removed as.list() for class RPolarsExpr as it is a simple wrapper around list() (#843).

  • Several functions have been rewritten to match the behavior of Python Polars.

    • pl$col(...) requires at least one argument. (#852)
    • pl$head(), pl$tail(), pl$count(), pl$first(), pl$last(), pl$max(), pl$min(), pl$mean(), pl$median(), pl$std(), pl$sum(), pl$var(), pl$n_unique(), and pl$approx_n_unique() are syntactic sugar for pl$col(...)$<method()>. The argument ... now only accepts character values that are either column names or regular expressions (#852).
    • pl$len() takes no arguments. If you want to measure the length of specific columns, you should use pl$count(...) (#852).
    • <Expr>$str$concat() method's delimiter argument's default value is changed from "-" to "" (#853).
    • <Expr>$str$concat() method's ignore_nulls argument must be a named argument (#853).
    • pl$Datetime()'s arguments are renamed: tu to time_unit, and tz to time_zone (#887).
  • pl$Categorical() has been improved to allow specifying the ordering type (either lexical or physical). This also means that calling pl$Categorical doesn't create a DataType anymore. All calls to pl$Categorical must be replaced by pl$Categorical() (#860).

  • <Series>$rem() is removed. Use <Series>$mod() instead (#886).

  • The conversion strategy between the POSIXct type without time zone attribute and Polars datetime has been changed (#878). POSIXct class vectors without a time zone attribute store UTC time internally and are displayed based on the system's time zone. Previous versions of polars only considered the internal value and interpreted it as UTC time, so the time displayed as POSIXct and in Polars was different.

    # polars 0.14.1
    Sys.setenv(TZ = "Europe/Paris")
    datetime = as.POSIXct("1900-01-01")
    datetime
    #> [1] "1900-01-01 PMT"
    
    s = polars::as_polars_series(datetime)
    s
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  1899-12-31 23:50:39
    #> ]
    
    as.vector(s)
    #> [1] "1900-01-01 PMT"

    Now the internal value is updated to match the displayed value.

    # polars 0.15.0
    Sys.setenv(TZ = "Europe/Paris")
    datetime = as.POSIXct("1900-01-01")
    datetime
    #> [1] "1900-01-01 PMT"
    
    s = polars::as_polars_series(datetime)
    s
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  1900-01-01 00:00:00
    #> ]
    
    as.vector(s)
    #> [1] "1900-01-01 PMT"

    This update may cause errors when converting from Polars to POSIXct for non-existent or ambiguous times. It is recommended to explicitly add a time zone before converting from Polars to R.

    Sys.setenv(TZ = "America/New_York")
    ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
    ambiguous_time
    #> [1] "2020-11-01 01:00:00 EDT"
    
    pls = polars::as_polars_series(ambiguous_time)
    pls
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  2020-11-01 01:00:00
    #> ]
    
    ## This will raise an error!
    # pls |> as.vector()
    
    pls$dt$replace_time_zone("UTC") |> as.vector()
    #> [1] "2020-11-01 01:00:00 UTC"
  • Removed argument eager in pl$date_range() and pl$struct() for more consistency of output. It is possible to replace eager = TRUE by calling $to_series() (#882).

New features

  • In the when-then-otherwise expressions, the last $otherwise() is now optional, as in Python Polars. If $otherwise() is not specified, rows that don't respect the condition set in $when() will be filled with null (#836).
  • <DataFrame>$head() and <DataFrame>$tail() methods now support negative row numbers (#840).
  • $group_by() now works with named expressions (#846).
  • New methods for the arr subnamespace: $median(), $var(), $std(), $shift(), $to_struct() (#867).
  • $min() and $max() now work on categorical variables (#868).
  • New methods for the list subnamespace: $n_unique(), $gather_every() (#869).
  • clock_time_point and clock_zoned_time objects from the {clock} package are now converted to the Polars datetime type (#861).
  • New methods for the name subnamespace: $prefix_fields() and $suffix_fields() (#873).
  • pl$Datetime()'s time_zone argument now accepts "*" to match any time zone (#887).

Bug fixes

  • R no longer crashes when calling an invalid Polars object that points to a null pointer (#874). This occurred, for example, when a Polars object was saved in an RDS file and loaded in another session.

Full Changelog: v0.14.1...v0.15.0

lib-v0.38.0

03 Mar 09:27
1156dd8
lib-v0.38.0 Pre-release
docs(news): move old changelog to the NEWS.0.md file (#885)

v0.14.1

23 Feb 13:09

Breaking changes

  • Since most of the methods of Expr are now available for Series, the experimental <Series>$expr subnamespace is removed (#831). Use <Series>$<method> instead of <Series>$expr$<method>.

New features

  • New active binding $flags for DataFrame to show the flags used internally for each column. The output of $flags for Series was also improved and now contains FAST_EXPLODE for Series of type list and array (#809).
  • Most of Expr methods are also available for Series (#819, #828, #831).
  • as_polars_df() for data.frame is more memory-efficient and new arguments schema and schema_overrides are added (#817).
  • Use polars_code_completion_activate() to enable code suggestions and autocompletion after $ on polars objects. This is an experimental feature that is disabled by default. For now, it is only supported in the native R terminal and in RStudio (#597).

Bug fixes

  • <Series>$list subnamespace methods now correctly return Series class objects (#819).

lib-v0.37.1

18 Feb 12:20
lib-v0.37.1 Pre-release
ci: fix migration to actions/download-artifact@v4

v0.14.0

12 Feb 07:27

Breaking changes due to Rust-polars update

  • rust-polars is updated to 0.37.0 (#776).
    • Minimum supported Rust version (MSRV) is now 1.74.1.
    • $with_row_count() for DataFrame and LazyFrame is deprecated and will be removed in 0.15.0. It is replaced by $with_row_index().
    • pl$count() is deprecated and will be removed in 0.15.0. It is replaced by pl$len().
    • $explode() for DataFrame and LazyFrame doesn't work anymore on string columns.
    • $list$join() and pl$concat_str() gain an argument ignore_nulls. The current behavior is to return a null if the row contains any null. Setting ignore_nulls = TRUE changes that.
    • All row_count_* args in reading/scanning functions are renamed row_index_*.
    • $sort() for Series gains an argument nulls_last.
    • $str$extract() and $str$zfill() now accept an Expr and parse strings as column names. Use pl$lit() to recover the old behavior.
    • $cum_count() now starts from 1 instead of 0.

Other breaking changes

  • The simd feature of the Rust library is removed in favor of the new nightly feature (#800). If you specified simd via the LIBR_POLARS_FEATURES environment variable during source installations, please use nightly instead; there is no change if you specified full_features because it now contains nightly instead of simd.
  • The following functions were deprecated in 0.13.0 and are now removed (#783):
    • $list$lengths() -> $list$len()
    • pl$from_arrow() -> as_polars_df() or as_polars_series()
    • pl$set_options() and pl$reset_options() -> polars_options()
  • $is_between() had several changes (#788):
    • arguments start and end are renamed lower_bound and upper_bound. Their behaviour doesn't change.
    • include_bounds is renamed closed and must be one of "left", "right", "both", or "none".
  • polars_info() returns a slightly changed list.
    • $threadpool_size, which means the number of threads used by Polars, is changed to $thread_pool_size (#784)
    • $version, which indicates the version of this package, is changed to $versions$r_package (#791).
    • $rust_polars, which indicates the version of the dependent Rust Polars, is changed to $versions$rust_crate (#791).
  • New behavior when creating a DataFrame with a single list variable. pl$DataFrame(x = list(1:2, 3:4)) used to create a DataFrame with two columns named "new_column" and "new_column_1", which was unexpected. It now produces a DataFrame with a single list variable. This also applies to list columns created in $with_columns() and $select() (#794).

Deprecations

  • pl$threadpool_size() is deprecated and will be removed in 0.15.0. Use pl$thread_pool_size() instead (#784).

New features

  • Implementation of the subnamespace $arr for expressions on array-type columns. An array column is similar to a list column, but is stricter as each sub-array must have the same number of elements (#790).

Other improvements

  • The sql feature is included in the default feature (#800). This means that functionality related to the RPolarsSQLContext class is now always included in the binary package.