
v0.15.0

@eitsupi released this 03 Mar 11:45

Breaking changes due to Rust-polars update

  • rust-polars is updated to 0.38.1 (#865, #872).
    • in $pivot(), arguments aggregate_function, maintain_order, sort_columns and separator must be named. Values that are passed by position are ignored.
    • in $describe(), the name of the first column changed from "describe" to "statistic".
    • $mod() methods and %% now work correctly and guarantee that x == (x %% y) + y * (x %/% y).
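A minimal sketch of what the three changes above look like in practice, assuming polars 0.15.0 is installed. The data frame, its columns (id, key, value) and the divisor 3 are made up for illustration. The last part checks the %% / %/% identity on a negative and a positive value, which is the case where floored modulo and truncated remainder differ.

```r
library(polars)

# Hypothetical long-format data, for illustration only
df = pl$DataFrame(
  id = c(1, 1, 2),
  key = c("a", "b", "a"),
  value = c(10, 20, 30)
)

# aggregate_function, maintain_order, sort_columns and separator must be
# passed by name; values passed to them by position are ignored.
wide = df$pivot(
  values = "value", index = "id", columns = "key",
  aggregate_function = "first"
)

# The first column returned by $describe() is now named "statistic".
desc = df$select("value")$describe()

# %% (floored modulo) and %/% (floor division) now satisfy
# x == (x %% y) + y * (x %/% y), as in base R, also for negative x.
modres = pl$DataFrame(x = c(-7, 7))$select(
  x = pl$col("x"),
  r = pl$col("x") %% 3,
  q = pl$col("x") %/% 3
)$to_list()
```

Because the result follows floored division, -7 %% 3 gives 2 (the sign of the divisor), matching base R.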

Other breaking changes

  • Removed as.list() for class RPolarsExpr as it is a simple wrapper around list() (#843).

  • Several functions have been rewritten to match the behavior of Python Polars.

    • pl$col(...) requires at least one argument. (#852)
    • pl$head(), pl$tail(), pl$count(), pl$first(), pl$last(), pl$max(), pl$min(), pl$mean(), pl$median(), pl$std(), pl$sum(), pl$var(), pl$n_unique(), and pl$approx_n_unique() are syntactic sugar for pl$col(...)$<method()>. The ... argument now only accepts character strings, which are either column names or regular expressions (#852).
    • There is no argument for pl$len(). If you want to measure the length of specific columns, you should use pl$count(...) (#852).
    • The default value of the delimiter argument of <Expr>$str$concat() has changed from "-" to "" (#853).
    • The ignore_nulls argument of <Expr>$str$concat() must be passed by name (#853).
    • pl$Datetime()'s arguments are renamed: tu to time_unit, and tz to time_zone (#887).
  • pl$Categorical() has been improved to allow specifying the ordering type (either lexical or physical). This also means that calling pl$Categorical doesn't create a DataType anymore. All calls to pl$Categorical must be replaced by pl$Categorical() (#860).

  • <Series>$rem() is removed. Use <Series>$mod() instead (#886).

  • The conversion strategy between POSIXct values without a time zone attribute and the Polars datetime type has changed (#878). A POSIXct vector without a time zone attribute stores UTC time internally but is displayed in the system's time zone. Previous versions of polars only considered the internal value and interpreted it as UTC time, so the time displayed by R and the time displayed by Polars could differ.

    # polars 0.14.1
    Sys.setenv(TZ = "Europe/Paris")
    datetime = as.POSIXct("1900-01-01")
    datetime
    #> [1] "1900-01-01 PMT"
    
    s = polars::as_polars_series(datetime)
    s
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  1899-12-31 23:50:39
    #> ]
    
    as.vector(s)
    #> [1] "1900-01-01 PMT"

    Now the value stored in Polars matches the time displayed by R.

    # polars 0.15.0
    Sys.setenv(TZ = "Europe/Paris")
    datetime = as.POSIXct("1900-01-01")
    datetime
    #> [1] "1900-01-01 PMT"
    
    s = polars::as_polars_series(datetime)
    s
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  1900-01-01 00:00:00
    #> ]
    
    as.vector(s)
    #> [1] "1900-01-01 PMT"

    This update may cause errors when converting from Polars back to POSIXct for non-existent or ambiguous times. It is recommended to explicitly set a time zone before converting from Polars to R.

    Sys.setenv(TZ = "America/New_York")
    ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
    ambiguous_time
    #> [1] "2020-11-01 01:00:00 EDT"
    
    pls = polars::as_polars_series(ambiguous_time)
    pls
    #> polars Series: shape: (1,)
    #> Series: '' [datetime[ms]]
    #> [
    #>  2020-11-01 01:00:00
    #> ]
    
    ## This will raise an error!
    # pls |> as.vector()
    
    pls$dt$replace_time_zone("UTC") |> as.vector()
    #> [1] "2020-11-01 01:00:00 UTC"
  • Removed the eager argument from pl$date_range() and pl$struct() to make their output more consistent. To get the previous eager = TRUE behavior, call $to_series() on the result (#882).
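A minimal migration sketch for several of the changes above, assuming polars 0.15.0 is installed. The data frame and its columns are hypothetical, and the call pl$Categorical("lexical") assumes the ordering type is the first argument of pl$Categorical().

```r
library(polars)

# Hypothetical data, for illustration only
df = pl$DataFrame(a = c("x", "y", NA), b = c("b", "c", "a"))

res = df$select(
  # pl$len() takes no argument and counts rows
  n_rows = pl$len(),
  # the delimiter now defaults to ""; ignore_nulls must be named
  joined = pl$col("a")$str$concat(ignore_nulls = TRUE),
  # pass delimiter = "-" to keep the previous default
  joined_dash = pl$col("a")$str$concat(delimiter = "-", ignore_nulls = TRUE)
)$to_list()

# pl$Categorical must be called; here with lexical ordering, so that
# sorting follows the string values rather than the order of appearance
cats = df$select(
  pl$col("b")$cast(pl$Categorical("lexical"))$sort()
)$to_list()$b

# pl$Datetime() arguments are now time_unit and time_zone (formerly tu and tz)
dt_type = pl$Datetime(time_unit = "ms", time_zone = "UTC")

# pl$date_range() no longer has `eager`; call $to_series() instead
dates = pl$date_range(as.Date("2024-01-01"), as.Date("2024-01-03"))$to_series()
```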

New features

  • In when-then-otherwise expressions, the final $otherwise() is now optional, as in Python Polars. If $otherwise() is not specified, rows that don't match the condition set in $when() are filled with null (#836).
  • <DataFrame>$head() and <DataFrame>$tail() methods now support negative row numbers (#840).
  • $group_by() now works with named expressions (#846).
  • New methods for the arr subnamespace: $median(), $var(), $std(), $shift(), $to_struct() (#867).
  • $min() and $max() now work on categorical variables (#868).
  • New methods for the list subnamespace: $n_unique(), $gather_every() (#869).
  • clock_time_point and clock_zoned_time objects from the {clock} package can now be converted to the Polars datetime type (#861).
  • New methods for the name subnamespace: $prefix_fields() and $suffix_fields() (#873).
  • pl$Datetime()'s time_zone argument now accepts "*" to match any time zone (#887).
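A short sketch of three of the new features, assuming polars 0.15.0 is installed. The data frame, the threshold 3 and the column names size and is_big are hypothetical.

```r
library(polars)

# Hypothetical data, for illustration only
df = pl$DataFrame(x = c(1, 5, 10))

# $otherwise() can be omitted: non-matching rows become null (NA in R)
with_size = df$with_columns(
  size = pl$when(pl$col("x") > 3)$then(pl$lit("big"))
)

# A negative n in $head() returns all rows except the last abs(n) rows
all_but_last = df$head(-1)

# $group_by() accepts named expressions; the name becomes the key column
counts = df$group_by(is_big = pl$col("x") > 3)$agg(pl$len())
```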

Bug fixes

  • R no longer crashes when calling an invalid Polars object that points to a null pointer (#874). This could happen, for example, when a Polars object was saved in an RDS file and loaded in another session.


Full Changelog: v0.14.1...v0.15.0