Releases: pola-rs/r-polars
lib-v0.39.1
fix: `$len()` should also count `null` values (#1044)
v0.16.0
Breaking changes
- R objects inside an R list are now converted to Polars data types via `as_polars_series()` (#1021, #1022, #1023). For example, up to polars 0.15.1, a list containing a data.frame with a column of `{clock}` naive-time class was converted to a nested List type of Float64:

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌──────────────────────────┐
      #> │ nested_data              │
      #> │ ---                      │
      #> │ list[list[list[f64]]]    │
      #> ╞══════════════════════════╡
      #> │ [[[2.1475e9], [7305.0]]] │
      #> └──────────────────────────┘

  From 0.16.0, nested types are correctly converted, so the result is a List type of Struct type containing a Datetime type:

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌─────────────────────────┐
      #> │ nested_data             │
      #> │ ---                     │
      #> │ list[struct[1]]         │
      #> ╞═════════════════════════╡
      #> │ [{1990-01-01 00:00:00}] │
      #> └─────────────────────────┘
- Several functions have been rewritten to match the behavior of Python Polars. There are four types of changes: i) change in argument names, ii) change in the way arguments are passed (named or by position), iii) arguments are removed, and iv) change in the default and accepted values. Those are addressed separately below.
  - Change in argument names:
    - In `$reshape()`, the `dims` argument is renamed to `dimensions` (#1019).
    - In `pl$read_*` and `pl$scan_*` functions, the first argument is now `source` (#935).
    - In `pl$Series()`, the argument `x` is renamed `values` (#933).
    - In `<DataFrame>$write_*` functions, the first argument is now `file` (#935).
    - In `<LazyFrame>$sink_*` functions, the first argument is now `path` (#935).
    - In `<LazyFrame>$sink_ipc()`, the argument `memmap` is renamed to `memory_map` (#1032).
    - In `<DataFrame>$rolling()`, `<LazyFrame>$rolling()`, `<DataFrame>$group_by_dynamic()` and `<LazyFrame>$group_by_dynamic()`, the `by` argument is renamed to `group_by` (#983).
    - In `$dt$convert_time_zone()` and `$dt$replace_time_zone()`, the `tz` argument is renamed to `time_zone` (#944).
    - In `$str$strptime()`, the argument `datatype` is renamed to `dtype` (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), argument `radix` is renamed to `base` (#1038).
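As a minimal sketch of one of the renames above, assuming polars 0.16.0 is installed, the old `$str$parse_int()` call becomes `$str$to_integer()` with a named `base`. The data frame and its column name are made up for illustration.

```r
library(polars)

# Hypothetical data: binary numbers stored as strings.
df = pl$DataFrame(bits = c("110", "101"))

# polars 0.15.1 or earlier: pl$col("bits")$str$parse_int(2)
# polars 0.16.0: the method is renamed and `radix` is now `base`.
out = df$with_columns(
  value = pl$col("bits")$str$to_integer(base = 2)
)
out
```

Since the default `base` also changed from 2 to 10, passing it explicitly keeps the old behavior when migrating.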
  - Change in the way arguments are passed:
    - In all input/output functions, all arguments except the first argument must be named arguments (#935).
    - In `<DataFrame>$rolling()` and `<DataFrame>$group_by_dynamic()`, all arguments except `index_column` must be named arguments (#983).
    - In `$unique()` for `DataFrame` and `LazyFrame`, arguments `keep` and `maintain_order` must be named (#953).
    - In `$bin$decode()`, the `strict` argument must be a named argument (#980).
    - In `$dt$replace_time_zone()`, all arguments except `time_zone` must be named arguments (#944).
    - In `$str$contains()`, the arguments `literal` and `strict` must be named (#982).
    - In `$str$contains_any()`, the `ascii_case_insensitive` argument must be named (#986).
    - In `$str$count_matches()`, `$str$replace()` and `$str$replace_all()`, the `literal` argument must be named (#987).
    - In `$str$strptime()`, `$str$to_date()`, `$str$to_datetime()`, and `$str$to_time()`, all arguments (except the first one) must be named (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), all arguments must be named (#1038).
    - In `pl$date_range()`, the arguments `closed`, `time_unit`, and `time_zone` must be named (#950).
    - In `$set_sorted()` and `$sort_by()`, argument `descending` must be named (#1034).
    - In `pl$Series()`, using positional arguments throws a warning, since the argument positions will be changed in the future (#966).

          # polars 0.15.1 or earlier
          # The first argument is `x`, the second argument is `name`.
          pl$Series(1:3, "foo")

          # The code above will warn in 0.16.0
          # Use named arguments to silence the warning.
          pl$Series(values = 1:3, name = "foo")
          pl$Series(name = "foo", values = 1:3)

          # polars 0.17.0 or later (future version)
          # The first argument is `name`, the second argument is `values`.
          pl$Series("foo", 1:3)

      This warning can also be silenced by replacing `pl$Series(<values>, <name>)` by `as_polars_series(<values>, <name>)`.
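The named-argument rule can be illustrated with `$str$contains()`. This is a small sketch assuming polars 0.16.0; the column name and strings are invented for the example.

```r
library(polars)

# Hypothetical data: one string with a literal dot, one without.
df = pl$DataFrame(txt = c("a.b", "axb"))

# `literal` must now be passed by name; the pattern stays positional.
out = df$with_columns(
  has_dot = pl$col("txt")$str$contains(".", literal = TRUE)
)
out
```

With `literal = TRUE` the pattern is matched as plain text, so only the first row contains a dot.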
  - Arguments removed:
    - The argument `columns` in `$drop()` is removed. `$drop()` now accepts several character scalars, such as `$drop("a", "b", "c")` (#912).
    - In `pl$col()`, the `name` argument is removed, and the `...` argument no longer accepts a list of characters and `RPolarsSeries` class objects (#923).
    - In `pl$date_range()`, the unused argument `explode` (not working in recent versions) is removed (#950).
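The new `$drop()` calling convention can be sketched as follows, assuming polars 0.16.0. The data frame is a made-up example.

```r
library(polars)

# Hypothetical three-column data frame.
df = pl$DataFrame(a = 1:2, b = 3:4, c = 5:6)

# Column names are passed as separate character scalars,
# not through the removed `columns` argument.
out = df$drop("a", "b")
out$columns
```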
  - Change in arguments default and accepted values:
    - In `pl$Series()`, the argument `values` has a new default value `NULL` (#966).
    - In `$unique()` for `DataFrame` and `LazyFrame`, argument `keep` has a new default value `"any"` (#953).
    - In rolling aggregation functions (such as `$rolling_mean()`), the default value of argument `closed` is now `NULL`. Using `closed` with a fixed `window_size` now throws an error (#937).
    - In `pl$date_range()`, the argument `end` must be specified and the default value of `interval` is changed to `"1d"`. The arguments `start` and `end` no longer accept numeric values (#950).
    - In `pl$scan_parquet()`, the default value of the argument `rechunk` is changed from `TRUE` to `FALSE` (#1033).
    - In `pl$scan_parquet()` and `pl$read_parquet()`, the argument `parallel` only accepts `"auto"`, `"columns"`, `"row_groups"`, and `"none"`. Previously, it also accepted upper-case notation of `"auto"`, `"columns"`, `"none"`, and `"RowGroups"` instead of `"row_groups"` (#1033).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), the default value of `base` is changed from `2` to `10` (#1038).
- The usage of `pl$date_range()` to create a range of `Datetime` data type is deprecated. `pl$date_range()` will always create a range of `Date` data type in the future. Use `pl$datetime_range()` if you want to create a range of `Datetime` instead (#950).
- `<DataFrame>$get_columns()` now returns an unnamed list instead of a named list (#991).
- Removed `$argsort()` which was an old alias for `$arg_sort()` (#930).
- Removed `pl$expr_to_r()` which was an alias for `$to_r()` (#938).
- `<Series>$to_r_list()` is renamed `<Series>$to_list()` (#938).
- Removed `<Series>$to_r_vector()` which was an old alias for `<Series>$to_vector()` (#938).
- Removed `<Expr>$rep_extend()`, which was an experimental method created at the early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: `pl$threadpool_size()`, `<DataFrame>$with_row_count()`, `<LazyFrame>$with_row_count()` (#965).
- In `$group_by_dynamic()`, the first datapoint is always preserved (#1034).
- `$str$parse_int()` is renamed to `$str$to_integer()` (#1038).
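For the `pl$date_range()` deprecation listed above, a hedged sketch of the replacement call follows. It assumes polars 0.16.0 and that `pl$datetime_range()` accepts `POSIXct` bounds; the dates and the column name `ts` are arbitrary.

```r
library(polars)

# Build a Datetime range with pl$datetime_range() instead of
# relying on pl$date_range() to return Datetime values.
rng = pl$select(
  ts = pl$datetime_range(
    as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
    as.POSIXct("2024-01-02 00:00:00", tz = "UTC"),
    interval = "12h"
  )
)
rng
```

Setting an explicit time zone on both bounds avoids depending on the session's `TZ` setting.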
New features
- New functions:
  - `pl$arg_sort_by()` (#929).
  - `pl$arg_where()` to get the indices that match a condition (#922).
  - `pl$datetime()`, `pl$date()`, and `pl$time()` to easily create Expr of class datetime, date, and time via columns and literals (#918).
  - `pl$datetime_range()`, `pl$date_ranges()` and `pl$datetime_ranges()` (#950, #962).
  - `pl$int_range()` and `pl$int_ranges()` (#968).
  - `pl$mean_horizontal()` (#959).
  - `pl$read_ipc()` (#1033).
  - `is_polars_dtype()` (#927).
- New methods:
  - `<LazyFrame>$to_dot()` to print the query plan of a LazyFrame with graphviz dot syntax (#928).
  - `$clear()` for `DataFrame`, `LazyFrame`, and `Series` (#1004).
  - `$item()` for `DataFrame` and `Series` (#992).
  - `$select_seq()` and `$with_columns_seq()` for `DataFrame` and `LazyFrame` (#1003).
  - `$arr$to_list()` (#1018).
  - `$str$extract_groups()` (#979).
  - `$str$find()` (#985).
  - `<DataFrame>$write_ipc()` (#1032).
  - `RPolarsDataType` gains several methods to check the datatype, such as `$is_integer()`, `$is_null()` or `$is_list()` (#1036).
- New arguments or argument values:
  - `ambiguous` can now take the value `"null"` to convert ambiguous datetimes to null values (#937).
  - `n` in `$str$replace()` (#987).
  - `non_existent` in `$dt$replace_time_zone()` to specify what should happen when a datetime doesn't exist.
  - `mapping_strategy` in `$over()` (#984, #988).
  - `raise_if_undetermined` in `$meta$output_name()` (#961).
  - `null_on_oob` in `$arr$get()` and `$list$get()` to determine what happens when the index is out of bounds (#1034).
  - `nulls_last`, `multithreaded`, and `maintain_order` in `$sort_by()` (#1034).
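A few of the additions above can be tried together. This is a rough sketch assuming polars 0.16.0; the data and column names are invented, and the argument forms shown (positional bounds for `pl$int_range()`, column names for `pl$mean_horizontal()`) are assumed from the function descriptions rather than spelled out in these notes.

```r
library(polars)

# Hypothetical numeric columns.
df = pl$DataFrame(a = c(1, 2), b = c(3, 6))

out = df$with_columns(
  idx = pl$int_range(0, 2),                # integers 0 and 1
  row_mean = pl$mean_horizontal("a", "b")  # mean across columns, per row
)
out

# $item() extracts the single value of a 1x1 DataFrame.
total = df$select(pl$col("a")$sum())$item()
```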
-
Other:
Bug fixes
- The
join_nulls
and ...
lib-v0.39.0
refactor!: `$str$parse_int()` -> `$str$to_integer()` (#1038) Co-authored-by: Etienne Bacher
v0.15.1
New features
- rust-polars is updated to 0.38.2 (#907).
  - Minimum supported Rust version (MSRV) is now 1.76.0.
- `as_polars_df(<nanoarrow_array>)` is added (#893).
- It is now possible to create an empty `DataFrame` with a specific schema with `pl$DataFrame(schema = my_schema)` (#901).
- New arguments `dtype` and `nan_to_null` for `pl$Series()` (#902).
- New method `<DataFrame>$partition_by()` (#898).
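The two DataFrame additions can be sketched as follows, assuming polars 0.15.1 or later. The schema, column names, and grouping column are hypothetical.

```r
library(polars)

# Empty DataFrame with a fixed schema (names and types are examples).
empty = pl$DataFrame(schema = list(id = pl$Int32, name = pl$String))
empty$columns

# Split a DataFrame into a list of DataFrames, one per group.
df = pl$DataFrame(g = c("x", "y", "x"), v = 1:3)
parts = df$partition_by("g")
length(parts)
```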
Bug fixes
- The default value of the `format` of `$str$strptime()` is now correctly set (#892).
Other improvements
- Performance of `as_polars_df(<nanoarrow_array_stream>)` is improved (#896).
Full Changelog: v0.15.0...v0.15.1
lib-v0.38.1
feat: bump polars to 0.38.2 (#907) Co-authored-by: Etienne Bacher
v0.15.0
Breaking changes due to Rust-polars update
- rust-polars is updated to 0.38.1 (#865, #872).
  - In `$pivot()`, arguments `aggregate_function`, `maintain_order`, `sort_columns` and `separator` must be named. Values that are passed by position are ignored.
  - In `$describe()`, the name of the first column changed from `"describe"` to `"statistic"`.
  - `$mod()` methods and `%%` now work correctly to guarantee `x == (x %% y) + y * (x %/% y)`.
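The identity quoted for `$mod()` and `%%` is the one base R's own operators satisfy. The sketch below checks it in base R only, with arbitrary sample values; it makes no polars call and is meant purely to illustrate the invariant the fix targets.

```r
# Arbitrary sample values, including negatives and a fraction.
x = c(-7, -1, 0, 5, 7.5)
y = 3

remainder = x %% y   # remainder takes the sign of the divisor
quotient = x %/% y   # floored integer division

# Recombining quotient and remainder gives back x.
reconstructed = remainder + y * quotient
all(x == reconstructed)
```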
Other breaking changes
- Removed `as.list()` for class `RPolarsExpr` as it is a simple wrapper around `list()` (#843).
- Several functions have been rewritten to match the behavior of Python Polars.
  - `pl$col(...)` requires at least one argument (#852).
  - `pl$head()`, `pl$tail()`, `pl$count()`, `pl$first()`, `pl$last()`, `pl$max()`, `pl$min()`, `pl$mean()`, `pl$median()`, `pl$std()`, `pl$sum()`, `pl$var()`, `pl$n_unique()`, and `pl$approx_n_unique()` are syntactic sugar for `pl$col(...)$<method>()`. The argument `...` now only accepts characters, which are either column names or regular expressions (#852).
  - `pl$len()` takes no arguments. If you want to measure the length of specific columns, you should use `pl$count(...)` (#852).
  - The default value of the `delimiter` argument of `<Expr>$str$concat()` is changed from `"-"` to `""` (#853).
  - The `ignore_nulls` argument of `<Expr>$str$concat()` must be a named argument (#853).
  - The arguments of `pl$Datetime()` are renamed: `tu` to `time_unit`, and `tz` to `time_zone` (#887).
- `pl$Categorical()` has been improved to allow specifying the `ordering` type (either lexical or physical). This also means that calling `pl$Categorical` doesn't create a `DataType` anymore. All calls to `pl$Categorical` must be replaced by `pl$Categorical()` (#860).
- `<Series>$rem()` is removed. Use `<Series>$mod()` instead (#886).
- The conversion strategy between the POSIXct type without time zone attribute and Polars datetime has been changed (#878). `POSIXct` class vectors without a time zone attribute hold UTC time internally and are displayed based on the system's time zone. Previous versions of `polars` only considered the internal value and interpreted it as UTC time, so the time displayed as `POSIXct` and in Polars was different.

      # polars 0.14.1
      Sys.setenv(TZ = "Europe/Paris")
      datetime = as.POSIXct("1900-01-01")
      datetime
      #> [1] "1900-01-01 PMT"

      s = polars::as_polars_series(datetime)
      s
      #> polars Series: shape: (1,)
      #> Series: '' [datetime[ms]]
      #> [
      #>  1899-12-31 23:50:39
      #> ]

      as.vector(s)
      #> [1] "1900-01-01 PMT"

  Now the internal value is updated to match the displayed value.

      # polars 0.15.0
      Sys.setenv(TZ = "Europe/Paris")
      datetime = as.POSIXct("1900-01-01")
      datetime
      #> [1] "1900-01-01 PMT"

      s = polars::as_polars_series(datetime)
      s
      #> polars Series: shape: (1,)
      #> Series: '' [datetime[ms]]
      #> [
      #>  1900-01-01 00:00:00
      #> ]

      as.vector(s)
      #> [1] "1900-01-01 PMT"

  This update may cause errors when converting from Polars to `POSIXct` for non-existent or ambiguous times. It is recommended to explicitly add a time zone before converting from Polars to R.

      Sys.setenv(TZ = "America/New_York")
      ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
      ambiguous_time
      #> [1] "2020-11-01 01:00:00 EDT"

      pls = polars::as_polars_series(ambiguous_time)
      pls
      #> polars Series: shape: (1,)
      #> Series: '' [datetime[ms]]
      #> [
      #>  2020-11-01 01:00:00
      #> ]

      ## This will raise an error!
      # pls |> as.vector()

      pls$dt$replace_time_zone("UTC") |> as.vector()
      #> [1] "2020-11-01 01:00:00 UTC"
- Removed argument `eager` in `pl$date_range()` and `pl$struct()` for more consistency of output. It is possible to replace `eager = TRUE` by calling `$to_series()` (#882).
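The `pl$Categorical()` change can be sketched like this, assuming polars 0.15.0 or later and that `<Series>$cast()` is available as described for 0.14.1. The values are arbitrary.

```r
library(polars)

# Hypothetical string values.
s = as_polars_series(c("b", "a", "b"))

# `pl$Categorical` must now be called; the bare object is no longer a DataType.
cat_physical = s$cast(pl$Categorical())          # default ordering
cat_lexical = s$cast(pl$Categorical("lexical"))  # ordering passed explicitly

cat_lexical
```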
New features
- In the when-then-otherwise expressions, the last `$otherwise()` is now optional, as in Python Polars. If `$otherwise()` is not specified, rows that don't respect the condition set in `$when()` will be filled with `null` (#836).
- `<DataFrame>$head()` and `<DataFrame>$tail()` methods now support negative row numbers (#840).
- `$group_by()` now works with named expressions (#846).
- New methods for the `arr` subnamespace: `$median()`, `$var()`, `$std()`, `$shift()`, `$to_struct()` (#867).
- `$min()` and `$max()` now work on categorical variables (#868).
- New methods for the `list` subnamespace: `$n_unique()`, `$gather_every()` (#869).
- `clock_time_point` and `clock_zoned_time` objects from the `{clock}` package are now converted to the Polars datetime type (#861).
- New methods for the `name` subnamespace: `$prefix_fields()` and `$suffix_fields()` (#873).
- The `time_zone` argument of `pl$Datetime()` now accepts `"*"` to match any time zone (#887).
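Two of the features above lend themselves to a short sketch, assuming polars 0.15.0 or later. The data and the column name `flag` are illustrative only.

```r
library(polars)

# Hypothetical numeric column.
df = pl$DataFrame(x = c(1, 5, 10))

# No $otherwise(): rows failing the condition become null (NA in R).
out = df$with_columns(
  flag = pl$when(pl$col("x") > 3)$then(pl$lit("big"))
)
out

# A negative count keeps all rows except the last one.
shorter = df$head(-1)
```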
Bug fixes
- R no longer crashes when calling an invalid Polars object that points to a null pointer (#874). This occurred, for example, when a Polars object was saved in an RDS file and loaded from another session.
New Contributors
- @detroyejr made their first contribution in #830
Full Changelog: v0.14.1...v0.15.0
lib-v0.38.0
docs(news): move old changelog to the NEWS.0.md file (#885)
v0.14.1
Breaking changes
- Since most of the methods of `Expr` are now available for `Series`, the experimental `<Series>$expr` subnamespace is removed (#831). Use `<Series>$<method>` instead of `<Series>$expr$<method>`.
New features
- New active binding `$flags` for `DataFrame` to show the flags used internally for each column. The output of `$flags` for `Series` was also improved and now contains `FAST_EXPLODE` for `Series` of type `list` and `array` (#809).
- Most `Expr` methods are also available for `Series` (#819, #828, #831).
- `as_polars_df()` for `data.frame` is more memory-efficient and new arguments `schema` and `schema_overrides` are added (#817).
- Use `polars_code_completion_activate()` to enable code suggestions and autocompletion after `$` on polars objects. This is an experimental feature that is disabled by default. For now, it is only supported in the native R terminal and in RStudio (#597).
Bug fixes
- `<Series>$list` subnamespace methods now correctly return `Series` class objects (#819).
lib-v0.37.1
ci: fix migration to actions/download-artifact@v4
v0.14.0
Breaking changes due to Rust-polars update
- rust-polars is updated to 0.37.0 (#776).
  - Minimum supported Rust version (MSRV) is now 1.74.1.
  - `$with_row_count()` for `DataFrame` and `LazyFrame` is deprecated and will be removed in 0.15.0. It is replaced by `$with_row_index()`.
  - `pl$count()` is deprecated and will be removed in 0.15.0. It is replaced by `pl$len()`.
  - `$explode()` for `DataFrame` and `LazyFrame` doesn't work anymore on string columns.
  - `$list$join()` and `pl$concat_str()` gain an argument `ignore_nulls`. The current behavior is to return a `null` if the row contains any `null`. Setting `ignore_nulls = TRUE` changes that.
  - All `row_count_*` args in reading/scanning functions are renamed `row_index_*`.
  - `$sort()` for `Series` gains an argument `nulls_last`.
  - `$str$extract()` and `$str$zfill()` now accept an `Expr` and parse strings as column names. Use `pl$lit()` to recover the old behavior.
  - `$cum_count()` now starts from 1 instead of 0.
Other breaking changes
- The `simd` feature of the Rust library is removed in favor of the new `nightly` feature (#800). If you specified `simd` via the `LIBR_POLARS_FEATURES` environment variable during source installations, please use `nightly` instead; there is no change if you specified `full_features` because it now contains `nightly` instead of `simd`.
- The following functions were deprecated in 0.13.0 and are now removed (#783):
  - `$list$lengths()` -> `$list$len()`
  - `pl$from_arrow()` -> `as_polars_df()` or `as_polars_series()`
  - `pl$set_options()` and `pl$reset_options()` -> `polars_options()`
- `$is_between()` had several changes (#788):
  - arguments `start` and `end` are renamed `lower_bound` and `upper_bound`. Their behaviour doesn't change.
  - `include_bounds` is renamed `closed` and must be one of `"left"`, `"right"`, `"both"`, or `"none"`.
- `polars_info()` returns a slightly changed list.
  - `$threadpool_size`, which means the number of threads used by Polars, is changed to `$thread_pool_size` (#784).
  - `$version`, which indicates the version of this package, is changed to `$versions$r_package` (#791).
  - `$rust_polars`, which indicates the version of the dependent Rust Polars, is changed to `$versions$rust_crate` (#791).
- New behavior when creating a `DataFrame` with a single list-variable. `pl$DataFrame(x = list(1:2, 3:4))` used to create a `DataFrame` with two columns named "new_column" and "new_column_1", which was unexpected. It now produces a `DataFrame` with a single `list` variable. This also applies to list-columns created in `$with_columns()` and `$select()` (#794).
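The `$is_between()` renames and the list-variable behavior described above can be sketched as follows, assuming polars 0.14.0 or later. The data is made up for illustration.

```r
library(polars)

# Hypothetical integer column.
df = pl$DataFrame(x = 1:5)

# `start`/`end` are now `lower_bound`/`upper_bound`;
# `include_bounds` is replaced by `closed`.
out = df$with_columns(
  inside = pl$col("x")$is_between(lower_bound = 2, upper_bound = 4, closed = "both")
)
out

# A single R list now becomes one list column rather than two columns.
ldf = pl$DataFrame(x = list(1:2, 3:4))
ldf$width
```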
Deprecations
- `pl$threadpool_size()` is deprecated and will be removed in 0.15.0. Use `pl$thread_pool_size()` instead (#784).
New features
- Implementation of the subnamespace `$arr` for expressions on `array`-type columns. An `array` column is similar to a `list` column, but is stricter as each sub-array must have the same number of elements (#790).
Other improvements
- The `sql` feature is included in the default feature (#800). This means that functionality related to the `RPolarsSQLContext` class is now always included in the binary package.