v0.16.0
Breaking changes
- R objects inside an R list are now converted to Polars data types via
  as_polars_series() (#1021, #1022, #1023). For example, up to polars 0.15.1,
  a list containing a data.frame with a column of {clock} naive-time class
  was converted to a nested List type of Float64:

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌──────────────────────────┐
      #> │ nested_data              │
      #> │ ---                      │
      #> │ list[list[list[f64]]]    │
      #> ╞══════════════════════════╡
      #> │ [[[2.1475e9], [7305.0]]] │
      #> └──────────────────────────┘

  From 0.16.0, nested types are correctly converted, so that will be
  a List type of Struct type containing a Datetime type.

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌─────────────────────────┐
      #> │ nested_data             │
      #> │ ---                     │
      #> │ list[struct[1]]         │
      #> ╞═════════════════════════╡
      #> │ [{1990-01-01 00:00:00}] │
      #> └─────────────────────────┘
- Several functions have been rewritten to match the behavior of Python Polars.
  There are four types of changes: i) change in argument names, ii) change in
  the way arguments are passed (named or by position), iii) arguments are removed,
  and iv) change in the default and accepted values. Those are addressed separately
  below.
  - Change in argument names:
    - In $reshape(), the dims argument is renamed to dimensions (#1019).
    - In pl$read_* and pl$scan_* functions, the first argument is now
      source (#935).
    - In pl$Series(), the argument x is renamed to values (#933).
    - In <DataFrame>$write_* functions, the first argument is now file (#935).
    - In <LazyFrame>$sink_* functions, the first argument is now path (#935).
    - In <LazyFrame>$sink_ipc(), the argument memmap is renamed to
      memory_map (#1032).
    - In <DataFrame>$rolling(), <LazyFrame>$rolling(),
      <DataFrame>$group_by_dynamic() and <LazyFrame>$group_by_dynamic(),
      the by argument is renamed to group_by (#983).
    - In $dt$convert_time_zone() and $dt$replace_time_zone(), the tz
      argument is renamed to time_zone (#944).
    - In $str$strptime(), the argument datatype is renamed to dtype (#939).
    - In $str$to_integer() (renamed from $str$parse_int()), the argument
      radix is renamed to base (#1038).
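As a rough illustration of two of the renames above, here is a minimal sketch assuming polars 0.16.0 is installed. The data frame and its column names are invented for the example.

```r
library(polars)

# Hypothetical data: strings holding binary numbers.
df = pl$DataFrame(bits = c("110", "101"))

# $str$to_integer() replaces $str$parse_int(), and `base` replaces `radix`.
out = df$with_columns(
  value = pl$col("bits")$str$to_integer(base = 2)
)

# `values` replaces the former `x` argument of pl$Series().
s = pl$Series(values = 1:3, name = "foo")
```

Code that still passes `radix`, `x`, `tz`, `by`, or the other old names should be updated to the new names.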
  - Change in the way arguments are passed:
    - In all input/output functions, all arguments except the first argument
      must be named arguments (#935).
    - In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all
      arguments except index_column must be named arguments (#983).
    - In $unique() for DataFrame and LazyFrame, arguments keep and
      maintain_order must be named (#953).
    - In $bin$decode(), the strict argument must be a named argument (#980).
    - In $dt$replace_time_zone(), all arguments except time_zone must be named
      arguments (#944).
    - In $str$contains(), the arguments literal and strict must be named
      (#982).
    - In $str$contains_any(), the ascii_case_insensitive argument must be
      named (#986).
    - In $str$count_matches(), $str$replace() and $str$replace_all(),
      the literal argument must be named (#987).
    - In $str$strptime(), $str$to_date(), $str$to_datetime(), and
      $str$to_time(), all arguments (except the first one) must be named (#939).
    - In $str$to_integer() (renamed from $str$parse_int()), all arguments
      must be named (#1038).
    - In pl$date_range(), the arguments closed, time_unit, and time_zone
      must be named (#950).
    - In $set_sorted() and $sort_by(), argument descending must be named
      (#1034).
    - In pl$Series(), using positional arguments throws a warning, since the
      argument positions will be changed in the future (#966).

          # polars 0.15.1 or earlier
          # The first argument is `x`, the second argument is `name`.
          pl$Series(1:3, "foo")

          # The code above will warn in 0.16.0
          # Use named arguments to silence the warning.
          pl$Series(values = 1:3, name = "foo")
          pl$Series(name = "foo", values = 1:3)

          # polars 0.17.0 or later (future version)
          # The first argument is `name`, the second argument is `values`.
          pl$Series("foo", 1:3)

      This warning can also be silenced by replacing
      pl$Series(<values>, <name>) by as_polars_series(<values>, <name>).
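The named-argument requirement can be sketched as follows for $unique() and $str$contains(). This assumes polars 0.16.0; the data and column names are made up for illustration.

```r
library(polars)

# Hypothetical data with a duplicated id.
df = pl$DataFrame(
  id = c(1, 1, 2),
  label = c("a.b", "a.c", "xyz")
)

# `keep` and `maintain_order` must be passed by name in $unique().
first_rows = df$unique("id", keep = "first", maintain_order = TRUE)

# `literal` must be passed by name in $str$contains().
flagged = df$with_columns(
  has_dot = pl$col("label")$str$contains(".", literal = TRUE)
)
```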
  - Arguments removed:
    - The argument columns in $drop() is removed. $drop() now accepts
      several character scalars, such as $drop("a", "b", "c") (#912).
    - In pl$col(), the name argument is removed, and the ... argument no
      longer accepts a list of characters and RPolarsSeries class objects (#923).
    - In pl$date_range(), the unused argument explode (not working in recent
      versions) is removed (#950).
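For instance, dropping and selecting several columns now looks roughly like this. It is a minimal sketch assuming polars 0.16.0, with invented data.

```r
library(polars)

# Hypothetical three-column data frame.
df = pl$DataFrame(a = 1:2, b = c("x", "y"), c = c(TRUE, FALSE))

# Column names are passed as separate character scalars,
# not through the removed `columns` argument.
dropped = df$drop("b", "c")

# pl$col() likewise takes the names directly through `...`.
selected = df$select(pl$col("a", "b"))
```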
  - Change in arguments default and accepted values:
    - In pl$Series(), the argument values has a new default value NULL (#966).
    - In $unique() for DataFrame and LazyFrame, argument keep has a new
      default value "any" (#953).
    - In rolling aggregation functions (such as $rolling_mean()), the default
      value of argument closed is now NULL. Using closed with a fixed
      window_size now throws an error (#937).
    - In pl$date_range(), the argument end must be specified and the default
      value of interval is changed to "1d". The arguments start and end
      no longer accept numeric values (#950).
    - In pl$scan_parquet(), the default value of the argument rechunk is
      changed from TRUE to FALSE (#1033).
    - In pl$scan_parquet() and pl$read_parquet(), the argument parallel
      only accepts "auto", "columns", "row_groups", and "none".
      Previously, it also accepted upper-case notation of "auto", "columns",
      "none", and "RowGroups" instead of "row_groups" (#1033).
    - In $str$to_integer() (renamed from $str$parse_int()), the default
      value of base is changed from 2 to 10 (#1038).
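The new defaults for pl$date_range() and $str$to_integer() can be sketched as below. This assumes polars 0.16.0 and that pl$date_range() returns an expression that can be evaluated with pl$select(); the column names are illustrative.

```r
library(polars)

# `end` is required and `interval` defaults to "1d",
# so this yields one row per day. Date inputs are used
# because numeric `start`/`end` values are no longer accepted.
dates = pl$select(
  date = pl$date_range(as.Date("2024-01-01"), as.Date("2024-01-03"))
)

# `base` now defaults to 10, so decimal strings parse as expected.
numbers = pl$DataFrame(x = c("10", "42"))$select(
  pl$col("x")$str$to_integer()
)
```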
- The usage of pl$date_range() to create a range of Datetime data type is
  deprecated. pl$date_range() will always create a range of Date data type
  in the future. Use pl$datetime_range() if you want to create a range of
  Datetime instead (#950).
- <DataFrame>$get_columns() now returns an unnamed list instead of a named
  list (#991).
- Removed $argsort() which was an old alias for $arg_sort() (#930).
- Removed pl$expr_to_r() which was an alias for $to_r() (#938).
- <Series>$to_r_list() is renamed <Series>$to_list() (#938).
- Removed <Series>$to_r_vector() which was an old alias for
  <Series>$to_vector() (#938).
- Removed <Expr>$rep_extend(), which was an experimental method created at the
  early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: pl$threadpool_size(),
  <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).
- In $group_by_dynamic(), the first datapoint is always preserved (#1034).
- $str$parse_int() is renamed to $str$to_integer() (#1038).
New features
- New functions:
  - pl$arg_sort_by() (#929).
  - pl$arg_where() to get the indices that match a condition (#922).
  - pl$datetime(), pl$date(), and pl$time() to easily create Expr of class
    datetime, date, and time via columns and literals (#918).
  - pl$datetime_range(), pl$date_ranges() and pl$datetime_ranges() (#950, #962).
  - pl$int_range() and pl$int_ranges() (#968).
  - pl$mean_horizontal() (#959).
  - pl$read_ipc() (#1033).
  - is_polars_dtype() (#927).
- New methods:
  - <LazyFrame>$to_dot() to print the query plan of a LazyFrame with graphviz
    dot syntax (#928).
  - $clear() for DataFrame, LazyFrame, and Series (#1004).
  - $item() for DataFrame and Series (#992).
  - $select_seq() and $with_columns_seq() for DataFrame and LazyFrame (#1003).
  - $arr$to_list() (#1018).
  - $str$extract_groups() (#979).
  - $str$find() (#985).
  - <DataFrame>$write_ipc() (#1032).
  - RPolarsDataType gains several methods to check the datatype, such as
    $is_integer(), $is_null() or $is_list() (#1036).
- New arguments or argument values:
  - ambiguous can now take the value "null" to convert ambiguous datetimes to
    null values (#937).
  - n in $str$replace() (#987).
  - non_existent in $dt$replace_time_zone() to specify what should happen
    when a datetime doesn't exist.
  - mapping_strategy in $over() (#984, #988).
  - raise_if_undetermined in $meta$output_name() (#961).
  - null_on_oob in $arr$get() and $list$get() to determine what happens
    when the index is out of bounds (#1034).
  - nulls_last, multithreaded, and maintain_order in $sort_by() (#1034).
- Other:
Bug fixes
- The join_nulls and validate arguments of <DataFrame>$join() now work
  correctly (#945).
- We said in the changelog of 0.14.0 that all row_count_* args in I/O functions
  were renamed row_index_*, but this change was not made for CSV and IPC
  functions. This renaming is now made (#964).
- Evaluating Series methods from Expr inside functions now works correctly (#973).
  Thanks @Yunuuuu for the report.
- The dependent crate extendr-api is updated to 2024-03-31 unreleased version (#995).
  The issue that the R session crashes when a panic occurs in the Rust side is resolved.
  Thanks @CGMossa for the upstream fix.
- The parallel argument of pl$scan_parquet() and pl$read_parquet() now works
  correctly (#1033). Previously, any correct value was treated as "auto".
New Contributors
- @george-wood made their first contribution in #949
- @Yunuuuu made their first contribution in #999
Full Changelog: v0.15.1...v0.16.0