Releases: pathwaycom/pathway
v0.7.7
Added
- pathway.xpacks.llm.splitter.TokenCountSplitter.
v0.7.6
New Features
Conversion Methods in pw.Json
- Introducing new methods for strict conversion of pw.Json to desired types within a UDF body: as_int(), as_float(), as_str(), as_bool(), as_list(), as_dict().
DateTime Functionality
- Added table.col.dt.utc_from_timestamp method: Creates DateTimeUtc from timestamps represented as ints or floats.
- Enhanced the table.col.dt.timestamp method with a new unit argument to specify the unit of the returned timestamp.
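To make the role of the unit concrete, here is a minimal standard-library sketch of the same round trip. It does not call Pathway: the helper names merely echo the methods above, and the unit strings ("s", "ms", "us") are assumptions of this sketch, not a statement of what Pathway accepts.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Hypothetical unit table for this sketch: ticks per second.
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def utc_from_timestamp(value, unit):
    """Turn an int or float timestamp in the given unit into a UTC datetime."""
    return EPOCH + timedelta(seconds=value / TICKS_PER_SECOND[unit])


def timestamp(moment, unit):
    """Turn a UTC datetime back into a timestamp expressed in the given unit."""
    return (moment - EPOCH).total_seconds() * TICKS_PER_SECOND[unit]


moment = utc_from_timestamp(1_700_000_000, "s")
print(moment.isoformat())       # 2023-11-14T22:13:20+00:00
print(timestamp(moment, "ms"))  # 1700000000000.0
```

The same instant yields different numbers depending on the unit, which is why an explicit unit removes ambiguity.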
Experimental Features
- Introduced an experimental xpack with a Microsoft SharePoint input connector.
Enhancements
Improved JSON Handling
- Index operator ([]) can now be directly applied to pw.Json within UDFs to access elements of JSON objects, arrays, and strings.
Expanded Timestamp Functionality
- Enhanced the table.col.dt.from_timestamp method to create DateTimeNaive from timestamps represented as ints or floats.
- Deprecated not specifying the unit argument of the table.col.dt.timestamp method.
KNNIndex Enhancements
- KNNIndex now supports returning computed distances.
- Added support for cosine similarity in KNNIndex.
Deprecated Features
- The offset argument of pw.stdlib.temporal.sliding and pw.stdlib.temporal.tumbling is deprecated. Use origin instead, as it represents a point in time, not a duration.
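The distinction between a point in time and a duration can be illustrated with a small standard-library sketch of tumbling-window assignment. This is not Pathway code: tumbling_window is a hypothetical helper, and the sketch only assumes that windows are laid out back to back starting at the origin.

```python
from datetime import datetime, timedelta


def tumbling_window(t, duration, origin):
    """Return the [start, end) tumbling window containing t.

    Windows are laid out back to back starting at `origin`.
    Assumes t >= origin; earlier timestamps are outside this sketch.
    """
    index = (t - origin) // duration  # timedelta // timedelta gives an int
    start = origin + index * duration
    return start, start + duration


event = datetime(2024, 1, 1, 10, 7)
quarter_hour = timedelta(minutes=15)

# Origin at midnight: windows start at :00, :15, :30, :45.
print(tumbling_window(event, quarter_hour, datetime(2024, 1, 1, 0, 0)))
# Origin at 00:05: the whole grid shifts, windows start at :05, :20, :35, :50.
print(tumbling_window(event, quarter_hour, datetime(2024, 1, 1, 0, 5)))
```

With the first origin the event at 10:07 lands in the window 10:00 to 10:15, with the second in 10:05 to 10:20.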
Bug Fixes
DateTime Fixes
- Sliding window now works correctly with UTC Datetimes.
asof_join Improvements
- Temporal column in asof_join no longer has to be named t.
- asof_join includes rows with equal times for all values of the direction parameter.
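What "includes rows with equal times" means can be sketched in plain Python. The helper below is hypothetical and is not Pathway's implementation; it covers only a backward and a forward lookup, with direction names written as plain strings for the purpose of the sketch.

```python
from bisect import bisect_left, bisect_right


def asof_match(sorted_times, t, direction="backward"):
    """Pick the matching time for t from a sorted list, or None if there is none.

    A row whose time equals t counts as a match in both directions.
    """
    if direction == "backward":
        i = bisect_right(sorted_times, t) - 1  # last time <= t
        return sorted_times[i] if i >= 0 else None
    if direction == "forward":
        i = bisect_left(sorted_times, t)  # first time >= t
        return sorted_times[i] if i < len(sorted_times) else None
    raise ValueError(f"unsupported direction: {direction}")


times = [1, 3, 5]
print(asof_match(times, 3, "backward"))  # 3
print(asof_match(times, 3, "forward"))   # 3
print(asof_match(times, 4, "backward"))  # 3
print(asof_match(times, 4, "forward"))   # 5
```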
Fixed Issues
- Fixed an issue with pw.io.gdrive.read: Shared folders support is now working seamlessly.
v0.7.5
Added
- Added Table.split() method for splitting a table into two tables based on an expression.
- Columns with datatype duration can now be multiplied and divided by floats.
- Columns with datatype duration now support both true and floor division (/ and //) by integers.
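As a rough analogy, Python's own datetime.timedelta supports the same kinds of operations. The snippet below uses only the standard library and makes no claim about Pathway's rounding or resolution; timedelta works at microsecond resolution, which is an artifact of this analogy.

```python
from datetime import timedelta

ten_minutes = timedelta(minutes=10)

# Multiplication and division by floats.
print(ten_minutes * 1.5)  # 0:15:00
print(ten_minutes / 2.5)  # 0:04:00

# True division by an integer.
print(ten_minutes / 4)    # 0:02:30

# True and floor division differ only when the result is not exactly
# representable: true division rounds to the nearest tick, floor rounds down.
tiny = timedelta(microseconds=7)
print(tiny / 2)   # 0:00:00.000004
print(tiny // 2)  # 0:00:00.000003
```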
Changed
- Pathway is better at typing if_else expressions when optional types are involved.
- table.flatten() operator now supports Json array.
- Buffers (used to delay outputs, configured via delay in common_behavior) now flush the data when the computation is finished. The effect of this change can be seen when run in bounded (batch / multi-revision) mode.
- pw.io.subscribe() takes an additional argument on_time_end - the callback function to be called on each closed time of computation.
- pw.io.subscribe() is now a single-worker operator, guaranteeing that on_end is triggered at most once.
- KNNIndex now supports metadata filtering. Each query can specify its own filter in the JMESPath format.
Fixed
- Resolved an optimization bug causing pw.iterate to malfunction when handling columns effectively pointing to the same data.
v0.7.4
Fixed
- Fixed issues with standalone panel+Bokeh dashboards to ensure optimal functionality and performance.
v0.7.3
Added
- A method weekday has been added to the dt namespace, that can be called on column expressions containing datetime data. This method returns an integer that represents the day of the week.
- EXPERIMENTAL: Methods show and plot on Tables, providing visualizations of data using HoloViz Panel.
- Added support for instance parameter to groupby, join, windowby and temporal join methods.
- pw.PersistenceMode.UDF_CACHING persistence mode enabling automatic caching of AsyncTransformer invocations.
Changed
- Methods round and floor on columns with datetimes now accept the duration argument as a string.
- pw.debug.compute_and_print and pw.debug.compute_and_print_update_stream have a new argument n_rows that limits the number of rows printed.
- pw.debug.table_to_pandas has a new argument include_id (by default True). If set to False, creates a new index for the Pandas DataFrame, rather than using the keys of the Pathway Table.
- The shard argument of the windowby function is now deprecated and instance should be used.
- Special column name _pw_shard is now deprecated, and _pw_instance should be used.
- pw.ReplayMode now can be accessed as pw.PersistenceMode, while the SPEEDRUN and REALTIME variants are now accessible as SPEEDRUN_REPLAY and REALTIME_REPLAY.
- EXPERIMENTAL: pw.io.gdrive.read has a new argument with_metadata (by default False). If set to True, adds a _metadata column containing file metadata to the resulting table.
- Methods get_nearest_items and get_nearest_items_asof_now of KNNIndex allow specifying k (number of returned elements) separately in each query.
v0.7.2
Added
- Added the ability to create custom reducers using the pw.reducers.udf_reducer decorator. Use pw.BaseCustomAccumulator as a base class for creating accumulators. Decorating an accumulator returns a reducer following custom logic.
- A function pw.debug.compute_and_print_update_stream that computes and prints the update stream of the table.
- SQLite input connector (pw.io.sqlite).
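The accumulator idea behind custom reducers can be sketched without Pathway. Everything below is a stand-in: SumOfSquares does not inherit from pw.BaseCustomAccumulator, reduce_rows is a hypothetical driver playing the role of the decorated reducer, and the method names from_row, update, and compute_result are assumptions of this sketch rather than something these notes specify.

```python
class SumOfSquares:
    """Stand-in for a custom accumulator; not a Pathway class."""

    def __init__(self, total):
        self.total = total

    @classmethod
    def from_row(cls, row):
        # Build the accumulator for a single input row.
        (value,) = row
        return cls(value * value)

    def update(self, other):
        # Merge another accumulator into this one.
        self.total += other.total

    def compute_result(self):
        # Produce the final reduced value.
        return self.total


def reduce_rows(accumulator_cls, rows):
    """Hypothetical driver: fold per-row accumulators into one result.

    Assumes at least one row.
    """
    accumulators = [accumulator_cls.from_row(row) for row in rows]
    merged = accumulators[0]
    for other in accumulators[1:]:
        merged.update(other)
    return merged.compute_result()


print(reduce_rows(SumOfSquares, [(1,), (2,), (3,)]))  # 14
```

Splitting the logic into "build from one row", "merge", and "read out" lets a reduction be computed incrementally as rows arrive.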
Changed
- pw.debug.parse_to_table is now deprecated, pw.debug.table_from_markdown should be used instead.
- pw.schema_from_csv now has quote and double_quote_escapes arguments.
Fixed
- Schema returned from pw.schema_from_csv will have quotes removed from column names, so it will now work properly with pw.io.csv.read.
v0.7.1
Added
- Experimental Google Drive input connector.
- Stateful deduplication function (pw.stateful.deduplicate) allowing alerting on significant changes.
- The ability to split data into batches in pw.debug.table_from_markdown and pw.debug.table_from_pandas.
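The idea of alerting only on significant changes can be sketched in plain Python. The generator below is not pw.stateful.deduplicate; it is a hypothetical illustration that assumes the decision is delegated to an acceptor callback receiving the new value and the last accepted one.

```python
def deduplicate(values, acceptor):
    """Yield only the values that the acceptor deems a significant change."""
    last_accepted = None
    has_state = False
    for value in values:
        # The first value is always emitted; later ones must pass the acceptor.
        if not has_state or acceptor(value, last_accepted):
            last_accepted = value
            has_state = True
            yield value


def moved_by_at_least_1_5(new, old):
    # Hypothetical rule: a change counts once it differs by 1.5 or more.
    return abs(new - old) >= 1.5


readings = [10, 10.5, 12, 12.2, 9]
print(list(deduplicate(readings, moved_by_at_least_1_5)))  # [10, 12, 9]
```

Each new value is compared with the last accepted value rather than with the previous reading, so slow drifts still trigger an alert eventually.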
v0.7.0
Added
- class Behavior, a superclass of all behavior classes.
- class ExactlyOnceBehavior indicating we want to create a CommonBehavior that results in each window producing exactly one output (shifted in time by an optional shift parameter).
- function exactly_once_behavior creating an instance of ExactlyOnceBehavior.
Changed
- BREAKING: WindowBehavior is now called CommonBehavior, as it can be also used with interval joins.
- BREAKING: window_behavior is now called common_behavior, as it can be also used with interval joins.
- Deprecating parameter keep_queries in pw.io.http.rest_connector. Now delete_completed_queries with an opposite meaning should be used instead. The default is still delete_completed_queries=True (equivalent to keep_queries=False) but it will soon be required to be set explicitly.
v0.6.0
Added
- A flag with_metadata for the filesystem-based connectors to attach the source file metadata to the table entries.
- Methods pw.debug.table_from_list_of_batches and pw.debug.table_from_list_of_batches_by_workers for creating tables with defined data being inserted over time.
Changed
- BREAKING: pw.debug.table_from_pandas and pw.debug.table_from_markdown now will create tables in the streaming mode, instead of static, if the given table definition contains a _time column.
- BREAKING: Renamed the parameter keep_queries in pw.io.http.rest_connector to delete_queries with the opposite meaning. It changes the default behavior - it was keep_queries=False, now it is delete_queries=False.
v0.5.3
Added
- A method get_nearest_items_asof_now in KNNIndex that returns nearest neighbors without updating old queries in the future.
- A method asof_now_join in Table to join rows from the left side of the join with the right side of the join at their processing time. Past rows from the left side are not used when new data appears on the right side.
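The "as of now" semantics can be illustrated with a small plain-Python replay of events. This is a hypothetical sketch, not Pathway's implementation: the function name only echoes the method above, and the event tuples are an invented format for the example.

```python
from collections import defaultdict


def asof_now_join(events):
    """Replay ("left" | "right", key, value) events in arrival order.

    Each left row is matched once, against the right rows known at that moment.
    """
    right_rows = defaultdict(list)
    results = []
    for side, key, value in events:
        if side == "right":
            # New right-side data is stored but never revisits past left rows.
            right_rows[key].append(value)
        else:
            results.extend((key, value, r) for r in right_rows[key])
    return results


events = [
    ("right", "a", "r1"),
    ("left", "a", "q1"),   # sees r1 only
    ("right", "a", "r2"),  # arrives too late for q1
    ("left", "a", "q2"),   # sees r1 and r2
]
print(asof_now_join(events))
# [('a', 'q1', 'r1'), ('a', 'q2', 'r1'), ('a', 'q2', 'r2')]
```

A regular join would also pair q1 with r2 once r2 arrives; here the answer to q1 stays fixed.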