Releases: pathwaycom/pathway
Releases · pathwaycom/pathway
v0.15.4
Added
pw.io.kafka.read
now supports reading entries starting from a specified timestamp.pw.io.nats.read
andpw.io.nats.write
methods for reading from and writing Pathway tables to NATS.
Changed
pw.Table.diff
now supports settinginstance
parameter that allows computing differences for multiple groups.pw.io.postgres.write_snapshot
now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
Fixed
pw.PyObjectWrapper
is now picklable.
v0.15.3
Added
pw.io.mongodb.write
connector for writing Pathway tables in MongoDB.pw.io.s3.read
now supports downloading objects from an S3 bucket in parallel.
Changed
pw.io.fs.read
performance has been improved for directories containing a large number of files.
v0.15.2
Added
pw.io.deltalake.read
now supports custom S3 Delta Lakes with HTTP endpoints.pw.io.deltalake.read
now supports specifying both a custom endpoint and a custom region for Delta Lakes viapw.io.s3.AwsS3Settings
.
Changed
- Indices in
pathway.stdlib.indexing.nearest_neighbors
can now work also on numpy arrays. Previously they only acceptedlist[float]
. Working with numpy arrays improves memory efficiency. pw.io.s3.read
has been optimized to minimize new object requests whenever possible.- It is now possible to set the size limit of cache in
pw.udfs.DiskCache
. - State persistence now uses a single backend for both metadata and stream storage. The
pw.persistence.Config.simple_config
method is therefore deprecated. Now you can use thepw.persistence.Config
constructor with the same parameters that were previously used insimple_config
.
Fixed
pw.io.bigquery.write
connector now correctly handlespw.Json
columns.
v0.15.1
Fixed
pw.temporal.session
andpw.temporal.asof_join
now correctly works with multiple entries with the same time.- Fixed an issue in
pw.stdlib.indexing
where filters would cause runtime errors while usingHybridIndexFactory
.
v0.15.0
Added
- Experimental A
pw.xpacks.llm.document_store.DocumentStore
to process and index documents. pw.xpacks.llm.servers.DocumentStoreServer
used to expose REST server for retrieving documents frompw.xpacks.llm.document_store.DocumentStore
.pw.xpacks.stdlib.indexing.HybridIndex
used for querying multiple indices and combining their results.pw.io.airbyte.read
now also supports streams that only operate infull_refresh
mode.
Changed
- Running servers for answering queries is extracted from
pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
intopw.xpacks.llm.servers.QARestServer
andpw.xpacks.llm.servers.QASummaryRestServer
. - BREAKING:
query
andquery_as_of_now
ofpathway.stdlib.indexing.data_index.DataIndex
now produce an empty list instead ofNone
if no match is found
v0.14.3
Fixed
pw.io.deltalake.read
andpw.io.deltalake.write
now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.
Added
- The Pathway CLI command
spawn
can now execute code directly from a specified GitHub repository. - A new CLI command,
spawn-from-env
, has been added. This command runs the Pathway CLIspawn
command using arguments provided in thePATHWAY_SPAWN_ARGS
environment variable.
v0.14.2
Fixed
- Switched
pw.xpacks.llm.embedders.GeminiEmbedder
to be sync to resolve compatibility issues with the Google Colab runs. - Pinned
surya-ocr
module version for stability.
v0.14.1
Added
pw.xpacks.llm.embedders.GeminiEmbedder
which is a wrapper for Google Gemini Embedding services.
v0.14.0
Fixed
pw.debug.table_to_pandas
now exportsint | None
columns correctly.
Changed
pw.io.airbyte.read
can now be used with Airbyte connectors implemented in Python without requiring Docker.- BREAKING: UDFs now verify the type of returned values at runtime. If it is possible to cast a returned value to a proper type, the values is cast. If the value does not match the expected type and can't be cast, an error is raised.
- BREAKING:
pw.reducers.ndarray
reducer requires input column to either have typefloat
,int
orArray
. pw.xpacks.llm.parsers.OpenParse
can now extract and parse images & diagrams from PDFs. This can be enabled by setting theparse_images
.processing_pipeline
can be also set to customize the post processing of doc elements.
v0.13.2
Added
pw.io.deltalake.read
now supports S3 data sources.pw.xpacks.llm.parsers.ImageParser
which allows parsing images with the vision LMs.pw.xpacks.llm.parsers.SlideParser
that enables parsing PDF and PPTX slides with the vision LMs.pw.xpacks.llm.parsers.question_answering.RAGClient
, Python client for Pathway hosted RAG apps.pw.xpacks.llm.parsers.question_answeringDeckRetriever
, a RAG app that enables searching through slide decks with visual-heavy elements.
Fixed
pw.xpacks.llm.vector_store.VectorStoreServer
now uses new indexes.
Changed
pw.xpacks.llm.parsers.OpenParse
now supports any vision Language model including local and propriety models via LiteLLM.