- Add `arrow=54` support
Bug fixes:
- Fixed deserialization from sliced arrays (#248). Note that the current solution requires up front work when constructing the array deserializers, as described in the issue. The removal of the performance penalty is tracked in (#250)
New features
- Add support for various `jiff` types (`jiff::Date`, `jiff::Time`, `jiff::DateTime`, `jiff::Timestamp`, `jiff::Span`, `jiff::SignedDuration`)
- Add support for tracing lists as `List` instead of `LargeList` by setting `sequence_as_large_list` to `false` in `TracingOptions`
- Add support for tracing strings and strings in dictionaries as `Utf8` instead of `LargeUtf8` by setting `strings_as_large_utf8` to `false` in `TracingOptions`
- Add support to auto-detect dates (`2024-09-30`, mapped to `Date32`) and times (`12:00:00`, mapped to `Time64(Nanosecond)`) in `from_samples`
- Improved error messages for non self describing types (`chrono::*`, `uuid::Uuid`, `std::net::IpAddr`)
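
As a rough illustration of what auto-detecting dates and times involves, the following sketch classifies a string by its shape only. It is not the detection logic used by `serde_arrow`: the `Guess` enum and the `has_shape` and `guess_temporal` functions are hypothetical names introduced for this example, and only the `YYYY-MM-DD` and `HH:MM:SS` forms mentioned in the list are covered.

```rust
/// Hypothetical result of classifying a sample string (not a serde_arrow type).
#[derive(Debug, PartialEq)]
enum Guess {
    Date32,
    Time64Nanosecond,
    Utf8,
}

/// Check that `s` consists of ASCII-digit groups of the given widths,
/// separated by `sep`, e.g. widths [4, 2, 2] and '-' for `2024-09-30`.
fn has_shape(s: &str, widths: &[usize], sep: char) -> bool {
    let parts: Vec<&str> = s.split(sep).collect();
    parts.len() == widths.len()
        && parts
            .iter()
            .zip(widths)
            .all(|(p, &w)| p.len() == w && p.bytes().all(|b| b.is_ascii_digit()))
}

/// Classify a string purely by its shape. Range checks (e.g. month <= 12)
/// are deliberately left out of this sketch.
fn guess_temporal(s: &str) -> Guess {
    if has_shape(s, &[4, 2, 2], '-') {
        Guess::Date32
    } else if has_shape(s, &[2, 2, 2], ':') {
        Guess::Time64Nanosecond
    } else {
        Guess::Utf8
    }
}
```

Under these assumptions, `guess_temporal("2024-09-30")` yields `Guess::Date32`, `guess_temporal("12:00:00")` yields `Guess::Time64Nanosecond`, and any other string falls back to `Guess::Utf8`.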
The following people contributed to this release:
- @jkylling added support for tracing lists as `List` and strings as `Utf8`
Refactor the underlying implementation to prepare for further development
New features
- Add `Binary`, `LargeBinary`, `FixedSizeBinary(n)`, `FixedSizeList(n)` support for `arrow2`
- Add support to serialize / deserialize `bool` from integer arrays
- Add a helper to construct `Bool8` arrays
- Include the path of the field that caused an error in the error message
- Include backtrace information only for the debug representations of errors
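
The Arrow `Bool8` extension type stores one boolean per 8-bit integer, with `0` meaning false and any non-zero value meaning true. The sketch below shows that mapping in plain Rust, assuming the same convention applies when reading `bool` from integer arrays. The names `bool_from_i8`, `bool_to_i8` and `decode_column` are hypothetical helpers for illustration and are not part of the `serde_arrow` API.

```rust
/// Interpret an 8-bit integer as a boolean: 0 is false, anything else is true.
fn bool_from_i8(value: i8) -> bool {
    value != 0
}

/// Encode a boolean as an 8-bit integer, using 1 for true and 0 for false.
fn bool_to_i8(value: bool) -> i8 {
    i8::from(value)
}

/// Decode a nullable integer column into nullable booleans.
/// Nulls (`None`) are passed through unchanged.
fn decode_column(values: &[Option<i8>]) -> Vec<Option<bool>> {
    values.iter().map(|&v| v.map(bool_from_i8)).collect()
}
```

With this convention, a column such as `[Some(0), Some(1), None, Some(-3)]` decodes to `[Some(false), Some(true), None, Some(true)]`.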
API changes
- Use `impl serde::Serialize` instead of `&(impl serde::Serialize + ?Sized)`
- Use `&[FieldRef]` instead of `&[Field]` in arrow APIs
Removed deprecated API
- Remove `serde_arrow::schema::Schema`
- Remove `serde_arrow::ArrowBuilder` and `serde_arrow::Arrow2Builder`
- Remove `from_arrow_fields` / `to_arrow_fields` for `SerdeArrowSchema`, use the `TryFrom` conversions to convert between fields and `SerdeArrowSchema`
- Remove `SerdeArrowSchema::new()`, `Overwrites::new()`
- Add `arrow=53` support
The following people contributed to this release:
- @shehabgamin prepared this release (PR)
- Fix tracing of JSON mixing nulls with non-null data
- Add `arrow=52` support
- Add support for `Binary`, `LargeBinary` (only `arrow`)
- Add support for `FixedSizeBinary(n)` (only `arrow>=47`)
- Add support for `FixedSizeList(n)` (only `arrow`)
- Add support to overwrite field definitions with `TracingOptions::overwrite`
- Add support to serialize enums without data (e.g., `enum E { A, B, C }`) as strings by setting the corresponding field to a string type (`Utf8`, `LargeUtf8`, `Dictionary(_, Utf8)`, `Dictionary(_, LargeUtf8)`)
- Allow to trace enums without data as dictionary encoded strings by setting `enums_without_data_as_strings` to `true` in `TracingOptions`
- Add `serde_arrow::Serializer`
- Add support for newtype wrappers, tuples and tuple structs to `serde_arrow::Deserializer`
- Add a generic `serde_arrow::ArrayBuilder` with support for both `arrow` and `arrow2`
- Implement `TryFrom<&[Field]>` (`arrow` and `arrow2`) and `TryFrom<&[FieldRef]>` (`arrow` only) for `SerdeArrowSchema`
- Implement `TryFrom<&SerdeArrowSchema>` for `Vec<Field>` and `Vec<FieldRef>` for `arrow`
- Add `serde_arrow::Deserializer`
- Support for serializing/deserializing timestamps with second, microsecond, and nanosecond encoding.
- Fixed (de)serialization of fractional seconds.
The following people contributed to this release:
- @ryzhyk added string support for timestamps with non-millisecond units, fixed the handling of fractional seconds (PR)
- Support `Duration(unit)`
- Rewrite data type parsing with stricter parsing
- Support `Timestamp(Second, tz)`, `Timestamp(Millisecond, tz)`, `Timestamp(Nanosecond, tz)`. At the moment only (de)serialization from / to integers is supported for non-microsecond units
- Support `Time32(unit)`
`0.11.0` does not contain any known breaking changes. However, it is a major refactoring and untested behavior may change.

The biggest feature is the removal of the bytecode deserializer and use of the Serde API directly. With this change, the code is easier to understand and extend. Further, `Deserialization` implementations can request specific types and `serde_arrow` is able to supply them. As a consequence, deserialization of `chrono::DateTime<Utc>` is supported by `serde_arrow` without an explicit strategy.

Further changes:
- Add `arrow=51` support
- Add `Date32` and `Time64` support
- Add `to_record_batch`, `from_record_batch` to offer more streamlined APIs for working with record batches
- Allow to perform zero-copy deserialization from arrow arrays
- Allow to use `arrow` schemas in `SchemaLike::from_value()`, e.g., `let fields = Vec::<Field>::from_value(&batch.schema())`.
- Implement `SchemaLike` for `arrow::datatypes::FieldRef`s
- Fix bug in `SchemaLike::from_type()` for nested unions
The following people contributed to this release:
- @gz added `Date32` and `Time64` support (PR)
- @progval added additional error messages (PR)
- @gstvg contributed zero-copy deserialization (PR)
- Remove deprecated APIs
- Use the serde serialization APIs directly, instead of using the bytecode serializer. Serialization will be about `2x` faster
- Fix bug in `SchemaLike::from_value` with incorrect strategy deserialization
The following people contributed to this release:
- @Ten0 motivated the rewrite to use the serde API directly and contributed additional benchmarks for JSON transcoding (PR)
- @alamb added improved documentation on how to use `serde_arrow` with the `arrow` crate (PR)
- `Decimal128` support: serialize / deserialize `rust_decimal` and `bigdecimal` objects
- Add `arrow=50` support
- Improved error messages when deserializing `SchemaLike`
- Relax `Sized` requirement for `SchemaLike::from_samples(..)`, `SchemaLike::from_type(..)`, `SchemaLike::from_value(..)`
- Derive `Debug`, `PartialEq` for `Item` and `Items`
Breaking changes:
- Make tracing options non-exhaustive
- Remove the `try_parse_dates` field in favor of the `guess_dates` field in `TracingOptions` (the setter name is not affected)
- Remove the experimental configuration API
Improvements:
- Simpler and streamlined API (`to_arrow` / `from_arrow` and `to_arrow2` / `from_arrow2`)
- Add `SchemaLike` trait to support direct construction of arrow / arrow2 fields
- Add type based tracing to allow schema tracing without samples (`SchemaLike::from_type()`)
- Allow to build schema objects from serializable objects, e.g., `serde_json::Value` (`SchemaLike::from_value()`)
- Add support for `arrow=47`, `arrow=48`, `arrow=49`
- Improve error messages in schema tracing
- Fix bug in `arrow2=0.16` support
support - Fix unused warnings without selected arrow versions
Deprecations (see the documentation of deprecated items for how to migrate):
- Rename `serde_arrow::schema::Schema` to `serde_arrow::schema::SerdeArrowSchema` to prevent name clashes with the schema types of `arrow` and `arrow2`.
- Deprecate `serialize_into_arrays`, `deserialize_from_arrays` methods in favor of `to_arrow` / `to_arrow2` and `from_arrow` / `from_arrow2`
- Deprecate `serialize_into_fields` methods in favor of `SchemaLike::from_samples`
- Deprecate single item methods in favor of using the `Items` and `Item` wrappers
Make bytecode based serialization and deserialization the default
- Remove state machine serialization, and use bytecode serialization as the default. This change results in a 2.6x speed up for the default configuration
- Implement deserialization via bytecode (remove state machine implementation)
- Add deserialization support for arrow
Update arrow version support
- Add `arrow=40`, `arrow=41`, `arrow=42`, `arrow=43`, `arrow=44`, `arrow=45`, `arrow=46` support
- Remove `arrow=35`, `arrow=36` support
Improve type support
- Implement bytecode serialization / deserialization of f16
- Add support for coercing different numeric types (use `TracingOptions::default().coerce_numbers(true)`)
- Add support for `Timestamp(Milliseconds, None)` and `Timestamp(Milliseconds, Some("UTC"))`.
Quality of life features
- Ignore unknown fields in serialization (Rust -> Arrow)
- Raise an error if resulting arrays are of unequal length (#78)
- Add an experimental schema struct under `serde_arrow::experimental::Schema` that can be easily serialized and deserialized.
No longer export the `base` module: the implementation details as-is were not really useful. Remove for now and think about a better design.
Bug fixes:
- Fix bug in bytecode serialization for missing fields (#79)
- Fix bytecode serialization for nested options, e.g., `Option<Option<T>>`.
- Fix bytecode serialization of structs with missing fields, e.g., missing keys with maps serialized as structs
- Fix nullable top-level fields in bytecode serialization
- Fix bug in bytecode serialization for out of order fields (#80)
- Fix a bug for unions with unknown variants reported here. Now `serde_arrow` correctly handles unions during serialization for which not all variants were encountered during tracing. Serializing unknown variants will result in an error. All variants that are seen during tracing are safe to use.
- Breaking change: add new `Item` event emitted before list items, tuple items, or map entries
- Add support for `arrow=38` and `arrow=39` with the `arrow-38` and `arrow-39` features
- Add support for an experimental bytecode serializer that shows speedups of up to 4x. Enable it with `serde_arrow::experimental::configure(|config| { config.serialize_with_bytecode = true; });`. This setting is global and used for all calls to `serialize_to_array` and `serialize_to_arrays`. At the moment the following features are not supported by the bytecode serializer:
  - nested options (`Option<Option<T>>`)
  - creating `float16` arrays
The following people contributed to this release:
- Add support for `arrow=37` with the `arrow-37` feature
Now both `arrow` and `arrow2` are supported. Use the features to select the relevant version of either crate. E.g., to use `serde_arrow` with `arrow=36`:

`serde_arrow = { version = "0.6", features = ["arrow-36"] }`

`serde_arrow` now supports deserializing Rust objects from arrays. At the moment this operation is only supported for `arrow2`. Adding support for `arrow` is planned.

`serde_arrow` now supports many more Rust and Arrow features.

- Rust: Struct, Lists, Maps, Enums, Tuples
- Arrow: Struct, List, Maps, Unions, ...

`serde_arrow` no longer relies on its own schema object. Now all schema information is retrieved from arrow fields with additional metadata.

In addition to the previous API that worked on a sequence of records, `serde_arrow` now also supports operating on a sequence of individual items (`serialize_into_array`, `deserialize_from_array`) and on single items (`ArraysBuilder`).

`serde_arrow` supports dictionary encoding for string arrays. This way string arrays are encoded via a lookup table to avoid including repeated string values.
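
The lookup-table idea behind dictionary encoding can be sketched in plain Rust. This is a simplified illustration and not how `serde_arrow` or Arrow implement it: `dictionary_encode` and `dictionary_decode` are hypothetical names, nulls are ignored, and the key type is fixed to `u32` for brevity.

```rust
use std::collections::HashMap;

/// Dictionary-encode a slice of strings: returns one key per input value and
/// the table of distinct values, in order of first appearance.
fn dictionary_encode(values: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut dictionary: Vec<String> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(values.len());

    for &value in values {
        // Reuse the existing key for a known string, otherwise append the
        // string to the table and use its position as the new key.
        let key = *index.entry(value).or_insert_with(|| {
            dictionary.push(value.to_string());
            (dictionary.len() - 1) as u32
        });
        keys.push(key);
    }
    (keys, dictionary)
}

/// Reverse the encoding by looking each key up in the table.
fn dictionary_decode(keys: &[u32], dictionary: &[String]) -> Vec<String> {
    keys.iter().map(|&k| dictionary[k as usize].clone()).collect()
}
```

For the input `["a", "b", "a", "c", "b"]` this produces the keys `[0, 1, 0, 2, 1]` and the table `["a", "b", "c"]`, so each repeated string is stored only once.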
- Bump arrow to version 16.0.0