Merge pull request #243 from chmp/feature/231-jiff
Add jiff support
chmp authored Oct 5, 2024
2 parents a3f3970 + c5698bf commit 73b102a
Showing 29 changed files with 2,155 additions and 301 deletions.
27 changes: 27 additions & 0 deletions Cargo.lock


2 changes: 2 additions & 0 deletions Changes.md
Original file line number Diff line number Diff line change
@@ -8,6 +8,8 @@ New features
to `Time64(Nanosecond))`) in `from_samples`
- Improved error messages for non self describing types (`chrono::*`, `uuid::Uuid`,
`std::net::IpAddr`)
- Add support for various `jiff` types (`jiff::Date`, `jiff::Time`, `jiff::DateTime`,
`jiff::Timestamp`, `jiff::Span`, `jiff::SignedDuration`)

## 0.12.0

1 change: 1 addition & 0 deletions serde_arrow/Cargo.toml
@@ -138,6 +138,7 @@ serde_bytes = "0.11"
rand = "0.8"
bigdecimal = {version = "0.4", features = ["serde"] }
uuid = { version = "1.10.0", features = ["serde", "v4"] }
jiff = { version = "0.1", features = ["serde"] }

# for benchmarks
# arrow-version:replace: arrow-json-{version} = {{ package = "arrow-json", version = "{version}" }}
175 changes: 125 additions & 50 deletions serde_arrow/Status.md
@@ -1,6 +1,15 @@
# Status

Supported arrow data types:
This page documents the supported types from both an Arrow and a Rust perspective.

- [Arrow data types](#arrow-data-types)
- [Rust types](#rust-types)
  - [Native / standard types](#native--standard-types)
  - [`chrono` types](#chrono-types)
  - [`jiff` types](#jiff-types)
  - [`rust_decimal` and `bigdecimal` types](#rust_decimal-and-bigdecimal-types)

## Arrow data types

- [x] [`Null`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Null)
- [x] [`Boolean`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Boolean)
@@ -49,7 +58,9 @@ Supported arrow data types:
serialization error.
- [ ] [`Decimal256(precision, scale)`](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Decimal256)

Native / standard Rust types:
## Rust types

### Native / standard types

- [x] `bool`
- [x] `i8`, `i16`, `i32`, `i64`
@@ -72,54 +83,118 @@ Native / standard Rust types:
supported
- [x] `struct S(T)`: newtype structs are supported, if `T` is supported

Non-standard Rust types

- [x] `chrono::DateTime<Utc>`:
- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` with strategy `UtcStrAsDate64`
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Date64` with
strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::DateTime<Utc>` using [`chrono::serde::ts_microseconds`][chrono-ts-microseconds]:
- is serialized / deserialized as `i64`
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` without Strategy,
`Date64` with strategy `UtcStrAsDate64`
- `from_samples` and `from_type` detect the type `Int64`
- [x] `chrono::NaiveDateTime`:
- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy `NaiveStrAsDate64`
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Date64` with
strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::NaiveTime`:
- serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)` and `Time64` arrays
- `from_samples` detects the type `LargeUtf8` without configuration, the type `Time64(Nanosecond)`
when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [x] `chrono::NaiveDate`:
- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32` arrays
- `from_samples` detects the type `LargeUtf8` without configuration, to `Date32` when setting
`guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
- [ ] `chrono::Duration`: does not support Serde and is therefore not supported
- [x] [`rust_decimal::Decimal`][rust_decimal::Decimal] for the `float` and `str`
(de)serialization options when using the `Decimal128(..)` data type
- [x] [`bigdecimal::BigDecimal`][bigdecimal::BigDecimal] when using the
`Decimal128(..)` data type


[crate::base::Event]: https://docs.rs/serde_arrow/latest/serde_arrow/event/enum.Event.html
[crate::to_record_batch]: https://docs.rs/serde_arrow/latest/serde_arrow/fn.to_record_batch.html
[crate::trace_schema]: https://docs.rs/serde_arrow/latest/serde_arrow/fn.trace_schema.html
[serde::Serialize]: https://docs.serde.rs/serde/trait.Serialize.html
[serde::Deserialize]: https://docs.serde.rs/serde/trait.Deserialize.html
[crate::Schema::from_records]: https://docs.rs/serde_arrow/latest/serde_arrow/struct.Schema.html#method.from_records
[chrono]: https://docs.rs/chrono/latest/chrono/

[crate::base::EventSource]: https://docs.rs/serde_arrow
[crate::base::EventSink]: https://docs.rs/serde_arrow
### `chrono` types

#### `chrono::DateTime<Utc>`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` with strategy `UtcStrAsDate64`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date64` with strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

With [`chrono::serde::ts_microseconds`][chrono-ts-microseconds]:

- is serialized / deserialized as `i64`
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("Utc"))`, `Date64` without strategy,
  `Date64` with strategy `UtcStrAsDate64`
- `from_samples` and `from_type` detect `Int64`

#### `chrono::NaiveDateTime`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy `NaiveStrAsDate64`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date64` with strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `chrono::NaiveTime`

- is serialized / deserialized as strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)` and `Time64` arrays
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Time64(Nanosecond)` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `chrono::NaiveDate`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32` arrays
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date32` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

`chrono::Duration` does not support Serde and is therefore not supported

### `jiff` types

#### `jiff::Date`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Date32`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date32` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
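
To illustrate what the `Date32` mapping above means for the stored values: Arrow's `Date32` holds the number of days since 1970-01-01 as an `i32`. The following is a minimal, standard-library-only sketch of that correspondence for a `YYYY-MM-DD` string. The helpers `days_from_civil` and `date32_from_iso` are hypothetical names used only for illustration; they are not part of `serde_arrow` or `jiff`, and the actual conversion inside `serde_arrow` may well be implemented differently.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date
/// (the well-known "days from civil" computation; hypothetical helper).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as the last months of the previous year,
    // so that the leap day falls at the end of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year cycle index
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = (month + 9) % 12; // shifted month, March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day within the shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468
}

/// Parse a `YYYY-MM-DD` string into a `Date32`-style day count.
/// Assumption: only a basic range check is done, not full calendar validation.
fn date32_from_iso(s: &str) -> Option<i32> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    let days = days_from_civil(year, month, day);
    if days < i32::MIN as i64 || days > i32::MAX as i64 {
        return None;
    }
    Some(days as i32)
}
```

With this sketch, `date32_from_iso("1970-01-01")` yields `Some(0)` and dates before the epoch map to negative values.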

#### `jiff::Time`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Time32(..)`, `Time64(..)`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Time64(Nanosecond)` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing
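
As a rough illustration of the `Time64(Nanosecond)` mapping above: Arrow stores such values as nanoseconds since midnight in an `i64`. The sketch below converts an `HH:MM:SS[.fraction]` string into that representation using only the standard library. The function name `time64_ns_from_iso` is hypothetical, and the accepted string format is an assumption about what the serialized time looks like; it is not taken from the `serde_arrow` or `jiff` sources.

```rust
/// Nanoseconds since midnight for an `HH:MM:SS[.fraction]` string
/// (hypothetical helper, at most 9 fractional digits).
fn time64_ns_from_iso(s: &str) -> Option<i64> {
    // Split off the optional fractional seconds.
    let (hms, frac) = match s.split_once('.') {
        Some((hms, frac)) => (hms, frac),
        None => (s, ""),
    };

    let mut parts = hms.split(':');
    let hour: i64 = parts.next()?.parse().ok()?;
    let minute: i64 = parts.next()?.parse().ok()?;
    let second: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some()
        || !(0..24).contains(&hour)
        || !(0..60).contains(&minute)
        || !(0..60).contains(&second)
    {
        return None;
    }

    if frac.len() > 9 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Right-pad the fraction with zeros to exactly 9 digits.
    let mut nanos: i64 = 0;
    for i in 0..9 {
        let digit = frac.as_bytes().get(i).map_or(0, |&b| (b - b'0') as i64);
        nanos = nanos * 10 + digit;
    }

    Some(((hour * 60 + minute) * 60 + second) * 1_000_000_000 + nanos)
}
```

A coarser unit such as `Time32(Second)` would store the same instant divided down to seconds, which is why sub-second precision only survives with the `Time64` variants.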

#### `jiff::DateTime`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., None)`, `Date64` with strategy
  `NaiveStrAsDate64`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date64` with strategy `NaiveStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `jiff::Timestamp`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Timestamp(.., Some("UTC"))`, `Date64` with strategy
  `UtcStrAsDate64`
- `from_samples` detects
  - `LargeUtf8` without configuration
  - `Date64` with strategy `UtcStrAsDate64` when setting `guess_dates = true`
- `from_type` is not supported, as the type is not self-describing

#### `jiff::Span`

- is serialized as Serde strings
- can be mapped to `Utf8`, `LargeUtf8`, `Duration(..)`
- `from_samples` detects `LargeUtf8`
- `from_type` is not supported, as the type is not self-describing

#### `jiff::SignedDuration`

Same as `jiff::Span`

#### `jiff::Zoned`

`jiff::Zoned` is not supported, as there is no clear way to implement it.

### `rust_decimal` and `bigdecimal` types

#### [`rust_decimal::Decimal`][rust_decimal::Decimal]

- for the `float` and `str` (de)serialization options when using the `Decimal128(..)` data type

#### [`bigdecimal::BigDecimal`][bigdecimal::BigDecimal]

- when using the `Decimal128(..)` data type

[chrono-ts-microseconds]: https://docs.rs/chrono/latest/chrono/serde/ts_microseconds/
[rust_decimal::Decimal]: https://docs.rs/rust_decimal/latest/rust_decimal/struct.Decimal.html
[bigdecimal::BigDecimal]: https://docs.rs/bigdecimal/0.4.2/bigdecimal/struct.BigDecimal.html
4 changes: 2 additions & 2 deletions serde_arrow/src/_impl/docs/defs.rs
@@ -28,7 +28,7 @@ pub fn example_arrow_arrays() -> (Vec<crate::_impl::arrow::datatypes::FieldRef>,
let items = example_records();

let fields = Vec::<crate::_impl::arrow::datatypes::FieldRef>::from_type::<Record>(TracingOptions::default()).unwrap();
let arrays = crate::to_arrow(&fields, &items).unwrap();
let arrays = crate::to_arrow(&fields, items).unwrap();

(fields, arrays)
}
@@ -40,7 +40,7 @@ pub fn example_arrow2_arrays() -> (Vec<crate::_impl::arrow2::datatypes::Field>,
let items = example_records();

let fields = Vec::<crate::_impl::arrow2::datatypes::Field>::from_type::<Record>(TracingOptions::default()).unwrap();
let arrays = crate::to_arrow2(&fields, &items).unwrap();
let arrays = crate::to_arrow2(&fields, items).unwrap();

(fields, arrays)
}
2 changes: 1 addition & 1 deletion serde_arrow/src/arrow2_impl/api.rs
@@ -107,7 +107,7 @@ impl crate::internal::array_builder::ArrayBuilder {
/// Construct `arrow2` arrays and reset the builder (*requires one of the
/// `arrow2-*` features*)
pub fn to_arrow2(&mut self) -> Result<Vec<Box<dyn Array>>> {
self.to_arrays()?
self.build_arrays()?
.into_iter()
.map(Box::<dyn Array>::try_from)
.collect()
2 changes: 1 addition & 1 deletion serde_arrow/src/arrow_impl/api.rs
@@ -186,7 +186,7 @@ impl crate::internal::array_builder::ArrayBuilder {
/// Construct `arrow` arrays and reset the builder (*requires one of the
/// `arrow-*` features*)
pub fn to_arrow(&mut self) -> Result<Vec<ArrayRef>> {
self.to_arrays()?
self.build_arrays()?
.into_iter()
.map(ArrayRef::try_from)
.collect()
2 changes: 1 addition & 1 deletion serde_arrow/src/internal/array_builder.rs
@@ -83,7 +83,7 @@ impl ArrayBuilder {
self.builder.extend(items)
}

pub(crate) fn to_arrays(&mut self) -> Result<Vec<Array>> {
pub(crate) fn build_arrays(&mut self) -> Result<Vec<Array>> {
let mut arrays = Vec::new();
for field in self.builder.take_records()? {
arrays.push(field.into_array()?);
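
The doc comments in the hunks above describe `to_arrow` and `to_arrow2` as constructing arrays and resetting the builder, and the renamed `build_arrays` takes `&mut self`. A minimal sketch of such a take-and-reset pattern follows. `SketchBuilder` and its fields are hypothetical and only illustrate the general idea with `std::mem::take`; the real `ArrayBuilder` internals are not shown in this diff and may differ.

```rust
/// Hypothetical stand-in for a builder that buffers values until they are built.
#[derive(Default)]
struct SketchBuilder {
    buffered: Vec<i64>,
}

impl SketchBuilder {
    /// Buffer one more value.
    fn push(&mut self, value: i64) {
        self.buffered.push(value);
    }

    /// Hand out everything buffered so far and leave the builder empty,
    /// so it can be reused for the next batch.
    fn build_arrays(&mut self) -> Vec<Vec<i64>> {
        // `mem::take` swaps in an empty Vec and returns the old contents.
        let taken = std::mem::take(&mut self.buffered);
        vec![taken]
    }
}
```

Calling `build_arrays` twice in a row on this sketch returns the buffered values the first time and an empty column the second time, since the first call already drained the buffer.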