BREAKING CHANGE: Towards #40, #33 Rename public methods to support scalars + chunks

- Rename ArrowDeserialize to ArrowFieldDeserialize. Similarly for arrow_deserialize
- Rename ArrowSerialize to ArrowFieldSerialize. Similarly for arrow_serialize
- Rename internal trait ArrowArray to ArrowArrayIterator for clarity
ncpenke committed Jun 26, 2022
1 parent f5d13ea commit 9d91aa5
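
For downstream code with hand-written impls, the rename mostly amounts to updating trait and method names. The sketch below is a self-contained illustration, not the crate's actual API: the two traits are simplified stand-ins (the real `ArrowFieldDeserialize` also carries an `ArrayType` and iterator bounds), and `Millis` is a hypothetical user type.

```rust
// Simplified stand-ins for the crate's traits so this sketch compiles on its
// own. The real traits have more bounds (see deserialize.rs in this commit).
pub trait ArrowField {
    type Type;
}

// Formerly `ArrowDeserialize`.
pub trait ArrowFieldDeserialize: ArrowField + Sized {
    // Formerly `arrow_deserialize`.
    fn arrow_field_deserialize(v: Option<&i64>) -> Option<<Self as ArrowField>::Type>;
}

// Hypothetical user type wrapping a millisecond timestamp.
#[derive(Debug, PartialEq)]
pub struct Millis(pub i64);

impl ArrowField for Millis {
    type Type = Self;
}

// Before this commit the same impl would have read:
//   impl ArrowDeserialize for Millis { fn arrow_deserialize(...) { ... } }
impl ArrowFieldDeserialize for Millis {
    fn arrow_field_deserialize(v: Option<&i64>) -> Option<Self> {
        v.map(|t| Millis(*t))
    }
}
```

Only the names change in this sketch; the body of the conversion is untouched, which matches the 157 additions and 157 deletions reported for the commit.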
Showing 7 changed files with 157 additions and 157 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -6,9 +6,9 @@ The Arrow ecosystem provides many ways to convert between Arrow and other popula

## Design

Types that implement the `ArrowField`, `ArrowSerialize` and `ArrowDeserialize` traits can be converted to/from Arrow via the `try_into_arrow` and the `try_into_collection` methods.
Types that implement the `ArrowField`, `ArrowFieldSerialize` and `ArrowDeserialize` traits can be converted to/from Arrow via the `try_into_arrow` and the `try_into_collection` methods.

The `ArrowField` implementation for a type defines the Arrow schema. The `ArrowSerialize` and `ArrowDeserialize` implementations provide the conversion logic via arrow2's data structures.
The `ArrowField` implementation for a type defines the Arrow schema. The `ArrowFieldSerialize` and `ArrowDeserialize` implementations provide the conversion logic via arrow2's data structures.

## Features

@@ -79,11 +79,11 @@ fn test_simple_roundtrip() {

### Similarities with Serde

The design is inspired by serde. The `ArrowSerialize` and `ArrowDeserialize` are analogs of serde's `Serialize` and `Deserialize` respectively.
The design is inspired by serde. The `ArrowFieldSerialize` and `ArrowDeserialize` are analogs of serde's `Serialize` and `Deserialize` respectively.

However, unlike serde's traits, which provide an exhaustive and flexible mapping to the serde data model, arrow2_convert's traits provide a much narrower mapping to arrow2's data structures.

Specifically, the `ArrowSerialize` trait provides the logic to serialize a type to the corresponding `arrow2::array::MutableArray`. The `ArrowDeserialize` trait deserializes a type from the corresponding `arrow2::array::ArrowArray`.
Specifically, the `ArrowFieldSerialize` trait provides the logic to serialize a type to the corresponding `arrow2::array::MutableArray`. The `ArrowDeserialize` trait deserializes a type from the corresponding `arrow2::array::ArrowArray`.


### Workarounds
@@ -92,9 +92,9 @@ Features such as partial implementation specialization and generic associated ty

For example custom types need to explicitly enable Vec<T> serialization via the `arrow_enable_vec_for_type` macro on the primitive type. This is needed since Vec<u8> is a special type in Arrow, but without implementation specialization there's no way to special-case it.

Availability of generic associated types would simplify the implementation for large and fixed types, since a generic MutableArray can be defined. Ideally for code reusability, we wouldn’t have to reimplement `ArrowSerialize` and `ArrowDeserialize` for large and fixed size types since the primitive types are the same. However, this requires the trait functions to take a generic bounded mutable array as an argument instead of a single array type. This requires the `ArrowSerialize` and `ArrowDeserialize` implementations to be able to specify the bounds as part of the associated type, which is not possible without generic associated types.
Availability of generic associated types would simplify the implementation for large and fixed types, since a generic MutableArray can be defined. Ideally for code reusability, we wouldn’t have to reimplement `ArrowFieldSerialize` and `ArrowDeserialize` for large and fixed size types since the primitive types are the same. However, this requires the trait functions to take a generic bounded mutable array as an argument instead of a single array type. This requires the `ArrowFieldSerialize` and `ArrowDeserialize` implementations to be able to specify the bounds as part of the associated type, which is not possible without generic associated types.

As a result, we’re forced to sacrifice code reusability and introduce a little bit of complexity by providing separate `ArrowSerialize` and `ArrowDeserialize` implementations for large and fixed size types via placeholder structures. This also requires introducing the `Type` associated type to `ArrowField` so that the arrow type can be overridden via a macro field attribute without affecting the actual type.
As a result, we’re forced to sacrifice code reusability and introduce a little bit of complexity by providing separate `ArrowFieldSerialize` and `ArrowDeserialize` implementations for large and fixed size types via placeholder structures. This also requires introducing the `Type` associated type to `ArrowField` so that the arrow type can be overridden via a macro field attribute without affecting the actual type.
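
The placeholder-structure pattern described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `data_type_name` is a hypothetical stand-in for the crate's `data_type()` (which returns an arrow2 `DataType`), and `describe` is an invented helper that only demonstrates how generic code can select the Arrow layout through the placeholder while still handling the real Rust type.

```rust
pub trait ArrowField {
    // `Self` for ordinary types; the actual Rust type for placeholder types.
    type Type;
    // Hypothetical stand-in for `data_type()`: returns a layout name only.
    fn data_type_name() -> &'static str;
}

impl ArrowField for String {
    type Type = String;
    fn data_type_name() -> &'static str {
        "Utf8"
    }
}

// Placeholder structure: never instantiated, it only selects the
// 64-bit-offset string layout while values remain plain `String`s.
pub struct LargeString;

impl ArrowField for LargeString {
    type Type = String;
    fn data_type_name() -> &'static str {
        "LargeUtf8"
    }
}

// Invented helper: the layout comes from `F`, the value from `F::Type`.
pub fn describe<F: ArrowField>(value: &F::Type) -> String
where
    F::Type: std::fmt::Debug,
{
    format!("{:?} as {}", value, F::data_type_name())
}
```

Because `String` and `LargeString` share `Type = String`, the same value can be routed to either layout without changing the field's Rust type, which is the purpose of the `Type` associated type.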

## License

126 changes: 63 additions & 63 deletions arrow2_convert/src/deserialize.rs
@@ -6,15 +6,15 @@ use chrono::{NaiveDate, NaiveDateTime};
use crate::field::*;

/// Implemented by [`ArrowField`] that can be deserialized from arrow
pub trait ArrowDeserialize: ArrowField + Sized
pub trait ArrowFieldDeserialize: ArrowField + Sized
where
Self::ArrayType: ArrowArray,
Self::ArrayType: ArrowArrayIterator,
for<'a> &'a Self::ArrayType: IntoIterator,
{
type ArrayType;

/// Deserialize this field from arrow
fn arrow_deserialize(
fn arrow_field_deserialize(
v: <&Self::ArrayType as IntoIterator>::Item,
) -> Option<<Self as ArrowField>::Type>;

@@ -25,10 +25,10 @@ where
// Ideally we would be able to capture the optional field of the iterator via
// something like for<'a> &'a T::ArrayType: IntoIterator<Item=Option<E>>,
// However, the E parameter seems to confuse the borrow checker if it's a reference.
fn arrow_deserialize_internal(
fn arrow_field_deserialize_internal(
v: <&Self::ArrayType as IntoIterator>::Item,
) -> <Self as ArrowField>::Type {
Self::arrow_deserialize(v).unwrap()
Self::arrow_field_deserialize(v).unwrap()
}
}

@@ -38,7 +38,7 @@ where
///
/// The derive macro generates implementations for typed struct arrays.
#[doc(hidden)]
pub trait ArrowArray
pub trait ArrowArrayIterator
where
for<'a> &'a Self: IntoIterator,
{
@@ -51,11 +51,11 @@ where
// Macro to facilitate implementation for numeric types and numeric arrays.
macro_rules! impl_arrow_deserialize_primitive {
($physical_type:ty, $logical_type:ident) => {
impl ArrowDeserialize for $physical_type {
impl ArrowFieldDeserialize for $physical_type {
type ArrayType = PrimitiveArray<$physical_type>;

#[inline]
fn arrow_deserialize<'a>(v: Option<&$physical_type>) -> Option<Self> {
fn arrow_field_deserialize<'a>(v: Option<&$physical_type>) -> Option<Self> {
v.map(|t| *t)
}
}
@@ -66,7 +66,7 @@ macro_rules! impl_arrow_deserialize_primitive {

macro_rules! impl_arrow_array {
($array:ty) => {
impl ArrowArray for $array {
impl ArrowArrayIterator for $array {
type BaseArrayType = Self;

fn iter_from_array_ref(b: &dyn Array) -> <&Self as IntoIterator>::IntoIter {
@@ -80,26 +80,26 @@ macro_rules! impl_arrow_array {
}

// blanket implementation for optional fields
impl<T> ArrowDeserialize for Option<T>
impl<T> ArrowFieldDeserialize for Option<T>
where
T: ArrowDeserialize,
T::ArrayType: 'static + ArrowArray,
T: ArrowFieldDeserialize,
T::ArrayType: 'static + ArrowArrayIterator,
for<'a> &'a T::ArrayType: IntoIterator,
{
type ArrayType = <T as ArrowDeserialize>::ArrayType;
type ArrayType = <T as ArrowFieldDeserialize>::ArrayType;

#[inline]
fn arrow_deserialize(
fn arrow_field_deserialize(
v: <&Self::ArrayType as IntoIterator>::Item,
) -> Option<<Self as ArrowField>::Type> {
Some(Self::arrow_deserialize_internal(v))
Some(Self::arrow_field_deserialize_internal(v))
}

#[inline]
fn arrow_deserialize_internal(
fn arrow_field_deserialize_internal(
v: <&Self::ArrayType as IntoIterator>::Item,
) -> <Self as ArrowField>::Type {
<T as ArrowDeserialize>::arrow_deserialize(v)
<T as ArrowFieldDeserialize>::arrow_field_deserialize(v)
}
}
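
The interplay between the two methods in the `Option<T>` blanket implementation is easy to miss, so here is a reduced sketch of the idea. It is an approximation, not the crate's code: the trait and method names are shortened, every array item is fixed to `Option<&i64>` instead of being derived from `ArrayType`, and the `ArrowField` impl for `Option<T>` (which lives in field.rs and is not part of this diff) is assumed to set `Type = Option<T::Type>`.

```rust
pub trait ArrowField {
    type Type;
}

pub trait FieldDeserialize: ArrowField + Sized {
    fn deserialize(v: Option<&i64>) -> Option<<Self as ArrowField>::Type>;

    // Default for non-nullable fields: a null slot is unexpected, so unwrap.
    fn deserialize_internal(v: Option<&i64>) -> <Self as ArrowField>::Type {
        Self::deserialize(v).unwrap()
    }
}

impl ArrowField for i64 {
    type Type = i64;
}

impl FieldDeserialize for i64 {
    fn deserialize(v: Option<&i64>) -> Option<i64> {
        v.copied()
    }
}

// Assumption: mirrors what field.rs is expected to define.
impl<T: ArrowField> ArrowField for Option<T> {
    type Type = Option<T::Type>;
}

// For nullable fields the roles swap: the internal method forwards the
// inner type's `Option` unchanged, so nulls survive as `None`.
impl<T: FieldDeserialize> FieldDeserialize for Option<T> {
    fn deserialize(v: Option<&i64>) -> Option<Option<T::Type>> {
        Some(Self::deserialize_internal(v))
    }

    fn deserialize_internal(v: Option<&i64>) -> Option<T::Type> {
        T::deserialize(v)
    }
}

// Iteration always goes through `deserialize_internal`, as in
// `arrow_array_deserialize_iterator_internal`.
pub fn collect<F: FieldDeserialize>(slots: &[Option<i64>]) -> Vec<F::Type> {
    slots
        .iter()
        .map(|s| F::deserialize_internal(s.as_ref()))
        .collect()
}
```

With this arrangement, `collect::<i64>` panics on a null slot while `collect::<Option<i64>>` yields `None` for it, which is the behavior the override is meant to provide without needing to name the iterator's item type in the trait bounds.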

@@ -114,74 +114,74 @@ impl_arrow_deserialize_primitive!(i64, Int64);
impl_arrow_deserialize_primitive!(f32, Float32);
impl_arrow_deserialize_primitive!(f64, Float64);

impl ArrowDeserialize for String {
impl ArrowFieldDeserialize for String {
type ArrayType = Utf8Array<i32>;

#[inline]
fn arrow_deserialize(v: Option<&str>) -> Option<Self> {
fn arrow_field_deserialize(v: Option<&str>) -> Option<Self> {
v.map(|t| t.to_string())
}
}

impl ArrowDeserialize for LargeString {
impl ArrowFieldDeserialize for LargeString {
type ArrayType = Utf8Array<i64>;

#[inline]
fn arrow_deserialize(v: Option<&str>) -> Option<String> {
fn arrow_field_deserialize(v: Option<&str>) -> Option<String> {
v.map(|t| t.to_string())
}
}

impl ArrowDeserialize for bool {
impl ArrowFieldDeserialize for bool {
type ArrayType = BooleanArray;

#[inline]
fn arrow_deserialize(v: Option<bool>) -> Option<Self> {
fn arrow_field_deserialize(v: Option<bool>) -> Option<Self> {
v
}
}

impl ArrowDeserialize for NaiveDateTime {
impl ArrowFieldDeserialize for NaiveDateTime {
type ArrayType = PrimitiveArray<i64>;

#[inline]
fn arrow_deserialize(v: Option<&i64>) -> Option<Self> {
fn arrow_field_deserialize(v: Option<&i64>) -> Option<Self> {
v.map(|t| arrow2::temporal_conversions::timestamp_ns_to_datetime(*t))
}
}

impl ArrowDeserialize for NaiveDate {
impl ArrowFieldDeserialize for NaiveDate {
type ArrayType = PrimitiveArray<i32>;

#[inline]
fn arrow_deserialize(v: Option<&i32>) -> Option<Self> {
fn arrow_field_deserialize(v: Option<&i32>) -> Option<Self> {
v.map(|t| arrow2::temporal_conversions::date32_to_date(*t))
}
}

impl ArrowDeserialize for Vec<u8> {
impl ArrowFieldDeserialize for Vec<u8> {
type ArrayType = BinaryArray<i32>;

#[inline]
fn arrow_deserialize(v: Option<&[u8]>) -> Option<Self> {
fn arrow_field_deserialize(v: Option<&[u8]>) -> Option<Self> {
v.map(|t| t.to_vec())
}
}

impl ArrowDeserialize for LargeBinary {
impl ArrowFieldDeserialize for LargeBinary {
type ArrayType = BinaryArray<i64>;

#[inline]
fn arrow_deserialize(v: Option<&[u8]>) -> Option<Vec<u8>> {
fn arrow_field_deserialize(v: Option<&[u8]>) -> Option<Vec<u8>> {
v.map(|t| t.to_vec())
}
}

impl<const SIZE: usize> ArrowDeserialize for FixedSizeBinary<SIZE> {
impl<const SIZE: usize> ArrowFieldDeserialize for FixedSizeBinary<SIZE> {
type ArrayType = FixedSizeBinaryArray;

#[inline]
fn arrow_deserialize(v: Option<&[u8]>) -> Option<Vec<u8>> {
fn arrow_field_deserialize(v: Option<&[u8]>) -> Option<Vec<u8>> {
v.map(|t| t.to_vec())
}
}
@@ -190,7 +190,7 @@ fn arrow_deserialize_vec_helper<T>(
v: Option<Box<dyn Array>>,
) -> Option<<Vec<T> as ArrowField>::Type>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
T: ArrowFieldDeserialize + ArrowEnableVecForType + 'static,
for<'a> &'a T::ArrayType: IntoIterator,
{
use std::ops::Deref;
@@ -205,41 +205,41 @@ where
}

// Blanket implementation for Vec
impl<T> ArrowDeserialize for Vec<T>
impl<T> ArrowFieldDeserialize for Vec<T>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
T: ArrowFieldDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowFieldDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = ListArray<i32>;

fn arrow_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
fn arrow_field_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
arrow_deserialize_vec_helper::<T>(v)
}
}

impl<T> ArrowDeserialize for LargeVec<T>
impl<T> ArrowFieldDeserialize for LargeVec<T>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
T: ArrowFieldDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowFieldDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = ListArray<i64>;

fn arrow_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
fn arrow_field_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
arrow_deserialize_vec_helper::<T>(v)
}
}

impl<T, const SIZE: usize> ArrowDeserialize for FixedSizeVec<T, SIZE>
impl<T, const SIZE: usize> ArrowFieldDeserialize for FixedSizeVec<T, SIZE>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
T: ArrowFieldDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowFieldDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = FixedSizeListArray;

fn arrow_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
fn arrow_field_deserialize(v: Option<Box<dyn Array>>) -> Option<<Self as ArrowField>::Type> {
arrow_deserialize_vec_helper::<T>(v)
}
}
@@ -263,21 +263,21 @@ where
fn try_into_collection(self) -> arrow2::error::Result<Collection>;
fn try_into_collection_as_type<ArrowType>(self) -> arrow2::error::Result<Collection>
where
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator;
ArrowType: ArrowFieldDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowFieldDeserialize>::ArrayType: IntoIterator;
}

/// Helper to return an iterator for elements from a [`arrow2::array::Array`].
fn arrow_array_deserialize_iterator_internal<'a, Element, Field>(
b: &'a dyn arrow2::array::Array,
) -> arrow2::error::Result<impl Iterator<Item = Element> + 'a>
where
Field: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Field as ArrowDeserialize>::ArrayType: IntoIterator,
Field: ArrowFieldDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Field as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
Ok(
<<Field as ArrowDeserialize>::ArrayType as ArrowArray>::iter_from_array_ref(b)
.map(<Field as ArrowDeserialize>::arrow_deserialize_internal),
<<Field as ArrowFieldDeserialize>::ArrayType as ArrowArrayIterator>::iter_from_array_ref(b)
.map(<Field as ArrowFieldDeserialize>::arrow_field_deserialize_internal),
)
}

@@ -286,8 +286,8 @@ pub fn arrow_array_deserialize_iterator_as_type<'a, Element, ArrowType>(
) -> arrow2::error::Result<impl Iterator<Item = Element> + 'a>
where
Element: 'static,
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator,
ArrowType: ArrowFieldDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
if &<ArrowType as ArrowField>::data_type() != arr.data_type() {
Err(arrow2::error::Error::InvalidArgumentError(
@@ -306,16 +306,16 @@ pub fn arrow_array_deserialize_iterator<'a, T>(
arr: &'a dyn arrow2::array::Array,
) -> arrow2::error::Result<impl Iterator<Item = T> + 'a>
where
T: ArrowDeserialize + ArrowField<Type = T> + 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
T: ArrowFieldDeserialize + ArrowField<Type = T> + 'static,
for<'b> &'b <T as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
arrow_array_deserialize_iterator_as_type::<T, T>(arr)
}
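
The `_as_type` variant shown in the hunks above compares the field's declared data type against the array's before iterating. A minimal sketch of that guard follows, assuming toy types in place of arrow2's: `DataType`, `Column`, and `collect_as_type` are hypothetical names for illustration, and the error is a plain `String` rather than `arrow2::error::Error`.

```rust
#[derive(Debug, PartialEq)]
pub enum DataType {
    Utf8,
    LargeUtf8,
}

// Toy stand-in for an Arrow array: a data type tag plus its values.
pub struct Column {
    pub data_type: DataType,
    pub values: Vec<String>,
}

pub trait ArrowField {
    type Type;
    fn data_type() -> DataType;
}

impl ArrowField for String {
    type Type = String;
    fn data_type() -> DataType {
        DataType::Utf8
    }
}

// Placeholder type selecting the large-offset layout.
pub struct LargeString;

impl ArrowField for LargeString {
    type Type = String;
    fn data_type() -> DataType {
        DataType::LargeUtf8
    }
}

// Reject the column up front if its layout differs from what `F` declares.
pub fn collect_as_type<F>(col: &Column) -> Result<Vec<String>, String>
where
    F: ArrowField<Type = String>,
{
    if F::data_type() != col.data_type {
        return Err(format!(
            "data type mismatch: expected {:?}, found {:?}",
            F::data_type(),
            col.data_type
        ));
    }
    Ok(col.values.clone())
}
```

In this sketch the element type stays `String` in both cases; only the type parameter decides which layout is accepted, which is why the non-`_as_type` function can simply delegate with the same type in both positions.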

impl<'a, Collection, Element, ArrowArray> TryIntoCollection<Collection, Element> for ArrowArray
where
Element: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Element as ArrowDeserialize>::ArrayType: IntoIterator,
Element: ArrowFieldDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Element as ArrowFieldDeserialize>::ArrayType: IntoIterator,
ArrowArray: std::borrow::Borrow<dyn Array>,
Collection: FromIterator<Element>,
{
@@ -325,8 +325,8 @@ where

fn try_into_collection_as_type<ArrowType>(self) -> arrow2::error::Result<Collection>
where
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator,
ArrowType: ArrowFieldDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowFieldDeserialize>::ArrayType: IntoIterator,
{
Ok(
arrow_array_deserialize_iterator_as_type::<Element, ArrowType>(self.borrow())?
2 changes: 1 addition & 1 deletion arrow2_convert/src/field.rs
@@ -12,7 +12,7 @@ use chrono::{NaiveDate, NaiveDateTime};
///
/// The trait simply requires defining the [`ArrowField::data_type`]
///
/// Serialize and Deserialize functionality requires implementing the [`crate::ArrowSerialize`]
/// Serialize and Deserialize functionality requires implementing the [`crate::ArrowFieldSerialize`]
/// and the [`crate::ArrowDeserialize`] traits respectively.
pub trait ArrowField {
/// This should be `Self` except when implementing large offset and fixed placeholder types.