Support conversion of rust struct to an arrow2 chunk #40
Comments
@jorgecarleitao @nielsmeima any thoughts on this? |
I agree that the root is more natural to be a
Arrow2 supports Note that in general
AFAIK we do allow a record (not an item) to be nullable, and thus the correspondence here is even simpler. Could we just expose a method "into_chunk"? It seems that we have all ingredients? |
We definitely have all the ingredients. The open question I still have is for a struct In jorgecarleitao/arrow2#1092, it was the latter. Maybe this is what you're proposing too. Do you think we should provide a helper somewhere, if needed, to go from As an aside I think we should unify all the conversion methods |
…alars + chunks - Rename ArrowDeserialize to ArrowFieldDeserialize. Similarly for arrow_deserialize - Rename ArrowSerialize to ArrowFieldSerialize. Similarly for arrow_deserialize - Rename internal trait ArrowArray to ArrowArrayIterator for clarity
I think we can offer a chunk with a single |
Seemed more convenient to provide a one-shot conversion, thanks for reviewing the PR! Let's see if we get more user feedback with this approach. I'll also add some examples for flight and parquet conversion in this repo. |
@ncpenke did you opt for providing such a helper? How would one currently differentiate between wanting to obtain a I would still be interested in a |
Thanks for following up @nielsmeima. You're right the current implementation always resolves to the first variant. We can add a helper to this crate to facilitate wrapping the fields of a I opened #55 with a proposal. Would be thrilled if you want to take a stab at it. |
I will check the proposal and take a stab at implementation in the coming days. Thrilled to do so! |
Created from the discussion in jorgecarleitao/arrow2#1092.
A Rust struct can conceptually represent either an Arrow Struct or an arrow2::Chunk (a column group). The arrow2::Chunk is important since it's used in the deserialization/serialization API for parquet and flight conversion.
We can extend the arrow2_convert::TryIntoArrow and arrow2_convert::FromArrow traits to convert to/from arrow2::Chunk, but there are two possible mappings from a vector of structs, Vec<S>, to Chunk:
1. Chunk has a single field of type Struct.
2. Chunk contains the same number of fields as the struct.
1 can be easily supported by wrapping an arrow2::Array in a Chunk.
2 has a couple of approaches:
a. A new derive macro to generate the mapping to a Chunk (e.g. ArrowChunk or ArrowRoot).
b. Providing a helper method to convert an arrow2::StructArray to a Chunk by unwrapping the fields.
One related use-case that could guide this design is to support generic typed versions of the arrow2 csv, json, parquet, and flight serialize/deserialize methods, where the schema is specified by a Rust struct (opened #41 for this). To achieve this, it would be useful to access the deserialize/serialize methods of each column separately for parallelism, which is cleaner via 2a.
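To make the two mappings concrete, here is a rough sketch of mapping 1 and approach 2b. It deliberately does not use the arrow2 crate: Column, Chunk, Person, to_struct_column, wrap_in_chunk and unwrap_struct_into_chunk are all hypothetical stand-ins invented for this illustration, so it only shows the shape of the two conversions, not the actual arrow2 or arrow2_convert API.

```rust
// Stand-in types for illustration only. These are NOT the arrow2 types;
// they just model "a column" and "a group of equal-length columns".
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
    // A struct column owns one child column per struct field.
    Struct(Vec<Column>),
}

impl Column {
    // Number of rows in the column (a struct column takes it from its first child).
    fn len(&self) -> usize {
        match self {
            Column::Int64(values) => values.len(),
            Column::Utf8(values) => values.len(),
            Column::Struct(children) => children.first().map_or(0, Column::len),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    columns: Vec<Column>,
}

impl Chunk {
    // Assumed invariant for this sketch: every column has the same number of rows.
    fn new(columns: Vec<Column>) -> Self {
        if let Some(first) = columns.first() {
            let rows = first.len();
            assert!(columns.iter().all(|c| c.len() == rows));
        }
        Chunk { columns }
    }
}

// Hypothetical row type standing in for the struct `S` in `Vec<S>`.
struct Person {
    id: i64,
    name: String,
}

// Builds a single struct column from the rows (roughly what a field-level
// conversion would produce for a vector of structs).
fn to_struct_column(rows: &[Person]) -> Column {
    Column::Struct(vec![
        Column::Int64(rows.iter().map(|r| r.id).collect()),
        Column::Utf8(rows.iter().map(|r| r.name.clone()).collect()),
    ])
}

// Mapping 1: the chunk has a single column of struct type.
fn wrap_in_chunk(column: Column) -> Chunk {
    Chunk::new(vec![column])
}

// Mapping 2, approach b: unwrap the struct column so that each struct
// field becomes its own column in the chunk. Returns None for non-struct input.
fn unwrap_struct_into_chunk(column: Column) -> Option<Chunk> {
    match column {
        Column::Struct(children) => Some(Chunk::new(children)),
        _ => None,
    }
}
```

With a Vec of two Person values, wrap_in_chunk(to_struct_column(&rows)) gives a chunk with one column, while unwrap_struct_into_chunk(to_struct_column(&rows)) gives a chunk with two columns (id and name). Approach 2a would generate the second form directly from a derive macro rather than going through the intermediate struct column.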