feat(rust): BinaryView/Utf8View IPC support #13464
FWIW, this has been decided (and implemented in Arrow C++) for a while: see apache/arrow#38443 (but I see that update wasn't shared on the mailing list thread...). The Arrow spec decided to store the lengths as the last buffer, instead of before the variadic buffers (https://arrow.apache.org/docs/dev/format/CDataInterface.html#binary-view-arrays).
The struct stores the number of buffers (
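To make the C Data Interface layout concrete, here is a minimal Rust sketch. It assumes the layout described in the linked spec: the validity bitmap, the views buffer, the variadic data buffers, and a trailing buffer holding the variadic buffer lengths as `int64`. The `BufferRole` enum and `buffer_roles` function are hypothetical illustrations, not Polars or arrow-rs API. Because `n_buffers` is stored in the struct, a consumer can locate the trailing lengths buffer without knowing the lengths in advance.

```rust
/// Role of each buffer in an ArrowArray for a BinaryView/Utf8View column,
/// assuming the C Data Interface layout: validity, views, variadic data..., lengths.
#[derive(Debug, PartialEq)]
enum BufferRole {
    Validity,
    Views,
    VariadicData(usize), // index of the variadic data buffer
    VariadicLengths,     // trailing buffer of int64 lengths, one per variadic buffer
}

/// Derive the role of every buffer from `n_buffers` alone.
/// Returns None if there are fewer than 3 buffers (validity + views + lengths).
fn buffer_roles(n_buffers: usize) -> Option<Vec<BufferRole>> {
    if n_buffers < 3 {
        return None;
    }
    let n_variadic = n_buffers - 3;
    let mut roles = vec![BufferRole::Validity, BufferRole::Views];
    roles.extend((0..n_variadic).map(BufferRole::VariadicData));
    roles.push(BufferRole::VariadicLengths);
    Some(roles)
}

fn main() {
    // An array with two variadic data buffers has 2 + 2 + 1 = 5 buffers.
    let roles = buffer_roles(5).unwrap();
    println!("{:?}", roles);
}
```

The design choice this illustrates: since the buffer count is known up front, appending the lengths last costs nothing for a C Data Interface consumer.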
Also note that this is for the C Data Interface, not IPC. In the IPC message, my understanding is that there shouldn't be an additional buffer with lengths (the Buffer flatbuffer struct itself contains a length); there is only a [...] Now, I didn't check the diff of the PR in detail to see if that's what you already did.
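A minimal Rust sketch of the IPC point: each entry in a message's buffer list already carries an offset and a length into the message body, so a reader can slice out the views buffer and every variadic data buffer without a separate lengths buffer. `IpcBuffer` only mirrors the shape of the flatbuffer `Buffer` struct (offset and length in bytes); it and `slice_buffers` are hypothetical helpers, not Polars or arrow-rs API, and alignment, padding, and compression are ignored.

```rust
use std::convert::TryFrom;

/// Mirrors the shape of the IPC flatbuffer `Buffer` struct:
/// an offset into the message body plus a length, both in bytes.
#[derive(Debug)]
struct IpcBuffer {
    offset: i64,
    length: i64,
}

/// Slice every buffer out of the message body using only the metadata.
/// Returns None if any offset/length is negative or out of bounds.
fn slice_buffers<'a>(body: &'a [u8], buffers: &[IpcBuffer]) -> Option<Vec<&'a [u8]>> {
    buffers
        .iter()
        .map(|b| -> Option<&'a [u8]> {
            let start = usize::try_from(b.offset).ok()?;
            let len = usize::try_from(b.length).ok()?;
            let end = start.checked_add(len)?;
            body.get(start..end)
        })
        .collect()
}

fn main() {
    // Toy body: a 16-byte views buffer followed by two variadic data buffers.
    let body: Vec<u8> = (0u8..40).collect();
    let meta = [
        IpcBuffer { offset: 0, length: 16 },
        IpcBuffer { offset: 16, length: 8 },
        IpcBuffer { offset: 24, length: 16 },
    ];
    let bufs = slice_buffers(&body, &meta).unwrap();
    for (i, b) in bufs.iter().enumerate() {
        println!("buffer {}: {} bytes", i, b.len());
    }
}
```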
Hey Joris, good to know there has been a decision. The Polars-flavored IPC is only an intermediate step until we know how Apache Arrow will deal with it. I will see if I can implement it in the way you describe (I wasn't aware the flatbuffer had the length per buffer). Is there anywhere we can chat if I run into anything?
There is a Zulip instance where several of the Arrow (C++) developers are active: https://ursalabs.zulipchat.com/ (I think there is also a general Apache Slack, but I don't know if many Arrow developers are on there). I am also on Discord if you want to reach me personally.
Can you give me a ping? I don't know your handle ^^. My handle is ritchie46. PS: it works great with the buffers, thanks for the nudge: #13842
I should have sent a friend request yesterday (but I'm not super familiar with how Discord works ;))
Implements IPC for `BinaryView` and `Utf8View`. The Arrow project hasn't decided yet where to store the buffer lengths (see https://lists.apache.org/thread/5c7lzg9x1t4bbtvd3sxq4pwxtbkw5pxq). For now we store the buffers like this:
I believe the `buffer_lengths` should always come before the `variadic_buffers`. Otherwise we don't know the lengths with which to read them, and if the lengths came last, we wouldn't know at which offset to read the `buffer_lengths`. If the Arrow project comes up with something else, we can always change this. For now we will support this as an opt-in, Polars-flavored IPC.
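The ordering argument can be illustrated with a toy framing in Rust. This is a hypothetical encoding (a buffer count, then one little-endian `u64` length per variadic buffer, then the concatenated buffer bytes), not the actual Polars-flavored IPC layout; `encode`, `decode`, and `read_u64` are illustrative names. Because the lengths precede the data, `decode` splits the buffers in a single forward pass. If the lengths came after the data and the reader did not know the total size up front, it could not tell where the data ends and the lengths begin.

```rust
use std::convert::TryFrom;

/// Hypothetical framing, for illustration only (NOT the real Polars IPC encoding):
/// [n: u64 LE][len_0 .. len_{n-1}: u64 LE][buf_0 bytes][buf_1 bytes]...
fn encode(buffers: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(buffers.len() as u64).to_le_bytes());
    for b in buffers {
        out.extend_from_slice(&(b.len() as u64).to_le_bytes());
    }
    for b in buffers {
        out.extend_from_slice(b);
    }
    out
}

/// Read a little-endian u64 at `*pos` and advance the cursor past it.
fn read_u64(data: &[u8], pos: &mut usize) -> Option<u64> {
    let start = *pos;
    let end = start.checked_add(8)?;
    let slice = data.get(start..end)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(slice);
    *pos = end;
    Some(u64::from_le_bytes(bytes))
}

/// Split the framed bytes back into buffers; None if the input is truncated.
fn decode(data: &[u8]) -> Option<Vec<&[u8]>> {
    let mut pos = 0usize;
    let n = usize::try_from(read_u64(data, &mut pos)?).ok()?;
    // Lengths come first, so every buffer's size is known before reading data.
    let mut lengths = Vec::new();
    for _ in 0..n {
        lengths.push(usize::try_from(read_u64(data, &mut pos)?).ok()?);
    }
    let mut buffers = Vec::new();
    for len in lengths {
        let end = pos.checked_add(len)?;
        buffers.push(data.get(pos..end)?);
        pos = end;
    }
    Some(buffers)
}

fn main() {
    let a: &[u8] = b"hello world, long string";
    let b: &[u8] = b"another variadic buffer";
    let encoded = encode(&[a, b]);
    let decoded = decode(&encoded).unwrap();
    assert_eq!(decoded, vec![a, b]);
    println!("round-tripped {} buffers", decoded.len());
}
```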