Streaming from_record_batch to avoid allocations of Vec<Record> #253
Comments
Hi @patcollis34. As of today, no, there is no way to iterate over the records, but I agree it would make a nice addition to the overall API. I plan to support random access deserialization to fix #250. That change would make it trivial to implement an iterator as well. However, I am a bit swamped at the moment and don't really have the bandwidth to work on major features. If it's urgent, you could implement your own deserializer as a quick workaround (only written in GH and completely untested):
use std::collections::HashMap;
use std::fmt;

use serde::de::{Deserialize, Deserializer, SeqAccess, Visitor};

// Record must implement Deserialize, Hash and Eq to be used as a key here.
#[derive(Default)]
struct RecordMap(HashMap<Record, f32>);

impl<'de> Deserialize<'de> for RecordMap {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let mut res = Self::default();
        // Fill the map in place while the deserializer walks the sequence.
        deserializer.deserialize_seq(&mut res)?;
        Ok(res)
    }
}

impl<'de, 'a> Visitor<'de> for &'a mut RecordMap {
    type Value = ();

    // Required by the Visitor trait; used in error messages.
    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("a sequence of records")
    }

    fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Self::Value, A::Error> {
        // Insert each record as soon as it is decoded, without buffering a Vec<Record>.
        while let Some(record) = seq.next_element::<Record>()? {
            let value = record.value;
            self.0.insert(record, value);
        }
        Ok(())
    }
}
Great, thank you! I'll try this out and will stay tuned for new releases.
I would prefer to keep the issue open as a reminder that this feature is on the todo list :)
Hi, is there a way to get an iterator from the example code here for serde_arrow::from_record_batch? Or is there a recommended way to reduce the allocations of that vector? I am looking to load an Arrow file into a HashMap<Record, f32>, where Record holds some columns from the dataframe and the f32 is a different column from the dataframe. The example code is a very clean, generic way of doing it, but allocating that intermediate vector is a large performance hit and memory intensive. Any help is appreciated. If this isn't the best place to discuss, I can move elsewhere, or read any docs you would recommend to help me solve this on my own.
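To make the goal above concrete, here is a minimal, std-only sketch of building the map by consuming rows one at a time instead of collecting them into a Vec first. It does not use any serde_arrow API. The names `Key`, `Row`, `collect_map`, `symbol` and `day` are hypothetical and exist only for illustration; the row source is assumed to be any iterator that yields one row at a time. The key type holds only the identifying columns, because `f32` implements neither `Hash` nor `Eq` and so cannot be part of a derived hash key.

```rust
use std::collections::HashMap;

/// Hypothetical key type: the columns that identify a row.
/// It derives `Hash` and `Eq`, so it cannot contain the `f32` column.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Key {
    symbol: String,
    day: u32,
}

/// Hypothetical row type: the key columns plus the value column.
#[derive(Debug, Clone)]
struct Row {
    symbol: String,
    day: u32,
    value: f32,
}

/// Consume rows one at a time and insert them straight into the map.
/// No intermediate `Vec<Row>` is built; a later row overwrites an
/// earlier one that has the same key.
fn collect_map<I>(rows: I) -> HashMap<Key, f32>
where
    I: IntoIterator<Item = Row>,
{
    let rows = rows.into_iter();
    // Reserve using the iterator's lower size bound to limit rehashing.
    let mut map = HashMap::with_capacity(rows.size_hint().0);
    for row in rows {
        let key = Key {
            symbol: row.symbol,
            day: row.day,
        };
        map.insert(key, row.value);
    }
    map
}
```

With this shape, peak memory is roughly the map itself plus a single row in flight, rather than the map plus a full vector of rows.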