Efficient storage of integer G3Map and G3Vector objects #140
Use int64_t for in-memory representation of integer values, but store as 8-, 16- or 32-bit integers on disk, depending on the bit depth of the underlying data. Backwards compatible with v1 int32 G3Map objects. Closes #122.
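To illustrate the idea in the description above, here is a minimal sketch of how an on-disk width could be chosen from the range of the data. The helper name `storage_bits` and its signature are assumptions made for this example, not code from the PR, and the sketch includes a 64-bit fallback for values that do not fit in 32 bits.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not the PR's implementation): return the smallest
// of 8, 16, 32 or 64 bits whose signed range holds every value in `data`.
int storage_bits(const std::vector<int64_t> &data)
{
    // Track the extremes; starting at 0 means an empty vector maps to 8 bits.
    int64_t lo = 0, hi = 0;
    for (int64_t v : data) {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }

    if (lo >= std::numeric_limits<int8_t>::min() &&
        hi <= std::numeric_limits<int8_t>::max())
        return 8;
    if (lo >= std::numeric_limits<int16_t>::min() &&
        hi <= std::numeric_limits<int16_t>::max())
        return 16;
    if (lo >= std::numeric_limits<int32_t>::min() &&
        hi <= std::numeric_limits<int32_t>::max())
        return 32;
    return 64;
}
```

A serializer could then write the chosen width as a small header and narrow each value to that type, while a reader widens everything back to int64_t in memory.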
Tagging @mhasself here, in case you have opinions on this implementation, since you wrote the G3VectorInt implementation this is copied from.
Thanks for the tag! Looks good to me. There's a lot of repetitive code now ... but if the alternative is template mazes then that's not always better.
Couple suggestions in line, but don't stop this train for me.
Compute store_bits separately for each vector in G3MapVectorInt
Some minor code comments in line. How much time does bit_count() take? Is there any meaningful run time impact from this change?
The code is a little spaghetti-ish, but I don't see a viable way to avoid that.
I haven't profiled this -- @mhasself do you know how efficient the
For the record, no I do not. But it's probably like, 3 assembly instructions (in the loop), and seems like the sort of thing that would pipeline pretty well (despite the ifs). There might be a faster version where you accumulate the +ve and -ve data into separate masks and combine them at the end. An interesting question for a less busy day.
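A rough sketch of the separate-mask idea mentioned above might look like the following. This is not the PR's bit_count implementation; the function name `bits_needed_masks` is hypothetical, and the code only assumes two's-complement int64_t values. Non-negative values are OR-ed into one mask, the bitwise complements of negative values into another, and the highest set bit of the combined mask (plus one sign bit) gives the required width.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: number of bits (including the sign bit) needed to
// represent every value in `data` as a two's-complement signed integer.
int bits_needed_masks(const std::vector<int64_t> &data)
{
    uint64_t pos = 0;  // OR of all non-negative values
    uint64_t neg = 0;  // OR of the complements of all negative values
    for (int64_t v : data) {
        if (v >= 0)
            pos |= static_cast<uint64_t>(v);
        else
            neg |= ~static_cast<uint64_t>(v);
    }

    // Combine once at the end; the highest set bit bounds the magnitude.
    uint64_t mask = pos | neg;
    int bits = 1;  // the sign bit
    while (mask) {
        ++bits;
        mask >>= 1;
    }
    return bits;
}
```

The result can then be rounded up to the next supported storage width (8, 16, 32 or 64). Whether this is actually faster than the existing loop would need to be measured.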
Against my better schedule management ... here is a (relative) profiling. The bit_count function takes 10 ns / sample on my laptop. Tests done with 1k to 1M elements. It seems to take about 10% more time than this simple accumulation function:
So I think it's close enough to maxed out.
Cool, good to know!
Use int64_t for in-memory representation of integer values, but store as 8-, 16-, 32- or 64-bit integers on disk, depending on the bit depth of the underlying data. Use a consistent ABI for G3Vector and G3Map serialization, and ensure backwards compatibility for v1 int32_t objects on disk. Closes #122.