Reduce the overhead of DataTypes
#1469
base: main
Thanks for the PR. This hits a lot of code so I want to tune in @jorgecarleitao as well.
Could you elaborate a bit on this? What did you benchmark? How was the data type such a huge bottleneck? What is the new data type size and what was the old one? Somewhat related, In polars we use |
Codecov Report

Coverage diff, main vs. #1469:

            main      #1469     +/-
Coverage    83.39%    83.96%    +0.57%
Files       391       387       -4
Lines       43008     41739     -1269
Hits        35867     35048     -819
Misses      7141      6691      -450
Any chance to get this merged? We do a lot of cloning of datatypes, and the memory use adds up extremely quickly. Using |
ec5b7f2 to 40541b4
See jorgecarleitao#1469
Co-authored-by: Clement Rey
Fixes #439
The entire PR pretty much comes down to this diff:
Everything else is just a lot of grunt work and pain to accommodate these new types.
As mentioned in #439 (comment): I went for the path of least resistance, so this isn't optimal, but it is already quite the improvement.
I have branches ready for arrow2_convert, polars and rerun.
In Rerun, we've seen up to 50% reduced memory requirements in some use cases with this PR.
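For readers wondering why cloning data types can cost so much memory, here is a minimal sketch of the general idea. It is not the diff from this PR: `OwnedType`, `SharedType` and the helper functions are hypothetical, simplified stand-ins, and putting nested children behind `Arc` is an assumption about the technique rather than a description of arrow2's actual definitions.

```rust
use std::sync::Arc;

/// Hypothetical type tree whose nested children are uniquely owned.
/// Cloning it deep-copies every nested field, including the field names.
#[derive(Clone, Debug, PartialEq)]
pub enum OwnedType {
    Int32,
    List(Box<OwnedType>),
    Struct(Vec<(String, OwnedType)>),
}

/// Hypothetical type tree whose nested children sit behind reference counts.
/// Cloning it only bumps a counter; the nested fields are shared.
#[derive(Clone, Debug, PartialEq)]
pub enum SharedType {
    Int32,
    List(Arc<SharedType>),
    Struct(Arc<Vec<(String, SharedType)>>),
}

/// Builds an owned struct type with `n` Int32 fields named f0, f1, ...
pub fn wide_owned(n: usize) -> OwnedType {
    OwnedType::Struct(
        (0..n)
            .map(|i| (format!("f{}", i), OwnedType::Int32))
            .collect(),
    )
}

/// Builds a shared struct type with `n` Int32 fields named f0, f1, ...
pub fn wide_shared(n: usize) -> SharedType {
    SharedType::Struct(Arc::new(
        (0..n)
            .map(|i| (format!("f{}", i), SharedType::Int32))
            .collect(),
    ))
}

/// True when two owned struct types point at the same field buffer.
pub fn owned_fields_shared(a: &OwnedType, b: &OwnedType) -> bool {
    match (a, b) {
        (OwnedType::Struct(x), OwnedType::Struct(y)) => std::ptr::eq(x.as_ptr(), y.as_ptr()),
        _ => false,
    }
}

/// True when two shared struct types point at the same allocation.
pub fn shared_fields_shared(a: &SharedType, b: &SharedType) -> bool {
    match (a, b) {
        (SharedType::Struct(x), SharedType::Struct(y)) => Arc::ptr_eq(x, y),
        _ => false,
    }
}
```

With these stand-ins, `wide_owned(8).clone()` allocates a fresh `Vec` plus eight new `String`s, while `wide_shared(8).clone()` reuses the existing allocation. When every array and every chunk carries its own copy of the data type, that difference is multiplied by the number of clones.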