Compressor Optimizer #367
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main #367 +/- ##
============================================
+ Coverage 68.17% 70.42% +2.24%
+ Complexity 1125 1063 -62
============================================
Files 319 305 -14
Lines 12789 12188 -601
Branches 1275 1165 -110
============================================
- Hits 8719 8583 -136
+ Misses 3542 3120 -422
+ Partials 528 485 -43
============================================
*This pull request uses carry forward flags.
So to sum up: this is a non-breaking change. We could compress better (by ~7%) if we re-compressed the full blob each time we attempt to append a block, but doing so is too slow, so you propose to keep the original method, which would give a "preliminary result", and change the result behind the scenes between calls? Wouldn't this have some side effects @jpnovais? (I.e. the compressor could say "we compressed block 1, it takes N bytes, the current blob is now at 100 kB", then recompress it with more context and update its internal state to "the current blob is 98 kB" without notifying the coordinator.) Regarding the implementation, before introducing an async optimizer, I'd prefer to understand the perf constraints better: how long an append takes now, how long it would take if we recompressed the whole blob at each append, and within what limits we need to operate. I.e. if we say the compressor could take as much as XXX ms, then we may just want simpler, cleaner code and kill this async pattern.
My input on this optimization is:
Context:
My take based on the above:
Yep, doing it only when full would use less CPU. But that last call to "CanWrite" may be 10x slower than the previous calls.
It's OK to have a call to CanWrite that takes 500 ms at the end of the blob, as long as the preceding calls are not affected time-wise.
I agree, doing it synchronously at the end is a good idea. In fact it's similar to the "no compress" logic we already have.
LGTM 👍 I would probably just add one or two tests to ensure this is correctly triggered and that the internal state is correctly reset afterwards.
Signed-off-by: Arya Tabaie <[email protected]>
What does "based on how much time the compressor has had to optimize" mean from the coordinator's PoV?
This was for the parallel optimizer, so it no longer applies. Removing from description.
This PR implements issue #366.
The compressor now attempts to recompress everything from scratch upon encountering a full blob, before attempting to bypass compression altogether.
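The behavior described above can be sketched as follows. This is a toy model, not the project's actual Go API: `blobMaker`, `Write`, the `limit`/`estimate` fields, and the size bookkeeping are all invented for illustration. The fast path keeps a cheap running estimate (sum of per-block compressed sizes); only when that estimate would exceed the limit does it pay for one full recompression from scratch, which shares context across blocks and therefore compresses better, before the caller falls back to bypassing compression.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// compress deflates payload in a single stream and returns the result.
func compress(payload []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.BestCompression)
	w.Write(payload)
	w.Close()
	return buf.Bytes()
}

// blobMaker accumulates raw block payloads under a compressed-size limit.
type blobMaker struct {
	limit    int      // max compressed blob size in bytes
	blocks   [][]byte // raw payloads appended so far
	estimate int      // sum of per-block compressed sizes (cheap upper bound)
}

// Write appends block if the blob can still hold it, and reports success.
func (b *blobMaker) Write(block []byte) bool {
	sz := len(compress(block))
	if b.estimate+sz <= b.limit { // fast path: compress only the new block
		b.blocks = append(b.blocks, block)
		b.estimate += sz
		return true
	}
	// Blob looks full: recompress everything from scratch, once.
	full := bytes.Join(append(b.blocks, block), nil)
	if packed := len(compress(full)); packed <= b.limit {
		b.blocks = append(b.blocks, block)
		b.estimate = packed // tighter estimate after full recompression
		return true
	}
	return false // truly full; caller may try the "no compress" fallback
}

func main() {
	bm := &blobMaker{limit: 4096}
	block := bytes.Repeat([]byte("block payload "), 16)
	fmt.Println(bm.Write(block), bm.estimate)
}
```

Because the expensive recompression only runs on the final, "blob looks full" call, the preceding appends stay as fast as before, matching the synchronous end-of-blob approach agreed on in the discussion.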
Checklist