[badger] Badger retains ownership of retrieved value buffers #4847
I think the right way forward with this is to change the datastore interfaces to allow the caller to operate on the data within a closure. This way we could unmarshal our objects directly from the underlying buffer, instead of making a copy that we only use to call unmarshal with. That aside, I'm curious to see what the CPU usage between badger and flatfs would be for similar operations. I'm thinking it will still be much better despite the extra copy.
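A minimal sketch of what such a closure-based read interface could look like; the interface name, method name, and signature here are illustrative assumptions, not the actual go-datastore API:

```go
package main

import "fmt"

// Key stands in for go-datastore's key type in this sketch.
type Key string

// ZeroCopyDatastore is a hypothetical closure-based read interface: the
// implementation hands the caller its internal buffer only for the duration
// of the callback, so no decoupling copy is needed.
type ZeroCopyDatastore interface {
	GetValue(key Key, fn func(value []byte) error) error
}

// mapDatastore is a toy in-memory implementation, only here to show the
// shape of the API; a badger-backed implementation would run fn inside its
// read transaction.
type mapDatastore map[Key][]byte

func (m mapDatastore) GetValue(key Key, fn func(value []byte) error) error {
	v, ok := m[key]
	if !ok {
		return fmt.Errorf("not found: %s", key)
	}
	// fn operates directly on the internal buffer; it must not retain it.
	return fn(v)
}

func main() {
	var ds ZeroCopyDatastore = mapDatastore{"block": []byte("raw block bytes")}
	// Unmarshal (here: just inspect) directly from the underlying buffer,
	// instead of copying it out first only to call unmarshal on the copy.
	_ = ds.GetValue("block", func(val []byte) error {
		fmt.Printf("value has %d bytes\n", len(val))
		return nil
	})
}
```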
As far as I know, the reason they do that is because the data you get is not copied: it is a slice backed by an mmap of a segment of a file. They don't mmap whole files, nor all of them. After the closure is done they can change the mapping to a different segment or file. This also means that they do not allocate slices, they just allocate the slice header, and it is possible that those headers are allocated on the stack (no GC pressure) if everything is done correctly. But I am not 100 percent sure about that.
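To illustrate why such a slice cannot outlive the mapping, here is a minimal sketch using the standard `syscall` package (Unix-only, and unrelated to badger's actual code): the slice is just a view into the mapped region, and once that region is unmapped or remapped the bytes behind it are gone.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("/etc/hostname") // any small readable file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// The returned slice is a view into the mapping: no file bytes are
	// copied into Go-managed memory, only the slice header is allocated.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}

	fmt.Printf("first byte while mapped: %q\n", data[0]) // valid here

	// Once the mapping is released (or remapped to a different segment,
	// which is what a store like badger eventually does with its value-log
	// segments), the slice header still exists but the pages behind it are
	// gone; touching data[0] after this point would fault.
	syscall.Munmap(data)
}
```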
@Kubuxu Thanks for pointing me to With respect to buffer ownership, having slices pointing to the same underlying array as the As you said, the
Yes,
I think @whyrusleeping's suggestion is the right way to approach the problem.
Yes, from what I'm seeing in the
@schomatis Yeah, I've been wanting to do a refactor for a while to make this better: ipfs/go-datastore#64
This is true unless badger closes the underlying file. We should check this.
Yes, I'm raising an issue in Badger to check with them and discuss this further.
@schomatis afaik, their structure doesn't allow them to write to the same table (that is what they call mmap'able segments) if there are readers still open (so valid buffers).
@Kubuxu Could you elaborate a bit more on that please? I'm not sure I'm understanding your last comment.
There are
Writing the issue for Badger I realized the dependency is not so much on the underlying array itself as on the mapped address space of the log file the array points into. Closing the file itself is not a problem, but Badger eventually unmaps that address space, invalidating any references into it, so the buffer can't be held indefinitely by upper layers. @Kubuxu mentioned something of the sort about mapped regions being reused, but I didn't worry as the

This doesn't seem to be a problem when accessing log files through standard I/O (where buffers are created specifically for the read operation), but I understand that performs much worse than memory-mapping, defeating the initial purpose of this issue. I'm closing this issue in favor of @whyrusleeping's suggestion to use closures.
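For reference, later badger releases expose exactly this closure style on reads. A sketch assuming the badger v2+ API (where `DefaultOptions` takes a path and `item.Value` takes a callback); the value slice is only usable inside the callback, i.e. inside the transaction:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-example"))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	err = db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("some-key"))
		if err != nil {
			return err
		}
		// Zero-copy path: the (possibly mmap-backed) buffer is only used
		// inside the callback, i.e. inside the transaction.
		// If the bytes had to outlive the transaction, item.ValueCopy(nil)
		// would make the decoupling copy explicit instead.
		return item.Value(func(val []byte) error {
			fmt.Printf("read %d bytes without copying\n", len(val))
			return nil
		})
	})
	if err != nil && err != badger.ErrKeyNotFound {
		panic(err)
	}
}
```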
Badger's `Value` method, used to obtain the buffer with the content of a `Get` query, indicates that the buffer's content is valid only within the domain of the transaction where the query was performed. This causes IPFS to (correctly) `copy` its value to decouple the `datastore` (which entails the Badger transaction model) from the upper layers.

The result is a severe performance impact (hence I'm assigning this issue a high priority; my assessment could be exaggerated and this might be adjusted later). The CPU time spent on `memmove_amd64.s` is in the same order of magnitude as (and sometimes even bigger than) the `get` operation itself from the `levelHandler`, which should be the core consumer of CPU time at this point in the program. I'll report some concrete `pprof` statistics to quantify this impact.

I need to research this further with Badger's people to understand why they recycle buffer slices instead of relying on Go's GC system (the obvious advantage seems to be avoiding their re-allocation, but there may be a more fundamental reason) and to see if the API could be generalized to permit the user to take ownership of the buffer it receives; the restriction of operating within the domain of the current transaction seems a bit too severe.

Compare this situation with the current default `flatfs` datastore, which just returns the buffer received from `ReadFile` without making a decoupling `copy` call, as this buffer was created specifically for the read operation in `readAll`, so there is no extra performance penalty for its use outside the datastore context (`flatfs` is still slower than `badger` for other reasons regarding its disk write performance).
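To make the contrast concrete, here is a rough, simplified sketch of the two read paths described above; the function names are illustrative assumptions, not the actual go-ds-badger / go-ds-flatfs code:

```go
package main

import (
	"io/ioutil"
)

// badgerGet models the badger-backed Get described above: the transaction
// hands back a buffer that is only valid inside the transaction, so the
// datastore must make a decoupling copy before returning it to upper
// layers. That copy is where the memmove time goes.
func badgerGet(txnValue []byte) []byte {
	out := make([]byte, len(txnValue))
	copy(out, txnValue) // an extra pass over the data on every Get
	return out
}

// flatfsGet models the flatfs-backed Get: ReadFile already allocates a
// fresh buffer for this particular read, so it can be returned as-is with
// no additional copy.
func flatfsGet(path string) ([]byte, error) {
	return ioutil.ReadFile(path)
}

func main() {
	_ = badgerGet([]byte("value backed by badger's value log"))
	_, _ = flatfsGet("/tmp/example-block.data")
}
```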