[badger] Badger retains ownership of retrieved value buffers #4847

Closed
schomatis opened this issue Mar 20, 2018 · 12 comments

@schomatis
Contributor

Badger's Value method, used to obtain the buffer with the content of a Get query, documents that the buffer is valid only within the transaction in which the query was performed. This forces IPFS to (correctly) copy the value in order to decouple the datastore (which encapsulates the Badger transaction model) from the upper layers.
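For reference, this is roughly what the copy looks like on the datastore side (a minimal sketch assuming the badger v1 API of the time, where Item.Value returned the borrowed slice directly; db and key handling are simplified):

```go
package main

import badger "github.com/dgraph-io/badger"

// getCopied sketches what the datastore has to do today: copy the value
// before the transaction ends, because the slice handed out by Item.Value
// is only valid inside that transaction.
func getCopied(db *badger.DB, key []byte) ([]byte, error) {
	var out []byte
	err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(key)
		if err != nil {
			return err
		}
		val, err := item.Value() // borrowed: backed by Badger-owned memory
		if err != nil {
			return err
		}
		out = append([]byte(nil), val...) // decoupling copy for the upper layers
		return nil
	})
	return out, err
}
```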

The result is a severe performance impact (hence I'm assigning this issue a high priority; my assessment could be exaggerated and this might be adjusted later). The CPU time spent in memmove_amd64.s is of the same order of magnitude as (and sometimes even larger than) the get operation itself in the levelHandler, which should be the core consumer of CPU time at this point in the program. I'll report some concrete pprof statistics to quantify this impact.

I need to discuss this further with Badger's people to understand why they recycle buffer slices instead of relying on Go's GC (the obvious advantage seems to be avoiding re-allocation, but there may be a more fundamental reason), and to see whether the API could be generalized to let the user take ownership of the buffer it receives; the restriction of operating only within the current transaction seems a bit too severe.

Compare this situation with the current default flatfs datastore, which just returns the buffer received from ReadFile without making a decoupling copy: that buffer was created specifically for the read operation in readAll, so there is no extra performance penalty for using it outside the datastore context (flatfs is still slower than badger for other reasons related to its disk write performance).

@whyrusleeping
Member

I think the right way forward with this is to change the datastore interfaces to allow the caller to operate on the data within a closure. This way we could unmarshal our objects directly from the underlying buffer, instead of making a copy that we only use to call unmarshal with.
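Something along these lines, as a hypothetical sketch (GetWith, ReadFn and ZeroCopyRead are made-up names, not part of the actual go-datastore interfaces):

```go
package example

import ds "github.com/ipfs/go-datastore"

// ReadFn operates on a value that is only valid for the duration of the call.
type ReadFn func(value []byte) error

// ZeroCopyRead is a hypothetical extension of the datastore interface:
// GetWith lets the caller work directly on the datastore's buffer
// (e.g. Badger's mmap-backed slice) instead of receiving a copy of it.
type ZeroCopyRead interface {
	GetWith(key ds.Key, fn ReadFn) error
}
```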

That aside, I'm curious to see what the CPU usage between badger and flatfs would be for similar operations. I'm thinking it will still be much better despite the extra copy.

@Kubuxu
Member

Kubuxu commented Mar 21, 2018

As far as I know, the reason they do that is that the data you get is not copied: it is a slice backed by an mmap of a segment of a file. They don't mmap whole files, nor all of them. After the closure is done they can change the mapping to a different segment or file.

This also means that they do not allocate the slices themselves, only the slice headers, and those headers can be allocated on the stack (no GC pressure) if everything is done correctly.

But I am not 100% sure about that.
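For illustration, this is the general mechanism (a sketch, not Badger's actual code): unix.Mmap hands back a []byte whose backing array is the file mapping itself, so the contents are never copied onto the Go heap and only the slice header is Go-managed.

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("000000.vlog") // hypothetical value-log file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// The returned slice aliases the mapped file pages; nothing is copied.
	data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(data)

	fmt.Printf("mapped %d bytes without allocating their contents on the Go heap\n", len(data))
}
```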

@schomatis schomatis changed the title [badger] [P1 - High] Badger retains ownership of retrieved value buffers [badger] Badger retains ownership of retrieved value buffers Mar 21, 2018
@schomatis
Contributor Author

@Kubuxu Thanks for pointing me to mmap. What I was seeing as a CPU-intensive memory copy was actually the kernel reading the file and loading those memory pages during the copy operation: the pprof statistics pointed to the VMOVDQU (SI), Y0 instruction (and related ones), which load the source memory, as the most expensive, and pprof has no way of knowing that it is actually the kernel at work (I should have known better). I'm removing the P1 tag. The copy is still an expensive operation (especially the malloc'ing of new buffers), but its real cost is no longer so easy to pin down (if I profile a DB with no mmap, the alloc/copy CPU time is still around a fifth of the file-reading calls, so it's definitely not negligible).

With respect to buffer ownership, having slices pointing to the same underlying array as the mmap of a value log file doesn't seem like enough of a constraint: on the next mmap a new array will be created, while the buffer returned by Get still points to the old (and still valid) one, and Go's GC can take care of deciding when to reclaim it. The mapped regions are read-only (as far as I've seen), so the array won't be modified (except by other consumers of Get, but that's another issue -- it should be documented in Badger's API). The downside I can see is that, for example, a 2 MB mapped region may remain in memory just because someone is holding a 1 KB slice of it, which could cause some memory bloat.

As you said, the mmapped slice is created from an mmap pointer through the unsafe API, but I'm guessing the GC will still keep track of that pointer even when it is converted to a slice.

https://github.com/golang/sys/blob/0deb464c5a1b48110355376f934e733e43b483bf/unix/syscall_unix.go#L77-L91

@whyrusleeping

That aside, I'm curious to see what the CPU usage between badger and flatfs would be for similar operations. I'm thinking it will still be much better despite the extra copy.

Yes, badger is clearly much more efficient than flatfs (which reads all around the file system for the scattered files/blocks). I only mentioned it to illustrate the difference in buffer ownership; despite all these (minor) analyses, badger is definitely the way to go.

@vyzo
Contributor

vyzo commented Mar 21, 2018

I think @whyrusleeping's suggestion is the right way to approach the problem.
Why make a copy if we are only going to unmarshal data out of it? Better to just pass a closure to operate on the buffer directly.

@schomatis
Contributor Author

Yes, from what I'm seeing in the PBNode.Unmarshal function, a new array is created for the extracted data, so the buffers would be isolated (making the previous copy unnecessary), and I'm assuming the whole protobuf family of unmarshalers works the same way. Still, I would like a clear API boundary for the ownership of the buffers being handled here.
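Under that assumption, the closure approach would look roughly like this (a sketch; the import paths and names are illustrative, and it relies on the generated Unmarshal copying the bytes it keeps):

```go
package main

import (
	badger "github.com/dgraph-io/badger"
	pb "github.com/ipfs/go-merkledag/pb"
)

// decodeNode unmarshals straight from the borrowed buffer inside the
// transaction, avoiding the intermediate decoupling copy.
func decodeNode(db *badger.DB, key []byte) (*pb.PBNode, error) {
	node := new(pb.PBNode)
	err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(key)
		if err != nil {
			return err
		}
		val, err := item.Value() // only valid inside this transaction
		if err != nil {
			return err
		}
		return node.Unmarshal(val) // copies the fields it extracts
	})
	return node, err
}
```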

@whyrusleeping
Member

@schomatis Yeah, I've been wanting to do a refactor for a while to make this better: ipfs/go-datastore#64

@Stebalien
Member

With respect to buffer ownership, having slices pointing to the same underlying array as the mmap of a value log file doesn't seem like enough of a constraint

This is true unless badger closes the underlying file. We should check this.

@schomatis
Contributor Author

This is true unless badger closes the underlying file. We should check this.

Yes, I'm raising an issue in Badger to check with them and discuss this further.

@Kubuxu
Member

Kubuxu commented Mar 21, 2018

@schomatis AFAIK, their structure doesn't allow them to write to the same table (that is what they call their mmap'able segments) while there are still open readers (and thus valid buffers).

@schomatis
Contributor Author

@Kubuxu Could you elaborate a bit more on that, please? I'm not sure I understand your last comment.

@Kubuxu
Member

Kubuxu commented Mar 21, 2018

There are Write and Get operations (or whatever they are called in badger).
While a Get operation is working on some table (an internal shard in badger), other Get operations can also use that table, but a Write to that table has to wait for all the Get closures to finish before it is allowed to proceed. This is what I understood.

@schomatis
Contributor Author

While writing the issue for Badger I realized the dependency is not so much on the underlying array itself but on the mapped address space of the log file the array points into. Closing the file is not a problem, but Badger eventually unmaps that address space, leaving invalid references, so the buffer can't be held indefinitely by upper layers. @Kubuxu mentioned something of the sort about mapped regions being reused; I didn't worry because the addr passed to mmap was always NULL, but the regions will always eventually be unmapped (which is the correct behavior).
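Concretely, this is the usual mmap hazard (an illustrative sketch, not Badger code): a sub-slice retained past Munmap still looks like a valid Go slice but points at unmapped pages, and the GC cannot help because that memory was never on the Go heap.

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("000000.vlog") // hypothetical value-log file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}

	value := data[:16] // a "Get" result aliasing the mapping (assumes the file has at least 16 bytes)
	fmt.Println(value[0]) // fine: the region is still mapped

	if err := unix.Munmap(data); err != nil {
		panic(err)
	}
	// `value` still holds a non-nil pointer, but it now refers to an
	// unmapped address: dereferencing it here would crash the process.
	_ = value
}
```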

This doesn't seem to be a problem when accessing log files through standard I/O (where buffers are created specifically for each read operation), but I understand that performs much worse than memory-mapping, defeating the initial purpose of this issue.

I'm closing this issue in favor of @whyrusleeping's suggestion to use closures, as Unmarshal will always force a copy anyway.
