[RFC] Leverage page recycler for blob fetch requests #16841
This is an interesting idea. Two questions come to mind thinking about leveraging
Hi @finnegancarroll, thank you for the questions; I will try my best to answer them. Also, now that I think about it, I should probably change this to an RFC instead of a "Bug", as it might have project-wide impact.
We can have the buffer page cache start with a pool. If the pool is exhausted, there are several options, which we can make configurable for the user:
In either case, the goal is to minimize allocations and maximize reuse of buffers across the JVM process, avoiding malloc and GC on the hot path.
The buffer pool (e.g. PageCacheRecycler in this case) needs to be shared across all use cases.
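To make the "start with a pool, then apply a configurable policy on exhaustion" idea concrete, here is a minimal sketch of a process-wide pool of fixed-size pages. It is not OpenSearch's PageCacheRecycler API: the class name SharedPagePool and the two exhaustion policies (ALLOCATE_UNPOOLED, FAIL) are hypothetical examples, since the original list of options is not shown.

```java
import java.util.ArrayDeque;

// Hypothetical shared pool of fixed-size pages; NOT OpenSearch's PageCacheRecycler API.
final class SharedPagePool {
    // Hypothetical exhaustion policies; the RFC's actual list of options is not shown.
    enum ExhaustionPolicy { ALLOCATE_UNPOOLED, FAIL }

    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int pageSize;
    private final int capacity;
    private final ExhaustionPolicy policy;
    private long unpooledAllocations = 0;

    SharedPagePool(int pageSize, int capacity, ExhaustionPolicy policy) {
        this.pageSize = pageSize;
        this.capacity = capacity;
        this.policy = policy;
        // Pre-allocate all pages up front so steady-state acquires do not allocate.
        for (int i = 0; i < capacity; i++) {
            free.push(new byte[pageSize]);
        }
    }

    // Returns a pooled page, or applies the exhaustion policy when none is free.
    // Pages are not zeroed: callers must overwrite the bytes they read.
    synchronized byte[] acquire() {
        byte[] page = free.poll();
        if (page != null) {
            return page;
        }
        if (policy == ExhaustionPolicy.FAIL) {
            throw new IllegalStateException("page pool exhausted");
        }
        unpooledAllocations++;
        return new byte[pageSize];
    }

    // Returns a page to the pool; pages beyond capacity are dropped and left to the GC.
    synchronized void release(byte[] page) {
        if (page.length != pageSize) {
            throw new IllegalArgumentException("page size mismatch");
        }
        if (free.size() < capacity) {
            free.push(page);
        }
    }

    synchronized int available() {
        return free.size();
    }

    synchronized long unpooledAllocations() {
        return unpooledAllocations;
    }
}
```

A single lock is the simplest way to make one pool shareable across all callers in the JVM; a production pool would likely need per-thread or striped free lists to avoid contention, and this sketch trusts callers not to release the same page twice.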
High Level
This concerns the Java/technical direction for performance improvements.
Since OpenSearch is a core Java project, it is important to always be mindful of GC, especially in core.
The core pattern in the original project favors using PageCacheRecycler as a kind of shared buffer pool across the JVM process, to avoid malloc and GC on the hot path.
Background
I noticed that some of the latest implementations, such as remote storage, generate objects on the fly in the hot path when retrieving chunks of blobs.
For example, there are many requests like this (see reference):
This might not be a big issue at coarse granularity, where the number of requests is not proportional to the size of the data retrieved. In this case, however, the number of block/chunk-level blob fetches can be directly proportional to the ratio fileSize/blockSize. For a small enough blockSize this becomes an anti-pattern, as it can generate a lot of GC on the hot path.
Previously, to avoid unnecessary malloc followed by unnecessary GC on the hot path in scenarios like this, the project used the PageRecycler class.
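The scaling argument is simple arithmetic: a full read issues ceil(fileSize / blockSize) block requests, so any fixed per-request allocation grows by the same ratio. The sketch below computes this; objectsPerRequest is a hypothetical parameter standing in for however many short-lived objects one request creates.

```java
// Illustrative arithmetic only; objectsPerRequest is a hypothetical parameter.
final class BlockFetchMath {
    // ceil(fileSize / blockSize) using integer arithmetic (no floating point).
    static long blockRequests(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // If each request allocates a fixed number of short-lived objects,
    // total allocations for one full read scale with the same ratio.
    static long allocationsForFullRead(long fileSize, long blockSize, int objectsPerRequest) {
        return blockRequests(fileSize, blockSize) * objectsPerRequest;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        for (long blockSize : new long[] {8L << 20, 1L << 20, 4L << 10}) {
            System.out.printf("blockSize=%d -> %d requests%n", blockSize, blockRequests(oneGiB, blockSize));
        }
    }
}
```

For a 1 GiB file, 8 MiB blocks give 128 requests, 1 MiB blocks give 1,024, and 4 KiB blocks give 262,144, so shrinking the block size by 2048x multiplies per-request allocations by the same factor.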
Also note that this Builder pattern is wasteful on the hot path in terms of CPU branching, as it forces at least 5 jumps. An allocator pattern also helps avoid that by enforcing a better practice of resource creation.
Proposed Solution
Leverage the PageRecycler or a similar class to avoid the unnecessary cost of malloc and GC.
For example, instead of:
we will have this:
The latter is more complex, but it provides guarantees and predictability around GC and malloc on the hot path. In other words, it incurs the cost of neither malloc nor GC.
Another benefit specific to this case is avoiding the unnecessary CPU branching present in the current builder pattern: the existing create method with the builder performs at least 5 method jumps.
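A minimal sketch of the recycled-request shape: obtain a request object, reset it in place, use it, and return it. It assumes a single-threaded caller, and all names (SimpleRecycler, FetchRequest, FetchLoop) are hypothetical; this is not the RFC's code or an OpenSearch API.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical single-threaded recycler; not OpenSearch's Recycler API.
final class SimpleRecycler<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    SimpleRecycler(Supplier<T> factory, int prewarm) {
        this.factory = factory;
        // Warm up: create instances before the hot path starts.
        for (int i = 0; i < prewarm; i++) {
            free.push(newInstance());
        }
    }

    private T newInstance() {
        created++;
        return factory.get();
    }

    // Reuses a free instance; only allocates when the free list is empty.
    T obtain() {
        T t = free.poll();
        return t != null ? t : newInstance();
    }

    void recycle(T t) {
        free.push(t);
    }

    int created() {
        return created;
    }
}

// Hypothetical mutable request, reset in place instead of rebuilt per block.
final class FetchRequest {
    String blobName;
    long offset;
    long length;

    FetchRequest reset(String blobName, long offset, long length) {
        this.blobName = blobName;
        this.offset = offset;
        this.length = length;
        return this;
    }
}

final class FetchLoop {
    // Issues one request per block; each request goes back to the recycler after use.
    static long fetchAll(SimpleRecycler<FetchRequest> recycler, String blob, long fileSize, long blockSize) {
        long issued = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            FetchRequest req = recycler.obtain().reset(blob, offset, Math.min(blockSize, fileSize - offset));
            try {
                // A real implementation would hand req to the blob store here.
                issued++;
            } finally {
                recycler.recycle(req);
            }
        }
        return issued;
    }
}
```

After warm-up, a 1 MiB read in 4 KiB blocks issues 256 requests while creating no new FetchRequest instances. This only removes the request-object allocations; other per-call allocations in the fetch path would need the same treatment.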
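The call-count difference can be illustrated with a hypothetical request type that supports both styles. In this sketch the builder path is builder(), three setters, and build() (five calls, two allocations), while the allocator path overwrites an existing instance with one call and no allocation. BlockFetchRequest and its fields are illustrative assumptions, not OpenSearch's actual classes.

```java
// Hypothetical request type; names and fields are illustrative, not OpenSearch classes.
final class BlockFetchRequest {
    String blobName;
    long offset;
    long length;

    // Allocator style: overwrite an existing (e.g. pooled) instance with a single call.
    BlockFetchRequest set(String blobName, long offset, long length) {
        this.blobName = blobName;
        this.offset = offset;
        this.length = length;
        return this;
    }

    static Builder builder() {
        return new Builder();
    }

    // Builder style: each request costs a Builder allocation, a new request,
    // and one call per field plus build().
    static final class Builder {
        private String blobName;
        private long offset;
        private long length;

        Builder blobName(String blobName) { this.blobName = blobName; return this; }
        Builder offset(long offset) { this.offset = offset; return this; }
        Builder length(long length) { this.length = length; return this; }

        BlockFetchRequest build() {
            return new BlockFetchRequest().set(blobName, offset, length);
        }
    }
}
```

HotSpot's JIT inlining and escape analysis may remove some of the builder's cost in practice, so the actual difference on the fetch path is worth confirming with a JMH benchmark rather than assuming.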
Moreover, when we are done with a set of objects, we can release their pages back through an interface of the form:
pageRecycler.release(Object[] obj)
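One way to get the batch-release shape is to track the pages acquired for one multi-block read and return them all in a single call when the read finishes. In the sketch below, BatchPagePool and PageLease are hypothetical, and release takes byte[][] rather than the RFC's Object[] for type safety; wrapping the batch in AutoCloseable lets try-with-resources guarantee the release even if a fetch throws.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical page pool with batch release, mirroring the shape of
// pageRecycler.release(Object[] obj); not an existing OpenSearch API.
final class BatchPagePool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    BatchPagePool(int pageSize, int pageCount) {
        for (int i = 0; i < pageCount; i++) {
            free.push(new byte[pageSize]);
        }
    }

    byte[] acquire() {
        byte[] page = free.poll();
        if (page == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return page;
    }

    // Returns every non-null page in the batch to the pool in one call.
    void release(byte[][] pages) {
        for (byte[] page : pages) {
            if (page != null) {
                free.push(page);
            }
        }
    }

    int available() {
        return free.size();
    }
}

// Groups the pages acquired for one multi-block read so they are released together.
final class PageLease implements AutoCloseable {
    private final BatchPagePool pool;
    private final byte[][] pages;
    private int count = 0;
    private boolean closed = false;

    PageLease(BatchPagePool pool, int maxPages) {
        this.pool = pool;
        this.pages = new byte[maxPages][];
    }

    byte[] next() {
        if (count == pages.length) {
            throw new IllegalStateException("lease full");
        }
        byte[] page = pool.acquire();
        pages[count++] = page;
        return page;
    }

    @Override
    public void close() {
        if (closed) {
            return; // idempotent: never return the same page twice
        }
        closed = true;
        pool.release(pages);
        Arrays.fill(pages, null);
    }
}
```

The lease's own page array is still allocated per read here; in a fully allocation-free path the leases themselves would be pooled as well.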
Expected behavior
The expected behavior is not to see any malloc or GC activity during blockFetchRequest.