Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Leverage page recycler for blob fetch requests #16841

Open
sam-herman opened this issue Dec 12, 2024 · 2 comments
Open

[RFC] Leverage page recycler for blob fetch requests #16841

sam-herman opened this issue Dec 12, 2024 · 2 comments
Labels
bug Something isn't working Storage:Remote untriaged

Comments

@sam-herman
Copy link
Contributor

sam-herman commented Dec 12, 2024

High Level

On the topic of Java/technical direction related to performance improvements.
Since OpenSearch is a core Java project it is important to always be mindful of GC especially in core.
The core pattern that was initially in the original project favors the use of PageCacheRecycler as some sort of a shared buffer pool across the JVM process for the following reasons:

  1. Avoid malloc and GC on the hot path.
  2. Buffer pool is a central place for memory governance.

Background

I noticed some of the latest implementations of remote storage etc.. are generating objects on the fly in the hot path for retrieving chunks of blobs.
For example a lot of requests like this (see reference):

        BlobFetchRequest blobFetchRequest = BlobFetchRequest.builder()
            .blobParts(getBlobParts(blockStart, blockEnd))
            .directory(directory)
            .fileName(blockFileName)
            .build();

This might not be a big issue to do on coarse granularity where we the number of requests is not proportional to the size of data retrieved. However, in this case the number of block/chunk level of blobs fetched can be directly proportional to the ration of fileSize/blockSize. For small enough blockSize this becomes an anti pattern as it might generate a lot of GC on the hot path.
Previously for this type of scenarios to avoid unnecessary malloc followed by unnecessary GC on the hot path, the project was using the PageRecycler class.

Also notice that this Builder pattern is quite wasteful for hot path also in terms of CPU branching as it forces at least 5 jumps. An allocator pattern also helps avoid with that by forcing a better practice of resource creation.

Proposed Solution

Leverage the PageRecycler or similar class to avoid such unnecessary penalty of malloc and GC.
For example instead of:

        final String blockFileName = fileName + "." + blockId;

        final long blockStart = getBlockStart(blockId);
        final long blockEnd = blockStart + getActualBlockSize(blockId);

        BlobFetchRequest blobFetchRequest = BlobFetchRequest.builder()
            .blobParts(getBlobParts(blockStart, blockEnd))
            .directory(directory)
            .fileName(blockFileName)
            .build();

we will have this:

    final char[] blockFileName = PageRecycler.getBufferForString(80);
    final BlobFetchRequestRef ref = PageRecycler.getRequestPlaceholder();
    final long blockStart = getBlockStart(blockId);
    final long blockEnd = blockStart + getActualBlockSize(blockId);
    populateRequest(blockFileName, blockStart, blockEnd, ref);

The latter is more complex but provides guarantees and predictability around GC and malloc on the hot path. Or to put in different words, it will not have a penalty of neither malloc or GC.
Another benefit specific to this case is the avoidance of unnecessary CPU branching as there currently is in the builder pattern. The existing create method with the builder performs at least 5 method jumps.

Moreover, when we are done with a set of objects we can release their pages back with an interface of the form:
pageRecycler.release(Object[] obj)

Expected behavior

The expected behavior is not to see any malloc or GC activity during blockFetchRequest.

@finnegancarroll
Copy link
Contributor

This is an interesting idea. Two questions come to mind thinking about leveraging PageCacheRecycler here.

  1. If our PageCacheRecycler fills up how do handle incoming requests? Would we fall back to regular heap?

  2. How can we determine when an element in our cache can be freed? In the specific case of OnDemandBlockSnapshotIndexInput and BlobFetchRequest this memory appears to stick around for a bit as TransferManager.java holds onto a copy in its internal FileCache.

@sam-herman
Copy link
Contributor Author

sam-herman commented Dec 13, 2024

Hi @finnegancarroll , thank you for the questions, I will try my best to answer those. Also now when I think about it I should probably change this to an RFC instead of "Bug" as it might have project wide impact.

If our PageCacheRecycler fills up how do handle incoming requests? Would we fall back to regular heap?

We can have the buffer page cache starts with a pool. Now if we exhaust the pool there are several options which we can make configurable for the user:

  1. Just fail and block until new buffer is free. Good for users that are more sensitive to resource governance and footprint
  2. Expand the buffer for as much as we need and with lack of activity we can shrink the pool back overtime. Similar approach to the adaptive thread pool executor with starting pool and policies around expanding it.

In either case the result should be the minimization and utilization of buffers across the JVM process to avoid malloc and GC on the hot path.
Note: we might have some areas that GC/malloc occurs in the hot path, but it can't be directly proportional to the size of data (e.g. number of block fetch requests)

How can we determine when an element in our cache can be freed? In the specific case of OnDemandBlockSnapshotIndexInput and BlobFetchRequest this memory appears to stick around for a bit as TransferManager.java holds onto a copy in its internal FileCache.

Buffer pool (e.g. in the form of PageCacheRecycler in this case) need to be shared for all use cases.
We need to release unused pages back to cache when we are done, we can leverage interface in the form of pageRecycler.release(Object[] objs)

@sam-herman sam-herman changed the title [BUG] Leverage page recycler for blob fetch requests [RFC] Leverage page recycler for blob fetch requests Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Storage:Remote untriaged
Projects
Status: 🆕 New
Development

No branches or pull requests

2 participants