
Add CacheMaintainer class to perform pending cache maintenance every minute #2308

Draft · wants to merge 3 commits into main

Conversation

owenhalpert

Description

Adds a CacheMaintainer class that takes in a generic Guava cache and calls cleanUp periodically (every minute). This will perform any pending maintenance (such as evicting expired entries) which was previously only performed when the cache was accessed. The maintenance thread is created whenever a NativeMemoryCache or QuantizationStateCache is instantiated or rebuilt and can be shut down with either class's close method. Relevant logic for cleanup was added to some testing base classes.
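
A minimal sketch of the class described above, assuming a single-threaded scheduler and a start/close pair; apart from cleanUp and close, the method names are illustrative rather than the PR's actual code:

import com.google.common.cache.Cache;

import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically triggers pending maintenance (e.g. evicting expired entries) on a
// Guava cache, instead of relying on cache accesses to do it.
public class CacheMaintainer<K, V> implements Closeable {
    private final Cache<K, V> cache;
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    public CacheMaintainer(Cache<K, V> cache) {
        this.cache = cache;
    }

    // Intended to be called when the owning cache (e.g. NativeMemoryCache or
    // QuantizationStateCache) is instantiated or rebuilt.
    public void startMaintenance() {
        executor.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.MINUTES);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}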

Related Issues

Resolves #2239 (Perform cache maintenance in a separate thread)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kotwanikunal
Member

@owenhalpert The changes look good in general. What I'd be interested in is testing under load on a resource-constrained system - can you verify whether this adds to the latency or impacts performance in any way?

We did implement a force evict before writes with #2015. Can you also enable this feature flag and run the above tests to ensure it behaves well?

 * for more details. Thus, to perform any pending maintenance, the cleanUp method will be
 * called periodically from a CacheMaintainer instance.
 */
public class CacheMaintainer<K, V> implements Closeable {
    private final Cache<K, V> cache;
@kotwanikunal
Member

You can also avoid maintaining the Cache object reference here by using a functional interface. That would also get rid of the generification of this class.

Simply pass and store the runnable reference instead of the cache as new CacheMaintainer(() -> cache.cleanUp());

Possibly also move this class to the util package and call it ScheduledExecutor, with a Runnable reference and an interval as parameters.

public class ScheduledExecutor implements Closeable {

...

  public ScheduledExecutor(Runnable reference, long scheduleMillis) {
  ...
  }

  ...
}
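
A fuller, runnable version of that sketch might look like the following; the single-threaded executor, fixed-rate scheduling, and shutdown-on-close behaviour are assumptions filled in here rather than part of the original suggestion:

import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledExecutor implements Closeable {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Runs the given task repeatedly at the supplied interval,
    // e.g. new ScheduledExecutor(cache::cleanUp, 60_000) for once-a-minute maintenance.
    public ScheduledExecutor(Runnable task, long scheduleMillis) {
        executor.scheduleAtFixedRate(task, scheduleMillis, scheduleMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}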

@owenhalpert
Author

@kotwanikunal what do you think about creating the executor and calling scheduleAtFixedRate within each cache class instead of creating a new ScheduledExecutor class?
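
For comparison, that in-place alternative might look roughly like the hypothetical sketch below, with each cache class owning its own scheduler; the class and field names here are illustrative assumptions, not code from the PR:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cache class such as QuantizationStateCache that schedules
// its own cleanUp calls instead of delegating to a separate CacheMaintainer/ScheduledExecutor.
public class SelfMaintainingCache implements Closeable {
    private final Cache<String, byte[]> cache =
            CacheBuilder.newBuilder().expireAfterWrite(10, TimeUnit.MINUTES).build();
    private final ScheduledExecutorService maintenanceExecutor =
            Executors.newSingleThreadScheduledExecutor();

    public SelfMaintainingCache() {
        // cleanUp performs any pending maintenance, e.g. evicting expired entries.
        maintenanceExecutor.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.MINUTES);
    }

    @Override
    public void close() {
        maintenanceExecutor.shutdown();
    }
}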

…an accept a Runnable and an interval

Signed-off-by: owenhalpert <[email protected]>
…an accept a Runnable and an interval

Signed-off-by: owenhalpert <[email protected]>
@owenhalpert
Author

@kotwanikunal I've completed the benchmarking on a single-node cluster limited to 3GB of memory.

Before benchmarking my code, I validated that the CacheMaintainer was actually running by inspecting the OpenSearch process status in the Docker container: after about a minute, the RssAnon value would decrease by about 0.5GB without any interference on my part, signaling that the maintainer had successfully cleaned up the expired entries. This is aligned with what I saw from the standard maintenance on the clean code (which I triggered manually by accessing the cache).

Results below:

Performance Summary for Resource-Constrained Testing (3GB Memory Limit)

Run 1: Clean 2.18 code

  • Indexing:
    • p50 latency: 16.96 ms
    • p90 latency: 32.47 ms
  • Search:
    • p50 latency: 330.48 ms
    • p90 latency: 437.78 ms

Run 2: Clean 2.18 code, Force evict ON

  • Indexing:
    • p50 latency: 17.27 ms
    • p90 latency: 33.82 ms
  • Search:
    • p50 latency: 337.43 ms
    • p90 latency: 428.70 ms

Run 3: PR changes added, Force evict ON

  • Indexing:
    • p50 latency: 16.89 ms
    • p90 latency: 32.24 ms
  • Search:
    • p50 latency: 341.38 ms
    • p90 latency: 414.06 ms

Run 4: PR changes added, Force evict OFF

  • Indexing:
    • p50 latency: 17.43 ms
    • p90 latency: 34.63 ms
  • Search:
    • p50 latency: 346.83 ms
    • p90 latency: 430.04 ms

This suggests there is no significant impact on latency with my code changes. I've included the full results of these test runs here:

https://gist.github.com/owenhalpert/05ad4f5ae9577f717f2c59f2039d52e4
