Is it possible to set capacity by bytes? #52
Hi, in general I have a slightly different opinion; let me try to explain, though you may know more than I do.

```go
cache, err := otter.MustBuilder[string, string](100 * 1024 * 1024).
	Cost(func(key string, value string) uint32 {
		return uint32(len(key) + len(value))
	}).
	WithTTL(time.Hour).
	Build()
```

P.S. It looks like otter is in urgent need of more detailed guides... Hopefully next week I'll finish a couple more sections in https://maypok86.github.io/otter/, but for now there's only a more detailed description of the …
Wow, this looks great. Thanks! I previously thought the cost function was used for specifying the cost of populating a given cache entry so that entries with higher costs would be less likely to be evicted.
I missed that. I scanned the paper several times, but I didn't find the reason for setting the ghost queue length to the same as the main queue's. I did find the author saying that VMware found 50% of the main queue was enough for the ghost. Furthermore, the paper seems to treat each entry as the same size intentionally, because the scalability brought by ring buffers was a key motivation of the paper and the author didn't know about BP-Wrapper back then.
Thank you for providing the references. There is no doubt that entries are usually small, but turning this fact into an estimated capacity as a number of entries is still hard. The second paper you provided also mentions that the object size distribution is not static over time.
No, it's specifically about specifying capacity in bytes; usually cost (or weight, in caffeine's case) only makes sense for specifying capacity in bytes or something similar. This is largely why otter doesn't allow you to specify the cost of entries directly.
It was just determined experimentally, and since they tested on large traces it's explainable: you just don't need fingerprints for one-hit wonders in a small queue, because they won't give you much of an advantage and will waste memory.
It very much depends on the workload. For example, I found that on the S3 (search) and DS1 (database page cache) traces, limiting the ghost queue to the number of items in the main queue does not work well. Because of this, otter limits the size of the ghost queue to the number of items in the small and main queues combined, which helps smooth out problems on unfriendly traces.
Yes, unfortunately the original paper glosses over a large number of problems, especially around ring buffers. Most of these problems can be solved, but doing so leads to a very strong complication (it is very unpleasant to add features to such an architecture in principle), and after working through them I am not at all sure that the result would be simpler or faster than W-TinyLFU + BP-Wrapper (an architecture in which it is already easy to add features and optimize). It will be very interesting to see if someone can implement it with the same number of features.
That's why otter allows you to specify the capacity in bytes. But usually a rough understanding of the size of entries is more than enough, and from this you can calculate the required number of entries in the cache; that is each user's choice.
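For example, a back-of-the-envelope conversion with purely illustrative numbers:

```go
const (
	budgetBytes  = 100 << 20 // 100 MiB cache budget
	avgEntrySize = 1 << 10   // ~1 KiB per key+value, a rough estimate
)

// ≈ 102,400 entries; good enough when entry sizes don't vary wildly.
const capacityEntries = budgetBytes / avgEntrySize
```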
I've written some notes here about cost-based eviction.
The notes are great! Should we also change the examples in the README? I would like to help. Also, would it be better to specify the cost as …
I actually started writing the documentation because otter's features were starting to exceed the README format, and in the documentation I was able to describe the …
Brilliant! What synchronization problem do they have? I thought in-place string mutations are usually disallowed.
Just a random thought: would it be a good idea to replace the term Capacity with Budget? I think the latter is much more connected to the term Cost.
It's not about strings; it's about the bp-wrapper technique used to achieve excellent scalability, the main idea of which is eventual consistency when interacting with the eviction policy. Because of it, special effects like this can occur during concurrent execution:
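A hedged reconstruction of one such interleaving, using the helper names from the snippets later in this thread (they are assumptions about otter's internals):

```go
//  1. goroutine A: n1 := newNode(k, v1); c.hashmap.Set(n1)  // k -> n1
//     A is descheduled before it can enqueue its AddTask.
//  2. goroutine B: evicted := c.hashmap.Set(n2)             // k -> n2, evicted == n1
//  3. goroutine B: c.writeBuffer.Insert(node.NewUpdateTask(n2, n1))
//  4. goroutine A: c.writeBuffer.Insert(node.NewAddTask(n1))
//
// The policy worker replays the update before the add: the update has
// nothing to update, the add inserts the already-replaced n1, and when
// the policy later evicts n1 it deletes key k from the hashmap, taking
// the live n2 with it.
```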
Eventually the cache will have no entry with key equal to k. This problem exists in both ristretto and ccache; it is not in theine, but theine has another, even worse problem: when updating a node with a cost change, theine updates the cost in place with an atomic (link), because of which something similar can happen, and it leads to terrible consequences for the whole cache.
And all these caches (including otter) still have problems with mixed-up sequences of insert and update events when inserting into the write buffer, but it's very hard to trigger that even on purpose, and the consequences, though unpleasant, are correctable at runtime.
Looks like it. But with a non-cost cache it can throw off users. Most likely the ideal solution would be to split it into two separate functions (…
This was a really frustrating problem when I first tried to implement variably sized entries with an advanced eviction policy. I wrote briefly about it in the old CLHM googlecode project's design doc and changelog. I tried to replace LRU with LIRS but ran into races and didn't realize it at first (so I attempted rewrites a few times, trying to comb through the code to figure out my mistakes). LIRS being so complicated, and the resulting race, made me uncomfortable maintaining such a beast, as I couldn't trust code that I didn't understand. That prompted writing a simulator to have a simpler reference to debug from, a brute-force test suite that generates millions of scenarios with deep integrity checks, and avoiding algorithms whose workings I'd forget, making them a struggle to maintain.

Anyway, my solution to that race may not have been ideal, but it was straightforward. I maintain two variable size fields, one from the writer's perspective and the other from the policy's. I use the replay logic to apply the updates as a delta, so they eventually become consistent and the eviction lists are correctly sized. The races resolve themselves with very little extra care needed. I forget the alternatives, which likely required additional synchronization instead of the metadata overhead. fwiw, …
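If it helps, a minimal Go sketch of that two-field delta idea (the names are my assumptions; Caffeine's actual implementation is Java and differs):

```go
type node struct {
	weight       int64 // the writer's perspective, set when the entry is written
	policyWeight int64 // the policy's perspective, updated only during replay
}

type policy struct {
	totalWeight int64
}

// replayUpdate applies the weight change as a delta rather than an
// absolute assignment, so reordered update tasks still converge to a
// correctly sized eviction list.
func (p *policy) replayUpdate(n *node) {
	delta := n.weight - n.policyWeight
	p.totalWeight += delta
	n.policyWeight = n.weight
}
```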
In that case, you might consider moving the TTL methods outside of the core api so they can be optional. There are various ad hoc user needs that only make sense for particular configurations but burden the core api. Instead I made …
Yeah, when otter did the same thing, I didn't realize what was going on for days and was tearing my hair out. 😅 Thankfully RCU fixes this problem out of the box.
Wow, I've been using something very similar for a while (link). I'll probably write about the rest later, especially since I've already accumulated a lot of questions (some of them quite tricky) and answers. I apologize for not replying to the message in the other issue; I was not ready to participate in the discussion then. Now I've collected some wants, polled a lot of people, and even implemented something I never wrote before …

P.S. Hopefully I'll still be able to formulate my thoughts at least somehow tomorrow, because so far I'm getting something incoherent.
haha, looks very similar. Your node's …

Your simulator looks nicely structured and similar to mine. I parse the traces as a buffered stream that is collected into batches, dispatched to the policy's queue, and consumed one-by-one by the policy implementor. This let me parallelize it like the actor model's mailbox and thread, or a goroutine + buffered channel, letting the blocking queue throttle the producer on large traces. Each eviction policy then became simple single-threaded logic, and I avoided too much code reuse so that over-abstraction wouldn't make the policies harder to follow or less self-contained. A nice structure like you have makes it easy to extend in whatever direction your interests pull you.

I tried …

I started with fixed expiration and added variable expiration much later, since I originally did not know how to make it amortized O(1). I hadn't wanted to force size eviction to discard dead entries, and users could easily do that themselves as a workaround. If starting anew, I'd emulate fixed expiration using the timer wheel instead of having two algorithmic approaches. The time/memory cost is very low, so it doesn't seem worth the effort now and could be added later as an optimization if it proved useful. Just know it wasn't an intentional concern against timing wheels; they just were not as well known until recently. I believe Caffeine was the first to use them for cache expiration, but it wasn't novel, as they are used for other timer events like network timeouts. I chose a hierarchical version as a better fit for a local cache because users often want prompt notification for their eviction listener, e.g. to close a websocket or run other business logic. The other variations could also make sense; it depends on the requirements. The tricky and fun part was making it cheap bitwise. I had no good references, so I hope mine is sane, but I'm curious whether others would do it differently (a toy sketch of the bucket math follows this message).

Per your question, that linear time complexity has not yet been a problem because (a) the reordering is extremely cheap, (b) on-heap local caches are not often extremely large, (c) activity means it's unlikely to do a full scan, and (d) bp-wrapper allows the maintenance to be async, so it doesn't impact user latencies. It could have a threshold limit; e.g. in the hill climber I amortized the region resizing over multiple maintenance runs. I have not yet had a user report or benchmark indicate that this is needed, but it is a very simple change once deemed appropriate. The fix would be to not advance the timer wheel's current time if it exited early and to hint that a new maintenance run should be scheduled; then over the next few runs it would fully catch up and advance. Caffeine has rescheduleCleanUpIfIncomplete to more aggressively deliver those prompt removal notifications.

The codegen looks nice! Mine feels ugly as well, but since it is not user facing and runs at build time, that seemed okay. It's worked pretty well, and I like the memory savings, so users get the features at low cost.
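A toy sketch of that hierarchical-wheel bucket math, assuming 64 buckets per level and power-of-two spans (illustrative constants, not Caffeine's actual layout):

```go
var spans = [3]uint64{
	64,           // level 0: 64 ticks, 1 tick per bucket
	64 * 64,      // level 1: each bucket spans 64 ticks
	64 * 64 * 64, // level 2: each bucket spans 64*64 ticks
}

// bucket finds the wheel level and slot for a delay; with power-of-two
// spans this is just a shift and a mask, which is what keeps it cheap.
func bucket(delay uint64) (level, index uint64) {
	for lvl := uint64(0); lvl < uint64(len(spans)); lvl++ {
		if delay < spans[lvl] {
			return lvl, (delay >> (6 * lvl)) & 63
		}
	}
	return uint64(len(spans) - 1), 63 // clamp far-future timers to the top wheel
}
```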
Hi @maypok86, thank you for the detailed explanation! If the problem is out-of-order insert/update/delete tasks for the same key, how about waiting a little until the previous task is queued? For example, using an empty channel on every node:

```go
evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	<-evicted.queued
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}
close(n.queued)
```
Unfortunately, there are a few problems here: …

As a result, many good nodes can be evicted just to insert a dead node. I haven't looked at how it's done in caffeine yet; so far it looks like a set of states a node can be in and a set of transitions between them, i.e. something like a state machine. It would be very cool to implement this with a single …
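A hypothetical single-atomic-field version of such a state machine, using the alive/retired/dead states mentioned later in this thread (names assumed; this is not otter's or caffeine's actual code):

```go
import "sync/atomic"

// A node starts alive, is retired once the writer removes or replaces
// it in the hashmap, and becomes dead once the eviction policy has
// forgotten it too.
const (
	alive int32 = iota
	retired
	dead
)

type node struct {
	state atomic.Int32 // zero value is alive
}

// tryRetire is the writer-side transition; CAS ensures only one
// concurrent writer wins the alive -> retired edge.
func (n *node) tryRetire() bool { return n.state.CompareAndSwap(alive, retired) }

// die is the policy-side transition once the node has left all structures.
func (n *node) die() { n.state.Store(dead) }

func (n *node) isAlive() bool { return n.state.Load() == alive }
```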
Maybe a busy loop waiting for an atomic uint32 to toggle is acceptable here.
I feel like this is somehow unavoidable, but could it perhaps be relieved by pre-aggregating each batch from the queue?
Yes and no. When I was bored in traffic, I came up with something like this. …

And with set we do something like this:

```go
evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.Die() // evicted.state = dead
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}
```

and in the worker just do this:

```go
if n.IsDead() {
	continue
}
```

That is, let's not try to fix the insertion and update problems, but simply filter events on the side of the worker that updates the eviction policy. This seems to fix the problems when an update and an insert are swapped, and it also doesn't corrupt the eviction policy on consecutive insert and update events.
Yes, that looks like a better idea than what I just came up with:

```go
n.mu.Lock()
// ...
evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.mu.Lock()
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
	evicted.mu.Unlock()
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}
n.mu.Unlock()
```
In fact, if we increase the number of transitions, we could even try using …
Using … Using …
Ah, damn, it seems we can't, because the scheduler will interrupt the goroutine in the middle of a get operation, and after the gc cycle we will get access to freed memory. Debugging this would be very unpleasant. Though we should find out what happens to elements that still have pointers to them; they're unlikely to survive.
Yeah, that's what I mean.
Yes, gc will still clear the memory, too bad. :(
Why? I mean, shouldn't we clear all references to a node before putting it into the pool? Nevertheless, I found another downside of sync.Pool that might make it useless to otter: one main benefit of the pool is its thread-local cache. If we get a node from many different goroutines but always put it back from the goroutine which handles the task queue, then that benefit is probably gone.
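A sketch of that usage pattern (names assumed, not otter's code):

```go
import "sync"

type node struct{ key, value string }

type task struct {
	node     *node
	isDelete bool
}

type cache struct{ tasks chan task }

// sync.Pool's speed comes from its per-P (per-thread) free lists. Here
// Gets happen on many goroutines but Puts only on the single policy
// worker, so the free lists near the Get callers are rarely refilled,
// which is the downside described above.
var nodePool = sync.Pool{New: func() any { return new(node) }}

func newNode(k, v string) *node {
	n := nodePool.Get().(*node) // called from many goroutines
	n.key, n.value = k, v
	return n
}

func (c *cache) worker() {
	for t := range c.tasks {
		if t.isDelete {
			t.node.key, t.node.value = "", "" // drop references before pooling
			nodePool.Put(t.node)              // always from this one goroutine
		}
	}
}
```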
Hi @maypok86, I think all questions in this thread are answered. Thank you for the very detailed explanation! Before I close this issue, I would like to ask one more question: whether the idea of …
Hi @rueian, I've been sick, but something like this was planned: …

Items 6-9 may change, but the rest should not. I'm most interested in the code generation of nodes and the implementation of dynamic buffers, but I'm not sure they will be very useful to users, so I often add features to the roadmap with high priority when someone comes with an issue. So far the only exception is the loading cache, but there is a pretty big problem with implementing it correctly and quickly.
Hi @maypok86, hope you are doing well! I have been trying to work out how to correctly process out-of-order policy events in recent days. It feels to me that the …

The following procedures are how I think out-of-order events can be processed without corrupting a policy:

When inserting a node into a policy: …

When deleting a node from a policy: …

Therefore, besides an …
I use alive, retired, dead node states. Does that work for you?
@rueian @ben-manes …
I think it was a simpler solution for when a node is evicted prior to processing an explicit delete. There wasn't a difference in metadata cost between 2 and 3 states, so making it easier to reason about was the main benefit. In the evict-then-delete case, the queued removal task should no-op instead of decrementing the total size twice. We could instead check whether the entry was in the policy to infer that it had been evicted and no-op. Unfortunately, that doesn't work because the delete can be reordered before its insertion, so now the insertion needs to not increment the total if the entry is not alive. That gets very confusing when an update also changes the weight and is reordered; now the weight is changing, and the update task must also avoid incorrectly changing the total size. Given all the potential scenarios, it probably seemed less confusing to just introduce an extra state for coercing the total size.
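A hypothetical replay loop illustrating those rules (assumed names; neither otter's nor Caffeine's actual code): any task whose node is no longer alive must leave the total size untouched, otherwise reordered delete/insert pairs corrupt the counter.

```go
type taskKind int

const (
	addTask taskKind = iota
	deleteTask
)

type node struct {
	cost  int64
	alive bool // simplified stand-in for the atomic state field discussed here
}

type task struct {
	kind taskKind
	node *node
}

type policy struct {
	totalCost int64
	inPolicy  map[*node]struct{} // stand-in for the real eviction queues
}

func (p *policy) replay(tasks []task) {
	for _, t := range tasks {
		switch t.kind {
		case addTask:
			if t.node.alive { // a reordered delete already retired it: no-op
				p.inPolicy[t.node] = struct{}{}
				p.totalCost += t.node.cost
			}
		case deleteTask:
			if _, ok := p.inPolicy[t.node]; ok { // already evicted: no-op
				delete(p.inPolicy, t.node)
				p.totalCost -= t.node.cost
			}
		}
	}
}
```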
Oh, I wonder... I think I got it. It seems that without noticing it myself, I actually introduced a third state by creating nodes with … I may have to leave it as it is, since I don't really understand how to make atomic transitions between more than two states correctly without introducing locks (in nodes or arrays), and it may become even harder to understand.
When I first implemented this it was on LRU, so there was no queue type. I don't have a preference, as I think both are relatively clear.
Yes, it is confusing, so I made a checklist for how a policy should insert and delete a node. I hope it can help clarify all the possibilities: …
Yes, that is what I am concerned about. As @ben-manes pointed out, we should no-op if the deletion comes before its insertion. Given the previously suggested implementation:

```go
evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.Die() // evicted.state = dead
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
}
// ...
```

The …
I think we all agree some type of 3rd state is simplest. I was replying under the assumption that only two states were used and the state was somehow inferred from the policy's data structures. I think using the …
Hi @maypok86 and @ben-manes, thank you both for your detailed discussion on the above topics. It looks like the order issue has been fixed in #59, so I think this issue can be closed now.
Hi @maypok86, I am not sure if you are still considering implementing the …

What I was trying to do is basically the same as I previously mentioned in #33 (comment): the idea is first to insert a pinned entry into the hashmap as a placeholder, and later update it with a second … However, there is a check (lines 280 to 283 at 9d61c67) that can skip the update, so it is possible that a pinned entry will be left forever.

I guess my only choice is to delete the check, and I think you may hit a similar situation if you are going to implement the …
Actually, I wanted to release v2 with a fixed api first, then add … But there were also some problems. I am very confused about what CockroachDB's Swiss Map is going to do: if it becomes the new standard map, it will break a lot of libraries and systems (many people import the map structure from the go runtime), and otter as well (maphash is used). And the authors of Go refuse to add a hash function for comparable types to the standard library (but they have added a million functions to the standard library that are not particularly useful 😵‍💫). As a result, all that remains is to ask the user to write the hash function on their own... but I doubt that most people will write a good hash function themselves, and it's unpleasant to do every time. The only reassuring thing is that if this happens, a lot of things will break. And there is also a potential opportunity to import …

It is also possible that having the user write a hash function is not so bad. Java seems to use a similar approach, but no one does this in Go, and most likely it would scare off users.
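For what it's worth, for string keys a user-supplied hash function could be as small as this standard-library sketch (my illustration, not an otter API):

```go
import "hash/maphash"

// A process-local seed; maphash.String gives a well-distributed,
// collision-resistant hash without the user designing one themselves.
var seed = maphash.MakeSeed()

func hashString(key string) uint64 {
	return maphash.String(seed, key)
}
```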
@rueian I think it will really be easier for you to delete this part, but I sincerely hope to write a …
I assumed something like this api:

```go
type basicCache[K comparable, V any] interface {
	Set(key K, value V)
	Range(f func(key K, value V) bool)
	// ...
}

type Cache[K comparable, V any] interface {
	basicCache[K, V]
	Has(key K) bool
	Get(key K) (V, bool)
	BulkGet(keys []K) map[K]V
}

type LoadingCache[K comparable, V any] interface {
	basicCache[K, V]
	Get(ctx context.Context, key K) (V, error)
	// An incredibly convenient feature, but it will be very difficult to implement it well.
	// Perhaps spawning goroutines or a goroutine pool is suitable.
	BulkGet(ctx context.Context, keys []K) (map[K]V, error)
}

type Builder[K comparable, V any] struct {
	// ...
}

func (b *Builder[K, V]) ConstTTL(ttl time.Duration) *Builder[K, V] {
	// ...
}

// How to provide a function for a variable ttl is not very clear...
func (b *Builder[K, V]) VarTTL(calcTTL func(key K, value V) time.Duration) *Builder[K, V] {
}

// It seems that several different build methods are much easier to implement and quite understandable to the user.
func (b *Builder[K, V]) Build() (Cache[K, V], error) {
	if b.err != nil {
		return nil, b.err
	}
	return newCache(b), nil
}

// It would probably be better to take an interface as the parameter, but in Go you cannot create an anonymous struct with methods.
func (b *Builder[K, V]) BuildLoading(loader func(ctx context.Context, key K) (V, error)) (LoadingCache[K, V], error) {
	if b.err != nil {
		return nil, b.err
	}
	return newLoadingCache(b), nil
}
```
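For illustration, hypothetical usage of that draft (nothing here is a released otter API; User, fetchUser, and ctx are assumed):

```go
b := &Builder[string, User]{}
cache, err := b.ConstTTL(time.Hour).BuildLoading(
	func(ctx context.Context, key string) (User, error) {
		return fetchUser(ctx, key) // e.g. a database lookup on cache miss
	},
)
if err != nil {
	log.Fatal(err)
}
user, err := cache.Get(ctx, "user:42") // loads via the loader on miss
```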
According to their meeting notes, it is likely to happen.
Automatic BulkGet by using the same loader function is indeed incredibly convenient, but I guess some users would like to implement a bulk loader by themselves. |
Yes, I hope they will come up with something so as not to break so many applications.
Yeah, that's why I want to have some kind of universal solution for this. @rueian I will try to collect ideas on the new api in this issue; it would be very cool to hear your opinion. There we will try to take into account …
Hi @maypok86,

First of all, thank you for making this amazing library. I didn't know about S3FIFO until now; I learned a lot from your work.

So this issue is not a feature request, because I guess setting capacity by bytes is almost impossible based on my understanding of S3FIFO, which requires the maximum entry count to be set beforehand.

However, setting the maximum entry count up front is hard when caching data of different sizes, which I believe covers most cases. Therefore I think it would be better for users to set the capacity in bytes.

I just want to know if you have any ideas about this topic or alternatives. Thanks!