Crossbeam in elfmalloc #12
Actually, I was writing about the same topic :) Thank you for writing this up!
@stjepang Thanks so much for this comment! It's great to know that this stuff is being actively developed. @ezrosent will probably have his own comments in addition to mine, but here are mine: The way I see it, crossbeam is part of a larger class of systems that use a common pattern: there's a global singleton, and then there are thread-local caches for performance. These caches are not strictly necessary for correctness; they only improve performance. These thread-local caches are also singletons. Allocators also fall into this class. There are a couple of issues that this pattern introduces:
To solve this issue, I've been envisioning the following pattern:
Other patterns can easily be implemented in terms of this one:
The way I'm envisioning it, the pattern should be so cookie-cutter that you might in theory be able to write a macro that takes one of these handle-based implementations and generates the corresponding global singleton implementation. To be concrete, for epoch GC, this might look something like:
```rust
// global state for a GC
struct GCGlobal<A: Alloc> { ... }

pub struct GC<A: Alloc> { ... }

impl GC<Heap> {
    fn new() -> GC<Heap> { ... }
}

impl<A: Alloc> GC<A> {
    fn with_alloc(a: A) -> GC<A> { ... }
    fn new_owned<T>(&mut self, t: T) -> Owned<T> { ... }
    fn new_owned_global<T>(&self, t: T) -> Owned<T> { ... }
    // etc
}

impl<A: Alloc> Clone for GC<A> {
    fn clone(&self) -> GC<A> { ... }
}
```
and then, for the global singleton:
```rust
use std::cell::RefCell;
use lazy_static::lazy_static;

lazy_static! { static ref GLOBAL_HANDLE: GC<Heap> = GC::new(); }

// `new_owned` takes `&mut self`, so the thread-local handle needs a `RefCell`.
thread_local! { static TLS_HANDLE: RefCell<GC<Heap>> = RefCell::new(GLOBAL_HANDLE.clone()); }

impl<T> Owned<T> {
    fn new(t: T) -> Owned<T> {
        // Keep `t` in an `Option` so whichever branch runs can take it.
        let mut t = Some(t);
        match TLS_HANDLE.try_with(|handle| handle.borrow_mut().new_owned(t.take().unwrap())) {
            Ok(o) => o,
            // The thread-local handle is gone (e.g., during thread teardown).
            Err(_) => GLOBAL_HANDLE.new_owned_global(t.take().unwrap()),
        }
    }
}
```
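A minimal, std-only sketch of the same pattern, assuming a hypothetical handle type `Log` (a buffered logger) in place of `GC`, and swapping `std::sync::OnceLock` in for `lazy_static!`. The hypothetical `global_handle!` macro shows how the global singleton, the thread-local clone, and the `try_with` fallback could be generated mechanically from any type that provides `new()` and `Clone`:

```rust
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical handle type: a log buffer. Each handle batches lines in its
/// own local buffer and moves them into the shared buffer on `flush` or drop.
pub struct Log {
    shared: Arc<Mutex<Vec<String>>>,
    local: Mutex<Vec<String>>,
}

impl Log {
    /// Creates fresh global state (the role `GC::new()` plays above).
    pub fn new() -> Log {
        Log { shared: Arc::new(Mutex::new(Vec::new())), local: Mutex::new(Vec::new()) }
    }
    pub fn log(&self, line: &str) {
        self.local.lock().unwrap().push(line.to_string());
    }
    pub fn flush(&self) {
        let mut local = self.local.lock().unwrap();
        self.shared.lock().unwrap().append(&mut local);
    }
    /// Lines that have reached the shared buffer so far.
    pub fn flushed(&self) -> Vec<String> {
        self.shared.lock().unwrap().clone()
    }
}

impl Clone for Log {
    // A clone shares the global state but starts with an empty local buffer.
    fn clone(&self) -> Log {
        Log { shared: Arc::clone(&self.shared), local: Mutex::new(Vec::new()) }
    }
}

impl Drop for Log {
    fn drop(&mut self) {
        self.flush();
    }
}

/// Generates a lazily initialized global handle, a thread-local clone of it,
/// and an accessor that falls back to the global handle once the
/// thread-local one has been destroyed.
macro_rules! global_handle {
    ($global:ident, $tls:ident, $with:ident, $ty:ty) => {
        static $global: OnceLock<$ty> = OnceLock::new();

        thread_local! {
            static $tls: $ty = $global.get_or_init(<$ty>::new).clone();
        }

        fn $with<R>(f: impl FnOnce(&$ty) -> R) -> R {
            let mut f = Some(f);
            match $tls.try_with(|h| (f.take().unwrap())(h)) {
                Ok(r) => r,
                // TLS already torn down (e.g. inside another TLS destructor).
                Err(_) => (f.take().unwrap())($global.get_or_init(<$ty>::new)),
            }
        }
    };
}

global_handle!(GLOBAL_LOG, TLS_LOG, with_log, Log);
```

The thread-local clone only adds a cache; correctness still rests on the shared state behind the `Arc`, which is why the fallback can simply use the global handle directly.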
Also, a quick follow-up to that: with respect to your question about avoiding allocations for no-std, you don't actually need to avoid allocations; you just need to ensure that the user can provide an allocator. Even in elfmalloc, we have a simple "bootstrapping" allocator (bsalloc) that we use for internal allocations, and we'd be able to use it with crossbeam if needed.
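To illustrate letting the user supply an allocator, here is a sketch of a container that takes its allocator as a parameter through the standard `GlobalAlloc` trait (not the unstable `Alloc` trait used above). `Bump` and `Bag` are hypothetical names; `Bump` is only a stand-in for a bootstrapping allocator like bsalloc, not its actual design:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bootstrapping allocator (a stand-in, not bsalloc): it
/// bump-allocates out of a fixed arena and never reuses memory.
pub struct Bump<const N: usize> {
    arena: UnsafeCell<[u8; N]>,
    used: AtomicUsize,
}

// SAFETY: each successful `alloc` reserves a disjoint byte range with a CAS.
unsafe impl<const N: usize> Sync for Bump<N> {}

impl<const N: usize> Bump<N> {
    pub const fn new() -> Self {
        Bump { arena: UnsafeCell::new([0; N]), used: AtomicUsize::new(0) }
    }
}

unsafe impl<const N: usize> GlobalAlloc for Bump<N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as *mut u8;
        let mut used = self.used.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let addr = base as usize + used;
            let offset = ((addr + layout.align() - 1) & !(layout.align() - 1)) - base as usize;
            let end = offset + layout.size();
            if end > N {
                return ptr::null_mut(); // arena exhausted
            }
            match self.used.compare_exchange_weak(used, end, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return unsafe { base.add(offset) },
                Err(current) => used = current,
            }
        }
    }

    // Bump allocators never free individual blocks.
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

/// Hypothetical allocator-parametric bag: every node comes from `alloc`,
/// so callers can pass `System`, a bootstrapping allocator, and so on.
pub struct Bag<'a, T, A: GlobalAlloc> {
    head: *mut Node<T>,
    alloc: &'a A,
}

impl<'a, T, A: GlobalAlloc> Bag<'a, T, A> {
    pub fn with_alloc(alloc: &'a A) -> Self {
        Bag { head: ptr::null_mut(), alloc }
    }

    pub fn push(&mut self, value: T) {
        let node = unsafe { self.alloc.alloc(Layout::new::<Node<T>>()) } as *mut Node<T>;
        assert!(!node.is_null(), "allocator out of memory");
        unsafe { node.write(Node { value, next: self.head }) };
        self.head = node;
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.head.is_null() {
            return None;
        }
        unsafe {
            let node = self.head.read();
            self.alloc.dealloc(self.head as *mut u8, Layout::new::<Node<T>>());
            self.head = node.next;
            Some(node.value)
        }
    }
}

impl<T, A: GlobalAlloc> Drop for Bag<'_, T, A> {
    fn drop(&mut self) {
        while self.pop().is_some() {}
    }
}
```

Borrowing the allocator (`&'a A`) rather than owning it keeps node pointers valid even if the `Bag` is moved; a bump arena stored inline would move along with it.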
Also also: I read the crossbeam RFC, and with the exception of dedicated GC threads, I think that everything I've described is orthogonal to the changes that you guys are already considering. So hopefully I'm not coming along and trying to tell you to do your jobs differently :)
Thanks for reaching out, y'all! I haven't checked in on new crossbeam features in a while, and I think the work you're doing looks awesome. I think joshlf@ gave a pretty accurate summary of our current problems. I do think that having a handle-based API for the epoch GC would help us have a cleaner story around thread destruction. It looks like a lot of the work is already done (though I don't see any other implementors of
Other than that, it would also be very helpful to have crossbeam structures parametric on an allocator. It seems like being able to create a local instance of the GC system is a precondition for this (there's no good way to override a default type parameter when you're talking to a global singleton). My initial thought on this was that we would have to wait for standard collections like
P.S. I'm very happy to see that you can defer functions now! A while ago I made a hash table that required this feature, and I just hacked it into crossbeam to do it. Now I can factor it out into its own crate and depend on a stable branch.
I realized that I'd made a mistake in my original comment: the global static instance should itself be a handle. I've updated the original comment, and I also wrote up a concrete example (that compiles!) of a simple logging library that buffers log lines. Check it out.
Thanks so much for writing such detailed responses. I took some time to digest all this and think about it a little. The code @joshlf has outlined seems pretty reasonable to me. I think we could design realms in a very similar manner. What do you think, @jeehoonkang? We do want to support the
No problem!
No worries! The handle-based design alone would be a huge improvement for us, even without the
That said, I will say that I don't see it getting stabilized soon: there are still a lot of details to work out and a lot of other work to be done (e.g., incorporating it into libcollections).
Thank you for the valuable comments! I felt we needed concrete use cases to target when improving Crossbeam, and elfmalloc is an excellent one. I implemented part of the "handle" proposal in https://github.com/jeehoonkang/crossbeam-epoch/tree/handle. It doesn't strictly follow the pattern described above, though. Here are my two cents from this experience. Handles may degrade performance by adding an extra indirection to the global data. In the current implementation of Crossbeam, the per-thread handle ("mutator" in the current
I had almost abandoned the "realm" proposal, but I think the need to support both std and no-std environments can be a strong motivation for it. Note that the realm proposal is almost implemented in the above branch: for early adopters, just defining a
That's fantastic! Good to know that the proposal is actually reasonable enough to be implemented :) It looks like your global
@joshlf yes, I noticed the use of
```rust
// ... TLS destructed
pin(|scope| {
    let is_pinned = is_pinned(); // expected to be `true`, but how..?
});
```
A possible solution that changes the interface a little bit is, instead of providing the global
I'm not familiar with how pinning works, but I was under the impression that one of the main functions of epoch's TLS is to maintain thread-local garbage lists. I was thinking that if a TLS value doesn't exist, garbage could just be appended directly onto the global list. I'm not sure how that'd interact with pinning, though.
@joshlf Yes, I think that part (locally storing garbage and flushing it to the global queue) is already implementable with
Crossbeam's TLS is just the current thread handle. Pinning accesses the handle, marks it as pinned, executes a closure, and finally marks it as unpinned. You can think of a handle as (I'm simplifying a bit):
```rust
use std::sync::Arc;

struct Handle(Arc<Inner>);
struct Inner { list: GarbageList, pinned: Option<Epoch> }
```
I thought in elfmalloc you would never call
Note that this way Crossbeam wouldn't touch TLS at all, so you wouldn't have to worry about Crossbeam's TLS getting destroyed first.
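The simplified handle described above can be sketched as follows. `Global`, `Handle::register`, `defer`, and `pending` are hypothetical names, and mutexes stand in for the atomics a real implementation would use; the point is only the pin/execute/unpin sequence and the per-handle garbage list:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical global state; only the global epoch is modelled here.
pub struct Global {
    epoch: AtomicU64,
}

type Garbage = Box<dyn FnOnce() + Send>;

struct Inner {
    global: Arc<Global>,
    // Deferred garbage, tagged with the epoch it was retired in.
    list: Mutex<Vec<(u64, Garbage)>>,
    // `Some(e)` while the owning thread is pinned in epoch `e`.
    pinned: Mutex<Option<u64>>,
}

pub struct Handle(Arc<Inner>);

impl Global {
    pub fn new() -> Arc<Global> {
        Arc::new(Global { epoch: AtomicU64::new(0) })
    }
}

impl Handle {
    pub fn register(global: &Arc<Global>) -> Handle {
        Handle(Arc::new(Inner {
            global: Arc::clone(global),
            list: Mutex::new(Vec::new()),
            pinned: Mutex::new(None),
        }))
    }

    /// Mark the handle pinned in the current epoch, run `f`, then unpin.
    /// Simplified: no support for nested pinning or unwinding.
    pub fn pin<R>(&self, f: impl FnOnce(&Handle) -> R) -> R {
        let epoch = self.0.global.epoch.load(Ordering::SeqCst);
        *self.0.pinned.lock().unwrap() = Some(epoch);
        let result = f(self);
        *self.0.pinned.lock().unwrap() = None;
        result
    }

    pub fn is_pinned(&self) -> bool {
        self.0.pinned.lock().unwrap().is_some()
    }

    /// Queue `f` on the local list; a real GC would run it once no thread
    /// can still observe the retired object.
    pub fn defer(&self, f: impl FnOnce() + Send + 'static) {
        let epoch = self.0.global.epoch.load(Ordering::SeqCst);
        self.0.list.lock().unwrap().push((epoch, Box::new(f)));
    }

    pub fn pending(&self) -> usize {
        self.0.list.lock().unwrap().len()
    }
}
```

Because the handle is an explicit value, code that owns one (rather than looking it up in TLS) can pin and defer without ever touching thread-local storage.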
Hmmm, I suppose this might work. To clarify, you're saying that in the case that our TLS data has been destroyed (and thus we don't have access to our thread-local epoch handle), we would just clone a new handle from the global instance temporarily in order to use it for whatever work we needed to do?
@joshlf Yeah, that's what we'd suggest doing. If you don't have the thread-local handle anymore, just create a one-off temporary handle so that you can move forward. Alternatively, I believe we could also implement some kind of anonymous pinning, where you can pin yourself without a handle for cases like these. In fact, I did something like this in C++ a long time ago. It might be a bit tricky to implement, but it's definitely possible.
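A sketch of the temporary-handle fallback, assuming a hypothetical `Collector` that merely counts registered handles and a hypothetical `LocalHandle` that unregisters itself on drop. `LocalKey::try_with` returns an error once the thread-local value has been destroyed, and that is when the slow path registers a one-off handle:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical collector; it only tracks how many handles are registered.
pub struct Collector {
    handles: AtomicUsize,
}

/// Hypothetical per-thread handle, unregistered when dropped.
pub struct LocalHandle {
    collector: &'static Collector,
    pinned: Cell<bool>,
}

impl Collector {
    pub fn register(&'static self) -> LocalHandle {
        self.handles.fetch_add(1, Ordering::SeqCst);
        LocalHandle { collector: self, pinned: Cell::new(false) }
    }

    pub fn handle_count(&self) -> usize {
        self.handles.load(Ordering::SeqCst)
    }
}

impl LocalHandle {
    /// Simplified pinning (no nesting, no unwinding).
    pub fn pin<R>(&self, f: impl FnOnce(&LocalHandle) -> R) -> R {
        self.pinned.set(true);
        let result = f(self);
        self.pinned.set(false);
        result
    }

    pub fn is_pinned(&self) -> bool {
        self.pinned.get()
    }
}

impl Drop for LocalHandle {
    fn drop(&mut self) {
        self.collector.handles.fetch_sub(1, Ordering::SeqCst);
    }
}

static COLLECTOR: OnceLock<Collector> = OnceLock::new();

fn collector() -> &'static Collector {
    COLLECTOR.get_or_init(|| Collector { handles: AtomicUsize::new(0) })
}

thread_local! {
    static HANDLE: LocalHandle = collector().register();
}

/// Pin with the thread-local handle; if it is gone (e.g. during TLS
/// teardown), register a one-off temporary handle instead. The temporary
/// is dropped, and thus unregistered, as soon as `f` returns.
pub fn pin<R>(f: impl FnOnce(&LocalHandle) -> R) -> R {
    let mut f = Some(f);
    match HANDLE.try_with(|h| h.pin(f.take().unwrap())) {
        Ok(result) => result,
        Err(_) => collector().register().pin(f.take().unwrap()),
    }
}
```

The fast path costs one TLS lookup; the slow path pays a register/unregister pair per call, which seems acceptable if it runs at most a few times per thread.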
I'd be curious to see the performance of the two approaches, but intuitively, I feel like the temporary handle approach is probably fine because that approach would only be used in a slow path that's executed at most a couple of times per thread (often never executed).
@joshlf In order to make sure I understood your proposal, I'd like to ask a question: is the following what you want to build in the end?
I think the way elfmalloc is bootstrapped is really neat 😄 Also, the two instances of Crossbeam interact with each other in a very interesting way: a deallocation in the global instance may invoke the methods of the custom instance of Crossbeam via elfmalloc. I hope it's okay performance-wise, but I'm not really confident about that.
Yep, that looks right!
To clarify, you mean something like
Thanks :) I don't think the interaction should be that bad, honestly. The snarky answer is "abstraction," but I think that's actually reasonable here: most code (including the std version of Crossbeam) treats the allocator as a black box. So long as that allocator is performant (in speed and memory usage), it shouldn't matter how it's implemented, even if in practice it's implemented with another copy of the logic implemented in Crossbeam. I don't see how it'd be any different than if we had implemented our own custom concurrent collection internally, or used some other performant allocation strategy.
Hi, @joshlf and @ezrosent! Congrats on the first release of elfmalloc! :) This is fantastic - I'm always excited to see new work being created in the domain of concurrency/parallelism using Rust.
I see you're using Crossbeam 0.2. Not sure if you're familiar with the current state of affairs, but we're rewriting the epoch GC and redesigning the API, and hoping to release it soon. I'd suggest looking into this RFCs repository if you're curious to see where we're heading.
I saw a comment on Reddit saying that, to remove the dependency on std from elfmalloc, crossbeam's epoch-based memory management would need to support no-std. Well, Crossbeam's new epoch-based GC currently allocates in two places:
We'll have to think about how allocations in those situations could be avoided.
Moreover, in this RFC @jeehoonkang proposes introducing realms, which is a way to create multiple garbage collectors and manually register threads to them. Currently, there is only one implicit global epoch GC and threads get lazily and automatically registered behind the scenes. This is similar to Rayon allowing you to create your own thread pool instead of using the global one. Anyways, perhaps this is something that might interest you as well, since you're writing such a low-level thing (an allocator) that needs very precise control?
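A rough sketch of what a realm could look like: each `Realm` owns its own epoch and participant list, and threads register explicitly instead of being enrolled lazily in one global collector. `Realm`, `Participant`, and `try_advance` are hypothetical names, and the advancement rule shown is the textbook epoch-based reclamation condition, not necessarily what the RFC specifies:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Sentinel meaning "not pinned".
const UNPINNED: u64 = u64::MAX;

/// Hypothetical realm: an independent epoch GC with its own participants.
pub struct Realm {
    epoch: AtomicU64,
    // One slot per registered thread: the epoch it is pinned in, or UNPINNED.
    participants: Mutex<Vec<Arc<AtomicU64>>>,
}

/// A thread's explicit registration with one particular realm.
pub struct Participant {
    realm: Arc<Realm>,
    slot: Arc<AtomicU64>,
}

impl Realm {
    pub fn new() -> Arc<Realm> {
        Arc::new(Realm { epoch: AtomicU64::new(0), participants: Mutex::new(Vec::new()) })
    }

    /// Threads join a realm explicitly rather than lazily on first use.
    pub fn register(self: &Arc<Self>) -> Participant {
        let slot = Arc::new(AtomicU64::new(UNPINNED));
        self.participants.lock().unwrap().push(Arc::clone(&slot));
        Participant { realm: Arc::clone(self), slot }
    }

    pub fn epoch(&self) -> u64 {
        self.epoch.load(Ordering::SeqCst)
    }

    /// Advance only if every participant is unpinned or pinned in the
    /// current epoch. Holding the lock serializes concurrent advancers.
    pub fn try_advance(&self) -> bool {
        let participants = self.participants.lock().unwrap();
        let current = self.epoch();
        let caught_up = participants.iter().all(|slot| {
            let pinned = slot.load(Ordering::SeqCst);
            pinned == UNPINNED || pinned == current
        });
        if caught_up {
            self.epoch.store(current + 1, Ordering::SeqCst);
        }
        caught_up
    }
}

impl Participant {
    // Simplified: a real GC must re-check the epoch after publishing the pin.
    pub fn pin(&self) {
        self.slot.store(self.realm.epoch(), Ordering::SeqCst);
    }

    pub fn unpin(&self) {
        self.slot.store(UNPINNED, Ordering::SeqCst);
    }
}
```

Two realms never consult each other's participants, so a thread pinned in one realm cannot hold back epoch advancement in another, much like separate Rayon thread pools do not share workers.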
I wonder if you have any ideas on how we can help you here.
More specifically, my questions would be:
What would a `no_std`-compatible API that you'd use look like?