Custom Allocator #21
Replies: 8 comments
-
@victorstewart you bring up a good concern. It would be nice to have documentation around what functions need to be replaced to support custom allocators. I will dip into the code base and see what I can find.
-
Here are the functions I could find thus far:
I will ask around and see if anyone can think of anything else. The zalloc/kalloc calls you are referring to are just wrappers around mmap and munmap.
-
So far everything is already included, so great.
-
It's worth pointing out that running HSE on top of mimalloc will not cause all memory allocated by HSE to come from mimalloc. In particular, the memory used to buffer user data before it is migrated to media is allocated from collections of cursor heaps (see hse/src/util/cursor_heap.c). These allow very fast, efficient allocation for data that comes into existence incrementally but is reclaimed all at once. IIRC there are other places that use use-case-specific allocators with mmap() at their base.

I am in no way suggesting that using mimalloc isn't perfectly fine. I mention the above just in case you see metrics from mimalloc that don't square with your expectations.

By way of explanation, in HSE's early life we contemplated being able to run it partly in the kernel. To support that we had a single source base that would (and did) compile for both. You can see vestiges of that in the presence of things like the kalloc() wrappers.

We're very interested in what you observe by substituting mimalloc for the system allocator, as well as anything you can share about your application.
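For readers unfamiliar with the pattern, here is a minimal sketch of an allocator that hands out memory incrementally and reclaims it all at once, built directly on mmap(). It is not the code in cursor_heap.c; the `bump_heap` names, the 16-byte alignment, and the single fixed-size region are all assumptions made for illustration. The point is only that memory obtained this way never passes through malloc, so a replacement such as mimalloc would not see it.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical bump ("cursor") heap; names are illustrative, not HSE's API. */
struct bump_heap {
    uint8_t *base; /* start of the mmap()ed region */
    size_t   cap;  /* total bytes reserved */
    size_t   used; /* cursor: bytes handed out so far */
};

/* Reserve one anonymous region up front. Returns 0 on success, -1 on failure. */
static int bump_heap_init(struct bump_heap *h, size_t cap)
{
    assert(h != NULL && cap > 0);

    void *mem = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    h->base = mem;
    h->cap = cap;
    h->used = 0;
    return 0;
}

/* Allocate by advancing the cursor to the next 16-byte boundary.
 * There is no per-object free. Returns NULL when the region is exhausted. */
static void *bump_heap_alloc(struct bump_heap *h, size_t size)
{
    size_t start = (h->used + 15) & ~(size_t)15;

    if (start > h->cap || size > h->cap - start)
        return NULL;

    h->used = start + size;
    return h->base + start;
}

/* Reclaim every allocation at once by rewinding the cursor. */
static void bump_heap_reset(struct bump_heap *h)
{
    h->used = 0;
}

/* Return the whole region to the operating system. */
static void bump_heap_destroy(struct bump_heap *h)
{
    munmap(h->base, h->cap);
    h->base = NULL;
    h->cap = 0;
    h->used = 0;
}
```

Allocation is a couple of arithmetic operations and reclamation is a single store, which is why this style suits data that accumulates and is then discarded together.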
-
I'm in the process of preparing for correctness testing at the moment, so if you'd like me to add any specific memory or performance tests, let me know and I will.

Basically I wrote a Redis Enterprise clone. I began once I realized how much Redis Enterprise costs (LOL) and ended up on KeyDB. But the closed source-ness, especially of the replication logic(!!!), made it untenable for me to move forward with that either. So at that point I realized the path of least resistance was to just write my own (not to mention a serious performance boost given HSE vs RocksDB + io_uring efficiencies + tailoring my logic to my specific application needs).

So I implemented most Redis commands, plus other application-specific ones (that sometimes fold many operations into one, to reduce data duplication and operation bloat in the pipeline). I came up with an optimal binary protocol for it: identical headers, and then the rest of the byte stream is interpreted by each operation handler. So the database can just read in place, zero parsing. Each operation knows what type of byte stream it's getting. And I wrote a client compile-time encoder that converts a "pretty format" like

Also an iterative reader to consume messages.

It runs inside an io_uring server I wrote, as does my application. Each machine runs 2 databases (this one, and a graph database I also wrote that doesn't use HSE, each pinned to a physical core), some Nomad scheduling binaries, and then the rest of the logical cores are filled up with application server instances. The application instances speak with the database over UNIX sockets. And the database instances across machines across the planet replicate over QUIC in a star topology (I wanted to use reliable multicast but 1) that protocol basically doesn't exist and 2) no network allows multicast traffic through it lol).

Let me know if you want any other details, but that's the high level.
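To make the "identical headers, then a handler-specific byte stream" idea concrete, here is a rough sketch of how such a protocol could be dispatched. It is not the actual wire format: the `msg_header` layout, the `OP_GET`/`OP_SET` opcodes, the SET payload layout, and the handler return values are all hypothetical, and fields are assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header shared by every message. */
struct msg_header {
    uint16_t opcode;      /* selects the handler */
    uint16_t flags;       /* unused in this sketch */
    uint32_t payload_len; /* bytes following the header */
};
static_assert(sizeof(struct msg_header) == 8, "header must be 8 bytes");

enum { OP_GET = 0, OP_SET = 1, OP_COUNT };

typedef int (*op_handler)(const uint8_t *payload, uint32_t len);

/* GET: the whole payload is the key. Returns the key length. */
static int handle_get(const uint8_t *payload, uint32_t len)
{
    (void)payload; /* the key would be used in place, without copying */
    return (int)len;
}

/* SET: payload is [u32 key_len][key bytes][value bytes].
 * Returns the value length, or -1 if the payload is malformed. */
static int handle_set(const uint8_t *payload, uint32_t len)
{
    uint32_t key_len;

    if (len < sizeof key_len)
        return -1;
    memcpy(&key_len, payload, sizeof key_len); /* avoids unaligned reads */
    if (key_len > len - sizeof key_len)
        return -1;

    /* Key starts at payload + 4 and the value at payload + 4 + key_len;
     * both point into the caller's buffer, so nothing is parsed or copied. */
    return (int)(len - sizeof key_len - key_len);
}

static const op_handler handlers[OP_COUNT] = {
    [OP_GET] = handle_get,
    [OP_SET] = handle_set,
};

/* Validate the common header, then hand the rest to the matching handler.
 * Returns the handler's result, or -1 on a truncated or unknown message. */
static int dispatch(const uint8_t *buf, size_t buf_len)
{
    struct msg_header hdr;

    if (buf_len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);

    if (hdr.opcode >= OP_COUNT || hdr.payload_len > buf_len - sizeof hdr)
        return -1;

    return handlers[hdr.opcode](buf + sizeof hdr, hdr.payload_len);
}
```

Only the header is validated generically; each handler is free to define its own payload layout, which is what lets the server read messages in place.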
-
@victorstewart that sounds pretty impressive. Congrats. How has development with HSE been thus far?
-
@tristan957 invisible besides the machinery to distribute lists over keys / values
-
Good to hear!
-
I'm hoping to replace the system allocator with the mimalloc allocator (https://github.com/microsoft/mimalloc) for the database I've built around HSE. (It's a great one and my go-to.)
This is a list of the functions it overrides (when you link the override object file during linking): https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-override.h
Since statically replacing the memory management functions is much simpler than asking you guys to provide an override interface, I just want to be sure that I'm replacing the complete scope.
For example, I saw mentions of kalloc/zalloc, but maybe in name only, wrapping underlying mallocs? So I thought it easier to ask than to read every line searching for syscalls lol.