Custom Allocator #21

victorstewart · 2020-11-16T15:58:59Z

I'm hoping to replace the system allocator with mimalloc (https://github.com/microsoft/mimalloc) for the database I've built around HSE. (It's a great allocator and my go-to.)

This is the list of functions it overrides when you link in the override object file: https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-override.h

Since statically replacing the memory management functions is much simpler than asking you guys to provide an override interface, I just want to be sure that I'm replacing the complete set.

For example, I saw mentions of kalloc/zalloc, but maybe those exist in name only and wrap underlying mallocs? So I thought it easier to ask than to read every line searching for syscalls lol.

tristan957 (Maintainer) · 2020-11-16T16:36:55Z

@victorstewart you bring up a good concern. It would be nice to have documentation around what functions need to be replaced to support custom allocators. I will dip into the code base and see what I can find.

tristan957 (Maintainer) · 2020-11-16T18:41:56Z

Here are the functions I could find thus far:

malloc
calloc
realloc
free
posix_memalign
aligned_alloc
strdup, although strdup is limited to the hse_params code

I will ask around and see if anyone can think of anything else. The zalloc/kalloc calls you are referring to are just wrappers around mmap and munmap.
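
One way to sanity-check an allocator swap is a small smoke test that calls each entry point from the list above. The sketch below is only an illustration: `alloc_smoke_test` is a hypothetical name, and the test checks behaviour (zeroing, alignment, content preserved across `realloc`), not which allocator actually served the calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes posix_memalign and strdup */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Calls every allocation entry point named in the list above.
 * Returns 0 if all calls behaved as expected, otherwise a nonzero code
 * identifying the first failing step.
 * Typical use: assert(alloc_smoke_test() == 0); */
int alloc_smoke_test(void)
{
    /* malloc + realloc: contents must survive the resize */
    unsigned char *p = malloc(64);
    if (!p) return 1;
    memset(p, 0xab, 64);
    unsigned char *q = realloc(p, 4096);
    if (!q) { free(p); return 2; }
    for (size_t i = 0; i < 64; i++) {
        if (q[i] != 0xab) { free(q); return 3; }
    }
    free(q);

    /* calloc: memory must come back zeroed */
    int *z = calloc(16, sizeof *z);
    if (!z) return 4;
    for (size_t i = 0; i < 16; i++) {
        if (z[i] != 0) { free(z); return 5; }
    }
    free(z);

    /* posix_memalign: page-aligned block */
    void *a = NULL;
    if (posix_memalign(&a, 4096, 8192) != 0) return 6;
    if ((uintptr_t)a % 4096 != 0) { free(a); return 7; }
    free(a);

    /* aligned_alloc: size is a multiple of the alignment */
    void *b = aligned_alloc(64, 256);
    if (!b) return 8;
    if ((uintptr_t)b % 64 != 0) { free(b); return 9; }
    free(b);

    /* strdup: allocates a copy that is released with free */
    char *s = strdup("hse");
    if (!s) return 10;
    int same = strcmp(s, "hse") == 0;
    free(s);
    return same ? 0 : 11;
}
```

Built once against the system allocator and once with the override object linked in, the function should return 0 in both cases.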

victorstewart (Author) · 2020-11-16T18:53:47Z

So far everything on that list is already covered by the override, so great.

davidboles (Collaborator) · 2020-11-16T20:57:52Z

It's worth pointing out that having HSE run on top of mimalloc will not cause all memory allocated by HSE to come from mimalloc. In particular, the memory used to buffer user data prior to it being migrated to media is allocated from collections of cursor heaps (see hse/src/util/cursor_heap.c). These allow very fast/efficient allocation for data that comes into existence incrementally but will be reclaimed all at once. IIRC there are other places where use-case-specific allocators are used that have mmap() at their base.
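
To illustrate the general idea of allocating incrementally and reclaiming all at once, here is a minimal bump ("cursor") arena built directly on `mmap()`. It is a sketch of the technique only, not HSE's `cursor_heap` code: the `arena_*` names and the struct layout are hypothetical. Because the memory comes straight from `mmap()`, a malloc replacement such as mimalloc never sees it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical bump arena; not the HSE cursor heap API. */
struct arena {
    unsigned char *base; /* start of the mmap'd region */
    size_t         size; /* total bytes reserved */
    size_t         used; /* cursor: offset of the next free byte */
};

/* Reserve `size` bytes of anonymous memory.
 * Returns 0 on success, -1 if mmap fails. */
int arena_init(struct arena *a, size_t size)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    a->base = mem;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Allocate `len` bytes aligned to `align` (a power of two) by advancing
 * the cursor. Returns NULL when the arena is full. There is no per-object
 * free: individual allocations are never returned. */
void *arena_alloc(struct arena *a, size_t len, size_t align)
{
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || len > a->size - start) return NULL;
    a->used = start + len;
    return a->base + start;
}

/* Release every allocation at once by unmapping the whole region. */
void arena_destroy(struct arena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Allocation is a single add-and-compare, and teardown is one `munmap()`, which is why this pattern suits data that accumulates over time and is discarded together.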

I am in no way suggesting that using mimalloc isn't perfectly fine. I mention the above just in case you see metrics from mimalloc that don't square with your expectations.

By way of explanation, in HSE's early life we contemplated being able to run it partly in the kernel. To support that we had a single source base that would (and did) compile for both. You can see vestiges of that in the presence of things like the kalloc() wrappers.

We're very interested in what you observe by substituting mimalloc for the system allocator, as well as anything you can share about your application.

victorstewart (Author) · 2020-11-16T21:42:36Z

@davidboles

I'm in the process of preparing for correctness testing at the moment, so if you'd like me to add any specific memory or performance tests let me know and I will.

Basically I wrote a Redis Enterprise clone. I began once I realized how much Redis Enterprise costs (LOL), and ended up on KeyDB. But the closed-source nature, especially of the replication logic(!!!), made it untenable for me to move forward with that either. So at that point I realized the path of least resistance was to just write my own (not to mention a serious performance boost given HSE vs RocksDB + io_uring efficiencies + tailoring my logic to my specific application needs). So I implemented most Redis commands, plus other application-specific ones (that sometimes fold many operations into one, to reduce data duplication and operation bloat in the pipeline).

Came up with an optimal binary protocol for it. Identical headers, and then the rest of the byte stream is interpreted by each operation handler. So the database can just read in place, 0 parsing. Each operation knows what type of byte stream it's getting.
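
As a rough illustration of that shape (a fixed header followed by bytes that only the operation's handler interprets), here is a minimal dispatch sketch. Everything in it is assumed for the example: the header fields, the opcodes, the payload layouts, and native byte order. The thread does not describe the real wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header shared by every message. */
struct msg_header {
    uint32_t length; /* total message size in bytes, header included */
    uint16_t opcode; /* selects the handler */
    uint16_t flags;  /* unused in this sketch */
};

enum { OP_SET = 1, OP_GET = 2, OP_COUNT };

/* A handler receives the bytes after the header, still in the receive
 * buffer, and interprets them according to its own layout. */
typedef int (*op_handler)(const unsigned char *payload, size_t len);

/* Assumed SET payload: u16 key length, key bytes, then value bytes.
 * Returns the value length, or -1 if the payload is malformed. */
static int handle_set(const unsigned char *payload, size_t len)
{
    uint16_t klen;
    if (len < sizeof klen) return -1;
    memcpy(&klen, payload, sizeof klen);
    if (klen > len - sizeof klen) return -1;
    return (int)(len - sizeof klen - klen);
}

/* Assumed GET payload: the key bytes only. Returns the key length. */
static int handle_get(const unsigned char *payload, size_t len)
{
    (void)payload;
    return (int)len;
}

static const op_handler handlers[OP_COUNT] = {
    [OP_SET] = handle_set,
    [OP_GET] = handle_get,
};

/* Validate the header, then hand the payload to the matching handler
 * without copying it. Returns the handler's result, or -1 on a short
 * buffer, a bad length, or an unknown opcode. */
int dispatch(const unsigned char *buf, size_t avail)
{
    struct msg_header h;
    if (avail < sizeof h) return -1;
    memcpy(&h, buf, sizeof h); /* copy only the header; sidesteps alignment issues */
    if (h.length < sizeof h || h.length > avail) return -1;
    if (h.opcode >= OP_COUNT || handlers[h.opcode] == NULL) return -1;
    return handlers[h.opcode](buf + sizeof h, h.length - sizeof h);
}
```

The dispatcher never looks past the header, so adding an operation means adding a handler and a table entry rather than extending a general-purpose parser.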

And I wrote a client-side compile-time encoder that converts a "pretty format" like encode<"SET name {}"_ctv>(string, "john"_ctv) into a parsed sequence of aligned byte writes that occur at run time. That ""_ctv operator constructs a compile-time string type I wrote.

Also an iterative reader to consume messages.
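
A reader of this kind can be sketched as a cursor over a receive buffer that yields one complete message per call. The framing assumed here (a 4-byte, native-endian total length at the start of each message) and the `msg_reader` / `reader_next` names are hypothetical, chosen only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a buffer of length-prefixed messages. */
struct msg_reader {
    const unsigned char *buf; /* receive buffer */
    size_t len;               /* bytes available in buf */
    size_t pos;               /* offset of the next unread message */
};

/* Yield the next complete message without copying it.
 * Returns 1 and sets *msg / *msg_len when a whole message is available,
 * 0 when the remaining bytes hold only a partial message (wait for more
 * data), and -1 when the length prefix is impossible. */
int reader_next(struct msg_reader *r, const unsigned char **msg, size_t *msg_len)
{
    uint32_t total;
    size_t left = r->len - r->pos;

    if (left < sizeof total) return 0;
    memcpy(&total, r->buf + r->pos, sizeof total);
    if (total < sizeof total) return -1; /* must at least cover the prefix */
    if (total > left) return 0;

    *msg = r->buf + r->pos;
    *msg_len = total;
    r->pos += total;
    return 1;
}
```

After the loop ends with 0, `pos` marks where the partial tail begins, so the caller can keep those bytes and resume once more data arrives.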

It runs inside of an io_uring server I wrote, as does my application.

Each machine runs 2 databases (this one and a graph database I also wrote, which doesn't use HSE), each pinned to a physical core, plus some Nomad scheduling binaries, and then the rest of the logical cores are filled up with application server instances.

The application instances speak with the database over UNIX sockets. And the database instances across machines across the planet replicate over QUIC in a star topology (I wanted to use reliable multicast but 1) that protocol basically doesn't exist and 2) no network allows multicast traffic through it lol).

Let me know if you want any other details, but that's the high level.

tristan957 (Maintainer) · 2020-11-16T22:25:29Z

@victorstewart that sounds pretty impressive. Congrats. How has development with HSE been thus far?

victorstewart (Author) · 2020-11-17T12:20:41Z

@tristan957 It's been invisible, besides the machinery I needed to distribute lists over keys / values.

tristan957 (Maintainer) · 2020-11-17T15:54:34Z

Good to hear!
