Replies: 5 comments 10 replies
-
Interesting.... I guess what's going on here is that the kernel is seeing that some pages aren't being accessed (nodes and edges from long ago) and then compressing them. The buffers will be quite compressible because they are all columnar. I'm definitely open to the idea of having inline compression at some point, but the tedious details of which libraries to use and how to handle the C-level dependencies have made me very wary of it. In practice tszip does a very good job of compressing files on disk, and we're rarely constrained by size in memory.
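To illustrate why columnar buffers compress so well, here is a minimal sketch using only the standard library. The buffer contents are made up for illustration (runs of repeated coordinate values, loosely resembling an edge table's `left` column); this is not tskit code.

```python
import struct
import zlib

# Hypothetical stand-in for a columnar table buffer: a "left" coordinate
# column where many adjacent edges share the same breakpoint, giving
# long runs of identical 8-byte doubles.
left = [float(i // 8) for i in range(100_000)]
raw = struct.pack(f"{len(left)}d", *left)

# Compress the whole column in one shot, as a disk compressor would.
packed = zlib.compress(raw, level=6)
ratio = len(raw) / len(packed)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, "
      f"ratio: {ratio:.1f}x")
```

Sorted, repetitive columns like this routinely compress by an order of magnitude, which is consistent with the ~10x reduction observed from the kernel compressor.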
-
I hadn't heard of this feature! You can do a similar-ish thing in Linux by having a swap space that is actually a compressed ramdisk. The Apple implementation is nice in that it waits for spare threads on modern multi-core CPUs to do the compression. I'm not sure you would gain much in tskit unless you "chunked" the arrays and decompressed as you iterated along? It would be a pretty hardcore refactor, especially since much of the Tables API is based on having the plain array in memory.
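The "decompress as you iterate" idea can be sketched with a generator that only ever holds one decompressed chunk in memory. This is illustrative only, not tskit's API; the chunk layout and format string are assumptions.

```python
import struct
import zlib


def iter_chunked(chunks, fmt="d"):
    """Yield values from zlib-compressed chunks one chunk at a time,
    so only a single decompressed chunk is held in memory at once.
    (Illustrative sketch only; not how tskit stores columns.)"""
    size = struct.calcsize(fmt)
    for chunk in chunks:
        raw = zlib.decompress(chunk)
        yield from struct.unpack(f"{len(raw) // size}{fmt}", raw)


# Build two compressed 1000-value chunks of doubles as a stand-in column.
values = [float(i) for i in range(2000)]
chunks = []
for start in range(0, len(values), 1000):
    raw = struct.pack("1000d", *values[start:start + 1000])
    chunks.append(zlib.compress(raw))

# Sum the column without ever materializing it all at once.
total = sum(iter_chunked(chunks))
print(total)  # → 1999000.0
```

The catch, as noted above, is that any API returning a plain contiguous array can't be served this way without a major refactor.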
-
OK, sounds like perhaps an idea whose time has not yet come. :-> Just thought I'd mention it. Feel free to close this issue. :->
-
@petrelharp I suppose this is something SLiM could actually do internally, even without tskit support. Since we mostly just write new data into the tables without looking at the old data, we could do some kind of per-column compression scheme internally. We would need to decompress when we simplify, though. Maybe that would eliminate all benefits of doing it, since simplification typically defines the memory high-water mark anyway; but if we could leave at least some of the columns compressed during simplification, or something like that, maybe it would be a win? I'm not sure whether this is worth thinking about or not, but I thought I'd tag you since you're interested in doing very large simulations...
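A per-column scheme like this might look roughly as follows: an append-only column that compresses full chunks as they fill up, and only decompresses everything when the whole column is needed (e.g. before simplification). This is a hypothetical sketch in Python, not SLiM or tskit code; `CompressedColumn` and its parameters are invented for illustration.

```python
import struct
import zlib


class CompressedColumn:
    """Hypothetical append-only column of doubles that zlib-compresses
    each chunk once it fills. New values land in a small write buffer;
    reading the full column decompresses every chunk again."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = []   # compressed byte strings
        self.buffer = []   # uncompressed tail, not yet a full chunk

    def append(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.chunk_size:
            raw = struct.pack(f"{len(self.buffer)}d", *self.buffer)
            self.chunks.append(zlib.compress(raw))
            self.buffer = []

    def to_list(self):
        # Full decompression, as would be needed before simplification.
        out = []
        for chunk in self.chunks:
            raw = zlib.decompress(chunk)
            out.extend(struct.unpack(f"{len(raw) // 8}d", raw))
        out.extend(self.buffer)
        return out


col = CompressedColumn(chunk_size=1000)
for t in range(10_000):
    col.append(float(t))
print(len(col.chunks), "chunks,", len(col.buffer), "buffered values")
```

The decompress-everything step in `to_list` is exactly the simplification-time cost worried about above; leaving some chunks compressed during simplification would mean simplify itself iterating chunk by chunk.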
-
I recently spent some time head-scratching over a strange observation. A simple neutral SLiM model that just forward-simulated 10,000 individuals with tree-sequence recording, but without doing any simplification at all, did not exhibit the memory-usage dynamics I expected. I expected that the memory footprint of the process would grow as a linear function of the number of generations simulated, since the tree-sequence tables would just grow and grow. Instead, the memory footprint displayed a sawtooth pattern; the memory usage would grow linearly for some number of generations, and then suddenly fall by a factor of two or more, and then resume linear growth.
Eventually I realized that this was due to a relatively new feature in the macOS kernel, memory compression. Basically, the kernel observes when a given block of memory has not been accessed for a long time, and compresses it to take less memory. If anybody tries to access the memory block, the kernel decompresses it on demand. This is all invisible to the process, which only ever sees the memory in its decompressed state. It's also quite fast. Rather remarkable.
After realizing that this was what was causing that sawtooth memory usage pattern, I realized that the effectiveness of this memory compression scheme in reducing the memory usage of the tree sequence was actually pretty interesting. The kernel was apparently compressing the tree sequence by as much as 10x, with very little performance cost! It made me wonder: could tskit do the same sort of thing under the hood? Compress particular buffers in the tree sequence behind the scenes, and decompress them when they need to be accessed? If it resulted in a 10x reduction in memory and disk footprint, that would be pretty significant, right? And it might not actually be very hard to implement. Food for thought?