
GH-2799 avoid OOM through HashMap resizing by setting initial size #2872

Merged
4 commits merged into main from GH-2799-memoryoverflowmodel-oom on Apr 24, 2021

Conversation

abrokenjester
Contributor

@abrokenjester abrokenjester commented Feb 24, 2021

GitHub issue resolved: #2799

Briefly describe the changes proposed in this PR:

  • initialize the LinkedHashModel with a fixed size equal to the threshold we use for overflow to disk - this should avoid unexpected resizing before overflow is triggered (see the sketch below this list)
  • note that LinkedHashModel itself internally already uses size * 2 for the statement set to compensate for the load factor, so using the exact block size should be fine for our purposes
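To make the first point concrete, here is a minimal sketch of the idea. It is illustrative only, not the actual MemoryOverflowModel code; the LARGE_BLOCK value below is a placeholder for the real overflow threshold.

import org.eclipse.rdf4j.model.impl.LinkedHashModel;

class OverflowCacheSketch {
	// hypothetical threshold, in number of statements
	private static final int LARGE_BLOCK = 10_000;

	// LinkedHashModel(int size) allocates roughly size * 2 internally to
	// compensate for the load factor, so passing the block size directly
	// should avoid any rehashing before the overflow point is reached
	private final LinkedHashModel memory = new LinkedHashModel(LARGE_BLOCK);
}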

PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits down to one or a few meaningful commits
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change
  • every commit has been signed off

@abrokenjester
Contributor Author

I just realized there's a bit of a false assumption in this change, due to poor variable/constant naming: although the LARGE_BLOCK constant plays a role in the overflow trigger, it is not as simple as "overflow when we have more statements than this fixed constant".

An alternative is to move the "10% of the heap is still free" requirement up a bit, to, say, 15%.
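For reference, the kind of check being discussed looks roughly like this. This is a hypothetical illustration only; the real MemoryOverflowModel logic and names differ, and the percentage is the knob being debated here:

static boolean shouldOverflow(double minFreeFraction) {
	Runtime runtime = Runtime.getRuntime();
	long used = runtime.totalMemory() - runtime.freeMemory();
	// headroom the JVM can still hand out: unallocated heap plus allocated-but-unused heap
	long freeToAllocate = runtime.maxMemory() - used;
	// overflow to disk once less than e.g. 10% (or 15%) of the max heap is still free
	return freeToAllocate < minFreeFraction * runtime.maxMemory();
}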

@hmottestad
Contributor

I think we should start with a test, or a benchmark if that's easier. Essentially limit the memory available and load in a big file. I'm also curious whether this scales as it should, e.g. does loading a bigger file while giving more memory trigger the overflow as consistently as loading a smaller file with less memory available?

@abrokenjester
Contributor Author

abrokenjester commented Feb 24, 2021

Not at my desk right now, but I believe there's an existing compliance test for the overflow that we could extend if necessary.

Update: the test I was thinking of is TestNativeStoreMemoryOverflow, which is actually a unit test for the native store. I've got to admit I can't make out what the test is supposed to prove exactly: it seems to just add a bunch of statements and then check that they're there. It's not obvious to me how it actually checks the MemoryOverflow behavior.

@abrokenjester
Contributor Author

I'm not aware of a simple way to control the available heap space for a single JUnit test, so we may want to "abuse" a JMH benchmark for this purpose. I'm currently out of spare time, so I won't continue on this immediately. @hmottestad if you have an idea and the time/will to set up such a test, feel free to add it to this branch.

@hmottestad
Contributor

After I commented here I set out to google whether JUnit 5 could fork the JVM for a test in order to configure the amount of memory, but I could not find anything. I guess abusing JMH might be the only way, unfortunately.
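For the record, the "abuse" would look something like this. This is only a sketch; the class and method names are placeholders, not the benchmarks later added to this branch. The forked benchmark JVM gets its own heap settings, so the overflow behaviour can be exercised without constraining the build JVM:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class OverflowBenchmarkSketch {

	// cap the heap of the forked JVM so the overflow path is actually hit
	@Fork(value = 1, jvmArgs = { "-Xms600m", "-Xmx600m" })
	@Benchmark
	public void loadLargeFileWithSmallHeap() {
		// load a large RDF file into a NativeStore here and rely on
		// MemoryOverflowModel to spill to disk instead of throwing an OOM
	}
}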

@abrokenjester
Contributor Author

To be fair, even with benchmarking in place there's only so much we can do to make this work better: at the end of the day we are relying on an estimate of available heap space and predicted usage to trigger disk overflow. In real-life situations that estimate could be way off because of circumstances beyond our control (e.g. some other Java object suddenly consuming a lot of memory, or something as basic as the data currently being uploaded suddenly containing a few massive literal values). We make a best effort, but we can't guarantee the process won't run out of memory.

@abrokenjester
Contributor Author

Btw, a reason why disk syncing on memory overflow is giving us such a performance hit was given by Arjohn Kampman on the mailing list:

The reason that syncing the data to disk when the overflow is triggered takes so long is also related to a call to dataset(). When overflowToDisk() calls disk.addAll(memory), this triggers a call to SailSourceModel.add(s,p,o,c...) for each statement. This method then calls both contains(s,p,o,c...) and sink().approve(s,p,o,c) for each statement. The latter call starts a new transaction and updates the txn-status file, but the contains() call then commits the transaction for the previous statement via a call to dataset(), again updating the txn-status file. So for every cached statement, rdf4j does two I/O calls. On a spinning disk with an average write time of 10 ms, this limits the overflow process to at most 50 statements per second.

That is not directly related to this fix, but if we can address some of that, we can have more confidence that setting a lower overflow threshold won't cause a massive performance penalty.
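Paraphrasing the pattern described above as pseudocode (not the actual SailSourceModel source), the per-statement cost during overflow is roughly:

for (Statement st : memory) {
	// contains() commits the previous statement's transaction via dataset(),
	// updating the txn-status file (one disk write)
	if (!disk.contains(st.getSubject(), st.getPredicate(), st.getObject(), st.getContext())) {
		// approve() starts a new transaction, updating the txn-status file again (a second disk write)
		disk.add(st);
	}
}
// 2 writes per statement * ~10 ms per write on a spinning disk
// = ~20 ms per statement, i.e. at most ~50 statements per second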

@hmottestad
Contributor

Do you mind @jeenbroekstra if I force push this branch? I would like to add a commit with a benchmark as the first commit in this branch so that we can easily test before and after.

@abrokenjester
Contributor Author

Go for it

@hmottestad hmottestad force-pushed the GH-2799-memoryoverflowmodel-oom branch from e850bb4 to 590dad6 Compare April 16, 2021 06:26
@hmottestad
Contributor

ok, I've just force pushed, but will need to force push at least one more time before I'm done.

@hmottestad hmottestad force-pushed the GH-2799-memoryoverflowmodel-oom branch from 590dad6 to b76dde0 Compare April 16, 2021 08:21
@hmottestad
Contributor

I've added two benchmarks, one synthetic and one with a real-world file.

I can get the synthetic one to fail with very low memory, and the real-world file to fail with higher memory; both fail at different places. To make things even more complicated, the real-world file fails only within a certain memory range (500-700 MB); outside of that range it either overflows correctly or doesn't need to overflow.

@hmottestad
Contributor

hmottestad commented Apr 16, 2021

The synthetic benchmark only fails on Java 8 for me, probably due to using G1GC.

Btw, I tried using the parallel GC, and it failed differently:

java.lang.OutOfMemoryError: Java heap space
	at java.util.BitSet.initWords(BitSet.java:166)
	at java.util.BitSet.<init>(BitSet.java:161)
	at org.eclipse.rdf4j.sail.nativerdf.datastore.HashFile.<init>(HashFile.java:158)
	at org.eclipse.rdf4j.sail.nativerdf.datastore.HashFile.<init>(HashFile.java:98)
	at org.eclipse.rdf4j.sail.nativerdf.datastore.DataStore.<init>(DataStore.java:46)
	at org.eclipse.rdf4j.sail.nativerdf.ValueStore.<init>(ValueStore.java:139)
	at org.eclipse.rdf4j.sail.nativerdf.NativeSailStore.<init>(NativeSailStore.java:92)
	at org.eclipse.rdf4j.sail.nativerdf.NativeSailStore.<init>(NativeSailStore.java:80)
	at org.eclipse.rdf4j.sail.nativerdf.NativeStore$2.createSailStore(NativeStore.java:272)
	at org.eclipse.rdf4j.sail.nativerdf.MemoryOverflowModel.overflowToDisk(MemoryOverflowModel.java:263)
	at org.eclipse.rdf4j.sail.nativerdf.MemoryOverflowModel.checkMemoryOverflow(MemoryOverflowModel.java:250)
	at org.eclipse.rdf4j.sail.nativerdf.MemoryOverflowModel.add(MemoryOverflowModel.java:122)
	at org.eclipse.rdf4j.sail.base.Changeset.approve(Changeset.java:259)
	at org.eclipse.rdf4j.sail.base.SailSourceConnection.add(SailSourceConnection.java:709)
	at org.eclipse.rdf4j.sail.base.SailSourceConnection.addStatement(SailSourceConnection.java:577)
	at org.eclipse.rdf4j.sail.helpers.AbstractSailConnection.addStatement(AbstractSailConnection.java:443)
	at org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:393)
	at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.addWithoutCommit(AbstractRepositoryConnection.java:508)
	at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.add(AbstractRepositoryConnection.java:418)
	at org.eclipse.rdf4j.sail.nativerdf.benchmark.OverflowBenchmarkSynthetic.lambda$addData$1(OverflowBenchmarkSynthetic.java:177)
	at org.eclipse.rdf4j.sail.nativerdf.benchmark.OverflowBenchmarkSynthetic$$Lambda$15/841851332.accept(Unknown Source)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
	at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)

EDIT: Yet another different failure point (Java 8 + G1GC):

java.lang.OutOfMemoryError: Java heap space
	at org.eclipse.rdf4j.sail.nativerdf.btree.Node.<init>(Node.java:59)
	at org.eclipse.rdf4j.sail.nativerdf.btree.BTree.lambda$new$0(BTree.java:96)
	at org.eclipse.rdf4j.sail.nativerdf.btree.BTree$$Lambda$11/564661451.apply(Unknown Source)
	at org.eclipse.rdf4j.sail.nativerdf.btree.ConcurrentNodeCache.lambda$readAndUse$1(ConcurrentNodeCache.java:47)
	at org.eclipse.rdf4j.sail.nativerdf.btree.ConcurrentNodeCache$$Lambda$21/1430898173.apply(Unknown Source)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
	at org.eclipse.rdf4j.sail.nativerdf.btree.ConcurrentNodeCache.readAndUse(ConcurrentNodeCache.java:46)
	at org.eclipse.rdf4j.sail.nativerdf.btree.BTree.readNode(BTree.java:1023)
	at org.eclipse.rdf4j.sail.nativerdf.btree.Node.getChildNode(Node.java:220)
	at org.eclipse.rdf4j.sail.nativerdf.btree.RangeIterator.findNext(RangeIterator.java:145)
	at org.eclipse.rdf4j.sail.nativerdf.btree.RangeIterator.next(RangeIterator.java:67)
	at org.eclipse.rdf4j.sail.nativerdf.TripleStore.commit(TripleStore.java:881)
	at org.eclipse.rdf4j.sail.nativerdf.NativeSailStore$NativeSailSink.flush(NativeSailStore.java:366)
	at org.eclipse.rdf4j.sail.base.SailSourceBranch.flush(SailSourceBranch.java:263)
	at org.eclipse.rdf4j.sail.base.SailSourceBranch.autoFlush(SailSourceBranch.java:345)
	at org.eclipse.rdf4j.sail.base.SailSourceBranch$1.close(SailSourceBranch.java:187)
	at org.eclipse.rdf4j.sail.base.SailSourceBranch.flush(SailSourceBranch.java:266)
	at org.eclipse.rdf4j.sail.base.UnionSailSource.flush(UnionSailSource.java:68)
	at org.eclipse.rdf4j.sail.base.SailSourceConnection.commitInternal(SailSourceConnection.java:469)
	at org.eclipse.rdf4j.sail.nativerdf.NativeStoreConnection.commitInternal(NativeStoreConnection.java:86)
	at org.eclipse.rdf4j.sail.helpers.AbstractSailConnection.commit(AbstractSailConnection.java:392)
	at org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.commit(SailRepositoryConnection.java:216)
	at org.eclipse.rdf4j.sail.nativerdf.benchmark.OverflowBenchmarkSynthetic.loadLotsOfDataEmptyStore(OverflowBenchmarkSynthetic.java:86)
	at org.eclipse.rdf4j.sail.nativerdf.benchmark.generated.OverflowBenchmarkSynthetic_loadLotsOfDataEmptyStore_jmhTest.loadLotsOfDataEmptyStore_avgt_jmhStub(OverflowBenchmarkSynthetic_loadLotsOfDataEmptyStore_jmhTest.java:232)
	at org.eclipse.rdf4j.sail.nativerdf.benchmark.generated.OverflowBenchmarkSynthetic_loadLotsOfDataEmptyStore_jmhTest.loadLotsOfDataEmptyStore_AverageTime(OverflowBenchmarkSynthetic_loadLotsOfDataEmptyStore_jmhTest.java:173)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)


@abrokenjester
Contributor Author

Related: #2998

@hmottestad
Contributor

Not going to dig any further now. I've spent like 3 hours on this because IntelliJ was acting up and running the benchmarks takes a long time.

@abrokenjester
Contributor Author

I'm becoming more and more convinced that we should invest in replacing the current MemoryOverflowModel with a completely different implementation, something that just relies on standard Java Object Serialization or some simple abstraction like MapDB for disk syncing. I will try to spike something and see what we get.
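A very rough sketch of the plain-serialization direction, not a design proposal; it assumes the Statement implementations in use are Serializable, which I believe holds for the default rdf4j model classes:

import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.eclipse.rdf4j.model.Statement;

class SpillFileSketch implements Closeable {

	private final ObjectOutputStream out;

	SpillFileSketch(File file) throws IOException {
		// buffer writes so we don't pay one I/O call per statement
		this.out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
	}

	void spill(Statement st) throws IOException {
		out.writeObject(st);
	}

	@Override
	public void close() throws IOException {
		out.close();
	}
}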

@hmottestad
Contributor

Maybe for the time being just try to adjust the point where the model overflows to disk. Perhaps you can add some logging to show what it currently thinks is going on memory-wise, so you can see which numbers it's comparing. Also, maybe try without -Xms in the benchmark.

The real world benchmark is best to start with I think.

@hmottestad hmottestad force-pushed the GH-2799-memoryoverflowmodel-oom branch from 1e55523 to 1716d06 Compare April 23, 2021 14:21
@hmottestad hmottestad force-pushed the GH-2799-memoryoverflowmodel-oom branch from 1716d06 to 508e9ee Compare April 23, 2021 14:21
@hmottestad
Contributor

I did some logging and found out that sometimes the algorithm would decide that the max block size was something like 10 MB. This is when it would overflow. So I've just added a hard limit of 32MB of required available memory. This works fine for both my tests (benchmarks) and I feel it's a reasonable tradeoff. How many users give their NativeStore 64MB or 128MB and then load large files?
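In other words, something along these lines gets layered on top of the relative estimate. This is illustrative only; the constant name and exact condition are not a quote of the committed code:

static final long MIN_AVAILABLE_BYTES = 32 * 1024 * 1024; // hard 32 MB floor

static boolean shouldOverflow(long predictedBlockSize) {
	Runtime runtime = Runtime.getRuntime();
	long used = runtime.totalMemory() - runtime.freeMemory();
	long freeToAllocate = runtime.maxMemory() - used;
	// overflow when headroom drops below the absolute floor, even if the
	// predicted block size (e.g. the ~10 MB estimate seen in the logs) is smaller
	return freeToAllocate < Math.max(MIN_AVAILABLE_BYTES, predictedBlockSize);
}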

@abrokenjester
Contributor Author

I did some logging and found out that sometimes the algorithm would decide that the max block size was something like 10 MB. This is when it would overflow. So I've just added a hard limit of 32MB of required available memory. This works fine for both my tests (benchmarks) and I feel it's a reasonable tradeoff. How many users give their NativeStore 64MB or 128MB and then load large files?

This seems reasonable to me as well, nice workaround.


@abrokenjester abrokenjester left a comment


LGTM, happy to have this merged. I'll leave it to you.

@hmottestad hmottestad merged commit 62dacb6 into main Apr 24, 2021
@hmottestad hmottestad deleted the GH-2799-memoryoverflowmodel-oom branch April 24, 2021 09:10
Successfully merging this pull request may close these issues.

OOM in MemoryOverflowModel