
This benchmark compares the memory usage and the performance of the trie implementations when storing n entries, then trimming the trie/optimizing it for read-only operations, and then performing a lookup of every key in an order different from the insertion order. Times are given in milliseconds and memory usage in bytes.
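
A minimal sketch of this benchmark loop is shown below; the `trimToRead()` call is a hypothetical stand-in for the trim/read-optimization step of the trie implementations, and the rest only uses the standard `java.util.Map` API.

```java
import java.util.*;

public class LookupBenchmark {

    public static void main(String[] args) {
        // Hypothetical key set; the real benchmarks use the dictionaries described below.
        List<String> insertOrder = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            insertOrder.add("key" + i);
        }
        // Look-ups happen in an order different from the insertion order.
        List<String> lookupOrder = new ArrayList<>(insertOrder);
        Collections.shuffle(lookupOrder, new Random(0));

        Map<String, Integer> map = new HashMap<>();

        long start = System.currentTimeMillis();
        int value = 0;
        for (String key : insertOrder) {
            map.put(key, value++);
        }
        long insertTime = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        // map.trimToRead(); // hypothetical trim/read-optimization step, a no-op for HashMap/TreeMap
        long trimTime = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (String key : lookupOrder) {
            if (map.get(key) == null) {
                throw new AssertionError("missing key: " + key);
            }
        }
        long readTime = System.currentTimeMillis() - start;

        System.out.println(insertTime + " ms insert, " + trimTime + " ms trim, " + readTime + " ms read");
    }
}
```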

All implementations are instantiated with the default constructor.

The CompositeTrie implementation uses a FastArrayTrie for the first 2 levels and CompactArrayTrie instances for the deeper levels.

The BloomFilteredTrie implementation wraps a CompactArrayTrie and adds a Bloom filter of 1024*64 bits that uses the same hash function as String.

Since keys are looked up as Strings, HashMap is given an advantage because the hash code is pre-computed and cached, but the hash computation is very fast, so HashMap would still be faster if it had to compute the hash every time. To get an idea of how much slower HashMap would be without the cached hash code, look at the time difference between CompactArrayTrie and BloomFilteredTrie: a BloomFilteredTrie lookup costs the same as a CompactArrayTrie lookup plus one hash computation and one BitSet lookup.
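
As a rough illustration of that extra cost, here is a minimal sketch of a Bloom-filter pre-check of this kind, assuming a single String.hashCode()-based probe into a 1024*64-bit BitSet in front of a plain map; the actual BloomFilteredTrie may compute its bit index differently.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of a Bloom-filtered lookup wrapper: one hash computation and one
// BitSet lookup in front of the wrapped structure. The bit-index computation
// is an assumption; the real BloomFilteredTrie may differ.
public class BloomFilteredMap<V> {

    private static final int BITS = 1024 * 64;

    private final Map<String, V> delegate = new HashMap<>();
    private final BitSet filter = new BitSet(BITS);

    public void put(String key, V value) {
        filter.set(Math.floorMod(key.hashCode(), BITS));
        delegate.put(key, value);
    }

    public V get(String key) {
        // If the bit is clear, the key is definitely absent and the
        // delegate lookup can be skipped entirely.
        if (!filter.get(Math.floorMod(key.hashCode(), BITS))) {
            return null;
        }
        return delegate.get(key);
    }
}
```

Note that in this benchmark every looked-up key is present, so the filter never short-circuits a lookup and only adds the cost of the hash and the BitSet probe.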

English dictionary

This test has been done on the aspell dictionary for the English language (generated with `aspell -l en dump master | aspell expand | sort | uniq`, then randomized), which contains 138349 words.

| Implementation | Insertion time (ms) | Trimming/read optimization time (ms) | Read time (ms) | Size in memory after insertion (bytes) | Size in memory after trimming/optimization (bytes) |
|---|---|---|---|---|---|
| HashMap | 33 | 0 | 54 | 8590936 | 8590936 |
| TreeMap | 231 | 0 | 242 | 9961192 | 9961192 |
| CompactArrayTrie | 129 | 180 | 81 | 7297072 | 4365696 |
| BloomFilteredTrie | 142 | 185 | 92 | 7319728 | 4388352 |
| FastArrayTrie | 174 | 397 | 129 | 23628576 | 16583784 |
| CompositeTrie | 95 | 89 | 75 | 9145248 | 4544416 |
| CompactArrayRadixTrie | 247 | 218 | 112 | 8143592 | 2876896 |

As you can see, the best trie implementations (CompositeTrie and CompactArrayTrie) are not that much slower than HashMap. Regarding memory, all trie implementations except FastArrayTrie are 2 to 3 times as memory-efficient as HashMap.

Thick generated dictionary

The same test has been done with a dictionary whose content is A AA AAA AAB AAC ... 997 998 999.

It contains 242234 words whose length varies from 1 to 3.
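
A sketch of how such a dictionary can be generated is shown below. The 62-character alphabet ([A-Za-z0-9]) is an assumption, but it matches the stated size, since 62 + 62² + 62³ = 242234.

```java
import java.util.ArrayList;
import java.util.List;

public class ThickDictionary {

    // Assumed alphabet: 62 characters, which matches the stated word count
    // (62 + 62^2 + 62^3 = 242234).
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        generate("", 3, words);
        System.out.println(words.size()); // 242234
    }

    // Appends every non-empty extension of the prefix, up to maxLength characters.
    private static void generate(String prefix, int maxLength, List<String> out) {
        if (!prefix.isEmpty()) {
            out.add(prefix);
        }
        if (prefix.length() == maxLength) {
            return;
        }
        for (int i = 0; i < ALPHABET.length(); i++) {
            generate(prefix + ALPHABET.charAt(i), maxLength, out);
        }
    }
}
```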

| Implementation | Insertion time (ms) | Trimming/read optimization time (ms) | Read time (ms) | Size in memory after insertion (bytes) | Size in memory after trimming/optimization (bytes) |
|---|---|---|---|---|---|
| HashMap | 163 | 0 | 193 | 16288152 | 16288152 |
| TreeMap | 435 | 0 | 447 | 17440912 | 17440912 |
| CompactArrayTrie | 203 | 157 | 97 | 3655792 | 3655792 |
| BloomFilteredTrie | 213 | 155 | 103 | 3664048 | 3664048 |
| FastArrayTrie | 66 | 99 | 49 | 4999808 | 4641664 |
| CompositeTrie | 111 | 56 | 96 | 14286384 | 3949792 |
| CompactArrayRadixTrie | 604 | 347 | 123 | 9991640 | 3877304 |

This time, most of the trie implementations perform much better than HashMap while also being more memory-efficient.

Generated dictionary of sentences

This dictionary has been generated with all the combinations of 1 to 6 words from the list "the", "brown", "fox", "jumps", "over", "the", "lazy" and "dog". It contains 299592 entries.
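
A sketch of how such a dictionary can be generated is shown below, treating the entries as space-separated word sequences (the separator is an assumption). The stated entry count matches, since 8 + 8² + 8³ + 8⁴ + 8⁵ + 8⁶ = 299592 when each of the 1 to 6 positions can take any of the 8 listed words.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceDictionary {

    // The 8-word list from the text ("the" appears twice).
    private static final String[] WORDS =
            { "the", "brown", "fox", "jumps", "over", "the", "lazy", "dog" };

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        generate("", 0, 6, sentences);
        System.out.println(sentences.size()); // 8 + 8^2 + ... + 8^6 = 299592
    }

    // Appends every sequence of 1 to maxWords words, space-separated.
    private static void generate(String prefix, int length, int maxWords, List<String> out) {
        if (length > 0) {
            out.add(prefix);
        }
        if (length == maxWords) {
            return;
        }
        for (String word : WORDS) {
            generate(prefix.isEmpty() ? word : prefix + " " + word, length + 1, maxWords, out);
        }
    }
}
```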

| Implementation | Insertion time (ms) | Trimming/read optimization time (ms) | Read time (ms) | Size in memory after insertion (bytes) | Size in memory after trimming/optimization (bytes) |
|---|---|---|---|---|---|
| HashMap | 121 | 0 | 150 | 8551184 | 8551264 |
| TreeMap | 643 | 0 | 670 | 9882496 | 9882496 |
| CompactArrayTrie | 453 | 368 | 248 | 14622832 | 7686448 |
| BloomFilteredTrie | 486 | 378 | 278 | 14631088 | 7680304 |
| FastArrayTrie | 685 | 895 | 406 | 33132752 | 19451232 |
| CompositeTrie | 424 | 315 | 256 | 12799800 | 7687704 |
| CompactArrayRadixTrie | 760 | 272 | 227 | 8256464 | 2510136 |

HashMap becomes the fastest lookup structure again, but CompactArrayRadixTrie, the fastest trie implementation here, is only about 1.5 times slower while being 3.4 times as memory-efficient, which makes it a good compromise.
