
[stdlib] Add Dict._resize_down and _under_load_factor() #3133

Open · wants to merge 2 commits into base: nightly from dict_feature_resize_down
Conversation

@rd4com (Contributor) commented Jun 28, 2024

Hello,

this is a PR to solve a problem mentioned by @bethebunny (in #3128).

It is a good little feature to have, but it probably should not be the default and should instead be parametrized, because of the performance cost of halving memory.

💡 The idea: a Dict that can resize down, not only up.

I did an implementation of it, maybe you'll like it!

It really grows and ungrows dynamically:

at each pop, if self.size < 1/3 of self._reserved(),
the Dict halves its memory by moving to a smaller list capacity.

(So the RAM usage is dynamic and changes with the size.)

When pop is used on all elements of a dict of 256 elements,
the reserved capacity transitions through 512, 256, 128, 64, 32, 16, 8, 4.

That way, it doubles at 2/3 and halves at 1/3.

 

The ratio is quite simple and it seems to do the job
(it mirrors the existing resize-up rule, just triggered below 1/3 of self._reserved() instead of above 2/3):

0.33 * 256 == 84.48  <- we are below (we can resize down)
0.66 * 128 == 84.48  <- we are below (no need to resize up again)

0.66 * 256 == 168.96 <- we are above (we can resize up)
0.33 * 512 == 168.96 <- we are above (no need to resize down again)
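
For concreteness, here is a minimal, hypothetical sketch of the two checks as free functions
(the real methods live on Dict and may be written differently; the exact integer form of the
predicates below is only an illustration of the 2/3 and 1/3 thresholds):

fn over_load_factor(size: Int, reserved: Int) -> Bool:
    # grow when more than 2/3 of the reserved slots are used
    return 3 * size > 2 * reserved

fn under_load_factor(size: Int, reserved: Int) -> Bool:
    # shrink when fewer than 1/3 of the reserved slots are used
    return 3 * size < reserved

def main():
    print(under_load_factor(84, 256))   # True:  84 entries in 256 slots -> can halve to 128
    print(over_load_factor(84, 128))    # False: after halving, no immediate re-grow
    print(over_load_factor(171, 256))   # True:  171 entries in 256 slots -> can double to 512
    print(under_load_factor(171, 512))  # False: after doubling, no immediate re-shrink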

Example of a Dict cycling through grow and ungrow:

from time import now
from collections import Dict
def main():
    var result: Int=0
    var dict_size = 256
    var start = now()
    var stop = now()
    
    result = 0

    small2 = Dict[Int,Int]()
    start = now()
    for y in range(2):
        for i in range(dict_size):
            # grow the dictionary up to its full size
            print("before",small2._reserved(), len(small2))
            small2[i]=i
            print("after",small2._reserved(), len(small2))

        for i in range(dict_size):
            # ungrow the dictionary
            print("before",small2._reserved(), len(small2))
            result += small2.pop(i)
            print("after",small2._reserved(), len(small2))
    stop = now()
    print(stop-start, result)

The result is correct (65280):

def main():
    x = 0
    for y in range(2):
        for i in range(256):
            x += i
    print(x)  # 65280

 

Maybe this kind of feature needs to be parametrized, so users can choose?
(Resizing down reduces performance, so it may not be a good default.)

  • A DictConfig struct as a default Dict parameter is a possibility
    • but we might end up in a parametrization challenge later
  • Another Dict type is also a possibility (something like a BalloonDictionary)?

I'd suggest the second option, but I'm struggling with naming here.
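
To make the first option concrete, here is a hypothetical sketch of gating the shrinking
logic behind a compile-time flag (the function and parameter names are made up for
illustration; nothing like this exists on the real Dict yet):

fn maybe_shrink[shrink_enabled: Bool](size: Int, reserved: Int) -> Int:
    # returns the reserved capacity to use after a pop
    @parameter
    if shrink_enabled:
        if 3 * size < reserved and reserved > 8:
            return reserved // 2  # under 1/3 full: halve, keeping a minimum capacity
    return reserved               # default build: capacity never shrinks

def main():
    print(maybe_shrink[True](10, 64))   # 32
    print(maybe_shrink[False](10, 64))  # 64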

 

If reviewers like it, maybe we can bring it to life with more reviews?

(There are more methods where the logic needs to be applied, and tests to add.)

Note that the hard work has been done by the people working on Dict;
this PR just adds a halving part, which uses the same ratio as the doubling.

@rd4com rd4com requested a review from a team as a code owner June 28, 2024 10:23
@@ -790,7 +790,16 @@ struct Dict[K: KeyElement, V: CollectionElement](
var entry_value = entry[].unsafe_take()
entry[] = None
self.size -= 1
return entry_value.value^
var tmp = entry_value.value^ #TODO: not to merge with current PR
#It is necessary for def test_pop_default()
@gryznar (Contributor) commented Jun 28, 2024
Maybe this comment should be part of the pull request, not the code itself?

Another contributor commented:
Why not just return entry_value.value^ directly?

@rd4com (Contributor, Author) replied:
👍 Thanks @bethebunny, @gryznar,

it is related to this issue: the test test_pop_default() would fail otherwise.

I think it has to do with values needing to be fully initialized to be ready for __del__,
but we move the value out anyway, so maybe it has to do with something else?

@rd4com force-pushed the dict_feature_resize_down branch 2 times, most recently from 614feb7 to ade41f9 (July 1, 2024 06:26)
@rd4com (Contributor, Author) commented Jul 1, 2024

✅ Changes addressing the review by @bethebunny:

  • Unify _resize_down, _resize_up and _compact in _maybe_resize (a rough sketch of the combined check appears below, just before the benchmarks)

✅ Changes addressing the review by @gryznar:

#TODO: not to merge with current PR
#It is necessary for def test_pop_default()
# See PR #2796 (unmerged, but ready, by helelex), 
# [stdlib] mark dict entry as destroyed in Dict.pop()
#
# _resize_down __del__'s the current self._entries to replace them with a smaller one
# is that why the test fails?

☑️ Still need to add these back into the new logic:

  • debug_assert(right < self._reserved(), "Invalid dict state")
  • debug_assert(entry[].__bool__(), "Logic error")
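
As a rough illustration of the unified check mentioned above (hypothetical code: the struct,
its fields, and the minimum capacity are made up, and the real _maybe_resize also has to
rehash and compact the entries rather than just track a number):

struct ToyTable:
    var size: Int
    var reserved: Int

    fn __init__(inout self):
        self.size = 0
        self.reserved = 8

    fn _maybe_resize(inout self):
        if 3 * self.size > 2 * self.reserved:
            self.reserved *= 2   # over 2/3 full: double
        elif 3 * self.size < self.reserved and self.reserved > 8:
            self.reserved //= 2  # under 1/3 full: halve, down to a minimum capacity

def main():
    var t = ToyTable()
    for i in range(256):
        t.size += 1
        t._maybe_resize()
    print(t.reserved)  # grew by repeated doubling
    for i in range(256):
        t.size -= 1
        t._maybe_resize()
    print(t.reserved)  # shrank back by repeated halving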

⏱️ Benchmarks:

It is slower on pop, but we're good on insertion.

insert (each line shows: result, elapsed time from now() in ns)

A (PR): 8380416 755676
B:      8380416 913786

from time import now
from collections import Dict
def main():
    var dict_size = 1<<10
    var x = Dict[Int,Int]()
    var result = 0
    var start = now()
    var stop = now()
    start = now()
    for n in range(16):
        x = Dict[Int,Int]()
        for i in range(dict_size):
            x[i] = i
            result += x[i]
    stop = now()
    
    print(result, stop-start)

pop (each line shows: result, elapsed time from now() in ns)

A (PR): 523776 79933
B:      523776 28685

A (PR): 549755289600 251504430
B:      549755289600 84026254

from time import now
from collections import Dict
def main():
    var dict_size = 1<<10
    var result = 0

    var start = now()
    var stop = now()
    
    var x = Dict[Int,Int]()
    for i in range(dict_size):
        x[i] = i
        
    start = now()
    for i in range(dict_size):
        result += x.pop(i)
    stop = now() 
    
    print(result, stop-start)

@martinvuyk (Contributor) commented, quoting the PR description:

"Maybe this kind of feature needs to be parametrized, so users can choose?"

Hi, yes, it would be very interesting to be able to use the std Dict in a memory-constrained environment like a microcontroller. Maybe something like a memory_constrained: Bool = False parameter. But it could also be that we end up developing a specialized version for such environments as an extension to the stdlib 🤷‍♂️ (see proposal #3232)

@JoeLoser (Collaborator) left a comment

Sorry for leaving this one hanging, @rd4com. The PR LGTM, do you mind rebasing so we can land this?

I'd hold off on the DictConfig idea or separate dict type for now.

@JoeLoser added the "waiting for response" label (Needs action/response from contributor before a PR can proceed) on Nov 22, 2024
@rd4com force-pushed the dict_feature_resize_down branch from ade41f9 to b410ae7 (November 22, 2024 19:53)
@rd4com (Contributor, Author) commented Nov 22, 2024

Hello, all good 👍,
we also get an extra speedup by revisiting this! 🥳

Added more tests, including one using the test_utils for __del__ counts.
Also fixed a small thing in _find_slot by using a Pointer (it was doing a __copyinit__ of the dict entry);
that results in a 2.27x speedup, depending on the type, for all methods (insert, pop, __getitem__, ...),
so it should now be even faster than before.
(There was no Pointer/Reference when Dict got created, so that's not a bug, just integrating the feature.)
(Shout out to clattner, who put us on the path to also focus on integration in the last community meeting 👍)
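
To illustrate the kind of per-lookup cost that avoiding the __copyinit__ removes, here is a
hypothetical example (the Noisy struct is made up and is not the real DictEntry; the actual
change is inside Dict._find_slot, which now reads the entry through a pointer instead of
copying it):

struct Noisy:
    var id: Int

    fn __init__(inout self, id: Int):
        self.id = id

    fn __copyinit__(inout self, existing: Self):
        print("copyinit")  # the extra work a copy performs on every lookup
        self.id = existing.id

fn read_borrowed(x: Noisy) -> Int:
    # fn arguments are borrowed by default: reading x.id does not copy the struct
    return x.id

def main():
    var a = Noisy(1)
    var b = a                # explicit copy: __copyinit__ runs and prints
    print(read_borrowed(a))  # borrowed access: no copy, nothing extra printed
    print(b.id)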

small benchmark (1.7x):

from time import now
from random import *
from collections import Dict

alias iteration_size = 1024
def main():
    var result: Int=0
    var start = now()
    var stop = now()

    
    small = Dict[Int,Int]()
    start = now()
    for x in range(100):
        for i in range(iteration_size):
            small[i]=i
        for i in range(iteration_size): 
            result += small[i]
    stop = now()
    _ = small
    print(stop-start, result)

🔥 6.5x speedup

from time import now
from random import *
from sys.param_env import is_defined
from collections import Dict

def main():
    var dict_size = 1<<12
    var result = 0

    var start = now()
    var stop = now()
    
    var x = Dict[String, String]()
     
    for i in range(dict_size):
        var val = str(i)
        x[val] = val

    start = now()
    for i in x.keys():
        result += len(x[i[]])
    stop = now() 
    
    print(result, stop-start)

2x for **kwargs

from time import now

fn sum_args(**kwargs: Int) raises ->Int:
    var res = 0
    for k in kwargs:
        res+=(kwargs[k[]])
    return res

def main():
    var iterations = 1<<10
    var result = 0

    var start = now()
    var stop = now()
    
        
    start = now()
    for i in range(iterations):
        result += sum_args(a=i, b=i+1, c=i+2)
    stop = now() 
    
    print(result, stop-start)

It is difficult to measure, but Pointer should be faster anyway;
please do some benchmarks if needed 👍

There is also another idea to speed up **kwargs by using Dict[UInt, V]
and just rebind[UInt](int("arg1".unsafe_ptr()))!
(but I'm not sure whether it is a good idea to have a pointer as K or not)
