OverwriteEmptyByte instead of ClearByte in GrowToNextAllocSize. #4754

Open: 2 commits to merge into base: trunk

goldvitaly (Contributor)

This is a simplification that allows us to remove `ClearDeleted` and is
intended to have similar performance.

The benchmark differences are close to noise, leaning positive on x86 and
negative on ARM.

  1. On ARM:
    `&= ~(uint64_t{0xff} << (byte_index * 8))` -> `|= (uint64_t{tag | 0x80} << (byte_index * 8))`.
    This operation even reduces latency from 4 to 3.
    However, it needs another shift operation to compute the tag. In theory, it should not be slower.

  2. On x86:
    `new_metadata[new_index ^ old_size] = 0;` -> `new_metadata[new_index] = tag | MetadataGroup::PresentMask;`.
    Here the latency is the same, but an extra shift is needed to compute the tag.
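
To make the two word-level expressions above concrete, here is a minimal standalone sketch of what each one does to a single 64-bit metadata word. It is not the code from this PR. The function names are borrowed from the PR title, but their signatures, the `PresentMask` constant (set to the `0x80` used in the ARM expression), and the assumption that a tag fits in the low 7 bits are all hypothetical and chosen for illustration only.

```cpp
#include <cstdint>

// Hypothetical stand-in for MetadataGroup::PresentMask; 0x80 is the value
// that appears in the ARM expression quoted above.
constexpr std::uint8_t PresentMask = 0x80;

// Old form: mask one byte of the word to zero.
constexpr auto ClearByte(std::uint64_t word, int byte_index) -> std::uint64_t {
  return word & ~(std::uint64_t{0xff} << (byte_index * 8));
}

// New form: OR a present tag into a byte that is assumed to already be zero.
// If the byte were not empty, the OR would merge bits instead of overwriting.
constexpr auto OverwriteEmptyByte(std::uint64_t word, int byte_index,
                                  std::uint8_t tag) -> std::uint64_t {
  return word |
         (static_cast<std::uint64_t>(tag | PresentMask) << (byte_index * 8));
}

// Byte 2 (bits 16..23) holds 0x66 and is cleared to 0x00.
static_assert(ClearByte(0x1122334455667788ULL, 2) == 0x1122334455007788ULL,
              "ClearByte zeroes exactly one byte");

// The empty byte 2 receives 0x15 | 0x80 == 0x95.
static_assert(OverwriteEmptyByte(0x1122334455007788ULL, 2, 0x15) ==
                  0x1122334455957788ULL,
              "OverwriteEmptyByte writes tag | PresentMask into an empty byte");
```

The OR form needs no inverted mask, but it is only correct when the target byte is known to be empty, which is the precondition the name `OverwriteEmptyByte` suggests.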

Benchmarks on X86:

name                                                  old cpu/op   new cpu/op   delta
BM_MapInsertSeq<Map<int, int>>/1                      32.7ns ± 4%  32.8ns ± 3%     ~     (p=0.354 n=52+52)
BM_MapInsertSeq<Map<int, int>>/2                      42.3ns ± 3%  40.4ns ± 4%   -4.56%  (p=0.000 n=54+55)
BM_MapInsertSeq<Map<int, int>>/3                      50.5ns ± 3%  51.1ns ± 3%   +1.22%  (p=0.000 n=53+56)
BM_MapInsertSeq<Map<int, int>>/4                      59.9ns ± 4%  60.8ns ± 4%   +1.45%  (p=0.000 n=54+57)
BM_MapInsertSeq<Map<int, int>>/8                      98.8ns ± 4%  98.3ns ± 3%   -0.50%  (p=0.044 n=56+56)
BM_MapInsertSeq<Map<int, int>>/16                      163ns ± 4%   158ns ± 4%   -2.77%  (p=0.000 n=55+51)
BM_MapInsertSeq<Map<int, int>>/32                      281ns ± 4%   257ns ± 3%   -8.30%  (p=0.000 n=55+55)
BM_MapInsertSeq<Map<int, int>>/64                      646ns ± 4%   624ns ± 3%   -3.41%  (p=0.000 n=55+55)
BM_MapInsertSeq<Map<int, int>>/256                    2.42µs ± 4%  2.38µs ± 3%   -1.57%  (p=0.000 n=53+56)
BM_MapInsertSeq<Map<int, int>>/4096                   38.4µs ± 3%  38.6µs ± 3%     ~     (p=0.118 n=54+52)
BM_MapInsertSeq<Map<int, int>>/65536                   968µs ± 3%   972µs ± 3%     ~     (p=0.333 n=56+57)
BM_MapInsertSeq<Map<int, int>>/1048576                16.3ms ± 5%  16.3ms ± 6%     ~     (p=0.269 n=56+55)
BM_MapInsertSeq<Map<int, int>>/16777216                646ms ± 3%   646ms ± 4%     ~     (p=0.811 n=57+55)
BM_MapInsertSeq<Map<int, int>>/56                      436ns ± 3%   423ns ± 3%   -2.89%  (p=0.000 n=56+57)
BM_MapInsertSeq<Map<int, int>>/224                    1.66µs ± 4%  1.63µs ± 4%   -2.36%  (p=0.000 n=55+57)
BM_MapInsertSeq<Map<int, int>>/3584                   25.1µs ± 4%  25.4µs ± 5%   +1.03%  (p=0.002 n=55+57)
BM_MapInsertSeq<Map<int, int>>/57344                   559µs ± 3%   567µs ± 4%   +1.31%  (p=0.000 n=56+57)
BM_MapInsertSeq<Map<int, int>>/917504                 10.4ms ± 4%  10.4ms ± 4%     ~     (p=0.417 n=56+55)
BM_MapInsertSeq<Map<int, int>>/14680064                420ms ± 3%   420ms ± 3%     ~     (p=0.740 n=57+55)
BM_MapInsertSeq<Map<int*, int*>>/1                    33.8ns ± 3%  33.9ns ± 3%     ~     (p=0.191 n=55+56)
BM_MapInsertSeq<Map<int*, int*>>/2                    37.0ns ± 3%  37.0ns ± 4%     ~     (p=0.948 n=55+56)
BM_MapInsertSeq<Map<int*, int*>>/3                    41.5ns ± 4%  41.6ns ± 4%     ~     (p=0.503 n=56+56)
BM_MapInsertSeq<Map<int*, int*>>/4                    46.1ns ± 4%  47.2ns ± 4%   +2.31%  (p=0.000 n=56+56)
BM_MapInsertSeq<Map<int*, int*>>/8                    63.6ns ± 4%  65.4ns ± 3%   +2.89%  (p=0.000 n=57+56)
BM_MapInsertSeq<Map<int*, int*>>/16                    133ns ± 4%   128ns ± 4%   -3.74%  (p=0.000 n=57+56)
BM_MapInsertSeq<Map<int*, int*>>/32                    237ns ± 3%   236ns ± 4%     ~     (p=0.082 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/64                    597ns ± 3%   631ns ± 3%   +5.73%  (p=0.000 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/256                  2.78µs ± 3%  2.89µs ± 4%   +3.82%  (p=0.000 n=57+56)
BM_MapInsertSeq<Map<int*, int*>>/4096                 51.9µs ± 3%  54.2µs ± 4%   +4.46%  (p=0.000 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/65536                1.16ms ± 3%  1.18ms ± 4%   +1.10%  (p=0.000 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/1048576              28.9ms ± 4%  29.0ms ± 4%     ~     (p=0.549 n=57+54)
BM_MapInsertSeq<Map<int*, int*>>/16777216              914ms ± 3%   912ms ± 3%     ~     (p=0.147 n=57+54)
BM_MapInsertSeq<Map<int*, int*>>/56                    366ns ± 3%   407ns ± 3%  +11.27%  (p=0.000 n=57+56)
BM_MapInsertSeq<Map<int*, int*>>/224                  1.86µs ± 4%  1.96µs ± 3%   +5.18%  (p=0.000 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/3584                 33.9µs ± 4%  35.9µs ± 3%   +6.04%  (p=0.000 n=57+52)
BM_MapInsertSeq<Map<int*, int*>>/57344                 763µs ± 3%   772µs ± 4%   +1.25%  (p=0.000 n=57+57)
BM_MapInsertSeq<Map<int*, int*>>/917504               16.8ms ±11%  16.6ms ± 6%     ~     (p=0.454 n=57+52)
BM_MapInsertSeq<Map<int*, int*>>/14680064              610ms ± 2%   608ms ± 3%     ~     (p=0.127 n=57+55)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1          34.6ns ± 4%  34.5ns ± 3%     ~     (p=0.331 n=56+52)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/2          43.9ns ± 4%  37.8ns ± 3%  -13.78%  (p=0.000 n=57+55)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3          47.9ns ± 4%  48.9ns ± 3%   +2.16%  (p=0.000 n=56+55)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4          53.6ns ± 3%  54.0ns ± 3%   +0.72%  (p=0.012 n=56+54)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/8          84.6ns ± 4%  81.0ns ± 3%   -4.21%  (p=0.000 n=57+50)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16          142ns ± 3%   136ns ± 3%   -4.22%  (p=0.000 n=57+55)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/32          258ns ± 4%   245ns ± 3%   -4.96%  (p=0.000 n=57+55)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/64          699ns ± 4%   692ns ± 3%   -1.02%  (p=0.000 n=50+50)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/256        2.96µs ± 6%  2.97µs ± 4%     ~     (p=0.098 n=57+54)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4096       50.4µs ± 3%  50.7µs ± 2%   +0.53%  (p=0.030 n=54+53)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/65536      1.38ms ± 2%  1.42ms ± 3%   +2.76%  (p=0.000 n=54+56)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1048576    37.8ms ± 4%  38.1ms ± 4%   +0.78%  (p=0.016 n=57+54)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16777216    1.19s ± 2%   1.20s ± 3%     ~     (p=0.055 n=57+54)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/56          430ns ± 3%   413ns ± 3%   -4.10%  (p=0.000 n=54+57)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/224        1.94µs ± 8%  1.93µs ± 6%     ~     (p=0.587 n=57+56)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3584       32.1µs ± 4%  32.2µs ± 3%     ~     (p=0.403 n=56+57)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/57344       746µs ± 3%   760µs ± 3%   +1.93%  (p=0.000 n=57+56)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/917504     22.0ms ± 6%  22.2ms ± 5%   +0.90%  (p=0.025 n=57+53)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/14680064    734ms ± 2%   736ms ± 3%     ~     (p=0.107 n=57+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1          40.8ns ± 4%  40.8ns ± 4%     ~     (p=0.685 n=54+54)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/2          49.5ns ± 4%  49.6ns ± 7%     ~     (p=0.927 n=56+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3          58.1ns ± 4%  58.2ns ± 4%     ~     (p=0.632 n=55+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4          66.7ns ± 4%  67.0ns ± 5%     ~     (p=0.235 n=55+56)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/8           104ns ± 6%   105ns ± 6%     ~     (p=0.171 n=55+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16          196ns ± 5%   197ns ± 6%     ~     (p=0.189 n=54+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/32          361ns ± 8%   364ns ± 7%     ~     (p=0.156 n=53+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/64         1.05µs ± 6%  1.04µs ± 6%     ~     (p=0.069 n=56+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/256        5.32µs ± 5%  5.24µs ± 4%   -1.46%  (p=0.000 n=57+54)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4096        148µs ± 4%   147µs ± 4%   -0.65%  (p=0.028 n=55+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/65536      3.17ms ± 3%  3.13ms ± 2%   -1.49%  (p=0.000 n=54+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1048576    99.1ms ± 3%  98.3ms ± 3%   -0.89%  (p=0.000 n=57+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16777216    2.40s ± 3%   2.40s ± 2%     ~     (p=0.247 n=56+54)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/56          612ns ± 6%   612ns ± 8%     ~     (p=0.968 n=52+56)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/224        3.56µs ± 7%  3.51µs ± 4%   -1.40%  (p=0.006 n=57+55)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3584       87.3µs ± 5%  88.5µs ± 6%   +1.30%  (p=0.011 n=57+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/57344      1.95ms ± 3%  1.93ms ± 4%   -0.77%  (p=0.000 n=55+57)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/917504     61.2ms ± 4%  61.0ms ± 3%     ~     (p=0.059 n=57+54)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/14680064    1.50s ± 3%   1.50s ± 3%     ~     (p=0.609 n=57+55)

Benchmarks on ARM:

name                                                  old cpu/op   new cpu/op   delta
BM_MapInsertSeq<Map<int, int>>/1                      39.6ns ± 1%  39.6ns ± 1%    ~     (p=0.416 n=155+156)
BM_MapInsertSeq<Map<int, int>>/2                      44.6ns ± 1%  44.6ns ± 1%    ~     (p=0.993 n=155+157)
BM_MapInsertSeq<Map<int, int>>/3                      50.1ns ± 3%  50.1ns ± 2%    ~     (p=0.585 n=156+157)
BM_MapInsertSeq<Map<int, int>>/4                      55.7ns ± 3%  55.4ns ± 1%  -0.61%  (p=0.000 n=156+119)
BM_MapInsertSeq<Map<int, int>>/8                      78.0ns ± 5%  77.2ns ± 1%  -1.01%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/16                      123ns ± 5%   121ns ± 0%  -1.32%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/32                      216ns ± 6%   213ns ± 0%  -1.51%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/64                      644ns ± 4%   659ns ± 3%  +2.19%  (p=0.000 n=157+146)
BM_MapInsertSeq<Map<int, int>>/256                    3.22µs ± 4%  3.30µs ± 5%  +2.56%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/4096                   53.8µs ± 3%  55.5µs ± 2%  +3.18%  (p=0.000 n=157+144)
BM_MapInsertSeq<Map<int, int>>/65536                  1.29ms ± 3%  1.33ms ± 4%  +2.97%  (p=0.000 n=155+155)
BM_MapInsertSeq<Map<int, int>>/1048576                30.2ms ± 8%  31.0ms ± 9%  +2.58%  (p=0.000 n=157+156)
BM_MapInsertSeq<Map<int, int>>/16777216                902ms ±14%   924ms ±17%  +2.47%  (p=0.005 n=157+157)
BM_MapInsertSeq<Map<int, int>>/56                      350ns ± 6%   345ns ± 0%  -1.64%  (p=0.023 n=157+119)
BM_MapInsertSeq<Map<int, int>>/224                    2.09µs ± 5%  2.14µs ± 5%  +2.01%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/3584                   36.6µs ± 4%  37.3µs ± 3%  +1.76%  (p=0.000 n=157+142)
BM_MapInsertSeq<Map<int, int>>/57344                   903µs ± 5%   923µs ± 4%  +2.23%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/917504                 21.7ms ± 8%  22.0ms ± 8%  +1.52%  (p=0.000 n=157+153)
BM_MapInsertSeq<Map<int, int>>/14680064                678ms ±18%   692ms ±18%  +2.03%  (p=0.037 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/1                    40.7ns ± 1%  40.7ns ± 1%    ~     (p=0.216 n=152+153)
BM_MapInsertSeq<Map<int*, int*>>/2                    45.5ns ± 1%  45.5ns ± 1%    ~     (p=0.403 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/3                    51.1ns ± 1%  51.1ns ± 1%    ~     (p=0.962 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/4                    57.5ns ± 4%  59.7ns ± 1%  +3.86%  (p=0.000 n=157+149)
BM_MapInsertSeq<Map<int*, int*>>/8                    80.3ns ± 4%  82.3ns ± 1%  +2.52%  (p=0.000 n=157+108)
BM_MapInsertSeq<Map<int*, int*>>/16                    127ns ± 4%   129ns ± 4%  +1.57%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/32                    229ns ± 4%   230ns ± 4%  +0.49%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/64                    695ns ± 3%   705ns ± 4%  +1.37%  (p=0.000 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/256                  3.64µs ± 7%  3.46µs ± 3%  -4.93%  (p=0.000 n=157+154)
BM_MapInsertSeq<Map<int*, int*>>/4096                 58.8µs ± 2%  60.8µs ± 2%  +3.46%  (p=0.000 n=157+148)
BM_MapInsertSeq<Map<int*, int*>>/65536                1.14ms ± 2%  1.17ms ± 3%  +3.23%  (p=0.000 n=156+157)
BM_MapInsertSeq<Map<int*, int*>>/1048576              41.4ms ±12%  42.3ms ±13%  +2.13%  (p=0.001 n=157+156)
BM_MapInsertSeq<Map<int*, int*>>/16777216              1.10s ±20%   1.12s ±22%    ~     (p=0.082 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/56                    377ns ± 5%   372ns ± 7%  -1.19%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/224                  2.40µs ± 9%  2.23µs ± 4%  -6.97%  (p=0.000 n=157+155)
BM_MapInsertSeq<Map<int*, int*>>/3584                 39.6µs ± 3%  40.0µs ± 2%  +1.01%  (p=0.000 n=157+121)
BM_MapInsertSeq<Map<int*, int*>>/57344                 776µs ± 3%   795µs ± 3%  +2.40%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/917504               31.0ms ±13%  31.4ms ±14%    ~     (p=0.055 n=157+156)
BM_MapInsertSeq<Map<int*, int*>>/14680064              776ms ±22%   783ms ±25%    ~     (p=0.231 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1          41.2ns ± 1%  41.2ns ± 1%    ~     (p=0.516 n=155+154)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/2          46.8ns ± 1%  46.8ns ± 1%    ~     (p=0.324 n=155+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3          52.7ns ± 2%  52.7ns ± 2%    ~     (p=0.122 n=155+154)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4          58.8ns ± 3%  58.7ns ± 1%    ~     (p=0.142 n=156+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/8          82.7ns ± 4%  82.2ns ± 1%  -0.66%  (p=0.004 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16          131ns ± 4%   130ns ± 0%  -0.85%  (p=0.001 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/32          231ns ± 4%   229ns ± 0%  -0.94%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/64          727ns ± 3%   744ns ± 3%  +2.28%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/256        3.71µs ± 3%  3.79µs ± 3%  +2.24%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4096       64.4µs ± 2%  66.2µs ± 2%  +2.75%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/65536      1.64ms ± 4%  1.68ms ± 3%  +2.11%  (p=0.000 n=149+145)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1048576    39.5ms ±11%  40.5ms ±13%  +2.32%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16777216    1.14s ±16%   1.16s ±17%    ~     (p=0.082 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/56          375ns ± 4%   371ns ± 0%  -1.03%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/224        2.35µs ± 3%  2.40µs ± 4%  +1.84%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3584       42.4µs ± 2%  43.2µs ± 2%  +1.76%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/57344      1.15ms ± 3%  1.16ms ± 3%  +1.50%  (p=0.000 n=148+144)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/917504     27.4ms ± 8%  27.8ms ±10%  +1.44%  (p=0.002 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/14680064    786ms ±19%   791ms ±20%    ~     (p=0.239 n=157+157)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1          56.3ns ± 1%  56.4ns ± 1%    ~     (p=0.363 n=151+149)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/2          69.7ns ± 0%  69.7ns ± 0%    ~     (p=0.330 n=153+150)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3          83.2ns ± 0%  83.1ns ± 0%    ~     (p=0.129 n=149+154)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4          96.5ns ± 0%  96.5ns ± 0%    ~     (p=0.619 n=153+153)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/8           154ns ± 0%   154ns ± 0%    ~     (p=0.769 n=147+149)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16          292ns ± 1%   292ns ± 1%    ~     (p=0.511 n=146+146)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/32          542ns ± 4%   542ns ± 4%    ~     (p=0.729 n=157+156)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/64         1.61µs ± 6%  1.62µs ± 4%  +0.62%  (p=0.003 n=157+156)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/256        8.60µs ± 5%  8.69µs ± 5%  +1.02%  (p=0.000 n=148+151)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4096        195µs ± 3%   195µs ± 3%    ~     (p=0.668 n=154+155)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/65536      4.45ms ± 4%  4.45ms ± 4%    ~     (p=0.698 n=148+141)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1048576     126ms ±11%   130ms ±10%  +3.06%  (p=0.000 n=157+142)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16777216    3.28s ±12%   3.43s ± 9%  +4.72%  (p=0.000 n=157+137)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/56          942ns ± 5%   939ns ± 5%    ~     (p=0.284 n=156+157)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/224        5.82µs ± 5%  5.86µs ± 5%  +0.63%  (p=0.008 n=153+155)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3584        122µs ± 4%   122µs ± 3%    ~     (p=0.822 n=156+154)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/57344      2.82ms ± 3%  2.82ms ± 3%    ~     (p=0.765 n=150+144)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/917504     79.4ms ± 7%  80.0ms ± 7%  +0.83%  (p=0.008 n=149+148)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/14680064    2.17s ±15%   2.25s ±14%  +3.51%  (p=0.000 n=157+146)

Benchmarks on ARM:
```
name                                                  old cpu/op   new cpu/op   delta
BM_MapInsertSeq<Map<int, int>>/1                      39.6ns ± 1%  39.6ns ± 1%    ~     (p=0.416 n=155+156)
BM_MapInsertSeq<Map<int, int>>/2                      44.6ns ± 1%  44.6ns ± 1%    ~     (p=0.993 n=155+157)
BM_MapInsertSeq<Map<int, int>>/3                      50.1ns ± 3%  50.1ns ± 2%    ~     (p=0.585 n=156+157)
BM_MapInsertSeq<Map<int, int>>/4                      55.7ns ± 3%  55.4ns ± 1%  -0.61%  (p=0.000 n=156+119)
BM_MapInsertSeq<Map<int, int>>/8                      78.0ns ± 5%  77.2ns ± 1%  -1.01%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/16                      123ns ± 5%   121ns ± 0%  -1.32%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/32                      216ns ± 6%   213ns ± 0%  -1.51%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, int>>/64                      644ns ± 4%   659ns ± 3%  +2.19%  (p=0.000 n=157+146)
BM_MapInsertSeq<Map<int, int>>/256                    3.22µs ± 4%  3.30µs ± 5%  +2.56%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/4096                   53.8µs ± 3%  55.5µs ± 2%  +3.18%  (p=0.000 n=157+144)
BM_MapInsertSeq<Map<int, int>>/65536                  1.29ms ± 3%  1.33ms ± 4%  +2.97%  (p=0.000 n=155+155)
BM_MapInsertSeq<Map<int, int>>/1048576                30.2ms ± 8%  31.0ms ± 9%  +2.58%  (p=0.000 n=157+156)
BM_MapInsertSeq<Map<int, int>>/16777216                902ms ±14%   924ms ±17%  +2.47%  (p=0.005 n=157+157)
BM_MapInsertSeq<Map<int, int>>/56                      350ns ± 6%   345ns ± 0%  -1.64%  (p=0.023 n=157+119)
BM_MapInsertSeq<Map<int, int>>/224                    2.09µs ± 5%  2.14µs ± 5%  +2.01%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/3584                   36.6µs ± 4%  37.3µs ± 3%  +1.76%  (p=0.000 n=157+142)
BM_MapInsertSeq<Map<int, int>>/57344                   903µs ± 5%   923µs ± 4%  +2.23%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, int>>/917504                 21.7ms ± 8%  22.0ms ± 8%  +1.52%  (p=0.000 n=157+153)
BM_MapInsertSeq<Map<int, int>>/14680064                678ms ±18%   692ms ±18%  +2.03%  (p=0.037 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/1                    40.7ns ± 1%  40.7ns ± 1%    ~     (p=0.216 n=152+153)
BM_MapInsertSeq<Map<int*, int*>>/2                    45.5ns ± 1%  45.5ns ± 1%    ~     (p=0.403 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/3                    51.1ns ± 1%  51.1ns ± 1%    ~     (p=0.962 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/4                    57.5ns ± 4%  59.7ns ± 1%  +3.86%  (p=0.000 n=157+149)
BM_MapInsertSeq<Map<int*, int*>>/8                    80.3ns ± 4%  82.3ns ± 1%  +2.52%  (p=0.000 n=157+108)
BM_MapInsertSeq<Map<int*, int*>>/16                    127ns ± 4%   129ns ± 4%  +1.57%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/32                    229ns ± 4%   230ns ± 4%  +0.49%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/64                    695ns ± 3%   705ns ± 4%  +1.37%  (p=0.000 n=155+156)
BM_MapInsertSeq<Map<int*, int*>>/256                  3.64µs ± 7%  3.46µs ± 3%  -4.93%  (p=0.000 n=157+154)
BM_MapInsertSeq<Map<int*, int*>>/4096                 58.8µs ± 2%  60.8µs ± 2%  +3.46%  (p=0.000 n=157+148)
BM_MapInsertSeq<Map<int*, int*>>/65536                1.14ms ± 2%  1.17ms ± 3%  +3.23%  (p=0.000 n=156+157)
BM_MapInsertSeq<Map<int*, int*>>/1048576              41.4ms ±12%  42.3ms ±13%  +2.13%  (p=0.001 n=157+156)
BM_MapInsertSeq<Map<int*, int*>>/16777216              1.10s ±20%   1.12s ±22%    ~     (p=0.082 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/56                    377ns ± 5%   372ns ± 7%  -1.19%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/224                  2.40µs ± 9%  2.23µs ± 4%  -6.97%  (p=0.000 n=157+155)
BM_MapInsertSeq<Map<int*, int*>>/3584                 39.6µs ± 3%  40.0µs ± 2%  +1.01%  (p=0.000 n=157+121)
BM_MapInsertSeq<Map<int*, int*>>/57344                 776µs ± 3%   795µs ± 3%  +2.40%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int*, int*>>/917504               31.0ms ±13%  31.4ms ±14%    ~     (p=0.055 n=157+156)
BM_MapInsertSeq<Map<int*, int*>>/14680064              776ms ±22%   783ms ±25%    ~     (p=0.231 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1          41.2ns ± 1%  41.2ns ± 1%    ~     (p=0.516 n=155+154)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/2          46.8ns ± 1%  46.8ns ± 1%    ~     (p=0.324 n=155+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3          52.7ns ± 2%  52.7ns ± 2%    ~     (p=0.122 n=155+154)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4          58.8ns ± 3%  58.7ns ± 1%    ~     (p=0.142 n=156+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/8          82.7ns ± 4%  82.2ns ± 1%  -0.66%  (p=0.004 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16          131ns ± 4%   130ns ± 0%  -0.85%  (p=0.001 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/32          231ns ± 4%   229ns ± 0%  -0.94%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/64          727ns ± 3%   744ns ± 3%  +2.28%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/256        3.71µs ± 3%  3.79µs ± 3%  +2.24%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/4096       64.4µs ± 2%  66.2µs ± 2%  +2.75%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/65536      1.64ms ± 4%  1.68ms ± 3%  +2.11%  (p=0.000 n=149+145)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/1048576    39.5ms ±11%  40.5ms ±13%  +2.32%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/16777216    1.14s ±16%   1.16s ±17%    ~     (p=0.082 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/56          375ns ± 4%   371ns ± 0%  -1.03%  (p=0.000 n=157+119)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/224        2.35µs ± 3%  2.40µs ± 4%  +1.84%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/3584       42.4µs ± 2%  43.2µs ± 2%  +1.76%  (p=0.000 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/57344      1.15ms ± 3%  1.16ms ± 3%  +1.50%  (p=0.000 n=148+144)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/917504     27.4ms ± 8%  27.8ms ±10%  +1.44%  (p=0.002 n=157+157)
BM_MapInsertSeq<Map<int, llvm::StringRef>>/14680064    786ms ±19%   791ms ±20%    ~     (p=0.239 n=157+157)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1          56.3ns ± 1%  56.4ns ± 1%    ~     (p=0.363 n=151+149)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/2          69.7ns ± 0%  69.7ns ± 0%    ~     (p=0.330 n=153+150)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3          83.2ns ± 0%  83.1ns ± 0%    ~     (p=0.129 n=149+154)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4          96.5ns ± 0%  96.5ns ± 0%    ~     (p=0.619 n=153+153)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/8           154ns ± 0%   154ns ± 0%    ~     (p=0.769 n=147+149)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16          292ns ± 1%   292ns ± 1%    ~     (p=0.511 n=146+146)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/32          542ns ± 4%   542ns ± 4%    ~     (p=0.729 n=157+156)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/64         1.61µs ± 6%  1.62µs ± 4%  +0.62%  (p=0.003 n=157+156)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/256        8.60µs ± 5%  8.69µs ± 5%  +1.02%  (p=0.000 n=148+151)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/4096        195µs ± 3%   195µs ± 3%    ~     (p=0.668 n=154+155)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/65536      4.45ms ± 4%  4.45ms ± 4%    ~     (p=0.698 n=148+141)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/1048576     126ms ±11%   130ms ±10%  +3.06%  (p=0.000 n=157+142)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/16777216    3.28s ±12%   3.43s ± 9%  +4.72%  (p=0.000 n=157+137)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/56          942ns ± 5%   939ns ± 5%    ~     (p=0.284 n=156+157)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/224        5.82µs ± 5%  5.86µs ± 5%  +0.63%  (p=0.008 n=153+155)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/3584        122µs ± 4%   122µs ± 3%    ~     (p=0.822 n=156+154)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/57344      2.82ms ± 3%  2.82ms ± 3%    ~     (p=0.765 n=150+144)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/917504     79.4ms ± 7%  80.0ms ± 7%  +0.83%  (p=0.008 n=149+148)
BM_MapInsertSeq<Map<llvm::StringRef, int>>/14680064    2.17s ±15%   2.25s ±14%  +3.51%  (p=0.000 n=157+146)
```
@github-actions github-actions bot requested a review from zygoloid January 3, 2025 13:06
@chandlerc chandlerc self-requested a review January 3, 2025 23:50

@chandlerc chandlerc left a comment

Sorry for being slow here -- I've been sick for over a week now...

Looking at this, I'm a bit worried about the regression on ARM CPUs... I think there is a real regression there.

For example, on my M1, I'm seeing:

----------------------------------------------------------------------------------------------------------
Benchmark                                              CPU   Iterations     CYCLES INSTRUCTIONS    KeyRate
----------------------------------------------------------------------------------------------------------
BM_MapInsertSeq<Map<int, int>>/4096               29878 ns        23383   95.6939k      460.51k 137.092M/s
BM_MapInsertSeq<Map<int, int>>/65536             675704 ns         1023   2.16291M     7.37218M 96.9892M/s
BM_MapInsertSeq<Map<int, int>>/1048576         11951256 ns           59    38.268M     114.816M 87.7377M/s
BM_MapInsertSeq<Map<int, int>>/16777216       442136773 ns            2   1.36822G     1.89526G 37.9458M/s

Vs. without this patch:

----------------------------------------------------------------------------------------------------------
Benchmark                                              CPU   Iterations     CYCLES INSTRUCTIONS    KeyRate
----------------------------------------------------------------------------------------------------------
BM_MapInsertSeq<Map<int, int>>/4096               29342 ns        23815   93.9735k     451.537k 139.593M/s
BM_MapInsertSeq<Map<int, int>>/65536             668646 ns         1035   2.13683M     7.19312M  98.013M/s
BM_MapInsertSeq<Map<int, int>>/1048576         12345606 ns           57   39.5332M     113.325M 84.9352M/s
BM_MapInsertSeq<Map<int, int>>/16777216       436957105 ns            2   1.35579G      1.8614G 38.3956M/s

I've zoomed in on the relevant columns. Note that the instruction count is consistently higher with this patch. This is particularly surprising with the <int, int> map because that one should be entirely deterministic -- there shouldn't be any noise or fluctuations here outside of things like context switches, and so the difference is very likely specific to this change.

And latency or throughput doesn't seem to improve either: the cycle count also regresses with this change at every size shown except 1048576.
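
To put numbers on the comparison, the relative change can be computed directly from the two tables above (the first is with the patch, the second without). The snippet below is only a sketch: the `Sample` struct, `kSamples`, `PercentChange`, and `PrintDeltas` are names made up for illustration, and the values are transcribed by hand from the CYCLES and INSTRUCTIONS columns.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// One row of the M1 measurements quoted above (hand-transcribed).
struct Sample {
  int size;               // Benchmark argument (number of keys inserted).
  double cycles_with;     // CYCLES with the patch applied.
  double cycles_without;  // CYCLES without the patch.
  double instrs_with;     // INSTRUCTIONS with the patch applied.
  double instrs_without;  // INSTRUCTIONS without the patch.
};

constexpr std::array<Sample, 4> kSamples = {{
    {4096, 95.6939e3, 93.9735e3, 460.51e3, 451.537e3},
    {65536, 2.16291e6, 2.13683e6, 7.37218e6, 7.19312e6},
    {1048576, 38.268e6, 39.5332e6, 114.816e6, 113.325e6},
    {16777216, 1.36822e9, 1.35579e9, 1.89526e9, 1.8614e9},
}};

// Relative change going from the unpatched to the patched build, in percent.
// Positive means the patched build used more of the counted resource.
inline auto PercentChange(double with_patch, double without_patch) -> double {
  assert(without_patch > 0.0);
  return (with_patch - without_patch) / without_patch * 100.0;
}

// Prints one line per benchmark size with both deltas.
inline void PrintDeltas() {
  for (const Sample& s : kSamples) {
    std::printf("%9d  cycles %+.2f%%  instructions %+.2f%%\n", s.size,
                PercentChange(s.cycles_with, s.cycles_without),
                PercentChange(s.instrs_with, s.instrs_without));
  }
}
```

Calling `PrintDeltas()` from a `main` shows instructions going up by roughly 1.3% to 2.5% at every size. Cycles go up at three sizes and down by about 3.2% at 1048576.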

I've not yet had a chance to specifically look at the instruction trace of the grow routine before/after to try and spot why this doesn't work out as a win on ARM, but I think that's the next step here.
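
For reference while reading that trace, the two per-byte updates compared in the PR description can be sketched on a single 64-bit metadata group as below. This is an illustrative sketch only. The free functions and the `group` parameter are assumptions made for this example, as is the precondition that the target byte is already zero. None of it is the actual code in the grow routine.

```cpp
#include <cstdint>

// Old form: zero out the byte at `byte_index` within a 64-bit group.
inline auto ClearByte(uint64_t group, int byte_index) -> uint64_t {
  return group & ~(uint64_t{0xff} << (byte_index * 8));
}

// New form: write `tag | 0x80` (the tag plus the present bit) into the byte at
// `byte_index`. Assumes that byte is currently zero (empty), so a plain OR is
// enough and no mask-and-clear step is needed first.
inline auto OverwriteEmptyByte(uint64_t group, int byte_index, uint8_t tag)
    -> uint64_t {
  return group | (static_cast<uint64_t>(tag | 0x80) << (byte_index * 8));
}
```

Both forms are a shift plus one logical operation on the group, but the overwrite form also needs the tag to be computed first. That is the extra work the PR description mentions.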
