-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OverwriteEmptyByte instead of ClearByte in GrowToNextAllocSize. #4754
base: trunk
Are you sure you want to change the base?
Conversation
This is simplification that allows to remove `ClearDeleted` and intended to have similar performance. Benchmarks are close to noise. Leaning positive on X86 and leaning negative on ARM. 1) On Arm: `&= ~(uint64_t{0xff} << (byte_index * 8))` -> `|= (uint64_t{tag | 0x80} << (byte_index * 8))`. This operation is even reducing latency from 4 -> 3. But we have another shift operation to compute the tag. Theoretically it shouldn't be slower. 2) On x86: `new_metadata[new_index ^ old_size] = 0;` -> `new_metadata[new_index] = tag | MetadataGroup::PresentMask;`. Here we have the same latency, but we need an extra shift to compute the tag. Benchmarks on X86: ``` name old cpu/op new cpu/op delta BM_MapInsertSeq<Map<int, int>>/1 32.7ns ± 4% 32.8ns ± 3% ~ (p=0.354 n=52+52) BM_MapInsertSeq<Map<int, int>>/2 42.3ns ± 3% 40.4ns ± 4% -4.56% (p=0.000 n=54+55) BM_MapInsertSeq<Map<int, int>>/3 50.5ns ± 3% 51.1ns ± 3% +1.22% (p=0.000 n=53+56) BM_MapInsertSeq<Map<int, int>>/4 59.9ns ± 4% 60.8ns ± 4% +1.45% (p=0.000 n=54+57) BM_MapInsertSeq<Map<int, int>>/8 98.8ns ± 4% 98.3ns ± 3% -0.50% (p=0.044 n=56+56) BM_MapInsertSeq<Map<int, int>>/16 163ns ± 4% 158ns ± 4% -2.77% (p=0.000 n=55+51) BM_MapInsertSeq<Map<int, int>>/32 281ns ± 4% 257ns ± 3% -8.30% (p=0.000 n=55+55) BM_MapInsertSeq<Map<int, int>>/64 646ns ± 4% 624ns ± 3% -3.41% (p=0.000 n=55+55) BM_MapInsertSeq<Map<int, int>>/256 2.42µs ± 4% 2.38µs ± 3% -1.57% (p=0.000 n=53+56) BM_MapInsertSeq<Map<int, int>>/4096 38.4µs ± 3% 38.6µs ± 3% ~ (p=0.118 n=54+52) BM_MapInsertSeq<Map<int, int>>/65536 968µs ± 3% 972µs ± 3% ~ (p=0.333 n=56+57) BM_MapInsertSeq<Map<int, int>>/1048576 16.3ms ± 5% 16.3ms ± 6% ~ (p=0.269 n=56+55) BM_MapInsertSeq<Map<int, int>>/16777216 646ms ± 3% 646ms ± 4% ~ (p=0.811 n=57+55) BM_MapInsertSeq<Map<int, int>>/56 436ns ± 3% 423ns ± 3% -2.89% (p=0.000 n=56+57) BM_MapInsertSeq<Map<int, int>>/224 1.66µs ± 4% 1.63µs ± 4% -2.36% (p=0.000 n=55+57) BM_MapInsertSeq<Map<int, int>>/3584 25.1µs ± 4% 25.4µs ± 5% +1.03% (p=0.002 n=55+57) BM_MapInsertSeq<Map<int, int>>/57344 559µs ± 3% 567µs ± 4% +1.31% (p=0.000 n=56+57) BM_MapInsertSeq<Map<int, int>>/917504 10.4ms ± 4% 10.4ms ± 4% ~ (p=0.417 n=56+55) BM_MapInsertSeq<Map<int, int>>/14680064 420ms ± 3% 420ms ± 3% ~ (p=0.740 n=57+55) BM_MapInsertSeq<Map<int*, int*>>/1 33.8ns ± 3% 33.9ns ± 3% ~ (p=0.191 n=55+56) BM_MapInsertSeq<Map<int*, int*>>/2 37.0ns ± 3% 37.0ns ± 4% ~ (p=0.948 n=55+56) BM_MapInsertSeq<Map<int*, int*>>/3 41.5ns ± 4% 41.6ns ± 4% ~ (p=0.503 n=56+56) BM_MapInsertSeq<Map<int*, int*>>/4 46.1ns ± 4% 47.2ns ± 4% +2.31% (p=0.000 n=56+56) BM_MapInsertSeq<Map<int*, int*>>/8 63.6ns ± 4% 65.4ns ± 3% +2.89% (p=0.000 n=57+56) BM_MapInsertSeq<Map<int*, int*>>/16 133ns ± 4% 128ns ± 4% -3.74% (p=0.000 n=57+56) BM_MapInsertSeq<Map<int*, int*>>/32 237ns ± 3% 236ns ± 4% ~ (p=0.082 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/64 597ns ± 3% 631ns ± 3% +5.73% (p=0.000 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/256 2.78µs ± 3% 2.89µs ± 4% +3.82% (p=0.000 n=57+56) BM_MapInsertSeq<Map<int*, int*>>/4096 51.9µs ± 3% 54.2µs ± 4% +4.46% (p=0.000 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/65536 1.16ms ± 3% 1.18ms ± 4% +1.10% (p=0.000 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/1048576 28.9ms ± 4% 29.0ms ± 4% ~ (p=0.549 n=57+54) BM_MapInsertSeq<Map<int*, int*>>/16777216 914ms ± 3% 912ms ± 3% ~ (p=0.147 n=57+54) BM_MapInsertSeq<Map<int*, int*>>/56 366ns ± 3% 407ns ± 3% +11.27% (p=0.000 n=57+56) BM_MapInsertSeq<Map<int*, int*>>/224 1.86µs ± 4% 1.96µs ± 3% +5.18% (p=0.000 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/3584 33.9µs ± 4% 35.9µs ± 3% +6.04% (p=0.000 n=57+52) BM_MapInsertSeq<Map<int*, int*>>/57344 763µs ± 3% 772µs ± 4% +1.25% (p=0.000 n=57+57) BM_MapInsertSeq<Map<int*, int*>>/917504 16.8ms ±11% 16.6ms ± 6% ~ (p=0.454 n=57+52) BM_MapInsertSeq<Map<int*, int*>>/14680064 610ms ± 2% 608ms ± 3% ~ (p=0.127 n=57+55) BM_MapInsertSeq<Map<int, llvm::StringRef>>/1 34.6ns ± 4% 34.5ns ± 3% ~ (p=0.331 n=56+52) BM_MapInsertSeq<Map<int, llvm::StringRef>>/2 43.9ns ± 4% 37.8ns ± 3% -13.78% (p=0.000 n=57+55) BM_MapInsertSeq<Map<int, llvm::StringRef>>/3 47.9ns ± 4% 48.9ns ± 3% +2.16% (p=0.000 n=56+55) BM_MapInsertSeq<Map<int, llvm::StringRef>>/4 53.6ns ± 3% 54.0ns ± 3% +0.72% (p=0.012 n=56+54) BM_MapInsertSeq<Map<int, llvm::StringRef>>/8 84.6ns ± 4% 81.0ns ± 3% -4.21% (p=0.000 n=57+50) BM_MapInsertSeq<Map<int, llvm::StringRef>>/16 142ns ± 3% 136ns ± 3% -4.22% (p=0.000 n=57+55) BM_MapInsertSeq<Map<int, llvm::StringRef>>/32 258ns ± 4% 245ns ± 3% -4.96% (p=0.000 n=57+55) BM_MapInsertSeq<Map<int, llvm::StringRef>>/64 699ns ± 4% 692ns ± 3% -1.02% (p=0.000 n=50+50) BM_MapInsertSeq<Map<int, llvm::StringRef>>/256 2.96µs ± 6% 2.97µs ± 4% ~ (p=0.098 n=57+54) BM_MapInsertSeq<Map<int, llvm::StringRef>>/4096 50.4µs ± 3% 50.7µs ± 2% +0.53% (p=0.030 n=54+53) BM_MapInsertSeq<Map<int, llvm::StringRef>>/65536 1.38ms ± 2% 1.42ms ± 3% +2.76% (p=0.000 n=54+56) BM_MapInsertSeq<Map<int, llvm::StringRef>>/1048576 37.8ms ± 4% 38.1ms ± 4% +0.78% (p=0.016 n=57+54) BM_MapInsertSeq<Map<int, llvm::StringRef>>/16777216 1.19s ± 2% 1.20s ± 3% ~ (p=0.055 n=57+54) BM_MapInsertSeq<Map<int, llvm::StringRef>>/56 430ns ± 3% 413ns ± 3% -4.10% (p=0.000 n=54+57) BM_MapInsertSeq<Map<int, llvm::StringRef>>/224 1.94µs ± 8% 1.93µs ± 6% ~ (p=0.587 n=57+56) BM_MapInsertSeq<Map<int, llvm::StringRef>>/3584 32.1µs ± 4% 32.2µs ± 3% ~ (p=0.403 n=56+57) BM_MapInsertSeq<Map<int, llvm::StringRef>>/57344 746µs ± 3% 760µs ± 3% +1.93% (p=0.000 n=57+56) BM_MapInsertSeq<Map<int, llvm::StringRef>>/917504 22.0ms ± 6% 22.2ms ± 5% +0.90% (p=0.025 n=57+53) BM_MapInsertSeq<Map<int, llvm::StringRef>>/14680064 734ms ± 2% 736ms ± 3% ~ (p=0.107 n=57+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/1 40.8ns ± 4% 40.8ns ± 4% ~ (p=0.685 n=54+54) BM_MapInsertSeq<Map<llvm::StringRef, int>>/2 49.5ns ± 4% 49.6ns ± 7% ~ (p=0.927 n=56+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/3 58.1ns ± 4% 58.2ns ± 4% ~ (p=0.632 n=55+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/4 66.7ns ± 4% 67.0ns ± 5% ~ (p=0.235 n=55+56) BM_MapInsertSeq<Map<llvm::StringRef, int>>/8 104ns ± 6% 105ns ± 6% ~ (p=0.171 n=55+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/16 196ns ± 5% 197ns ± 6% ~ (p=0.189 n=54+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/32 361ns ± 8% 364ns ± 7% ~ (p=0.156 n=53+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/64 1.05µs ± 6% 1.04µs ± 6% ~ (p=0.069 n=56+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/256 5.32µs ± 5% 5.24µs ± 4% -1.46% (p=0.000 n=57+54) BM_MapInsertSeq<Map<llvm::StringRef, int>>/4096 148µs ± 4% 147µs ± 4% -0.65% (p=0.028 n=55+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/65536 3.17ms ± 3% 3.13ms ± 2% -1.49% (p=0.000 n=54+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/1048576 99.1ms ± 3% 98.3ms ± 3% -0.89% (p=0.000 n=57+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/16777216 2.40s ± 3% 2.40s ± 2% ~ (p=0.247 n=56+54) BM_MapInsertSeq<Map<llvm::StringRef, int>>/56 612ns ± 6% 612ns ± 8% ~ (p=0.968 n=52+56) BM_MapInsertSeq<Map<llvm::StringRef, int>>/224 3.56µs ± 7% 3.51µs ± 4% -1.40% (p=0.006 n=57+55) BM_MapInsertSeq<Map<llvm::StringRef, int>>/3584 87.3µs ± 5% 88.5µs ± 6% +1.30% (p=0.011 n=57+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/57344 1.95ms ± 3% 1.93ms ± 4% -0.77% (p=0.000 n=55+57) BM_MapInsertSeq<Map<llvm::StringRef, int>>/917504 61.2ms ± 4% 61.0ms ± 3% ~ (p=0.059 n=57+54) BM_MapInsertSeq<Map<llvm::StringRef, int>>/14680064 1.50s ± 3% 1.50s ± 3% ~ (p=0.609 n=57+55) ``` Benchmarks on ARM: ``` name old cpu/op new cpu/op delta BM_MapInsertSeq<Map<int, int>>/1 39.6ns ± 1% 39.6ns ± 1% ~ (p=0.416 n=155+156) BM_MapInsertSeq<Map<int, int>>/2 44.6ns ± 1% 44.6ns ± 1% ~ (p=0.993 n=155+157) BM_MapInsertSeq<Map<int, int>>/3 50.1ns ± 3% 50.1ns ± 2% ~ (p=0.585 n=156+157) BM_MapInsertSeq<Map<int, int>>/4 55.7ns ± 3% 55.4ns ± 1% -0.61% (p=0.000 n=156+119) BM_MapInsertSeq<Map<int, int>>/8 78.0ns ± 5% 77.2ns ± 1% -1.01% (p=0.000 n=157+119) BM_MapInsertSeq<Map<int, int>>/16 123ns ± 5% 121ns ± 0% -1.32% (p=0.000 n=157+119) BM_MapInsertSeq<Map<int, int>>/32 216ns ± 6% 213ns ± 0% -1.51% (p=0.000 n=157+119) BM_MapInsertSeq<Map<int, int>>/64 644ns ± 4% 659ns ± 3% +2.19% (p=0.000 n=157+146) BM_MapInsertSeq<Map<int, int>>/256 3.22µs ± 4% 3.30µs ± 5% +2.56% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, int>>/4096 53.8µs ± 3% 55.5µs ± 2% +3.18% (p=0.000 n=157+144) BM_MapInsertSeq<Map<int, int>>/65536 1.29ms ± 3% 1.33ms ± 4% +2.97% (p=0.000 n=155+155) BM_MapInsertSeq<Map<int, int>>/1048576 30.2ms ± 8% 31.0ms ± 9% +2.58% (p=0.000 n=157+156) BM_MapInsertSeq<Map<int, int>>/16777216 902ms ±14% 924ms ±17% +2.47% (p=0.005 n=157+157) BM_MapInsertSeq<Map<int, int>>/56 350ns ± 6% 345ns ± 0% -1.64% (p=0.023 n=157+119) BM_MapInsertSeq<Map<int, int>>/224 2.09µs ± 5% 2.14µs ± 5% +2.01% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, int>>/3584 36.6µs ± 4% 37.3µs ± 3% +1.76% (p=0.000 n=157+142) BM_MapInsertSeq<Map<int, int>>/57344 903µs ± 5% 923µs ± 4% +2.23% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, int>>/917504 21.7ms ± 8% 22.0ms ± 8% +1.52% (p=0.000 n=157+153) BM_MapInsertSeq<Map<int, int>>/14680064 678ms ±18% 692ms ±18% +2.03% (p=0.037 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/1 40.7ns ± 1% 40.7ns ± 1% ~ (p=0.216 n=152+153) BM_MapInsertSeq<Map<int*, int*>>/2 45.5ns ± 1% 45.5ns ± 1% ~ (p=0.403 n=155+156) BM_MapInsertSeq<Map<int*, int*>>/3 51.1ns ± 1% 51.1ns ± 1% ~ (p=0.962 n=155+156) BM_MapInsertSeq<Map<int*, int*>>/4 57.5ns ± 4% 59.7ns ± 1% +3.86% (p=0.000 n=157+149) BM_MapInsertSeq<Map<int*, int*>>/8 80.3ns ± 4% 82.3ns ± 1% +2.52% (p=0.000 n=157+108) BM_MapInsertSeq<Map<int*, int*>>/16 127ns ± 4% 129ns ± 4% +1.57% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/32 229ns ± 4% 230ns ± 4% +0.49% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/64 695ns ± 3% 705ns ± 4% +1.37% (p=0.000 n=155+156) BM_MapInsertSeq<Map<int*, int*>>/256 3.64µs ± 7% 3.46µs ± 3% -4.93% (p=0.000 n=157+154) BM_MapInsertSeq<Map<int*, int*>>/4096 58.8µs ± 2% 60.8µs ± 2% +3.46% (p=0.000 n=157+148) BM_MapInsertSeq<Map<int*, int*>>/65536 1.14ms ± 2% 1.17ms ± 3% +3.23% (p=0.000 n=156+157) BM_MapInsertSeq<Map<int*, int*>>/1048576 41.4ms ±12% 42.3ms ±13% +2.13% (p=0.001 n=157+156) BM_MapInsertSeq<Map<int*, int*>>/16777216 1.10s ±20% 1.12s ±22% ~ (p=0.082 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/56 377ns ± 5% 372ns ± 7% -1.19% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/224 2.40µs ± 9% 2.23µs ± 4% -6.97% (p=0.000 n=157+155) BM_MapInsertSeq<Map<int*, int*>>/3584 39.6µs ± 3% 40.0µs ± 2% +1.01% (p=0.000 n=157+121) BM_MapInsertSeq<Map<int*, int*>>/57344 776µs ± 3% 795µs ± 3% +2.40% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int*, int*>>/917504 31.0ms ±13% 31.4ms ±14% ~ (p=0.055 n=157+156) BM_MapInsertSeq<Map<int*, int*>>/14680064 776ms ±22% 783ms ±25% ~ (p=0.231 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/1 41.2ns ± 1% 41.2ns ± 1% ~ (p=0.516 n=155+154) BM_MapInsertSeq<Map<int, llvm::StringRef>>/2 46.8ns ± 1% 46.8ns ± 1% ~ (p=0.324 n=155+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/3 52.7ns ± 2% 52.7ns ± 2% ~ (p=0.122 n=155+154) BM_MapInsertSeq<Map<int, llvm::StringRef>>/4 58.8ns ± 3% 58.7ns ± 1% ~ (p=0.142 n=156+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/8 82.7ns ± 4% 82.2ns ± 1% -0.66% (p=0.004 n=157+119) BM_MapInsertSeq<Map<int, llvm::StringRef>>/16 131ns ± 4% 130ns ± 0% -0.85% (p=0.001 n=157+119) BM_MapInsertSeq<Map<int, llvm::StringRef>>/32 231ns ± 4% 229ns ± 0% -0.94% (p=0.000 n=157+119) BM_MapInsertSeq<Map<int, llvm::StringRef>>/64 727ns ± 3% 744ns ± 3% +2.28% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/256 3.71µs ± 3% 3.79µs ± 3% +2.24% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/4096 64.4µs ± 2% 66.2µs ± 2% +2.75% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/65536 1.64ms ± 4% 1.68ms ± 3% +2.11% (p=0.000 n=149+145) BM_MapInsertSeq<Map<int, llvm::StringRef>>/1048576 39.5ms ±11% 40.5ms ±13% +2.32% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/16777216 1.14s ±16% 1.16s ±17% ~ (p=0.082 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/56 375ns ± 4% 371ns ± 0% -1.03% (p=0.000 n=157+119) BM_MapInsertSeq<Map<int, llvm::StringRef>>/224 2.35µs ± 3% 2.40µs ± 4% +1.84% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/3584 42.4µs ± 2% 43.2µs ± 2% +1.76% (p=0.000 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/57344 1.15ms ± 3% 1.16ms ± 3% +1.50% (p=0.000 n=148+144) BM_MapInsertSeq<Map<int, llvm::StringRef>>/917504 27.4ms ± 8% 27.8ms ±10% +1.44% (p=0.002 n=157+157) BM_MapInsertSeq<Map<int, llvm::StringRef>>/14680064 786ms ±19% 791ms ±20% ~ (p=0.239 n=157+157) BM_MapInsertSeq<Map<llvm::StringRef, int>>/1 56.3ns ± 1% 56.4ns ± 1% ~ (p=0.363 n=151+149) BM_MapInsertSeq<Map<llvm::StringRef, int>>/2 69.7ns ± 0% 69.7ns ± 0% ~ (p=0.330 n=153+150) BM_MapInsertSeq<Map<llvm::StringRef, int>>/3 83.2ns ± 0% 83.1ns ± 0% ~ (p=0.129 n=149+154) BM_MapInsertSeq<Map<llvm::StringRef, int>>/4 96.5ns ± 0% 96.5ns ± 0% ~ (p=0.619 n=153+153) BM_MapInsertSeq<Map<llvm::StringRef, int>>/8 154ns ± 0% 154ns ± 0% ~ (p=0.769 n=147+149) BM_MapInsertSeq<Map<llvm::StringRef, int>>/16 292ns ± 1% 292ns ± 1% ~ (p=0.511 n=146+146) BM_MapInsertSeq<Map<llvm::StringRef, int>>/32 542ns ± 4% 542ns ± 4% ~ (p=0.729 n=157+156) BM_MapInsertSeq<Map<llvm::StringRef, int>>/64 1.61µs ± 6% 1.62µs ± 4% +0.62% (p=0.003 n=157+156) BM_MapInsertSeq<Map<llvm::StringRef, int>>/256 8.60µs ± 5% 8.69µs ± 5% +1.02% (p=0.000 n=148+151) BM_MapInsertSeq<Map<llvm::StringRef, int>>/4096 195µs ± 3% 195µs ± 3% ~ (p=0.668 n=154+155) BM_MapInsertSeq<Map<llvm::StringRef, int>>/65536 4.45ms ± 4% 4.45ms ± 4% ~ (p=0.698 n=148+141) BM_MapInsertSeq<Map<llvm::StringRef, int>>/1048576 126ms ±11% 130ms ±10% +3.06% (p=0.000 n=157+142) BM_MapInsertSeq<Map<llvm::StringRef, int>>/16777216 3.28s ±12% 3.43s ± 9% +4.72% (p=0.000 n=157+137) BM_MapInsertSeq<Map<llvm::StringRef, int>>/56 942ns ± 5% 939ns ± 5% ~ (p=0.284 n=156+157) BM_MapInsertSeq<Map<llvm::StringRef, int>>/224 5.82µs ± 5% 5.86µs ± 5% +0.63% (p=0.008 n=153+155) BM_MapInsertSeq<Map<llvm::StringRef, int>>/3584 122µs ± 4% 122µs ± 3% ~ (p=0.822 n=156+154) BM_MapInsertSeq<Map<llvm::StringRef, int>>/57344 2.82ms ± 3% 2.82ms ± 3% ~ (p=0.765 n=150+144) BM_MapInsertSeq<Map<llvm::StringRef, int>>/917504 79.4ms ± 7% 80.0ms ± 7% +0.83% (p=0.008 n=149+148) BM_MapInsertSeq<Map<llvm::StringRef, int>>/14680064 2.17s ±15% 2.25s ±14% +3.51% (p=0.000 n=157+146) ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being slow here -- I've been sick for over a week now...
Looking at this, I'm a bit worried about the regression on ARM CPUs... I think there is a real regression there.
For example, on my M1, I'm seeing:
----------------------------------------------------------------------------------------------------------
Benchmark CPU Iterations CYCLES INSTRUCTIONS KeyRate
----------------------------------------------------------------------------------------------------------
BM_MapInsertSeq<Map<int, int>>/4096 29878 ns 23383 95.6939k 460.51k 137.092M/s
BM_MapInsertSeq<Map<int, int>>/65536 675704 ns 1023 2.16291M 7.37218M 96.9892M/s
BM_MapInsertSeq<Map<int, int>>/1048576 11951256 ns 59 38.268M 114.816M 87.7377M/s
BM_MapInsertSeq<Map<int, int>>/16777216 442136773 ns 2 1.36822G 1.89526G 37.9458M/s
Vs. without this patch:
----------------------------------------------------------------------------------------------------------
Benchmark CPU Iterations CYCLES INSTRUCTIONS KeyRate
----------------------------------------------------------------------------------------------------------
BM_MapInsertSeq<Map<int, int>>/4096 29342 ns 23815 93.9735k 451.537k 139.593M/s
BM_MapInsertSeq<Map<int, int>>/65536 668646 ns 1035 2.13683M 7.19312M 98.013M/s
BM_MapInsertSeq<Map<int, int>>/1048576 12345606 ns 57 39.5332M 113.325M 84.9352M/s
BM_MapInsertSeq<Map<int, int>>/16777216 436957105 ns 2 1.35579G 1.8614G 38.3956M/s
I've zoomed in on the relevant columns. Note the higher instruction count without this patch consistently. This is particularly surprising with the <int, int>
map because that one should be entirely deterministic -- there shouldn't be any noise or fluctuations here outside of things like context switches, and so the difference is very likely specific to this change.
And it doesn't seem to be that the latency or throughput is improved, as the cycle count for these benchmarks also seems to consistently regress with this change.
I've not yet had a chance to specifically look at the instruction trace of the grow routine before/after to try and spot why this doesn't work out as a win on ARM, but I think that's the next step here.
This is simplification that allows to remove
ClearDeleted
and intendedto have similar performance.
Benchmarks are close to noise. Leaning positive on X86 and leaning
negative on ARM.
On Arm:
&= ~(uint64_t{0xff} << (byte_index * 8))
->|= (uint64_t{tag | 0x80} << (byte_index * 8))
.This operation is even reducing latency from 4 -> 3.
But we have another shift operation to compute the tag. Theoretically it shouldn't be slower.
On x86:
new_metadata[new_index ^ old_size] = 0;
->new_metadata[new_index] = tag | MetadataGroup::PresentMask;
.Here we have the same latency, but we need an extra shift to compute the tag.
Benchmarks on X86:
Benchmarks on ARM: