Slightly improve DecodeCopy() performance. #84

mverver-google · 2019-06-20T12:16:44Z

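Instead of increasing `address` on every iteration of the loop, we can keep it
fixed and double the number of bytes copied on each iteration. This works
because the string being generated is periodic, with a period equal to the copy
distance (`target_bytes_decoded - address`), so everything already written from
`address` up to the current write position can serve as the source of the next
copy.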
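As a concrete check of the periodicity argument, here is a minimal C++ sketch of the byte-at-a-time semantics of a copy whose source overlaps its destination. It models the output as a `std::string` rather than the decoder's real buffers, and both `ByteByByteCopy` and `IsPeriodicFrom` are hypothetical helpers written for illustration, not functions from this codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends `size` bytes to `out`, each copied from
// `distance` bytes before it. Because one byte is written at a time, bytes
// produced earlier in this same copy are valid sources for later ones.
void ByteByByteCopy(std::string& out, std::size_t distance, std::size_t size) {
  assert(distance > 0 && distance <= out.size());
  for (std::size_t i = 0; i < size; ++i) {
    // push_back takes its argument by value, so reading from `out` is safe.
    out.push_back(out[out.size() - distance]);
  }
}

// Hypothetical helper: returns true if every byte at index >= start equals
// the byte `period` positions earlier, i.e. the tail of `s` is periodic.
bool IsPeriodicFrom(const std::string& s, std::size_t start,
                    std::size_t period) {
  assert(period > 0 && start >= period);
  for (std::size_t i = start; i < s.size(); ++i) {
    if (s[i] != s[i - period]) return false;
  }
  return true;
}
```

Starting from `"abc"` with a distance of 2 and a size of 9, `ByteByByteCopy` produces `"abcbcbcbcbcb"`, and `IsPeriodicFrom(result, 3, 2)` returns `true`: every newly written byte repeats the one 2 positions earlier, which is what allows a single call to copy more than 2 bytes at once.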

For example, if we want to copy 9 bytes starting 2 bytes back:

```
   |-------| size 9
abc.........
 ^ ^
 | |
 | target_bytes_decoded: 3
 address: 1
```

The old version of the code copies 2 bytes per call and advances both positions each time, producing these intermediate states:

```
abc.........
 ^ ^
abcbc....... (1)
   ^ ^
abcbcbc..... (2)
     ^ ^
abcbcbcbc... (3)
       ^ ^
abcbcbcbcbc. (4)
         ^ ^
abcbcbcbcbcb (5, outside the loop)
```
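The old loop can be modelled roughly as follows. This is a hedged C++ sketch, not the actual `DecodeCopy()` source: the output is a pre-sized `std::string` (with `.` for bytes not yet written), `CopyBytes` is a hypothetical non-overlapping copy helper standing in for the real one, and the loop condition `size > distance` is an assumption chosen to reproduce the states shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the decoder's CopyBytes: copies `n` bytes from
// position `from` to position `to` within `buf`. The ranges must not overlap.
void CopyBytes(std::string& buf, std::size_t from, std::size_t to,
               std::size_t n) {
  assert(from + n <= to && to + n <= buf.size());
  std::copy_n(buf.begin() + from, n, buf.begin() + to);
}

// Sketch of the old loop: copy `distance` bytes per call, then advance both
// `address` and `target` by that amount. Returns the buffer after each call.
std::vector<std::string> OldDecodeCopy(std::string& buf, std::size_t address,
                                       std::size_t target, std::size_t size) {
  assert(address < target);
  const std::size_t distance = target - address;
  std::vector<std::string> states;
  while (size > distance) {
    CopyBytes(buf, address, target, distance);
    address += distance;
    target += distance;
    size -= distance;
    states.push_back(buf);
  }
  CopyBytes(buf, address, target, size);  // tail: at most `distance` bytes
  states.push_back(buf);
  return states;
}
```

With `buf = "abc........."`, `address` 1, target 3 and `size` 9, `OldDecodeCopy` records the five states (1) to (5) above, one `CopyBytes` call each.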

The new version instead keeps `address` fixed and doubles the range copied on each call:

```
abc.........
 ^ ^
abcbc....... (1)
 ^   ^
abcbcbcbc... (2)
 ^       ^
abcbcbcbcbcb (3, outside the loop)
```
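A minimal C++ sketch of the new loop, assuming a pre-sized `std::string` as the output buffer (`.` marks bytes not yet written), a hypothetical non-overlapping `CopyBytes` helper, and a loop condition guessed to reproduce the states shown above; it illustrates the idea and is not the actual patch. Only `target` advances, and the chunk size is recomputed as `target - address`, which doubles after every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the decoder's CopyBytes (non-overlapping copy).
void CopyBytes(std::string& buf, std::size_t from, std::size_t to,
               std::size_t n) {
  assert(from + n <= to && to + n <= buf.size());
  std::copy_n(buf.begin() + from, n, buf.begin() + to);
}

// Sketch of the new loop: `address` stays fixed and each call copies the
// whole range [address, target), so the copied length doubles every time.
std::vector<std::string> NewDecodeCopy(std::string& buf, std::size_t address,
                                       std::size_t target, std::size_t size) {
  assert(address < target);
  std::vector<std::string> states;
  while (size > target - address) {
    // [address, target) has period d and a length that is a multiple of d,
    // so appending a copy of it continues the same periodic pattern.
    const std::size_t chunk = target - address;
    CopyBytes(buf, address, target, chunk);
    target += chunk;
    size -= chunk;
    states.push_back(buf);
  }
  CopyBytes(buf, address, target, size);  // tail: size <= target - address
  states.push_back(buf);
  return states;
}
```

On `buf = "abc........."` with `address` 1, `target_bytes_decoded` 3 and `size` 9, `NewDecodeCopy` records the three states (1) to (3) above. Keeping `address` fixed means the source range always starts at the beginning of the periodic run, so its length (d, 2d, 4d, ...) stays a multiple of the period.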

In general, if s = `size` and d = `target_bytes_decoded - address`, the number
of calls to `CopyBytes` drops from roughly s/d + 1 to roughly log₂(s/d) + 1.
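To compare the call counts numerically, the hypothetical C++ helpers below simulate only the loop bookkeeping of each version, without copying any data. They assume each loop runs while more than one chunk's worth of bytes remains and then issues one final call for the tail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of CopyBytes calls when every call copies
// d bytes (old approach), plus one final call for the remaining tail.
std::size_t OldCallCount(std::size_t s, std::size_t d) {
  assert(d > 0);
  std::size_t calls = 1;  // the final call outside the loop
  while (s > d) {
    s -= d;
    ++calls;
  }
  return calls;
}

// Hypothetical helper: number of CopyBytes calls when the chunk starts at
// d and doubles after every call (new approach), plus the final tail call.
std::size_t NewCallCount(std::size_t s, std::size_t d) {
  assert(d > 0);
  std::size_t calls = 1;  // the final call outside the loop
  std::size_t chunk = d;
  while (s > chunk) {
    s -= chunk;
    chunk *= 2;
    ++calls;
  }
  return calls;
}
```

For the example above (s = 9, d = 2) this gives 5 versus 3 calls; for s = 1000 and d = 1 it gives 1000 versus 10. Under this loop condition the counts are exactly ⌈s/d⌉ and ⌈log₂(s/d + 1)⌉ for s ≥ 1, consistent with the rough estimates.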

The total time complexity is still O(s), since `CopyBytes` is presumably linear
in the number of bytes copied. However, making fewer calls means less per-call
overhead, which is likely to be faster in practice, especially when `s` is large
and `d` is small.
