Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slightly improve DecodeCopy() performance. #84

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mverver-google
Copy link

Instead of increasing address every iteration of the loop, we can just keep
it fixed and double the number of bytes copied every iteration. This works
because the string to be generated is periodic.

For example, if we want to copy 9 bytes starting 2 bytes back:

   |-------| size 9
abc.........
 ^ ^
 | |
 | target_bytes_decoded: 3
 address: 1

Then the old version of the code would generate these intermediate states:

abc.........
 ^ ^
abcbc....... (1)
   ^ ^
abcbcbc..... (2)
     ^ ^
abcbcbcbc... (3)
       ^ ^
abcbcbcbcbc. (4)
         ^ ^
abcbcbcbcbcb (5, outside the loop)

While the new version would double the range to be copied every time:

abc.........
 ^ ^
abcbc....... (1)
 ^   ^
abcbcbcbc... (2)
 ^       ^
abcbcbcbcbcb (3, outside the loop)

In general, if s = size and d = (target_bytes_decoded - address), then the
number of calls to CopyBytes is reduced from (s/d) + 1 to log(s/d)/log(2) + 1.

The total time complexity is still O(s) because CopyBytes is presumably linear
in the number of bytes copied, but we end up doing fewer calls in total, which
is likely to be faster in practice, especially if s is large and d is small.

Instead of increasing `address` every iteration of the loop, we can just keep
it fixed and double the number of bytes copied every iteration. This works
because the string to be generated is periodic.

For example, if we want to copy 9 bytes starting 2 bytes back:

       |-------| size 9
    abc.........
     ^ ^
     | |
     | target_bytes_decoded: 3
     address: 1

Then the old version of the code would generate these intermediate states:

    abc.........
     ^ ^
    abcbc....... (1)
       ^ ^
    abcbcbc..... (2)
         ^ ^
    abcbcbcbc... (3)
           ^ ^
    abcbcbcbcbc. (4)
             ^ ^
    abcbcbcbcbcb (5, outside the loop)

While the new version would double the range to be copied every time:

    abc.........
     ^ ^
    abcbc....... (1)
     ^   ^
    abcbcbcbc... (2)
     ^       ^
    abcbcbcbcbcb (3, outside the loop)

In general, if s = size and d = (target_bytes_decoded - address), then the
number of calls to CopyBytes is reduced from (s/d) + 1 to log(s/d)/log(2) + 1.

The total time complexity is still O(s) because CopyBytes is presumably linear
in the number of bytes copied, but we end up doing fewer calls in total, which
is likely to be faster in practice, especially if `s` is large and `d` is small.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant