Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix quadratic INSERT INTO ... VALUES sanitisation performance #33

Merged
merged 2 commits into from
Dec 20, 2024

Conversation

unflxw
Copy link
Contributor

@unflxw unflxw commented Dec 20, 2024

Fixes appsignal/appsignal-elixir#975. I introduced this bug in #32.


Add simple benchmark tool

Since #[bench] is nightly-only, add a simple binary that can be
modified to test the performance of specific queries as needed.

Avoid Vec::remove with None token

Using Vec::remove(i) causes the n tokens between position i and
the end of the vector to be shifted, which is an O(n) operation.

When done in a loop, as it is done in our INSERT INTO ... VALUES
repeated list sanitisation, it makes the overall performance O(n^2).

To avoid using Vec::remove, implement a Token::None value that
represents a "non-token", a gap to be ignored in the token list.

Replace calls to Vec::remove(i) with replacing the token at
position i with Token::None.

Replace calls to Vec::insert for the same reason -- luckily, when
we insert a token, it's a replacement for a different token, so it's
possible to rewrite them in that manner. These are rarely done in a
loop, though, so the impact on sanitisation performance is much more
limited.

@unflxw unflxw added the bug label Dec 20, 2024
@unflxw unflxw self-assigned this Dec 20, 2024
@unflxw unflxw changed the title Fix quadratic insert into sanitisation Fix quadratic INSERT INTO ... VALUES sanitisation performance Dec 20, 2024
Since `#[bench]` is nightly-only, add a simple binary that can be
modified to test the performance of specific queries as needed.
Using `Vec::remove(i)` causes the `n` tokens between position `i` and
the end of the vector to be shifted, which is an `O(n)` operation.

When done in a loop, as it is done in our `INSERT INTO ... VALUES`
repeated list sanitisation, it makes the overall performance `O(n^2)`.

To avoid using `Vec::remove`, implement a `Token::None` value that
represents a "non-token", a gap to be ignored in the token list.

Replace calls to `Vec::remove(i)` with replacing the token at
position `i` with `Token::None`.

Replace calls to `Vec::insert` for the same reason -- luckily, when
we insert a token, it's a replacement for a different token, so it's
possible to rewrite them in that manner. These are rarely done in a
loop, though, so the impact on sanitisation performance is much more
limited.
@unflxw unflxw force-pushed the fix-quadratic-insert-into-sanitisation branch from 6ac7406 to 225b1c5 Compare December 20, 2024 11:53
@unflxw unflxw merged commit 22e60cd into main Dec 20, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow reports after upgrading to v2.13.2
2 participants