khepri_tx_adv: Prefer list prepending to build side effect lists #291
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
- Coverage   89.55%   89.23%   -0.33%
==========================================
  Files          21       21
  Lines        3130     3380     +250
==========================================
+ Hits         2803     3016     +213
- Misses        327      364      +37
```
Great find!
I only have cosmetic comments.
Force-pushed from ea0ea26 to 23df91d.
Forgot to clean up and make sure things were wrapped properly, whoops! I've also added a test case to assert that we handle the effects in the proper order.
Force-pushed from 23df91d to f565c8d.
In a transaction we build the set of `ra_machine:effect()`s for the machine to handle by using `++/2` with the old and new effects for each `khepri_tx`/`khepri_tx_adv` command. This is a noticeable bottleneck though when there are many (thousands or more) commands used in a transaction. `++/2` works by prepending every element of the left-hand side list (in reverse) to the right-hand side list. As a transaction calls more `khepri_tx`/`khepri_tx_adv` commands, the left-hand side list (old side effects) gets much longer than the right-hand side list (new side effects) and the time it takes to append increases.

We can avoid `++/2` by building the side effect list gradually, prepending each effect. The ordering of effects within a command isn't consequential: the effects update triggers and projections and each update is disjoint from the others. The ordering of the full side effect list is important though - a delete's effects should be handled before a put's for the same path pattern, for example - so we need to reverse the side effect list for the whole transaction before handling the effects.
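To make the shape of the change concrete, here is a minimal sketch of the prepend-and-reverse pattern; the function names are illustrative, not the actual `khepri_tx_adv` code:

```erlang
%% Sketch only: add_side_effects/2 and finalize_side_effects/1 are
%% illustrative names, not the real khepri_tx_adv functions.
add_side_effects(NewEffects, Effects) ->
    %% Prepending costs O(length(NewEffects)) per command, no matter
    %% how many effects have accumulated so far, unlike
    %% `Effects ++ NewEffects`.
    lists:foldl(fun(Effect, Acc) -> [Effect | Acc] end, Effects, NewEffects).

finalize_side_effects(Effects) ->
    %% Reverse once at the end of the transaction so that, e.g., a
    %% delete's effects are handled before a later put's effects for
    %% the same path pattern.
    lists:reverse(Effects).
```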
The parent commit changes the way we build side effect lists within a transaction to use prepends instead of `++/2`. This case asserts that we still process the effects in order when it comes to projections. If we processed effects out of order, projections might behave strangely. A simple example is that a deletion of a path in a transaction followed by a put with new data should result in a projection on that path pattern reflecting the new data. If we handled side effects out of order, the projection table would be missing the record for that path instead.
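Illustratively, the scenario looks like this (a sketch with placeholder names - `StoreId` and a projection whose fun stores `{Path, Data}` tuples in an ETS table named `my_projection` are assumptions, not the actual test code):

```erlang
%% Sketch only: a delete followed by a put on the same path within one
%% transaction must leave the projection reflecting the put.
_ = khepri:transaction(StoreId, fun() ->
        ok = khepri_tx:delete([stock, wood, <<"oak">>]),
        khepri_tx:put([stock, wood, <<"oak">>], 100)
    end),
%% If the effects were handled out of order, the projection's ETS table
%% would be missing this record instead of holding the new data.
[{[stock, wood, <<"oak">>], 100}] =
    ets:lookup(my_projection, [stock, wood, <<"oak">>]).
```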
Force-pushed from f565c8d to b750381.
This is an optimization for functions in the transaction APIs. Specifically this should be very noticeable when any command from the `khepri_tx` or `khepri_tx_adv` API (delete, put, etc.) is used with a pattern that affects very many nodes (10,000 or more). The gist is that we can get surprisingly nice speed-ups for some functions by building our side effect lists with `[Effect | Effects]` rather than `++/2`, as sketched in the commit message above.
For background: I have WIP changes to simplify the exchange deletion part of vhost deletion in RabbitMQ by using a transaction (commit). It deletes all exchanges and bindings in a single command without adding/removing runtime parameters and returns the combined `rabbit_bindings:deletions()` dict. It also changes the way we delete bindings in a transaction to use `khepri_tx_adv:delete_many/1` (which returns a map of the deletions) rather than `khepri_tx:get_many/1` plus `khepri_tx:delete/1` for each binding. Against a stress test of deleting a vhost with 50,000 exchanges with no bindings or queues, I expected the deletion to be almost instant, but `rabbitmqctl delete_vhost myvhost` took around 500s.

A flamegraph points to the BIF behind `erlang:'++'/2` (and allocation functions associated with list appending) dominating the time taken to execute a `khepri_tx_adv:delete_many/1` call in this block: `khepri/src/khepri_tx_adv.erl`, lines 997 to 1006 at be71afe.
Flamegraph (zoomed in on this function in particular): perf.data.zip

That is most expensive when deleting bindings, which is odd since this test vhost only has these 50,000 exchanges - no queues or bindings or anything else.
`List ++ []` is not optimized in Erlang/OTP (see erlang/otp#8743): `erlang:'++'/2` clones the left-hand side unless it is `[]`, which costs time and also introduces a lot of garbage. The reason the reproduction case is slow is that we basically call `List ++ []` twice for the non-existent bindings of each exchange we delete, which is O(n^2) behavior (`n` being the number of exchanges) since cloning the list takes O(n) time and memory. List prepending is O(1), and even better, we don't need to pay any cost when we don't end up prepending a new element.
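As a rough standalone illustration of that asymmetry (a micro-benchmark sketch, not part of this PR; the module name is made up):

```erlang
-module(append_vs_prepend).
-export([run/1]).

%% `List ++ []` clones all N cells of the left-hand side, while
%% prepending allocates a single cons cell regardless of N.
run(N) ->
    List = lists:seq(1, N),
    {AppendUs, _} = timer:tc(fun() -> List ++ [] end),
    {PrependUs, _} = timer:tc(fun() -> [0 | List] end),
    io:format("N=~b: append ~b us, prepend ~b us~n",
              [N, AppendUs, PrependUs]).
```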
Regardless of whether `List ++ []` is optimized in Erlang/OTP, we should switch to prepending: a large transaction that issues many individual commands will also cause many appends, and the right-hand side of those appends might be non-empty.

Reproduction steps in RabbitMQ...
With the RabbitMQ repository checked out at branch `md/simplify-vhost-exchange-deletion` or the linked commit above:

1. `make run-broker`.
2. `sbin/rabbitmqctl add_vhost myvhost`.
3. `sbin/rabbitmqctl enable_feature_flag --experimental khepri_db`.
4. Declare the exchanges in the shell or via an eval command:
   `[rabbit_exchange:declare({resource,<<"myvhost">>,exchange,erlang:list_to_binary(lists:flatten(io_lib:format("e~b", [N])))},topic,true,false,false,[],<<"rmq-internal">>) || N <- lists:seq(1,50_000)]`
5. `time sbin/rabbitmqctl delete_vhost myvhost --timeout 10000`.

When the server is running with this branch the deletion takes around 12s. Before this change the deletion takes around 500s.