Experiment: Improve concurrent merge performance by weakly owning branch updates (#8268)

* Add "WeakOwner", a KV-based weak ownership mechanism

Weak ownership is a best-effort lock that occasionally fails. This one can fail when a goroutine is delayed for a long interval. This is fine if the calling code relies on ownership for performance but not for correctness, e.g. merges and commits.

* Obtain weak ownership of branch on all `BranchUpdate` operations

This includes merges. Only one of several concurrent `BranchUpdate` operations can succeed, so unless many such long-lived operations are expected to fail, there is little point in running multiple updates concurrently.

* Better default parameters for branch ownership

* Add `lakectl abuse merge` command

Performs multiple small merges in parallel.

* Make branch weak ownership configurable

This shows results, even when running locally!

When I run lakeFS (weak ownership is OFF by default), I get 6.6% errors with concurrency 50. The rate is <50/s. Also, the long tail is _extremely_ long.

When I switch weak ownership ON, using the default parameters, I get **0** errors with concurrency 50. The rate is about the same, except that the tail (when load drops) is _short_.

See the difference [here][merge-abuse-speed-chart]: it's faster _and_ returns 0 errors.

The distribution of actual successful merge times is somewhat slower: possibly because of the time to lock, possibly because errors in the really slow cases cause those slow cases to be dropped from the OFF run.

Finally, note that because we do not queue, some merges take a *long* time under sustained load. We could improve weak ownership to hold an actual queue of work. This would make merges _fair_: merges would occur roughly in order of request arrival.
==== Weak ownership OFF ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 34, errors: 0, current rate: 33.81 done/second
merge - completed: 80, errors: 0, current rate: 45.98 done/second
merge - completed: 128, errors: 0, current rate: 48.02 done/second
merge - completed: 177, errors: 0, current rate: 49.03 done/second
merge - completed: 222, errors: 0, current rate: 44.97 done/second
merge - completed: 265, errors: 3, current rate: 43.03 done/second
merge - completed: 308, errors: 9, current rate: 42.97 done/second
merge - completed: 357, errors: 15, current rate: 49.01 done/second
merge - completed: 406, errors: 21, current rate: 49.03 done/second
merge - completed: 451, errors: 22, current rate: 44.97 done/second
merge - completed: 499, errors: 29, current rate: 48.01 done/second
merge - completed: 545, errors: 30, current rate: 46.01 done/second
merge - completed: 585, errors: 31, current rate: 39.97 done/second
merge - completed: 632, errors: 33, current rate: 47.04 done/second
merge - completed: 679, errors: 37, current rate: 47.00 done/second
merge - completed: 728, errors: 46, current rate: 48.96 done/second
merge - completed: 768, errors: 49, current rate: 40.04 done/second
merge - completed: 808, errors: 53, current rate: 39.98 done/second
merge - completed: 854, errors: 57, current rate: 45.99 done/second
merge - completed: 891, errors: 58, current rate: 37.00 done/second
merge - completed: 935, errors: 64, current rate: 44.00 done/second
merge - completed: 972, errors: 66, current rate: 36.98 done/second
merge - completed: 990, errors: 66, current rate: 18.00 done/second
merge - completed: 995, errors: 66, current rate: 5.00 done/second
merge - completed: 996, errors: 66, current rate: 1.00 done/second
merge - completed: 998, errors: 66, current rate: 2.00 done/second
merge - completed: 999, errors: 66, current rate: 1.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
completed: 1000, errors: 66, current rate: 5.27 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 601
100 671
250 672
350 672
500 696
750 740
1000 765
5000 896
min 54
max 12022
total 934
```

==== Weak ownership ON ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 36, errors: 0, current rate: 35.23 done/second
merge - completed: 86, errors: 0, current rate: 49.98 done/second
merge - completed: 136, errors: 0, current rate: 50.03 done/second
merge - completed: 185, errors: 0, current rate: 48.99 done/second
merge - completed: 236, errors: 0, current rate: 51.02 done/second
merge - completed: 286, errors: 0, current rate: 49.99 done/second
merge - completed: 337, errors: 0, current rate: 50.97 done/second
merge - completed: 390, errors: 0, current rate: 53.03 done/second
merge - completed: 438, errors: 0, current rate: 48.01 done/second
merge - completed: 487, errors: 0, current rate: 49.00 done/second
merge - completed: 534, errors: 0, current rate: 46.98 done/second
merge - completed: 581, errors: 0, current rate: 46.99 done/second
merge - completed: 632, errors: 0, current rate: 51.00 done/second
merge - completed: 680, errors: 0, current rate: 48.04 done/second
merge - completed: 725, errors: 0, current rate: 44.98 done/second
merge - completed: 771, errors: 0, current rate: 45.99 done/second
merge - completed: 815, errors: 0, current rate: 44.02 done/second
merge - completed: 861, errors: 0, current rate: 46.01 done/second
merge - completed: 905, errors: 0, current rate: 43.98 done/second
merge - completed: 947, errors: 0, current rate: 42.00 done/second
merge - completed: 977, errors: 0, current rate: 30.01 done/second
merge - completed: 997, errors: 0, current rate: 19.99 done/second
completed: 1000, errors: 0, current rate: 4.92 done/second
Histogram (ms):
1 0
2 0
5 0
7 0
10 0
15 0
25 0
50 0
75 457
100 464
250 468
350 468
500 642
750 647
1000 729
5000 952
min 54
max 13744
total 1000
```

* Straighten out interval handling and fix checks-validator

- Add some jitter when acquiring ownership on a branch
- Refresh _for_ refresh interval, twice _every_ refresh interval
- nolint unjustified warnings

* [CR] Bug: Ensure single owner succeeds the first time a key is owned

Use SetIf with a nil predicate.

* Remove log print from test - confusing in a codebase

* [CR] Fix comments, error phrasing, and command descriptions

* [CR] Clarify request ID handling when missing, rename own -> release

* [CR] Remove finished sentinel and break ownership update loop on error

* [CR] Run Esti test with branch ownership

This does not depend on KV implementation, so just add a matrix to one of the AWS/S3 flavours.

* Add log line to indicate ref manager started with weak ownership

Otherwise there is no way to tell that Esti _really_ ran it with ownership.

* [CR] Only reset if owned when cancelling weak ownership

If ownership has already been lost to another thread, do NOT delete ownership when released.

- KV does not provide a DeleteIf operation. Instead, use SetIf with an always-expired timestamp.
- Along the way, ensure the "owner" string is truly unique by appending a nanoid to it. Currently the owner is the request ID, which should be unique, but adding randomness ensures it will always be unique regardless of the calling system.

* Add totals line to lakectl abuse statistics

Otherwise it doesn't even say how long it took, which is the most interesting part for `abuse merge`.

* lakectl abuse merge: clean up branches before exiting

Only report errors. Obviously, if not all branches were deleted then we left a mess, which is too bad. But the performance test itself succeeded, which is the (more) important thing.
* [CR] Correctly count KV ops in comments, and some minor cleanups

* Rename basic "ownership" class and move it to pkg/distributed/

1. It's not a KV util, so move it out into distributed.
1. "Weak" is overloaded, so instead call it "mostly correct" ownership. It precisely describes what it is, so it must be a good term of art for what we do here.
1. Use the term "branch approximate ownership" at a higher level in the ref manager. This context doesn't particularly care about the specific properties of ownership, and "approximate" is a good fit there.