Improve the concurrency of erts_fun_table insertions #8662

Closed


lexprfuncall
Contributor

The erts_fun_table is protected by a read-write lock, but when inserting a fun the runtime always acquires the writer lock so that it can insert if the lookup fails. This has the undesirable effect of serializing insertions even in the degenerate case where the fun is already present and the table does not need to be modified.

This change uses a reader lock initially to offer more concurrency in the case where the fun is present, which can be a common case for applications that repeatedly transmit essentially the same fun objects between nodes. If the lookup fails, the code behaves as it did before and falls back to acquiring a writer lock and doing a lookup and insert as needed.

CLAassistant commented Jul 12, 2024

CLA assistant check
All committers have signed the CLA.

Contributor

github-actions bot commented Jul 12, 2024

CT Test Results

    3 files    141 suites   50m 18s ⏱️
1 589 tests 1 539 ✅ 49 💤 1 ❌
2 290 runs  2 220 ✅ 69 💤 1 ❌

For more details on these failures, see this check.

Results for commit 903a647.



@lexprfuncall
Contributor Author

We have a simple test case that demonstrates the contention effect.

1> B=term_to_binary(fun()->ok end).
<<131,112,0,0,3,39,0,201,188,156,143,16,126,173,32,90,79,
  205,206,160,193,177,248,0,0,0,43,0,0,...>>
2> F=fun F()->binary_to_term(B),F()end.
#Fun<erl_eval.20.105768164>
3> [spawn(F)||_<-lists:seq(1,20)].
[<0.94.0>,<0.95.0>,<0.96.0>,<0.97.0>,<0.98.0>,<0.99.0>,
 <0.100.0>,<0.101.0>,<0.102.0>,<0.103.0>,<0.104.0>,<0.105.0>,
 <0.106.0>,<0.107.0>,<0.108.0>,<0.109.0>,<0.110.0>,<0.111.0>,
 <0.112.0>,<0.113.0>]
4> lcnt:start(),lcnt:clear(),timer:sleep(1000),lcnt:collect(),lcnt:conflicts(),lcnt:inspect(db_hash_slot,[{print,[id,colls,ratio,duration]}]),lcnt:stop().

Here are the baseline results:

    lock  id  #tries  #collisions  collisions [%]  time [us]  duration [%]
    ----- --- ------- ------------ --------------- ---------- -------------
  fun_tab   1   61132        61132        100.0000   19591702     1956.5856
 atom_tab   1  298869           21          0.0070         56        0.0056

And here are the results with this patch applied:

     lock  id  #tries  #collisions  collisions [%]  time [us]  durat
    ----- --- ------- ------------ --------------- ---------- ------
 atom_tab   1 5039124         4179          0.0829      61276

We have also verified that the fun_tab contention is no longer present in our production system.

@jhogberg
Contributor

Thanks for the PR! Can you rebase this to maint? I think it's safe enough to include in OTP 27.1, and I'm working on a more comprehensive fix for master that removes the lock altogether on the happy path.

@jhogberg jhogberg self-assigned this Jul 12, 2024
@jhogberg jhogberg added the team:VM (Assigned to OTP team VM) and testing (currently being tested, tag is used by OTP internal CI) labels Jul 12, 2024
@lexprfuncall lexprfuncall changed the base branch from master to maint July 12, 2024 21:18
While the erts_fun_table is protected by a read-write lock, when
ensuring a fun, the runtime always acquires a writer lock allowing it
to insert if the lookup fails.  This has the undesirable effect of
serializing the insertions even in the degenerate case where the fun
is already present and the table does not need to be modified.

This change uses a reader lock initially to offer more concurrency in
the case where the fun is present, which can be a common case for
applications that repeatedly transmit essentially the same fun objects
between nodes.  If the lookup fails, the code behaves as it did before
and falls back to acquiring a writer lock and doing a lookup and
insert as needed.
@lexprfuncall lexprfuncall deleted the fun-tab-concurrency-fix branch July 12, 2024 21:28
@lexprfuncall
Contributor Author

Sorry, I accidentally deleted the remote branch. I have put together #8664 with the same content.
