
Perf[MQB]: make independent item pools for channels #479

Merged: 1 commit merged into bloomberg:main from 241026_decouple_ItemPools on Oct 28, 2024

Conversation

@678098 (Collaborator) commented on Oct 26, 2024

Summary

Previously, we used one shared concurrent item pool to populate items going to all mqbnet::Channels. However, at high throughput the mechanism that ensures concurrency in this pool cannot keep up, and instead it slows down every thread accessing the pool. The higher the access frequency, the greater the performance degradation. At our current throughput, the negative effect is moderate.

Moreover, the average frequency is not the decisive factor: this effect can slow down the broker during short spikes of messages even when the average load is low.

This PR also moves item pool initialization from the top-level mqbnet::TransportManager directly into mqbnet::Channel, so that each channel owns an independent pool.
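
As a rough illustration of the shape of the change (not the actual PR code; the class layout, member, and method names below are hypothetical), each channel now constructs its own bdlma::ConcurrentPool from its own allocator instead of receiving a pointer to one pool shared by all channels:

#include <bdlma_concurrentpool.h>
#include <bslma_allocator.h>
#include <bsls_blockgrowth.h>

// Hypothetical stand-in for the real item block size.
const int k_ITEM_SIZE = 345;

class Channel {
  private:
    // Per-channel pool: allocations on this channel never contend with
    // allocations on any other channel.
    bdlma::ConcurrentPool d_itemPool;

  public:
    explicit Channel(bslma::Allocator *allocator)
    : d_itemPool(k_ITEM_SIZE, bsls::BlockGrowth::BSLS_CONSTANT, allocator)
    {
    }

    void *allocateItem() { return d_itemPool.allocate(); }
    void deallocateItem(void *item) { d_itemPool.deallocate(item); }
};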

Profiler

Before: [profiler screenshot, 2024-10-26 03:32]

After: [profiler screenshot, 2024-10-26 04:32]

Isolated benchmark

To see how large this effect can be, here is a simple benchmark program.
By binding either the single shared pool or a separate pool per thread (comment/uncomment the corresponding lines in the snippet), we can measure the total time to perform the same number of concurrent allocations and deallocations with a given number of threads.

#include <bdlf_bind.h>
#include <bdlma_concurrentpool.h>
#include <bdlmt_threadpool.h>
#include <bmqu_printutil.h>
#include <bslmt_semaphore.h>
#include <bslmt_threadattributes.h>
#include <bsls_timeutil.h>
#include <bsls_types.h>
#include <bsl_iostream.h>
#include <bsl_limits.h>
#include <bsl_vector.h>

static void threadFunc(size_t iters, bdlma::ConcurrentPool *pool, bslmt::Semaphore *ready) {
    for (size_t i = 0; i < iters; i++) {
        void *ptr = pool->allocate();

        // Inner "work" loop, ~0.5 us per execution; the comparison and
        // print exist only to keep the compiler from optimizing it away.
        unsigned long long j = reinterpret_cast<unsigned long long>(ptr);
        for (size_t m = 0; m < 1000; m++) {
            if (m * m - j == 0) {
                bsl::cout << " ";
            }
        }
        pool->deallocate(ptr);
    }
    ready->post();
}

static void testPerf() {
    const size_t k_NUM_THREADS = 10;

    // 's_allocator_p' is the surrounding test driver's allocator
    bdlmt::ThreadPool threadPool(
            bslmt::ThreadAttributes(),        // default
            k_NUM_THREADS,                    // minThreads
            k_NUM_THREADS,                    // maxThreads
            bsl::numeric_limits<int>::max(),  // maxIdleTime
            s_allocator_p);

    // One shared pool vs. one independent pool per thread
    bdlma::ConcurrentPool pool(345,
                               bsls::BlockGrowth::BSLS_CONSTANT,
                               s_allocator_p);
    bsl::vector<bdlma::ConcurrentPool *> pools;
    for (size_t i = 0; i < k_NUM_THREADS; i++) {
        pools.push_back(new bdlma::ConcurrentPool(
                345, bsls::BlockGrowth::BSLS_CONSTANT, s_allocator_p));
    }

    threadPool.start();

    bslmt::Semaphore ready;
    bsls::Types::Int64 begin = bsls::TimeUtil::getTimer();
    for (size_t i = 0; i < k_NUM_THREADS; i++) {
        threadPool.enqueueJob(bdlf::BindUtil::bindS(
                s_allocator_p,
                &threadFunc,
                1000000,
                // &pool,  // variant 1: one pool shared by all threads
                pools[i],  // variant 2: an independent pool per thread
                &ready));
    }

    // Wait until every thread has finished its iterations
    for (size_t i = 0; i < k_NUM_THREADS; i++) {
        ready.wait();
    }
    bsls::Types::Int64 end = bsls::TimeUtil::getTimer();

    bsl::cout << "dt: " << bmqu::PrintUtil::prettyTimeInterval(end - begin)
              << "\n";

    threadPool.drain();

    for (size_t i = 0; i < k_NUM_THREADS; i++) {
        delete pools[i];
    }
}

The most important factor here is the frequency of concurrent pool calls: the more frequent the calls, the more often threads contend on the pool's concurrency mechanism. To emulate different frequencies, the code contains an inner "work" loop between allocation and deallocation. If this inner loop runs for 100 microseconds or more, the difference between one shared pool and independent ones is not visible. However, at 1 microsecond and below, the difference becomes huge.
With 0.5 microseconds of "work" per iteration, the execution times are:

  • One shared pool: 3.11 s
  • Independent pools: 0.54 s

For reference, 1,000,000 iterations at ~0.5 microseconds each give an ideal lower bound of about 0.5 s per thread (the threads run in parallel), so the independent pools run close to the no-contention limit, while the shared pool is roughly 6x slower.

Counting allocator stats table

Before (the columns appear to be: current bytes allocated, delta, max bytes allocated, allocations, delta, deallocations, delta):

    TransportManager             |       1,078,656|        |           2,196,816|  98,941,262|       9|    98,940,453|       9
      *direct*                   |       1,076,240|        |           2,193,776|  42,391,201|       3|    42,390,433|       3
      Channel-node5              |             416|        |              20,656|  11,310,006|       1|    11,309,999|       1
      Channel-node2              |             416|        |              31,552|  11,310,018|       2|    11,310,011|       2
      Channel-node4              |             416|        |              25,712|  11,310,012|       1|    11,310,005|       1
      Channel-node1              |             416|        |              25,424|  11,310,009|       1|    11,310,002|       1
      Channel-node3              |             416|        |              21,840|  11,310,010|       1|    11,310,003|       1
      Channel-node0              |             336|        |                 336|           6|        |             0|        

After the change, the counting allocator stats table (same columns) looks like this, with the pools nested per node:

    TransportManager             |       4,488,528|    -976|           5,566,480| 242,776,559|  97,161|   242,773,282|  97,165
      *direct*                   |           1,392|        |               1,392|          12|        |             2|        
      Interface46531             |       3,499,904|    -752|           4,564,272| 111,313,737|  41,211|   111,310,648|  41,213
      cl6_dc3                    |         986,592|    -224|           1,119,248| 131,462,805|  55,950|   131,462,632|  55,952
        *direct*                 |          28,208|        |              28,208|          59|        |             0|        
        node3                    |         273,264|     -64|             296,944|  26,292,559|  11,190|    26,292,531|  11,191
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          ItemPool               |         173,472|        |             173,472|          13|        |             0|        
          Channel                |             640|     -64|              36,400|  26,292,539|  11,190|    26,292,531|  11,191
        node1                    |         246,416|    -160|             279,440|  26,292,559|  11,190|    26,292,534|  11,191
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          ItemPool               |         146,784|        |             146,784|          11|        |             0|        
          Channel                |             480|    -160|              33,504|  26,292,541|  11,190|    26,292,534|  11,191
        node2                    |         113,072|        |             148,512|  26,292,541|  11,190|    26,292,525|  11,190
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          ItemPool               |          13,344|        |              13,344|           1|        |             0|        
          Channel                |             576|        |              36,016|  26,292,533|  11,190|    26,292,525|  11,190
        node5                    |         113,072|        |             151,136|  26,292,536|  11,190|    26,292,520|  11,190
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          ItemPool               |          13,344|        |              13,344|           1|        |             0|        
          Channel                |             576|        |              38,640|  26,292,528|  11,190|    26,292,520|  11,190
        node4                    |         113,072|        |             149,776|  26,292,538|  11,190|    26,292,522|  11,190
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          ItemPool               |          13,344|        |              13,344|           1|        |             0|        
          Channel                |             576|        |              37,280|  26,292,530|  11,190|    26,292,522|  11,190
        node0                    |          99,488|        |              99,488|          13|        |             0|        
          *direct*               |          99,152|        |              99,152|           7|        |             0|        
          Channel                |             336|        |                 336|           6|        |             0|        
      ConnectionStates           |             640|        |                 640|           5|        |             0|        

@678098 requested a review from a team as a code owner on October 26, 2024, 04:39
@bmq-oss-ci (bot) commented:

Build 334 of commit 5dbe10b has completed with FAILURE

@678098 force-pushed the 241026_decouple_ItemPools branch from e59f194 to d42822e on October 28, 2024, 14:16
@678098 requested a review from dorjesinpo on October 28, 2024, 14:53
@bmq-oss-ci (bot) commented:

Build 337 of commit d42822e has completed with FAILURE

@678098 force-pushed the 241026_decouple_ItemPools branch from d42822e to 1dfbaf2 on October 28, 2024, 16:01

Review comment on the diff:

      const bsl::string& name,
      bslma::Allocator*  allocator)
    : d_allocators(allocator)
-   , d_allocator_p(d_allocators.get(bsl::string("Channel-") + name))
+   , d_allocator_p(d_allocators.get("Channel"))
@678098 (Collaborator, Author) commented:
@dorjesinpo, actually, there are 2 allocators used (Channel and ItemPool), and I will add more in the next PR.
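
A minimal sketch of that layout (member names are hypothetical; the store is the 'd_allocators' visible in the diff above): the channel's counting allocator store hands out one named child allocator per component, which is what produces the nested "Channel" and "ItemPool" rows under each node in the stats table.

#include <bmqma_countingallocatorstore.h>
#include <bslma_allocator.h>

class Channel {
  private:
    bmqma::CountingAllocatorStore d_allocators;           // per-channel store
    bslma::Allocator             *d_allocator_p;          // the "Channel" row
    bslma::Allocator             *d_itemPoolAllocator_p;  // the "ItemPool" row

  public:
    // 'nodeAllocator' is the per-node parent (e.g. "node3" in the table)
    explicit Channel(bslma::Allocator *nodeAllocator)
    : d_allocators(nodeAllocator)
    , d_allocator_p(d_allocators.get("Channel"))
    , d_itemPoolAllocator_p(d_allocators.get("ItemPool"))
    {
    }
};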

@678098 merged commit 7606a27 into bloomberg:main on Oct 28, 2024
35 checks passed
@678098 deleted the 241026_decouple_ItemPools branch on October 28, 2024, 17:08