Perf[MQB]: callback construction in a fixed buffer #481

678098 · 2024-10-27T05:29:15Z

Bind with bsl::function has too much overhead. Binding itself benefits from a small buffer optimization, but copying or moving a bound functor into a bsl::function (the mqbi::DispatcherEvent::d_callback field) still causes allocations. This happens for every confirm that goes to the cluster.
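The idea in the title, constructing the callback directly in a fixed buffer instead of moving it into a type-erasing wrapper, can be sketched in standard C++. This is a minimal sketch under stated assumptions, not the mqbi::DispatcherEvent implementation: `InplaceCallback`, `emplace`, and the 64-byte `k_BUFFER_SIZE` are hypothetical names and values chosen for illustration, and std types stand in for the BDE ones.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical sketch (not the mqbi::DispatcherEvent API): a callback holder
// that constructs the functor directly inside a fixed, aligned buffer, so
// storing a callback never touches the heap.
class InplaceCallback {
  public:
    // Illustrative capacity only; the real buffer size is an assumption.
    static constexpr std::size_t k_BUFFER_SIZE = 64;

    InplaceCallback() : d_object(nullptr), d_invoke(nullptr), d_destroy(nullptr) {}

    // Copying would need per-type copy logic; this sketch forbids it.
    InplaceCallback(const InplaceCallback&) = delete;
    InplaceCallback& operator=(const InplaceCallback&) = delete;

    ~InplaceCallback() { reset(); }

    // Destroy any held functor, then construct a FUNCTOR in the buffer from
    // 'args'. Oversized or over-aligned types are rejected at compile time.
    template <class FUNCTOR, class... ARGS>
    FUNCTOR& emplace(ARGS&&... args)
    {
        static_assert(sizeof(FUNCTOR) <= k_BUFFER_SIZE,
                      "FUNCTOR does not fit in the buffer");
        static_assert(alignof(FUNCTOR) <= alignof(std::max_align_t),
                      "FUNCTOR is over-aligned");
        reset();
        FUNCTOR* object = ::new (static_cast<void*>(d_buffer))
            FUNCTOR(std::forward<ARGS>(args)...);
        d_object = object;
        // Captureless lambdas decay to plain function pointers that remember
        // the concrete type: type erasure without any allocation.
        d_invoke = [](void* p) { (*static_cast<FUNCTOR*>(p))(); };
        d_destroy = [](void* p) { static_cast<FUNCTOR*>(p)->~FUNCTOR(); };
        return *object;
    }

    bool empty() const { return d_object == nullptr; }

    // Invoke the held functor; calling an empty holder is a precondition violation.
    void operator()()
    {
        assert(!empty());
        d_invoke(d_object);
    }

    // Run the held functor's destructor, if any, and become empty.
    void reset()
    {
        if (d_destroy) {
            d_destroy(d_object);
        }
        d_object = nullptr;
        d_invoke = nullptr;
        d_destroy = nullptr;
    }

  private:
    alignas(std::max_align_t) unsigned char d_buffer[k_BUFFER_SIZE];
    void* d_object;            // points at the functor inside 'd_buffer'
    void (*d_invoke)(void*);   // calls FUNCTOR::operator()
    void (*d_destroy)(void*);  // calls FUNCTOR::~FUNCTOR
};
```

With a holder like this, an event would call `emplace<SomeCallback>(...)` once per confirm instead of building a functor and moving it into a bsl::function. The trade-off is that every callback type must fit in the buffer, which the `static_assert` enforces at compile time rather than falling back to the heap.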

/blazingmq/src/groups/mqb/mqbi/mqbi_dispatcher.h:537:31: error: static_cast from 'CallbackFunctor *' to '(anonymous namespace)::Dummy *', which are not related by inheritance, is not allowed
  537 |         BSLS_ASSERT_SAFE(0 == static_cast<CALLBACK_TYPE*>(
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  538 |                                   reinterpret_cast<CallbackFunctor*>(0)));
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
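The assertion in the error above only compiles when `CALLBACK_TYPE` is related to `CallbackFunctor` by inheritance, so instantiating it with an unrelated type (`Dummy` in the error) fails with a `static_cast` diagnostic. Assuming the intent is "`CALLBACK_TYPE` must derive from `CallbackFunctor`", the same constraint can be stated directly with `std::is_base_of`. In this sketch, the body of `CallbackFunctor` (a virtual call operator), `isValidCallback`, and `ConfirmCallback` are hypothetical, made up for illustration.

```cpp
#include <type_traits>

// Assumed shape of the base class named in the error; the real one may differ.
struct CallbackFunctor {
    virtual ~CallbackFunctor() {}
    virtual void operator()() = 0;
};

// Hypothetical compile-time check: rejects any CALLBACK_TYPE that does not
// derive from CallbackFunctor, with a readable message instead of a
// static_cast error. Note that is_base_of checks one direction only, whereas
// the original static_cast also accepts a base class of CallbackFunctor.
template <class CALLBACK_TYPE>
constexpr bool isValidCallback()
{
    static_assert(std::is_base_of<CallbackFunctor, CALLBACK_TYPE>::value,
                  "CALLBACK_TYPE must derive from CallbackFunctor");
    return true;
}

// Hypothetical callback type that satisfies the constraint.
struct ConfirmCallback : CallbackFunctor {
    int d_count = 0;
    void operator()() override { ++d_count; }
};
```

`isValidCallback<ConfirmCallback>()` compiles, while `isValidCallback<int>()` would stop compilation with the message above.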

Profiler

Before (in red: everything related to Bind and the conversions to d_callback):

After:

This PR was tested on a priority queue, where it saves about 10% of cluster thread processing time. It should have an even greater impact on fanout queues.

Isolated benchmark

A simple benchmark for both approaches:

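#include <bmqu_printutil.h>

#include <bsl_functional.h>
#include <bsl_iostream.h>
#include <bsl_type_traits.h>
#include <bslma_allocator.h>
#include <bslmf_movableref.h>
#include <bsls_exceptionutil.h>
#include <bsls_timeutil.h>
#include <bsls_types.h>

// An 8-byte functor standing in for a confirm callback.
class ConfirmFunctor {
  private:
    size_t d_num;

  public:
    // CREATORS

    // The allocator is accepted but unused; it only mirrors the signature of
    // allocator-aware callback types.
    explicit ConfirmFunctor(size_t num, bslma::Allocator *allocator = 0)
    : d_num(num)
    {
        (void)allocator;
    }

    ConfirmFunctor(const ConfirmFunctor&) = default;

    ConfirmFunctor(bslmf::MovableRef<ConfirmFunctor> other) BSLS_NOTHROW_SPEC
    : d_num(other.d_num)
    {
        // NOTHING
    }

    // MANIPULATORS

    // Prints only when 'd_num' is 101, i.e. once per loop below.
    void operator()() {
        if (d_num + 10 == 111) {
            bsl::cout << d_num << bsl::endl;
        }
    }
};

// Stores the functor type-erased in a 'bsl::function', as
// 'mqbi::DispatcherEvent::d_callback' does.
struct DataTester {
    bsl::function<void()> d_f1;

    explicit DataTester(bslmf::MovableRef<ConfirmFunctor> f1)
    : d_f1(bslmf::MovableRefUtil::move(f1))
    {
        // NOTHING
    }

    void test() {
        d_f1();
    }
};

// Stores the functor by value, with no type erasure.
struct DataTester2 {
    ConfirmFunctor d_f1;

    explicit DataTester2(bslmf::MovableRef<ConfirmFunctor> f1)
    : d_f1(bslmf::MovableRefUtil::move(f1))
    {
        // NOTHING
    }

    void test() {
        d_f1();
    }
};

static void testFunctors(bslma::Allocator *allocator) {
    // Prints 1 if 'ConfirmFunctor' is nothrow move constructible.
    bsl::cout << bsl::is_nothrow_move_constructible_v<ConfirmFunctor> << bsl::endl;
    // Prints the size of 'ConfirmFunctor' in bytes.
    bsl::cout << sizeof(ConfirmFunctor) << bsl::endl;
    {
        // Construct and invoke 10^8 callbacks through 'bsl::function'.
        bsls::Types::Int64 begin = bsls::TimeUtil::getTimer();
        for (size_t i = 0; i < 100000000; i++) {
            DataTester tester(ConfirmFunctor(i, allocator));
            tester.test();
        }
        bsls::Types::Int64 end = bsls::TimeUtil::getTimer();
        bsl::cout << "dt function: " << bmqu::PrintUtil::prettyTimeInterval(end - begin) << "\n";
    }
    {
        // Construct and invoke 10^8 callbacks stored by value.
        bsls::Types::Int64 begin = bsls::TimeUtil::getTimer();
        for (size_t i = 0; i < 100000000; i++) {
            DataTester2 tester(ConfirmFunctor(i, allocator));
            tester.test();
        }
        bsls::Types::Int64 end = bsls::TimeUtil::getTimer();
        bsl::cout << "dt in-place: " << bmqu::PrintUtil::prettyTimeInterval(end - begin) << "\n";
    }
}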

Outputs:

1
8
101
dt function: 1.43 s
101
dt in-place: 33.96 ms

So in this example, converting the functor to a bsl::function costs about 40x more than storing it in place (1.43 s vs. 33.96 ms, a ratio of roughly 42).
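To check the allocation claim directly rather than through timing, one option is to count calls to a replaced global `operator new`. This sketch is an assumption-laden stand-in: it uses `std::function` instead of `bsl::function`, and `ConfirmFunctorStd`, `allocationsViaStdFunction`, and `allocationsInPlace` are hypothetical names. Whether `std::function` allocates for an 8-byte functor depends on the standard library's small-buffer rules, so its count is not expected to reproduce the bsl::function result above. Storing the functor by value, as `DataTester2` does, never allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>

// Counts every call to the (replaced) global operator new in this program.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size)
{
    ++g_allocations;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical std-only stand-in for ConfirmFunctor from the benchmark.
struct ConfirmFunctorStd {
    std::size_t d_num;

    explicit ConfirmFunctorStd(std::size_t num) : d_num(num) {}
    ConfirmFunctorStd(const ConfirmFunctorStd&) = default;
    ConfirmFunctorStd(ConfirmFunctorStd&& other) noexcept : d_num(other.d_num) {}

    void operator()() { ++d_num; }  // trivial work
};

// Heap allocations made while moving 'iterations' functors into
// std::function and invoking them (library-dependent; may be zero).
std::size_t allocationsViaStdFunction(std::size_t iterations)
{
    const std::size_t before = g_allocations;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::function<void()> f(ConfirmFunctorStd{i});
        f();
    }
    return g_allocations - before;
}

// Heap allocations when the functor is stored by value, as in DataTester2.
std::size_t allocationsInPlace(std::size_t iterations)
{
    const std::size_t before = g_allocations;
    for (std::size_t i = 0; i < iterations; ++i) {
        ConfirmFunctorStd f(i);
        f();
    }
    return g_allocations - before;
}

// Storing by value must never allocate.
void checkInPlaceIsAllocationFree()
{
    assert(allocationsInPlace(1000) == 0);
}
```

Printing `allocationsViaStdFunction(N)` on a given toolchain shows whether its `std::function` keeps such a functor inline or moves it to the heap.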

perf output for this sample code:

Signed-off-by: Evgeny Malygin <[email protected]>

pniedzielski

This change looks good to me. I think main has moved a little since this was created, so once you're able to rebase, I can do a quick second review and approve.

678098 · 2024-12-19T01:10:43Z

@pniedzielski I need to refine this a bit and add performance measurements.

678098 requested a review from a team as a code owner October 27, 2024 05:29

678098 changed the title ~~[POC]Perf[MQB]: callback allocations in fixed buffer~~ [POC]Perf[MQB]: callback construction in a fixed buffer Oct 27, 2024

678098 mentioned this pull request Oct 27, 2024

Perf[MQB]: do not build temporary functors for every routed message #477

Merged

678098 changed the title ~~[POC]Perf[MQB]: callback construction in a fixed buffer~~ Perf[MQB]: callback construction in a fixed buffer Oct 28, 2024

678098 force-pushed the 241027_callback_opt branch 3 times, most recently from 61e1b87 to db16c77 October 31, 2024 03:46

Perf[MQB]: use in-place callbacks in Dispatcher

7b8f8b6

Signed-off-by: Evgeny Malygin <[email protected]>

678098 force-pushed the 241027_callback_opt branch from db16c77 to 7b8f8b6 December 11, 2024 09:00

pniedzielski reviewed Dec 18, 2024
