Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf[MQB]: callback construction in a fixed buffer #481

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

678098
Copy link
Collaborator

@678098 678098 commented Oct 27, 2024

Bind with bsl::function has too much overhead. Even if there is a small buffer optimization when binding, copying or moving a binded function to bsl::function (mqbi::DispatcherEvent::d_callback field) causes allocations. It happens for every confirm going to the cluster.

/blazingmq/src/groups/mqb/mqbi/mqbi_dispatcher.h:537:31: error: static_cast from 'CallbackFunctor *' to '(anonymous namespace)::Dummy *', which are not related by inheritance, is not allowed
  537 |         BSLS_ASSERT_SAFE(0 == static_cast<CALLBACK_TYPE*>(
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  538 |                                   reinterpret_cast<CallbackFunctor*>(0)));
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Profiler

Before (in red - everything related to Bind and conversions to d_callback):
Screenshot 2024-10-27 at 05 39 05

After:
Screenshot 2024-10-27 at 05 11 24

This PR was tested on a priority queue, and it saves ~10% of cluster thread processing time. But it should have even greater impact on Fanout queues.

Isolated benchmark

A simple benchmark for both approaches:

class ConfirmFunctor {
  private:
    size_t d_num;

  public:
    // CREATORS
    explicit ConfirmFunctor(size_t num, bslma::Allocator *allocator = 0)
    : d_num(num)
    {
        // NOTHING
    }

    ConfirmFunctor(const ConfirmFunctor&) = default;

    ConfirmFunctor(bslmf::MovableRef<ConfirmFunctor> other) BSLS_NOTHROW_SPEC
    : d_num(other.d_num)
    {
        // NOTHING
    }


    void operator()() {
        if (d_num + 10 == 111) {
            bsl::cout <<  d_num << bsl::endl;
        }
    }
};

struct DataTester {
    bsl::function<void()> d_f1;

    explicit DataTester(bslmf::MovableRef<ConfirmFunctor> f1)
    : d_f1(bslmf::MovableRefUtil::move(f1))
    {

    }

    void test() {
        d_f1();
    }
};

struct DataTester2 {
    ConfirmFunctor d_f1;

    explicit DataTester2(bslmf::MovableRef<ConfirmFunctor> f1)
    : d_f1(bslmf::MovableRefUtil::move(f1))
    {

    }

    void test() {
        d_f1();
    }
};

static void testFunctors(bslma::Allocator *allocator) {
    bsl::cout << bsl::is_nothrow_move_constructible_v<ConfirmFunctor> << bsl::endl;
    bsl::cout << sizeof(ConfirmFunctor) << bsl::endl;
    {
        bsls::Types::Int64 begin = bsls::TimeUtil::getTimer();
        for (size_t i = 0; i < 100000000; i++) {
            DataTester tester(ConfirmFunctor(i, s_allocator_p));
            tester.test();
        }
        bsls::Types::Int64 end = bsls::TimeUtil::getTimer();
        bsl::cout << "dt function: " << bmqu::PrintUtil::prettyTimeInterval(end - begin) << "\n";
    }
    {
        bsls::Types::Int64 begin = bsls::TimeUtil::getTimer();
        for (size_t i = 0; i < 100000000; i++) {
            DataTester2 tester(ConfirmFunctor(i, s_allocator_p));
            tester.test();
        }
        bsls::Types::Int64 end = bsls::TimeUtil::getTimer();
        bsl::cout << "dt in-place: " << bmqu::PrintUtil::prettyTimeInterval(end - begin) << "\n";
    }
}

Outputs:

1
8
101
dt function: 1.43 s
101
dt in-place: 33.96 ms

So functor to bsl::function conversion has 40x overhead in this example.

perf output for this sample code:
Screenshot 2024-10-27 at 05 54 59

@678098 678098 requested a review from a team as a code owner October 27, 2024 05:29
@678098 678098 changed the title [POC]Perf[MQB]: callback allocations in fixed buffer [POC]Perf[MQB]: callback construction in a fixed buffer Oct 27, 2024
@678098 678098 changed the title [POC]Perf[MQB]: callback construction in a fixed buffer Perf[MQB]: callback construction in a fixed buffer Oct 28, 2024
@678098 678098 force-pushed the 241027_callback_opt branch 3 times, most recently from 61e1b87 to db16c77 Compare October 31, 2024 03:46
@678098 678098 force-pushed the 241027_callback_opt branch from db16c77 to 7b8f8b6 Compare December 11, 2024 09:00
Copy link
Collaborator

@pniedzielski pniedzielski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me. I think main has moved a little since this was created, so once you're able to rebase, I can do a quick second review and approve.

@678098
Copy link
Collaborator Author

678098 commented Dec 19, 2024

@pniedzielski I need to refine this a bit and put performance measures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants