Broker variant: immutable "data views" with monotonic buffer resources #318
Comments
Quick recap of where we are. In #368, I had a bare-bones integration of the new
Aside from switching to A bit of an orthogonal yet related PR is #377. It has been a goal for a while now that the wire protocol and format are "self-hosted". Broker has its own wire protocol now, but still uses CAF to generate the binary representation for objects on the wire. The PR adds a formatting mini-library to Broker that we can re-use for On the Zeek side, zeek/zeek#3354 cleans up various places where Zeek used All of these PRs are refactorings that may change APIs but don't change behavior (or performance). So these are the "low risk" parts of this whole operation. With these puzzle pieces in place (assuming we get them merged soon), I can start making an integration branch that adds Regarding Zeek's The other part where I didn't integrate
* issue/gh-318:
  * Remove obsolete code
  * CI: Use a python venv in test.sh to avoid pip errors on newer platforms
  * CI: Pin Windows openssl to 3.1.1
  * CI: Fix building of alpine image with regards to python packages
  * Implement new variant type and builder API
All done.
I've been thinking about performance recently. That's why I've been revisiting the radix-tree implementation for speeding up filter lookups.
But there's also room for improvement in how we represent messages in memory. We've discussed memory-mappable layouts in the past. With a memory-mappable representation, we would basically read a message from the network and then simply create a sort of wrapper that decodes the bytes on demand. The downside is that creating a memory-mappable format is more complicated and requires dedicated builder APIs. While "deserializing" a value becomes trivial, accessing fields in a memory-mapped data structure can come with some overhead since data must be decoded on the fly. We would also have to change our network format.
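To make the "wrapper that decodes the bytes on demand" idea concrete, here is a minimal sketch. The byte layout (a one-byte element count followed by big-endian 32-bit integers) and the class name `u32_list_view` are made up for illustration; this is not Broker's wire format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (NOT Broker's wire format): one byte holding the
// element count, followed by that many 32-bit unsigned integers in
// big-endian byte order.
class u32_list_view {
public:
  u32_list_view(const std::uint8_t* bytes, std::size_t len)
    : bytes_(bytes), len_(len) {}

  // True if the buffer is large enough for the element count it declares.
  bool valid() const {
    return len_ >= 1 && len_ >= 1 + 4 * static_cast<std::size_t>(bytes_[0]);
  }

  // Number of elements; only meaningful if valid() returned true.
  std::size_t size() const { return bytes_[0]; }

  // Decodes element i on every call. "Deserializing" costs nothing up
  // front, but each access pays for the decoding.
  std::uint32_t at(std::size_t i) const {
    const std::uint8_t* p = bytes_ + 1 + 4 * i;
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16)
           | (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
  }

private:
  const std::uint8_t* bytes_; // non-owning; the buffer must outlive the view
  std::size_t len_;
};
```

Constructing the view is just storing a pointer and a length, which is the appeal of this design. The cost moves into `at()`, and writers need a matching builder that produces exactly this layout.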
Instead of going down this road, I think there's another option that doesn't require us to change the network format. With a monotonic buffer resource and a custom allocator, we can flatten nested data structures like `broker::data` in memory, reduce the number of heap allocations and skip any destructors (by "winking out" the entire data structure). This is the same technique that makes RapidJSON fast.

To quantify what kind of speedup we could get, I've implemented a small micro benchmark that uses regular `broker::data` and a new `shallow_data` implementation (not fully functional, just the types I needed for the benchmark). I've picked the name `shallow_data` because the original idea was that the data would also hold references into the bytes we deserialized from, to avoid any unnecessary copying overhead. For the benchmark, it made little difference because we only have small strings.

I've picked something small to start with, so I've used a variable called `event_1` with this content: `(1, 1, (event_1, (42, test)))`. The benchmark currently only looks at how long it takes to deserialize the data:

It's a small data structure, so the runtime is fast either way. However, even for this very small data structure, we have a 3x speedup. Real-world messages will be larger, and when processing thousands of these per second, the performance gain adds up quickly.
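As a rough illustration of the monotonic buffer resource approach described above (not the actual `shallow_data` implementation), the sketch below builds a small nested value inside a `std::pmr::monotonic_buffer_resource` and then drops the whole arena without running the value's destructor. The types `toy_event`, `counting_resource` and `build_stats` and the function `build_in_arena` are hypothetical names introduced only for this example; the 4096-byte initial size is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <vector>

// Counts how often the arena has to fall back to the "real" heap.
class counting_resource : public std::pmr::memory_resource {
public:
  std::size_t allocations = 0;

private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    ++allocations;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
};

// Toy stand-in for a nested value; every member allocates from `res`.
struct toy_event {
  std::pmr::string name;
  std::pmr::vector<std::pmr::string> args;
  explicit toy_event(std::pmr::memory_resource* res) : name(res), args(res) {}
};

struct build_stats {
  std::size_t upstream_allocations;
  std::size_t arg_count;
};

build_stats build_in_arena() {
  counting_resource upstream;
  build_stats stats{0, 0};
  {
    // All memory for the event comes out of this arena.
    std::pmr::monotonic_buffer_resource arena(4096, &upstream);
    void* mem = arena.allocate(sizeof(toy_event), alignof(toy_event));
    auto* ev = new (mem) toy_event(&arena);
    ev->name = "event_1";
    // Long strings, so they cannot live in the small-string buffer.
    ev->args.emplace_back("a string long enough to defeat the small-string optimization");
    ev->args.emplace_back("another string that needs storage outside the object itself");
    stats.arg_count = ev->args.size();
    // Deliberately no ev->~toy_event(): destroying the arena releases
    // everything at once ("winking out").
  }
  stats.upstream_allocations = upstream.allocations;
  return stats;
}
```

The object, the vector storage and both strings all land in the arena's first block, so the upstream resource should only be asked for memory once. Skipping the destructor is only safe because nothing in `toy_event` owns resources outside the arena.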
I would leave `broker::data` untouched and use the new "flattened" representation for the message types. Of course there'll be faster ways to do things in the new API. We might leave `broker::data` in for convenience or eventually phase it out. In the transition phase, I think we can keep the API backwards compatible by converting to "regular" `broker::data` where needed and otherwise keep the migration overhead minimal. We wouldn't touch the network format or the JSON representation. We can also make this transparent to the Python bindings, if we don't remove them before that.