Benchmarking with C++ Bindings #531
Comments
@CyberPoN-3 Awesome, this is a benchmark missing in iceoryx2. Would you be open to contributing the benchmark? The perfect place would be
This will never be possible on that level. We explicitly separated the payload delivery (pub-sub or request-response) from the wake-up mechanism in iceoryx2. The reasons for the separation are
You implemented it exactly as it was intended to be used. Yesterday we also added a more complex example describing this setup; take a look at: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/cxx/event_based_communication
The benchmark we did was to measure a ping-pong setup.
For your benchmark you would do the following:
You repeat this cycle 1,000,000 times and measure the runtime T. Then you divide the measured runtime T by 1,000,000 times 2. Times 2 because you have two-way communication here: once from P1 to P2 and then back from P2 to P1. The reason we did this is that the underlying system time call
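A minimal sketch of that computation (the iteration count and variable names are just illustrative, and the actual ping-pong exchange is omitted):

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

int main() {
    constexpr uint64_t ITERATIONS = 1000000;

    auto start = std::chrono::steady_clock::now();
    for (uint64_t i = 0; i < ITERATIONS; ++i) {
        // P1 sends a ping to P2 and waits for the pong (omitted here)
    }
    auto end = std::chrono::steady_clock::now();

    // T is the total runtime of all round trips
    auto T = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();

    // divide by ITERATIONS * 2: every iteration contains two one-way trips
    double one_way_latency_ns = static_cast<double>(T) / (ITERATIONS * 2.0);
    std::cout << "average one-way latency: " << one_way_latency_ns << " ns\n";
}
```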
Btw., in the long term we want to introduce something like a meta port which contains all ports and does the event notifications for you, so that you can enjoy a much simpler API. But this port would have the disadvantages I described above. When ultra-low latency is not a requirement, it becomes a nice alternative.
@CyberPoN-3 Btw., what does the red dotted line in your graphs mean?
Hi, sorry for the late reply!
That's for sure, give me the time to get it written as well as I can, then I'll be pleased to share it with you! Also feel free to edit the benchmark as you prefer, since I'm testing a use case I need for my application; in fact, that 100us red dashed line is a reference I selected based on my previous benchmarks.
I see, that's a great way to conduct the test and I agree with the strategy you adopted! In my case I would also be interested in benchmarking the readiness of the system when there is a delay between one ping-pong and the next, in order to simulate what happens if, for example, I publish some data at a certain frequency, like 1Hz, 10Hz, 100Hz, etc. That's one of the main focuses of the benchmark I'm doing. In fact, from the results I got, it looks like the lower the publish frequency, the higher the response time to wake up the subscriber process when an event arrives. Before talking about numbers (currently a ~50us difference between 100Hz and 1Hz), since I'm pretty new to iceoryx, I would like to make sure I have done my best to write the benchmark in the fastest way possible.
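A rough sketch of how that idle time between publishes could be injected into the measurement loop (the period, iteration count, and the omitted publish/wait step are just placeholders):

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
    constexpr uint64_t ITERATIONS = 1000;
    constexpr auto PUBLISH_PERIOD = std::chrono::milliseconds(100); // e.g. 10 Hz

    std::vector<std::chrono::nanoseconds> latencies;
    latencies.reserve(ITERATIONS);

    for (uint64_t i = 0; i < ITERATIONS; ++i) {
        auto start = std::chrono::steady_clock::now();
        // publish + wait for the subscriber's reaction (omitted)
        auto stop = std::chrono::steady_clock::now();
        latencies.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));

        // idle time between publishes to simulate the target publish frequency
        std::this_thread::sleep_for(PUBLISH_PERIOD);
    }
}
```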
That's really interesting. I'm working on a solution to benchmark the way I need, trying to exclude that huge overhead. Thanks for sharing! Any advice would be appreciated!
What you observe here is a delay in the operating system. The lower the frequency is, the longer the receiving side sleeps, and the OS puts a process into a deeper sleep the longer it is inactive. Meaning, processes that are activated with a high frequency are rescheduled more often and may sit in a higher-priority scheduling queue, while inactive processes are rescheduled less frequently and are moved to a lower-priority scheduling queue. Or, when a priority queue is in place, the priority of that process may decrease over time - depending on the underlying scheduler of the OS. To have a more responsive system, on Linux you could compile your own kernel and configure it with the appropriate parameters. This should increase the frequency with which the scheduler checks for process activity, at the slight cost of increased CPU load.
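The concrete kernel parameters referenced above are not preserved in the quoted comment; the fragment below only illustrates the kind of settings typically meant here (a higher timer tick rate and a preemptible kernel), not necessarily the exact ones intended:

```
# illustrative kernel .config fragment - verify against your kernel version
CONFIG_HZ_1000=y     # 1000 Hz scheduler tick instead of 100/250 Hz
CONFIG_HZ=1000
CONFIG_PREEMPT=y     # fully preemptible ("low-latency desktop") kernel
```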
Thank you for the advice @elfenpiff, I'll try it out!
I'm working on an NVIDIA Jetson Xavier AGX. I've just finished checking both, and it seems they are set by default as you suggested. To determine the timer interrupt frequency I followed a guide that provides a small test to reveal the setting (since I don't have a /boot/config- file to read on the NVIDIA Jetson...), and its result is:
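For reference, one common way to estimate the timer tick frequency without a /boot/config- file is to query the resolution of the coarse monotonic clock, which advances once per tick; this is a minimal sketch of that idea, not necessarily the test from the guide mentioned above:

```cpp
#include <cstdio>
#include <ctime>

int main() {
    timespec res{};
    // CLOCK_MONOTONIC_COARSE is tick-based, so its resolution is roughly 1/CONFIG_HZ seconds
    if (clock_getres(CLOCK_MONOTONIC_COARSE, &res) == 0) {
        double seconds = res.tv_sec + res.tv_nsec * 1e-9;
        std::printf("coarse clock resolution: %.9f s (~%.0f Hz tick)\n", seconds, 1.0 / seconds);
    }
    return 0;
}
```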
In my setup, the publisher, subscriber and RouDi run at the maximum real-time priority level (99) with a round-robin scheduling scheme (I used chrt for that and obviously checked via htop that the priority was set correctly). Other than that, I locked the CPU at the maximum frequency for the Xavier (~2.2GHz, 8 cores) with the performance scaling governor applied and the MAXN nvpmodel. The board was left completely unloaded during the benchmarks, and I also ran a sweep varying the roudi/subscriber/publisher priority in [99RT, 100CFS, 119CFS], trying all possible combinations, and the pattern of increasing response time with longer publish periods still shows up. Is there anything else I can do to dig further into that?
@CyberPoN-3 I found this answer interesting: https://stackoverflow.com/a/13619750 It states that round-robin is a suboptimal scheduling algorithm, especially when it comes to processes waiting for IO - which is exactly what you are doing here: waiting for an event notification that is sent via a UNIX datagram socket. Could you try other schedulers? Maybe deadline or CTF will provide better results - I am not an expert on schedulers, so those are just wild guesses. I think I would explore schedulers and how they work to optimize the reaction time. Maybe you are able to decrease the latency of waking up another process, but it could come at a high price: higher CPU load, and then the actual computation time increases, which would in turn increase the latency of the overall system. Waking up a process/thread that is in deep sleep always takes a bit longer, since the process and its memory have to be reloaded. So the simplest way to avoid a deep sleep is a busy loop - but this will cost massive CPU time.
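To illustrate the busy-loop trade-off mentioned above: instead of blocking on the event listener, the consumer can poll the subscriber in a tight loop. The fragment below assumes a `subscriber.receive()` call in the spirit of the iceoryx2 C++ examples; the service/subscriber setup is omitted, and the exact names may differ between versions:

```cpp
// busy-polling variant: lowest wake-up latency, but keeps one CPU core at 100%
while (keep_running.load()) {
    auto sample = subscriber.receive().expect("receive failed");
    if (sample.has_value()) {
        process(sample.value()); // hypothetical handler for the received payload
    }
    // no wait or sleep here - the loop spins until new data arrives
}
```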
@elfenpiff
What about CTF? Actually I've never heard of that, and Google didn't save me this time xD
My mistake, I mixed something up. I meant the CFS (completely fair scheduler) |
I'm trying to write a benchmark to evaluate performance improvement over the first iceoryx (version 2.0.5) using C++ bindings.
The benchmark consists of repeatedly sampling the time between a publish and the corresponding event notification on the consumer, with the two sides running in two different processes (not two different threads of the same process). The iceoryx1 test uses a WaitSet. After studying the new examples of iceoryx2, I found that the communication pattern I would like to test is the 'event' one, so that I don't have to explicitly poll for new data.
I took the cxx/event_multiplexing example as reference code, but I noticed that apparently I cannot send data through that mechanism, so the strategy I implemented is to first publish the data as in the cxx/publish_subscribe example and then use the event communication pattern to wake up the consumer process, which polls only once to get the published data (as explained in the cxx/publish_subscribe example).
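Roughly, that pattern looks like the sketch below. It is pieced together from the public iceoryx2 C++ examples (publish_subscribe and event), so the service names, the payload type, and some API details are assumptions that may not match the exact bindings version:

```cpp
#include "iox2/node.hpp"
#include "iox2/service_name.hpp"

#include <cstdint>
#include <utility>

auto main() -> int {
    using namespace iox2;
    auto node = NodeBuilder().create<ServiceType::Ipc>().expect("node creation");

    // payload channel (publish_subscribe) - carries the actual data
    auto data_service = node.service_builder(ServiceName::create("benchmark/data").expect("name"))
                            .publish_subscribe<uint64_t>()
                            .open_or_create()
                            .expect("pubsub service");
    auto publisher = data_service.publisher_builder().create().expect("publisher");

    // wake-up channel (event) - carries no payload, only the notification
    auto event_service = node.service_builder(ServiceName::create("benchmark/event").expect("name"))
                             .event()
                             .open_or_create()
                             .expect("event service");
    auto notifier = event_service.notifier_builder().create().expect("notifier");

    // one benchmark iteration on the publisher side:
    // 1) publish the sample, 2) notify so the consumer process wakes up
    auto sample = publisher.loan_uninit().expect("loan");
    auto initialized = sample.write_payload(uint64_t { 42 });
    send(std::move(initialized)).expect("send");
    notifier.notify().expect("notify");

    // consumer side (runs in the second process in the real benchmark):
    //   auto subscriber = data_service.subscriber_builder().create().expect("subscriber");
    //   auto listener = event_service.listener_builder().create().expect("listener");
    //   listener.blocking_wait_one().expect("wait");              // woken by notify() above
    //   auto received = subscriber.receive().expect("receive");   // poll exactly once
    return 0;
}
```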
The operations executed in the time lapse recorded are:
Publisher side:
Subscriber side:
The results I got are comparable, without substantial differences w.r.t. the first iceoryx, as the following graphs show:
So I would like to ask you if there's a smarter way to conduct the benchmark while keeping the two sides in different processes, and whether sending data through events is possible. If it is, an example would be very useful.
Thanks in advance,
Matteo