-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance problem and Error handling #116
Comments
For Part 2 of your error message, this is a bug, and it seems I have miscalculated the required size of the retrieve channel - possibly the usual off by one error ;) . I will fix this! For Part 1. I think the bottleneck is maybe the line |
@hippiehunter Btw, the settings can also be set via:
let service_name = ServiceName::new("My/Funk/ServiceName")?;
let service = zero_copy::Service::new(&service_name)
.publish_subscribe()
.history_size(12)
.subscriber_max_buffer_size(5)
.enable_safe_overflow(true)
.open_or_create::<TransmissionData>()?;
service
.publisher()
.unable_to_deliver_strategy(UnableToDeliverStrategy::DiscardSample); |
…verflow [#116] Fix retrieve buffer overflow
@hippiehunter Part 2 should be fixed now. If you confirm that the removal of every occurrence of |
When i remove the sleep(CYCLE_TIME); call I cant get the publisher connected to the subscriber.
|
@hippiehunter I used your code can confirm the bug. It is a little weird race, both check successfully that the underlying resource is not existing, then both want to create this resource and only one can win this race - the other one is then complaining that the resource already exists and this is the error message you see. The solution is simple, try to open it again and everything shall work. We encountered exactly the same thing on a different place already. I will try to fix this today. |
@hippiehunter I fixed the connection issue, but there is still homework left:
I will look into those issues and fix them as soon as possible. But in the meantime you should be able to continue when you base your work on: #124 I adjusted your code a bit to get a clearer understanding, and when I removed the use core::time::Duration;
use iceoryx2::prelude::*;
use iceoryx2_bb_container::byte_string::FixedSizeByteString;
use iceoryx2_bb_container::vec::FixedSizeVec;
use iceoryx2_bb_posix::signal::SignalHandler;
use std::env;
use std::thread::sleep;
use std::time::Instant;
const CYCLE_TIME: Duration = Duration::from_nanos(1);
#[derive(Debug)]
pub struct BigMessage {
pub process_id: u32,
pub req_id: u32,
pub op_type: i32,
pub file_id: FixedSizeByteString<128>,
pub txn_id: u64,
pub key_data: FixedSizeVec<u8, 64>,
pub record_data: FixedSizeVec<u8, 64000>,
pub original_data: FixedSizeVec<u8, 64000>,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let service_name = ServiceName::new("My/Funk/ServiceName")?;
let service = zero_copy::Service::new(&service_name)
.publish_subscribe()
.open_or_create::<BigMessage>()?;
let args: Vec<String> = env::args().collect();
if args.len() < 2 {
println!("Usage: {} <publisher|subscriber>", args[0]);
return Ok(());
}
println!("Running as {}", args[1]);
if args[1] == "publisher" {
let publisher = service.publisher().create()?;
let mut counter = 0;
loop {
if SignalHandler::termination_requested() {
break;
}
counter += 1;
let sample = publisher.loan_uninit()?;
let sample = sample.write_payload(BigMessage {
process_id: 1,
req_id: counter,
op_type: 3,
file_id: FixedSizeByteString::from_bytes(b"test").map_err(|e| "error")?,
txn_id: 4,
key_data: FixedSizeVec::new(),
record_data: FixedSizeVec::new(),
original_data: FixedSizeVec::new(),
});
if sample.send()? == 0 {
//it didnt work but also didnt return a real error, sleep and try again
//sleep(CYCLE_TIME);
} else {
// break;
}
}
} else {
let subscriber = service.subscriber().create()?;
let counter = 0;
let start = Instant::now();
{
loop {
match subscriber.receive()? {
None => {
//sleep(CYCLE_TIME);
if SignalHandler::termination_requested() {
break;
}
}
Some(sample) => {
if sample.payload().req_id % 10000 == 0 {
println!(
"received: {}, time: {:?}",
sample.payload().req_id,
start.elapsed()
);
}
}
}
}
}
}
Ok(())
} And the output:
I also run a |
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
…e-or-open [#116] Fix race in shm create or open
@hippiehunter Everything on main should be fixed now. Could you please confirm. |
I just updated to the current checked in on main and It crashes the consumer with this
|
@hippiehunter Could you please try it also on linux? The error message on Windows that the file already exists is expected and in the next step iceoryx tries to open it when it already exists. But we have to get rid of the error message, otherwise it's a bit confusing. And I think Windows seems to have a totally new problem here - misaligned pointers :/ ... Theoretically, this can only happen when the shared memory is not aligned - but it should be always page size aligned. I have to dig into this. And thanks for your patience and crash reports! |
I tried running it a few times on linux and it seems to work fine |
I tried building the linux side with --release and i sometimes see this error on the publisher. Thanks for looking!
|
@hippiehunter I wrote a lot of tests, some careful reviews and discovered and fixed all bugs you unraveled - hopefully. There are still some issues left I will tackle in the next days:
let service = zero_copy::Service::new(&service_name)
.publish_subscribe()
.open_or_create::<BigMessage>()?;
// do stuff
drop(service);
But your code should now be completely stable. So if you have time to confirm this, I would close the issue and ping you on the other issues when the two remaining problems are also fixed. |
I think my stuff is all stable now, I tried my actual, slightly more complicated use case and it seems reliable now. Thanks for all your efforts in this! |
Windows 11/MSVC Rustc 1.70
and Debian 12 Rustc 1.76
iceoryx 0.2.2 and also iox2-100-pub-sub-without-lifetime-arg
Observed result or behavior:
I've adapted the examples into a single program that i run with either subscriber or publisher as a command line argument. The main difference from the examples is that sending and receiving the messages at a much higher rate.
I've also been fiddling with the config file mostly these four settings
Part 1 of my issue is that I just cant seem to move the messages around very fast, its less than 1k messages per second on my 7900x. Ive tried debug and release builds and haven't noticed any significant difference.
Part 2 of my issue is that I cant seem to make it reliable. If I just give it a big number for publisher_history_size and subscriber_max_buffer_size it seems to be mostly reliable but occasionally I will still see publisher.send(...) return 0 and print something like this to the console. Rather than return 0 I would have thought it would return an error.
Any help, pointers or suggestions you can provide would be much appreciated.
The text was updated successfully, but these errors were encountered: