Performance problem and Error handling #116

Closed
hippiehunter opened this issue Feb 17, 2024 · 13 comments

@hippiehunter

Windows 11/MSVC Rustc 1.70
and Debian 12 Rustc 1.76

iceoryx 0.2.2 and also iox2-100-pub-sub-without-lifetime-arg

Observed result or behavior:
I've adapted the examples into a single program that I run with either subscriber or publisher as a command-line argument. The main difference from the examples is that it sends and receives messages at a much higher rate.

use core::time::Duration;
use std::env;
use std::thread::sleep;
use iceoryx2::prelude::*;
use iceoryx2_bb_container::byte_string::FixedSizeByteString;
use iceoryx2_bb_container::vec::FixedSizeVec;
use iceoryx2_bb_posix::signal::SignalHandler;

const CYCLE_TIME: Duration = Duration::from_nanos(1);

#[derive(Debug)]
pub struct BigMessage {
    pub process_id: u32,
    pub req_id: u32,
    pub op_type: i32,
    pub file_id: FixedSizeByteString<128>,
    pub txn_id: u64,
    pub key_data: FixedSizeVec<u8, 64>,
    pub record_data: FixedSizeVec<u8, 64000>,
    pub original_data: FixedSizeVec<u8, 64000>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {


    let service_name = ServiceName::new("My/Funk/ServiceName")?;

    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe()
        .open_or_create::<BigMessage>()?;

    let args: Vec<String> = env::args().collect();

    if args.len() < 2 {
        println!("Usage: {} <publisher|subscriber>", args[0]);
        return Ok(());
    }

    println!("Running as {}", args[1]);

    if args[1] == "publisher" {
        let publisher = service.publisher().create()?;
        let mut counter = 0;
        loop {
            if SignalHandler::termination_requested() {
                break;
            }
            counter += 1;
            loop {
                let sample = publisher.loan_uninit()?;
                let sample = sample.write_payload(BigMessage {
                    process_id: 1,
                    req_id: counter,
                    op_type: 3,
                    file_id: FixedSizeByteString::from_bytes(b"test").map_err(|_| "error")?,
                    txn_id: 4,
                    key_data: FixedSizeVec::new(),
                    record_data: FixedSizeVec::new(),
                    original_data: FixedSizeVec::new(),
                });
                if publisher.send(sample)? == 0 {
                    //it didnt work but also didnt return a real error, sleep and try again
                    sleep(CYCLE_TIME);
                } else {
                    break;
                }
            }
            //println!("sent");
        }
    } else {
        let subscriber = service.subscriber().create()?;
        loop {
            match subscriber.receive()? {
                None => {
                    sleep(CYCLE_TIME);
                    if SignalHandler::termination_requested() {
                        break;
                    }
                }
                Some(sample) => {
                    if sample.payload().req_id % 1000 == 0 {
                        println!("received");
                    }
                }
            }
        }
    }
    Ok(())
}

I've also been fiddling with the config file, mostly these four settings:

publisher_history_size
subscriber_max_buffer_size 
enable_safe_overflow
unable_to_deliver_strategy

Part 1 of my issue is that I just can't seem to move the messages around very fast: it's less than 1k messages per second on my 7900X. I've tried debug and release builds and haven't noticed any significant difference.

Part 2 of my issue is that I can't seem to make it reliable. If I just give it a big number for publisher_history_size and subscriber_max_buffer_size, it seems to be mostly reliable, but occasionally I still see publisher.send(...) return 0 and print something like this to the console. I would have expected it to return an error rather than 0.

Publisher { port_id: UniquePublisherId(UniqueSystemId { value: 1738265885562966903244646632352716 }), sample_reference_counter: [2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2,
              1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0], data_segment: Memory { shared_memory: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 60, data: "iox2_21940_1738265885562966903244646632352716.publish
              er_data" } }, size: 19383105, base_address: 0x1f8ac2d0000, has_ownership: true, file_descriptor: FileDescriptor { value: 0, is_owned: true }, memory_lock: None }, name: FileName { value: Fi
              xedSizeByteString<255> { len: 40, data: "21940_1738265885562966903244646632352716" } }, allocator: 0x1f8ac2d0000 }, config: LocalPublisherConfig { max_loaned_samples: 2, unable_to_deliver_s
              trategy: Block }, subscriber_connections: SubscriberConnections { connections: [UnsafeCell { .. }, UnsafeCell { .. }], port_id: UniquePublisherId(UniqueSystemId { value: 1738265885562966903
              244646632352716 }), config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services",
               publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connect
              ion" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, publisher_
              max_loaned_samples: 2, publisher_history_size: 16, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, static_config:
               StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 16, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, enable_safe_overflow: false, type_name: "iceo
              ryx_test::BigMessage" } }, subscriber_list_state: UnsafeCell { .. }, history: Some(UnsafeCell { .. }), service: Service { state: ServiceState { static_config: StaticConfig { uuid: "b61bd15e
              8c3ea16146985e960906a8e125156a73", service_name: ServiceName { value: FixedSizeByteString<255> { len: 19, data: "My/Funk/ServiceName" } }, messaging_pattern: PublishSubscribe(StaticConfig {
               max_subscribers: 2, max_publishers: 5000, history_size: 16, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, enable_safe_overflow: false, type_name: "iceoryx_test::BigMe
              ssage" }) }, global_config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services",
               publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connect
              ion" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, publisher_
              max_loaned_samples: 2, publisher_history_size: 16, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, dynamic_storag
              e: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105238, base_address: 0
              x1f8ac2b0000, has_ownership: false, file_descriptor: FileDescriptor { value: 2, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61
              bd15e8c3ea16146985e960906a8e125156a73" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, static_storage: Storage { name: FileName { value: FixedSizeByteSt
              ring<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, config: Configuration { path: Path { value: FixedSizeByteString<255> { len: 25, data: "c:\Temp\iceoryx2\services"
              } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".service" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership: fals
              e, file: File { path: Some(FilePath { value: FixedSizeByteString<255> { len: 79, data: "c:\Temp\iceoryx2\services\iox2_b61bd15e8c3ea16146985e960906a8e125156a73.service" } }), file_descripto
              r: FileDescriptor { value: 1, is_owned: true } }, len: 343 } } }, degration_callback: None, loan_counter: 1, _dynamic_config_guard: UniqueIndex { value: 0, index_set addr: 0xc3acd1f920 }, _
              phantom_message_type: PhantomData<iceoryx_test::BigMessage> }
              | Unable to send sample via connection Connection { sender: Sender { shared_memory: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 85, data: "iox2_173826588556296690
              | 3244646632352716_2349590387523030532116418894603304.connection" } }, size: 1182, base_address: 0x1f8ac290000, has_ownership: false, file_descriptor: FileDescriptor { value: 3, is_owned: t
              | rue }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 69, data: "1738265885562966903244646632352716_2349590387523030532116418894603304" } } } } since the ret
              | rieve buffer is full. This can be caused by a corrupted retrieve channel.

Any help, pointers or suggestions you can provide would be much appreciated.

hippiehunter added the bug label Feb 17, 2024
@elfenpiff
Contributor

elfenpiff commented Feb 17, 2024

@hippiehunter

For Part 2: this is a bug. It seems I have miscalculated the required size of the retrieve channel, possibly the usual off-by-one error ;). I will fix this!

For Part 1: I think the bottleneck may be the line sleep(CYCLE_TIME); with a cycle time of 1 ns. In the worst case, the process tells the OS scheduler "please let me sleep" and is woken up some time later. This procedure of scheduling another process, then waking the current process up again and continuing, can be very expensive.
But the solution is simple: remove the line sleep(CYCLE_TIME); and let's see if it changes anything.
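
To get a feeling for the scheduling cost described above, here is a minimal sketch that uses only the Rust standard library and no iceoryx2 code. The function names poll_with_sleep and poll_with_spin are made up for this illustration, and the numbers it prints depend heavily on the OS and its timer resolution, so treat it as a rough measurement aid rather than a benchmark.

```rust
use std::hint::spin_loop;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Runs `iterations` polling rounds that each call `sleep(cycle_time)`
/// and returns the total wall-clock time spent.
fn poll_with_sleep(iterations: u32, cycle_time: Duration) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // Even a 1 ns request hands control to the OS scheduler.
        sleep(cycle_time);
    }
    start.elapsed()
}

/// Same loop shape, but busy-waits with a spin hint instead of sleeping.
fn poll_with_spin(iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        spin_loop();
    }
    start.elapsed()
}

fn main() {
    let iterations = 1_000;
    let slept = poll_with_sleep(iterations, Duration::from_nanos(1));
    let spun = poll_with_spin(iterations);
    println!("sleep(1ns) x {iterations}: {slept:?} ({:?} per call)", slept / iterations);
    println!("spin_loop  x {iterations}: {spun:?}");
}
```

If the per-call time of the sleeping variant turns out to be far above 1 ns on your machine, the sleep is a plausible candidate for capping the message rate. Busy-waiting trades that latency for a fully loaded CPU core.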

@elfenpiff
Contributor

@hippiehunter Btw, the following settings can also be set in code via the service and publisher builders:

  • publisher_history_size
  • subscriber_max_buffer_size
  • enable_safe_overflow
  • unable_to_deliver_strategy

 let service_name = ServiceName::new("My/Funk/ServiceName")?;
 let service = zero_copy::Service::new(&service_name)
     .publish_subscribe()
     .history_size(12)
     .subscriber_max_buffer_size(5)
     .enable_safe_overflow(true)
     .open_or_create::<TransmissionData>()?;
 let publisher = service
     .publisher()
     .unable_to_deliver_strategy(UnableToDeliverStrategy::DiscardSample)
     .create()?;

elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 17, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 17, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 18, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 18, 2024
elfenpiff added a commit that referenced this issue Feb 18, 2024
@elfenpiff
Contributor

@hippiehunter Part 2 should be fixed now.

If you can confirm that removing every occurrence of sleep(CYCLE_TIME); fixes your performance problem, I will close the issue.

@hippiehunter
Author

When I remove the sleep(CYCLE_TIME); call, I can't get the publisher to connect to the subscriber.

PublisherConnections { connections: [UnsafeCell { .. }, UnsafeCell { .. }, UnsafeCell { .. },
... 
UnsafeCell continues for a while
...
, subscriber_id: UniqueSubscriberId(UniqueSystemId { value: 1619740554441627454765778268903520 }), config: Config { global: Global {
               root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creat
              ion_timeout: 500ms, connection_suffix: ".connection" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, publisher_max_loaned_samples: 2, publisher_history
              _size: 16, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, static_config: StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 16, subscriber_max_buffer_size: 64, subscriber_max_borro
              wed_samples: 2, enable_safe_overflow: false, type_name: "iceoryx_test::BigMessage" } }, service: Service { state: ServiceState { static_config: StaticConfig { uuid: "b61bd15e8c3ea16146985e960906a8e125156a73", service_name: ServiceName { value: FixedSizeByteString<255> { len: 1
              9, data: "My/Funk/ServiceName" } }, messaging_pattern: PublishSubscribe(StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 16, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2, enable_safe_overflow: false, type_name: "iceoryx_test::BigMess
              age" }) }, global_config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service
              ", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connection" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 64, subscriber_max_borrowed_samples: 2,
               publisher_max_loaned_samples: 2, publisher_history_size: 16, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, dynamic_storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<
              255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105238, base_address: 0x1e925fe0000, has_ownership: false, file_descriptor: FileDescriptor { value: 1, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<25
              5> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, static_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e12515
              6a73" } }, config: Configuration { path: Path { value: FixedSizeByteString<255> { len: 25, data: "c:\Temp\iceoryx2\services" } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".service" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, dat
              a: "iox2_" } } }, has_ownership: false, file: File { path: Some(FilePath { value: FixedSizeByteString<255> { len: 79, data: "c:\Temp\iceoryx2\services\iox2_b61bd15e8c3ea16146985e960906a8e125156a73.service" } }), file_descriptor: FileDescriptor { value: 0, is_owned: true } }, l
              en: 343 } } }, degration_callback: None, publisher_list_state: UnsafeCell { .. }, _phantom_message_type: PhantomData<iceoryx_test::BigMessage> }
              | Unable to establish connection to new publisher UniquePublisherId(UniqueSystemId { value: 3370999858656926372933486367672852 })

@elfenpiff
Contributor

@hippiehunter I used your code and can confirm the bug. It is a slightly weird race: both sides check successfully that the underlying resource does not exist, then both try to create it, and only one can win this race. The other one then complains that the resource already exists, and this is the error message you see.

The solution is simple: try to open it again and everything should work. We have already encountered exactly the same thing in a different place. I will try to fix this today.
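
The "try to open it again" idea can be sketched with plain files from the Rust standard library. This is not the iceoryx2 shared-memory code: the function open_or_create, the Outcome enum and the retry limit are hypothetical and only illustrate the pattern of falling back to open when an exclusive create loses the race.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::Path;

/// Tells the caller whether it created the resource or attached to an existing one.
#[derive(Debug, PartialEq)]
enum Outcome {
    Created,
    Opened,
}

/// Opens `path` if it exists, otherwise creates it exclusively.
/// If another process wins the creation race, loop around and open again.
fn open_or_create(path: &Path, max_attempts: u32) -> io::Result<(File, Outcome)> {
    for _ in 0..max_attempts {
        match OpenOptions::new().read(true).write(true).open(path) {
            Ok(file) => return Ok((file, Outcome::Opened)),
            Err(e) if e.kind() == ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        // `create_new` fails with AlreadyExists if someone else created the
        // file between our failed open and this call.
        match OpenOptions::new().read(true).write(true).create_new(true).open(path) {
            Ok(file) => return Ok((file, Outcome::Created)),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(ErrorKind::Other, "open_or_create: retries exhausted"))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("open_or_create_sketch.tmp");
    let (_file, outcome) = open_or_create(&path, 3)?;
    println!("{outcome:?}");
    Ok(())
}
```

The retry limit keeps the loop from spinning forever if the resource is repeatedly created and removed by another process.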

elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
@elfenpiff
Contributor

elfenpiff commented Feb 19, 2024

@hippiehunter I fixed the connection issue, but there is still homework left:

  1. On very rare occasions I still ran into the "retrieve buffer is full" issue
  2. We need to introduce a flag to track the connection creation

I will look into those issues and fix them as soon as possible. In the meantime, you should be able to continue if you base your work on #124.

I adjusted your code a bit to get a clearer picture, and when I removed the sleep I got a throughput of roughly 20k samples per second (laptop with an i7-10875H CPU @ 2.30GHz).

use core::time::Duration;
use iceoryx2::prelude::*;
use iceoryx2_bb_container::byte_string::FixedSizeByteString;
use iceoryx2_bb_container::vec::FixedSizeVec;
use iceoryx2_bb_posix::signal::SignalHandler;
use std::env;
use std::thread::sleep;
use std::time::Instant;

const CYCLE_TIME: Duration = Duration::from_nanos(1);

#[derive(Debug)]
pub struct BigMessage {
    pub process_id: u32,
    pub req_id: u32,
    pub op_type: i32,
    pub file_id: FixedSizeByteString<128>,
    pub txn_id: u64,
    pub key_data: FixedSizeVec<u8, 64>,
    pub record_data: FixedSizeVec<u8, 64000>,
    pub original_data: FixedSizeVec<u8, 64000>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let service_name = ServiceName::new("My/Funk/ServiceName")?;

    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe()
        .open_or_create::<BigMessage>()?;

    let args: Vec<String> = env::args().collect();

    if args.len() < 2 {
        println!("Usage: {} <publisher|subscriber>", args[0]);
        return Ok(());
    }

    println!("Running as {}", args[1]);

    if args[1] == "publisher" {
        let publisher = service.publisher().create()?;
        let mut counter = 0;
        loop {
            if SignalHandler::termination_requested() {
                break;
            }
            counter += 1;

            let sample = publisher.loan_uninit()?;
            let sample = sample.write_payload(BigMessage {
                process_id: 1,
                req_id: counter,
                op_type: 3,
                file_id: FixedSizeByteString::from_bytes(b"test").map_err(|e| "error")?,
                txn_id: 4,
                key_data: FixedSizeVec::new(),
                record_data: FixedSizeVec::new(),
                original_data: FixedSizeVec::new(),
            });
            if sample.send()? == 0 {
                //it didnt work but also didnt return a real error, sleep and try again
                //sleep(CYCLE_TIME);
            } else {
                //    break;
            }
        }
    } else {
        let subscriber = service.subscriber().create()?;
        let counter = 0;
        let start = Instant::now();
        {
            loop {
                match subscriber.receive()? {
                    None => {
                        //sleep(CYCLE_TIME);
                        if SignalHandler::termination_requested() {
                            break;
                        }
                    }
                    Some(sample) => {
                        if sample.payload().req_id % 10000 == 0 {
                            println!(
                                "received: {}, time: {:?}",
                                sample.payload().req_id,
                                start.elapsed()
                            );
                        }
                    }
                }
            }
        }
    }
    Ok(())
}

And the output:

received: 10000, time: 1.430437594s
received: 20000, time: 1.896037592s
received: 30000, time: 2.372263314s
received: 40000, time: 2.844562772s
received: 50000, time: 3.357285109s
received: 60000, time: 3.836857632s
received: 70000, time: 4.304958051s
received: 80000, time: 4.778587094s
received: 90000, time: 5.242260115s
received: 100000, time: 5.707258093s
received: 110000, time: 6.17250077s
received: 120000, time: 6.63923506s
received: 130000, time: 7.112825314s
received: 140000, time: 7.582082056s
received: 150000, time: 8.046913373s
received: 160000, time: 8.515481013s
received: 170000, time: 8.985831923s
received: 180000, time: 9.451750306s
received: 190000, time: 9.91745711s
received: 200000, time: 10.385213389s
received: 210000, time: 10.859284136s

I also ran mbw, a memory benchmark, and my theoretical memory maximum is around ~8500 MiB/s. The throughput here, with BigMessage having a size of around 128 KiB and ~20k samples per second, is around ~2450 MiB/s, so there may still be some headroom. But iceoryx2 is a young library, and we have not yet started with the extreme performance optimizations. I also think that ~8500 MiB/s is more of a theoretical maximum: we have OS interference when working with multiple processes, wakeups etc., and we also need to sync memory between CPU cores. All of this takes time and is not included in the mbw measurement.
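
For anyone who wants to redo the arithmetic, here is a small sketch. The sample counts and timestamps come from the output above; the payload size of 128 KiB is the rounded figure from this comment, not a measured size_of::<BigMessage>(), and the helper names are made up for the example.

```rust
/// Samples per second from a sample count and the elapsed time in seconds.
fn samples_per_second(samples: u64, elapsed_secs: f64) -> f64 {
    samples as f64 / elapsed_secs
}

/// Payload throughput in MiB/s for a given sample rate and payload size in bytes.
fn mib_per_second(samples_per_sec: f64, payload_bytes: usize) -> f64 {
    samples_per_sec * payload_bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    // First and last line of the output above.
    let samples = 210_000 - 10_000;
    let elapsed_secs = 10.859284136 - 1.430437594;

    // Assumption: 128 KiB per sample, as stated in the comment.
    let payload_bytes = 128 * 1024;

    let rate = samples_per_second(samples, elapsed_secs);
    println!("{rate:.0} samples/s");
    println!("{:.0} MiB/s", mib_per_second(rate, payload_bytes));
}
```

The result lands in the same range as the figure quoted above; the exact value depends on the real payload size and on which part of the run is measured.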

elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
elfenpiff added a commit to elfenpiff/iceoryx2 that referenced this issue Feb 19, 2024
…n, remove retrieve buffer check from publisher - the publisher has not enough information to perform this check
elfenpiff added a commit that referenced this issue Feb 20, 2024
…e-or-open

[#116] Fix race in shm create or open
@elfenpiff
Contributor

@hippiehunter Everything on main should be fixed now. Could you please confirm?

@hippiehunter
Author

I just updated to the current commit on main and it crashes the consumer with this:

Running as consumer
        8 [D] SharedMemoryBuilder { name: FileName { value: FixedSizeByteString<255> { len: 84, data: "iox2_3203353066776743035165720297650548_928870977317242431530252316446368.connection" } }, size: 166, is_memory_locked: false, has_ownership: true, permission: OWNER_READ | OWNER_WRITE | O
              WNER_EXEC | OWNER_ALL, creation_mode: Some(OpenOrCreate), zero_memory: true, access_mode: ReadWrite, enforce_base_address: None }
              | Unable to open shared memory since the shared memory does not exist.
< Win32 API error > C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2-pal\posix\src\windows\mman.rs:225 CreateFileA(shm_file_path(name, SHM_STATE_SUFFIX).as_ptr(),
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
    core::ptr::null::<SECURITY_ATTRIBUTES>(), CREATE_NEW,
    FILE_ATTRIBUTE_NORMAL, 0)
 [ 80 ] The file exists.

        9 [D] SharedMemoryBuilder { name: FileName { value: FixedSizeByteString<255> { len: 84, data: "iox2_3203353066776743035165720297650548_928870977317242431530252316446368.connection" } }, size: 166, is_memory_locked: false, has_ownership: true, permission: OWNER_READ | OWNER_WRITE | O
              WNER_EXEC | OWNER_ALL, creation_mode: Some(OpenOrCreate), zero_memory: true, access_mode: ReadWrite, enforce_base_address: None }
              | Unable to create shared memory since it already exists.
< Win32 API error > C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2-pal\posix\src\windows\mman.rs:258 WriteFile(fd_handle, (&shm_size as *const u64) as *const u8, 8,
    &mut bytes_written, core::ptr::null_mut::<OVERLAPPED>())
 [ 5 ] Access is denied.

       10 [T] SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 84, data: "iox2_3203353066776743035165720297650548_928870977317242431530252316446368.connection" } }, size: 166, base_address: 0x2992c7d0000, has_ownership: true, file_descriptor: FileDescriptor { value: 2,
               is_owned: true }, memory_lock: None }
              | create
       11 [T] SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 60, data: "iox2_40432_3203353066776743035165720297650548.publisher_data" } }, size: 898689, base_address: 0x2992d050000, has_ownership: false, file_descriptor: FileDescriptor { value: 3, is_owned: true }, m
              emory_lock: None }
              | open
thread 'main' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x2992d0500a2', C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2\src\raw_sample.rs:62:18
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\std\src\panicking.rs:578
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\panicking.rs:67
   2: core::panicking::panic_misaligned_pointer_dereference
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\panicking.rs:174
   3: iceoryx2::raw_sample::RawSample<iceoryx2::service::header::publish_subscribe::Header,iceoryx_test::BigMessage>::as_ref
             at C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2\src\raw_sample.rs:62
   4: iceoryx2::raw_sample::RawSample<iceoryx2::service::header::publish_subscribe::Header,iceoryx_test::BigMessage>::as_data_ref
             at C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2\src\raw_sample.rs:76
   5: iceoryx2::sample::impl$2::payload<iceoryx_test::BigMessage>
             at C:\Users\the_e\.cargo\git\checkouts\iceoryx2-1e1d4490bb91753f\80db0c4\iceoryx2\src\sample.rs:68
   6: iceoryx_test::main
             at D:\repos\iceoryx_test\src\main.rs:84
   7: core::ops::function::FnOnce::call_once<enum2$<core::result::Result<tuple$<>,alloc::boxed::Box<dyn$<core::error::Error>,alloc::alloc::Global> > > (*)(),tuple$<> >
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\ops\function.rs:250

@elfenpiff
Contributor

elfenpiff commented Feb 20, 2024

@hippiehunter Could you please also try it on Linux? The error message on Windows that the file already exists is expected; in the next step iceoryx tries to open the file since it already exists. But we have to get rid of the error message, otherwise it's a bit confusing.

And I think Windows has a totally new problem here: misaligned pointers :/ ... Theoretically, this can only happen when the shared memory is not aligned, but it should always be page-size aligned. I have to dig into this.

And thanks for your patience and crash reports!
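
As a side note on the panic message, the alignment arithmetic can be checked with a few lines of Rust. This sketch does not diagnose the bug; it only works through the two addresses from the Windows log (the .publisher_data base address and the address in the panic). The helpers is_aligned and align_up are illustrative, and the 4096-byte page size is an assumption.

```rust
/// Returns true if `address` is a multiple of `alignment` (a power of two).
fn is_aligned(address: u64, alignment: u64) -> bool {
    debug_assert!(alignment.is_power_of_two());
    address & (alignment - 1) == 0
}

/// Rounds `value` up to the next multiple of `alignment` (a power of two).
fn align_up(value: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    (value + alignment - 1) & !(alignment - 1)
}

fn main() {
    let base_address: u64 = 0x2992d050000; // .publisher_data base address from the log
    let sample_address: u64 = 0x2992d0500a2; // address from the panic message

    // The mapping itself is aligned to an (assumed) 4096-byte page.
    println!("base page aligned: {}", is_aligned(base_address, 4096));
    // The dereferenced address is not a multiple of 8.
    println!("sample 8-byte aligned: {}", is_aligned(sample_address, 8));

    // The misalignment comes from the offset inside the segment.
    let offset = sample_address - base_address;
    println!("offset: {offset:#x}, next 8-byte boundary: {:#x}", align_up(offset, 8));
}
```

With these two addresses the base passes the page-alignment check, while the offset into the segment is not a multiple of 8.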

@hippiehunter
Author

I tried running it a few times on Linux and it seems to work fine.

@hippiehunter
Author

I tried building the Linux side with --release and I sometimes see this error on the publisher. Thanks for looking!

Running as publisher
        9 [T] SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 60, data: "iox2_13704_1085742739095478483720882696449008.publisher_data" } }, size: 898689, base_address: 0x7fbc5a761000, has_ownership: true, file_descriptor: FileDescriptor { value: 3
              , is_owned: true }, memory_lock: None }
              | create
       10 [D] SharedMemoryBuilder { name: FileName { value: FixedSizeByteString<255> { len: 85, data: "iox2_1085742739095478483720882696449008_1085584282770449955045338807874010.connection" } }, size: 166, is_memory_locked: false, has_ownership: true, permission: OWNER_
              READ | OWNER_WRITE | OWNER_EXEC | OWNER_ALL, creation_mode: Some(OpenOrCreate), zero_memory: true, access_mode: ReadWrite, enforce_base_address: None }
              | Unable to open shared memory since the shared memory does not exist.
       11 [T] SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 85, data: "iox2_1085742739095478483720882696449008_1085584282770449955045338807874010.connection" } }, size: 166, base_address: 0x7fbc5ab3b000, has_ownership: true, file_descriptor: Fil
              eDescriptor { value: 6, is_owned: true }, memory_lock: None }
              | create
       12 [D] PoolAllocator { buckets: UniqueIndexSet { data_ptr: RelocatablePointer { distance: 112, _phantom: PhantomData<core::cell::UnsafeCell<u32>> }, capacity: 7, borrowed_indices: 4, head: 17179869188, is_memory_initialized: true }, bucket_size: 128360, bucket_al
              ignment: 8, start: 140446948266152, size: 898527, is_memory_initialized: true }
              | Tried to release memory (140446948266146) which does not belong to this allocator.
       13 [D] PoolAllocator { allocator: PoolAllocator { buckets: UniqueIndexSet { data_ptr: RelocatablePointer { distance: 112, _phantom: PhantomData<core::cell::UnsafeCell<u32>> }, capacity: 7, borrowed_indices: 4, head: 17179869188, is_memory_initialized: true }, buc
              ket_size: 128360, bucket_alignment: 8, start: 140446948266152, size: 898527, is_memory_initialized: true }, base_address: 140446948266146, max_supported_alignment_by_memory: 4096 }
              | Failed to release shared memory chunk
       14 [D] Memory { shared_memory: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 60, data: "iox2_13704_1085742739095478483720882696449008.publisher_data" } }, size: 898689, base_address: 0x7fbc5a761000, has_ownership: true, file_descriptor: F
              ileDescriptor { value: 3, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "13704_1085742739095478483720882696449008" } }, allocator: 0x7fbc5a761000 }
              | Failed to deallocate shared memory chunk due to an internal allocator failure.
       15 [F] Publisher { port_id: UniquePublisherId(UniqueSystemId { value: 1085742739095478483720882696449008 }), sample_reference_counter: [0, 1, 1, 2, 0, 0, 0], data_segment: Memory { shared_memory: SharedMemory { name: FileName { value: FixedSizeByteString<255> { l
              en: 60, data: "iox2_13704_1085742739095478483720882696449008.publisher_data" } }, size: 898689, base_address: 0x7fbc5a761000, has_ownership: true, file_descriptor: FileDescriptor { value: 3, is_owned: true }, memory_lock: None }, name: FileName { value: Fi
              xedSizeByteString<255> { len: 40, data: "13704_1085742739095478483720882696449008" } }, allocator: 0x7fbc5a761000 }, config: LocalPublisherConfig { max_loaned_samples: 1, unable_to_deliver_strategy: Block }, dynamic_storage: Storage { shm: SharedMemory { n
              ame: FileName { value: FixedSizeByteString<255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105254, base_address: 0x7fbc5a83d000, has_ownership: false, file_descriptor: FileDescriptor { value: 5, is_owned: true }, me
              mory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, subscriber_connections: SubscriberConnectio
              ns { connections: [UnsafeCell { .. }, UnsafeCell { .. }], port_id: UniquePublisherId(UniqueSystemId { value: 1085742739095478483720882696449008 }), config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\
              ", prefix: "iox2_", service: Service { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connection" }
               }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, publisher_max_loaned_samples: 1, publisher_history_size: 1, enable_safe_overflow: fa
              lse, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, static_config: StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 1, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1,
              enable_safe_overflow: false, type_name: "iceoryx_test::BigMessage" } }, subscriber_list_state: UnsafeCell { .. }, history: Some(UnsafeCell { .. }), service: Service { state: ServiceState { static_config: StaticConfig { uuid: "b61bd15e8c3ea16146985e960906a8
              e125156a73", service_name: ServiceName { value: FixedSizeByteString<255> { len: 19, data: "My/Funk/ServiceName" } }, messaging_pattern: PublishSubscribe(StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 1, subscriber_max_buffer_size: 1
              , subscriber_max_borrowed_samples: 1, enable_safe_overflow: false, type_name: "iceoryx_test::BigMessage" }) }, global_config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: S
              ervice { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connection" } }, defaults: Defaults { publi
              sh_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, publisher_max_loaned_samples: 1, publisher_history_size: 1, enable_safe_overflow: false, unable_to_deliver_strateg
              y: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, dynamic_storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105254
              , base_address: 0x7fbc5a83d000, has_ownership: false, file_descriptor: FileDescriptor { value: 5, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, _phan
              tom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, static_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, config: Configuration { path: Path { valu
              e: FixedSizeByteString<4096> { len: 22, data: "/tmp/iceoryx2/services" } }, suffix: FileName { value: FixedSizeByteString<255> { len: 8, data: ".service" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership:
              false, file: File { path: Some(FilePath { value: FixedSizeByteString<4096> { len: 76, data: "/tmp/iceoryx2/services/iox2_b61bd15e8c3ea16146985e960906a8e125156a73.service" } }), file_descriptor: FileDescriptor { value: 4, is_owned: true }, has_ownership: fa
              lse }, len: 341 } } }, degration_callback: None, loan_counter: 1, dynamic_publisher_handle: ContainerHandle { index: 0, container_id: 1 }, _phantom_message_type: PhantomData<iceoryx_test::BigMessage> }
              | This should never happen! Failed to deallocate the reclaimed ptr. Either the data was corrupted or an invalid ptr was returned.
thread 'main' panicked at /home/hh/.cargo/git/checkouts/iceoryx2-1e1d4490bb91753f/80db0c4/iceoryx2/src/port/publisher.rs:387:37:
From: Publisher { port_id: UniquePublisherId(UniqueSystemId { value: 1085742739095478483720882696449008 }), sample_reference_counter: [0, 1, 1, 2, 0, 0, 0], data_segment: Memory { shared_memory: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 60, data: "iox2_13704_1085742739095478483720882696449008.publisher_data" } }, size: 898689, base_address: 0x7fbc5a761000, has_ownership: true, file_descriptor: FileDescriptor { value: 3, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "13704_1085742739095478483720882696449008" } }, allocator: 0x7fbc5a761000 }, config: LocalPublisherConfig { max_loaned_samples: 1, unable_to_deliver_strategy: Block }, dynamic_storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105254, base_address: 0x7fbc5a83d000, has_ownership: false, file_descriptor: FileDescriptor { value: 5, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, subscriber_connections: SubscriberConnections { connections: [UnsafeCell { .. }, UnsafeCell { .. 
}], port_id: UniquePublisherId(UniqueSystemId { value: 1085742739095478483720882696449008 }), config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connection" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, publisher_max_loaned_samples: 1, publisher_history_size: 1, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, static_config: StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 1, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, enable_safe_overflow: false, type_name: "iceoryx_test::BigMessage" } }, subscriber_list_state: UnsafeCell { .. }, history: Some(UnsafeCell { .. 
}), service: Service { state: ServiceState { static_config: StaticConfig { uuid: "b61bd15e8c3ea16146985e960906a8e125156a73", service_name: ServiceName { value: FixedSizeByteString<255> { len: 19, data: "My/Funk/ServiceName" } }, messaging_pattern: PublishSubscribe(StaticConfig { max_subscribers: 2, max_publishers: 5000, history_size: 1, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, enable_safe_overflow: false, type_name: "iceoryx_test::BigMessage" }) }, global_config: Config { global: Global { root_path_unix: "/tmp/iceoryx2/", root_path_windows: "c:\\Temp\\iceoryx2\\", prefix: "iox2_", service: Service { directory: "services", publisher_data_segment_suffix: ".publisher_data", static_config_storage_suffix: ".service", dynamic_config_storage_suffix: ".dynamic", creation_timeout: 500ms, connection_suffix: ".connection" } }, defaults: Defaults { publish_subscribe: PublishSubscribe { max_subscribers: 2, max_publishers: 5000, subscriber_max_buffer_size: 1, subscriber_max_borrowed_samples: 1, publisher_max_loaned_samples: 1, publisher_history_size: 1, enable_safe_overflow: false, unable_to_deliver_strategy: Block }, event: Event { max_listeners: 2, max_notifiers: 16 } } }, dynamic_storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 53, data: "iox2_b61bd15e8c3ea16146985e960906a8e125156a73.dynamic" } }, size: 105254, base_address: 0x7fbc5a83d000, has_ownership: false, file_descriptor: FileDescriptor { value: 5, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, _phantom_data: PhantomData<iceoryx2::service::dynamic_config::DynamicConfig> }, static_storage: Storage { name: FileName { value: FixedSizeByteString<255> { len: 40, data: "b61bd15e8c3ea16146985e960906a8e125156a73" } }, config: Configuration { path: Path { value: FixedSizeByteString<4096> { len: 22, data: "/tmp/iceoryx2/services" } }, suffix: FileName { 
value: FixedSizeByteString<255> { len: 8, data: ".service" } }, prefix: FileName { value: FixedSizeByteString<255> { len: 5, data: "iox2_" } } }, has_ownership: false, file: File { path: Some(FilePath { value: FixedSizeByteString<4096> { len: 76, data: "/tmp/iceoryx2/services/iox2_b61bd15e8c3ea16146985e960906a8e125156a73.service" } }), file_descriptor: FileDescriptor { value: 4, is_owned: true }, has_ownership: false }, len: 341 } } }, degration_callback: None, loan_counter: 1, dynamic_publisher_handle: ContainerHandle { index: 0, container_id: 1 }, _phantom_message_type: PhantomData<iceoryx_test::BigMessage> } ::: This should never happen! Failed to deallocate the reclaimed ptr. Either the data was corrupted or an invalid ptr was returned.
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: iceoryx2::port::publisher::Publisher<Service,MessageType>::retrieve_returned_samples
   3: <iceoryx2::port::publisher::Publisher<Service,MessageType> as iceoryx2::port::publish::internal::PublishMgmt>::send_impl
   4: iceoryx_test::main
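
For anyone reading the log: entry 12 reports the released address 140446948266146, while the allocator's `start` is 140446948266152, so the pointer sits 6 bytes below the managed range. Below is a minimal sketch of the kind of ownership check that would reject such a pointer. `PoolRange`, `ReleaseError` and `bucket_index` are hypothetical names for illustration and are not the iceoryx2 implementation; only the numeric values are taken from the log.

```rust
/// Hypothetical stand-in for a pool allocator's bookkeeping.
/// This is NOT the iceoryx2 type; only the numbers in `main` come from the log.
#[derive(Debug, Clone, Copy)]
struct PoolRange {
    start: u64,       // first byte managed by the pool
    size: u64,        // number of managed bytes
    bucket_size: u64, // size of one fixed-size bucket
    capacity: u64,    // number of buckets
}

#[derive(Debug, PartialEq)]
enum ReleaseError {
    OutOfRange,
    NotOnBucketBoundary,
}

impl PoolRange {
    /// Returns the bucket index for `ptr`, or an error if the address
    /// cannot have been handed out by this pool.
    fn bucket_index(&self, ptr: u64) -> Result<u64, ReleaseError> {
        if ptr < self.start || ptr >= self.start + self.size {
            return Err(ReleaseError::OutOfRange);
        }
        let offset = ptr - self.start;
        if offset % self.bucket_size != 0 {
            return Err(ReleaseError::NotOnBucketBoundary);
        }
        let index = offset / self.bucket_size;
        if index >= self.capacity {
            return Err(ReleaseError::OutOfRange);
        }
        Ok(index)
    }
}

fn main() {
    // Values copied from log entry 12.
    let pool = PoolRange {
        start: 140446948266152,
        size: 898527,
        bucket_size: 128360,
        capacity: 7,
    };

    // The address the publisher tried to release: below `start`.
    println!("{:?}", pool.bucket_index(140446948266146)); // Err(OutOfRange)

    // An address on the third bucket boundary is accepted.
    println!("{:?}", pool.bucket_index(pool.start + 2 * pool.bucket_size)); // Ok(2)
}
```

The released address also equals the `base_address` printed in entry 13, which may mean an offset of zero was reclaimed. That is an inference from the numbers only, not something the log states.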

@elfenpiff
Contributor

elfenpiff commented Mar 6, 2024

@hippiehunter I wrote a lot of tests, did some careful reviews, and discovered and (hopefully) fixed all the bugs you uncovered.

There are still some issues left that I will tackle in the next few days:

  1. Calling this concurrently in busy loops from multiple processes may result in a state where no new services can be created:

         let service = zero_copy::Service::new(&service_name)
             .publish_subscribe()
             .open_or_create::<BigMessage>()?;
         // do stuff
         drop(service);

  2. Creating and destroying Listeners in a busy loop may lead to connection failures on the Notifier side. The tests are already written, and the solution is a combination of a shared-memory bit set and a trigger such as a semaphore or a Unix domain socket, which is in the end the iceoryx1 approach.

  3. When Publishers are created and destroyed in a busy loop, the Subscriber can release a sample to the wrong publisher. This is already fixed in the PR "[#133] write drop tests" #144.

But your code should now be completely stable. So if you have time to confirm this, I would close the issue and ping you on the other issues when the two remaining problems are also fixed.

@hippiehunter
Author

I think my stuff is all stable now. I tried my actual, slightly more complicated use case and it seems reliable. Thanks for all your efforts on this!

@elfenpiff elfenpiff self-assigned this Mar 7, 2024