The publishers retrieve channel is full and the sample cannot be returned #571
Comments
@smileghp As the error message says, this should never happen and is a bug on our side. It occurs when the publisher sends multiple samples and then stops, the subscriber receives all of them and lets them go out of scope, and the queue then fills up. I think I can reproduce this cleanly on our side if you provide me with some more details:
Can you provide more details about the circumstances that led to the issue?
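For reference, a minimal Rust sketch of the pattern described above (the publisher sends a burst and stops, the subscriber receives everything and lets the samples go out of scope). The service name and payload type are made up for illustration, and the snippet assumes the iceoryx2 0.4 Rust API:

```rust
use iceoryx2::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;
    let service = node
        .service_builder(&"Repro/Service".try_into()?) // hypothetical service name
        .publish_subscribe::<u64>()
        .open_or_create()?;

    let subscriber = service.subscriber_builder().create()?;
    let publisher = service.publisher_builder().create()?;

    // The publisher sends a burst of samples and then stops publishing.
    for i in 0..4u64 {
        publisher.send_copy(i)?;
    }

    // The subscriber drains everything; each sample is dropped at the end of
    // the loop body, which hands it back via the publisher's retrieve channel.
    while let Some(sample) = subscriber.receive()? {
        println!("received {}", *sample);
    }

    Ok(())
}
```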
@smileghp Oh, and two more questions: was the subscriber created before the publisher started sending, and is this a multi-threaded process, i.e. are samples accessed or released from more than one thread?
I can't specify a clear way to reproduce it at the moment, but I will try to create a scenario that reproduces reliably. The configuration I used is as follows:
[global]
[global.node]
[global.service]
[defaults.publish-subscribe]
[defaults.event]
The subscriber was created first.
Yes, it's a multi-threaded process, but the samples are accessed and released in the same thread that handles the subscriber.
@smileghp Is the publisher accessed from multiple threads?
@smileghp The underlying mechanism is a construct with a submission queue and a completion queue. Your problem is that the capacity of the completion queue is exceeded. That capacity is calculated as the sum of the subscriber buffer size and the related per-connection limits, and whenever the publisher delivers a sample it also recycles the samples the subscriber has already returned through the completion queue. Here are some abstract scenarios I have drafted; maybe they give you an idea of how the completion queue's capacity can be exceeded, in case you cannot create a minimal reproducible example.
Case 1:
Case 2:
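For orientation, the limits mentioned above are set when the service is created. A minimal Rust sketch, assuming the 0.4 builder methods mirror the config keys (`subscriber_max_buffer_size`, `subscriber_max_borrowed_samples`, `history_size`); the service name, payload type, and values are placeholders, and the exact capacity formula should be checked against the iceoryx2 sources:

```rust
use iceoryx2::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;

    // The per-connection limits configured here are the quantities that the
    // completion-queue capacity is derived from, per the explanation above.
    let service = node
        .service_builder(&"My/Service".try_into()?) // hypothetical service name
        .publish_subscribe::<u64>()
        .subscriber_max_buffer_size(4)
        .subscriber_max_borrowed_samples(2)
        .history_size(2)
        .open_or_create()?;

    let _publisher = service.publisher_builder().create()?;
    let _subscriber = service.subscriber_builder().create()?;
    Ok(())
}
```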
Yes, the publisher is accessed from multiple threads.
Based on your description I initially thought this error should occur in the publisher's process, but I observed that it seems to happen in the subscriber's thread.
Sometimes the subscriber also encounters the error "ExceedsMaxBorrowedSamples". In C++, is the sample returned immediately, or could you explain the timing of when a sample is returned? Is there an interface to actively return a sample? My subscriber handler is as follows.
It is a bug on the publisher side that affects the subscriber side.
This can also be the root cause, since no construct of iceoryx2 is thread-safe. If you want to access the publisher from multiple threads, you can create a separate publisher for each thread. I created pull request #572, which addresses one issue that can lead to the error you encountered; that issue, however, occurs when the subscriber has a smaller buffer size than the history size and the history is consumed on the subscriber side while the publisher is still delivering it.
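One way to follow the advice of a separate publisher per thread is to build everything thread-locally, so that no iceoryx2 object is shared between threads. A rough Rust sketch; the service name, payload type, and loop counts are placeholders:

```rust
use std::thread;

use iceoryx2::prelude::*;

fn main() {
    let workers: Vec<_> = (0..2u64)
        .map(|id| {
            thread::spawn(move || {
                // Each worker creates its own node, service handle, and
                // publisher, so nothing from iceoryx2 is shared between threads.
                let node = NodeBuilder::new()
                    .create::<ipc::Service>()
                    .expect("node creation");
                let service = node
                    .service_builder(&"My/Service".try_into().expect("service name"))
                    .publish_subscribe::<u64>()
                    .open_or_create()
                    .expect("service");
                let publisher = service.publisher_builder().create().expect("publisher");

                for i in 0..10u64 {
                    publisher.send_copy(id * 1000 + i).expect("send");
                }
            })
        })
        .collect();

    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
}
```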
Okay, I will pick up your changes and test them in my environment. Although multiple threads can access the publisher, there is no contention, because I use a lock to ensure that.
Can you explain this?
Actually, this is not enough, since the lock only covers the calls you make on the publisher itself.
As soon as the samples that were loaned from or delivered by the publisher are released, the publisher's shared state is accessed again, and that access happens outside of your lock.
@elfenpiff Okay, but I still don't understand why the "ExceedsMaxBorrowedSamples" error occurs on the subscriber side. My implementation is the code mentioned above.
@smileghp The completion queue is filled by the subscriber whenever a sample is released, and the publisher has the responsibility to recycle the samples from the completion queue. This follows a specific contract, and when the publisher violates that contract it causes a failure on the subscriber side, since the subscriber just adds samples to the completion queue. It would make sense to handle this differently on the subscriber side so that it is not affected as harshly; a better approach would be to simply cut the connection to the publisher, since it is obviously malfunctioning.
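To make the subscriber-side half of that contract concrete: in the Rust API there is no separate release call; a received sample is handed back through the retrieve (completion) queue when it is dropped, so returning it "actively" simply means dropping it early. The C++ binding presumably does the same when the sample object is destroyed, but I have not verified that interface. A minimal Rust sketch with a placeholder service name and payload type:

```rust
use iceoryx2::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;
    let service = node
        .service_builder(&"My/Service".try_into()?) // hypothetical service name
        .publish_subscribe::<u64>()
        .open_or_create()?;
    let subscriber = service.subscriber_builder().create()?;

    while let Some(sample) = subscriber.receive()? {
        // Use the payload while the sample is alive ...
        println!("payload: {}", *sample);

        // ... and return it by dropping it; without the explicit drop it is
        // returned at the end of the loop body anyway. Holding too many
        // samples at once is what leads to ExceedsMaxBorrowedSamples, while
        // the full retrieve channel in this issue is the publisher failing to
        // recycle samples that were already returned.
        drop(sample);
    }
    Ok(())
}
```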
Required information
Operating system: Linux for arm64
Rust version: not provided
Cargo version: not provided
iceoryx2 version: 0.4.1
Detailed log output:
[F] Sample<[u8], iceoryx2::service::builder::publish_subscribe::CustomHeaderMarker, iceoryx2::service::ipc::Service> { details: SampleDetails { publisher_connection: Connection { receiver: Receiver { storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 120, data: "iox2_b9fc73e5c1f646968758453273c6c65cb372831b_1119046342792956048229758164885_1348123209380142857096792137656.connection" } }, size: 358, base_address: 0xffffa179e000, has_ownership: false, file_descriptor: FileDescriptor { value: 93, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 63, data: "1119046342792956048229758164885_1348123209380142857096792137656" } }, _phantom_data: PhantomData }, borrow_counter: UnsafeCell { .. }, name: FileName { value: FixedSizeByteString<255> { len: 63, data: "1119046342792956048229758164885_1348123209380142857096792137656" } } }, data_segment: Memory { storage: Storage { shm: SharedMemory { name: FileName { value: FixedSizeByteString<255> { len: 98, data: "iox2_0354a209029e7d094a819e2d4030ea331e6caaf0_24469_1119046342792956048229758164885.publisher_data" } }, size: 14750638, base_address: 0xfffef21ec000, has_ownership: false, file_descriptor: FileDescriptor { value: 94, is_owned: true }, memory_lock: None }, name: FileName { value: FixedSizeByteString<255> { len: 37, data: "24469_1119046342792956048229758164885" } }, _phantom_data: PhantomData<iceoryx2_cal::shared_memory::common::details::AllocatorDetails<iceoryx2_cal::shm_allocator::pool_allocator::PoolAllocator>> }, name: FileName { value: FixedSizeByteString<255> { len: 37, data: "24469_1119046342792956048229758164885" } }, payload_start_address: 281470448877696, _phantom: PhantomData<iceoryx2_cal::shm_allocator::pool_allocator::PoolAllocator> }, publisher_id: UniquePublisherId(UniqueSystemId { value: 1119046342792956048229758164885, pid: 24469, creation_time: Time { clock_type: Realtime, seconds: 1735479263,
I am using the C++ API, and when receiving a certain topic this inevitably occurs after running for some time. I would like to know the cause of this issue and whether you have any suggestions or ideas. Thank you!