[Bug] Transient local messages not cached for multiple publishers #333

Open
ottojo opened this issue Nov 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ottojo

ottojo commented Nov 12, 2024

Describe the bug

On a local ROS 2 setup, when subscribing to a topic with transient local durability, a node receives the cached message from every transient local publisher on the topic. When subscribing to the same topic over the zenoh bridge (version 1.0.2), only the latest message on the topic is received.

This causes problems, for example when a node publishes a static transform on /tf_static: the static transforms published earlier by the robot state publisher then become unavailable.
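
To make the two behaviours described above concrete, here is a minimal Python sketch. It is a toy model, not the bridge's or DDS's actual code: the function names and message strings are hypothetical, and the "single slot" variant only models the reported symptom, assuming one depth-1 history shared by all publishers on the topic.

```python
from collections import deque


def late_joiner_per_writer(publications, depth=1):
    """Model of local DDS: each writer keeps its own KEEP_LAST(depth) history."""
    histories = {}
    for writer, msg in publications:
        histories.setdefault(writer, deque(maxlen=depth)).append(msg)
    # A late-joining TRANSIENT_LOCAL reader gets every writer's history.
    return [msg for history in histories.values() for msg in history]


def late_joiner_single_slot(publications, depth=1):
    """Hypothetical model of the symptom: one history shared by all writers."""
    shared = deque(maxlen=depth)
    for _writer, msg in publications:
        shared.append(msg)
    return list(shared)


pubs = [("publisher_a", "map->a"), ("publisher_b", "map->b")]
print(late_joiner_per_writer(pubs))   # ['map->a', 'map->b']
print(late_joiner_single_slot(pubs))  # ['map->b']
```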

To reproduce

  1. Start zenoh bridge on two hosts
  2. On the first host, start two nodes publishing once with transient local durability, such as ros2 run tf2_ros static_transform_publisher --frame-id map --child-frame-id a and ros2 run tf2_ros static_transform_publisher --frame-id map --child-frame-id b
  3. On the first host, observe two messages being received by ros2 topic echo /tf_static
  4. On the second host, observe only the message from the node which started last being received by ros2 topic echo /tf_static

System info

Ubuntu 22.04 in docker on both hosts, using zenoh-bridge-ros2dds standalone executable version 1.0.2 installed in binary form from the private deb repository

@ottojo ottojo added the bug Something isn't working label Nov 12, 2024
@JEnoch
Member

JEnoch commented Nov 26, 2024

I tried to reproduce your issue, but I do get 2 messages when running ros2 topic echo /tf_static on the second host.
Here are my exact commands (adding some debug logging for the bridges):

  • Host 1:
    • RUST_LOG=zenoh_plugin_ros2dds::route_publisher=debug zenoh-bridge-ros2dds
    • ros2 run tf2_ros static_transform_publisher --frame-id map --child-frame-id a
    • ros2 run tf2_ros static_transform_publisher --frame-id map --child-frame-id b
  • Host 2:
    • RUST_LOG=zenoh_plugin_ros2dds::route_subscriber=trace zenoh-bridge-ros2dds -e tcp/<host1_ip>:7447
    • ros2 topic echo /tf_static

For the bridge on Host 1, I see logs like these:

2024-11-26T15:19:37.781750Z DEBUG tokio-runtime-worker ThreadId(05) zenoh_plugin_ros2dds::route_publisher: Route Publisher (/tf_static -> tf_static): creation with type tf2_msgs/msg/TFMessage
2024-11-26T15:19:37.781912Z DEBUG tokio-runtime-worker ThreadId(05) zenoh_plugin_ros2dds::route_publisher: Route Publisher (/tf_static -> tf_static): caching TRANSIENT_LOCAL publications via a PublicationCache with history=10 (computed from Reader's QoS: history=(KEEP_LAST,1), durability_service.max_instances=-1)
2024-11-26T15:19:37.782256Z DEBUG tokio-runtime-worker ThreadId(05) zenoh_plugin_ros2dds::route_publisher: Route Publisher (/tf_static -> tf_static): congestion_ctrl Block, priority Data, express:false
2024-11-26T15:19:37.782734Z DEBUG tokio-runtime-worker ThreadId(05) zenoh_plugin_ros2dds::route_publisher: Route Publisher (ROS:/tf_static -> Zenoh:tf_static) now serving local nodes {"/static_transform_publisher_pJEDPKxne1WDnuER"}
...
2024-11-26T15:19:43.998629Z DEBUG tokio-runtime-worker ThreadId(09) zenoh_plugin_ros2dds::route_publisher: Route Publisher (ROS:/tf_static -> Zenoh:tf_static) now serving local nodes {"/static_transform_publisher_YEvYvsQEIl2rqKzM", "/static_transform_publisher_pJEDPKxne1WDnuER"}

This means the bridge discovered the 1st Publisher on /tf_static with QoS TRANSIENT_LOCAL and KEEP_LAST(1).

By design the bridge creates only 1 route per topic, with an associated PublicationCache for TRANSIENT_LOCAL support. When a remote bridge discovers a Subscriber, it queries historical publications from this cache.
By default the bridge sizes the cache to hold history_length * transient_local_cache_multiplier messages, where transient_local_cache_multiplier is configurable and set to 10 by default.
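
As a quick check of that sizing rule, a small Python sketch of the arithmetic follows. The function name is hypothetical and this is not the bridge's code; only the formula and the default of 10 come from the description above.

```python
DEFAULT_MULTIPLIER = 10  # default value of transient_local_cache_multiplier stated above


def publication_cache_size(history_length: int, multiplier: int = DEFAULT_MULTIPLIER) -> int:
    """Return history_length * multiplier, the cache size described above."""
    if history_length < 1 or multiplier < 1:
        raise ValueError("history_length and multiplier must be positive")
    return history_length * multiplier


# KEEP_LAST(1) with the default multiplier, matching "history=10" in the Host 1 log.
print(publication_cache_size(1))  # 10
```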

Note: writing this I realized that the transient_local_cache_multiplier config was not documented... #342 fixes this.

The last line shows the discovery of the 2nd Publisher, for which the same route and PublicationCache are used.

For the bridge on Host 2, I see logs like these:

2024-11-26T15:13:00.376775Z DEBUG tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static) now serving local nodes {"/_ros2cli_37149"}
2024-11-26T15:13:00.376818Z DEBUG tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static) activate
2024-11-26T15:13:00.376845Z DEBUG tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): query historical messages from everybody for TRANSIENT_LOCAL Reader on @/*/@ros2_pub_cache/tf_static
2024-11-26T15:13:00.379572Z TRACE                 rx-0 ThreadId(13) zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): routing message - 92 bytes
2024-11-26T15:13:00.379628Z TRACE                 rx-0 ThreadId(13) zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): routing message - 92 bytes

This means the bridge does get 2 messages (92 bytes each) on topic /tf_static from the Host 1 bridge's cache and routes both of them to the Subscriber (the ros2 topic echo command).

Could you please check if you get the same logs and behaviour?
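
To compare logs quickly, a small Python helper along these lines could count the "routing message" TRACE lines per route. It is only a sketch: the regular expression assumes the log format shown in the excerpt above, and the function name and sample text are illustrative.

```python
import re

# Matches lines like:
#   ... Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): routing message - 92 bytes
ROUTED = re.compile(
    r"Route Subscriber \((?P<route>[^)]*)\): routing message - (?P<size>\d+) bytes"
)


def count_routed(log_text):
    """Map each route to the list of message sizes routed on it."""
    routed = {}
    for match in ROUTED.finditer(log_text):
        routed.setdefault(match.group("route"), []).append(int(match.group("size")))
    return routed


sample = """\
TRACE rx-0 zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): routing message - 92 bytes
TRACE rx-0 zenoh_plugin_ros2dds::route_subscriber: Route Subscriber (Zenoh:tf_static -> ROS:/tf_static): routing message - 92 bytes
"""
print(count_routed(sample))  # {'Zenoh:tf_static -> ROS:/tf_static': [92, 92]}
```

In the failing case from the report, one would expect a single entry in the list for /tf_static instead of two.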

Note that if your system on Host 1 has more than 10 Publishers on the /tf_static topic, you need to increase the transient_local_cache_multiplier config value so that all the publications fit in the PublicationCache.
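
The effect of an undersized cache can be illustrated with a bounded queue in Python. This is a simplified model under a stated assumption, namely that the cache drops its oldest entries first once full; the function and message names are hypothetical and it does not reproduce the PublicationCache implementation.

```python
from collections import deque


def retained_publications(num_publishers, history_length=1, multiplier=10):
    """Each publisher publishes once; return what a bounded cache still holds."""
    # Assumption: oldest entries are evicted first when the cache is full.
    cache = deque(maxlen=history_length * multiplier)
    for index in range(num_publishers):
        cache.append(f"static_tf_{index}")
    return list(cache)


print(len(retained_publications(2)))                  # 2
print(len(retained_publications(12)))                 # 10
print(retained_publications(12)[0])                   # static_tf_2
print(len(retained_publications(12, multiplier=20)))  # 12
```

With 12 publishers and the default multiplier, the two oldest publications no longer fit; raising the multiplier keeps all of them.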

@muellerbernd

muellerbernd commented Nov 27, 2024

With two hosts everything works fine. But with 3 hosts it does not work on my side, as mentioned in #219. My use case:

  • Host 1: robot (spot with a lot of sensors, publishing sensor data and robot_description (tf_static))
    • ros2 launch robot_package robot_launch.launch.py
    • zenoh_bridge_ros2dds
  • Host 2:
    • zenoh_bridge_ros2dds -e tcp/<host1_ip>:7447
    • ros2 topic echo /tf_static (works and shows tf tree)
  • Host 3:
    • zenoh_bridge_ros2dds -e tcp/<host1_ip>:7447
    • rqt (does not show the tf tree, no tf data received)

All the hosts are connected via Wi-Fi.
I get the same result when Host 1 runs only the following:

  • ros2 run tf2_ros static_transform_publisher --frame-id map --child-frame-id a
  • zenoh_bridge_ros2dds

@muellerbernd

Using rmw_zenoh seems to fix this problem for my use case.
