[Bug] Subscriber-API does not return a failure, but subscriber is not working #151
Comments
I don't have exactly that board, but I will be testing with both an STM32F429ZI and an STM32F767ZI, since I have not been able to replicate your issue. In the meantime, I am running some tests in a Unix environment against a Zenoh router, and it seems to be working correctly. Here is a snippet of the code:
void data_handler(const z_sample_t *sample, void *ctx) {
(void)(ctx);
char *keystr = z_keyexpr_to_string(sample->keyexpr);
printf(">> [Subscriber] Received ('%s': '%.*s')\n", keystr, (int)sample->payload.len, sample->payload.start);
z_drop(z_move(keystr));
}
int main(int argc, char **argv) {
const char *keyexpr_sub = "demo/example/b";
const char *keyexpr_pub = "demo/example/a";
const char *value_2 = "pub-from-pico";
const char *mode = "client";
char *locator = NULL;
z_owned_config_t config = z_config_default();
zp_config_insert(z_loan(config), Z_CONFIG_MODE_KEY, z_string_make(mode));
if (locator != NULL) {
zp_config_insert(z_loan(config), Z_CONFIG_PEER_KEY, z_string_make(locator));
}
printf("Opening session...\n");
z_owned_session_t s = z_open(z_move(config));
if (!z_check(s)) {
printf("Unable to open session!\n");
return -1;
}
// Start read and lease tasks for zenoh-pico
if (zp_start_read_task(z_loan(s), NULL) < 0 || zp_start_lease_task(z_loan(s), NULL) < 0) {
printf("Unable to start read and lease tasks\n");
return -1;
}
z_owned_closure_sample_t callback = z_closure(data_handler);
printf("Declaring Subscriber on '%s'...\n", keyexpr_sub);
z_owned_subscriber_t sub = z_declare_subscriber(z_loan(s), z_keyexpr(keyexpr_sub), z_move(callback), NULL);
if (!z_check(sub)) {
printf("Unable to declare subscriber.\n");
return -1;
}
printf("Declaring publisher for '%s'...\n", keyexpr_pub);
z_owned_publisher_t pub = z_declare_publisher(z_loan(s), z_keyexpr(keyexpr_pub), NULL);
if (!z_check(pub)) {
printf("Unable to declare publisher for key expression!\n");
return -1;
}
char *buf = (char *)malloc(256);
for (int idx = 0; 1; ++idx) {
sleep(1);
snprintf(buf, 256, "[%4d] %s", idx, value_2);
printf("Putting Data ('%s': '%s')...\n", keyexpr_pub, buf);
z_publisher_put_options_t options = z_publisher_put_options_default();
options.encoding = z_encoding(Z_ENCODING_PREFIX_TEXT_PLAIN, NULL);
z_publisher_put(z_loan(pub), (const uint8_t *)buf, strlen(buf), &options);
}
printf("Enter 'q' to quit...\n");
char c = '\0';
while (c != 'q') {
fflush(stdin);
scanf("%c", &c);
}
z_undeclare_subscriber(z_move(sub));
// Stop read and lease tasks for zenoh-pico
zp_stop_read_task(z_loan(s));
zp_stop_lease_task(z_loan(s));
z_close(z_move(s));
return 0;
}
Note that with Zenoh-Pico you need to explicitly launch the read and lease tasks (multi-thread) or spin at your own pace (single-thread). I am assuming that you are doing so, since you mention that the subscribers sometimes work. If you can share a snippet of your code and/or logs (both from your application using Zenoh-Pico and from the Zenoh-ROS2-Bridge), it would be a great help in identifying the issue.
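For the single-thread case, the "spin at your own pace" idea can be sketched roughly as below. This is only an illustration of the loop shape, not zenoh-pico code: `poll_link`, `send_keep_alive`, `spin` and `spin_stats_t` are hypothetical stand-ins that merely count calls. In a real application they would wrap the library's own read and keep-alive calls (as far as I know, the single-thread examples use calls such as `zp_read` and `zp_send_keep_alive`; please check the headers of your version).

```c
/* Hypothetical stand-ins: these only count calls so the loop shape can be
 * checked on any host. They are NOT part of the zenoh-pico API. */
typedef struct {
    unsigned reads;       /* how many times the link was polled */
    unsigned keep_alives; /* how many keep-alives were sent */
} spin_stats_t;

static void poll_link(spin_stats_t *st) { st->reads++; }
static void send_keep_alive(spin_stats_t *st) { st->keep_alives++; }

/* Run `iterations` passes of a single-threaded loop: poll the link on every
 * pass and send a keep-alive every `keep_alive_every` passes (0 disables). */
static spin_stats_t spin(unsigned iterations, unsigned keep_alive_every) {
    spin_stats_t st = {0, 0};
    for (unsigned i = 1; i <= iterations; ++i) {
        poll_link(&st);
        if (keep_alive_every != 0 && i % keep_alive_every == 0) {
            send_keep_alive(&st);
        }
    }
    return st;
}
```

The point of the sketch is that, without threads, nothing is received and the session lease is not refreshed unless the application loop itself does both, often enough to stay within the lease period.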
I might have an idea of what is happening in your scenario. The current version of the Zenoh protocol needs to be extended with additional capability negotiation during session establishment, so that each side can adapt the communication to the other's capabilities. Several improvements in this respect will come with an improved version of the protocol (expected in Q2 2023 according to the public roadmap). This will be especially critical for addressing the constrained resources of microcontrollers. Until then, there are a couple of things you can do as a workaround for your issue.
/**
* Default maximum batch size possible to be received.
*/
#ifndef Z_BATCH_SIZE_RX
#define Z_BATCH_SIZE_RX \
65535 // Warning: changing this value can break the communication
// with zenohd in the current protocol version.
// In the future, it will be possible to negotiate such value.
// Change it at your own risk.
#endif
/**
* Default maximum batch size possible to be sent.
*/
#ifndef Z_BATCH_SIZE_TX
#define Z_BATCH_SIZE_TX 65535
#endif
/**
* Default maximum size for fragmented messages.
*/
#ifndef Z_FRAG_MAX_SIZE
#define Z_FRAG_MAX_SIZE 300000
#endif
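Because each default above sits behind an `#ifndef` guard, a value defined earlier (in the application, or on the compiler command line with something like `-DZ_BATCH_SIZE_RX=2048`) takes precedence over it. A minimal sketch of that mechanism follows. The value 2048 is the one reported later in this thread; `batch_buffer_bytes` is a hypothetical helper, and the assumption of one RX and one TX buffer per session is mine, not a statement about zenoh-pico's actual memory layout. The warning in the excerpt still applies: changing these values can break communication with zenohd.

```c
#include <stddef.h>

/* Application-side override: must be seen before the guarded defaults. */
#define Z_BATCH_SIZE_RX 2048
#define Z_BATCH_SIZE_TX 2048

/* Guarded defaults, same shape as the config excerpt above. */
#ifndef Z_BATCH_SIZE_RX
#define Z_BATCH_SIZE_RX 65535
#endif
#ifndef Z_BATCH_SIZE_TX
#define Z_BATCH_SIZE_TX 65535
#endif
#ifndef Z_FRAG_MAX_SIZE
#define Z_FRAG_MAX_SIZE 300000
#endif

/* Rough batch-buffer footprint, ASSUMING one RX and one TX buffer per
 * session (an illustrative assumption only). */
static size_t batch_buffer_bytes(void) {
    return (size_t)Z_BATCH_SIZE_RX + (size_t)Z_BATCH_SIZE_TX;
}
```

With the override in place the helper reports 4096 bytes instead of the 131070 bytes implied by the two defaults, while `Z_FRAG_MAX_SIZE` keeps its default because nothing redefined it.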
Let us know if these workarounds were able to solve your problem.
Hi Carlos, I don't think it's linked to memory/heap. We were already struggling with mem and reduced the Z_BATCH_SIZE_RX and Z_BATCH_SIZE_TX to 1024 some time ago. We were also facing an issue with a missing free/z_free, but this is already fixed. We have the following situation:
After replacing the TX2 with another system (BB-AI-64), we run only the Zenoh-agent on the BB-AI-64 via zenoh-python:
When I use the BB-AI-64 as a subscriber of the TX2-zenoh-DDS-bridge, it works perfectly. So in other words:
We will check the TX2 side this week (check the zenoh-DDS-bridge version, replace the combined bridge with a zenoh-agent and a separate DDS-bridge).
Thanks for the detailed description of your experiments. It gives us a bit more insight into what might be happening. We had a similar issue while using the Zenoh-DDS-Bridge with a Husarion robot (https://husarion.com/), where the number of resources being subscribed/published, combined with the fact that the Zenoh router was forwarding all the resource declarations, was enough to exhaust the memory in our Zenoh-Pico nodes. There are two things we can try to do to debug it:
From here, we can see what actions to take. In the meantime, I am adding some extra memory checks to trigger an
Hello @cguimaraes, thanks a lot for your remarks, especially the "forwarding all the resource declarations was enough to exhaust" part. We had reduced Z_BATCH_SIZE_RX and Z_BATCH_SIZE_TX to 1024 bytes. Is it possible that these buffers are exhausted during the communication initialization phase? I gave it a try and increased Z_BATCH_SIZE_RX and Z_BATCH_SIZE_TX to 2048 bytes, and now it seems to work. I will test in more detail. In any case, your help was really useful. Thank you very much!
Your insights are also very useful and I will dig into this. I will also discuss with the team what can be done in the short term, aligned with the router implementation, to mitigate this issue. Reducing the This being said,
Thanks for your remarks. I ran several additional tests with many STM32H7 startups, and the software now works stably.
Good to hear that your setup is now stable. Regarding the FreeRTOS port, we have an open request here: #129
This will be addressed by eclipse-zenoh/roadmap#2.
@p-avital can you have a look at this issue please? It was pending confirmation after the latest protocol merge.
Describe the bug
We run zenoh-pico on an STM32H7 with Zephyr.
The agent runs on a different system (Zenoh-ROS2-Bridge).
When the STM32H7 board is reset (without resetting the agent!), the Zenoh communication is established following the examples (z_open, z_declare_subscriber, z_declare_publisher, ..).
None of the Zenoh API calls returns an error, but the following situations can occur:
A Zenoh client started in parallel on a Linux system (zenoh-python) works properly, so we assume the failure is not coming from the Zenoh agent.
To reproduce
System info