Continuous Serialization and Image Data Streaming #676
Comments
Hello @Jake-Carter,
Nice to hear that, I'm going to move this internally so we can be in touch.
Continuous serialization is an advanced feature of the middleware that lets the user control a multi-stage serialization. It imposes some restrictions on the user type; the main one is that you need to "remove" the buffer part of your type, which implies modifying your sensor_msgs/Image type. I'm not sure this is the most straightforward way; you are probably looking for a larger transport MTU and/or a larger middleware stream history. How big is your payload?
I've just accepted you in the micro-ROS Slack. Do not hesitate to contact me, open new issues, or contribute via pull requests.
Thanks @pablogs9, the ability to increase the transport MTU is very helpful. My current payload is a 160x120 RGB888 image (57600 bytes). My microcontroller only has 128KB of SRAM, so it's fairly constrained. FreeRTOS, the micro-ros library, and my setup code seem to take up about 62KB of SRAM, so I was able to increase the MTU size to 2048 before I started to run out of space. It does help with the speed by the expected factor of 4x over the default, though. There are some tricks I can do with the CNN accelerator, so this will let me proceed in the short term. In general, are there any disadvantages to extremely large MTU sizes? I have a couple of other questions as well, so I will reach out via Slack. Thanks for your support.
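The arithmetic behind those numbers can be sketched as follows. This is only an illustration: `image_bytes` and `transport_units` are hypothetical helper names, and `overhead` stands in for whatever per-unit header cost the middleware actually adds, which is not stated in this thread.

```c
#include <stddef.h>

/* Bytes in an uncompressed image: width x height x bytes per pixel. */
size_t image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Number of transport units needed when each unit carries at most
 * (mtu - overhead) payload bytes. `overhead` is a hypothetical per-unit
 * header cost. Returns 0 if no payload would fit in a unit. */
size_t transport_units(size_t payload, size_t mtu, size_t overhead)
{
    if (mtu <= overhead) {
        return 0;
    }
    size_t usable = mtu - overhead;
    return (payload + usable - 1) / usable; /* ceiling division */
}
```

Ignoring overhead, the 57600-byte image needs 113 units at an MTU of 512 and 29 units at 2048, which is consistent with the roughly 4x speedup reported above.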
Well, sending a payload that is almost 45% of your available memory is always kind of hard. In my experience sending single images over micro-ROS/XRCE is possible, but sending video will require some more resources. In any case, even continuous serialization will force you to use best-effort streams, which means that losing a single fragment will cause the whole frame to be lost. Before going into continuous serialization, is there any possibility of compressing the image to JPEG and sending it via CompressedImage?
Thanks @pablogs9, there are a couple of challenges I found this week:
If continuous serialization could offer additional speed improvements I would be interested in exploring it. From what I've seen so far there are two sources of the slow speed:

Delays between each MTU

Your guidance on increasing the MTU size helped a lot with this, and I've achieved good results reducing this delay as much as my memory allows. I'm not sure where the overhead that's associated with this is coming from, but maybe as I get more familiar with the library I can test it further. It could also be related to my FreeRTOS port and implementation, or just unavoidable small delays from the complexity of the library.

Delay gaps inside each MTU

I'm seeing almost a 1ms delay inside each MTU, and this was more unexpected. It's happening because the library splits up the frame bytes and data bytes into two separate transport calls, but the time it takes between them is longer than I expected. This was one of the main challenges I had developing the custom transports since my UART FIFOs are very shallow (only 8 bytes). I ended up implementing a queue to extend my FIFO so I wouldn't miss bytes inside each MTU. For example, I captured the logic trace below on the RX side while I was developing my transport functions.
You can see it's actively waiting for the frame data first. It gets enough bytes and returns (B). The micro-ROS library takes about 800 µs to jump back into the transport read for the rest of the data (A).

    size_t vMXC_Serial_Read (
        struct uxrCustomTransport* transport,
        uint8_t* buffer,
        size_t length,
        int timeout,
        uint8_t* error_code)
    {
        TickType_t elapsed = 0;
        const TickType_t xMaxBlockTime = pdMS_TO_TICKS(timeout);
        MXC_GPIO_OutSet(indicator.port, indicator.mask); // <-- A (transition low to high, we have entered the transport)
        unsigned int num_received = 0;
        while(num_received < length && elapsed < xMaxBlockTime) {
            if (uxQueueMessagesWaiting(rx_queue) > 0) {
                if(xQueueReceive(rx_queue, &buffer[num_received], 1)) {
                    num_received++;
                }
            }
            elapsed++;
        }
        MXC_GPIO_OutClr(indicator.port, indicator.mask); // <-- B (transition high to low, we are exiting the transport)
        return num_received;
    }

So, since there is a ~1ms delay inside each MTU and I need thousands of MTUs to transmit the large image data, I was hoping that continuous serialization would give me the hooks I need to manually transmit my frame data. That way I could simultaneously eliminate the 1ms delay inside each MTU and the delay between MTUs. Sorry for the novel :) - just wanted to provide some more context into the challenges I've seen so far with the transmission speed and extremely large messages.
How are you ensuring that this is an active wait when there are no messages in the queue?

    while(num_received < length && elapsed < xMaxBlockTime) {
        if (uxQueueMessagesWaiting(rx_queue) > 0) {
            if(xQueueReceive(rx_queue, &buffer[num_received], 1)) {
                num_received++;
            }
        }
        elapsed++;
    }

I mean, if uxQueueMessagesWaiting(...) == 0, this will loop for less than [...]
I also wonder why you are struggling with the reception of packets and serial read operations if your objective is to send an image. Could you clarify these two points?
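The concern is that `elapsed` counts loop iterations rather than time. One way to make the timeout track wall-clock time is to compare against a clock reading taken at entry. The sketch below is only an illustration and does not use FreeRTOS directly: `read_until_deadline`, the two callback types, and the `fake_*` helpers are all hypothetical names, with the clock and queue injected so the logic can be exercised on a host. On the target, the clock would presumably come from something like `xTaskGetTickCount()` and the pop from `xQueueReceive()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: a millisecond clock and a non-blocking byte pop. */
typedef uint32_t (*now_ms_fn)(void *ctx);
typedef bool (*try_pop_fn)(void *ctx, uint8_t *out);

/* Read up to `length` bytes, giving up once `timeout_ms` of clock time
 * has passed since entry. Returns the number of bytes received. */
size_t read_until_deadline(uint8_t *buffer, size_t length, uint32_t timeout_ms,
                           now_ms_fn now_ms, try_pop_fn try_pop, void *ctx)
{
    const uint32_t start = now_ms(ctx);
    size_t received = 0;

    while (received < length) {
        if (try_pop(ctx, &buffer[received])) {
            received++;
        } else if ((uint32_t)(now_ms(ctx) - start) >= timeout_ms) {
            break; /* unsigned subtraction stays correct across clock wrap */
        }
    }
    return received;
}

/* Host-side fake used only to exercise the loop. */
typedef struct {
    const uint8_t *data;
    size_t available;
    size_t next;
    uint32_t clock_ms;
} fake_source_t;

uint32_t fake_now_ms(void *ctx)
{
    fake_source_t *src = ctx;
    return src->clock_ms++; /* pretend 1 ms passes per clock query */
}

bool fake_try_pop(void *ctx, uint8_t *out)
{
    fake_source_t *src = ctx;
    if (src->next >= src->available) {
        return false;
    }
    *out = src->data[src->next++];
    return true;
}
```

With an empty queue this version keeps polling until the clock says the timeout has expired, instead of exiting after a fixed number of iterations.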
I have my DMA controller constantly unloading the RX FIFO behind the scenes. Every byte, it triggers an ISR to place the received byte in [...]
My full transport implementation can be found here.
I have everything working now, but this was something I struggled with a few weeks ago. I wanted to show the read side because the timing issues caused more critical failures to connect with the micro-ROS agent. The gap above can cause incoming bytes to be missed, whereas any delays on the TX side will just slow down communication. Also, I only saved a logic capture for the read side. Today we started a short Thanksgiving break, so I will capture a trace during an image transmission as soon as I can next week.
Hi @pablogs9, I have some updated captures that show the 2 types of delay more clearly. The trace can be opened with Saleae Logic. The zip also includes a [...]

adi_micro-ros_tx_image_captures.zip

Delays between each Transport Unit

Here is an image that highlights the delays between each image data packet (I hope "Transport Unit" is the right term here?). On average it's between 200-300ms per TU. When the red "Indicator" line is high, the code is inside my custom serial write function. Here is a closer look between two TUs.

Delays inside each Transport Unit

This image shows the delay between the frame and data portions of the transport unit. It's actually worse than the 1ms delay I captured on the RX side, since it looks like the publisher is waiting on a response from the agent for the frame. The delay originally varied between 1-16ms. After I decreased my USB latency timer with [...]

Continuous Serialization?

So - basically I would like to know if continuous serialization would let me bypass most of the frame/packeting requirements for the image data. Ideally I'd like to just send one frame, and then manually serialize the data as I receive it. Thanks for your support,
Hello @Jake-Carter,
Continuous serialization will behave the same, because in this mode you provide the serialization data on-the-fly, but the transport layer and framing layer will be the same.
This detail leads me to think that those delays are related to your underlying hardware. Did you perform any tests without micro-ROS?
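To illustrate what "providing the serialization data on-the-fly" can buy, here is a generic sketch of producing an image one row at a time so that only a single row is ever buffered. It is not the micro-ROS continuous serialization API: `stream_rows`, the callback types, and the `demo_*` helpers are hypothetical names. As noted above, the bytes would still pass through the same transport and framing layers, so this pattern saves RAM rather than wire time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: produce one image row, and push bytes to a sink. */
typedef void (*fill_row_fn)(void *ctx, uint32_t row, uint8_t *row_buf, size_t row_bytes);
typedef size_t (*sink_write_fn)(void *ctx, const uint8_t *data, size_t len);

/* Stream `rows` rows of `row_bytes` each through a single row buffer.
 * Stops early if the sink accepts fewer bytes than offered.
 * Returns the total number of bytes the sink accepted. */
size_t stream_rows(uint32_t rows, size_t row_bytes, uint8_t *row_buf,
                   fill_row_fn fill_row, void *src_ctx,
                   sink_write_fn sink_write, void *sink_ctx)
{
    size_t total = 0;
    for (uint32_t row = 0; row < rows; row++) {
        fill_row(src_ctx, row, row_buf, row_bytes);
        size_t written = sink_write(sink_ctx, row_buf, row_bytes);
        total += written;
        if (written < row_bytes) {
            break; /* sink is full or failed */
        }
    }
    return total;
}

/* Demo source: fills every byte of a row with the low 8 bits of the row index. */
void demo_fill_row(void *ctx, uint32_t row, uint8_t *row_buf, size_t row_bytes)
{
    (void)ctx;
    for (size_t i = 0; i < row_bytes; i++) {
        row_buf[i] = (uint8_t)row;
    }
}

/* Demo sink: only counts the bytes it is given. */
size_t demo_count_sink(void *ctx, const uint8_t *data, size_t len)
{
    (void)data;
    size_t *count = ctx;
    *count += len;
    return len;
}
```

For the 160x120 RGB888 image discussed here, a row is 480 bytes, so the working buffer shrinks from 57600 bytes to 480 while the same 57600 bytes are still sent.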
I see, thanks @pablogs9. Could you provide any guidance on the colcon options for building the micro-ros library with stream framing disabled? Is

    "microxrcedds_client": {
        "cmake-args": [
            // ...,
            "-DUCLIENT_PROFILE_STREAM_FRAMING=OFF",
            // ...
        ]
    },

and

    rmw_uros_set_custom_transport(
        // MICROROS_TRANSPORTS_FRAMING_MODE,
        MICROROS_TRANSPORTS_PACKET_MODE, // <-- Use "packet" mode instead of framing when setting custom transports
        (void *)&transport_config,
        vMXC_Serial_Open,
        vMXC_Serial_Close,
        vMXC_Serial_Write,
        vMXC_Serial_Read
    );

sufficient?
I see the same general ~1ms USB latency even without micro-ROS. We're going through an FTDI USB-serial converter, so I think that's unavoidable. However, the framing protocol itself introduces an additional ~1-2ms for each packet just waiting on the header response, and then the 200-300ms delay between each packet is definitely from the micro-ROS library.
You cannot run on top of a serial port without framing, because the agent needs to "isolate" each XRCE packet. Non-framing mode is meant for transports that already ensure "packetization"; UDP is an example. It may help to increase the buffer sizes of the framing module (check rb and wb here: [...]), but I'm not sure whether this will have implications on the behavior of the transport.
Is your application code available so I can take a look or try to replicate it on another board to check those delay values?
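To see why a raw byte stream needs framing, consider a minimal HDLC-style byte-stuffing sketch. It is a simplified illustration, not the actual XRCE frame layout (which carries further fields that are not reproduced here): the function names, the flag/escape constants, and the "flag at both ends" layout are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_FLAG 0x7E /* marks frame boundaries (assumed value) */
#define FRAME_ESC  0x7D /* escapes a flag/escape byte inside the payload */
#define FRAME_XOR  0x20 /* applied to the escaped byte */

/* Append one byte; returns 0 if the output buffer is full. */
static int frame_put(uint8_t *out, size_t cap, size_t *pos, uint8_t b)
{
    if (*pos >= cap) {
        return 0;
    }
    out[(*pos)++] = b;
    return 1;
}

/* Wrap a payload as FLAG, stuffed payload, FLAG. Returns frame length, 0 on overflow. */
size_t frame_encode(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    if (!frame_put(out, cap, &pos, FRAME_FLAG)) return 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == FRAME_FLAG || in[i] == FRAME_ESC) {
            if (!frame_put(out, cap, &pos, FRAME_ESC)) return 0;
            if (!frame_put(out, cap, &pos, (uint8_t)(in[i] ^ FRAME_XOR))) return 0;
        } else if (!frame_put(out, cap, &pos, in[i])) {
            return 0;
        }
    }
    if (!frame_put(out, cap, &pos, FRAME_FLAG)) return 0;
    return pos;
}

/* Unwrap one complete frame. Returns payload length, or 0 if the frame is
 * malformed or does not fit (an empty payload also yields 0). */
size_t frame_decode(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
    if (n < 2 || in[0] != FRAME_FLAG || in[n - 1] != FRAME_FLAG) return 0;
    size_t pos = 0;
    for (size_t i = 1; i + 1 < n; i++) {
        uint8_t b = in[i];
        if (b == FRAME_FLAG) return 0;     /* unescaped flag inside the frame */
        if (b == FRAME_ESC) {
            if (i + 2 >= n) return 0;      /* escape with no byte after it */
            b = (uint8_t)(in[++i] ^ FRAME_XOR);
        }
        if (!frame_put(out, cap, &pos, b)) return 0;
    }
    return pos;
}
```

Because the flag byte can never appear unescaped inside a frame, a receiver can always find packet boundaries in the byte stream. A datagram transport such as UDP already delivers whole packets, which is why it does not need this layer.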
Hi @pablogs9, hope you've been well and had a good start to the new year. I've been working on an internal beta release for micro-ROS integration into the MSDK, and have staged things on the dev/micro-ros branch of our repo. I've written an install.py script that installs ROS + micro-ROS and builds the micro-ROS Agent using the micro_ros_setup scripts. (Documentation here). Maybe it will be useful as a contribution back to the micro-ROS repos in the future.

On this ticket - most of my troubles were coming from a lack of knowledge of the concepts, especially the QoS models. "Best effort" streams are a much better match for my applications. All the delays and jitter seem to come from the Linux side, so eliminating as many message frames as possible works great. For your reference, my app code is available here and library support files here.

However, I saw failures when I tried to publish an image with best effort and traced it to the stream implementation here. I notice [...]

In your experience, would it be possible to implement the same message fragmentation as the reliable streams here, but without the extra XRCE frame headers/confirmations added? In my case I would be willing to accept any data loss in favor of the reduced transmission latency.
Hello @Jake-Carter, nice to hear about your progress. For sure we are interested in having this integrated in the micro-ROS repos. WRT your question: in XRCE, best-effort streams do not allow fragmentation, so if your payload is an image you need to use reliable streams or configure a big enough buffer so an image fits.
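That rule can be written down as a small decision helper. This is a sketch under stated assumptions: `choose_stream` and the enum are hypothetical names, and the two capacities are inputs the caller must work out from their own MTU, stream history, and header overhead, none of which are specified in this thread.

```c
#include <stddef.h>

typedef enum {
    STREAM_BEST_EFFORT, /* payload fits whole in the best-effort buffer */
    STREAM_RELIABLE,    /* payload must be fragmented over a reliable stream */
    STREAM_TOO_LARGE    /* payload fits in neither configuration */
} stream_choice_t;

/* Pick a stream type for a payload, given the largest message each
 * configured stream can carry (both capacities are caller-supplied). */
stream_choice_t choose_stream(size_t payload,
                              size_t best_effort_capacity,
                              size_t reliable_capacity)
{
    if (payload <= best_effort_capacity) {
        return STREAM_BEST_EFFORT;
    }
    if (payload <= reliable_capacity) {
        return STREAM_RELIABLE;
    }
    return STREAM_TOO_LARGE;
}
```

For the 57600-byte image, a 2048-byte best-effort buffer is not enough, so the options are a reliable stream with enough total capacity or a best-effort buffer large enough to hold the whole image.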
Issue template
Hello - I'm an engineer with Analog Devices and I've been working on micro-ros support for our microcontrollers, starting with our embedded AI micros. I've completed the port and custom transports following the excellent tutorials, so first off thank you for the great documentation and project. I would eventually like to open a PR with official part support for our MSDK microcontrollers, and I'm building up a cool demo using an OpenManipulator-X running some custom object detection on our MAX78000.
Let me know if there's a better channel/repo to go through for questions. I couldn't get the Slack channel invite to work.
My current challenge is related to the topic of continuous serialization mentioned in the bottom part of this tutorials page.
I'm currently publishing a sensor_msgs/image image successfully, but the transmissions are very slow since the message is broken up into many packets. I would like some way to continuously stream the image data instead, but still comply with the expected message framing protocol. So... microcdr and continuous_serialization modules are somewhat limited. I'm confused about what the ucdr_alignment functions do, and also whether it's possible to stream image data row by row from the serialization callback. In the given example, does writing into the ucdr buffer push the data out into the transport layer?
I also have some more general suggestions/questions related to some challenges I had in developing the custom transports, and would love to contribute back to the project in any way I can.
Thank you,
Jake