Crash due to TRAILER is unexpectedly appended to the end of PAYLOAD buffer #196

Open
dmikushin opened this issue Nov 1, 2024 · 15 comments · May be fixed by #197

Comments

@dmikushin

dmikushin commented Nov 1, 2024

I'm testing a Chinese machine vision camera with the stream example. Cameleon expects strictly one LEADER packet, followed by a TRAILER packet. The unrecoverable issue is that Cameleon receives a TRAILER packet instead of a LEADER packet from this camera, which is not supposed to happen.

What I know for sure is: the same camera has no issue streaming with Aravis.

I'd like to collect more context if you can guide me on how. I'm not a Rust programmer, and GDB debugger support for Rust is just awful: it's easier to hack into the memory with bare hands than to understand how to print out anything meaningful from the Rust variables...

@strohel
Member

strohel commented Nov 1, 2024

I'm not a Rust programmer, and GDB debugger support for Rust is just awful: it's easier to hack into the memory with bare hands than to understand how to print out anything meaningful from the Rust variables...

Have you tried the rust-gdb wrapper? It should make debugging in GDB much easier. To get it, it should be sufficient to install the Rust toolchain using rustup.

@dmikushin
Author

dmikushin commented Nov 1, 2024

Oh yes, I use it, and I'm quite unhappy with it, because it still does not support evaluating containers. The thing is, most Rust data items are containers, much like the C++ STL. For example, I often want to print a C++ std::vector; it cannot be printed directly, since it's not an array, but GDB has special treatment for std::vector that makes it printable. That's what we still badly miss in rust-gdb, and that's what ruins the debugging experience.

@dmikushin
Author

dmikushin commented Nov 3, 2024

Thread 3 "stream" hit Breakpoint 2, cameleon_device::u3v::protocol::stream::Trailer::parse_prefix (cursor=0x7ffff71fd188)
    at device/src/u3v/protocol/stream.rs:451
451	            Err(Error::InvalidPacket("invalid prefix magic".into()))

@dmikushin
Author

dmikushin commented Nov 3, 2024

(gdb) bt
#0  cameleon_device::u3v::protocol::stream::Trailer::parse_prefix (cursor=0x7ffff71fd188) at device/src/u3v/protocol/stream.rs:451
#1  0x000055555563095d in cameleon_device::u3v::protocol::stream::Trailer::parse<alloc::vec::Vec<u8, alloc::alloc::Global>> (buf=0x7ffff71fd7d8)
    at device/src/u3v/protocol/stream.rs:397
#2  0x00005555556084b2 in cameleon::u3v::stream_handle::StreamingLoop::run (self=...) at cameleon/src/u3v/stream_handle.rs:261
#3  0x0000555555610d6c in cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure#1} () at cameleon/src/u3v/stream_handle.rs:113
#4  0x000055555565ab76 in std::sys::backtrace::__rust_begin_short_backtrace<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>
    (f=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs:154
#5  0x0000555555634c1d in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:522
#6  0x000055555566a0c4 in core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>> (self=...)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272
#7  0x000055555565af54 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>, ()> (data=0x7ffff71ffc30)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554
#8  0x0000555555634ccb in __rust_try ()
#9  0x0000555555634819 in std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>> (f=...)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518
#10 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>, ()> (f=...)
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345
#11 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:521
#12 0x0000555555637aef in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>, ()> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250
#13 0x000055555577298b in alloc::boxed::{impl#48}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> ()
    at alloc/src/boxed.rs:2231
#14 alloc::boxed::{impl#48}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
    at alloc/src/boxed.rs:2231
#15 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at std/src/sys/pal/unix/thread.rs:105
#16 0x00007ffff7d47144 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x00007ffff7dc77dc in ?? () from /lib/x86_64-linux-gnu/libc.so.6

@dmikushin
Author

This doc comment is wrong, by the way; it should read "Parse bytes as Trailer":

impl<'a> Trailer<'a> {
    const TRAILER_MAGIC: u32 = 0x5456_3355;

    /// Parse bytes as Leader.

@dmikushin
Author

dmikushin commented Nov 3, 2024

But the main problem is that the leader magic shows up where the trailer magic is expected:

(gdb) li
446	    fn parse_prefix(cursor: &mut Cursor<&[u8]>) -> Result<()> {
447	        let magic: u32 = cursor.read_bytes_le()?;
448	        if magic == Self::TRAILER_MAGIC {
449	            Ok(())
450	        } else {
451	            Err(Error::InvalidPacket("invalid prefix magic".into()))
452	        }
453	    }
454	}
455	
(gdb) p/x magic
$1 = 0x4c563355

Btw, here is an example of how rust-gdb handles Rust-aware debugging:

(gdb) p/x Self::TRAILER_MAGIC
No symbol 'Self::TRAILER_MAGIC' in current context
(gdb) p/x Self::LEADER_MAGIC
No symbol 'Self::LEADER_MAGIC' in current context

So we have to go back to the source code to see that the p/x magic output above is indeed the leader magic:

    const TRAILER_MAGIC: u32 = 0x5456_3355;
    const LEADER_MAGIC: u32 = 0x4C56_3355;
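
For reference, here is a minimal standalone snippet (not Cameleon code) confirming that these magics are simply the ASCII strings "U3VL" and "U3VT" read as little-endian u32, which is why the 0x4c563355 printed above is the leader magic:

// Standalone sanity check, independent of Cameleon.
fn main() {
    const TRAILER_MAGIC: u32 = 0x5456_3355; // b"U3VT"
    const LEADER_MAGIC: u32 = 0x4C56_3355; // b"U3VL"

    // The raw prefix bytes (85, 51, 86, 76) are b"U3VL"; (85, 51, 86, 84) are b"U3VT".
    assert_eq!(u32::from_le_bytes(*b"U3VL"), LEADER_MAGIC);
    assert_eq!(u32::from_le_bytes(*b"U3VT"), TRAILER_MAGIC);
}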

@dmikushin
Author

The panic happens because the "trailer must follow the leader" logic is hardcoded in Cameleon's streaming loop:

(gdb) li 252
247	            // We received the data from the bulk transfers, try to parse stuff now.
248	            let leader = match u3v_stream::Leader::parse(&leader_buf)
249	                .map_err(|e| StreamError::InvalidPayload(format!("{}", e).into()))
250	            {
251	                Ok(leader) => leader,
252	                Err(err) => {
253	                    warn!(?err);
254	                    // Reuse `payload_buf`.
255	                    payload_buf_opt = Some(payload_buf);
256	                    self.sender.try_send(Err(err)).ok();
(gdb) 
257	                    continue;
258	                }
259	            };
260	
261	            let trailer = match u3v_stream::Trailer::parse(&trailer_buf)
262	                .map_err(|e| StreamError::InvalidPayload(format!("invalid trailer: {}", e).into()))
263	            {
264	                Ok(trailer) => trailer,
265	                Err(err) => {
266	                    warn!(?err);
(gdb) 
267	                    // Reuse `payload_buf`.
268	                    payload_buf_opt = Some(payload_buf);
269	                    self.sender.try_send(Err(err)).ok();
270	                    continue;
271	                }
272	            };

As you can see, the only packet expected after LEADER is TRAILER.

@dmikushin
Author

I have no clue why the order is incorrect, so I take Aravis as a reference. It turns out that in the Aravis camera streaming loop, LEADER is followed by DATA, not by TRAILER:

Looking for the first available camera
vendor name           = Machine Vision
model name            = MV-A5031MU815
device id             = (null)
image width           = 640
image height          = 480
horizontal binning    = 0
vertical binning      = 0
payload               = 307200 bytes
exposure              = 1000 µs
gain                  = 0 dB
ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER
ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER

@dmikushin
Author

Cameleon actually has the same data reading sequence as Aravis:

            if let Err(err) = read_leader(&mut async_pool, &self.params, &mut leader_buf) {
                // Report and send error if the error is fatal.
                if matches!(err, StreamError::Io(..) | StreamError::Disconnected) {
                    error!(?err);
                    self.sender.try_send(Err(err)).ok();
                }
                payload_buf_opt = Some(payload_buf);
                continue;
            };

            if let Err(err) = read_payload(&mut async_pool, &self.params, &mut payload_buf) {
                warn!(?err);
                // Reuse `payload_buf`.
                payload_buf_opt = Some(payload_buf);
                self.sender.try_send(Err(err)).ok();
                continue;
            };

            if let Err(err) = read_trailer(&mut async_pool, &self.params, &mut trailer_buf) {
                warn!(?err);
                // Reuse `payload_buf`.
                payload_buf_opt = Some(payload_buf);
                self.sender.try_send(Err(err)).ok();
                continue;
            };

This corresponds exactly to the sequence in Aravis:

ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER

So my guess is that read_payload() does not properly stop reading at the end of the DATA packet and swallows the TRAILER as well. This way the next packet, which is supposed to be the TRAILER, turns out to be a LEADER instead.
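
If that guess is right, an order-independent check like the sketch below would make the mismatch visible. This is a hypothetical helper, not Cameleon's API: it classifies a received buffer by its 4-byte prefix instead of assuming the LEADER-DATA-TRAILER order.

#[derive(Debug, PartialEq)]
enum PacketKind {
    Leader,
    Trailer,
    Other, // payload data, or anything without a known prefix
}

// Peek at the first 4 bytes of a bulk-transfer buffer and report what it starts with.
fn classify(buf: &[u8]) -> PacketKind {
    const LEADER_MAGIC: u32 = 0x4C56_3355;
    const TRAILER_MAGIC: u32 = 0x5456_3355;

    match buf.get(..4) {
        Some(p) => match u32::from_le_bytes([p[0], p[1], p[2], p[3]]) {
            LEADER_MAGIC => PacketKind::Leader,
            TRAILER_MAGIC => PacketKind::Trailer,
            _ => PacketKind::Other,
        },
        None => PacketKind::Other,
    }
}

Logging classify(&leader_buf), classify(&payload_buf) and classify(&trailer_buf) after the three reads would show whether the buffers actually hold what their names promise.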

@dmikushin
Author

dmikushin commented Nov 3, 2024

Actually, it's even more wrong than my guess:

  • leader_buf starts with (85, 51, 86, 84), which is TRAILER_MAGIC = 0x5456_3355
  • payload_buf starts with (85, 51, 86, 76), which is LEADER_MAGIC = 0x4C56_3355
  • trailer_buf starts with (85, 51, 86, 84), which is TRAILER_MAGIC = 0x5456_3355

It's all messed up; see the GDB session below:

Thread 3 "stream" hit Breakpoint 2, cameleon::u3v::stream_handle::StreamingLoop::run (self=...) at cameleon/src/u3v/stream_handle.rs:245
245	            let payload_len = payload_len - last_buf_len.unwrap();
(gdb) p leader_buf
$4 = Vec(size=52) = {85, 51, 86, 84, 0, 0, 32, 0, 145, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 4, 0, 0, 0, 0, 0, 224, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
(gdb) p payload_buf
$5 = Vec(size=614400) = {85, 51, 86, 76, 0, 0, 52, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 136, 102, 114, 231, 19, 3, 0, 0, 1, 0, 8, 1, 128, 2, 0, 0, 224, 1, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0...}
(gdb) p trailer_buf
$6 = Vec(size=52) = {85, 51, 86, 84, 0, 0, 32, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 4, 0, 0, 0, 0, 0, 224, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
(gdb) p 0x4C
$7 = 76
(gdb) p 0x56
$8 = 86
(gdb) p 0x33
$10 = 51
(gdb) p 0x55
$11 = 85

@dmikushin
Author

dmikushin commented Nov 3, 2024

It seems I'm able to fix this issue with the following two changes:

  1. I revert the patch that puts the leader, payload and trailer transfers into the same async pool:
commit a371076bdbdffa25a654d320fe5e1bbf958b2637
Author: Brian Schwind <[email protected]>
Date:   Thu May 11 22:49:45 2023 +0900

    Use async transfers for the leader and trailer buffers

@bschwind, in your a371076 I think you are trying to make the USB bulk transfers as asynchronous as possible, which is good. That seems to work, but there is one side effect: the bulk transfers may come back shuffled, so the LEADER buffer is not necessarily the first in the sequence of captured buffers, and so on. Perhaps you were lucky that this shuffling did not happen on your system and the LEADER-DATA-TRAILER order was preserved in the async pool. I propose to revert your patch and re-apply it later with proper identification of the buffer kind after the async polling is finished. What do you think?

  2. In read_payload() I disable the payload_final2_size transfer, which exceeds the image size and provokes the failure, because params.payload_final1_size already covers the whole image:
-    if params.payload_final2_size != 0 {
-        async_pool.submit(&mut buf[cursor..cursor + params.payload_final2_size])?;
-    }
+    //if params.payload_final2_size != 0 {
+    //    async_pool.submit(&mut buf[cursor..cursor + params.payload_final2_size])?;
+    //}

I have no idea what payload_final1_size and payload_final2_size actually mean. But it seems that if we request a payload transfer larger than the frame size, the trailer packet is appended to the payload transfer as well, and then we have no way to determine how many bytes belong to the image frame and how many belong to the trailer packet. Setting the maximum transfer size to the frame size (which is effectively the value of payload_final1_size) makes the trailer arrive in a separate bulk transfer instead.
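
To spell out the arithmetic behind this, here is a rough sketch only; the frame and trailer sizes come from the GDB output above, everything else (names, the example value 1024) is assumed:

// Rough illustration, not Cameleon code: if the payload bulk transfers we submit add
// up to more bytes than the frame, the behavior observed here is that the device fills
// the tail of the payload transfer with the 52-byte TRAILER instead of sending it separately.
fn payload_overshoot(frame_size: usize, final1_size: usize, final2_size: usize) -> usize {
    let requested = final1_size + final2_size; // simplified to just the two "final" transfers
    requested.saturating_sub(frame_size)
}

fn main() {
    // On this camera, payload_final1_size already equals the 614400-byte frame, so any
    // non-zero payload_final2_size overshoots and can swallow the trailer.
    assert_eq!(payload_overshoot(614_400, 614_400, 0), 0);
    assert!(payload_overshoot(614_400, 614_400, 1024) > 0); // 1024 is a made-up example value
}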

dmikushin changed the title from "Camera streams two consecutive LEADER packets, Cameleon panics" to "LEADER, PAYLOAD and TRAILER packets are unexpectedly shuffled in the async pool of bulk transfers" on Nov 3, 2024
dmikushin added a commit to dmikushin/cameleon that referenced this issue Nov 3, 2024
    commit a371076
    Author: Brian Schwind <[email protected]>
    Date:   Thu May 11 22:49:45 2023 +0900

        Use async transfers for the leader and trailer buffers

Performing leader, payload and trailer bulk transfers in one common async pool loop
may change the order of buffers. So instead of leader-payload-trailer we may get
trailer-leader-payload, which is currently not handled by the buffer parsing code.
This patch could be re-applied later, accompanied by proper order-independent handling.
@dmikushin
Author

Correction: the first point above, about the async bulk transfers, has no negative effect. The second point about the payload size is enough on its own to fix this issue!

dmikushin changed the title from "LEADER, PAYLOAD and TRAILER packets are unexpectedly shuffled in the async pool of bulk transfers" to "Crash due to TRAILER is unexpectedly appended to the end of PAYLOAD buffer" on Nov 3, 2024
@dmikushin
Author

I've drafted #197 to discuss a possible fix with you.

@bschwind
Contributor

bschwind commented Nov 5, 2024

I'll give this a look - it's definitely important to have the bulk transfers pipelined as much as possible to attain the throughput you need for higher frame rates. I'll have to double check the current code and its possibilities for shuffling or unnecessary transfers. It's also possible one of our cameras isn't following the standard, but yeah give me some time to read up again on these things.

@dmikushin
Author

Thanks, @bschwind! You can safely ignore the whole investigation here and go straight to #197, which contains the only change needed to fix this.
