Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MQTT queue use-after-free #3971

Closed
wants to merge 3 commits into from

Conversation

mskvortsov
Copy link
Contributor

There is an issue with the MQTT queue implementation. Enqueued ServiceEnvelope items contain fields pointing to objects that may be deallocated in another thread before the fields are accessed. This leads to the publishing of garbage.

One approach would be just encapsulating the fields (MeshPacket and both strings) into a new QueueItem structure by value. However, since ServiceEnvelope seems to be used only in the context of MQTT.cpp, changing field annotation at the protobuf level also seems reasonable.

Requires meshtastic/protobufs#503.

@mskvortsov
Copy link
Contributor Author

An example of the failure in the wild

==73485== Invalid read of size 4
==73485==    at 0x238AB4: MQTT::meshPacketToJson[abi:cxx11](_meshtastic_MeshPacket*) (MQTT.cpp:889)
==73485==    by 0x23561F: MQTT::publishQueuedMessages() (MQTT.cpp:464)
==73485==    by 0x23535F: MQTT::runOnce() (MQTT.cpp:420)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)
==73485==    by 0x293E47: main (main.cpp:209)
==73485==  Address 0x7560a78 is 280 bytes inside a block of size 312 free'd
==73485==    at 0x4887B40: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==73485==    by 0x20E79F: MemoryDynamic<_meshtastic_MeshPacket>::release(_meshtastic_MeshPacket*) (MemoryPool.h:64)
==73485==    by 0x209C03: RadioLibInterface::completeSending() (RadioLibInterface.cpp:334)
==73485==    by 0x209B7B: RadioLibInterface::handleTransmitInterrupt() (RadioLibInterface.cpp:319)
==73485==    by 0x20981B: RadioLibInterface::onNotify(unsigned int) (RadioLibInterface.cpp:228)
==73485==    by 0x1B4633: concurrency::NotifiedWorkerThread::checkNotification() (NotifiedWorkerThread.cpp:82)
==73485==    by 0x1B465B: concurrency::NotifiedWorkerThread::runOnce() (NotifiedWorkerThread.cpp:89)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)
==73485==    by 0x293E47: main (main.cpp:209)
==73485==  Block was alloc'd at
==73485==    at 0x48850C8: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==73485==    by 0x20E7C3: MemoryDynamic<_meshtastic_MeshPacket>::alloc(unsigned int) (MemoryPool.h:71)
==73485==    by 0x20A2BF: Allocator<_meshtastic_MeshPacket>::allocZeroed(unsigned int) (MemoryPool.h:28)
==73485==    by 0x20A257: Allocator<_meshtastic_MeshPacket>::allocZeroed() (MemoryPool.h:18)
==73485==    by 0x20D3FF: Router::allocForSending() (Router.cpp:116)
==73485==    by 0x21BC6F: SinglePortModule::allocDataPacket() (SinglePortModule.h:34)
==73485==    by 0x229FFB: ProtobufModule<_meshtastic_Telemetry>::allocDataProtobuf(_meshtastic_Telemetry const&) (ProtobufModule.h:45)
==73485==    by 0x229D6B: DeviceTelemetryModule::sendTelemetry(unsigned int, bool) (DeviceTelemetry.cpp:93)
==73485==    by 0x229A6F: DeviceTelemetryModule::runOnce() (DeviceTelemetry.cpp:24)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)

@thebentern
Copy link
Contributor

Couldn't we check if the pointers are valid instead of converting these field to value via the protobuf annotations?

@thebentern thebentern requested a review from caveman99 May 26, 2024 12:15
@mskvortsov
Copy link
Contributor Author

mskvortsov commented May 26, 2024

Reference counting for the MeshPacket objects in the base allocation pool could be an option. That’s a bit more invasive change though.

@mskvortsov
Copy link
Contributor Author

I've experimented with adding a hidden refCount field to meshtastic_MeshPacket (hidden in the sense of existing only at C++ code level) and actually freeing up the packet from packetPool only if the counter becomes 1. Unfortunately, this approach done right requires some additional support from Nanopb: at the moment there is no good way to extend message A to A* with an auxiliary field added so that other message B encapsulating A will also use this extended A*.

So I would stick with the solution that was originally proposed.

@github-actions github-actions bot added the Stale label Nov 20, 2024
@github-actions github-actions bot closed this Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants