This issue appears as "Program received signal SIGBUS, Bus error." in gdb, or as a return code of -7 from the gtests that use memory tools, but only in our Linux-aarch64 CI jobs, e.g.:
-- run_test.py: extra environment variables:
- RMW_IMPLEMENTATION=rmw_fastrtps_cpp
- LD_PRELOAD=/home/rosbuild/ci_scripts/ws/install/osrf_testing_tools_cpp/lib/libmemory_tools_interpose.so
-- run_test.py: extra environment variables to append:
- LD_LIBRARY_PATH+=/home/rosbuild/ci_scripts/ws/build/rcl
-- run_test.py: invoking following command in '/home/rosbuild/ci_scripts/ws/src/ros2/rcl/rcl':
- /home/rosbuild/ci_scripts/ws/build/rcl/test/test_time__rmw_fastrtps_cpp --gtest_output=xml:/home/rosbuild/ci_scripts/ws/build/rcl/test_results/rcl/test_time__rmw_fastrtps_cpp.gtest.xml
Running main() from gtest_main.cc
[==========] Running 8 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from TestTimeFixture__rmw_fastrtps_cpp
[ RUN ] TestTimeFixture__rmw_fastrtps_cpp.test_rcl_ros_time_set_override
[ OK ] TestTimeFixture__rmw_fastrtps_cpp.test_rcl_ros_time_set_override (0 ms)
[ RUN ] TestTimeFixture__rmw_fastrtps_cpp.test_rcl_init_for_clock_and_point
[ OK ] TestTimeFixture__rmw_fastrtps_cpp.test_rcl_init_for_clock_and_point (1 ms)
[----------] 2 tests from TestTimeFixture__rmw_fastrtps_cpp (1 ms total)
[----------] 6 tests from rcl_time__rmw_fastrtps_cpp
[ RUN ] rcl_time__rmw_fastrtps_cpp.clock_validation
[ OK ] rcl_time__rmw_fastrtps_cpp.clock_validation (0 ms)
[ RUN ] rcl_time__rmw_fastrtps_cpp.default_clock_instanciation
[ OK ] rcl_time__rmw_fastrtps_cpp.default_clock_instanciation (0 ms)
[ RUN ] rcl_time__rmw_fastrtps_cpp.specific_clock_instantiation
[ OK ] rcl_time__rmw_fastrtps_cpp.specific_clock_instantiation (0 ms)
[ RUN ] rcl_time__rmw_fastrtps_cpp.rcl_time_difference
[ OK ] rcl_time__rmw_fastrtps_cpp.rcl_time_difference (0 ms)
[ RUN ] rcl_time__rmw_fastrtps_cpp.rcl_time_difference_signed
[ OK ] rcl_time__rmw_fastrtps_cpp.rcl_time_difference_signed (0 ms)
[ RUN ] rcl_time__rmw_fastrtps_cpp.rcl_time_update_callbacks
[ OK ] rcl_time__rmw_fastrtps_cpp.rcl_time_update_callbacks (0 ms)
[----------] 6 tests from rcl_time__rmw_fastrtps_cpp (0 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (1 ms total)
[ PASSED ] 8 tests.
-- run_test.py: return code -7
-- run_test.py: inject classname prefix into gtest result file '/home/rosbuild/ci_scripts/ws/build/rcl/test_results/rcl/test_time__rmw_fastrtps_cpp.gtest.xml'
-- run_test.py: verify result file '/home/rosbuild/ci_scripts/ws/build/rcl/test_results/rcl/test_time__rmw_fastrtps_cpp.gtest.xml'
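For readers wondering how a test binary can print PASSED and still produce "return code -7": a negative return code is the usual way a parent process reports that its child was killed by a signal, and on Linux x86_64 and aarch64 signal 7 is SIGBUS. The following is a minimal sketch of that convention, assuming run_test.py follows the common "negative signal number" reporting style; run_and_report, die_with_sigbus and exit_cleanly are hypothetical names used only for illustration.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run child_body in a forked child and report the outcome
// as the exit status on a normal exit, or as -signal if a signal killed it.
int run_and_report(void (* child_body)())
{
  const pid_t pid = fork();
  if (pid == 0) {
    child_body();
    _exit(0);  // leave the child without running atexit handlers
  }
  if (pid < 0) {
    return 255;  // fork failed (value chosen arbitrarily for this sketch)
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return 255;  // wait failed (value chosen arbitrarily for this sketch)
  }
  if (WIFSIGNALED(status)) {
    return -WTERMSIG(status);  // killed by a signal: report it negated
  }
  return WEXITSTATUS(status);
}

// Child body that dies the same way the test binary does at shutdown.
void die_with_sigbus()
{
  std::signal(SIGBUS, SIG_DFL);  // make sure the default action (terminate) applies
  std::raise(SIGBUS);
}

// Child body that does nothing and exits normally.
void exit_cleanly() {}
```

Calling run_and_report(die_with_sigbus) yields -SIGBUS, which matches the log above: all tests ran and passed, and the signal arrived afterwards while the process was shutting down.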
I've given up trying to fix this for now. I can confirm that the actual checks are still working in gtest; the error only occurs during shutdown, and I have no reason to believe it means that there's a memory problem on ARM, so I think checking on Linux-x86_64 and macOS-x86_64 is good enough for now.
This is what I've been able to debug so far. I added this diff and then took the output from gdb:
diff --git a/osrf_testing_tools_cpp/src/memory_tools/impl/linux.cpp b/osrf_testing_tools_cpp/src/memory_tools/impl/linux.cpp
index 9056ff6..ef6a552 100644
--- a/osrf_testing_tools_cpp/src/memory_tools/impl/linux.cpp
+++ b/osrf_testing_tools_cpp/src/memory_tools/impl/linux.cpp
@@ -48,9 +48,26 @@ using osrf_testing_tools_cpp::memory_tools::impl::StaticAllocator;
using StaticAllocatorT = StaticAllocator<8388608>;
// used to fullfil calloc call from dlerror.c during initialization of original functions
// constructor is called on first use with a placement-new and the static storage
-static uint8_t g_static_allocator_storage[sizeof(StaticAllocatorT)];
+static uint8_t g_static_allocator_storage[sizeof(StaticAllocatorT)] = {0};
static StaticAllocatorT * g_static_allocator = nullptr;
+static bool g_static_allocator_destroyed = false;
+class StaticAllocatorInitializer
+{
+public:
+  StaticAllocatorInitializer()
+  {
+    SAFE_FWRITE(stderr, "in StaticAllocatorInitializer()\n");
+    g_static_allocator_destroyed = false;
+  }
+  ~StaticAllocatorInitializer()
+  {
+    SAFE_FWRITE(stderr, "in ~StaticAllocatorInitializer()\n");
+    g_static_allocator_destroyed = true;
+  }
+};
+static StaticAllocatorInitializer g_static_allocator_initializer;
+
// storage for original malloc/realloc/calloc/free
using MallocSignature = void * (*)(size_t);
static MallocSignature g_original_malloc = nullptr;
@@ -73,12 +90,24 @@ static void __linux_memory_tools_init(void)
get_static_initialization_complete() = true;
}
+// on shared library unload, go back to uninitialized behavior, passing through to static allocator.
+static void __linux_memory_tools_fini(void) __attribute__((destructor));
+static void __linux_memory_tools_fini(void)
+{
+  SAFE_FWRITE(stderr, "in __linux_memory_tools_fini()\n");
+  get_static_initialization_complete() = false;
+}
+
extern "C"
{
void *
malloc(size_t size) noexcept
{
+  if (g_static_allocator_destroyed) {
+    SAFE_FWRITE(stderr, "in malloc(): g_static_allocator_destroyed is true\n");
+    return g_original_malloc(size);
+  }
if (!get_static_initialization_complete()) {
if (nullptr == g_static_allocator) {
// placement-new the static allocator
@@ -93,6 +122,10 @@ malloc(size_t size) noexcept
void *
realloc(void * pointer, size_t size) noexcept
{
+  if (g_static_allocator_destroyed) {
+    SAFE_FWRITE(stderr, "in realloc(): g_static_allocator_destroyed is true\n");
+    return g_original_realloc(pointer, size);
+  }
if (!get_static_initialization_complete()) {
if (nullptr == g_static_allocator) {
// placement-new the static allocator
@@ -107,6 +140,10 @@ realloc(void * pointer, size_t size) noexcept
void *
calloc(size_t count, size_t size) noexcept
{
+  if (g_static_allocator_destroyed) {
+    SAFE_FWRITE(stderr, "in calloc(): g_static_allocator_destroyed is true\n");
+    return g_original_calloc(count, size);
+  }
if (!get_static_initialization_complete()) {
if (nullptr == g_static_allocator) {
// placement-new the static allocator
@@ -121,6 +158,10 @@ calloc(size_t count, size_t size) noexcept
void
free(void * pointer) noexcept
{
+  if (g_static_allocator_destroyed) {
+    SAFE_FWRITE(stderr, "in free(): g_static_allocator_destroyed is true\n");
+    return free(pointer);
+  }
if (nullptr == pointer || g_static_allocator->deallocate(pointer)) {
// free of nullptr or,
// memory was originally allocated by static allocator, no need to pass to "real" free
A useful thing to note here is that the interpose library (the one being loaded with LD_PRELOAD) has its shared library fini called before some of the other libraries, like the rmw fastrtps library (this information comes from setting LD_DEBUG=libs).
My next thing to check would be what the unloading order of the shared libraries is on Linux x86_64 and macOS.
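The idea behind the g_static_allocator_destroyed guard in the diff can be sketched in isolation. This is only an illustration under simplifying assumptions, not the library's code: the names in the sketch namespace are hypothetical, std::malloc stands in for the "original" function, and a call counter stands in for the memory tool's bookkeeping. It also uses a scoped object instead of a file-scope static so the effect can be observed without waiting for process shutdown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch
{

// Plain bool with constant initialization: it has no constructor or destructor
// of its own, so it can still be read after other objects have been destroyed.
static bool g_tracker_destroyed = false;

// Stand-in for the memory tool's bookkeeping.
static std::size_t g_tracked_calls = 0;

// Flips the flag when destroyed; the diff uses a file-scope instance of a similar class.
class TrackerLifetime
{
public:
  TrackerLifetime() {g_tracker_destroyed = false;}
  ~TrackerLifetime() {g_tracker_destroyed = true;}
};

void * guarded_malloc(std::size_t size)
{
  if (g_tracker_destroyed) {
    // The tracker is gone: do no bookkeeping, just forward the request.
    return std::malloc(size);
  }
  ++g_tracked_calls;
  return std::malloc(size);
}

// Makes one call while the lifetime object is alive and one after it is
// destroyed, then returns how many of the two calls were tracked.
std::size_t demo()
{
  g_tracked_calls = 0;
  {
    TrackerLifetime lifetime;
    void * p = guarded_malloc(16);
    assert(p != nullptr);
    std::free(p);
  }
  void * q = guarded_malloc(16);  // lifetime object already destroyed here
  std::free(q);
  return g_tracked_calls;
}

}  // namespace sketch
```

In the real interpose library the late callers would be other shared libraries whose fini runs after the interpose library's own, which is why the order reported by LD_DEBUG=libs matters.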
Other ideas/notes:
The message from gdb is concerning: Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It doesn't help me because I don't have a header that gets included in each TU, but it's interesting to know that's how std::cout is probably implemented.
The test output above is from https://ci.ros2.org/job/ci_linux-aarch64/1444/testReport/junit/(root)/projectroot/test_time__rmw_fastrtps_cpp/
libc6-2.23: https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/stdlib/cxa_finalize.c#L48
.bss versus .data compared to x86_64?