Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for --default-name and --default-output in create command #3

Open
wants to merge 26 commits into
base: template_v0
Choose a base branch
from

Conversation

jgalar
Copy link

@jgalar jgalar commented Jun 15, 2016

No description provided.

jgalar and others added 26 commits June 14, 2016 18:37
There is no need for 4 bytes to express a boolean state.

Signed-off-by: Jérémie Galarneau <[email protected]>
validate_command: checks for invalid options configuration
create_session_basic: responsible for session creation and configuration
generate_output: responsible of outputting the info to the user

Signed-off-by: Jonathan Rajotte <[email protected]>
PSRCode pushed a commit that referenced this pull request May 10, 2017
Found by Coverity:

CID 1243017 (#1 of 4): Buffer not null terminated
(BUFFER_SIZE_WARNING)14. buffer_size_warning: Calling strncpy with a
maximum size argument of 264 bytes on destination array msg.channel_name
of size 264 bytes might leave the destination string unterminated.

ID 1243017 (#2 of 4): Buffer not null terminated
(BUFFER_SIZE_WARNING)14. buffer_size_warning: Calling strncpy with a
maximum size argument of 264 bytes on destination array
msg_2_2.channel_name of size 264 bytes might leave the destination
string unterminated.

CID 1243017 (#3 of 4): Buffer not null terminated
(BUFFER_SIZE_WARNING)15. buffer_size_warning: Calling strncpy with a
maximum size argument of 4096 bytes on destination array msg.pathname of
size 4096 bytes might leave the destination string unterminated.

CID 1243017 (#4 of 4): Buffer not null terminated
(BUFFER_SIZE_WARNING)15. buffer_size_warning: Calling strncpy with a
maximum size argument of 4096 bytes on destination array
msg_2_2.pathname of size 4096 bytes might leave the destination string
unterminated.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode pushed a commit that referenced this pull request Oct 16, 2019
Background
==========
I have a file named "/a" on my file system (don't ask why).

Issue
=====
While running the `test_utils_expand_path` test case on my machine, I
get a Segfault. Here is the gdb backtrace:

  #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
  #1  0x0000555555559eb7 in expand_double_slashes_dot_and_dotdot (path=0x0) at utils.c:223
  #2  0x000055555555a2d8 in _utils_expand_path (path=0x55555556b250 "/a/b/c/d/e", keep_symlink=true) at utils.c:384
  #3  0x000055555555a408 in utils_expand_path (path=0x55555556b250 "/a/b/c/d/e") at utils.c:423
  #4  0x000055555555859e in test_utils_expand_path () at test_utils_expand_path.c:291
  lttng#5  0x00005555555589b0 in main (argc=1, argv=0x7fffffffe5e8) at test_utils_expand_path.c:352

I get this backtrace because the function `utils_partial_realpath()`
returns NULL when it tries to expand the "/a/b/c/d/e" path and realize
that it could not exist since "/a" is a file and not a directory.

Anyways, the returned NULL pointer is ignored and directly used in the
`expand_double_slashes_dot_and_dotdot()` function right after.

This configuration ("/a" being a file) is expected to fail but not to segfault.
It could be reproduce in a real scenario when creating directory
structures.

Solution
========
Return an error if `utils_partial_realpath()` returns NULL.

Signed-off-by: Francis Deslauriers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode pushed a commit that referenced this pull request Jan 13, 2022
Observed issue
==============

While running tests/regression/tools/clear/test_ust test in a loop we
eventually witness the following error:

The symptom on the Babeltrace side is a Connection reset by peer. This
is caused by a relayd abort after an assertion failure due to a
reference count being lower than 0.

Test case:

  # Test ust streaming live clear with viewer
  # Parameters: tracing_active=0, clear_twice=0, buffer_type=pid

Tools commit: https://review.lttng.org/c/lttng-tools/+/6754/1

Relay daemon assertion failure stack trace:

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffa2e850859 in __GI_abort () at abort.c:79
  #2  0x00007ffa2e850729 in __assert_fail_base (fmt=0x7ffa2e9e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56221bdb96f4 "res >= 0",
      file=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=66, function=<optimized out>) at assert.c:92
  #3  0x00007ffa2e861f36 in __GI___assert_fail (assertion=assertion@entry=0x56221bdb96f4 "res >= 0", file=file@entry=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=line@entry=66,
      function=function@entry=0x56221bdbd638 <__PRETTY_FUNCTION__.7554> "urcu_ref_put") at assert.c:101
  #4  0x000056221bd6a1cc in urcu_ref_put (release=<optimized out>, ref=0x7ffa24008cb0) at /root/virtenv/usr/include/urcu/ref.h:66
  lttng#5  viewer_stream_put (vstream=vstream@entry=0x7ffa24008cb0) at viewer-stream.c:279
  lttng#6  0x000056221bd5e4c5 in viewer_get_metadata (conn=conn@entry=0x7ffa0c000fc0) at live.c:2211
  lttng#7  0x000056221bd63778 in process_control (conn=0x7ffa0c000fc0, recv_hdr=0x7ffa297c5af0) at live.c:2376
  lttng#8  thread_worker (data=<optimized out>) at live.c:2541
  lttng#9  0x00007ffa2ea28609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  lttng#10 0x00007ffa2e94d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Babeltrace crash:

11-17 05:13:05.197 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1494 [live] Requesting next index for stream: cmd=GET_NEXT_INDEX, viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:245 [live] Error receiving from Relay: Connection reset by peer.
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1522 [live] Error receiving get next index reply
11-17 05:13:05.215 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1069 [live] Returning from advancing live stream iterator: status=LTTNG_LIVE_ITERATOR_STATUS_ERROR, stream-name="stream-3352", viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1802 [live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07700, iter-upstream-comp-name="live", iter-upstream-comp-log-level=TRACE, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUT
PUT, iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:446 [mux] Upstream iterator's next method returned an error: status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:989 [mux] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x563b7dd04740, muxer-upstream-msg-iter-wrap-addr=0x563b7dd0db30
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:1417 [mux] Cannot get next message: comp-addr=0x563b7dd1a760, muxer-comp-addr=0x563b7dd1b930, muxer-msg-iter-addr=0x563b7dd04740, msg-iter-addr=0x563b7dd07590, status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07590, iter-upstream-comp-name="mux", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT,
 iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/GRAPH [email protected]:462 Component's "consume" method failed: status=ERROR, comp-addr=0x563b7dd06d20, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x563b7dd051b0, comp-class-so-handle-path="/root/virtenv/usr/lib/babeltr
ace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
11-17 05:13:05.217 47007 47007 E CLI [email protected]:2537 Graph failed to complete successfully

Cause
=====

Both relay and live threads can put the ownership reference on the
metadata viewer stream concurrently without synchronization, thus
leading to a reference count going lower than 0.

The viewer stream ownership design initially planned for being owned by
the live thread, thus allowing the live thread to put the ownership
reference as soon as the associated relay stream is observed as closed,
and the viewer stream is considered as hung up.

However, in the specific case of the metadata viewer stream, the
responsibility of closing the metadata viewer stream is shared between
the relay and live threads, because the viewers expect to observe a
LTTNG_VIEWER_NO_NEW_METADATA message before the metadata stream
hangs up (see comment in viewer_get_metadata()). Therefore, if
viewer_get_metadata() is done before the metadata stream is closed, the
viewer will receive the LTTNG_VIEWER_NO_NEW_METADATA message, and set
the no_new_metadata_notified state to true. It's then the relay thread's
relay_close_stream() which will invoke the ownership put. However,
the live thread may concurrently try to put the viewer stream ownership
as well from a subsequent viewer_get_metadata(), thus leading to a
reference count < 0.

Solution
========

Fix this by putting the ownership reference from the live viewer thread
rather than the relay thread. This can be done by tracking the state of
no_new_metadata_notified within the live viewer thread.

Known drawbacks
===============

This will postpone reclaim of the metadata viewer stream from the
relay stream close to the following viewer_get_metadata (after a
LTTNG_VIEWER_NO_NEW_METADATA message has been replied), which I don't
think is an issue.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I9783d3a8402dddaa364a7a69c0764eaf30c16fde
PSRCode pushed a commit that referenced this pull request Feb 15, 2022
When doing `lttng stop`, I get:

    Direct leak of 656 byte(s) in 1 object(s) allocated from:
        #0 0x7f970719e459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x7f9706eba29a in zmalloc /home/simark/src/lttng-tools/src/common/macros.h:45
        #2 0x7f9706ebbb9d in recv_sessiond_optional_data /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:494
        #3 0x7f9706ebbf9a in lttng_ctl_ask_sessiond_fds_varlen /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:596
        #4 0x7f9706eba714 in lttng_ctl_ask_sessiond_varlen_no_cmd_header /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:58
        lttng#5 0x7f9706eba747 in lttng_ctl_ask_sessiond /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:78
        lttng#6 0x7f9706ec4604 in lttng_list_channels /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:2262
        lttng#7 0x55837235c4e7 in get_session_stats_str /home/simark/src/lttng-tools/src/bin/lttng/utils.c:499
        lttng#8 0x55837235bf73 in print_session_stats /home/simark/src/lttng-tools/src/bin/lttng/utils.c:445
        lttng#9 0x55837231cc12 in stop_tracing /home/simark/src/lttng-tools/src/bin/lttng/commands/stop.c:138
        lttng#10 0x55837231d062 in cmd_stop /home/simark/src/lttng-tools/src/bin/lttng/commands/stop.c:229
        lttng#11 0x55837235e63e in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
        lttng#12 0x55837235f0f2 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:426
        lttng#13 0x55837235f3f4 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:475
        lttng#14 0x7f9706adcb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

    Direct leak of 308 byte(s) in 1 object(s) allocated from:
        #0 0x7f970719e459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x7f9706eba29a in zmalloc /home/simark/src/lttng-tools/src/common/macros.h:45
        #2 0x7f9706ebbb9d in recv_sessiond_optional_data /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:494
        #3 0x7f9706ebbf9a in lttng_ctl_ask_sessiond_fds_varlen /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:596
        #4 0x7f9706eba714 in lttng_ctl_ask_sessiond_varlen_no_cmd_header /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:58
        lttng#5 0x7f9706eba747 in lttng_ctl_ask_sessiond /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:78
        lttng#6 0x7f9706ec421c in lttng_list_domains /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:2220
        lttng#7 0x55837235c3d3 in get_session_stats_str /home/simark/src/lttng-tools/src/bin/lttng/utils.c:484
        lttng#8 0x55837235bf73 in print_session_stats /home/simark/src/lttng-tools/src/bin/lttng/utils.c:445
        lttng#9 0x55837231cc12 in stop_tracing /home/simark/src/lttng-tools/src/bin/lttng/commands/stop.c:138
        lttng#10 0x55837231d062 in cmd_stop /home/simark/src/lttng-tools/src/bin/lttng/commands/stop.c:229
        lttng#11 0x55837235e63e in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
        lttng#12 0x55837235f0f2 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:426
        lttng#13 0x55837235f3f4 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:475
        lttng#14 0x7f9706adcb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

This is due to the get_session_stats_str function not free'ing the
results of lttng_list_channels and lttng_list_domains.  Fix that.

Change-Id: I4c200d3df41bf09bdce8eadb000abbff7fe5a751
Signed-off-by: Simon Marchi <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode pushed a commit that referenced this pull request Feb 15, 2022
When doing `lttng destroy`, I get:

    Direct leak of 4385 byte(s) in 1 object(s) allocated from:
        #0 0x7f74ae025459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x7f74add4129a in zmalloc /home/simark/src/lttng-tools/src/common/macros.h:45
        #2 0x7f74add42b9d in recv_sessiond_optional_data /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:494
        #3 0x7f74add42f9a in lttng_ctl_ask_sessiond_fds_varlen /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:596
        #4 0x7f74add41714 in lttng_ctl_ask_sessiond_varlen_no_cmd_header /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:58
        lttng#5 0x7f74add41747 in lttng_ctl_ask_sessiond /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl-helper.h:78
        lttng#6 0x7f74add4a922 in lttng_list_sessions /home/simark/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.c:2105
        lttng#7 0x56472bcbdf80 in cmd_destroy /home/simark/src/lttng-tools/src/bin/lttng/commands/destroy.c:330
        lttng#8 0x56472bd00764 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
        lttng#9 0x56472bd01218 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:426
        lttng#10 0x56472bd0151a in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:475
        lttng#11 0x7f74ad963b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

This is due to cmd_destroy not free'ing the result of
lttng_list_sessions. Fix that.

Change-Id: Iff2e75e6ec1cdcd0bdfdbbc3d5099422e592905b
Signed-off-by: Simon Marchi <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode added a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

The following scenario leads to an abort of lttng-sessiond.

 lttng-sessiond (with kernel tracing available)
 lttng create system-trace --snapshot -U /tmp/snapshot
 lttng enable-channel -k system-trace --subbuf-size=4k --num-subbuf=256
 lttng enable-event -c system-trace -k 'sched_wak*' -s system-trace
 lttng start system-trace

 lttng enable-event -u -a

Fails as expected with:
 Error: Events: The command tried to enable an event in a new domain for
 a session that has already been started once. (channel channel0,
 session system-trace)

Launch any ust app such as easy_ust from the lttng-ust repository.

The following backtrace is generated:

 (gdb) bt
 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007ffff7af0859 in __GI_abort () at abort.c:79
 #2  0x00007ffff7af0729 in __assert_fail_base (fmt=0x7ffff7c86588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line
 #3  0x00007ffff7b01f36 in __GI___assert_fail (assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line=5123, function=0x55555564ecf0 <__PRETTY_FUNCTION__.14199> "ust_
 #4  0x00005555555d1f5e in ust_app_global_update (usess=0x7fffe001fb90, app=0x7fffac000b80) at ust-app.c:5123
 lttng#5  0x00005555555b60d4 in update_ust_app (app_sock=82) at dispatch.c:71
 lttng#6  0x00005555555b7025 in thread_dispatch_ust_registration (data=0x5555556a07f0) at dispatch.c:409
 lttng#7  0x00005555555ad5ab in launch_thread (data=0x5555556a0810) at thread.c:65
 lttng#8  0x00007ffff7ce6609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#9  0x00007ffff7bed293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This also happens for the track command. You can replace the `lttng
enable-event -u -a` with `lttng track --userspace --vuid=0` then launch
an app and the same backtrace gets generated.

Cause
=====

During `process_client_msg` the `create_ust_session` function is called
and a ust session is assigned to the "system_trace" session with a
state of `active` set to 0 (false). This is not a problem.

The problem seems to lie with a single call site for
`ust_app_global_update` in `update_ust_app`. The status of the ust
session is not checked before calling the `ust_app_global_update`. It is
important to note that all `ust_app_global_update_all` callsites guard
the call with a check against the status of the session.

Solution
========

Guard the call to `ust_app_global_update` with a check of the ust
session active state.

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I14d25d99d0609689247cdfa86130bd0219613581
PSRCode added a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

The following backtrace have been reported [1].

 #0  __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.31+gitAUTOINC+f84949f1c4-r0/git/sysdeps/unix/sysv/linux/raise.c:50
 #1  0x0000003123025528 in __GI_abort () at /usr/src/debug/glibc/2.31+gitAUTOINC+f84949f1c4-r0/git/stdlib/abort.c:79
 #2  0x0000000000419884 in lttng_trace_archive_location_serialize (location=0x7f1c9c001160, buffer=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/location.c:230
 #3  0x00000000004c8f06 in lttng_evaluation_session_rotation_serialize (evaluation=0x7f1cb000a7f0, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/conditions/session-rotation.c:539
 #4  0x00000000004a80fa in lttng_evaluation_serialize (evaluation=0x7f1cb000a7f0, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/evaluation.c:42
 lttng#5  0x00000000004bc24f in lttng_notification_serialize (notification=0x7f1cb961c310, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/notification.c:63
 lttng#6  0x0000000000458b7d in notification_client_list_send_evaluation (client_list=0x7f1cb0008f90, trigger=0x7f1ca40113d0, evaluation=<optimized out>, source_object_creds=0x7f1cb000a874, client_report=0x475840 <client_handle_transmission_status>, user_data=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/notification-thread-events.c:4379
 lttng#7  0x0000000000476586 in action_executor_generic_handler (item=0x7f1cb0009600, work_item=0x7f1cb000a820, executor=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:696
 lttng#8  action_work_item_execute (work_item=0x7f1cb000a820, executor=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:715
 lttng#9  action_executor_thread (_data=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:797
 lttng#10 0x0000000000462327 in launch_thread (data=0x7f1cb00060b0) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/thread.c:66
 lttng#11 0x0000003123408ea4 in start_thread (arg=<optimized out>) at /usr/src/debug/glibc/2.31+gitAUTOINC+f84949f1c4-r0/git/nptl/pthread_create.c:477
 lttng#12 0x00000031230f8dcf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This can be easily reproduced with the following session and trigger
configuration:

 lttng create test
 lttng enable-event -u -a
 lttng start
 # Register two similar triggers via a dummy C program since rotation
 # completed condition is not exposed on the CLI for now. Yielding the
 # following triggers:
 lttng list-triggers
 - name: trigger0
   owner uid: 1000
   condition: session rotation completed
     session name: test
     errors: none
  action:notify
   errors: none
 - name: trigger1
   owner uid: 1000
   condition: session rotation completed
     session name: test
     errors: none
  action:notify
   errors: none

  lttng rotate <- abort happens here.

Cause
=====

The problem lies in how the location (`lttng_trace_archive_location`)
object is assigned to the `lttng_evaluation` objects. A single location
object can end up being shared between multiple `lttng_evaluation` objects
since we iterate over all triggers and create an `lttng_evaluation` object
with the location each time as needed.

See `src/bin/lttng-sessiond/notification-thread-events.c:1956`.

The location object is then freed when the first notification is
completely serialized. The second serialization end up having a
reference to a freed `lttng_trace_archive_location` object.

Solution
========

Implement ref counting for the lttng_trace_archive_location object.

Note
=======

This also fixes a leak that was present in `cmd_destroy_session_reply`.

The location is created by `session_get_trace_archive_location` and is
never `destroyed`/`put`.

Known drawbacks
=========

None.

References
==========

[1] https://bugs.lttng.org/issues/1325

Fixes: #1325

Signed-off-by: Jonathan Rajotte <[email protected]>
Change-Id: I99dc595ee5b0288c727b193ed061f5273752bd24
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode added a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

A segmentation fault is observed for multiple UST timeout scenarios.

Backtrace:

 #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
 #1  0x0000557fe0395df9 in copy_filter_bytecode (orig_f=0x7f9c5802b790) at ust-app.c:1196
 #2  0x0000557fe0397702 in shadow_copy_event (ua_event=0x7f9c58025ff0, uevent=0x7f9c58033560) at ust-app.c:1824
 #3  0x0000557fe039ac46 in create_ust_app_event (ua_sess=0x7f9c5802ec20, ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, app=0x7f9c5c001da0) at ust-app.c:3192
 #4  0x0000557fe03a054d in ust_app_channel_synchronize_event (ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, ua_sess=0x7f9c5802ec20, app=0x7f9c5c001da0) at ust-app.c:5096
 lttng#5  0x0000557fe03a0772 in ust_app_synchronize (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5173
 lttng#6  0x0000557fe03a0a70 in ust_app_global_update (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5255
 lttng#7  0x0000557fe03a00e0 in ust_app_start_trace_all (usess=0x7f9c580074a0) at ust-app.c:4987
 lttng#8  0x0000557fe0355c6a in cmd_start_trace (session=0x7f9c5800a190) at cmd.c:2668
 lttng#9  0x0000557fe0382e70 in process_client_msg (cmd_ctx=0x7f9c58003d70, sock=0x7f9c74bf44e0, sock_error=0x7f9c74bf44e4) at client.c:1527
 lttng#10 0x0000557fe03848a2 in thread_manage_clients (data=0x557fe06d9440) at client.c:2200
 lttng#11 0x0000557fe037d1cb in launch_thread (data=0x557fe06d94b0) at thread.c:75
 lttng#12 0x00007f9c796af609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007f9c795b6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The scenario:

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_set_filter
  lttng create my_session
  lttng enable-event -u tp:tp_test
  lttng start
  lttng enable-event -u __dummy --filter 'my_field == "user34"'
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  # enable-event will return an error. This a bug in itself, still let's
  # continue with the current bug.
  lttng stop
  # Start a new app that will register.
  ./app &
  sleep 1
  lttng start
  # lttng-sessiond should segfault.

Cause
=====

During the "lttng enable-event" command, the timeout error bubbles up
all the way to event_ust_enable_tracepoint and is different from
LTTNG_UST_ERR_EXIST. `trace_ust_destroy_event` is called and frees the
`uevent` object. Note that contrary to the comment `uevent` is added to
the channel event hash table at this point.

On the next `lttng start` command, the event node is still present in
the hash table and is iterated on. lttng-sessiond segfault on the first
data access of the previously freed memory.

The problem was introduced by commit
88e3c2f [1]. Which essentially move the
callsite of `add_unique_ust_event` before `ust_app_*_event_glb` calls.

Solution
========

Go to `end` label to prevent freeing of the uevent object.

Note that app synchronization should not force an error at the channel
level, since a single app can fail but the whole channel should not.

The `error` label is now obsolete.

Known drawbacks
=========

None.

References
==========

[1] lttng@88e3c2f

Signed-off-by: Jonathan Rajotte <[email protected]>
Change-Id: Ifaf3f4c71bb2da869c7b441aaa4b367f8f7cbdd6
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode added a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

The following scenario lead to the UST thread to be "stuck" on recvmsg
on the notify socket.
The problem manifest itself when an application is unresponsive during
the ustctl_start_session call. Note that the default timeout for ust
communication is 5 seconds.

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_start_session
  lttng create my_session
  lttng enable-event -u -a
  lttng start
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  sleep 5 # This make sure lttng-sessiond unregister the app from its point of view
  kill -s SIGCONT $(pgrep app)
  gdb -p $(pgrep app)
  thread apply all bt

App stack trace:

  Thread 3 (Thread 0x7fe2c6f58700 (LWP 48172)):
  #0  __libc_recvmsg (flags=0, msg=0x7fe2c6f56ac0, fd=4) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
  #1  __libc_recvmsg (fd=fd@entry=4, msg=msg@entry=0x7fe2c6f56ac0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:25
  #2  0x00007fe2c7a010ba in ustcomm_recv_unix_sock (sock=sock@entry=4, buf=buf@entry=0x7fe2c6f56ea0, len=len@entry=48) at lttng-ust-comm.c:308
  #3  0x00007fe2c7a037c3 in ustcomm_register_channel (sock=4, session=session@entry=0x7fe2c0000ba0, session_objd=<optimized out>, channel_objd=<optimized out>, nr_ctx_fields=nr_ctx_fields@entry=0, ctx_fields=<optimized out>, chan_id=0x7fe2
  c6f5716c, header_type=0x7fe2c0012b18) at lttng-ust-comm.c:1544
  #4  0x00007fe2c7a10787 in lttng_session_enable (session=0x7fe2c0000ba0) at lttng-events.c:444
  lttng#5  0x00007fe2c7a0b785 in lttng_session_cmd (objd=1, cmd=128, arg=140611977311672, uargs=0x7fe2c6f57800, owner=0x7fe2c7a5da00 <local_apps>) at lttng-ust-abi.c:576
  lttng#6  0x00007fe2c7a07d6d in handle_message (lum=0x7fe2c6f57590, sock=3, sock_info=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1003
  lttng#7  ust_listener_thread (arg=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1712
  lttng#8  0x00007fe2c7993609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  lttng#9  0x00007fe2c78ba293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  ...

Cause
=====

When the app continues after the timeout from lttng-sessiond side, the
actual start_session message is received on the application side then
UST, app side, send commands on the notify socket. On lttng-sessiond
side, the command is received but no reply is sent.

This is due to the fact that the lookup against the
ust_app_ht_by_notify_sock hash table (find_app_by_notify_sock)
return nothing since the app is unregistered at this point and the hash
table node was removed on unregistration.

Solution
========

When the app lookup fails, return an error that will trigger the cleanup
of the notify socket.

Known drawbacks
=========
None

Note
=========
Subsequent error path in reply_ust_register_channel,
add_event_ust_registry, and add_enum_ust_registry might lead to the same
type of problem since no reply is sent to the app. Still, for those
cases the complete application/notify socket should not be destroyed
since the error path relate to either a session or a sub object of a
session.

Signed-off-by: Jonathan Rajotte <[email protected]>
Change-Id: Iea0dc027ca1ee772e84c7e545114f1be69fd1f63
Signed-off-by: Jérémie Galarneau <[email protected]>
PSRCode pushed a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

As reported in #1323 (https://bugs.lttng.org/issues/1323), crashes of
the relay daemon are observed when running the user space clear tests.

The crash occurs with the following stack trace:
  #0  0x000055fbb861d6ae in urcu_ref_get_unless_zero (ref=0x28) at /usr/local/include/urcu/ref.h:85
  #1  lttng_trace_chunk_get (chunk=0x0) at trace-chunk.c:1836
  #2  0x000055fbb86051e2 in make_viewer_streams (relay_session=relay_session@entry=0x7f6ea002d540, viewer_session=<optimized out>, seek_t=seek_t@entry=LTTNG_VIEWER_SEEK_BEGINNING, nb_total=nb_total@entry=0x7f6ea9607b00, nb_unsent=nb_unsent@entry=0x7f6ea9607aec, nb_created=nb_created@entry=0x7f6ea9607ae8, closed=<optimized out>) at live.c:405
  #3  0x000055fbb86061d9 in viewer_get_new_streams (conn=0x7f6e94000fc0) at live.c:1155
  #4  process_control (conn=0x7f6e94000fc0, recv_hdr=0x7f6ea9607af0) at live.c:2353
  lttng#5  thread_worker (data=<optimized out>) at live.c:2515
  lttng#6  0x00007f6eae86a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
  lttng#7  0x00007f6eae78f293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

The race window during which this occurs seems very small as it can take
hours to reproduce this crash. However, a minimal reproducer could be
identified, as stated in the bug report.

Essentially, the same crash can be reproduced by attaching a live viewer
to a session that has seen events being produced, been stopped and been
cleared.

Cause
=====

The crash occurs as an attempt is made to take a reference to a viewer
session’s trace chunk as viewer streams are created. The crux of the
problem is that the code doesn’t expect a viewer session’s trace chunk
to be NULL.

The viewer session’s current trace chunk is initially set, when a viewer
attaches to the viewer session, to a copy the corresponding
relay_session’s current trace chunk.

A live session always attempts to "catch-up" to the newest available
trace chunk. This means that when a viewer reaches the end of a trace
chunk, the viewer session may not transition to the "next" one: it jumps
to the most recent trace chunk available (the one being produced by the
relay_session). Hence, if the producer performs multiple rotations
before a viewer completes the consumption of a trace chunk, it will skip
over those "intermediary" trace chunks.

A viewer session updates its current trace chunk when:
  1) new viewer streams are created,
  2) a new index is requested,
  3) metadata is requested.

Hence, as a general principle, the viewer session will reference the
most recent trace chunk available _even if its streams do not point to
it_. It indicates which trace chunk viewer streams should transition to
when the end of their current trace chunk is reached.

The live code properly handles transitions to a null chunk. This can be
verified by attaching a viewer to a live session, stopping the session,
clearing it (thus entering a null trace chunk), and resuming tracing.

The only issue is that the case where the first trace chunk of a viewer
session is "null" (no active trace chunk) is mishandled in two places:
  1) in make_viewer_streams(), where the crash is observed,
  2) in viewer_get_metadata().

Solution
========

In make_viewer_streams(), it is assumed that a viewer session will have
a non-null trace chunk whenever a rotation is not ongoing. This is
reflected by the fact that a reference is always acquired on the viewer
session’s trace chunk.

That code is one of the three places that can cause a viewer session’s
trace chunk to be updated. We still want to update the viewer session to
the most recently seen trace chunk (null, in this case). However, there
is no reference to acquire and the trace chunk to use for the creation
of the viewer stream is NULL. This is properly handled by
viewer_stream_create().

The second site to change is viewer_get_metadata() which doesn’t handle
a viewer metadata stream not having an active trace chunk at all.
Thankfully, the protocol allows us to express this condition by
returning the LTTNG_VIEWER_NO_NEW_METADATA status code when a viewer
metadata stream doesn’t have an open file and doesn’t have a current
trace chunk.

Surprisingly, this bug didn’t trigger in the case where a transition to
a null chunk occurred _after_ attaching to a viewer session.

This is because viewers will typically ask for metadata as a result of an
LTTNG_VIEWER_FLAG_NEW_METADATA reply to the GET_NEXT_INDEX command. When
a session is stopped and all data was consumed, this command returns
that no new data is available, causing the viewers to wait and ask again
later.

However, when attaching, babeltrace2 (at least, and probably babeltrace 1.x)
always asks for an initial segment of metadata before asking for an
index.

Known drawbacks
===============

None.

Fixes: #1323

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I516fca60755e6897f6b7170c12d706ef57ad61a5
PSRCode pushed a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

When performing:

  #!/bin/bash

  lttng create py_syscalls --live

  lttng enable-event -u -a
  lttng enable-event -k -a

  lttng start

  babeltrace2 -i lttng-live net://localhost/host/raton/py_syscalls

The relay daemon hits this assertion:

  Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"):
  #0  0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  #3  0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
  #4  0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80
  lttng#5  0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275
  lttng#6  process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341
  lttng#7  thread_worker (data=<optimized out>) at live.c:2515
  lttng#8  0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
  lttng#9  0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Cause
=====

This assert appears to be entirely wrong.

It checks that the "viewer session" has a NULL current trace chunk when
attaching a session to a viewer session, but in the case where a viewer
session has multiple sessions (e.g. with kernel and ust tracing
combined), we are attaching each session individually to the viewer
session, and we set the current trace chunk of the viewer session when
we attach the first session to it.

So it is expected to be non-NULL when attaching the second session.

Solution
========

Remove the assertion.

Known limitations
=================

None.

Fixes: #1335
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I4d8f5d5347b4588144ddf449976cae5a94b81b3a
PSRCode added a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

lttng-consumerd segfaults for the following scenario:

  $ lttng create test --snapshot
  $ lttng enable-event -u -an
  $ lttng start
  # Run an app just to have some events
  $ lttng regenerate metadata
  $ lttng snapshot record

The following backtrace is obtained:

 (gdb) bt
 #0  __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
 #1  0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
 #2  0x000055b383d00197 in lttng_ustconsumer_request_metadata (ctx=0x55b3855a1e50, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:3336
 #3  0x000055b383cf9e76 in snapshot_metadata (metadata_channel=0x7fe90000a760, key=4, path=0x7fe911a09944 "uid/1000/64-bit", relayd_id=18446744073709551615, ctx=0x55b3855a1e50) at ust-consum
 #4  0x000055b383cfbe73 in lttng_ustconsumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at ust-consumer.c:1853
 lttng#5  0x000055b383ccf9b7 in lttng_consumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at consumer.c:2097
 lttng#6  0x000055b383cd3bfd in consumer_thread_sessiond_poll (data=0x55b3855a1e50) at consumer.c:3284
 lttng#7  0x00007fe914c22609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#8  0x00007fe914b47293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 (gdb) up
 #1  0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
 1347                    pthread_mutex_lock(&channel->metadata_stream->lock);
 (gdb) print channel->metadata_stream
 $1 = (struct lttng_consumer_stream *) 0x0

Note that the following scenario also leads to a similar backtrace:

  $ lttng create test --snapshot
  $ lttng enable-event -u -an
  $ lttng start
  # Run an app just to have some events with a short duration
  $ lttng regenerate metadata
  # Run a second app just to have some events and a metadata flush while
  # the metadata cache status is set as `invalidated`.
  ^ lttng-consumerd segfault on app termination.

The backtrace:

 (gdb) bt
 #0  __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
 #1  0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
 #2  0x00005612a5a14d0a in lttng_ustconsumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at ust-consumer.c:1818
 #3  0x00005612a59e89b7 in lttng_consumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at consumer.c:2097
 #4  0x00005612a59ecbfd in consumer_thread_sessiond_poll (data=0x5612a6feee50) at consumer.c:3284
 lttng#5  0x00007f398c6a9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#6  0x00007f398c5ce293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 (gdb) up
 #1  0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
 1347                    pthread_mutex_lock(&channel->metadata_stream->lock);
 (gdb) print channel->metadata_stream
 $1 = (struct lttng_consumer_stream *) 0x0
 (gdb)

Cause
=====

For session configured in snapshot mode the metadata channel
metadata_stream field is NULL except for a "short" window during the
actual snapshot record action (snapshot_metadata).

The `regenerate metadata` effectively flag the metadata cache as invalid
leading to handling the cache invalidation state
(`CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED`) in
`lttng_ustconsumer_recv_metadata`. This was introduced by
b1316da [1].

At that point the function expects the `channel->metadata_stream` to be
non null. This is simply not true for snapshot session metadata channels.

Note that we cannot simply swap `lttng_ustconsumer_request_metadata` and
`create_ust_streams` in `snapshot_metadata` to ensure that the
`channel->metadata_stream` is non null since
`lttng_ustconsumer_recv_metadata` ends up being called on metadata flush when
an app quit. This sequence of events corresponds to the second scenario
put forward in the `Observed Issue` section.

Solution
========

Null check `channel->metadata_stream` and perform only the operation
when it is non null. This partly mirror what is done in `consumer_metadata_wakeup_pipe`.

I am not sure if the check on `channel->monitor` is required but it
seems irrelevant to the notion of resetting the stream consumed position
when the stream exists.

With this taken care off, we find ourself with another
backtrace:

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50                                                                                                             [93/122]
 #1  0x00007f75cf9b3859 in __GI_abort () at abort.c:79
 #2  0x00007f75cf9b3729 in __assert_fail_base (fmt=0x7f75cfb49588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004
 #3  0x00007f75cf9c4f36 in __GI___assert_fail (assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004e8d7a "ust-consumer.c", line=1285, function=0x55ab004eb8a0 <__PR
 #4  0x000055ab004b1b9c in metadata_stream_reset_cache_consumed_position (stream=0x7f75b400a850) at ust-consumer.c:1285
 lttng#5  0x000055ab004b4fef in commit_one_metadata_packet (stream=0x7f75b400a850) at ust-consumer.c:2551
 lttng#6  0x000055ab004b5f46 in get_next_subbuffer_metadata (stream=0x7f75b400a850, subbuffer=0x7f75cc972630) at ust-consumer.c:2927
 lttng#7  0x000055ab0048b6a9 in lttng_consumer_read_subbuffer (stream=0x7f75b400a850, ctx=0x55ab01d4ee50, locked_by_caller=true) at consumer.c:3522
 lttng#8  0x000055ab004b0f66 in snapshot_metadata (metadata_channel=0x7f75b4005400, key=2, path=0x7f75cc972944 "uid/1000/64-bit", relayd_id=18446744073709551615, ctx=0x55ab01d4ee50) at ust-consum
 lttng#9  0x000055ab004b2e86 in lttng_ustconsumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at ust-consumer.c:1861
 lttng#10 0x000055ab004869b7 in lttng_consumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at consumer.c:2097
 lttng#11 0x000055ab0048abfd in consumer_thread_sessiond_poll (data=0x55ab01d4ee50) at consumer.c:3284
 lttng#12 0x00007f75cfb8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007f75cfab0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Which is also caused in part to the invalidation of the cache.
`metadata_stream_reset_cache_consumed_position` expect that the stream
at that point be locked. Which is not the case despite what the last argument
to `lttng_consumer_read_subbuffer` indicates. To alleviate that we hold
the lock during the call to `lttng_consumer_read_subbuffer`.

Known drawbacks
=========

None.

References
=========

[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=b1316da1ffbd276fc8271e7a9438e683ad352781

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ib09fdc989b1baae26d0f496afeabc3e7cb27228c
PSRCode pushed a commit that referenced this pull request Feb 15, 2022
Observed issue
==============

While running tests/regression/tools/clear/test_ust test in a loop we
eventually witness the following error:

The symptom on the Babeltrace side is a Connection reset by peer. This
is caused by a relayd abort after an assertion failure due to a
reference count being lower than 0.

Test case:

  # Test ust streaming live clear with viewer
  # Parameters: tracing_active=0, clear_twice=0, buffer_type=pid

Tools commit: https://review.lttng.org/c/lttng-tools/+/6754/1

Relay daemon assertion failure stack trace:

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffa2e850859 in __GI_abort () at abort.c:79
  #2  0x00007ffa2e850729 in __assert_fail_base (fmt=0x7ffa2e9e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56221bdb96f4 "res >= 0",
      file=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=66, function=<optimized out>) at assert.c:92
  #3  0x00007ffa2e861f36 in __GI___assert_fail (assertion=assertion@entry=0x56221bdb96f4 "res >= 0", file=file@entry=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=line@entry=66,
      function=function@entry=0x56221bdbd638 <__PRETTY_FUNCTION__.7554> "urcu_ref_put") at assert.c:101
  #4  0x000056221bd6a1cc in urcu_ref_put (release=<optimized out>, ref=0x7ffa24008cb0) at /root/virtenv/usr/include/urcu/ref.h:66
  lttng#5  viewer_stream_put (vstream=vstream@entry=0x7ffa24008cb0) at viewer-stream.c:279
  lttng#6  0x000056221bd5e4c5 in viewer_get_metadata (conn=conn@entry=0x7ffa0c000fc0) at live.c:2211
  lttng#7  0x000056221bd63778 in process_control (conn=0x7ffa0c000fc0, recv_hdr=0x7ffa297c5af0) at live.c:2376
  lttng#8  thread_worker (data=<optimized out>) at live.c:2541
  lttng#9  0x00007ffa2ea28609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  lttng#10 0x00007ffa2e94d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Babeltrace crash:

11-17 05:13:05.197 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1494 [live] Requesting next index for stream: cmd=GET_NEXT_INDEX, viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:245 [live] Error receiving from Relay: Connection reset by peer.
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1522 [live] Error receiving get next index reply
11-17 05:13:05.215 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1069 [live] Returning from advancing live stream iterator: status=LTTNG_LIVE_ITERATOR_STATUS_ERROR, stream-name="stream-3352", viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1802 [live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07700, iter-upstream-comp-name="live", iter-upstream-comp-log-level=TRACE, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUT
PUT, iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:446 [mux] Upstream iterator's next method returned an error: status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:989 [mux] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x563b7dd04740, muxer-upstream-msg-iter-wrap-addr=0x563b7dd0db30
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:1417 [mux] Cannot get next message: comp-addr=0x563b7dd1a760, muxer-comp-addr=0x563b7dd1b930, muxer-msg-iter-addr=0x563b7dd04740, msg-iter-addr=0x563b7dd07590, status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07590, iter-upstream-comp-name="mux", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT,
 iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/GRAPH [email protected]:462 Component's "consume" method failed: status=ERROR, comp-addr=0x563b7dd06d20, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x563b7dd051b0, comp-class-so-handle-path="/root/virtenv/usr/lib/babeltr
ace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
11-17 05:13:05.217 47007 47007 E CLI [email protected]:2537 Graph failed to complete successfully

Cause
=====

Both relay and live threads can put the ownership reference on the
metadata viewer stream concurrently without synchronization, thus
leading to a reference count going lower than 0.

The viewer stream ownership design initially planned for being owned by
the live thread, thus allowing the live thread to put the ownership
reference as soon as the associated relay stream is observed as closed,
and the viewer stream is considered as hung up.

However, in the specific case of the metadata viewer stream, the
responsibility of closing the metadata viewer stream is shared between
the relay and live threads, because the viewers expect to observe a
LTTNG_VIEWER_NO_NEW_METADATA message before the metadata stream
hangs up (see comment in viewer_get_metadata()). Therefore, if
viewer_get_metadata() is done before the metadata stream is closed, the
viewer will receive the LTTNG_VIEWER_NO_NEW_METADATA message, and set
the no_new_metadata_notified state to true. It's then the relay thread's
relay_close_stream() which will invoke the ownership put. However,
the live thread may concurrently try to put the viewer stream ownership
as well from a subsequent viewer_get_metadata(), thus leading to a
reference count < 0.

Solution
========

Fix this by putting the ownership reference from the live viewer thread
rather than the relay thread. This can be done by tracking the state of
no_new_metadata_notified within the live viewer thread.

Known drawbacks
===============

This will postpone reclaim of the metadata viewer stream from the
relay stream close to the following viewer_get_metadata (after a
LTTNG_VIEWER_NO_NEW_METADATA message has been replied), which I don't
think is an issue.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I9783d3a8402dddaa364a7a69c0764eaf30c16fde
PSRCode pushed a commit that referenced this pull request Feb 24, 2022
Observed issue
==============

As reported in #1323 (https://bugs.lttng.org/issues/1323), crashes of
the relay daemon are observed when running the user space clear tests.

The crash occurs with the following stack trace:
  #0  0x000055fbb861d6ae in urcu_ref_get_unless_zero (ref=0x28) at /usr/local/include/urcu/ref.h:85
  #1  lttng_trace_chunk_get (chunk=0x0) at trace-chunk.c:1836
  #2  0x000055fbb86051e2 in make_viewer_streams (relay_session=relay_session@entry=0x7f6ea002d540, viewer_session=<optimized out>, seek_t=seek_t@entry=LTTNG_VIEWER_SEEK_BEGINNING, nb_total=nb_total@entry=0x7f6ea9607b00, nb_unsent=nb_unsent@entry=0x7f6ea9607aec, nb_created=nb_created@entry=0x7f6ea9607ae8, closed=<optimized out>) at live.c:405
  #3  0x000055fbb86061d9 in viewer_get_new_streams (conn=0x7f6e94000fc0) at live.c:1155
  #4  process_control (conn=0x7f6e94000fc0, recv_hdr=0x7f6ea9607af0) at live.c:2353
  lttng#5  thread_worker (data=<optimized out>) at live.c:2515
  lttng#6  0x00007f6eae86a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
  lttng#7  0x00007f6eae78f293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

The race window during which this occurs seems very small as it can take
hours to reproduce this crash. However, a minimal reproducer could be
identified, as stated in the bug report.

Essentially, the same crash can be reproduced by attaching a live viewer
to a session that has seen events being produced, been stopped and been
cleared.

Cause
=====

The crash occurs as an attempt is made to take a reference to a viewer
session’s trace chunk as viewer streams are created. The crux of the
problem is that the code doesn’t expect a viewer session’s trace chunk
to be NULL.

The viewer session’s current trace chunk is initially set, when a viewer
attaches to the viewer session, to a copy the corresponding
relay_session’s current trace chunk.

A live session always attempts to "catch-up" to the newest available
trace chunk. This means that when a viewer reaches the end of a trace
chunk, the viewer session may not transition to the "next" one: it jumps
to the most recent trace chunk available (the one being produced by the
relay_session). Hence, if the producer performs multiple rotations
before a viewer completes the consumption of a trace chunk, it will skip
over those "intermediary" trace chunks.

A viewer session updates its current trace chunk when:
  1) new viewer streams are created,
  2) a new index is requested,
  3) metadata is requested.

Hence, as a general principle, the viewer session will reference the
most recent trace chunk available _even if its streams do not point to
it_. It indicates which trace chunk viewer streams should transition to
when the end of their current trace chunk is reached.

The live code properly handles transitions to a null chunk. This can be
verified by attaching a viewer to a live session, stopping the session,
clearing it (thus entering a null trace chunk), and resuming tracing.

The only issue is that the case where the first trace chunk of a viewer
session is "null" (no active trace chunk) is mishandled in two places:
  1) in make_viewer_streams(), where the crash is observed,
  2) in viewer_get_metadata().

Solution
========

In make_viewer_streams(), it is assumed that a viewer session will have
a non-null trace chunk whenever a rotation is not ongoing. This is
reflected by the fact that a reference is always acquired on the viewer
session’s trace chunk.

That code is one of the three places that can cause a viewer session’s
trace chunk to be updated. We still want to update the viewer session to
the most recently seen trace chunk (null, in this case). However, there
is no reference to acquire and the trace chunk to use for the creation
of the viewer stream is NULL. This is properly handled by
viewer_stream_create().

The second site to change is viewer_get_metadata() which doesn’t handle
a viewer metadata stream not having an active trace chunk at all.
Thankfully, the protocol allows us to express this condition by
returning the LTTNG_VIEWER_NO_NEW_METADATA status code when a viewer
metadata stream doesn’t have an open file and doesn’t have a current
trace chunk.

Surprisingly, this bug didn’t trigger in the case where a transition to
a null chunk occurred _after_ attaching to a viewer session.

This is because viewers will typically ask for metadata as a result of an
LTTNG_VIEWER_FLAG_NEW_METADATA reply to the GET_NEXT_INDEX command. When
a session is stopped and all data was consumed, this command returns
that no new data is available, causing the viewers to wait and ask again
later.

However, when attaching, babeltrace2 (at least, and probably babeltrace 1.x)
always asks for an initial segment of metadata before asking for an
index.

Known drawbacks
===============

None.

Fixes: #1323

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I516fca60755e6897f6b7170c12d706ef57ad61a5
PSRCode pushed a commit that referenced this pull request Feb 24, 2022
Observed issue
==============

When performing:

  #!/bin/bash

  lttng create py_syscalls --live

  lttng enable-event -u -a
  lttng enable-event -k -a

  lttng start

  babeltrace2 -i lttng-live net://localhost/host/raton/py_syscalls

The relay daemon hits this assertion:

  Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"):
  #0  0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  #3  0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
  #4  0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80
  lttng#5  0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275
  lttng#6  process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341
  lttng#7  thread_worker (data=<optimized out>) at live.c:2515
  lttng#8  0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
  lttng#9  0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Cause
=====

This assert appears to be entirely wrong.

It checks that the "viewer session" has a NULL current trace chunk when
attaching a session to a viewer session, but in the case where a viewer
session has multiple sessions (e.g. with kernel and ust tracing
combined), we are attaching each session individually to the viewer
session, and we set the current trace chunk of the viewer session when
we attach the first session to it.

So it is expected to be non-NULL when attaching the second session.

Solution
========

Remove the assertion.

Known limitations
=================

None.

Fixes: #1335
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I4d8f5d5347b4588144ddf449976cae5a94b81b3a
PSRCode added a commit that referenced this pull request Feb 24, 2022
…y_name

Observed issue
==============

The lttng-sessiond asserts with the following backtrace on lttng create:

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007ffff7ab5859 in __GI_abort () at abort.c:79
 #2  0x00007ffff7ab5729 in __assert_fail_base (fmt=0x7ffff7c4b588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556ab0a6 "node_ptr == &node->node", file=0x5555556ab085 "hashtable.c", line=298, function=<optimized out>) at a
 #3  0x00007ffff7ac6f36 in __GI___assert_fail (assertion=assertion@entry=0x5555556ab0a6 "node_ptr == &node->node", file=file@entry=0x5555556ab085 "hashtable.c", line=line@entry=298, function=function@entry=0x5555556ab380 <__PRETTY_FUNCTIO
 #4  0x000055555560be44 in lttng_ht_add_unique_str (ht=<optimized out>, node=0x7fffe0026c58) at hashtable.c:298
 lttng#5  0x000055555558fb6a in add_session_ht (ls=0x7fffe0024970) at session.c:372
 lttng#6  session_create (name=<optimized out>, uid=1000, gid=1000, out_session=out_session@entry=0x7fffedfddbd8) at session.c:1308
 lttng#7  0x000055555559b219 in cmd_create_session_from_descriptor (creds=<optimized out>, creds=<optimized out>, home_path=<optimized out>, descriptor=<optimized out>) at cmd.c:3040
 lttng#8  cmd_create_session (cmd_ctx=cmd_ctx@entry=0x7fffedfe5fa0, sock=<optimized out>, return_descriptor=return_descriptor@entry=0x7fffedfddd68) at cmd.c:3176
 lttng#9  0x00005555555cc341 in process_client_msg (sock_error=0x7fffedfddd10, sock=0x7fffedfddd0c, cmd_ctx=0x7fffedfe5fa0) at client.c:2177
 lttng#10 thread_manage_clients (data=<optimized out>) at client.c:2742
 lttng#11 0x00005555555c5fff in launch_thread (data=0x55555571b780) at thread.c:66
 lttng#12 0x00007ffff7c8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007ffff7bb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The issue can be reproduced with modifications to the rotation thread
code and the following scenario:

 $ lttng create test
 $ lttng enable-event -u -a
 $ lttng start
 run any app just so that we have a complete valid session. (might not be necessary)
 $ lttng destroy --no-wait
 $ lttng create test
 ^ Should assert here.

The diff to be applied:

 diff --git a/src/bin/lttng-sessiond/rotation-thread.cpp b/src/bin/lttng-sessiond/rotation-thread.cpp
 index ac149c845..c11f068ed 100644
 --- a/src/bin/lttng-sessiond/rotation-thread.cpp
 +++ b/src/bin/lttng-sessiond/rotation-thread.cpp
 @@ -565,6 +565,8 @@ int handle_job_queue(struct rotation_thread_handle *handle,
  {
         int ret = 0;

 +       sleep(5);
 +
         for (;;) {
                 struct ltt_session *session;
                 struct rotation_thread_job *job;

Note that the initial report for this issue was on a system under load
for which the `lttng destroy` completion check failed and a `lttng
create` was performed. As of today the exact reason why the completion
check failed is not known. Still we can "fix" the race leading to the
lttng-sessiond assertion considering a user might use the `--no-wait`
variant of `lttng destroy` and could easily end up in this
situation.

Cause
=====

Note: all `lttng create` commands have the same session name passed as
argument.

On `lttng destroy` the ltt_session object is flagged as destroyed
(ltt_session::destroyed). The removal of the object from the hash
table (ltt_sessions_ht_by_name) will be performed during the
`session_release` which is driven by the session refcount.

A reference on the `ltt_session` object is held for the
rotation initiated by the `lttng destroy` command. The rotation is
enqueued by the rotation thread.

At this point the system is busy and the rotation thread does not run.
We simulate this with a `sleep(5)` during the `handle_job_queue`.

The `lttng destroy --no-wait` returns. If the `--no-wait` option is not
passed the `lttng destroy` command will work as expected and wait for
completion. We can SIGINT the `lttng destroy` command and perform a
`lttng create` yielding the same backtrace.

On `lttng create`, `session_create` validates that the name does not
conflict with an existing session using `session_find_by_name`. It is
important to note that `session_find_by_name` discriminates also on the
`session->destroyed` flag (introduced by [1]).

The `ltt_sessions_ht_by_name` hash table was introduced by [2] to remove
the need to lock the session list to sample a session id during the
queueing of actions to be executed related to a trigger. The assumption
was made that, during the creation phase, the session would
always be unique in that hash table based on its name. This is simply
not true since multiple sessions with the same name can coexist as long
as only a single one is marked as "not destroyed". This is an important
concept due to the refcounting of the object and the feature relying on
the lifetime of the object (i.e rotation). This is mostly valid when
talking about the global session list.

Solution
========

Move the hash table removal earlier during the release of the session
object.

Move the removal from `del_session_ht`, which is done during the
`session_release` function, to the `lttng_session_destroy` function.

It is safe to do so since currently the only user of that hash table
(the action executor) does not care much about destroyed session at that
point.

This ensures that we maintain the uniqueness property of the key (name)
for that hash table on insertion.

The alternative was to expose an hash table that could contain
duplicates and force the handling of a set on all lookups.

Known drawbacks
=========

None.

References
==========
[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=e32d7f274604b77bcd83c24994e88df3761ed658
[2] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=e1bbf98908a6399f39a9a8bc95bd8b59cecaa816

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I2f1d0d6c04ee7210166e9847a850afbe6eaa7609
PSRCode added a commit that referenced this pull request Feb 24, 2022
Observed issue
==============

lttng-consumerd segfaults for the following scenario:

  $ lttng create test --snapshot
  $ lttng enable-event -u -an
  $ lttng start
  # Run an app just to have some events
  $ lttng regenerate metadata
  $ lttng snapshot record

The following backtrace is obtained:

 (gdb) bt
 #0  __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
 #1  0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
 #2  0x000055b383d00197 in lttng_ustconsumer_request_metadata (ctx=0x55b3855a1e50, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:3336
 #3  0x000055b383cf9e76 in snapshot_metadata (metadata_channel=0x7fe90000a760, key=4, path=0x7fe911a09944 "uid/1000/64-bit", relayd_id=18446744073709551615, ctx=0x55b3855a1e50) at ust-consum
 #4  0x000055b383cfbe73 in lttng_ustconsumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at ust-consumer.c:1853
 lttng#5  0x000055b383ccf9b7 in lttng_consumer_recv_cmd (ctx=0x55b3855a1e50, sock=28, consumer_sockpoll=0x7fe911a0dbb0) at consumer.c:2097
 lttng#6  0x000055b383cd3bfd in consumer_thread_sessiond_poll (data=0x55b3855a1e50) at consumer.c:3284
 lttng#7  0x00007fe914c22609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#8  0x00007fe914b47293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 (gdb) up
 #1  0x000055b383cfaed3 in lttng_ustconsumer_recv_metadata (sock=29, key=4, offset=0, len=12391, version=1, channel=0x7fe90000a760, timer=0, wait=1) at ust-consumer.c:1347
 1347                    pthread_mutex_lock(&channel->metadata_stream->lock);
 (gdb) print channel->metadata_stream
 $1 = (struct lttng_consumer_stream *) 0x0

Note that the following scenario also leads to a similar backtrace:

  $ lttng create test --snapshot
  $ lttng enable-event -u -an
  $ lttng start
  # Run an app just to have some events with a short duration
  $ lttng regenerate metadata
  # Run a second app just to have some events and a metadata flush while
  # the metadata cache status is set as `invalidated`.
  ^ lttng-consumerd segfault on app termination.

The backtrace:

 (gdb) bt
 #0  __GI___pthread_mutex_lock (mutex=0x130) at ../nptl/pthread_mutex_lock.c:67
 #1  0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
 #2  0x00005612a5a14d0a in lttng_ustconsumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at ust-consumer.c:1818
 #3  0x00005612a59e89b7 in lttng_consumer_recv_cmd (ctx=0x5612a6feee50, sock=28, consumer_sockpoll=0x7f3989494bb0) at consumer.c:2097
 #4  0x00005612a59ecbfd in consumer_thread_sessiond_poll (data=0x5612a6feee50) at consumer.c:3284
 lttng#5  0x00007f398c6a9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#6  0x00007f398c5ce293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 (gdb) up
 #1  0x00005612a5a13ed3 in lttng_ustconsumer_recv_metadata (sock=28, key=2, offset=0, len=12391, version=1, channel=0x7f3978005400, timer=0, wait=1) at ust-consumer.c:1347
 1347                    pthread_mutex_lock(&channel->metadata_stream->lock);
 (gdb) print channel->metadata_stream
 $1 = (struct lttng_consumer_stream *) 0x0
 (gdb)

Cause
=====

For session configured in snapshot mode the metadata channel
metadata_stream field is NULL except for a "short" window during the
actual snapshot record action (snapshot_metadata).

The `regenerate metadata` effectively flag the metadata cache as invalid
leading to handling the cache invalidation state
(`CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED`) in
`lttng_ustconsumer_recv_metadata`. This was introduced by
b1316da [1].

At that point the function expects the `channel->metadata_stream` to be
non null. This is simply not true for snapshot session metadata channels.

Note that we cannot simply swap `lttng_ustconsumer_request_metadata` and
`create_ust_streams` in `snapshot_metadata` to ensure that the
`channel->metadata_stream` is non null since
`lttng_ustconsumer_recv_metadata` ends up being called on metadata flush when
an app quit. This sequence of events corresponds to the second scenario
put forward in the `Observed Issue` section.

Solution
========

Null check `channel->metadata_stream` and perform only the operation
when it is non null. This partly mirror what is done in `consumer_metadata_wakeup_pipe`.

I am not sure if the check on `channel->monitor` is required but it
seems irrelevant to the notion of resetting the stream consumed position
when the stream exists.

With this taken care off, we find ourself with another
backtrace:

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50                                                                                                             [93/122]
 #1  0x00007f75cf9b3859 in __GI_abort () at abort.c:79
 #2  0x00007f75cf9b3729 in __assert_fail_base (fmt=0x7f75cfb49588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004
 #3  0x00007f75cf9c4f36 in __GI___assert_fail (assertion=0x55ab004e9c68 "pthread_mutex_trylock(&stream->lock)", file=0x55ab004e8d7a "ust-consumer.c", line=1285, function=0x55ab004eb8a0 <__PR
 #4  0x000055ab004b1b9c in metadata_stream_reset_cache_consumed_position (stream=0x7f75b400a850) at ust-consumer.c:1285
 lttng#5  0x000055ab004b4fef in commit_one_metadata_packet (stream=0x7f75b400a850) at ust-consumer.c:2551
 lttng#6  0x000055ab004b5f46 in get_next_subbuffer_metadata (stream=0x7f75b400a850, subbuffer=0x7f75cc972630) at ust-consumer.c:2927
 lttng#7  0x000055ab0048b6a9 in lttng_consumer_read_subbuffer (stream=0x7f75b400a850, ctx=0x55ab01d4ee50, locked_by_caller=true) at consumer.c:3522
 lttng#8  0x000055ab004b0f66 in snapshot_metadata (metadata_channel=0x7f75b4005400, key=2, path=0x7f75cc972944 "uid/1000/64-bit", relayd_id=18446744073709551615, ctx=0x55ab01d4ee50) at ust-consum
 lttng#9  0x000055ab004b2e86 in lttng_ustconsumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at ust-consumer.c:1861
 lttng#10 0x000055ab004869b7 in lttng_consumer_recv_cmd (ctx=0x55ab01d4ee50, sock=28, consumer_sockpoll=0x7f75cc976bb0) at consumer.c:2097
 lttng#11 0x000055ab0048abfd in consumer_thread_sessiond_poll (data=0x55ab01d4ee50) at consumer.c:3284
 lttng#12 0x00007f75cfb8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007f75cfab0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Which is also caused in part to the invalidation of the cache.
`metadata_stream_reset_cache_consumed_position` expect that the stream
at that point be locked. Which is not the case despite what the last argument
to `lttng_consumer_read_subbuffer` indicates. To alleviate that we hold
the lock during the call to `lttng_consumer_read_subbuffer`.

Known drawbacks
=========

None.

References
=========

[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=b1316da1ffbd276fc8271e7a9438e683ad352781

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ib09fdc989b1baae26d0f496afeabc3e7cb27228c
PSRCode pushed a commit that referenced this pull request Feb 24, 2022
Observed issue
==============

While running tests/regression/tools/clear/test_ust test in a loop we
eventually witness the following error:

The symptom on the Babeltrace side is a Connection reset by peer. This
is caused by a relayd abort after an assertion failure due to a
reference count being lower than 0.

Test case:

  # Test ust streaming live clear with viewer
  # Parameters: tracing_active=0, clear_twice=0, buffer_type=pid

Tools commit: https://review.lttng.org/c/lttng-tools/+/6754/1

Relay daemon assertion failure stack trace:

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffa2e850859 in __GI_abort () at abort.c:79
  #2  0x00007ffa2e850729 in __assert_fail_base (fmt=0x7ffa2e9e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56221bdb96f4 "res >= 0",
      file=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=66, function=<optimized out>) at assert.c:92
  #3  0x00007ffa2e861f36 in __GI___assert_fail (assertion=assertion@entry=0x56221bdb96f4 "res >= 0", file=file@entry=0x56221bdb97a0 "/root/virtenv/usr/include/urcu/ref.h", line=line@entry=66,
      function=function@entry=0x56221bdbd638 <__PRETTY_FUNCTION__.7554> "urcu_ref_put") at assert.c:101
  #4  0x000056221bd6a1cc in urcu_ref_put (release=<optimized out>, ref=0x7ffa24008cb0) at /root/virtenv/usr/include/urcu/ref.h:66
  lttng#5  viewer_stream_put (vstream=vstream@entry=0x7ffa24008cb0) at viewer-stream.c:279
  lttng#6  0x000056221bd5e4c5 in viewer_get_metadata (conn=conn@entry=0x7ffa0c000fc0) at live.c:2211
  lttng#7  0x000056221bd63778 in process_control (conn=0x7ffa0c000fc0, recv_hdr=0x7ffa297c5af0) at live.c:2376
  lttng#8  thread_worker (data=<optimized out>) at live.c:2541
  lttng#9  0x00007ffa2ea28609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  lttng#10 0x00007ffa2e94d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Babeltrace crash:

11-17 05:13:05.197 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1494 [live] Requesting next index for stream: cmd=GET_NEXT_INDEX, viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:245 [live] Error receiving from Relay: Connection reset by peer.
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER [email protected]:1522 [live] Error receiving get next index reply
11-17 05:13:05.215 47007 47007 D PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1069 [live] Returning from advancing live stream iterator: status=LTTNG_LIVE_ITERATOR_STATUS_ERROR, stream-name="stream-3352", viewer-stream-id=3352
11-17 05:13:05.215 47007 47007 E PLUGIN/SRC.CTF.LTTNG-LIVE [email protected]:1802 [live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07700, iter-upstream-comp-name="live", iter-upstream-comp-log-level=TRACE, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUT
PUT, iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:446 [mux] Upstream iterator's next method returned an error: status=ERROR
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:989 [mux] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x563b7dd04740, muxer-upstream-msg-iter-wrap-addr=0x563b7dd0db30
11-17 05:13:05.216 47007 47007 E PLUGIN/FLT.UTILS.MUXER [email protected]:1417 [mux] Cannot get next message: comp-addr=0x563b7dd1a760, muxer-comp-addr=0x563b7dd1b930, muxer-msg-iter-addr=0x563b7dd04740, msg-iter-addr=0x563b7dd07590, status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/MSG-ITER [email protected]:866 Component input port message iterator's "next" method failed: iter-addr=0x563b7dd07590, iter-upstream-comp-name="mux", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT,
 iter-upstream-port-name="out", status=ERROR
11-17 05:13:05.216 47007 47007 W LIB/GRAPH [email protected]:462 Component's "consume" method failed: status=ERROR, comp-addr=0x563b7dd06d20, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=1, comp-class-so-handle-addr=0x563b7dd051b0, comp-class-so-handle-path="/root/virtenv/usr/lib/babeltr
ace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
11-17 05:13:05.217 47007 47007 E CLI [email protected]:2537 Graph failed to complete successfully

Cause
=====

Both relay and live threads can put the ownership reference on the
metadata viewer stream concurrently without synchronization, thus
leading to a reference count going lower than 0.

The viewer stream ownership design initially planned for being owned by
the live thread, thus allowing the live thread to put the ownership
reference as soon as the associated relay stream is observed as closed,
and the viewer stream is considered as hung up.

However, in the specific case of the metadata viewer stream, the
responsibility of closing the metadata viewer stream is shared between
the relay and live threads, because the viewers expect to observe a
LTTNG_VIEWER_NO_NEW_METADATA message before the metadata stream
hangs up (see comment in viewer_get_metadata()). Therefore, if
viewer_get_metadata() is done before the metadata stream is closed, the
viewer will receive the LTTNG_VIEWER_NO_NEW_METADATA message, and set
the no_new_metadata_notified state to true. It's then the relay thread's
relay_close_stream() which will invoke the ownership put. However,
the live thread may concurrently try to put the viewer stream ownership
as well from a subsequent viewer_get_metadata(), thus leading to a
reference count < 0.

Solution
========

Fix this by putting the ownership reference from the live viewer thread
rather than the relay thread. This can be done by tracking the state of
no_new_metadata_notified within the live viewer thread.

Known drawbacks
===============

This will postpone reclaim of the metadata viewer stream from the
relay stream close to the following viewer_get_metadata (after a
LTTNG_VIEWER_NO_NEW_METADATA message has been replied), which I don't
think is an issue.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I9783d3a8402dddaa364a7a69c0764eaf30c16fde
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
Observed issue
==============

When consumer_stream_destroy() is called from, for example, the error
path in setup_metadata(), consumer_stream_free() can end up being called
twice on the same stream.  Since the stream->metadata_bucket is not set
to NULL after being destroyed, it leads to a use-after-free:

 ERROR: AddressSanitizer: heap-use-after-free on address 0x604000000318
 READ of size 8 at 0x604000000318 thread T7
     #0 in metadata_bucket_destroy
     #1 in consumer_stream_free
     #2 in consumer_stream_destroy
     #3 in setup_metadata
     #4 in lttng_ustconsumer_recv_cmd
     lttng#5 in lttng_consumer_recv_cmd
     lttng#6 in consumer_thread_sessiond_poll
     lttng#7 in start_thread nptl/pthread_create.c:481
     lttng#8 in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfcbde)

 0x604000000318 is located 8 bytes inside of 48-byte region [0x604000000310,0x604000000340)
 freed by thread T7 here:
     #0 in __interceptor_free
     #1 in metadata_bucket_destroy
     #2 in consumer_stream_free
     #3 in consumer_stream_destroy
     #4 in clean_channel_stream_list
     lttng#5 in consumer_del_channel
     lttng#6 in consumer_stream_destroy
     lttng#7 in setup_metadata
     lttng#8 in lttng_ustconsumer_recv_cmd
     lttng#9 in lttng_consumer_recv_cmd
     lttng#10 in consumer_thread_sessiond_poll
     lttng#11 in start_thread nptl/pthread_create.c:481

 previously allocated by thread T7 here:
     #0 in __interceptor_calloc
     #1 in zmalloc
     #2 in metadata_bucket_create
     #3 in consumer_stream_enable_metadata_bucketization
     #4 in lttng_ustconsumer_set_stream_ops
     lttng#5 in lttng_ustconsumer_on_recv_stream
     lttng#6 in lttng_consumer_on_recv_stream
     lttng#7 in create_ust_streams
     lttng#8 in ask_channel
     lttng#9 in lttng_ustconsumer_recv_cmd
     lttng#10 in lttng_consumer_recv_cmd
     lttng#11 in consumer_thread_sessiond_poll
     lttng#12 in start_thread nptl/pthread_create.c:481

 Thread T7 created by T0 here:
     #0 in __interceptor_pthread_create
     #1 in main
     #2 in __libc_start_main ../csu/libc-start.c:332

 SUMMARY: AddressSanitizer: heap-use-after-free in metadata_bucket_destroy

This can be easily reproduced by forcing a failure during the setup
of the metadata reproducible using the following change:

  diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
  index fa1c71299..97ed59632 100644

  --- a/src/common/ust-consumer/ust-consumer.c
  +++ b/src/common/ust-consumer/ust-consumer.c
  @@ -908,8 +908,7 @@ static int setup_metadata(struct lttng_consumer_local_data *ctx, uint64_t key)

           /* Send metadata stream to relayd if needed. */
           if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) {
  -                ret = consumer_send_relayd_stream(metadata->metadata_stream,
  -                                metadata->pathname);
  +                ret = -1;
                   if (ret < 0) {
                           ret = LTTCOMM_CONSUMERD_ERROR_METADATA;
                           goto error;

Cause
=====

Channels have a list of streams that are being "setup" and are not
yet monitored for consumption. During this setup phase, the streams are
owned by the channel. On destruction of the channel, any stream in that
list will thus be cleaned-up.

When destroying a consumer stream, a reference to its channel is 'put'.
This can result in the destruction of the channel.

In the situation described above, the release of the channel's reference
is done before the stream is removed from the channel's stream list.
This causes the channel's clean-up to invoke (again) the current
stream's clean-up, resulting in the double-free of the metadata bucket.

This problem is present in a number of error paths.

Solution
========

Some error paths already manually removed the consumer stream from it's
channel's stream list before invoking consumer_stream_destroy(). The
various error paths that have to deal with this possible situation are
changed to simply invoke consumer_stream_destroy().

consumer_stream_destroy() is modified to always remove the stream from
its channel's list before performing the rest of the clean-up. This
ensures that those double clean-ups can't occur.

Drawbacks
=========

None.

Reported-by: Vincent Whitchurch <[email protected]>
Tested-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ibeca9b675b86fc46be3f57826f7158de4da43df8
PSRCode added a commit that referenced this pull request Jun 10, 2022
Observed issue
==============

Scenario:

gdb lttng-sessiond
  set non-stop
  break rotation-thread.cpp:584
  ^ simulates a slow rotation thread or not scheduled thread.

lttng create test1
lttng enable-event -u -a
lttng start test1
lttng create test2
lttng enable-event -u -a
lttng start test2
lttng destroy test1
   This will hang on rotation pending checks on the CLI side.

In another shell:

lttng destroy test2
   This will hang on rotation pending checks on the CLI side.

Back to gdb
   thread 7
   continue

Results in:

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007ffff786c859 in __GI_abort () at abort.c:79
 #2  0x00007ffff786c729 in __assert_fail_base (fmt=0x7ffff7a02588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556bb148 "count == lttng_ht_get_count(ltt_sessions_ht_by_name)", file=0x5555556bae9f "session.cpp", line=395, function=<optimized out>) at assert.c:92
 #3  0x00007ffff787e006 in __GI___assert_fail (assertion=0x5555556bb148 "count == lttng_ht_get_count(ltt_sessions_ht_by_name)", file=0x5555556bae9f "session.cpp", line=395, function=0x5555556bb129 "int ltt_sessions_ht_empty()") at assert.c:101
 #4  0x0000555555586d59 in ltt_sessions_ht_empty () at session.cpp:395
 lttng#5  0x0000555555586e53 in del_session_ht (ls=0x7fffdc000c30) at session.cpp:418
 lttng#6  0x0000555555588a95 in session_release (ref=0x7fffdc001e50) at session.cpp:999
 lttng#7  0x000055555558620f in urcu_ref_put (ref=0x7fffdc001e50, release=0x5555555886eb <session_release(urcu_ref*)>) at /home/joraj/lttng/master/install/include/urcu/ref.h:68
 lttng#8  0x0000555555588c8f in session_put (session=0x7fffdc000c30) at session.cpp:1048
 lttng#9  0x00005555555bf995 in handle_job_queue (handle=0x55555575d260, state=0x7fffeeffc240, queue=0x555555758960) at rotation-thread.cpp:612
 lttng#10 0x00005555555c05da in thread_rotation (data=0x55555575d260) at rotation-thread.cpp:847
 lttng#11 0x00005555555c3b1c in launch_thread (data=0x55555575d2f0) at thread.cpp:66
 lttng#12 0x00007ffff7a46609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007ffff7969163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Other scenarios can lead to a similar backtrace when using the
`--no-wait` lttng destroy option.

Cause
=====

Since ed41e57 [1], hash table removal
for the session object for the `ltt_sessions_ht_by_name` and
`ltt_sessions_ht_by_name` are "decoupled". Removal from
`ltt_sessions_ht_by_name` is done early in `session_destroy()` while
removal from `ltt_sessions_ht_by_id` is done during `session_release` when
the last reference of a session object is released.

This can leads to `imbalances` between the size of the two hash tables
when multiple sessions are at play.

Solution
========

Rework `ltt_sessions_ht_empty()` to exit early when
`ltt_sessions_ht_by_id` is not empty. Perform a sanity check on
`ltt_sessions_ht_by_name` only when `ltt_sessions_ht_by_id` is empty.

Note
========

Ideally both hash tables' lifetime would be managed separately but it
seems easier in term of initialization to bundle them together for now
considering the limited scope of the `ltt_sessions_ht_by_name` hash
table.

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I66c459f80298f929add703ac977cccd1da6dd556
PSRCode added a commit that referenced this pull request Jun 10, 2022
Observed issue
==============

Using the notification client from
doc/examples/trigger-condition-event-matches/notification-client.cpp, an
assert is hit when the notification subsystem is under load.

 #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007f69eab58859 in __GI_abort () at abort.c:79
 #2  0x00007f69eab58729 in __assert_fail_base (fmt=0x7f69eacee588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f69eae1d5dd "len > 0", file=0x7f69eae1d5cb "unix.cpp", line=179, function=<optimized out>) at assert.c:92
 #3  0x00007f69eab6a006 in __GI___assert_fail (assertion=0x7f69eae1d5dd "len > 0", file=0x7f69eae1d5cb "unix.cpp", line=179, function=0x7f69eae1d598 "ssize_t lttcomm_recv_unix_sock(int, void*, size_t)") at assert.c:101
 #4  0x00007f69eadd5fe6 in lttcomm_recv_unix_sock (sock=3, buf=0x55da9ecd5f89, len=0) at unix.cpp:179
 lttng#5  0x00007f69ead7df3f in receive_message (channel=0x55da9ecd6ee0) at channel.cpp:64
 lttng#6  0x00007f69ead7e478 in lttng_notification_channel_get_next_notification (channel=0x55da9ecd6ee0, _notification=0x7ffdefed2570) at channel.cpp:279
 lttng#7  0x000055da9e0e742f in main (argc=2, argv=0x7ffdefed2698) at notification-client.cpp:506

 (gdb) frame
 lttng#5  0x00007f69ead7df3f in receive_message (channel=0x55da9ecd6ee0) at channel.cpp:64
 64              ret = lttcomm_recv_unix_sock(channel->socket,

 (gdb) print msg
 $2 = {type = 5 '\005', size = 0, fds = 0, payload = 0x7ffdefed24a8 ""}

The msg type 5 is
`LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED`

Cause
=====

The msg portion of a
`LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED` is indeed
zero. There is no extra payload.

Solution
========

When the msg size is zero, skip the 'payload' reception phase.

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ibabb922d0e410c9902414a5eabbe04738861d772
PSRCode added a commit that referenced this pull request Jun 10, 2022
Observed issue
==============

During high throughput event notification generation scenarios the
following deadlock happens:

 Thread 14 (Thread 0x7f74b4ff9700 (LWP 76062)):
 #0  __lll_lock_wait (futex=futex@entry=0x56408765dde8, private=0) at lowlevellock.c:52
 #1  0x00007f74c941a0a3 in __GI___pthread_mutex_lock (mutex=0x56408765dde8) at ../nptl/pthread_mutex_lock.c:80
 #2  0x000056408704b207 in run_command_wait (handle=0x56408765ddd0, cmd=0x7f74b4ff7f70) at notification-thread-commands.cpp:31
 #3  0x000056408704bcef in notification_thread_command_remove_tracer_event_source (handle=0x56408765ddd0, tracer_event_source_fd=54) at notification-thread-commands.cpp:319
 #4  0x000056408708a0c1 in delete_ust_app (app=0x7f749c000bf0) at ust-app.cpp:1059
 lttng#5  0x000056408708a511 in delete_ust_app_rcu (head=0x7f749c000ca0) at ust-app.cpp:1122
 lttng#6  0x00007f74c988b4a7 in call_rcu_thread (arg=0x7f74b8004a80) at ../src/urcu-call-rcu-impl.h:369
 lttng#7  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#8  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 13 (Thread 0x7f74b57fa700 (LWP 76047)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=48, events=0x7f74a4000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74b57f9240, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x00005640870abb65 in thread_agent_management (data=0x56408765f0b0) at agent-thread.cpp:424
 #3  0x0000564087062b1a in launch_thread (data=0x56408765f150) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 12 (Thread 0x7f74b5ffb700 (LWP 76046)):                                                                                                                                       [630/709]
 #0  0x00007f74c933a49e in epoll_wait (epfd=47, events=0x7f74a0000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74b5ffa170, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x00005640870a4095 in thread_application_notification (data=0x56408765ee40) at notify-apps.cpp:78
 #3  0x0000564087062b1a in launch_thread (data=0x56408765eed0) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 11 (Thread 0x7f74b67fc700 (LWP 76045)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=44, events=0x7f74ac000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74b67fb170, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x00005640870723db in thread_application_management (data=0x56408765ebd0) at manage-apps.cpp:93
 #3  0x0000564087062b1a in launch_thread (data=0x56408765ec60) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 10 (Thread 0x7f74b6ffd700 (LWP 76044)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=39, events=0x7f74a8000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74b6ffc130, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x0000564087070a27 in thread_application_registration (data=0x56408765e940) at register.cpp:214
 #3  0x0000564087062b1a in launch_thread (data=0x56408765e9f0) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 9 (Thread 0x7f74b77fe700 (LWP 76043)):                                                                                                                                        [654/709]
 #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
 #1  0x00005640870c8e25 in futex (uaddr=0x5640871e2800 <ust_cmd_queue>, op=0, val=-1, timeout=0x0, uaddr2=0x0, val3=0) at /home/joraj/lttng/master/install/include/urcu/futex.h:72
 #2  0x00005640870c8e6d in futex_async (uaddr=0x5640871e2800 <ust_cmd_queue>, op=0, val=-1, timeout=0x0, uaddr2=0x0, val3=0) at /home/joraj/lttng/master/install/include/urcu/futex.h:104
 #3  0x00005640870c939a in futex_nto1_wait (futex=0x5640871e2800 <ust_cmd_queue>) at futex.cpp:77
 #4  0x000056408706f2af in thread_dispatch_ust_registration (data=0x56408765e740) at dispatch.cpp:453
 lttng#5  0x0000564087062b1a in launch_thread (data=0x56408765e760) at thread.cpp:66
 lttng#6  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#7  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 8 (Thread 0x7f74b7fff700 (LWP 76042)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=33, events=0x7f74b0000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74b7ffad40, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x000056408706c424 in thread_manage_clients (data=0x56408765e4f0) at client.cpp:2528
 #3  0x0000564087062b1a in launch_thread (data=0x56408765e560) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 7 (Thread 0x7f74c4b8f700 (LWP 76041)):                                                                                                                                        [672/709]
 #0  0x00007f74c933a49e in epoll_wait (epfd=31, events=0x7f74bc000b60, maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74c4b8e240, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x000056408705f2b6 in thread_rotation (data=0x56408765e280) at rotation-thread.cpp:804
 #3  0x0000564087062b1a in launch_thread (data=0x56408765e310) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 6 (Thread 0x7f74c5390700 (LWP 76040)):
 #0  0x00007f74c925f1d2 in __GI___sigtimedwait (set=0x7f74c538f090, info=0x7f74c538f110, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:29
 #1  0x000056408706138a in thread_timer (data=0x7ffc1fcbe3f0) at timer.cpp:359
 #2  0x0000564087062b1a in launch_thread (data=0x56408765e0a0) at thread.cpp:66
 #3  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #4  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 5 (Thread 0x7f74c5b91700 (LWP 76039)):
 #0  __libc_write (nbytes=8, buf=0x7f74c5b8fc88, fd=24) at ../sysdeps/unix/sysv/linux/write.c:26
 #1  __libc_write (fd=24, buf=0x7f74c5b8fc88, nbytes=8) at ../sysdeps/unix/sysv/linux/write.c:24
 #2  0x00005640870eeb4f in lttng_write (fd=24, buf=0x7f74c5b8fc88, count=8) at readwrite.cpp:77
 #3  0x000056408704b535 in run_command_no_wait (handle=0x56408765ddd0, in_cmd=0x7f74c5b8fdf0) at notification-thread-commands.cpp:92
 #4  0x000056408704bf49 in notification_thread_client_communication_update (handle=0x56408765ddd0, id=2, transmission_status=CLIENT_TRANSMISSION_STATUS_QUEUED) at notification-thread-command
 lttng#5  0x000056408707bc62 in client_handle_transmission_status (client=0x7f74b80050d0, status=CLIENT_TRANSMISSION_STATUS_QUEUED, user_data=0x7f74b8004410) at action-executor.cpp:258
 lttng#6  0x0000564087057525 in notification_client_list_send_evaluation (client_list=0x7f74b8004df0, trigger=0x7f74b0001030, evaluation=0x7f74b815d1d0, source_object_creds=0x0, client_report=0x5
 lttng#7  0x000056408707bce9 in action_executor_notify_handler (executor=0x7f74b8004410, work_item=0x7f74b815d430, item=0x7f74b80e48e0) at action-executor.cpp:269
 lttng#8  0x000056408707dd6d in action_executor_generic_handler (executor=0x7f74b8004410, work_item=0x7f74b815d430, item=0x7f74b80e48e0) at action-executor.cpp:670
 lttng#9  0x000056408707df01 in action_work_item_execute (executor=0x7f74b8004410, work_item=0x7f74b815d430) at action-executor.cpp:689

 lttng#10 0x000056408707e525 in action_executor_thread (_data=0x7f74b8004410) at action-executor.cpp:771                                                                                   [698/709]
 lttng#11 0x0000564087062b1a in launch_thread (data=0x7f74b80044b0) at thread.cpp:66
 lttng#12 0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 4 (Thread 0x7f74c6392700 (LWP 76038)):
 #0  __lll_lock_wait (futex=futex@entry=0x56408765dde8, private=0) at lowlevellock.c:52
 #1  0x00007f74c941a0a3 in __GI___pthread_mutex_lock (mutex=0x56408765dde8) at ../nptl/pthread_mutex_lock.c:80
 #2  0x0000564087053c89 in handle_notification_thread_command (handle=0x56408765ddd0, state=0x7f74c63911b0) at notification-thread-events.cpp:3142
 #3  0x000056408704ac81 in thread_notification (data=0x56408765ddd0) at notification-thread.cpp:715
 #4  0x0000564087062b1a in launch_thread (data=0x56408765dec0) at thread.cpp:66
 lttng#5  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #run_command_no_wait6  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 3 (Thread 0x7f74c6b93700 (LWP 76037)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=21, events=0x7f74c0000b60, maxevents=2, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7f74c6b92170, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x000056408706400a in thread_manage_health (data=0x56408765db50) at health.cpp:140
 #3  0x0000564087062b1a in launch_thread (data=0x56408765dbf0) at thread.cpp:66
 #4  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#5  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 2 (Thread 0x7f74c7394700 (LWP 76036)):
 #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
 #1  0x00007f74c987d238 in futex (uaddr=0x564087659b10, op=0, val=-1, timeout=0x0, uaddr2=0x0, val3=0) at ../include/urcu/futex.h:72
 #2  futex_async (uaddr=0x564087659b10, op=0, val=-1, timeout=0x0, uaddr2=0x0, val3=0) at ../include/urcu/futex.h:104
 #3  futex_wait (futex=0x564087659b10) at workqueue.c:136
 #4  0x00007f74c987ced2 in workqueue_thread (arg=0x564087659ad0) at workqueue.c:237
 lttng#5  0x00007f74c9417609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#6  0x00007f74c933a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

 Thread 1 (Thread 0x7f74c73cd300 (LWP 76034)):
 #0  0x00007f74c933a49e in epoll_wait (epfd=50, events=0x564087666880, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
 #1  0x00005640870eafa6 in compat_epoll_wait (events=0x7ffc1fcbe280, timeout=-1, interruptible=false) at compat/poll.cpp:280
 #2  0x0000564087062244 in sessiond_wait_for_quit_pipe (timeout_ms=-1) at thread-utils.cpp:83
 #3  0x00005640870127dc in main (argc=1, argv=0x7ffc1fcbe668) at main.cpp:1921

Cause
=====

The event_pipe used to notify the notification poll loop is full and the
lttng_write call blocks with the locks for both the client and the
cmd_queue held.

Solution
========

Go back to using eventfd but without the use of EFD_SEMAPHORE (linux
2.6.30) to continue supporting kernel between 2.6.27 and 2.6.29.

The EFD_SEMAPHORE is emulated with a read, decrement, write as explained
by the initial committer of EFD_SEMAPHORE [1].

Known drawbacks
=========

This does not solve the actual block+lock problem but simply push it
back further. The lttng_write on the eventfd can block when reaching
UINT64_MAX. This would represent, at 1 command queued per ns (which is
ridiculous), ~584 years of queueing without a dequeue operation.

Reference
=======
[1] https://lwn.net/Articles/318151/

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ie749c4169708f57463fe3cfab2366f1015bae4e0
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
Issue observed
==============

Address sanitizer reports the following invalid accesses while running
the test_mi test.

❯ ASAN_OPTIONS=detect_odr_violation=0 lttng-sessiond
=================================================================
==289173==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60400000e280 at pc 0x55cbbe35e2e0 bp 0x7f01672f1550 sp 0x7f01672f1540
WRITE of size 4 at 0x60400000e280 thread T13
    #0 0x55cbbe35e2df in mark_thread_as_ready /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/manage-consumer.cpp:32
    #1 0x55cbbe360160 in thread_consumer_management /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/manage-consumer.cpp:267
    #2 0x55cbbe336ac4 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:66
    #3 0x7f01729c15c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)
    #4 0x7f0172a46583 in __clone (/usr/lib/libc.so.6+0x112583)

0x60400000e280 is located 8 bytes to the right of 40-byte region [0x60400000e250,0x60400000e278)
allocated by thread T7 here:
    #0 0x7f01733b1fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55cbbe33adf3 in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x55cbbe33ae03 in thread_notifiers* zmalloc<thread_notifiers>() ../../../src/common/macros.hpp:89
    #3 0x55cbbe3617f9 in launch_consumer_management_thread(consumer_data*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/manage-consumer.cpp:440
    #4 0x55cbbe33cf49 in spawn_consumer_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:188
    lttng#5 0x55cbbe33f7cf in start_consumerd /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:394
    lttng#6 0x55cbbe345713 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:1277
    lttng#7 0x55cbbe34d74b in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2622
    lttng#8 0x55cbbe336ac4 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:66
    lttng#9 0x7f01729c15c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Thread T13 created by T7 here:
    #0 0x7f0173353eb7 in __interceptor_pthread_create /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x55cbbe336f9e in lttng_thread_create(char const*, void* (*)(void*), bool (*)(void*), void (*)(void*), void*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:106
    #2 0x55cbbe3618cc in launch_consumer_management_thread(consumer_data*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/manage-consumer.cpp:453
    #3 0x55cbbe33cf49 in spawn_consumer_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:188
    #4 0x55cbbe33f7cf in start_consumerd /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:394
    lttng#5 0x55cbbe345713 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:1277
    lttng#6 0x55cbbe34d74b in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2622
    lttng#7 0x55cbbe336ac4 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:66
    lttng#8 0x7f01729c15c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Thread T7 created by T0 here:
    #0 0x7f0173353eb7 in __interceptor_pthread_create /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x55cbbe336f9e in lttng_thread_create(char const*, void* (*)(void*), bool (*)(void*), void (*)(void*), void*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:106
    #2 0x55cbbe34eebf in launch_client_thread() /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2756
    #3 0x55cbbe27f31a in main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:1838
    #4 0x7f017296130f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/manage-consumer.cpp:32 in mark_thread_as_ready
Shadow bytes around the buggy address:
  0x0c087fff9c00: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9c10: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9c20: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9c30: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9c40: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
=>0x0c087fff9c50:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff9c60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff9c70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff9c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff9c90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff9ca0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==289173==ABORTING

Cause
=====

The start functions of the various worker threads of the session daemon
are implemented in separate translation units (TU). To make use of the
lttng_thread API, they all define different control structures to
control their shutdown.

Those structures are all named 'thread_notifiers' and are all allocated
using zmalloc<>. The various instances of zmalloc<thread_notifiers> all
end up having the same mangled name (e.g.
_Z7zmallocI16thread_notifiersEPT_v).

At link time, only one instance of zmalloc<thread_notifiers> is kept.
Since those structures all have different layout/sizes, this is
problematic. However, it is an acceptable behaviour according to the ODR
[1].

I first considered making the various memory allocation functions in
macros.hpp 'static' which results in each TU holding the appropriate
specialization of the various functions. While this works, it doesn't
make us ODR-compliant. To make a long story short, a program defining
multiple types sharing the same name, in the same namespace, is
ill-formed.

Another concern is that marking all templated free-functions as static
will eventually result in code bloat.

Solution
========

All structures defined in TUs (but not in a header) are placed in
unnamed namespaces (also called anonymous namespaces) [2].

This results in separate copies of the templated functions being
generated when specialized using a structure in an anonymous
namespace (e.g. _Z7zmallocIN12_GLOBAL__N_116thread_notifiersEEPT_v).

We could have renamed the various `thread_notifiers` structures to give
them different names. However, I found those are not the only structures
sharing a name in different TUs. For instance, the same problem applies
to `struct lttng_index` (index in a stream, index in a map).

I propose we systematically namespace structures defined in TUs in the
future.

This will also save us trouble if those POD structures eventually become
non-POD: we would experience the same "clashes" if those structures had
constructors, for example.

References
==========

[1] https://en.cppreference.com/w/cpp/language/definition
[2] https://en.cppreference.com/w/cpp/language/namespace

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I867e5a287ad8cf3ada617335bc1a80b800bf0833
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
LeakSanitizer reports the following leak:

==974957==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fdb86fcd1b2 in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x7fdb86d7c296 in lttng_dynamic_buffer_set_capacity(lttng_dynamic_buffer*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:159
    #2 0x7fdb86d7c060 in lttng_dynamic_buffer_set_size(lttng_dynamic_buffer*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:112
    #3 0x7fdb86d2589a in recv_payload_sessiond /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.cpp:230
    #4 0x7fdb86d26fa5 in lttng_ctl_ask_sessiond_payload(lttng_payload_view*, lttng_payload*) /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.cpp:662
    lttng#5 0x7fdb86d2cd8d in lttng_list_tracepoint_fields /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/lttng-ctl.cpp:1767
    lttng#6 0x56481623cb4c in list_ust_event_fields commands/list.cpp:850
    lttng#7 0x5648162448d9 in cmd_list(int, char const**) commands/list.cpp:2394
    lttng#8 0x56481628fb3e in handle_command /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:238
    lttng#9 0x564816290601 in parse_args /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:427
    lttng#10 0x564816290908 in main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:476
    lttng#11 0x7fdb8661730f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

SUMMARY: AddressSanitizer: 32 byte(s) leaked in 1 allocation(s).

The session daemon's reply is indeed never released in
lttng_list_tracepoint_fields.

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Idd244b52a69f3b74e5c131c1c36c6ee6d76f4285
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1175545==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8696 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55707ddc6004 in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x55707ddceb17 in ltt_ust_session* zmalloc<ltt_ust_session>() ../../../src/common/macros.hpp:89
    #3 0x55707ddc81e7 in trace_ust_create_session(unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/trace-ust.cpp:274
    #4 0x55707ddc2bea in test_create_one_ust_session /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:63
    lttng#5 0x55707ddc4941 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:283
    lttng#6 0x7efed04f930f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Indirect leak of 24672 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55707dee4ec1 in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x55707def774e in consumer_output* zmalloc<consumer_output>() ../../../src/common/macros.hpp:89
    #3 0x55707dee90df in consumer_create_output(consumer_dst_type) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:523
    #4 0x55707ddc8821 in trace_ust_create_session(unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/trace-ust.cpp:321
    lttng#5 0x55707ddc2bea in test_create_one_ust_session /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:63
    lttng#6 0x55707ddc4941 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:283
    lttng#7 0x7efed04f930f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Indirect leak of 1024 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7efed0bf985f in alloc_split_items_count /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash.c:688
    #2 0x7efed0bf985f in _cds_lfht_new /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash.c:1642

Indirect leak of 656 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7efed0bfac68 in __default_alloc_cds_lfht ../src/rculfhash-internal.h:172
    #2 0x7efed0bfac68 in alloc_cds_lfht /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash-mm-order.c:81

Indirect leak of 48 byte(s) in 2 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7efed0bfabd4 in cds_lfht_alloc_bucket_table /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash-mm-order.c:35
    #2 0x7efed0bfabd4 in cds_lfht_alloc_bucket_table /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash-mm-order.c:28

Indirect leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55707de3a9af in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x55707de3a9bf in lttng_ht* zmalloc<lttng_ht>() ../../src/common/macros.hpp:89
    #3 0x55707de38461 in lttng_ht_new(unsigned long, lttng_ht_type) hashtable/hashtable.cpp:113
    #4 0x55707dee9340 in consumer_create_output(consumer_dst_type) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:535
    lttng#5 0x55707ddc8821 in trace_ust_create_session(unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/trace-ust.cpp:321
    lttng#6 0x55707ddc2bea in test_create_one_ust_session /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:63
    lttng#7 0x55707ddc4941 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_ust_data.cpp:283
    lttng#8 0x7efed04f930f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7efed0f39fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7efed0bfac15 in cds_lfht_alloc_bucket_table /home/jgalar/EfficiOS/src/userspace-rcu/src/rculfhash-mm-order.c:31

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ib2ad82a197f2a4ccb86ae5799c1d93ff059888e3
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1190137==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f40a9d4c1b2 in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x55ab716e1def in lttng_dynamic_buffer_set_capacity(lttng_dynamic_buffer*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:159
    #2 0x55ab716e1956 in lttng_dynamic_buffer_append(lttng_dynamic_buffer*, void const*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:52
    #3 0x55ab716ca64e in lttng_log_level_rule_serialize(lttng_log_level_rule const*, lttng_payload*) /home/jgalar/EfficiOS/src/lttng-tools/src/common/log-level-rule.cpp:177
    #4 0x55ab716c760f in test_log_level_rule_serialize_deserialize /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:60
    lttng#5 0x55ab716c8457 in test_log_level_rule_at_least_as_severe_as /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:177
    lttng#6 0x55ab716c84d3 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:185
    lttng#7 0x7f40a938830f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f40a9d4c1b2 in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x55ab716e1def in lttng_dynamic_buffer_set_capacity(lttng_dynamic_buffer*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:159
    #2 0x55ab716e1956 in lttng_dynamic_buffer_append(lttng_dynamic_buffer*, void const*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:52
    #3 0x55ab716ca64e in lttng_log_level_rule_serialize(lttng_log_level_rule const*, lttng_payload*) /home/jgalar/EfficiOS/src/lttng-tools/src/common/log-level-rule.cpp:177
    #4 0x55ab716c760f in test_log_level_rule_serialize_deserialize /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:60
    lttng#5 0x55ab716c8135 in test_log_level_rule_exactly /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:154
    lttng#6 0x55ab716c84ce in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_log_level_rule.cpp:184
    lttng#7 0x7f40a938830f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I2d1eafabbd5c101c188bad8a2137615b29c0ef68
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1198508==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f8b62634fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557871869adb in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x55787186c8a0 in zmalloc<(anonymous namespace)::lttng_rate_policy_once_after_n> ../../src/common/macros.hpp:89
    #3 0x55787186c173 in lttng_rate_policy_once_after_n_create actions/rate-policy.cpp:707
    #4 0x55787186a368 in lttng_rate_policy_once_after_n_create_from_payload actions/rate-policy.cpp:183
    lttng#5 0x55787186ad02 in lttng_rate_policy_create_from_payload(lttng_payload_view*, lttng_rate_policy**) actions/rate-policy.cpp:287
    lttng#6 0x557871865b5b in test_rate_policy_once_after_n /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_rate_policy.cpp:231
    lttng#7 0x557871865dc9 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_rate_policy.cpp:250
    lttng#8 0x7f8b61c7130f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f8b62634fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557871869adb in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x55787186c890 in zmalloc<(anonymous namespace)::lttng_rate_policy_every_n> ../../src/common/macros.hpp:89
    #3 0x55787186b6cd in lttng_rate_policy_every_n_create actions/rate-policy.cpp:492
    #4 0x55787186a699 in lttng_rate_policy_every_n_create_from_payload actions/rate-policy.cpp:220
    lttng#5 0x55787186ad02 in lttng_rate_policy_create_from_payload(lttng_payload_view*, lttng_rate_policy**) actions/rate-policy.cpp:287
    lttng#6 0x557871864cae in test_rate_policy_every_n /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_rate_policy.cpp:122
    lttng#7 0x557871865dc4 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/unit/test_rate_policy.cpp:249
    lttng#8 0x7f8b61c7130f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

SUMMARY: AddressSanitizer: 112 byte(s) leaked in 2 allocation(s).

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I3a9b4d99e93f355ddb8623a289f8397907486ab0
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1429021==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7fe305f031b2 in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x559f1b022238 in lttng_dynamic_buffer_set_capacity(lttng_dynamic_buffer*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:159
    #2 0x559f1b021d9f in lttng_dynamic_buffer_append(lttng_dynamic_buffer*, void const*, unsigned long) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-buffer.cpp:52
    #3 0x559f1b02144a in lttng_dynamic_array_add_element(lttng_dynamic_array*, void const*) /home/jgalar/EfficiOS/src/lttng-tools/src/common/dynamic-array.cpp:58
    #4 0x559f1b07d07b in lttng_action_path_copy(lttng_action_path const*, lttng_action_path*) actions/path.cpp:116
    lttng#5 0x559f1b02383f in lttng_error_query_action_create /home/jgalar/EfficiOS/src/lttng-tools/src/common/error-query.cpp:232
    lttng#6 0x559f1b02760e in lttng_error_query_create_from_payload(lttng_payload_view*, lttng_error_query**) /home/jgalar/EfficiOS/src/lttng-tools/src/common/error-query.cpp:911
    lttng#7 0x559f1af5c361 in receive_lttng_error_query /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:740
    lttng#8 0x559f1af64eba in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2336
    lttng#9 0x559f1af67378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#10 0x559f1af50642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#11 0x7fe3055225c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I7a6f7d2a9746124581eebf30877466f16db67a6b
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1480456==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 112 byte(s) in 1 object(s) allocated from:
    #0 0x7fdb9260cfb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7fdb9242348d in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x7fdb924295a9 in lttng_trigger* zmalloc<lttng_trigger>() ../../src/common/macros.hpp:89
    #3 0x7fdb92423dbe in lttng_trigger_create /home/jgalar/EfficiOS/src/lttng-tools/src/common/trigger.cpp:58
    #4 0x56304832331f in register_trigger /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/trigger/utils/register-some-triggers.cpp:24
    lttng#5 0x5630483233f1 in register_trigger_action_list_notify /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/trigger/utils/register-some-triggers.cpp:46
    lttng#6 0x5630483239a0 in test_session_rotation_conditions /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/trigger/utils/register-some-triggers.cpp:246
    lttng#7 0x563048323d4d in main /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/trigger/utils/register-some-triggers.cpp:309
    lttng#8 0x7fdb91c6630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: Ie163989a70f65f9c2c4e93c36cc9fc6ba6bdeeb5
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1501334==ERROR: LeakSanitizer: detected memory leaks

Indirect leak of 16386 byte(s) in 1 object(s) allocated from:
    #0 0x7f95efc3cdd9 in __interceptor_malloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55acb0681ed3 in lttng_filter_yyalloc(unsigned long, void*) filter/filter-lexer.cpp:2511
    #2 0x55acb067f2f2 in lttng_filter_yy_create_buffer(_IO_FILE*, int, void*) filter/filter-lexer.cpp:1895
    #3 0x55acb067ea44 in yyrestart(_IO_FILE*, void*) filter/filter-lexer.cpp:1824
    #4 0x55acb0649a43 in filter_parser_ctx_alloc(_IO_FILE*) filter/filter-parser.ypp:271
    lttng#5 0x55acb0649e7f in filter_parser_ctx_create_from_filter_expression(char const*, filter_parser_ctx**) filter/filter-parser.ypp:332
    lttng#6 0x55acb058ee89 in parse_event_rule commands/add_trigger.cpp:783
    lttng#7 0x55acb05920c0 in handle_condition_event commands/add_trigger.cpp:1361
    lttng#8 0x55acb0592739 in parse_condition commands/add_trigger.cpp:1457
    lttng#9 0x55acb0596b56 in cmd_add_trigger(int, char const**) commands/add_trigger.cpp:2304
    lttng#10 0x55acb05a5b80 in handle_command /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:238
    lttng#11 0x55acb05a6643 in parse_args /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:427
    lttng#12 0x55acb05a694a in main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:476
    lttng#13 0x7f95ef28730f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I6fa21e7d066e0cf48afc3f91ceefbfd19c6b86fd
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1769573==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7fef37a29fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7fef37792f2f in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x7fef3779573a in lttng_rotation_schedules* zmalloc<lttng_rotation_schedules>() ../../../src/common/macros.hpp:89
    #3 0x7fef377947cc in lttng_rotation_schedules_create /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/rotate.cpp:353
    #4 0x7fef37794aa0 in get_schedules /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/rotate.cpp:392
    lttng#5 0x7fef377956dc in lttng_session_list_rotation_schedules /home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/rotate.cpp:665
    lttng#6 0x5646131713f2 in test_add_list_remove_schedule /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/rotation/schedule_api.c:252
    lttng#7 0x56461317157b in test_add_list_remove_size_schedule /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/rotation/schedule_api.c:270
    lttng#8 0x564613171680 in main /home/jgalar/EfficiOS/src/lttng-tools/tests/regression/tools/rotation/schedule_api.c:307
    lttng#9 0x7fef373ae30f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I9b7eb537d158791db76f9a7676ffeb5d4a1f2203
PSRCode pushed a commit that referenced this pull request Jun 10, 2022
==1801304==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 224 byte(s) in 2 object(s) allocated from:
    #0 0x7fe0f4e73fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559fbeb64175 in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x559fbeb6a291 in lttng_trigger* zmalloc<lttng_trigger>() ../../src/common/macros.hpp:89
    #3 0x559fbeb64aa6 in lttng_trigger_create /home/jgalar/EfficiOS/src/lttng-tools/src/common/trigger.cpp:58
    #4 0x559fbe9dc417 in subscribe_session_consumed_size_rotation(ltt_session*, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/rotate.cpp:87
    lttng#5 0x559fbe995d6f in cmd_rotation_set_schedule(ltt_session*, bool, lttng_rotation_schedule_type, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:5993
    lttng#6 0x559fbe9fe559 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2246
    lttng#7 0x559fbea01378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#8 0x559fbe9ea642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#9 0x7fe0f44935c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Indirect leak of 208 byte(s) in 2 object(s) allocated from:
    #0 0x7fe0f4e73fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559fbeb16e21 in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x559fbeb16e31 in lttng_action_notify* zmalloc<lttng_action_notify>() ../../src/common/macros.hpp:89
    #3 0x559fbeb168a0 in lttng_action_notify_create actions/notify.cpp:135
    #4 0x559fbe9dc34b in subscribe_session_consumed_size_rotation(ltt_session*, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/rotate.cpp:80
    lttng#5 0x559fbe995d6f in cmd_rotation_set_schedule(ltt_session*, bool, lttng_rotation_schedule_type, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:5993
    lttng#6 0x559fbe9fe559 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2246
    lttng#7 0x559fbea01378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#8 0x559fbe9ea642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#9 0x7fe0f44935c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Indirect leak of 160 byte(s) in 2 object(s) allocated from:
    #0 0x7fe0f4e73fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559fbeb3d7a1 in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x559fbeb3fa35 in lttng_condition_session_consumed_size* zmalloc<lttng_condition_session_consumed_size>() ../../src/common/macros.hpp:89
    #3 0x559fbeb3e6fd in lttng_condition_session_consumed_size_create conditions/session-consumed-size.cpp:206
    #4 0x559fbe9dc0f1 in subscribe_session_consumed_size_rotation(ltt_session*, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/rotate.cpp:54
    lttng#5 0x559fbe995d6f in cmd_rotation_set_schedule(ltt_session*, bool, lttng_rotation_schedule_type, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:5993
    lttng#6 0x559fbe9fe559 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2246
    lttng#7 0x559fbea01378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#8 0x559fbe9ea642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#9 0x7fe0f44935c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Indirect leak of 112 byte(s) in 2 object(s) allocated from:
    #0 0x7fe0f4e73fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559fbeb242ad in zmalloc_internal ../../src/common/macros.hpp:60
    #2 0x559fbeb27062 in zmalloc<(anonymous namespace)::lttng_rate_policy_every_n> ../../src/common/macros.hpp:89
    #3 0x559fbeb25e9f in lttng_rate_policy_every_n_create actions/rate-policy.cpp:492
    #4 0x559fbeb168b9 in lttng_action_notify_create actions/notify.cpp:141
    lttng#5 0x559fbe9dc34b in subscribe_session_consumed_size_rotation(ltt_session*, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/rotate.cpp:80
    lttng#6 0x559fbe995d6f in cmd_rotation_set_schedule(ltt_session*, bool, lttng_rotation_schedule_type, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:5993
    lttng#7 0x559fbe9fe559 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2246
    lttng#8 0x559fbea01378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#9 0x559fbe9ea642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#10 0x7fe0f44935c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

Indirect leak of 34 byte(s) in 2 object(s) allocated from:
    #0 0x7fe0f4e19319 in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:454
    #1 0x559fbeb3f603 in lttng_condition_session_consumed_size_set_session_name conditions/session-consumed-size.cpp:442
    #2 0x559fbe9dc2c4 in subscribe_session_consumed_size_rotation(ltt_session*, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/rotate.cpp:71
    #3 0x559fbe995d6f in cmd_rotation_set_schedule(ltt_session*, bool, lttng_rotation_schedule_type, unsigned long, notification_thread_handle*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:5993
    #4 0x559fbe9fe559 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2246
    lttng#5 0x559fbea01378 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2624
    lttng#6 0x559fbe9ea642 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:68
    lttng#7 0x7fe0f44935c1 in start_thread (/usr/lib/libc.so.6+0x8d5c1)

The rotation trigger of a session (used for size-based rotations) is
never cleaned-up. It is now cleaned up every time its condition is
hit and whenever the session is destroyed.

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I5a89341535f87b7851b548ded9838c18bd1ccb95
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants