From 396a4421101290589e37e5bc606186efe3757073 Mon Sep 17 00:00:00 2001 From: Jonathan Rajotte Date: Mon, 12 Jul 2021 16:44:38 -0400 Subject: [PATCH] Fix: ust: segfault on lttng start on filter bytecode copy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Observed issue ============== A segmentation fault is observed for multiple UST timeout scenarios. Backtrace: #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384 #1 0x0000557fe0395df9 in copy_filter_bytecode (orig_f=0x7f9c5802b790) at ust-app.c:1196 #2 0x0000557fe0397702 in shadow_copy_event (ua_event=0x7f9c58025ff0, uevent=0x7f9c58033560) at ust-app.c:1824 #3 0x0000557fe039ac46 in create_ust_app_event (ua_sess=0x7f9c5802ec20, ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, app=0x7f9c5c001da0) at ust-app.c:3192 #4 0x0000557fe03a054d in ust_app_channel_synchronize_event (ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, ua_sess=0x7f9c5802ec20, app=0x7f9c5c001da0) at ust-app.c:5096 #5 0x0000557fe03a0772 in ust_app_synchronize (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5173 #6 0x0000557fe03a0a70 in ust_app_global_update (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5255 #7 0x0000557fe03a00e0 in ust_app_start_trace_all (usess=0x7f9c580074a0) at ust-app.c:4987 #8 0x0000557fe0355c6a in cmd_start_trace (session=0x7f9c5800a190) at cmd.c:2668 #9 0x0000557fe0382e70 in process_client_msg (cmd_ctx=0x7f9c58003d70, sock=0x7f9c74bf44e0, sock_error=0x7f9c74bf44e4) at client.c:1527 #10 0x0000557fe03848a2 in thread_manage_clients (data=0x557fe06d9440) at client.c:2200 #11 0x0000557fe037d1cb in launch_thread (data=0x557fe06d94b0) at thread.c:75 #12 0x00007f9c796af609 in start_thread (arg=) at pthread_create.c:477 #13 0x00007f9c795b6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The scenario: # Start an instrumented app ./app gdb lttng-sessiond # put a breakpoint on ustctl_set_filter lttng create my_session lttng enable-event -u tp:tp_test lttng start lttng enable-event -u __dummy --filter 'my_field == "user34"' # The tracepoint should hit. Do not continue. kill -s SIGSTOP $(pgrep app) # Continue lttng-sessiond. # enable-event will return an error. This a bug in itself, still let's # continue with the current bug. lttng stop # Start a new app that will register. ./app & sleep 1 lttng start # lttng-sessiond should segfault. Cause ===== During the "lttng enable-event" command, the timeout error bubbles up all the way to event_ust_enable_tracepoint and is different from LTTNG_UST_ERR_EXIST. `trace_ust_destroy_event` is called and frees the `uevent` object. Note that contrary to the comment `uevent` is added to the channel event hash table at this point. On the next `lttng start` command, the event node is still present in the hash table and is iterated on. lttng-sessiond segfault on the first data access of the previously freed memory. The problem was introduced by commit 88e3c2f5610b9ac89b0923d448fee34140fc46fb [1]. Which essentially move the callsite of `add_unique_ust_event` before `ust_app_*_event_glb` calls. Solution ======== Go to `end` label to prevent freeing of the uevent object. Note that app synchronization should not force an error at the channel level, since a single app can fail but the whole channel should not. The `error` label is now obsolete. Known drawbacks ========= None. References ========== [1] https://github.com/lttng/lttng-tools/commit/88e3c2f5610b9ac89b0923d448fee34140fc46fb Signed-off-by: Jonathan Rajotte Change-Id: Ifaf3f4c71bb2da869c7b441aaa4b367f8f7cbdd6 Signed-off-by: Jérémie Galarneau --- src/bin/lttng-sessiond/event.c | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/src/bin/lttng-sessiond/event.c b/src/bin/lttng-sessiond/event.c index 189236beb1..ffeaa72f4a 100644 --- a/src/bin/lttng-sessiond/event.c +++ b/src/bin/lttng-sessiond/event.c @@ -173,7 +173,7 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess, filter = NULL; exclusion = NULL; if (ret != LTTNG_OK) { - goto error; + goto end; } /* Valid to set it after the goto error since uevent is still NULL */ @@ -208,11 +208,10 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess, if (ret < 0) { if (ret == -LTTNG_UST_ERR_EXIST) { ret = LTTNG_ERR_UST_EVENT_EXIST; - goto end; } else { ret = LTTNG_ERR_UST_ENABLE_FAIL; - goto error; } + goto end; } DBG("Event UST %s %s in channel %s", uevent->attr.name, @@ -226,24 +225,6 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess, free(filter); free(exclusion); return ret; - -error: - /* - * Only destroy event on creation time (not enabling time) because if the - * event is found in the channel (to_create == 0), it means that at some - * point the enable_event worked and it's thus valid to keep it alive. - * Destroying it also implies that we also destroy it's shadow copy to sync - * everyone up. - */ - if (to_create) { - /* In this code path, the uevent was not added to the hash table */ - trace_ust_destroy_event(uevent); - } - rcu_read_unlock(); - free(filter_expression); - free(filter); - free(exclusion); - return ret; } /*