Skip to content

Commit

Permalink
Fix: ust: segfault on lttng start on filter bytecode copy
Browse files Browse the repository at this point in the history
Observed issue
==============

A segmentation fault is observed for multiple UST timeout scenarios.

Backtrace:

 #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
 #1  0x0000557fe0395df9 in copy_filter_bytecode (orig_f=0x7f9c5802b790) at ust-app.c:1196
 #2  0x0000557fe0397702 in shadow_copy_event (ua_event=0x7f9c58025ff0, uevent=0x7f9c58033560) at ust-app.c:1824
 #3  0x0000557fe039ac46 in create_ust_app_event (ua_sess=0x7f9c5802ec20, ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, app=0x7f9c5c001da0) at ust-app.c:3192
 #4  0x0000557fe03a054d in ust_app_channel_synchronize_event (ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, ua_sess=0x7f9c5802ec20, app=0x7f9c5c001da0) at ust-app.c:5096
 lttng#5  0x0000557fe03a0772 in ust_app_synchronize (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5173
 lttng#6  0x0000557fe03a0a70 in ust_app_global_update (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5255
 lttng#7  0x0000557fe03a00e0 in ust_app_start_trace_all (usess=0x7f9c580074a0) at ust-app.c:4987
 lttng#8  0x0000557fe0355c6a in cmd_start_trace (session=0x7f9c5800a190) at cmd.c:2668
 lttng#9  0x0000557fe0382e70 in process_client_msg (cmd_ctx=0x7f9c58003d70, sock=0x7f9c74bf44e0, sock_error=0x7f9c74bf44e4) at client.c:1527
 lttng#10 0x0000557fe03848a2 in thread_manage_clients (data=0x557fe06d9440) at client.c:2200
 lttng#11 0x0000557fe037d1cb in launch_thread (data=0x557fe06d94b0) at thread.c:75
 lttng#12 0x00007f9c796af609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 lttng#13 0x00007f9c795b6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The scenario:

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_set_filter
  lttng create my_session
  lttng enable-event -u tp:tp_test
  lttng start
  lttng enable-event -u __dummy --filter 'my_field == "user34"'
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  # enable-event will return an error. This a bug in itself, still let's
  # continue with the current bug.
  lttng stop
  # Start a new app that will register.
  ./app &
  sleep 1
  lttng start
  # lttng-sessiond should segfault.

Cause
=====

During the "lttng enable-event" command, the timeout error bubbles up
all the way to event_ust_enable_tracepoint and is different from
LTTNG_UST_ERR_EXIST. `trace_ust_destroy_event` is called and frees the
`uevent` object. Note that contrary to the comment `uevent` is added to
the channel event hash table at this point.

On the next `lttng start` command, the event node is still present in
the hash table and is iterated on. lttng-sessiond segfault on the first
data access of the previously freed memory.

The problem was introduced by commit
88e3c2f [1]. Which essentially move the
callsite of `add_unique_ust_event` before `ust_app_*_event_glb` calls.

Solution
========

Go to `end` label to prevent freeing of the uevent object.

Note that app synchronization should not force an error at the channel
level, since a single app can fail but the whole channel should not.

The `error` label is now obsolete.

Known drawbacks
=========

None.

References
==========

[1] lttng@88e3c2f

Signed-off-by: Jonathan Rajotte <[email protected]>
Change-Id: Ifaf3f4c71bb2da869c7b441aaa4b367f8f7cbdd6
Signed-off-by: Jérémie Galarneau <[email protected]>
  • Loading branch information
PSRCode authored and jgalar committed Oct 18, 2021
1 parent f3dcd8f commit 396a442
Showing 1 changed file with 2 additions and 21 deletions.
23 changes: 2 additions & 21 deletions src/bin/lttng-sessiond/event.c
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess,
filter = NULL;
exclusion = NULL;
if (ret != LTTNG_OK) {
goto error;
goto end;
}

/* Valid to set it after the goto error since uevent is still NULL */
Expand Down Expand Up @@ -208,11 +208,10 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess,
if (ret < 0) {
if (ret == -LTTNG_UST_ERR_EXIST) {
ret = LTTNG_ERR_UST_EVENT_EXIST;
goto end;
} else {
ret = LTTNG_ERR_UST_ENABLE_FAIL;
goto error;
}
goto end;
}

DBG("Event UST %s %s in channel %s", uevent->attr.name,
Expand All @@ -226,24 +225,6 @@ int event_ust_enable_tracepoint(struct ltt_ust_session *usess,
free(filter);
free(exclusion);
return ret;

error:
/*
* Only destroy event on creation time (not enabling time) because if the
* event is found in the channel (to_create == 0), it means that at some
* point the enable_event worked and it's thus valid to keep it alive.
* Destroying it also implies that we also destroy it's shadow copy to sync
* everyone up.
*/
if (to_create) {
/* In this code path, the uevent was not added to the hash table */
trace_ust_destroy_event(uevent);
}
rcu_read_unlock();
free(filter_expression);
free(filter);
free(exclusion);
return ret;
}

/*
Expand Down

0 comments on commit 396a442

Please sign in to comment.