-
Notifications
You must be signed in to change notification settings - Fork 9
Separate trace collection from trace decoding. #99
Conversation
I've tried to comment on things that I think are either new-ish or are easily changed. Nothing major. |
Just one comment left! |
Please squash. |
splat |
bors r+ |
99: Separate trace collection from trace decoding. r=ltratt a=vext01 Co-authored-by: Edd Barrett <[email protected]>
99: Separate trace collection from trace decoding. r=ltratt a=vext01 Co-authored-by: Edd Barrett <[email protected]>
Build failed: |
Just had a quick look at this. We are maxing out the trace buffer, which is currently set (by default) to 1024 pages (a page is 4096 bytes by default on a typical linux system, so that's about 4MiB, by my calculations). The test in question does in indeed test a "long" trace: it calls Shall we try bumping the default trace buffer size? |
Yes, 4MiB is very small. Just make it 64MiB (at least for testing purposes) and be done with it IMHO. |
Something like that? (untested) |
Actually, are we sure pages are a fixed size? I don't think that's true anymore, at least on Linux. We probably have to read the page size value from somewhere to calculate this. |
Page size is configurable. That's what was implied by "typical Linux system" in the commit message. We can use the (surprisingly well-named) |
Yes we should. There's no point not doing so as we'll simply forget and confuse our future selves! |
a0ba640 takes into account variable page sizes. 978aa7f fixes the intermittent test failures we spoke about in person (the commit message addresses some of the questions we had). In doing the latter I've found a nasty bug, which I will raise an issue to explain. We should deal with this in a separate PR. |
Also pushed af4f168 as a result of in-person conversation. |
src/collect/mod.rs
Outdated
@@ -25,6 +25,10 @@ const PERF_DFLT_INITIAL_TRACE_BUFSIZE: size_t = 1024 * 1024; // 1MiB | |||
/// The interface offered by all trace collectors. | |||
pub trait TraceCollector: Send + Sync { | |||
/// Return a `ThreadTraceCollector` for tracing the current thread. | |||
/// | |||
/// Note that an instance of `ThreadTraceCollector` is not thread local, and an attempt to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this really difficult to parse. Can we simplify it to "You must not start two TraceCollector
s in the same thread. Doing so is is undefined behaviour". Also I think this function (or the trait?) should be unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You must not start two TraceCollectors in the same thread
But that's not strictly true. You can start two different ThreadTraceCollectors
on the same thread as long as their "tracing sessions" don't overlap (in terms of time).
unsafe
Yep, forgot about that one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking yesterday: why does a user facing ThreadTraceCollector
even exist? If we had TraceCollector::start_thread_collector()
, then this issue (I think) disappears? There can we a hidden thread-local ThreadTraceCollector
and the problem described in #101 I think disappears too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the internal API, so I'm happy with whatever change you suggest (but I do want the comments to be understandable by those of us whose English is not as good as yours :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually OK with making a slightly stronger restriction than what the (current) API allows, so how about something really simple, like:
You may only call
thread_tracer()
once per thread. Calling it more than once per-thread will result in undefined behavior.
This will force the user to only ever have one hanging around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may only obtain a thread collector once (but the collector is reusable).
overly restricts the user, but is very easy to explain. In other words, there are cases where it is safe to obtain > 1 thread collector, but if we tell the user they must obtain only one, it rules all unsafe cases out.
The test suite cannot abide by these rules though.
A thread may obtain multiple ThreadTraceCollectors but must only collect a trace with one at a time.
precisely describes when it is safe to obtain >1 thread collector.
I hope that makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[The more I see the confusion, the more I want to fix the API]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can fix the API without difficulty, that's probably more productive than trying to work out how to encode a complex condition in prose!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will take that long to fix, but I would like to do it as a fresh PR if that's OK.
Are you OK with putting a placeholder comment here for now if I promise that my next PR fixes the API?
We could use:
A thread may obtain multiple ThreadTraceCollectors but must only collect a trace with one at a time.
Even if it's a little weak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Actually I already have a boring docstring branch to go in after this one -- if that one could go in next, I could save time resolving conflicts later)
@@ -66,7 +66,7 @@ | |||
#define SYSFS_PT_TYPE "/sys/bus/event_source/devices/intel_pt/type" | |||
#define MAX_PT_TYPE_STR 8 | |||
|
|||
#define MAX_OPEN_PERF_TRIES 20000 | |||
#define MAX_OPEN_PERF_TRIES 50000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that adding a pthread_yield
or sched_yield
(or even just "sleep 100ms" or similar!) wherever this loop is will be more robust: it will make it much more likely that the other thread acquiring the lock will succeed before we try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do, although I thought yielding was kind of frowned upon? (don't ask me why, I have a vague recollection of reading something on a mailing list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(btw, there's already a sleep:
hwtracer/src/collect/perf/collect.c
Line 70 in 978aa7f
#define OPEN_PERF_WAIT_NSECS 1000 * 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yielding doesn't always do what people expect (e.g. in spinlocks).
1000 * 30
nanoseconds is 1/33333th of a second -- under such circumstances sleep
might not even do anything! I'm fine with sleeping, but let's make it least 1/100th of a second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me. I don't remember where these numbers even came from TBH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went digging. Here's the discussion on the "first version" of # of iterations and how long to sleep:
#43 (comment)
We have bumped the numbers on a couple of occasions since though.
Not sure if that changes anything? Shall we still use 1/100th of a sec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd use at least 1/100th of a second, perhaps as much as 1/10th. In the grand scheme of things, small delays like that don't really slow down the test suite to any noticeable degree.
I've pushed what I propose we do for this PR. For clarity, next PRs:
|
I have a branch with a preliminary API redesign now. Let's move this one forward? |
Please squash. |
splat. |
bors r+ |
99: Separate trace collection from trace decoding. r=ltratt a=vext01 Co-authored-by: Edd Barrett <[email protected]>
Build failed: |
|
OK, so strum has pulled in this unicode license and the ISC license. with the unicode one says:
Are we a "copy of the software"? |
IMHO this license is incredibly unclear (I've encountered it in another project) but the only plausible thing we can do is to allow it in |
Or eliminate strum from our code. Which would be a shame... |
OK with those fixes? |
Please squash. |
We are about to add our own trace decoder (as an alternative to libipt) to hwtracer. In order to do this we need the ability to have separate notions of trace collection and trace decoding. In other words, instead of having a `PerfPT backend`, we now have a `PerfTraceCollector` and a `LibIPTTraceDecoder`. (There was also something called a `Dummy` backend, which I saw no utility to, and have thus removed) This is a large change, that will be hard to review I'm afraid. In doing this work I've realised how crusty and unidiomatic some of the hwtracer code is. Regardless I've tried hard to resist the urge to uncrustify unrelated parts of code. I will fix some of this stuff in follow up PRs.
This makes the trace buffer 64MiB, adjusts the sleep interval between attempts to open perf and ups the number of attempts before giving up.
ykjit#101 We will need to fix this API in the near future.
splat. |
bors r+ |
Build succeeded: |
We are about to add our own trace decoder (as an alternative to libipt) to hwtracer. In order to do this we need the ability to have separate notions of trace collection and trace decoding.
In other words, instead of having a
PerfPT
backend, we now have aPerfTraceCollector
and aLibIPTTraceDecoder
.(There was also something called a
Dummy
backend, which I saw no utility to, and have thus removed)This is a large change, that will be hard to review I'm afraid.
In doing this work I've realised how crusty and unidiomatic some of the hwtracer code is. Regardless I've tried hard to resist the urge to uncrustify unrelated parts of code. I will fix some of this stuff in follow up PRs.
yk
companion: ykjit/yk#580