Skip to content
This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Separate trace collection from trace decoding. #99

Merged
merged 5 commits into from
Oct 20, 2022
Merged

Conversation

vext01
Copy link
Collaborator

@vext01 vext01 commented Oct 14, 2022

We are about to add our own trace decoder (as an alternative to libipt) to hwtracer. In order to do this we need the ability to have separate notions of trace collection and trace decoding.

In other words, instead of having a PerfPT backend, we now have a PerfTraceCollector and a LibIPTTraceDecoder.

(There was also something called a Dummy backend, which I saw no utility to, and have thus removed)

This is a large change, that will be hard to review I'm afraid.

In doing this work I've realised how crusty and unidiomatic some of the hwtracer code is. Regardless I've tried hard to resist the urge to uncrustify unrelated parts of code. I will fix some of this stuff in follow up PRs.

yk companion: ykjit/yk#580

src/collect/mod.rs Outdated Show resolved Hide resolved
src/collect/mod.rs Outdated Show resolved Hide resolved
src/collect/perf/mod.rs Outdated Show resolved Hide resolved
src/decode/libipt/mod.rs Outdated Show resolved Hide resolved
src/decode/mod.rs Outdated Show resolved Hide resolved
@ltratt
Copy link
Contributor

ltratt commented Oct 14, 2022

I've tried to comment on things that I think are either new-ish or are easily changed. Nothing major.

@ltratt
Copy link
Contributor

ltratt commented Oct 14, 2022

Just one comment left!

@ltratt
Copy link
Contributor

ltratt commented Oct 14, 2022

Please squash.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 14, 2022

splat

@ltratt
Copy link
Contributor

ltratt commented Oct 14, 2022

bors r+

bors bot added a commit that referenced this pull request Oct 14, 2022
99: Separate trace collection from trace decoding. r=ltratt a=vext01



Co-authored-by: Edd Barrett <[email protected]>
bors bot added a commit that referenced this pull request Oct 14, 2022
99: Separate trace collection from trace decoding. r=ltratt a=vext01



Co-authored-by: Edd Barrett <[email protected]>
@bors
Copy link
Contributor

bors bot commented Oct 14, 2022

Build failed:

@vext01
Copy link
Collaborator Author

vext01 commented Oct 17, 2022

Just had a quick look at this.

We are maxing out the trace buffer, which is currently set (by default) to 1024 pages (a page is 4096 bytes by default on a typical linux system, so that's about 4MiB, by my calculations).

The test in question does in indeed test a "long" trace: it calls SystemTime::now().elapsed().unwrap().subsec_nanos() in a loop 3000 times.

Shall we try bumping the default trace buffer size?

@ltratt
Copy link
Contributor

ltratt commented Oct 17, 2022

Yes, 4MiB is very small. Just make it 64MiB (at least for testing purposes) and be done with it IMHO.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 17, 2022

Something like that? (untested)

@ltratt
Copy link
Contributor

ltratt commented Oct 17, 2022

Actually, are we sure pages are a fixed size? I don't think that's true anymore, at least on Linux. We probably have to read the page size value from somewhere to calculate this.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 17, 2022

Page size is configurable. That's what was implied by "typical Linux system" in the commit message.

We can use the (surprisingly well-named) getpagesize() if you like.

@ltratt
Copy link
Contributor

ltratt commented Oct 17, 2022

We can use the (surprisingly well-named) getpagesize() if you like.

Yes we should. There's no point not doing so as we'll simply forget and confuse our future selves!

@vext01
Copy link
Collaborator Author

vext01 commented Oct 18, 2022

a0ba640 takes into account variable page sizes.

978aa7f fixes the intermittent test failures we spoke about in person (the commit message addresses some of the questions we had).

In doing the latter I've found a nasty bug, which I will raise an issue to explain. We should deal with this in a separate PR.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 18, 2022

Also pushed af4f168 as a result of in-person conversation.

@@ -25,6 +25,10 @@ const PERF_DFLT_INITIAL_TRACE_BUFSIZE: size_t = 1024 * 1024; // 1MiB
/// The interface offered by all trace collectors.
pub trait TraceCollector: Send + Sync {
/// Return a `ThreadTraceCollector` for tracing the current thread.
///
/// Note that an instance of `ThreadTraceCollector` is not thread local, and an attempt to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this really difficult to parse. Can we simplify it to "You must not start two TraceCollectors in the same thread. Doing so is is undefined behaviour". Also I think this function (or the trait?) should be unsafe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must not start two TraceCollectors in the same thread

But that's not strictly true. You can start two different ThreadTraceCollectors on the same thread as long as their "tracing sessions" don't overlap (in terms of time).

unsafe

Yep, forgot about that one!

Copy link
Collaborator Author

@vext01 vext01 Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking yesterday: why does a user facing ThreadTraceCollector even exist? If we had TraceCollector::start_thread_collector(), then this issue (I think) disappears? There can we a hidden thread-local ThreadTraceCollector and the problem described in #101 I think disappears too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the internal API, so I'm happy with whatever change you suggest (but I do want the comments to be understandable by those of us whose English is not as good as yours :)).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually OK with making a slightly stronger restriction than what the (current) API allows, so how about something really simple, like:

You may only call thread_tracer() once per thread. Calling it more than once per-thread will result in undefined behavior.

This will force the user to only ever have one hanging around.

Copy link
Collaborator Author

@vext01 vext01 Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may only obtain a thread collector once (but the collector is reusable).

overly restricts the user, but is very easy to explain. In other words, there are cases where it is safe to obtain > 1 thread collector, but if we tell the user they must obtain only one, it rules all unsafe cases out.

The test suite cannot abide by these rules though.

A thread may obtain multiple ThreadTraceCollectors but must only collect a trace with one at a time.

precisely describes when it is safe to obtain >1 thread collector.

I hope that makes sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[The more I see the confusion, the more I want to fix the API]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can fix the API without difficulty, that's probably more productive than trying to work out how to encode a complex condition in prose!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will take that long to fix, but I would like to do it as a fresh PR if that's OK.

Are you OK with putting a placeholder comment here for now if I promise that my next PR fixes the API?

We could use:

A thread may obtain multiple ThreadTraceCollectors but must only collect a trace with one at a time.

Even if it's a little weak.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Actually I already have a boring docstring branch to go in after this one -- if that one could go in next, I could save time resolving conflicts later)

@@ -66,7 +66,7 @@
#define SYSFS_PT_TYPE "/sys/bus/event_source/devices/intel_pt/type"
#define MAX_PT_TYPE_STR 8

#define MAX_OPEN_PERF_TRIES 20000
#define MAX_OPEN_PERF_TRIES 50000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that adding a pthread_yield or sched_yield (or even just "sleep 100ms" or similar!) wherever this loop is will be more robust: it will make it much more likely that the other thread acquiring the lock will succeed before we try again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do, although I thought yielding was kind of frowned upon? (don't ask me why, I have a vague recollection of reading something on a mailing list)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(btw, there's already a sleep:

#define OPEN_PERF_WAIT_NSECS 1000 * 30
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yielding doesn't always do what people expect (e.g. in spinlocks).

1000 * 30 nanoseconds is 1/33333th of a second -- under such circumstances sleep might not even do anything! I'm fine with sleeping, but let's make it least 1/100th of a second.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me. I don't remember where these numbers even came from TBH.

Copy link
Collaborator Author

@vext01 vext01 Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went digging. Here's the discussion on the "first version" of # of iterations and how long to sleep:
#43 (comment)

We have bumped the numbers on a couple of occasions since though.

Not sure if that changes anything? Shall we still use 1/100th of a sec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd use at least 1/100th of a second, perhaps as much as 1/10th. In the grand scheme of things, small delays like that don't really slow down the test suite to any noticeable degree.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 19, 2022

I've pushed what I propose we do for this PR.

For clarity, next PRs:

    1. easy docstring/tidying PR (work already done)
    1. fix API

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

I have a branch with a preliminary API redesign now. Let's move this one forward?

@ltratt
Copy link
Contributor

ltratt commented Oct 20, 2022

Please squash.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

splat.

@ltratt
Copy link
Contributor

ltratt commented Oct 20, 2022

bors r+

bors bot added a commit that referenced this pull request Oct 20, 2022
99: Separate trace collection from trace decoding. r=ltratt a=vext01



Co-authored-by: Edd Barrett <[email protected]>
@bors
Copy link
Contributor

bors bot commented Oct 20, 2022

Build failed:

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

  ┌─ unicode-ident 1.0.5 (registry+https://github.com/rust-lang/crates.io-index):4:13
  │
4 │ license = "(MIT OR Apache-2.0) AND Unicode-DFS-2016"
  │            -^^^----^^^^^^^^^^------^^^^^^^^^^^^^^^^
  │            ││      │               │
  │            ││      │               rejected: not explicitly allowed
  │            ││      accepted: license is explicitly allowed
  │            │accepted: license is explicitly allowed
  │            license expression retrieved via Cargo.toml `license`
  │

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

OK, so strum has pulled in this unicode license and the ISC license.

with the unicode one says:

 either

    (a) this copyright and permission notice appear with all copies of the Data Files or Software, or
    (b) this copyright and permission notice appear in associated Documentation.

Are we a "copy of the software"?

@ltratt
Copy link
Contributor

ltratt commented Oct 20, 2022

IMHO this license is incredibly unclear (I've encountered it in another project) but the only plausible thing we can do is to allow it in cargo deny and hope that we've done the right thing.

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

Or eliminate strum from our code. Which would be a shame...

@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

OK with those fixes?

@ltratt
Copy link
Contributor

ltratt commented Oct 20, 2022

Please squash.

We are about to add our own trace decoder (as an alternative to libipt)
to hwtracer. In order to do this we need the ability to have separate
notions of trace collection and trace decoding.

In other words, instead of having a `PerfPT backend`, we now have a
`PerfTraceCollector` and a `LibIPTTraceDecoder`.

(There was also something called a `Dummy` backend, which I saw no
utility to, and have thus removed)

This is a large change, that will be hard to review I'm afraid.

In doing this work I've realised how crusty and unidiomatic some of the
hwtracer code is. Regardless I've tried hard to resist the urge to
uncrustify unrelated parts of code. I will fix some of this stuff in
follow up PRs.
This makes the trace buffer 64MiB, adjusts the sleep interval between
attempts to open perf and ups the number of attempts before giving up.
ykjit#101

We will need to fix this API in the near future.
@vext01
Copy link
Collaborator Author

vext01 commented Oct 20, 2022

splat.

@ltratt
Copy link
Contributor

ltratt commented Oct 20, 2022

bors r+

@bors
Copy link
Contributor

bors bot commented Oct 20, 2022

Build succeeded:

@bors bors bot merged commit 2be2f00 into ykjit:master Oct 20, 2022
@vext01 vext01 deleted the refactor branch October 21, 2022 11:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants