Investigation: Collect "mean instruction count" metric #689
Comments
If the aim is to count the number of unique instructions executed per process, the event handler can return DISABLE after counting each location once. That should allow subsequent iterations to run at close to full speed as well.
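A minimal sketch of what that faster approach presumably looks like with `sys.monitoring` (Python 3.12+; the tool id and names here are illustrative, not taken from the actual plugin):

```python
import sys
from types import CodeType

# Illustrative names and tool id; the real code may be structured differently.
TOOL_ID = sys.monitoring.PROFILER_ID
unique_instructions = 0

def on_first_execution(code: CodeType, instruction_offset: int):
    # Count each (code object, offset) location once, then disable the
    # INSTRUCTION event at that exact location so later executions of it
    # run at (close to) full speed.
    global unique_instructions
    unique_instructions += 1
    return sys.monitoring.DISABLE

sys.monitoring.use_tool_id(TOOL_ID, "unique-instruction-counter")
sys.monitoring.register_callback(
    TOOL_ID, sys.monitoring.events.INSTRUCTION, on_first_execution
)
sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.INSTRUCTION)
```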
If you need the total instructions as well, then a stats build is probably the fastest. Otherwise, a second handler that simply increments a counter works:

```python
def event_handler2(self, code, instruction_offset):
    # Fires for every executed instruction; never returns DISABLE.
    self.count_executed += 1
```
I see, my confusion now is over how … Given that, any comments on the results?
Just to check that I'm reading this right: it is indeed puzzling that the numbers for mean instruction count are so low for so many benchmarks. Before investigating further, it might be worth checking that the numbers stay the same using the faster approach I outlined, as a sanity check.
Each was executed an average of 80k times, so some probably only once, and some much more than that. Collecting a histogram of execution counts could be interesting (but would require more data to be collected, of course). Yeah, I'll try again with the faster approach and see how the numbers compare.
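A per-location histogram could presumably be collected with something like the following (again just a sketch; the names are illustrative, and since it records every event it is the slow path):

```python
import sys
from collections import Counter

TOOL_ID = sys.monitoring.PROFILER_ID  # illustrative tool id
executions = Counter()  # (code object, instruction offset) -> times executed

def on_instruction(code, instruction_offset):
    executions[(code, instruction_offset)] += 1

sys.monitoring.use_tool_id(TOOL_ID, "instruction-histogram")
sys.monitoring.register_callback(
    TOOL_ID, sys.monitoring.events.INSTRUCTION, on_instruction
)
sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.INSTRUCTION)

# ... run the benchmark ...

sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.NO_EVENTS)
# Histogram of execution counts: how many locations ran exactly N times.
histogram = Counter(executions.values())
```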
Some additional findings (from discussion with @markshannon): some of these benchmarks start a web server in a separate process, and then just make HTTP requests against it from the main process, so most of the real work happens outside the instrumented process (…). Likewise, …
Using @markshannon's suggested performance improvements, the results are basically the same.

Results, sorted by mean instruction count
Results, sorted by instruction count
@mdboom Could you post the raw data here? I am just curious because of my general interest in this sort of analysis.
As suggested by @markshannon, it could be useful to know, for each benchmark, the mean number of times each specific instruction is executed, to characterize each benchmark better. (Here "specific instruction" means a unique location within a code object, not a bytecode instruction type.)
The metric is: the total number of instructions executed divided by the number of unique instructions executed.
This, roughly speaking, gives an idea of how "loopy" code is.
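As a toy illustration of the metric (the numbers below are made up, not taken from the benchmark results):

```python
# Hypothetical counts for a single benchmark process, purely for illustration.
total_instructions_executed = 8_000_000  # every bytecode instruction executed
unique_instructions_executed = 100       # distinct (code object, offset) locations hit

mean_instruction_count = total_instructions_executed / unique_instructions_executed
print(mean_instruction_count)  # 80000.0: each location ran ~80k times on average
```

A mean near 1 means nearly every instruction executed only once (straight-line code); a large mean means the same locations ran over and over (loopy code).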
Methodology
This uses `sys.monitoring` for measurement, via a plugin for pyperf that turns on the instrumentation only around the actual benchmarking code. (The concept of a pyperf plugin exists only in a pull request at the moment.) The data was collected by running with pyperf's `--debug-single-value` option, which runs the benchmark code exactly once (the inner loop only once, and the outer loop that spawns individual processes exactly once). This is partly to reveal which benchmarks have no loops at all, and also because the instrumentation is very slow, so as a practical matter running as little as possible helps get results in a reasonable time (it takes about 3 hours on my laptop at the moment). A rough sketch of the measurement approach is shown below.
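For concreteness, here is a stripped-down sketch of the kind of measurement the plugin performs. This is an assumption about its general shape, not the actual plugin code; `run_benchmark_once` is a hypothetical stand-in for the benchmark body, and Python 3.12+ is assumed for `sys.monitoring`.

```python
import sys

TOOL_ID = sys.monitoring.PROFILER_ID  # illustrative tool id
total = 0
seen = set()  # unique (code object, instruction offset) locations

def on_instruction(code, instruction_offset):
    global total
    total += 1
    seen.add((code, instruction_offset))

def measure(run_benchmark_once):
    # Enable INSTRUCTION monitoring only around the benchmark body, so the
    # (very slow) instrumentation never covers setup or harness code.
    sys.monitoring.use_tool_id(TOOL_ID, "mean-instruction-count")
    sys.monitoring.register_callback(
        TOOL_ID, sys.monitoring.events.INSTRUCTION, on_instruction
    )
    sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.INSTRUCTION)
    try:
        run_benchmark_once()
    finally:
        sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.NO_EVENTS)
        sys.monitoring.free_tool_id(TOOL_ID)
    return total / len(seen)  # mean instruction count for this process
```

Wrapping only the benchmark body like this is what keeps the overhead out of everything pyperf itself does around the measured code.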
Results

Results, sorted by mean instruction count
Results, sorted by absolute instruction counts
A few conclusions to draw from this:
- There are a few benchmarks where the mean instruction count is 1, or very low, where there's not much an optimizer could do (short of multiple loops). Some, at least `pickle*`, `unpickle*` and `sqlite_synth`, aren't really CPython interpreter benchmarks at all and just drop to C code pretty quickly (and the profiling results confirm this).
- `flaskblogging`, `gunicorn` and `djangocms` are at least ostensibly macrobenchmarks, so we should investigate why the mean instruction count is so low there. We should consider excluding this whole class of benchmarks from the global number, at least for purposes of optimizing the interpreter.
- Benchmarks with a high mean instruction count and total instruction count feel like really robust examples of real-world code. These include `pylint`, `mypy2`, `docutils`, `dask`, `sympy*`, `pycparser`.