In the stdlib; see the official doc. The doc does not mention threads, and by default the profiler runs only on the main thread.
This module is included in the embedded Python, it is fast, and it is not hard to enable it in all threads.
pyprof2calltree converts its output for KCachegrind / QCachegrind (brew install qcachegrind), which is very powerful for visualization.
There is also RunSnakeRun to visualize the cProfile data.
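The visualizers above work from a stats file written by cProfile. A minimal sketch of producing and inspecting such a file with only the standard library follows; `fib`, `profile_to_file`, `summarize`, and the file name `profile.out` are hypothetical names chosen for illustration.

```python
import cProfile
import io
import pstats


def fib(n):
    # Deliberately slow recursive function so the profile has something to show.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_to_file(path="profile.out"):
    # Profile a single call on the current thread and write the raw stats to disk.
    profiler = cProfile.Profile()
    profiler.enable()
    fib(15)
    profiler.disable()
    profiler.dump_stats(path)
    return path


def summarize(path, limit=5):
    # Load the stats file and return a text report sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Calling `print(summarize(profile_to_file()))` prints the most expensive entries; the same stats file is what a converter or viewer would take as input.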
In stdlib, official doc.
According to the docs, it "does not yet work well with threads", thus it is not an option.
Homepage. Built especially for multithreaded apps; see the "why yappi" doc.
Earlier versions didn't save the call stack (according to here).
By Dropbox; see the blog post. It is a sampling profiler: every 10 ms it saves the stack trace.
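The sampling idea can be sketched with the standard library alone. This is not the tool's actual implementation: `StackSampler`, `busy_work`, and `sample_busy_work` are hypothetical names, and using a background thread with `sys._current_frames()` is an assumption made for portability.

```python
import collections
import sys
import threading
import time
import traceback


class StackSampler:
    """Toy sampling profiler: periodically records the stack of one thread."""

    def __init__(self, target_thread_id, interval=0.01):
        self.target_thread_id = target_thread_id
        self.interval = interval  # seconds; 0.01 s = 10 ms
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self.target_thread_id)
            if frame is None:
                continue
            # Record function names from outermost to innermost call.
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            self.counts[stack] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def busy_work(duration=0.3):
    # CPU-bound loop that runs for roughly `duration` seconds.
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += sum(range(1000))
    return total


def sample_busy_work():
    # Sample the calling thread while it runs busy_work().
    sampler = StackSampler(threading.get_ident())
    sampler.start()
    busy_work()
    sampler.stop()
    return sampler.counts
```

Each key in the returned counter is one observed call stack, and its count approximates how much time was spent there. Unlike a deterministic profiler, the overhead depends on the sampling interval rather than on the number of function calls.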