What issues have you had? I assume this is w.r.t. HW counters as the

> Is anyone working on support?

Short answer: no. Part of the reason is that the new APIs created a design problem when they came out. The other part is that, although I've since developed a solution to that design problem, I now work at AMD, so I can't really dedicate time to writing support for NVIDIA stuff.

Here's the background on the design problem: while I personally shudder when I look at the implementation of the
The problem was that because the new profiling APIs didn't make intermediate data available (i.e. you could only get the data when you finalized the entire API), a

> If not, I'd be happy to implement a component for it.

That would be great. Here's what I would recommend:
Here's the gist of what you need in
TIMEMORY_DECLARE_COMPONENT(cupti_perfworks)
TIMEMORY_DEFINE_CONCRETE_TRAIT(fini_priority, component::cupti_perfworks, priority_constant<-4>)
namespace tim
{
namespace component
{
struct cupti_perfworks : base<cupti_perfworks, void>
{
    static std::string label() { return "cupti_perfworks"; }
    static void global_init();      // place setup here (automatically called on first use of component)
    static void global_finalize();  // place teardown here
    void start();
    void stop();
    void set_prefix(const char* _v) { m_prefix = _v; }

private:
    const char* m_prefix = nullptr;
};
}  // namespace component
}  // namespace tim

Here's the gist of what you need in
TIMEMORY_DECLARE_COMPONENT(cupti_perfworks_data)
namespace tim
{
namespace component
{
// CRTP: the first template argument of base<> is the derived type itself
struct cupti_perfworks_data : base<cupti_perfworks_data, std::vector<double>>
{
    // keep this the same
    static std::string label() { return "cupti_perfworks"; }
    void store(value_type _v) { set_accum(std::move(_v)); }
};
}  // namespace component
}  // namespace tim
TIMEMORY_INITIALIZE_STORAGE(cupti_perfworks_data)
namespace
{
//
// ... CUPTI-specific sample code ...
//
}
//
//
//
void
tim::component::cupti_perfworks::start() { ... }
void
tim::component::cupti_perfworks::stop() { ... }
void
tim::component::cupti_perfworks::global_init() { ... }
void
tim::component::cupti_perfworks::global_finalize()
{
    static bool _once = false;
    if(_once) return;
    _once = true;
    using bundle_t = lightweight_tuple<cupti_perfworks_data>;
    // modify
    for(auto itr : ...)
    {
        bundle_t _v{ ... extract label from itr... };
        _v.push();   // create call-stack entry
        _v.start();  // needed for stop call
        _v.store( ... data from itr... );
        _v.stop();   // increments lap counter
        _v.pop();    // update call-stack entry
    }
}

And the gist of the main:
#include "cupti_perfworks.hpp"
#include <timemory/timemory.hpp>
namespace comp = tim::component;
using bundle_t = tim::component_tuple<comp::cupti_perfworks>;
int main(int argc, char** argv)
{
    tim::timemory_init(argc, argv);
    bundle_t _v{ "main" };
    _v.start();
    // ... etc.
    _v.stop();
    tim::timemory_finalize();
}
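
To make the split between the two components concrete, here is a minimal, library-free sketch of the same pattern: one object marks ranges during the run, and the measurements only become available when the whole session is finalized, at which point they are replayed into a separate data store. Everything in it (`deferred_backend`, `data_entry`, `store`, `collect`) is a hypothetical stand-in invented for illustration; none of it is timemory or CUPTI API, and the "measurement" is just a running counter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiling backend that only hands out its
// measurements once the whole session is finalized (NOT a real CUPTI API).
class deferred_backend
{
public:
    using record_t = std::pair<std::string, std::vector<double>>;

    void begin_range(const std::string& label) { m_open.push_back(label); }

    void end_range()
    {
        assert(!m_open.empty());
        // Fake measurement: a single value taken from a running counter.
        m_pending.emplace_back(m_open.back(), std::vector<double>{ ++m_counter });
        m_open.pop_back();
    }

    // Returns every record at once and leaves the backend empty.
    std::vector<record_t> finalize()
    {
        assert(m_open.empty());
        return std::exchange(m_pending, {});
    }

private:
    double                   m_counter = 0.0;
    std::vector<std::string> m_open;
    std::vector<record_t>    m_pending;
};

// Plays the role of the data component: accumulated values plus a lap count.
struct data_entry
{
    std::vector<double> accum;
    int                 laps = 0;
};

using data_storage = std::map<std::string, data_entry>;

// Element-wise accumulation into the entry for `label`, one lap per call.
inline void store(data_storage& storage, const std::string& label,
                  const std::vector<double>& values)
{
    auto& entry = storage[label];
    if(entry.accum.size() < values.size()) entry.accum.resize(values.size(), 0.0);
    for(std::size_t i = 0; i < values.size(); ++i)
        entry.accum[i] += values[i];
    ++entry.laps;
}

// Plays the role of global_finalize(): drain the backend, fill the store.
inline data_storage collect(deferred_backend& backend)
{
    data_storage storage;
    for(const auto& itr : backend.finalize())
        store(storage, itr.first, itr.second);
    return storage;
}
```

In this sketch, `begin_range`/`end_range` correspond to the `start()`/`stop()` of the control component, and `collect` corresponds to the loop in `global_finalize()` that builds one entry per label. A second call to `finalize()` returns nothing, which mirrors why the real teardown is guarded so it only runs once.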
-
Hi All,
It is great to have a project that provides a toolkit for profiling. I'd like to integrate Timemory into TVM instead of having to reimplement all these profiling interfaces. However, it seems that Timemory does not currently support CUDA 3000 series GPUs. I assume this is because of the new profiling APIs (CUPTI perfworks). Is anyone working on support? If not, I'd be happy to implement a component for it.