This is an AppDynamics Machine Agent monitor (extension) that gathers GPU metrics for AI (or other) workloads. Metrics are gathered every minute and published to the AppDynamics Metric Browser. The following metrics are captured:
- Fan Speed
- GPU Temperature
- Power Draw
- Graphics Clock (MHz)
- Max Graphics Clock (MHz)
- Mem Clock (MHz)
- Max Mem Clock (MHz)
- Sm Clock (MHz)
- Max Sm Clock (MHz)
- Video Clock (MHz)
- Max Video Clock (MHz)
- Mem Free
- Mem Reserved
- Mem Total
- Mem Used
- Per-Process Memory Usage

The following must be installed on the host:
- nvidia-smi (provided by the NVIDIA drivers)
- python3
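
Most of the metrics listed above can be read from nvidia-smi's CSV query interface. The following is a minimal sketch of that idea, not the extension's actual code: the `FIELDS` subset and the function names `parse_gpu_csv` and `query_gpus` are assumptions made for illustration, and the extension itself may collect its data differently.

```python
import csv
import io
import subprocess

# Illustrative subset of nvidia-smi --query-gpu fields (an assumption;
# the extension itself may query a different set).
FIELDS = [
    "index",
    "fan.speed",
    "temperature.gpu",
    "power.draw",
    "clocks.gr",
    "clocks.max.gr",
    "clocks.mem",
    "clocks.max.mem",
    "clocks.sm",
    "clocks.max.sm",
    "memory.total",
    "memory.used",
    "memory.free",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        row = {}
        for field, raw in zip(FIELDS, record):
            try:
                row[field] = float(raw)
            except ValueError:
                # Non-numeric placeholders such as "[N/A]" become None.
                row[field] = None
        rows.append(row)
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed readings for every GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

On a host with the NVIDIA drivers installed, `query_gpus()` should return one dictionary per GPU. Keeping the parsing in `parse_gpu_csv` separate from the subprocess call makes it possible to test the parsing on a machine without a GPU.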
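
To install, clone this repo, move the nvidia-collector directory into the Machine Agent's monitors directory, then restart the Machine Agent:
git clone <this repo's URL>
mv nvidia-collector /path/to/machine/agent/monitors
sudo systemctl restart appdynamics-machine-agent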
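
If no metrics show up after the restart, a quick check of the setup can help narrow things down. The sketch below is a hypothetical helper, not part of this repo: the function name `check_prerequisites` and its `which` parameter are assumptions for illustration. It only confirms that `nvidia-smi` and `python3` are on the PATH and that the `nvidia-collector` directory exists under the monitors directory you pass in.

```python
import os
import shutil


def check_prerequisites(monitors_dir, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    monitors_dir is the Machine Agent's monitors directory (you supply the real path).
    which is injectable so the check can be exercised without the real tools.
    """
    problems = []

    # Both tools must be resolvable on the PATH.
    for tool in ("nvidia-smi", "python3"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # The collector should have been moved into the monitors directory.
    collector = os.path.join(monitors_dir, "nvidia-collector")
    if not os.path.isdir(collector):
        problems.append(f"{collector} does not exist")

    return problems
```

For example, `check_prerequisites("/path/to/machine/agent/monitors")` returns an empty list when everything is in place. Note that the PATH seen by the Machine Agent service may differ from the one in your shell, so a clean result here is a hint rather than a guarantee.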