README.md

This is a fast deep learning kernel written from scratch in C. It is intended for hardware performance benchmarking and for simple machine learning tasks that need deep neural networks which are easy to configure and deploy.

The default program in main.c reports timings for model initialization, the forward pass, and the backward pass of a network with over 1 billion parameters. To compile and run this program, use the terminal commands shown below. The same commands apply after main.c has been modified for other uses.

cd build
make && ./exec
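
For readers adapting main.c, per-phase timing of the kind described above could be sketched as follows. This is an illustration and not the repository's actual code: `init_model`, `forward_pass`, `backward_pass`, `time_phase`, and `report_timings` are hypothetical names, and the three phase functions are empty placeholders. It assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Seconds elapsed between two clock readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run one phase, print its wall-clock time, and return it in seconds. */
double time_phase(const char *label, void (*phase)(void))
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    phase();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = elapsed_seconds(start, end);
    printf("%-16s %.6f s\n", label, secs);
    return secs;
}

/* Hypothetical placeholders: replace with the real calls used in main.c. */
static void init_model(void)    { /* allocate and initialize weights */ }
static void forward_pass(void)  { /* propagate one input through the network */ }
static void backward_pass(void) { /* compute gradients */ }

/* Time the three phases in order and return the total in seconds. */
double report_timings(void)
{
    double total = 0.0;

    total += time_phase("initialization", init_model);
    total += time_phase("forward pass", forward_pass);
    total += time_phase("backward pass", backward_pass);
    return total;
}
```

Calling `report_timings()` from `main` prints one line per phase. A monotonic clock is used here so that system clock adjustments during a long run do not distort the measurements.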

Dell XPS 15 9520 (Intel i7-12700H)