Releases: mirkoruether/ann-cpp
Performance Level 4
CUDA support
Convolutional input layer
2.0 Conv and pooling layers are working
Performance Level 3
Performance measurement:
Hardware:
Laptop (ASUS FX753VE): Intel i7-7700HQ / NVIDIA GTX 1050Ti (Mobile)
Desktop: Intel i5 6600k / Palit Jetstream NVIDIA GTX 970
Configuration:
Neural Net: (784 | 30 | 10) fully connected
Precision: double
Training data: MNIST 60k
MiniBatchSize: 8
MiniBatchesPerEpoch: 7,500
Epochs: 5
Performance Level 3:
Parallelized:
- Feed forward and error calculation, by subdividing the mini-batch and processing the partial mini-batches in parallel
- Gradient calculation and parameter adjustment, by computing the bias and weight updates of each layer in parallel
Redesigned linear algebra with no memory allocation during training
x64 RelWithDebInfo (/MD /Zi /O2 /Ob1 /DNDEBUG)
Results:
Laptop: N/A
Desktop: 15.2s
Performance Level 2
Performance measurement:
Hardware:
Laptop (ASUS FX753VE): Intel i7-7700HQ / NVIDIA GTX 1050Ti (Mobile)
Desktop: Intel i5 6600k / Palit Jetstream NVIDIA GTX 970
Configuration:
Neural Net: (784 | 30 | 10) fully connected
Precision: double
Training data: MNIST 60k
MiniBatchSize: 8
MiniBatchesPerEpoch: 7,500
Epochs: 5
Performance Level 2:
Redesigned linear algebra with no memory allocation during training
x64 RelWithDebInfo (/MD /Zi /O2 /Ob1 /DNDEBUG)
Results:
Laptop: N/A
Desktop: 35.8s
Version 0.1 - Performance Level 0/1
Performance measurement:
Hardware:
Laptop (ASUS FX753VE): Intel i7-7700HQ / NVIDIA GTX 1050Ti (Mobile)
Desktop: Intel i5 6600k / Palit Jetstream NVIDIA GTX 970
Configuration:
Neural Net: (784 | 30 | 10) fully connected
Precision: double
Training data: MNIST 60k
MiniBatchSize: 8
MiniBatchesPerEpoch: 7,500
Epochs: 5
Performance Level 0:
x64 RelWithDebInfo (/MD /Zi /O2 /Ob1 /DNDEBUG)
Results:
Laptop: 136.1s
Desktop: 123.3s
Performance Level 1:
Parallel mini-batch error calculation
x64 RelWithDebInfo (/MD /Zi /O2 /Ob1 /DNDEBUG)
Results:
Laptop: 117.5s
Desktop: 101.7s