Benchmark: MLP 4x25 hidden layers, 88% in 100 epochs #186

jklw10 · 2024-10-30T17:36:37Z

https://github.com/jklw10/dnns-from-scratch-in-zig/tree/benchmark-submission

Create the folder data/mnist-fashion/ and extract the datasets into it.
Run with zig build run -Doptimize=ReleaseFast
Zig version 0.14.0-dev.1860+2e2927735
(Might run with 0.13.0)

The same setup reaches 97.2% on the MNIST digits set (different configuration, likely overfit), and 98% with 4x100-neuron hidden layers.

What's wacky about it:
Weights are forcibly normalized (and adjusted):
(grads[i]-avg(grads)) / (max(grads)-min(grads)) * (2-(2/inputSize))
Gradients are forcibly normalized (and adjusted): norm(weight) * (1+(2/inputSize))
Gradients are biased to move the weight towards that weight's EMA:
grads[i] / abs(ema[i]) + abs(grads[i] -EMA[i]-Weight[i])
Forward pass uses sign(weight) * sqrt(weight*ema) in place of weight
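
As a rough illustration of two of the formulas above, here is a minimal Python sketch (not the Zig code from the repository). The names range_normalize and effective_weight are hypothetical, and both guards (returning zeros for a constant vector, taking abs() inside the square root) are my assumptions rather than anything stated above. The scale argument stands in for either (2-(2/inputSize)) or (1+(2/inputSize)).

```python
import math


def range_normalize(values, scale):
    """Center on the mean, divide by the (max - min) range, multiply by scale."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0.0:
        # Assumption: a constant vector maps to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / spread * scale for v in values]


def effective_weight(weight, ema):
    """sign(weight) * sqrt(weight * ema), the value used in the forward pass."""
    # Assumption: abs() keeps the root real when weight and ema differ in sign.
    return math.copysign(math.sqrt(abs(weight * ema)), weight)


input_size = 4
grads = [0.5, -1.0, 2.0, 0.5]

# Scale factor from the first formula in the list: 2 - 2/inputSize.
normalized = range_normalize(grads, 2 - 2 / input_size)
print(normalized)

# The forward pass would read this value instead of the raw weight.
print(effective_weight(-4.0, 1.0))
```

After the normalization the values sum to zero and span a range equal to the scale factor, which is presumably what keeps the layers in a fixed numeric range regardless of input size.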

Some of this is slightly off; please read
https://github.com/jklw10/dnns-from-scratch-in-zig/blob/benchmark-submission/src/layerGrok.zig#L259
for the full context. Hopefully it's human-readable enough.

This score probably isn't the best I can reach, just the fastest to test in an afternoon. If I get a higher score, should I update this issue or open a new one? (4x100 hidden neurons achieved 89%.)
