https://github.com/jklw10/dnns-from-scratch-in-zig/tree/benchmark-submission
Simply create the folder
data/mnist-fashion/
and extract the datasets into it. Run with
zig build run -Doptimize=ReleaseFast
Zig version
0.14.0-dev.1860+2e2927735
(Might run with 0.13.0)
The same setup is able to get to 97.2% on the MNIST digits set (different configuration, likely overfit; 98% with 4x100 neuron hidden layers).
What's wacky about it:
Weights are forcibly normalized (and adjusted):
(grads[i]-avg(grads)) / (max(grads)-min(grads)) * (2-(2/inputSize))
Gradients are forcibly normalized (and adjusted):
norm(weight) * (1+(2/inputSize))
Gradients are biased to move the weight towards that weight's EMA:
grads[i] / abs(ema[i]) + abs(grads[i] - ema[i] - weight[i])
Forward pass uses
sign(weight) * sqrt(weight*ema)
in place of weight
Some of this is slightly off, please read
https://github.com/jklw10/dnns-from-scratch-in-zig/blob/benchmark-submission/src/layerGrok.zig#L259
to see the full context. Hopefully it's human readable enough.
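For anyone who wants to poke at the formulas without opening the Zig source, here is a rough Python sketch of three of them exactly as written in this issue. It is not a port of layerGrok.zig: the function names, the eps guard, the zero-spread fallback and the abs() inside the square root are all assumptions added for illustration. norm(weight) is left out because its definition isn't given here.

```python
import math


def normalize_by_range(values, input_size):
    """(v - avg(values)) / (max(values) - min(values)) * (2 - 2/input_size)."""
    avg = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0.0:
        # Assumption: a constant input maps to all zeros instead of dividing by zero.
        return [0.0 for _ in values]
    scale = 2.0 - 2.0 / input_size
    return [(v - avg) / spread * scale for v in values]


def ema_biased_grad(grad, ema, weight, eps=1e-12):
    """grad / abs(ema) + abs(grad - ema - weight).

    eps is an assumption that keeps the division finite when ema is 0.
    """
    return grad / (abs(ema) + eps) + abs(grad - ema - weight)


def effective_weight(weight, ema):
    """sign(weight) * sqrt(weight * ema), used in place of the raw weight.

    Assumption: abs() keeps the square root real when weight and ema
    have opposite signs; the issue does not say how that case is handled.
    """
    return math.copysign(math.sqrt(abs(weight * ema)), weight)
```

For example, normalize_by_range([1.0, 2.0, 3.0], 4) gives [-0.75, 0.0, 0.75], and effective_weight(-4.0, -1.0) gives -2.0. The linked source remains the reference for what the network actually does.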
This score probably isn't the maximum I can reach, just the fastest to test in an afternoon. Should I update here or just open a new issue if I get a higher score? (4x100 hidden neurons achieved 89%)