This is my attempt to implement neural network training and inference with the BitLinear layer from the BitNet paper, from scratch in C, for learning purposes. The long-term goal is to work towards an implementation of a smaller version of the LLaMA architecture. This repo also implements inference for a BPE tokenizer trained with the tiktoken library.
To keep things concise, the source files for layers, data structures and other utilities are implemented as single-header libraries.
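To sketch the pattern (using a hypothetical `example.h`, not a file from this repo): declarations are visible to every includer, while definitions are compiled into exactly one translation unit that defines an implementation macro before including the header.

```c
/* example.h - a hypothetical single-header library showing the pattern. */
#ifndef EXAMPLE_H
#define EXAMPLE_H

/* Declarations are visible to every file that includes this header. */
float example_dot(const float* a, const float* b, int n);

#endif /* EXAMPLE_H */

/* Definitions are compiled only where EXAMPLE_IMPLEMENTATION is defined
   before the include, so the header can be used from many .c files
   without duplicate-symbol errors at link time. */
#ifdef EXAMPLE_IMPLEMENTATION
float example_dot(const float* a, const float* b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) acc += a[i] * b[i];
    return acc;
}
#endif /* EXAMPLE_IMPLEMENTATION */
```

Exactly one `.c` file defines `EXAMPLE_IMPLEMENTATION` before the `#include`; every other file includes the header normally.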
The train program initializes a new model and trains it on the specified dataset. For example:
```sh
gcc mnist_train.c -o train_mnist -lm
./train_mnist
```
```
├── experiments/    # miscellaneous programs used to test ideas
├── layers/         # source files for layers of the LLM
├── utils/          # utility functions (data structures, matrix functions, dataloaders, etc.)
├── tests/          # unit tests for various libraries and functions
├── tokenizer.h     # single header library for inference on BPE tokenizer
└── mnist_bitmlp.c  # train and test bit multi layer perceptron on MNIST dataset
```
Function names for layers carry a suffix corresponding to their forward and backward passes:

- `_fwd` – forward pass
- `_bkwd` – backpropagation

Gradient variables are prefixed with `d`, e.g. the gradient of a layer's output is `dy`. Additionally, quantised variables carry a `q` suffix, e.g. quantised activations are `xq`.
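To make these conventions concrete, here is a sketch of what a layer's declarations might look like; `rmsnorm_fwd`/`rmsnorm_bkwd` and their parameter lists are illustrative, not the exact signatures used in this repo.

```c
#include <stdint.h>

/* Hypothetical declarations illustrating the naming conventions. */
void rmsnorm_fwd(float* y,        /* layer output */
                 const float* x,  /* layer input */
                 const float* g,  /* gain parameters */
                 int n);

void rmsnorm_bkwd(float* dx,       /* gradient w.r.t. the input x */
                  float* dg,       /* gradient w.r.t. the gain g */
                  const float* dy, /* gradient w.r.t. the output y */
                  const float* x,
                  const float* g,
                  int n);

/* Quantised tensors carry a q suffix, e.g. int8 activations: */
int8_t* xq;
```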
- BitLinear implementation
  - RMSNorm layer
  - BitLinear layer
  - Bit matrix multiplications
  - GELU activation
  - Weight and activation quantisation/dequantisation functions (a sketch follows this list)
- BitLinear MLP Block
  - Cross entropy loss implementation
  - Training weight initialisation and allocation
  - AdamW optimiser implementation
  - Training loop on MNIST dataset for BitMLP
  - Train a multilayer perceptron classifier for the MNIST dataset
  - Parallelize code using OpenMP
- Tokenizer implementation
  - Loading tokenizer from file
  - Base64 decoding
  - Hashtable implementation
  - PriorityQueue implementation
  - Encode text to input ids using tokenizer
  - Decode input ids to text using tokenizer
  - Verify correctness of tokenizer implementation on sample corpus
- BitNet transformer implementation
  - Token embedding layer
  - Grouped query attention block
  - Forward and backward pass for BitNet architecture
  - Dataloader implementation
  - Saving and loading model weights
  - Training loop implementation
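As a taste of what the quantisation/dequantisation functions referenced above involve, here is a minimal sketch of the absmean ternary weight quantisation and absmax int8 activation quantisation described in the BitNet b1.58 paper. The function names and layout are hypothetical, not this repo's API.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: absmean ternary weight quantisation (BitNet b1.58).
   Each weight is divided by the mean absolute value, then rounded and
   clipped to {-1, 0, +1}. Returns the scale for dequantisation
   (w ~= wq * scale). */
float weight_quant(int8_t* wq, const float* w, int n) {
    float scale = 0.0f;
    for (int i = 0; i < n; i++) scale += fabsf(w[i]);
    scale = scale / n + 1e-6f;  /* absmean; eps avoids division by zero */
    for (int i = 0; i < n; i++) {
        int v = (int)roundf(w[i] / scale);
        if (v > 1) v = 1;
        if (v < -1) v = -1;
        wq[i] = (int8_t)v;
    }
    return scale;
}

/* Hypothetical helper: absmax int8 activation quantisation. Activations
   are scaled so the largest magnitude maps to 127. Returns the scale for
   dequantisation (x ~= xq / scale). */
float activation_quant(int8_t* xq, const float* x, int n) {
    float absmax = 1e-6f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        if (a > absmax) absmax = a;
    }
    float scale = 127.0f / absmax;
    for (int i = 0; i < n; i++) {
        int v = (int)roundf(x[i] * scale);
        if (v > 127) v = 127;
        if (v < -127) v = -127;
        xq[i] = (int8_t)v;
    }
    return scale;
}
```

In a BitLinear layer, activations and weights would be quantised with helpers like these, multiplied in low precision, and the result rescaled using the two returned scales.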