gpt

This is my first gpt, I adapted it from Andrej Karpathy's excellent video Let's build GPT: from scratch, in code, spelled out. I refactored to separate training from generating and also added configuration.

This is tested and works in both native Windows and WSL2 Ubuntu.

I have Windows, and an AMD Radeon 6800 XT, and I wanted to use it to train a gpt, but Cuda doesn't work with Radeon, and ROCm only works in Linux.

The solution is DirectML. So this code targets DirectML but it would be a small change to switch it to cuda, ROCm, or any other pytorch backend.

Setup

Install miniconda (miniconda specifically is required for DirectML support), then run the following commands:

conda env create
conda activate gpt

First you generate model.pt by running the following command: (this takes 1.5 hours on my 6800xt - it might be faster or slower on your hardware)

python train.py

Then you can generate text by running the following command:

python generate.py