This repo is deprecated; see the new version, NanoGPT-Lab.
This is the model I edit whenever I want to test a new transformer architecture idea I have. It's designed to be:
- flexible in that many large changes are tweakable from the config file rather than messing with the code
- easy to read/edit the code since files are cleanly organized & well commented
- well suited for training models in the 1-10m parameter range on a CPU or the 100m-1b parameter range on a GPU without editing any code, just the config file
- easy to visualize/demonstrate what's happening in the progression of tensor shapes for learning & debugging purposes (thanks to our custom `LoggingModule.py` and `view_modules.ipynb`)
- almost as efficient as Andrej Karpathy's nanoGPT despite everything we've added
- up to date with the most recent SotA architecture, namely Llama 3.1 (Karpathy's nanoGPT is based on the very old GPT2 and his nanoLlama31 library is built for fine-tuning the full 8b size rather than pre-training 1m-1b sized models)
Notice that even though some of these models are very small (1 to 10m parameters), they're actually reasonable rough proxies for how well a scaled-up version might do on real data, thanks to our use of the TinyStories dataset. According to the original paper, somewhere in the 1 to 3m parameter range a GPT2-inspired architecture becomes capable of understanding that the token 'apple' is something that the main character of the tiny story, 'Tim', would like to 'eat'; meaning it can actually pick up on the relationships in this text, which are an isomorphic subset of the ones a larger language model would see when training on the entire internet. This basic idea is the backbone behind Microsoft's Phi family of models, originally described in the paper Textbooks Are All You Need, and it's how they can perform so well despite being so small. I hope this repo can be of help to anyone who wants to get into designing & building novel architectures but doesn't have the compute to test a larger model on every single idea they have. I'm literally training the 1-5m parameter models on the CPU of a 2019 iMac with 8 GB of RAM.
Then when it's time to scale up (100m-1b parameters) and use a GPU, all you have to do is go into the config file and switch the dataset to fineweb or fineweb-edu. Realistically, single older GPUs are cheap enough nowadays (less than $1 per hour) that you could train even the 1-10m parameter models on them for cheap, and that's what I usually do; but it's still nice to think that someone who's resource-constrained can mess around without having to learn how to use & pay for a GPU cloud solution at all, or that someone with a halfway decent CPU/GPU/MPS might find it easier to test locally before switching to a cloud GPU node.
This repo is part of a larger project of mine called micro-GPT-sandbox, which is basically a hub for all the novel model experiments I do, with the goal of facilitating easy comparison between the different models. For each of those experiments I just use this very repo as a template to start editing, and then once I'm happy with the project (or if I've just abandoned it but it's moderately functional) I add it to the sandbox. If you end up using this repo as a template, feel free to contribute your project to the sandbox as well!
- clone the repository
- `cd` to the folder
- set up a virtual environment unless you're an agent of chaos. Use Python 3.12.4; PyTorch doesn't like 3.13
- `pip install -r requirements.txt`
- edit values in `config.py` to suit your liking. This might involve a lot of trial and error if you don't know what you're doing, either due to errors from incompatible parameter configurations or from going over your available vRAM amount. Check out the config files for each already trained model to get an idea of what reasonable values look like
- Run `python train.py` to train your own version of templateGPT
- If you ever want to just test out a model you've already made, then run the following command. The name of each model is the name of the folder it resides in inside `models/`. The model you run need not match up with the hyperparameters currently in `config.py`; that file is just for setting up training.

  ```
  python inference.py <insert_model_name_here> "prompt"
  ```
- If you've trained multiple models, you can compare them in `model_comparison.ipynb`, as long as you remember to use the third cell to specify which models you want to compare. It'll look at loss curves over the course of training and teacher-forcing top-k accuracy rate
- This step could really go anywhere, but if you're trying to learn how transformers work, then along with reading the code in `modules/` you can use `view_modules.ipynb` to visualize how the tensor shapes change. Each cell shows you in detail how a different module or scenario works in terms of how the tensor shapes change as they move through
- If/when you become confident enough to mess with the actual code yourself and test out a novel architecture idea you've got, head on over into `modules/` and get to work. While you're doing this, make sure to use `LoggingModule` instead of `nn.Module` and put `@log_io` before every class method you write so that you can use `view_modules.ipynb` for easy visualization/debugging (see the sketch right after this list)
- If/when you've got a novel transformer architecture edited up and working, send it over to your own template/fork of micro-GPT-sandbox for easy comparisons against the original templateGPT (micro-GPT-sandbox is currently in an even less finished state than this repo)
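To make that `LoggingModule`/`@log_io` pattern concrete, here's a minimal sketch of what a new module could look like. Treat the exact import path and decorator behavior as illustrative; `modules/logging.py` is the source of truth.

```python
# Illustrative example of the convention this repo expects: subclass LoggingModule
# (not nn.Module) and decorate every method with @log_io so view_modules.ipynb can
# print the tensor shapes flowing through it. Import path assumed; see modules/logging.py.
import torch
import torch.nn as nn
from modules.logging import LoggingModule, log_io

class ScaleShift(LoggingModule):
    """A toy elementwise affine layer, just to show the pattern."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    @log_io  # logs input & output shapes when logging is enabled
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale + self.shift
```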
- `tokenizers/`: a folder where you store your tokenizers
    - `bpe_tinyStories/`: a byte-pair encoding tokenizer trained on the first 10k sequences from the TinyStoriesV2 dataset, which is a fan-made upgrade over the original TinyStories
        - `build.ipynb`: the notebook where I trained the tokenizer models
        - `tokenizer.py`: an overly simplistic and annoyingly inefficient tokenizer with BOS & EOS tokens, post-sequence padding, and a `display` function to help you visualize how a given string is broken down into tokens
        - `models/`
            - `{509, 1021, 2045}.model`: different tokenizer sizes, each a subset of the next
    - `bpe_fineweb/`: a yet-to-be-trained byte-pair encoding tokenizer for fineweb
        - ...
    - `bpe_fineweb-edu/`: a byte-pair encoding tokenizer trained on the first 2k sequences from the "sample-350BT" subset of fineweb-edu. We train the model on the "sample-10BT" subset, which means the tokenizer was *mostly* trained on data the model won't see during training
        - ...
        - `models/`
            - `{509, 1021, 2045, 4093, 8189, 16381, 32765}.model`: different tokenizer sizes, each a subset of the next
    - `byte/`: choose this to use bytes instead of tokens
        - ...
- `modules/`: where all of the code for the actual model goes
    - `attention.py`: multi-query attention with pre-computed rotary positional encodings that knows to automatically use Flash Attention if you have access to a CUDA GPU (the fused-attention idea is sketched below, after this list)
    - `layer.py`: defines each residual connection layer of our GPT
    - `logging.py`: defines the `LoggingModule` class, a wrapper you should use instead of PyTorch's `nn.Module` in order to facilitate easy demonstration of how tensor shapes change throughout a given module
    - `mlp.py`: a multi-layer perceptron with an optional gate and either ReLU, GeLU, or SiLU nonlinearities, all configurable in `config.py`. Adding more nonlinearities is also absurdly easy
    - `model.py`: the primary class for our GPT
    - `norm.py`: a norm module with an optional affine layer that allows you to switch between RMSNorm, LayerNorm, and CosineNorm easily using a setting over in `config.py` (sketched below, after this list). Adding new normalization methods is also absurdly easy
- `trained/`
    - `Llama3_1m_atto/`: a 1m parameter model trained for 2k iterations with a batch size of 64, for a total of 128k sequences (the TinyStoriesV2 dataset is ~2.76 million sequences, so that's less than 5% of the available data), designed to resemble the architecture of Llama 3/nanoLlama31. However, it uses BPE rather than Llama's tokenization scheme, and I think I also used more dropout during training
        - `model_config.json`: hyperparameters of the model
        - `model.pth`: weights of the model
        - `train_config.json`: hyperparameters of the training loop used
        - `log_data.csv`: a record of loss and a couple of other key metrics over the course of training
    - `GPT2_1m_atto/`: a 1m parameter model trained for 2k iterations with a batch size of 64, for a total of 128k sequences (the TinyStoriesV2 dataset is ~2.76 million sequences, so that's less than 5% of the available data), designed to resemble the architecture of GPT2/nanoGPT
        - ...
- `tests/`: a collection of pytest tests. Currently only `test_modules.py` is actually working; the rest are just first drafts written by Claude which have not yet been looked at
- `config.py`: all of the easily editable model and training settings
- `inference.py`: run it with multiple prompts and edit your sampling settings like so (a rough sketch of what these sampling flags do follows this list):

  ```
  python inference.py "insert_model_name_here" "prompt 1" "prompt 2" "prompt..." --temp=0.7 --min_p=0.05 --top_k=None --top_p=None --max_len=100 --show_tokens
  ```

- `model_comparison.ipynb`: open this notebook to compare different models against each other. Includes loss curve plots and top-k teacher-forcing accuracy rate
- `model_comparison.py`: functions for comparing models; used in `model_comparison.ipynb`
- `view_modules.ipynb`: creates easy printouts that let you follow the progression of tensor shapes, for demonstration & debugging purposes, of all the `LoggingModule`s in `modules/`. If you're building new modules for a novel architecture idea you have, this notebook will be of extreme value to you in debugging & visualization. Also includes visualizations of the learning rate scheduler and of how a given piece of text is tokenized with your chosen tokenizer
- `tools.py`: a variety of functions & classes that don't fit elsewhere and/or are used by more than one of the Jupyter notebooks. I should prolly find a better way to organize these
- `train.py`: first edit `config.py`, then run this file to train a model like so:

  ```
  python train.py --device=cuda
  ```
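The "automatically use Flash Attention" behavior mentioned for `attention.py` essentially comes down to PyTorch's fused `scaled_dot_product_attention`, which dispatches to Flash Attention kernels on a CUDA GPU when the shapes and dtypes allow. Below is a minimal standalone illustration of that idea, not the repo's actual module (which also handles multi-query heads and rotary embeddings):

```python
# Standalone illustration: fused attention on CUDA, manual math elsewhere.
import torch
import torch.nn.functional as F

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    if q.is_cuda:
        # fused kernel; uses Flash Attention when shapes/dtypes allow it
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # manual fallback: softmax(Q K^T / sqrt(d)) with a causal mask, then weight V
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    causal = torch.tril(torch.ones(q.size(-2), k.size(-2), device=q.device)).bool()
    scores = scores.masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```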
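Likewise, the config-switchable normalization in `norm.py` can be pictured roughly like the standalone sketch below; it's illustrative only, and the real option names and details live in `norm.py` and `config.py`:

```python
# Illustrative stand-in for the idea behind norm.py: one module, three norm types,
# chosen by a config string, with an optional affine (scale) parameter.
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self, dim: int, kind: str = "RMSNorm", affine: bool = True, eps: float = 1e-6):
        super().__init__()
        self.kind, self.eps = kind, eps
        self.weight = nn.Parameter(torch.ones(dim)) if affine else None  # optional affine layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.kind == "RMSNorm":        # rescale by root-mean-square; no re-centering
            x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        elif self.kind == "LayerNorm":    # subtract the mean, divide by the standard deviation
            x = (x - x.mean(-1, keepdim=True)) * torch.rsqrt(x.var(-1, unbiased=False, keepdim=True) + self.eps)
        elif self.kind == "CosineNorm":   # project each vector onto the unit sphere
            x = x / (x.norm(dim=-1, keepdim=True) + self.eps)
        else:
            raise ValueError(f"unknown norm type: {self.kind}")
        return x * self.weight if self.weight is not None else x
```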
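And for the sampling flags that `inference.py` exposes (`--temp`, `--min_p`, `--top_k`, `--top_p`), here's a rough sketch of how filters like these are typically applied to a single vector of logits; the repo's actual sampler may differ in details:

```python
# Hypothetical sampler illustrating temperature, min-p, top-k, and top-p (nucleus)
# filtering on one vector of logits. Not the repo's actual implementation.
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temp: float = 0.7, min_p: float | None = 0.05,
                      top_k: int | None = None, top_p: float | None = None) -> int:
    # logits: (vocab_size,) for a single sequence position
    probs = F.softmax(logits / max(temp, 1e-8), dim=-1)      # temperature scaling
    if min_p is not None:
        # drop tokens whose probability is below min_p * (probability of the best token)
        probs[probs < min_p * probs.max()] = 0.0
    if top_k is not None:
        # keep only the top_k most likely tokens
        kth_best = torch.topk(probs, min(top_k, probs.numel())).values[-1]
        probs[probs < kth_best] = 0.0
    if top_p is not None:
        # nucleus sampling: keep the smallest set of tokens whose total mass reaches top_p
        sorted_probs, sorted_idx = probs.sort(descending=True)
        outside_nucleus = (sorted_probs.cumsum(0) - sorted_probs) > top_p
        probs[sorted_idx[outside_nucleus]] = 0.0
    probs = probs / probs.sum()                               # renormalize what's left
    return torch.multinomial(probs, num_samples=1).item()
```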
- add useful stuff from Karpathy's nanoGPT
- make it distributed data parallelizable on cuda
- setup downloaded datasets to optionally download as token indices rather than as strings (makes loading them during training faster)
- add the benchmark test
- go back and make sure model checkpointing is working. At one point it was, but I've changed so much since then and haven't bothered using it, so I'd bet it's broken
- make dropout at different places optional (see display of gpt2 vs llama w/ dropout)
- use https://blog.eleuther.ai/mutransfer to set hyperparameters?
- add option to continually train pre-existing models & update their training data / hyperparameters accordingly
- add automated model comparison analysis by GPT4, like in the TinyStories paper, into `model_comparison.ipynb`
- add sparse/local/windowed attention mask options
- switch to flexAttention???
- take advantage of torchao???
- switch big dataset from fineweb to TxT360
- setup training batches and attention mask to concatenate more than one sequence back to back when the docs are shorter than the model's maximum context length
- implement kv caching based on the code in Karpathy's nanoLlama31
- add batched inference to `inference.py`
- figure out how to handle random seeding & exact replication versus shuffling a dataset. Need both exact replicability and the ability to do multiple runs to test variance across different seeds
- train new tokenizers
    - tinystoriesv2
    - fineweb
    - fineweb-edu
- make it possible to start from a tokenizer as a checkpoint to make a larger tokenizer
- SFT / IT / RLHF pipeline? lmao no shot
- decrease reliance on `logging.py` by creating tests for each module
- build out tests for other files (currently they're just a bunch of first drafts from Claude)
Other than the above TODO lists, appreciated contributions include:
- bug fixes
- adding more detailed comment explanations of what the code is doing
- general readability edits
- efficiency edits
- editing the code in `modules/` to take better advantage of the `LoggingModule`. This means splitting up each class into more and tinier functions
- training more models (especially if they're bigger than what's already here!)
Because I'm not super knowledgeable on how collaborating on git projects works and I tend to edit directly on the main branch, please reach out and communicate with me about any edits you plan to make so that I can avoid editing the same files. Click here to join my discord server
- guides on how to build miniature versions of popular models from scratch, with a hand-holding walkthrough of every single tensor operation: minGemma, minGrok, and minLlama3. Future versions of those kinds of guides I make will use this repo as a template
- my YouTube channel
- my other links