
Transformer Examples

This repository contains a collection of toy implementations and examples of key components from modern Transformer architectures. Each example is designed to be educational, well-documented, and easy to understand.

Components

| Component | Description | Paper |
| --- | --- | --- |
| Multi-Head Latent Attention (MLA) | An attention mechanism from DeepSeek V2 that compresses keys and values into a low-rank latent vector to shrink the KV cache, paired with decoupled Rotary Position Embeddings | DeepSeek V2 Technical Report |
| Multi-Head Attention | The original attention mechanism from the Transformer paper | Attention Is All You Need |
| Relative Multi-Head Attention | Attention with relative position representations | Self-Attention with Relative Position Representations |
| Absolute Positional Encoding | Sinusoidal positional encoding from the original Transformer | Attention Is All You Need |
| Rotary Position Embedding | Positional encoding that applies position-dependent rotations to queries and keys | RoFormer |
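
For orientation, here is a minimal sketch of two of the components above: sinusoidal absolute positional encoding and the scaled dot-product step at the heart of multi-head attention. This is illustrative code, not the repository's implementation; the function names are ours, and it assumes PyTorch is available (the requirements presumably install it).

import math
import torch

def sinusoidal_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    freqs = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                      * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * freqs)
    pe[:, 1::2] = torch.cos(pos * freqs)
    return pe

def scaled_dot_product_attention(q, k, v):
    # softmax(QK^T / sqrt(d_k)) V: the kernel that every attention variant
    # in the table refines with a different treatment of position or caching.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(1, 8, 16) + sinusoidal_encoding(8, 16)  # inject positions
out = scaled_dot_product_attention(x, x, x)             # self-attention
print(out.shape)                                        # torch.Size([1, 8, 16])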

Setup

  1. Create and activate a virtual environment (optional but recommended):

     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate

  2. Install the package in development mode:

     pip install -e .

  3. Install the additional dependencies:

     pip install -r requirements.txt
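
After these steps you can sanity-check the environment; the import below assumes the requirements file pulls in PyTorch, which the implementations appear to depend on:

python -c "import torch; print(torch.__version__)"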

Getting Started

Each component has its own directory with:

  • Implementation code
  • A Jupyter notebook with examples and visualizations

To run a notebook:

  1. Make sure Jupyter is installed:

     pip install jupyter

  2. Start Jupyter:

     jupyter notebook

  3. In your browser, navigate to the component you want to explore (e.g., attention/mla_attention.ipynb).
  4. Click on the notebook to open it.
  5. You can run cells individually by pressing Shift+Enter or run all cells from the Cell menu.

For example, to explore Multi-Head Latent Attention from DeepSeek:

cd attention
jupyter notebook mla_attention.ipynb
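
If you would rather execute a notebook non-interactively, nbconvert (installed alongside Jupyter) can run it from the command line and save an executed copy (by default to mla_attention.nbconvert.ipynb):

jupyter nbconvert --to notebook --execute mla_attention.ipynb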

Contributing

Contributions are welcome! If you'd like to add a new component or improve an existing one, please feel free to submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
