deepseekv3-minimal

Creating the DeepSeek V3 model from scratch

Purpose

Learning the architecture of DeepSeek V3 can be challenging. Understanding it does not require the surrounding machinery, such as the full training pipeline or 8-bit floating point (FP8) optimizations. This repo leaves those out to keep things simple. It covers:

Multi Token Prediction
Mixture of experts with controllable number of active experts
Transformers (obviously)
Key-Value-Query compression
Basic training loops
Greedy, sampled, and MTP based text generation
Minimal code to make everything work
Not everything here follows the DeepSeek paper, other sources, or the open-source implementation.
This is purely a personal effort.
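
To illustrate the "controllable number of active experts" item above, here is a minimal sketch of top-k expert routing in plain Python. It is not taken from this repo: the names `route_top_k`, `moe_forward`, and `num_active` are hypothetical, the experts are toy functions, and it assumes simple softmax gating with renormalization over the selected experts, which may differ from what the repo or the DeepSeek V3 paper does.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, num_active):
    """Pick the num_active highest-scoring experts and renormalize their weights.

    Assumption: softmax gating, chosen here only for illustration.
    """
    if not 1 <= num_active <= len(router_logits):
        raise ValueError("num_active must be between 1 and the number of experts")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:num_active]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, num_active):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, num_active):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out

# Toy experts (hypothetical): each one scales the input by a different factor.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 1.5]

selected = route_top_k(router_logits, num_active=2)
output = moe_forward([1.0, -1.0], experts, router_logits, num_active=2)
print(selected)
print(output)
```

Changing `num_active` changes how many experts run per token, so compute per token scales with the number of active experts rather than the total number of experts.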

Slow training!

The architecture is not optimized and is very slow to train. For faster training, reduce the dataset size or make the model less complex. :)
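
As a rough sketch of the two suggestions above, the snippet below subsamples a dataset and shrinks a model configuration. Everything in it is an assumption for illustration: the `ModelConfig` fields, their default values, and the helpers `shrink` and `subsample` are hypothetical and do not come from this repo, so check the `config` folder for the actual settings.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical field names and defaults, not the repo's real config.
    hidden_size: int = 512
    num_layers: int = 8
    num_experts: int = 8
    num_active_experts: int = 2

def shrink(config, factor=2):
    """Return a smaller copy of the config, dividing the main sizes by factor."""
    return replace(
        config,
        hidden_size=max(1, config.hidden_size // factor),
        num_layers=max(1, config.num_layers // factor),
        # Never go below the number of experts that must be active per token.
        num_experts=max(config.num_active_experts, config.num_experts // factor),
    )

def subsample(samples, fraction, seed=0):
    """Return a reproducible random subset holding about `fraction` of the samples."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, int(len(samples) * fraction))
    return rng.sample(samples, k)

small_config = shrink(ModelConfig())
comments = [f"comment {i}" for i in range(1000)]  # stand-in for a real dataset
small_dataset = subsample(comments, fraction=0.1)
print(small_config)
print(len(small_dataset))
```

A fixed seed keeps the subset identical across runs, which makes timing comparisons between configurations easier to interpret.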

config
data
models
training
visualization
.gitignore
README.md
deepseek_arch.png
main.py
requirements.txt
run_text_generation.py
seeding.py
text_generation.py
trainable_params.py
training_metrics_yt.png

wajihullahbaig/deepseekv3-minimal
