
Commit 9ebbe82

awaelchli and rasbt authored
Add Llama 3.1 405B config (#1622)
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
1 parent 6c4eca0 commit 9ebbe82

File tree

5 files changed: +28 / -2 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Every model is written from scratch to maximize performance and remove layers of
 
 | Model | Model size | Author | Reference |
 |----|----|----|----|
-| Llama 3 & 3.1 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
+| Llama 3 & 3.1 | 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
 | Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
 | Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
 | Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |

litgpt/config.py

Lines changed: 21 additions & 0 deletions
@@ -877,6 +877,7 @@ def norm_class(self) -> Type:
         intermediate_size=14336,
         rope_base=500000,
     ),
+    # https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/config.json
     dict(
         name="Llama-3.1-8B{}",
         hf_config=dict(org="meta-llama", name="Meta-Llama-3.1-8B{}"),
@@ -913,6 +914,7 @@ def norm_class(self) -> Type:
         intermediate_size=28672,
         rope_base=500000,
     ),
+    # https://huggingface.co/meta-llama/Meta-Llama-3.1-70B/blob/main/config.json
     dict(
         name="Llama-3.1-70B{}",
         hf_config=dict(org="meta-llama", name="Meta-Llama-3.1-70B{}"),
@@ -931,6 +933,25 @@ def norm_class(self) -> Type:
         intermediate_size=28672,
         rope_base=500000,
     ),
+    # https://huggingface.co/meta-llama/Meta-Llama-3.1-405B/blob/main/config.json
+    dict(
+        name="Llama-3.1-405B{}",
+        hf_config=dict(org="meta-llama", name="Meta-Llama-3.1-405B{}"),
+        block_size=131072,
+        vocab_size=128000,
+        padded_vocab_size=128256,
+        n_layer=126,
+        n_head=128,
+        n_embd=16384,
+        n_query_groups=16,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        norm_class_name="RMSNorm",
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=53248,
+        rope_base=500000,
+    ),
 ]
 for c in llama_3:
     for kind in ("", "-Instruct"):
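
For context, a minimal sketch (not part of the diff) of how the new entry is meant to be used: `Config.from_name` in `litgpt/config.py` resolves a registered name to its hyperparameters, so the 405B architecture becomes addressable by name once this list entry exists. The printed values simply restate fields from the dict above.

```python
# Minimal sketch, assuming the existing Config.from_name lookup in litgpt/config.py.
from litgpt.config import Config

config = Config.from_name("Llama-3.1-405B")
print(config.n_layer, config.n_head, config.n_embd)     # 126 128 16384
print(config.n_query_groups, config.intermediate_size)  # 16 53248
```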

tests/test_model.py

Lines changed: 1 addition & 0 deletions
@@ -213,6 +213,7 @@ def test_against_original_open_llama_3b(device, dtype):
         {"name": "Llama-2-70b-chat-hf", "n_query_groups": 1},
         {"name": "Llama-3-8B"},
         {"name": "Llama-3-8B-Instruct"},
+        {"name": "Llama-3.1-405B", "n_query_groups": 4},
         {"name": "Llama-3.1-8B"},
         {"name": "Llama-3.1-8B-Instruct"},
     ],
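
The parametrization above only pins `n_query_groups`; the surrounding test then shrinks the other dimensions so the 405B layout can be compared against the Hugging Face implementation on CI hardware. A hedged sketch of that idea, with illustrative override values that are not taken from the test:

```python
# Hedged sketch: Config.from_name accepts keyword overrides, so a huge
# architecture can be scaled down for tests while keeping its GQA layout
# (n_head must stay divisible by n_query_groups).
from litgpt.config import Config
from litgpt.model import GPT

tiny = Config.from_name(
    "Llama-3.1-405B",
    n_layer=2,            # illustrative override (real model: 126)
    n_head=8,             # illustrative override (real model: 128)
    n_embd=32,            # illustrative override (real model: 16384)
    n_query_groups=4,     # the override added in the parametrization above
    intermediate_size=64, # illustrative override (real model: 53248)
)
model = GPT(tiny)  # small enough to build and run on CPU
```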

tests/test_prompts.py

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ def test_prompt_style_from_config():
         "Llama-2-70b-chat-hf",
         "Llama-3-8B-Instruct",
         "Llama-3-70B-Instruct",
+        "Llama-3.1-405B-Instruct",
         "Gemma-2b-it",
         "Gemma-7b-it",
         "FreeWilly2",

tutorials/download_model_weights.md

Lines changed: 4 additions & 1 deletion
@@ -17,7 +17,8 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
 | Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) |
 | Gemma 2 | 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf) |
 | Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
-| Llama 3 & 3.1 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
+| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
+| Llama 3.1 | 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
 | LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
 | Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
 | MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama)
@@ -117,6 +118,8 @@ meta-llama/Meta-Llama-3-70B
 meta-llama/Meta-Llama-3-70B-Instruct
 meta-llama/Meta-Llama-3-8B
 meta-llama/Meta-Llama-3-8B-Instruct
+meta-llama/Meta-Llama-3.1-405B
+meta-llama/Meta-Llama-3.1-405B-Instruct
 meta-llama/Meta-Llama-3.1-70B
 meta-llama/Meta-Llama-3.1-70B-Instruct
 meta-llama/Meta-Llama-3.1-8B
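
The new repo IDs above can be passed to `litgpt download` like any other entry in this list (the Meta repositories are gated, so a Hugging Face access token is required). A hedged sketch using the Python API instead of the CLI; note that the 405B checkpoint is on the order of 800 GB in bf16, so downloading and loading it needs correspondingly large storage and multi-GPU hardware:

```python
# Hedged sketch, assuming the litgpt.LLM convenience API; the smaller 8B
# checkpoint is used here because 405B will not fit on a single machine.
from litgpt import LLM

llm = LLM.load("meta-llama/Meta-Llama-3.1-8B-Instruct")  # downloads on first use (gated: needs an HF token)
print(llm.generate("What do llamas eat?"))
```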
