
Commit c6c2b35: Add Mistral Large 123B (#1673)

Parent: ef9647c

File tree: 3 files changed (+26, -3 lines)

- README.md
- litgpt/config.py
- tutorials/download_model_weights.md

README.md (2 additions, 2 deletions)

@@ -99,7 +99,7 @@ Every model is written from scratch to maximize performance and remove layers of
 | Llama 3 & 3.1 | 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
 | Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
 | Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
-| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
+| Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
 | CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
 | Gemma 2 | 2B, 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf) |
 | Phi 3 | 3.8B | Microsoft | [Abdin et al. 2024](https://arxiv.org/abs/2404.14219) |
@@ -129,7 +129,7 @@ Every model is written from scratch to maximize performance and remove layers of
 | Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
 | MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama) |
 | Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
-| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
+| Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
 | Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
 | OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
 | Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |

litgpt/config.py (20 additions, 0 deletions)

@@ -1756,6 +1756,26 @@ def norm_class(self) -> Type:
         intermediate_size=14336,
     )
 )
+configs.append(
+    # https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/blob/main/config.json
+    dict(
+        name="Mistral-Large-Instruct-2407",
+        hf_config=dict(org="mistralai", name="Mistral-Large-Instruct-2407"),
+        padded_vocab_size=32768,
+        block_size=32768,
+        n_layer=88,
+        n_head=96,
+        n_embd=12288,
+        n_query_groups=8,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        norm_class_name="RMSNorm",
+        norm_eps=1e-05,
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=28672,
+    )
+)
 
 
 ############
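
The new entry mirrors the published Hugging Face config: 88 transformer layers, 96 attention heads over a 12288-dimensional embedding (so a per-head size of 128), 8 key/value groups for grouped-query attention, and a 28672-wide LLaMA-style MLP. As a quick sanity check (hypothetical, not part of this commit), the config should now resolve by name through litgpt's `Config.from_name`:

```python
# Hypothetical sanity check, not part of this commit: confirm the new entry
# resolves by name and matches the Hugging Face config values above.
from litgpt.config import Config

config = Config.from_name("Mistral-Large-Instruct-2407")

assert config.n_layer == 88
assert config.n_head == 96
assert config.n_embd == 12288
assert config.n_query_groups == 8             # grouped-query attention: 96 heads share 8 KV groups
assert config.n_embd // config.n_head == 128  # per-head dimension
assert config.intermediate_size == 28672      # LLaMAMLP hidden width
```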

tutorials/download_model_weights.md (4 additions, 1 deletion)

@@ -23,7 +23,7 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
 | Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
 | MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama)
 | Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
-| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
+| Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
 | Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
 | OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
 | Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
@@ -136,7 +136,10 @@ microsoft/Phi-3-mini-4k-instruct
 mistralai/mathstral-7B-v0.1
 mistralai/Mistral-7B-Instruct-v0.1
 mistralai/Mistral-7B-Instruct-v0.2
+mistralai/Mistral-7B-Instruct-v0.3
 mistralai/Mistral-7B-v0.1
+mistralai/Mistral-7B-v0.3
+mistralai/Mistral-Large-Instruct-2407
 mistralai/Mixtral-8x7B-Instruct-v0.1
 mistralai/Mixtral-8x7B-v0.1
 NousResearch/Nous-Hermes-13b
