@@ -99,7 +99,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Llama 3 & 3.1 | 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
- | Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
+ | Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
| Gemma 2 | 2B, 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf) |
| Phi 3 | 3.8B | Microsoft | [Abdin et al. 2024](https://arxiv.org/abs/2404.14219) |
@@ -129,7 +129,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama) |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
- | Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
+ | Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
| Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |