r1
Here are 41 public repositories matching this topic...
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Updated Feb 19, 2025 - Python
Doge Family of Small Language Models
Updated Feb 27, 2025 - Python
SOTA RL fine-tuning solution for advanced math reasoning in LLMs
Updated Feb 23, 2025 - Python
Model Context Protocol server for DeepSeek's advanced language models
Updated Feb 13, 2025 - JavaScript
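Uses LangChain for task planning and builds conversational scenario resources for each subtask. An MCTS task executor then lets each subtask draw on the resources in its context and explore through self-reflection to reach its own best answer to the problem. This approach depends on the model's alignment preferences; for each preference we designed an engineering framework that applies a sampling strategy over the rewards the model assigns to its own candidate answers.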
Updated Feb 24, 2025 - Jupyter Notebook
Auto-generate fallback and meter display from existing group info in d&b audiotechnik's R1 and ArrayCalc software.
Updated Mar 7, 2024 - Python
Recreating the minimal training methods of DeepSeek-R1 for small language models.
Updated Feb 10, 2025 - Python
A multi-stage pipeline that enhances Qwen2.5 language models with DeepSeek Reasoner's chain-of-thought capabilities. Implements the DeepSeek-R1 methodology through cold-start SFT, reasoning-oriented RL, rejection sampling, and optional model distillation.
Updated Jan 24, 2025 - Python
Explore the Multimodal “Aha Moment” on 2B Model
Updated Feb 27, 2025 - Python
A comprehensive collection of process reward models.
Updated Feb 24, 2025