Explore the Multimodal “Aha Moment” on a 2B Model
reinforcement-learning reasoning r1 post-training multimodal deepseek deepseek-r1 grpo deepseek-r1-zero r1-zero multimodal-journey multimodal-r1
Updated Feb 28, 2025 - Python