TurningPoint AI
AIGC Research Group
About us
We are a small, dedicated research team focused on harnessing the power of multimodal reasoning. Our current focus includes:
- Multimodal
- O1-style reasoning
- Text-to-image generation
PROJECT HIGHLIGHT
VisualThinker-R1-Zero
DeepSeek R1 has demonstrated how Reinforcement Learning (RL) with well-designed rule-based rewards can enable a large language model to build unique reasoning capabilities autonomously. Since then, many researchers have attempted to extend this success to multimodal reasoning. However, most recent efforts struggle to reproduce the increasing response length and emergent thinking patterns exhibited by DeepSeek R1.
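As a rough illustration of what "rule-based rewards" means here (a minimal sketch, not the project's actual reward code; all names and weights below are hypothetical), such rewards typically combine a format check on the model's reasoning template with an exact-match accuracy check against the ground-truth answer:

```python
import re

# Hypothetical R1-style rule-based reward: a format term for following the
# <think>...</think><answer>...</answer> template plus an accuracy term.
THINK_ANSWER_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template, else 0.0."""
    return 1.0 if THINK_ANSWER_PATTERN.search(completion) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer exactly matches the ground truth, else 0.0."""
    match = THINK_ANSWER_PATTERN.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Combine the two rule-based terms; the 0.5 weight is purely illustrative."""
    return accuracy_reward(completion, ground_truth) + 0.5 * format_reward(completion)
```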
VisualThinker-R1-Zero is a replication of DeepSeek-R1-Zero training on small multimodal models. We are the first to successfully observe the emergent "aha moment" and increased response length on multimodal tasks. By applying GRPO to an unaligned 2B base model, we observe that the model autonomously develops self-verification and exhibits an emergent ability to "take another look" at the image and correct its mistakes.
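For intuition on the GRPO step mentioned above, here is a minimal sketch (not the project's implementation) of the core idea: rewards for a group of responses sampled for the same image-question pair are normalized against the group's own mean and standard deviation, yielding critic-free advantages.

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: each sampled response's reward is
    normalized by the mean and std of its group (responses drawn for the same
    image/question), removing the need for a learned value critic."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: rewards for 4 responses sampled for one image-question pair
print(grpo_advantages([1.5, 0.5, 0.0, 1.0]))
```

These advantages then weight the policy-gradient update on the base model's own sampled reasoning traces, which is what allows behaviors like self-verification to emerge without supervised fine-tuning.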