TurningPoint AI
AIGC Research Group
About us
We are a small, dedicated research team focused on harnessing the power of multimodal reasoning. Our current focus includes:
- Multimodal
- O1-style reasoning
- Text-to-image generation
PROJECT HIGHLIGHT
VisualThinker-R1-Zero
DeepSeek R1 has demonstrated how Reinforcement Learning (RL) with well-designed rule-based rewards can enable a large language model to build unique reasoning capabilities autonomously. Since then, many researchers have attempted to extend this success to multimodal reasoning. However, most recent efforts struggle to reproduce the increasing response length and emergent thinking patterns exhibited by DeepSeek R1.
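As a rough illustration of what "rule-based rewards" means here (a minimal sketch, not the project's actual reward code; all names and weights below are hypothetical), such rewards typically combine a format check on the model's reasoning template with an exact-match accuracy check against the ground-truth answer:

```python
import re

# Hypothetical R1-style rule-based reward: a format term for following the
# <think>...</think><answer>...</answer> template plus an accuracy term.
THINK_ANSWER_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template, else 0.0."""
    return 1.0 if THINK_ANSWER_PATTERN.search(completion) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer exactly matches the ground truth, else 0.0."""
    match = THINK_ANSWER_PATTERN.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Combine the two rule-based terms; the 0.5 weight is purely illustrative."""
    return accuracy_reward(completion, ground_truth) + 0.5 * format_reward(completion)
```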
VisualThinker-R1-Zero is a replication of DeepSeek-R1-Zero training on small multimodal models. We are the first to successfully observe the emergent "aha moment" and increased response length on multimodal tasks. By applying GRPO to an unaligned 2B base model, we observe that the model autonomously develops self-verification and exhibits an emergent ability to "take another look" at the image and correct its mistakes.
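For intuition on the GRPO step mentioned above, here is a minimal sketch (not the project's implementation) of the core idea: rewards for a group of responses sampled for the same image-question pair are normalized against the group's own mean and standard deviation, yielding critic-free advantages.

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: each sampled response's reward is
    normalized by the mean and std of its group (responses drawn for the same
    image/question), removing the need for a learned value critic."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: rewards for 4 responses sampled for one image-question pair
print(grpo_advantages([1.5, 0.5, 0.0, 1.0]))
```

These advantages then weight the policy-gradient update on the base model's own sampled reasoning traces, which is what allows behaviors like self-verification to emerge without supervised fine-tuning.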