“A new study reveals that chain-of-thought reasoning in models like DeepSeek-R1 amplifies position bias in multiple-choice questions, contrary to the common assumption that extended reasoning reduces heuristic biases. The finding suggests that longer reasoning trajectories correlate with stronger positional preferences, challenging the premise that more thinking automatically improves reasoning quality.”
Key Takeaways
- Position bias in multiple-choice QA scales with the length of reasoning trajectories in reasoning-capable models.
- Chain-of-thought reasoning doesn't reduce shallow heuristics as commonly assumed; it can amplify certain biases.
- The study tests thirteen reasoning-mode configurations, including DeepSeek-R1 distilled models, and finds systematic bias patterns across them.
Reasoning models exhibit stronger position bias as their thinking traces grow longer.
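To make "position bias" concrete, here is a minimal sketch of one way it could be quantified; it is not the study's methodology. It assumes each trial records which option position the model picked and how many tokens its reasoning trace used, and that correct answers are balanced across positions (for example, by cyclically permuting the options), so an unbiased model would pick each position equally often. The names `Trial`, `position_bias`, and `bias_by_length`, the total-variation-distance metric, the length threshold, and the toy data are all hypothetical, illustrative choices.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trial:
    predicted_position: int  # index of the option the model chose (0 = A)
    reasoning_tokens: int    # length of the model's reasoning trace


def position_bias(trials, num_options):
    """Total variation distance between the distribution of chosen positions
    and a uniform distribution over options (hypothetical metric).

    0.0 means no positional preference; the maximum, 1 - 1/num_options,
    means every answer landed on the same position. Assumes gold answers
    are balanced across positions, so uniform is the unbiased baseline.
    """
    if not trials:
        return 0.0
    counts = Counter(t.predicted_position for t in trials)
    n = len(trials)
    uniform = 1.0 / num_options
    return 0.5 * sum(abs(counts.get(k, 0) / n - uniform) for k in range(num_options))


def bias_by_length(trials, num_options, threshold):
    """Compare bias for short vs. long reasoning traces (split at threshold)."""
    short_trials = [t for t in trials if t.reasoning_tokens < threshold]
    long_trials = [t for t in trials if t.reasoning_tokens >= threshold]
    return (position_bias(short_trials, num_options),
            position_bias(long_trials, num_options))


# Synthetic toy data (not from the study): short traces spread answers evenly,
# long traces always pick option A.
toy = [Trial(p, 100) for p in range(4)] + [Trial(0, 900) for _ in range(4)]
short_bias, long_bias = bias_by_length(toy, num_options=4, threshold=500)
print(short_bias, long_bias)  # 0.0 0.75
```

Splitting by a single length threshold keeps the sketch simple; a real analysis would more likely bin trace lengths or regress bias on length across many questions and permutations.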
Why It Matters
This research has implications for deploying AI in high-stakes applications such as standardized testing and educational assessment. If reasoning models favor certain answer positions more strongly the longer they think, their answers depend partly on how the options are ordered rather than on their content alone, which raises concerns about fairness and reliability. Understanding these biases is crucial for developers building trustworthy AI systems and for organizations that rely on these models to make important decisions.



