“Researchers propose a new framework that distinguishes capability elicitation (increasing the probability of behaviors a model can already produce) from capability creation (enabling fundamentally new model capabilities) during post-training. The distinction challenges the conventional wisdom that supervised fine-tuning merely imitates while reinforcement learning discovers, and offers a more nuanced view of how training procedures actually improve language models.”
Key Takeaways
- The current post-training debate casts SFT as imitation and RL as discovery, an oversimplification that misses crucial nuances.
- The key question is whether training increases the probability of behaviors the model can already produce or fundamentally changes what the model is capable of.
- A free-energy perspective provides a framework for distinguishing elicitation from creation in post-training procedures.
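
To make the elicitation side of that distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' formulation: it uses the standard KL-regularized objective, whose optimum reweights a base distribution as `base(y) * exp(reward(y) / beta)` and whose free energy is `-beta * log(Z)`. The three "behaviors", their probabilities, the rewards, and the names `tilt`, `base`, and `rewards` are all hypothetical.

```python
import math


def tilt(base_probs, rewards, beta):
    """Reweight a base distribution toward high-reward behaviors.

    Returns (tilted_probs, free_energy), where
    tilted_probs[y] is proportional to base_probs[y] * exp(rewards[y] / beta)
    and free_energy = -beta * log(Z), with Z the normalizing constant.
    """
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    tilted = [w / z for w in weights]
    return tilted, -beta * math.log(z)


# Hypothetical base model: behavior 2 has zero probability, i.e. the
# base model never produces it.
base = [0.90, 0.10, 0.0]
# Hypothetical rewards: behavior 2 would be the best, if it were reachable.
rewards = [0.0, 1.0, 5.0]

tilted, free_energy = tilt(base, rewards, beta=0.5)

# Elicitation: the rare but existing behavior 1 becomes more likely.
print(tilted[1] > base[1])
# No creation: a behavior outside the base model's support stays at zero,
# however large its reward.
print(tilted[2] == 0.0)
```

In this toy setting, pure reweighting can only amplify behaviors the base model already assigns nonzero probability. Anything that counts as capability creation would have to change the support itself, which this sketch deliberately does not model.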
Post-training may do more than imitate: in some cases it changes what AI models can actually do.
Why It Matters
Understanding whether post-training elicits or creates capabilities is fundamental to improving AI development practices and setting realistic expectations for what different training methods achieve. This distinction directly impacts how researchers design training procedures, allocate computational resources, and evaluate model improvements. The framework could reshape post-training research methodology and help practitioners make better decisions about which techniques to employ.



