Neural Digest
Research

Fair Outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

ArXiv CS.AI · 1d ago
AI Summary

Researchers discovered that instruction-tuned LLMs can exhibit behavioral fairness in high-stakes decisions like mortgage underwriting while maintaining biased associations internally. Critically, these suppressed biases may have asymmetric causal effects across demographic groups, raising concerns about hidden discrimination even when outputs appear fair.

Key Takeaways

  • Fair-appearing model outputs can mask biased internal representations in high-stakes financial decisions.
  • Suppressed biases retain causal potency: they can still influence model behavior despite surface-level fairness.
  • Bias effects are asymmetric across demographic groups, suggesting some populations face greater hidden discrimination risks.

Language models show fair outputs while hiding biased internal representations that may still influence decisions.

Why It Matters

This research exposes a critical gap in AI fairness evaluation. As language models increasingly make consequential decisions affecting people's lives, surface-level fairness metrics may provide false confidence while discrimination persists through internal representations. Understanding these hidden mechanisms is essential for developing truly fair AI systems and establishing stronger oversight for high-stakes applications.

FAQ

Can models with fair outputs still discriminate?
Yes, according to this research. Models can display behavioral fairness while retaining biased internal representations that causally affect outputs, and the strength of those effects can differ across demographic groups.
Why does asymmetry in bias matter?
Asymmetric causal potency means the same biased representations may harm some demographic groups more than others, creating inequitable outcomes that standard output-level fairness tests can miss.
This summary was AI-generated. Neural Digest is not liable for the accuracy of source content.
Read full article on ArXiv CS.AI