Neural Digest
Research

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

ArXiv CS.AI · 5h ago
AI Summary

Researchers document a critical flaw in safety-trained language models: they indiscriminately refuse requests to circumvent rules, whether those rules are legitimate, unjust, or absurd. This 'blind refusal' represents a failure of moral reasoning that could limit AI usefulness in scenarios requiring ethical judgment about rule validity.

AI models refuse all rule-breaking requests, even when the rules are unjust or absurd.

This summary was AI-generated. Neural Digest is not liable for the accuracy of source content.
Read full article on ArXiv CS.AI