“DisaBench is a new evaluation framework co-created with people with disabilities to assess how language models perform across disability-related scenarios. The research introduces a taxonomy of twelve disability harm categories, an evaluation methodology using benign and adversarial prompts across seven life domains, and a dataset of 175 prompts. This addresses a critical gap where general-purpose safety benchmarks fail to adequately evaluate disability-specific risks.”
Key Takeaways
- DisaBench contains twelve disability harm categories developed with disabled people and red teaming experts.
- The framework evaluates language models across seven life domains using paired benign and adversarial prompts.
- The dataset includes 175 prompts, with human-annotated labels across 525 prompt-response pairs.
The new benchmark surfaces disability harms that general-purpose AI safety tests fail to adequately evaluate.
Why It Matters
Disability communities are frequently overlooked in AI safety research, which leads to models that cause real harms to disabled users. DisaBench addresses this gap by centering disabled voices in AI evaluation and establishing a participatory methodology that could become standard practice. This work helps ensure that future language models are tested for disability-specific risks, including accessibility, and not just for general safety concerns.
FAQ
Why is DisaBench necessary when general safety benchmarks already exist?
General-purpose safety benchmarks don't adequately evaluate disability-specific harms. This leaves vulnerable populations at risk from models that pass standard safety tests but still produce disability-related harms.
How were the twelve disability harm categories determined?
The taxonomy was co-created through participatory research with people with disabilities and red teaming experts, ensuring disabled communities directly shaped what harms are evaluated.



