DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

AI Summary

“DisaBench is a new evaluation framework co-created with people with disabilities to assess how language models perform across disability-related scenarios. The research introduces a taxonomy of twelve disability harm categories, an evaluation methodology using benign and adversarial prompts across seven life domains, and a dataset of 175 prompts. This addresses a critical gap where general-purpose safety benchmarks fail to adequately evaluate disability-specific risks.”

Key Takeaways

DisaBench contains twelve disability harm categories developed with disabled people and red teaming experts.
Framework evaluates LMs across seven life domains using paired benign and adversarial prompts.
Dataset includes 175 prompts with human-annotated labels across 525 prompt-response pairs.
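
As a rough illustration of the numbers above, the dataset's shape might be modeled as follows. This is a minimal sketch, not the authors' schema: the class names, field names, and the boolean `harmful` label are hypothetical, and the summary does not say how 175 prompts become 525 pairs. One reading consistent with the figures is three responses per prompt, which the helper below checks arithmetically.

```python
from dataclasses import dataclass
from typing import Literal

# Counts stated in the summary.
NUM_HARM_CATEGORIES = 12
NUM_LIFE_DOMAINS = 7
NUM_PROMPTS = 175
NUM_PAIRS = 525


@dataclass(frozen=True)
class PromptRecord:
    """One evaluation prompt. All field names are hypothetical."""
    prompt_id: int
    domain_index: int    # placeholder index in [0, NUM_LIFE_DOMAINS)
    category_index: int  # placeholder index in [0, NUM_HARM_CATEGORIES)
    style: Literal["benign", "adversarial"]
    text: str

    def __post_init__(self) -> None:
        # Reject indices outside the stated taxonomy sizes.
        if not 0 <= self.domain_index < NUM_LIFE_DOMAINS:
            raise ValueError("domain_index out of range")
        if not 0 <= self.category_index < NUM_HARM_CATEGORIES:
            raise ValueError("category_index out of range")


@dataclass(frozen=True)
class AnnotatedPair:
    """A prompt-response pair with a human label (label shape is assumed)."""
    prompt: PromptRecord
    response: str
    harmful: bool  # hypothetical binary label; the real scheme may differ


def responses_per_prompt(num_pairs: int, num_prompts: int) -> int:
    """Return pairs per prompt, assuming every prompt has the same number."""
    if num_prompts <= 0 or num_pairs % num_prompts != 0:
        raise ValueError("pairs do not divide evenly across prompts")
    return num_pairs // num_prompts
```

Under the even-split assumption, `responses_per_prompt(NUM_PAIRS, NUM_PROMPTS)` gives 3; the original paper is the place to confirm what those three responses correspond to.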

The benchmark targets disability harms that general-purpose AI safety benchmarks do not adequately evaluate.

Why It Matters

Disability communities are frequently overlooked in AI safety research, so models can cause real harm to disabled users without any benchmark flagging it. DisaBench addresses this gap by centering disabled voices in AI evaluation and by establishing a participatory methodology that could become standard practice. The work offers a way to test future language models for accessibility and disability-specific risks, not just general safety concerns.

FAQ

Why is DisaBench necessary when general safety benchmarks already exist?

General-purpose safety benchmarks do not adequately evaluate disability-specific harms, so a model can pass standard safety tests and still cause disability-related harm to the people who use it.

How were the twelve disability harm categories determined?

The taxonomy was co-created through participatory research with people with disabilities and red teaming experts, ensuring disabled communities directly shaped what harms are evaluated.

This summary was AI-generated. Neural Digest is not liable for the accuracy of source content.

Read full article on arXiv cs.AI

