Research

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom

ArXiv CS.AI · 5d ago
AI Summary

Researchers are exploring data augmentation and resampling techniques to improve how transformer-based models score student scientific explanations, particularly when identifying advanced reasoning. The work addresses a critical challenge in educational AI: imbalanced datasets in which certain rubric categories are underrepresented. Handling that imbalance is essential for fair and accurate automated assessment.

Key Takeaways

  • Study uses 1,466 student responses to a physical science assessment aligned with the NGSS
  • Addresses the class imbalance problem, where advanced reasoning categories are underrepresented in training data
  • Investigates augmentation and resampling strategies for transformer-based text classification models

New study tackles class imbalance in AI systems scoring student scientific explanations

Why It Matters

Automated educational scoring has significant potential to give students scalable, immediate feedback, but class imbalance can produce biased models that fail to recognize advanced reasoning. This research tackles a practical barrier to deploying AI scoring systems in real classrooms, with the goal that they assess all student performance levels accurately rather than favoring the most common response categories.

FAQ

What is class imbalance and why is it problematic for AI scoring?
Class imbalance occurs when some categories (like advanced reasoning) have fewer training examples than others. This causes AI models to perform poorly on underrepresented categories, leading to unfair or inaccurate assessments.
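To make the effect concrete, here is a minimal sketch using made-up labels (not the study's data, and the category names "basic" and "advanced" are illustrative assumptions). It shows how a scorer that always predicts the most common category can look accurate overall while never identifying the rare category.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of true `label` items that were predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds_for_label) / len(preds_for_label)

# Hypothetical, deliberately imbalanced labels: 90 common, 10 rare.
y_true = ["basic"] * 90 + ["advanced"] * 10

# A degenerate "model" that always predicts the majority category.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
adv_recall = recall(y_true, y_pred, "advanced")
print(f"accuracy={acc:.2f}, recall on 'advanced'={adv_recall:.2f}")
```

Overall accuracy hides the failure here, which is why per-category metrics such as recall are usually more informative on imbalanced rubric data.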
How do data augmentation and resampling help with this problem?
These techniques artificially increase the training data for underrepresented categories through methods like creating synthetic samples or oversampling, helping models learn to recognize all response types equally well.
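As an illustration of the oversampling idea, the sketch below duplicates randomly chosen minority-category responses until every category matches the largest one. This is plain random oversampling under assumed names (`random_oversample`, the toy responses, and the labels are all hypothetical); the summary does not say which specific strategies the study evaluated, so this should not be read as the authors' method.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until all classes
    have as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the texts by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

# Hypothetical toy data: 8 common responses, 2 rare ones.
texts = [f"response {i}" for i in range(10)]
labels = ["basic"] * 8 + ["advanced"] * 2

bal_texts, bal_labels = random_oversample(texts, labels)
counts = Counter(bal_labels)
print(counts)
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the real distribution of responses.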
This summary was AI-generated. Neural Digest is not liable for the accuracy of source content.