Abstract
Grading and providing personalized feedback on short-answer questions is time-consuming. Professional incentives often push instructors to rely on multiple-choice assessments instead, reducing opportunities for students to develop critical thinking skills. Using large language model (LLM) assistance, we augment the productivity of instructors grading short-answer questions in large classes. Through a randomized controlled trial across four undergraduate courses and almost 300 students in 2023/2024, we assess the effectiveness of AI-assisted grading and feedback relative to human grading. Our results demonstrate that AI-assisted grading can approximate the grading and personalized feedback an instructor would provide in a small class.