Project
Automated Scoring System

Client
English Literacy Platform
Role
Instructional Designer

Overview
In preparation to scale our product, I researched how to train machine learning models to accurately score writing assessments. My work resulted in a comprehensive product requirements document (PRD) for a system that could efficiently process student writing samples, automatically identify and label exempt responses, and generate scores across our predefined categories.

Year
2024

PROBLEM

Scoring a single assessment manually takes a human scorer approximately 10-15 minutes. For a large-scale assessment involving 1,000 student writing samples, this would require roughly 170-250 hours of dedicated scoring time. This creates significant bottlenecks in delivering timely feedback, limits the number of students that can be assessed, and represents an inefficient use of the content team's time. Manual scoring also introduces potential inconsistencies in evaluation and prevents rapid, scalable reporting of student writing performance.
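The time estimate above follows from quick arithmetic (figures are from the problem statement; the variable names are mine):

```python
# Manual scoring throughput, using the figures above.
samples = 1_000
minutes_low, minutes_high = 10, 15  # minutes per sample for a human scorer

hours_low = samples * minutes_low / 60    # lower bound
hours_high = samples * minutes_high / 60  # upper bound, the ~250 hours cited
print(f"{hours_low:.0f}-{hours_high:.0f} hours")  # prints "167-250 hours"
```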

Key objectives for the Automated Scoring System were as follows:

  • Develop ML models capable of precise multi-category scoring
  • Create a flexible system for processing and scoring writing samples
  • Design an output mechanism that could generate clear, actionable assessment data
  • Provide a scalable solution that could potentially handle tens of thousands of student assessments

RESEARCH


To prepare to write this PRD, I proactively expanded my machine learning knowledge, studying model development through texts such as François Chollet's "Deep Learning with Python." Simultaneously, I pursued online coursework in programming to deepen my technical understanding.

My goal was to bridge the gap between technical complexity and practical application, ensuring the PRD would be comprehensible and actionable for both the content and engineering teams. This approach allowed me to craft a document that was technically rigorous yet accessible, translating complex ML concepts into clear, strategic requirements that could guide potential system implementation.

RESULTS
The PRD outlined a robust system capable of:

  • Accepting inputs as CSV files containing student writing samples
  • Leveraging advanced algorithms and AI/plagiarism detection to identify and tag exempt responses with exemption codes
  • Evaluating responses across five categories using NLP-based models
  • Outputting two CSV files: one listing exempt responses with their exemption codes, and one listing scoreable responses with raw scores across the five categories
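As a rough sketch, the processing flow described in the PRD could look like the following. The exemption codes, category names, and scoring logic here are illustrative placeholders, not the actual PRD definitions; a real system would plug in AI/plagiarism detection and trained NLP models where the stubs are:

```python
import csv
import io

# Hypothetical exemption codes and rubric categories (placeholders).
EXEMPTION_BLANK = "EX-BLANK"
CATEGORIES = ["ideas", "organization", "voice", "word_choice", "conventions"]

def check_exemption(text):
    """Return an exemption code, or None if the response is scoreable.
    A real system would also run AI/plagiarism detection here."""
    if not text.strip():
        return EXEMPTION_BLANK
    return None

def score_response(text):
    """Stand-in for the NLP scoring models: returns a raw score per
    category. Here, a trivial length-based placeholder on a 1-5 scale."""
    raw = min(5, max(1, len(text.split()) // 20 + 1))
    return {cat: raw for cat in CATEGORIES}

def process(input_csv):
    """Split an input CSV of writing samples into the two output record
    sets: exempt responses and scored responses."""
    exempt, scored = [], []
    for row in csv.DictReader(io.StringIO(input_csv)):
        code = check_exemption(row["response"])
        if code:
            exempt.append({"student_id": row["student_id"],
                           "exemption_code": code})
        else:
            scored.append({"student_id": row["student_id"],
                           **score_response(row["response"])})
    return exempt, scored

sample = ("student_id,response\n"
          "1,\n"
          "2,This is a thoughtful essay about reading.\n")
exempt, scored = process(sample)
```

In this sketch student 1's blank response lands in the exempt file with its code, while student 2's response receives a raw score in each of the five categories; each list would then be written out as its own CSV.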

Though this system wasn’t implemented, planning it gave me invaluable insights into AI integration and NLP applications in education. It also offered a look into the complexities of creating scalable, future-proof solutions with AI-driven systems. This experience fueled my enthusiasm for exploring NLP and AI technologies further in my career.