Project: Lexical Diversity
Client: English Literacy Platform
Role: Instructional Designer
Overview: During this project, I conducted research into computational metrics for evaluating lexical diversity and sentence complexity. New scoring categories, along with the introduction of subscores, enhanced our scoring accuracy and allowed us to provide students with more meaningful insights into their English proficiency.
Year: 2023
PROBLEM
In our assessments, students write paragraphs of approximately 30-60 words in response to a prompt. Initially, we scored writing samples on a 0-20 scale across five categories:
Holistic Quality
Genre Elements
Correct Word Sequence
Readability
Complex Words
While the first three categories were scored accurately by human raters, the last two (Readability and Complex Words) relied on computational methods that introduced significant inaccuracies. Both the readability calculations, which used the Flesch-Kincaid formula, and the complex-word percentages produced unreliable results for short text samples.
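To illustrate why a readability formula is unstable on samples this short, here is a minimal sketch. It assumes the grade-level variant of Flesch-Kincaid (the source does not say which variant was used), and the word, sentence, and syllable counts are invented for illustration, not taken from student data.

```python
def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical 40-word sample with 56 syllables (illustrative numbers only).
# The only difference between the two calls is whether the student
# punctuated the paragraph as three sentences or as two.
as_three_sentences = fk_grade(words=40, sentences=3, syllables=56)
as_two_sentences = fk_grade(words=40, sentences=2, syllables=56)

print(round(as_three_sentences, 1))  # 6.1
print(round(as_two_sentences, 1))    # 8.7
```

In this example a single missing period moves the result by about 2.6 grade levels, because each sentence carries a large share of the words-per-sentence term when the whole sample is only a few sentences long.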
Our research aimed to identify and integrate alternative metrics to assess the readability and sophistication of a writing sample. We also researched options for scoring scales that would allow students to see more growth on their assessment reports.
RESEARCH
Our objective for this research was to find metrics to assess a student's:
Lexical Profile: Vocabulary usage, variety, and accuracy
Sentence Variety and Accuracy: The ability to construct clear, varied sentence structures
Additionally, metrics must be:
Easy and quick to determine using rubrics or computational methods
Less susceptible to inaccuracies due to short text length
LEXICAL PROFILE (LP)
After extensive research into lexical diversity measures, I selected the Measure of Textual Lexical Diversity (MTLD) as one half of the LP score because it is resistant to variations in text length. For lexical sophistication, the other half of the score, I averaged two distinct metrics: the Academic Word List (AWL) and the English Vocabulary Profile (EVP).
Lexical Profile = Lexical Diversity + Lexical Sophistication
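For readers unfamiliar with MTLD, the following is a minimal sketch of the commonly described algorithm, not the tool or code used in this project. It walks through the text tracking the type-token ratio (TTR), counts a "factor" each time the TTR drops below 0.72 (the default threshold in the published measure), credits a partial factor for the leftover words, and averages a forward and a backward pass. The function names and the fallback for texts in which no word repeats are assumptions made for this sketch.

```python
def _mtld_one_pass(tokens, threshold=0.72):
    """One directional MTLD pass: text length divided by the factor count."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count < threshold:
            # TTR fell below the threshold: close this factor and start over.
            factors += 1
            types.clear()
            count = 0
    if count:
        # Leftover segment: credit a partial factor based on how far
        # its TTR has fallen from 1.0 toward the threshold.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    # Sketch-only convention (an assumption): if no factor was ever earned,
    # e.g. every word is unique, fall back to the text length.
    return len(tokens) / factors if factors else float(len(tokens))


def mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes over lowercased tokens."""
    tokens = [t.lower() for t in tokens]
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


sample = "the cat saw the cat".split()
print(mtld(sample))
```

Because the score is the average number of words needed for the TTR to fall to a fixed level, rather than a ratio over the whole text, it is less sensitive to text length than a plain TTR.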
AWL Characteristics
Percentage of words from Averil Coxhead's academic word families
Retrieved using Text Inspector
Rounded to nearest tenth
Typical range: 0-10%
Translated to 0-9 scale using a piecewise function
EVP Characteristics
Percentage of unique words at B1 CEFR level
Uses UK vocabulary list
Retrieved using Text Inspector
Rounded to nearest tenth
Typical range: 0-20%
Translated to 0-9 scale using a piecewise function
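The two conversions above can be sketched as a step-style piecewise function followed by an average. The actual breakpoints are not given here, so the cut points below are evenly spaced placeholders chosen only to span the typical ranges (0-10% for AWL, 0-20% for EVP). The names `AWL_CUTS`, `EVP_CUTS`, `to_scale`, and `lexical_sophistication` are hypothetical.

```python
from bisect import bisect_right

# Placeholder cut points in percent. These are NOT the platform's real values.
AWL_CUTS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
EVP_CUTS = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]


def to_scale(pct: float, cuts: list[float]) -> int:
    """Map a percentage to 0-9: the score is the number of cut points
    at or below the percentage, after rounding to the nearest tenth."""
    return bisect_right(cuts, round(pct, 1))


def lexical_sophistication(awl_pct: float, evp_pct: float) -> float:
    """Average of the scaled AWL and EVP scores (0-9)."""
    return (to_scale(awl_pct, AWL_CUTS) + to_scale(evp_pct, EVP_CUTS)) / 2


print(to_scale(2.5, AWL_CUTS))             # 2
print(to_scale(10.0, EVP_CUTS))            # 5
print(lexical_sophistication(2.5, 10.0))   # 3.5
```

With nine cut points, any percentage at or above the highest cut maps to 9, so unusually high values are capped rather than extending the scale.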
SENTENCE VARIETY & ACCURACY (SVA)
In parallel, colleagues developed a 0-9 rubric that evaluated:
Clarity of sentence structures
Variety of clause arrangements
Effectiveness of sentence structures in conveying meaning
Sentence complexity and intentional communication
IMPACT ON DESIGN
We implemented several key changes to make our scoring scale and reports more student-friendly:
Scoring Scale:
Version 1: Categories scored 0-4, totaling 20 possible points
Version 2: Categories scored 0-18, totaling 90 possible points
Contextualization:
Version 1: Scores directly compared to CEFR
Version 2: Replaced direct CEFR correlation with standardized categories specific to the platform's curriculum
Subscores:
Version 1: Feedback based on total score
Version 2: Three subscores displayed numerically, allowing performance-based customization
RESULTS
By implementing advanced linguistic analysis techniques such as the Measure of Textual Lexical Diversity (MTLD) and developing a comprehensive rubric, we significantly improved the precision of our English proficiency assessments. The new scoring system allowed us to provide students with more detailed feedback about their writing skills and linguistic development.