Austin / Community & Society
Published on April 10, 2024
Texas Shifts to AI Grading for STAAR Tests Amid Educator Concerns
Source: Unsplash / Annie Spratt

Students in Texas are facing a new reality as their written responses on STAAR tests will now be graded by computers, according to The Texas Tribune. Kids taking the State of Texas Assessment of Academic Readiness this week will have their open-ended answers assessed by an "automated scoring engine" instead of human graders.

This move by the Texas Education Agency (TEA) follows last year's redesign of the STAAR test, which placed greater emphasis on constructed-response items over multiple-choice questions. Jose Rios, director of student assessment at the TEA, said the change was necessary given how time-intensive grading these open-ended answers can be. The natural language processing technology behind the scoring engine, similar to that powering AI chatbots such as GPT-4, is expected to save the state approximately $15-20 million a year.

While the TEA has streamlined its process, needing fewer than 2,000 human scorers this year compared with last year's 6,000, some educators remain skeptical of the shift. "There ought to be some consensus about, hey, this is a good thing, or not a good thing, a fair thing or not a fair thing," Kevin Brown, executive director of the Texas Association of School Administrators, told CBS Austin. In response to the trepidation, Chris Rozunick, division director for assessment development at the TEA, gave assurances that the automation would not stray into autonomous decision-making.

Despite those assurances, a pilot run in December 2023 raised eyebrows: Lori Rapp, superintendent of Lewisville ISD, reported a "drastic increase" in zeroes on constructed responses and was unable to pinpoint the exact cause. The concerns go beyond errors to whether computers can recognize the nuances of student creativity and unique expression in writing, a fear echoed by many in the community. Still, the TEA says a quarter of the responses will be rescored by humans, and any answers with "low confidence" scores or unrecognizable content will be redirected to human scorers.

For those worried about inaccuracies in grading, the TEA has set a $50 fee for rescoring requests, refunded if the new score is higher than the original. The stakes of the automated system are high: STAAR results significantly affect school accountability ratings and can trigger serious state interventions in underperforming schools. As the state wades into new waters with machine learning and artificial intelligence, the coming months may reveal the risks and rewards of such a leap into technology-driven education assessment.