The State of AI-Chatbot Detection

Turnitin thinks it can catch students who submit machine-generated prose. But for how long?

On April 4 of this year, the academic-services firm Turnitin activated software designed to catch a certain kind of student plagiarist. As has been widely discussed on the Martin Center’s website and elsewhere, the next frontier in academic dishonesty is “student” work that is actually machine-generated—by ChatGPT, Gippr AI, or one of the other artificially intelligent “bots” that will doubtless follow. Turnitin claims that it can now detect this artificial prose with 98-percent accuracy. Yet the most likely development may well be a machine-learning arms race, wherein chatbot developers vie with ostensible chatbot “detectors” for technological supremacy. Such a competition will almost surely not be won by the good guys.

The cracks in Turnitin’s software were evident from the beginning. In an April writeup for the Washington Post, tech columnist Geoffrey A. Fowler made the point that “detectors are being introduced before they’ve been widely vetted,” a hasty timeline that has resulted in a rash of false positives. Moreover, according to the same story, Turnitin’s new program occasionally errs in the opposite direction, letting slide material that has, in fact, been written by a machine.

For GPTZero, a competing AI-detection model, things are, if possible, even worse. As Redditors recently discussed to widespread amusement, the self-proclaimed “World’s #1 AI Detector” declared the Declaration of Independence to be robot-composed when asked. (No word on whether anyone has run Hillary Clinton’s 2016 stump speech through the same software.) If chatbot sleuths can’t pick Thomas Jefferson out of a lineup, how do they expect to catch the next generation of sophisticated AI models?

Increasingly, students themselves—and even robots!—have begun to boast that chatbot-abetted plagiarism is unpreventable. Last month, a Columbia University undergraduate named Owen Kichizo Terry took to the pages of the Chronicle of Higher Education to declare that current academic-integrity policies are “laughably naïve.” “It’s very easy,” Terry averred, “to use AI to do the lion’s share of the thinking while still submitting work that looks like your own.”

During my recent “conversation” with Gippr AI, a new right-leaning conversation bot, similar crowing was on display. “I am confident in my ability to produce material that ‘passes’ for human,” the bot declared, “and may even challenge Turnitin’s ability to identify it.”

Given the uncertain state of AI-prose detection, it is the rare professor indeed who will risk his career with a false accusation. As higher-ed commentators have discussed ad nauseam, the power dynamic between faculty and students has largely flipped in recent years, with the former cohort now living in fear of the latter. Were I to advise a student accused of AI plagiarism, I would tell him to deny, deny, deny while making full use of his university’s “appeals” processes. Believe me, the professor and college would likely back down long before any real discipline was meted out.

Of course, Turnitin’s software and competing products will improve over time. But here’s the rub: So will chatbots. Thus, the alternatives before colleges are to enter a costly, never-ending, and largely futile arms race or to radically rethink how student learning is assessed. The former may be more likely given the higher-ed sector’s resistance to change. The latter is wiser.

Turnitin and other chatbot detectors may win the occasional battle in the fight over academic honesty. But they will lose the war.

Graham Hillard is the managing editor of the James G. Martin Center for Academic Renewal.