Two years after the release of ChatGPT, teachers and institutions are still struggling with assessment in the age of artificial intelligence (AI). Some have banned AI tools outright. Others have turned to AI tools only to abandon them months later, or have called for teachers to embrace AI to transform assessment.

The result is a hodgepodge of responses, leaving many kindergarten to Grade 12 and post-secondary teachers to make decisions about AI use that may not be aligned with the teacher next door, institutional policies, or current research on what AI can and cannot do.

One response has been to use AI detection software, which relies on algorithms to try to identify how a specific text was generated. AI detection tools are better than humans at spotting AI-generated work. But they remain an imperfect solution, and they do nothing to address the core validity problem: designing assessments where we can be confident in what students know and can do.

Teachers using AI detectors

A recent American report, based on nationally representative surveys of K-12 public school teachers and published by the Center for Democracy and Technology, found that 68 per cent of teachers use AI detectors. This practice has also found its way into some Canadian K-12 schools and universities.

AI detectors vary in their methods. Two common approaches are to check for qualities described as “burstiness,” referring to alternating short and long sentences (the way humans tend to write), and complexity (or “perplexity”). If an assignment does not have the typical markers of human-generated text, the software may flag it as AI-generated, prompting the teacher to begin an investigation for academic misconduct.
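To make “burstiness” concrete, here is a minimal sketch of how such a score might be computed. The function and the example passages are invented for illustration; this is not any vendor’s actual algorithm, and commercial detectors use far more sophisticated models.

```python
# Toy illustration of "burstiness": variation in sentence length,
# one signal AI detectors are described as using. A simplified
# sketch for illustration only, not any vendor's actual method.
import re
import statistics

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (in words).

    Human writing tends to mix short and long sentences (higher score);
    AI-generated text is often more uniform (lower score).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)

varied = ("I missed the bus. So I walked, through puddles and past the "
          "bakery that always smells like cinnamon, rehearsing my excuse. "
          "It didn't help.")
uniform = ("The bus was missed by me today. The walk to school was very "
           "long and tiring. The excuse I prepared was not very helpful.")

print(f"varied text:  {burstiness(varied):.2f}")   # ~0.98, more "human-like"
print(f"uniform text: {burstiness(uniform):.2f}")  # ~0.13, may be flagged
```

Perplexity works differently: a language model scores how predictable each word is, and text that is too predictable may be flagged. That requires a trained model, so it is not sketched here.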
To its credit, AI detection software is more reliable than human detection. Repeated studies across contexts show humans — including teachers and other experts — are incapable of reliably distinguishing AI-generated text, despite teachers’ confidence that they can spot a fake.

[Photo: Icons for the DeepSeek and ChatGPT apps on a smartphone screen in Beijing, Jan. 28, 2025. (AP Photo/Andy Wong)]

Accuracy of detectors varies

While some AI detection tools are unreliable or biased against English language learners, others seem to be more successful. However, what those success rates really signal for educators is questionable.

Turnitin boasts that its AI detector has a 99 per cent success rate, citing a false-positive rate of about one per cent (that is, the share of human-written submissions its tool incorrectly flags as AI-generated). This accuracy has been challenged by a recent study that found Turnitin only detected AI-generated text about 61 per cent of the time.

The same study suggested how different factors could shape accuracy results. For example, GPTZero’s accuracy may be as low as 26 per cent, especially if students edit the output an AI tool generates. Yet a different study of the same detector suggested a wide range of results (for example, between 23 and 82 per cent accuracy, or 74 and 100 per cent accuracy).

Considering numbers in context

The value of a percentage depends on its context. In most courses, being correct 99 per cent of the time is exceptional. It’s above the 95 per cent confidence level commonly used as the threshold for statistical significance in academic research. But a 99 per cent success rate would be atrocious in air travel. There, a 99 per cent success rate would mean around 500 accidents every day in the United States alone. That level of failure would be unacceptable.

To suggest what this could look like: at an institution like mine, the University of Winnipeg, about 10,000 students each submit multiple assignments — we could ballpark five, for argument’s sake — in around five courses every year. That works out to about 250,000 assignments every year (10,000 × 5 × 5). There, even a 99 per cent success rate means roughly 2,500 failures. That’s 2,500 false positives where students did not use ChatGPT or other tools, but the AI detection software flags them for possible use of AI, potentially initiating hours of investigative work for teachers and administrators alongside stress for students who may be falsely accused of cheating.

Time wasted investigating false positives

While AI detection software merely flags possible problems, we’ve already seen that humans are unreliable detectors. We cannot tell which of these 2,500 assignments are false positives, meaning cheaters will still slip through the cracks and precious teacher time will be wasted investigating innocent students who did nothing wrong.

This is not a new problem. Cheating was a major concern long before ChatGPT. Ubiquitous AI has merely shone a spotlight on a long-standing validity problem. When students can plagiarize, hire contract cheaters, rely on ChatGPT or have their friend or sister write the paper, relying on take-home assessments written outside class time without any teacher oversight is indefensible. I cannot presume that such forms of assessment represent the student’s learning, because I cannot reliably discern whether the student actually wrote them.

Need to change assessment

The solution to taller cheating ladders is not taller walls. The solution is to change how we are assessing — something classroom assessment researchers have been advocating since long before the onset of AI.

Just as we don’t spend thousands of dollars on “did-their-sister-write-this” detectors, schools should not rest easy simply because AI detection companies have a product to sell. If educators want to make valid inferences about what students know and can do, they need assessment practices that emphasize ongoing formative assessment (like drafts, works-in-progress and repeated observations of student learning). These need to be rooted in authentic contexts relevant to students’ lives and their learning, and to treat comprehensive academic integrity as a shared responsibility of students, teachers and system leaders — not just a mantra of “don’t cheat, and if we catch you we will punish you.”

Let’s spend less on flawed detection tools and more on supporting teachers to develop their assessment capacity across the board.

Michael Holden does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.