How a literary prize highlighted the growing challenge of AI use in writing worldwide

Criticism is not uncommon in the aftermath of a literary prize announcement. But the uproar after the British literary magazine Granta announced the regional winners of its annual short story prize last month was literary criticism of a different kind: allegations of the use of artificial intelligence (AI).

Within days of the magazine naming this year’s winners of the Commonwealth Short Story Prize, allegations surfaced that some of the winning entries showed signs of AI-generated text. The turn of events has not just highlighted the increasing use of AI in creative and other forms of writing but also put under scrutiny the tools that claim to detect AI-generated text. We explain.

The controversy

Since 2012, Granta has been publishing the winners of the Prize, awarded in partnership with the Commonwealth Foundation, for five geographies: Africa, Asia, Canada and Europe, the Caribbean, and the Pacific. The overall winner is to be announced on June 30.

Days after the magazine announced the winners, many social media users began calling out Trinidadian writer Jamir Nazir’s short story “The Serpent in the Grove” (winner for the Caribbean), with one citing the AI detector tool Pangram to call it “100% AI generated” and a “Turing Test of sorts”.

The Turing Test, proposed by British mathematician Alan Turing in the 1950s, is a test of a machine’s ability to exhibit intelligent behaviour that a human evaluator cannot distinguish from that of a human. To date, it is considered a benchmark for AI.

Similar allegations were directed at two other winners, Indian writer Sharon Aruparayil (Asia) and Malta’s John Edward DeMicoli (Canada and Europe), again using Pangram.
The remaining two, Lisa-Anne Julien (South Africa, Africa) and Holly Ann Miller (New Zealand, Pacific), were, however, assessed to be “fully human-written”.

In a written response, Aruparayil earlier told The Indian Express that “no AI tools were used at any stage in the writing, editing, or development process” of her story.

Machine learning

To understand how tools that claim to detect AI-generated text work, one must first understand the basics of machine learning (ML). In simple terms, ML refers to the use of data and statistics to build an AI system: large datasets are fed into a computer so that it learns patterns from them and can perform tasks that normally require human judgement, sometimes even at superhuman levels.

“So you would take lots of examples of AI-written content and human-written content and feed it to a big model to do the classification for you. The model, through data, learns signals like, ‘Oh, AI models tend to use em dashes’, or use the word ‘imperative’ or ‘delve’. These are statistical patterns that large ML models can learn when they’re fed lots of examples of both human writing and AI writing,” Danish Pruthi, assistant professor at the Indian Institute of Science Bengaluru, told The Indian Express.

When allegations of AI use started flying in after the prize announcement, many of them pointed to “tells”: signs that a piece of text is AI-generated. (The term comes from the card game poker, where it refers to an involuntary change in a player’s body language that gives clues about their next move.)

Pruthi said that besides em dashes or certain words, other tells include text that is organised in bullet points, often with a heading describing what each bullet is about.
Also, even though AI-generated text tends to conclude things neatly, Pruthi said that “human conclusions sometimes introduce new content, but model conclusions rarely do”.

He also cited the instance of “negative parallelism”: a rhetorical device marked by the formulaic “Not X, but Y” structure. “For instance, ‘These headphones are not just hearing devices, but sound-cancelling devices.’ Models are very commonly doing that now,” he said.

As to where these tells come from, Pruthi said research was ongoing but there were no clear answers yet.

“One common hypothesis is that after you pre-train a model, you post-train it to make it safe and useful and able to follow instructions. That’s typically done by contracting annotators and data vendors who create examples to answer questions of different types,” he said.

“A lot of those datasets, which are private and constructed by large frontier labs, have these cues. People who are writing those answers write in this way, and therefore models replicate that behaviour,” he added.

Pruthi said using ML to tell whether something was written by a human or by AI is just one approach, and it is the one that Pangram, the AI detector at the heart of the matter, uses. He added that people now want to move beyond this “binary framing” of what is AI versus human. “They (as in the entire research community) are figuring out what the extent of collaboration is. Is it lightly assisted by an AI, moderately assisted by an AI, [or] heavily assisted by an AI?”

In a statement following the allegations, Granta’s publisher Sigrid Rausing said that the magazine had used the AI chatbot Claude to evaluate Nazir’s story as to “whether it was AI-generated”.
“The response was long, concluding that it was ‘almost certainly not produced unaided by a human’,” the statement said.

According to Pruthi, asking Claude, ChatGPT, or Gemini whether something is AI-written is a “very bad idea”. “The model is not specifically trained for this. So it might take an educated guess, but that isn’t going to be very accurate… This happens to be a task where you care about accuracy a great deal because of it being high stakes,” he said.

The other key difference, Pruthi said, is that a lot of detectors are tuned to ensure “false positives are low”. A false positive refers to an instance when a detector flags something human-written as AI-generated, as opposed to a false negative, wherein an AI-generated text passes off as human.

“So they will specifically set the model, or set the thresholds, in such a way that the chance of incorrectly flagging a human-written text as AI is very, very low,” he added.

AI detectors are also distinct from tools detecting plagiarism: Pruthi said that these were “two entirely different tasks”. Since plagiarism largely concerns copying intellectual work without attribution, “plagiarism detectors tend to sophisticatedly score how well a given idea/work matches with existing work”.

“On the contrary, AI detectors just try to estimate whether a piece of text could be AI-generated given lots of examples they’ve seen,” he said.

How reliable are these tools?

Pruthi said that Pangram, which claims a false positive rate of 1 in every 10,000 cases (0.01%), is quite reliable, an assessment backed by some independent studies.

But he sounded a word of caution, saying that an ML model would “obviously not be 100% accurate all the time”.
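
The threshold-setting that Pruthi describes can be illustrated with a minimal Python sketch. It assumes a hypothetical detector that gives each text a score between 0 and 1, with higher meaning more likely AI-generated. The function names and the synthetic scores are assumptions made for illustration; this is not how Pangram or any specific detector is implemented.

```python
import math


def pick_threshold(human_scores, max_fpr):
    """Pick a score threshold from texts known to be human-written.

    A text is flagged as AI when its score is strictly above the threshold.
    The threshold is the lowest one that wrongly flags at most `max_fpr`
    of the human-written texts in `human_scores`.
    """
    if not human_scores or not 0 <= max_fpr < 1:
        raise ValueError("need scores and a rate in [0, 1)")
    ranked = sorted(human_scores)
    # Number of human texts we are allowed to flag by mistake.
    # The tiny epsilon guards against floating-point rounding.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    return ranked[len(ranked) - allowed - 1]


def false_positive_rate(human_scores, threshold):
    """Share of human-written texts wrongly flagged as AI."""
    return sum(s > threshold for s in human_scores) / len(human_scores)


def miss_rate(ai_scores, threshold):
    """Share of AI-generated texts that pass as human (false negatives)."""
    return sum(s <= threshold for s in ai_scores) / len(ai_scores)


if __name__ == "__main__":
    # Synthetic, made-up scores: 10,000 human texts and 10,000 AI texts.
    human = [i / 10000 for i in range(10000)]
    ai = [0.5 + i / 20000 for i in range(10000)]

    for target in (0.05, 0.0001):  # 5% versus 1 in 10,000
        t = pick_threshold(human, target)
        print(target, t, false_positive_rate(human, t), miss_rate(ai, t))
```

The stricter the target for false positives, the higher the threshold climbs, and the more AI-generated texts slip through as false negatives. That trade-off is one reason a detector tuned this way cannot be right every time.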
Using the analogy of spam classification in emails, he said: “We still develop ML models to detect what is spam and what is not. There too, the content, the words used, the way it is phrased: all that is helpful to figure out whether it is spam or not. We still get a few examples wrong, where an important email might land in spam or vice versa.”

Such errors occur because these tools have limitations. According to Pruthi, ML models are more likely to be wrong when there are fewer words, since there aren’t enough signs to tell confidently whether something is AI-written or human-authored.

Another limitation is what Pruthi called “low-entropy text”, which refers to text that is generally precise and accurate in nature, and thus hard to classify.

“Let’s say if I ask you, ‘Give me all the states of India in alphabetical order.’ The states you will produce are one clear, definitive answer, whereas what a model will produce will also be the same answer… You will not be able to tell whether it’s coming from a model or a human,” he said.

Similarly, code (written instructions that tell a computer to execute a certain task) can also “be tricky to detect” at times. “There’s only a certain way it can be written,” Pruthi added.

Pruthi mentioned another limitation, one that he and his colleagues presented in a recent paper at an ML conference: when a writer lightly polishes a piece of text using a language model. “Even though the base ideas and content were written by you, instead of saying that it is slightly edited or mixed text, models might flag it as fully AI-generated,” he said.

This can dissuade writers from even refining their writing using AI, out of fear that their work could be mistakenly flagged as AI-generated.
“The mob believes the machine, and the machine gets to control the narrative of what deserves to be ‘human-written’,” Aruparayil told The Indian Express.

Pruthi said that while cases like Aruparayil’s were unfortunate, AI text detection “saves writers’ lives and careers in a different way”.

“There’s a lot of AI slop on the internet right now. A lot of Kindle books that are published are fully AI-written,” he said. “So if a good detector is able to weed out most of that and at least label that this is AI-generated, then maybe a careful reader can choose not to consume that content. In some way, good AI detectors are helping channel attention to legitimately human-written content.”

Impact on writers and publishers

Recently, Nobel Prize-winning Polish writer Olga Tokarczuk’s comments about using AI for research while writing her last novel invited criticism. Although Tokarczuk later clarified she did not use AI in the writing process itself, Pruthi believed transparency was key. “If writers are using and benefitting from AI, they can appropriately disclose it,” he said.

Jane Friedman, an American publishing professional with over two decades of industry experience, concurred. “Everyone needs to be on the same page about where these tools are touching the process and, to the best of everyone’s ability, track how they’re being used,” she told The Indian Express.

According to her, there was a lack of trust around the use of AI itself. “One of the problems here is that everyone is kind of doing their own thing behind the curtain. Part of that has to do with the taboo around it and everyone being uncertain about the technology and the different attitudes towards it,” she said.

On using AI responsibly, Friedman cited a recent report by The New York Times that mentioned how a non-fiction book published in the US included quotes that were fabricated by AI.
“This is a classic example of people either trusting the AI too much or not yet having the skill to use it in a way that avoids these sorts of mistakes,” she said.

She felt, however, that writers would eventually get smarter, and that the onus was on everyone working in publishing or related industries such as media or academia to be on the same page, despite their “very good reasons to be anti-AI”.

“I think just saying, ‘I don’t want to deal with it and you can’t make me’, seems somewhat childish or naive. At some point, you have to realise that this technology is here,” she said.