As generative AI becomes ever more convincing at mimicking human text, many universities and academic institutions have come to rely on AI detection tools to police academic integrity. However, recent research has demonstrated that these tools are not only ineffective but are also amplifying systemic injustices in academia. Jenni AI presents a smarter workspace for drafting, citing, and proofreading, helping students and researchers make the best possible use of AI tools while ensuring their academic integrity is preserved.
In 2022, the use of generative AI language tools such as ChatGPT began to explode in popularity, making it possible to produce coherent, human-like text with short and simple prompts. For universities, this created a clear and immediate problem: that students would come to rely on these tools for their writing assignments without investing any effort themselves.
To combat this concern, AI detection tools started to emerge, designed to flag writing clearly generated by AI. These tools became the default mechanism for protecting academic integrity – with little concrete proof that they actually work.
When AI generates text, it repeatedly predicts which word is most likely to come next, given all the words that have come before. As a result, AI-generated text clusters around high-probability word choices, which is ultimately why it sounds so smooth. In contrast, since every human mind is unique, our writing is generally filled with unusual phrases and leaps in logic, with sentence structures that vary to reflect our distinctive thought patterns.
This tendency can be measured using a quantity called ‘perplexity’: a measure of the predictability of word choices, based on how surprised a language model is when comparing the next word in a piece of text with its own predictions.
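As a rough illustration of the idea, perplexity is conventionally computed as the exponential of the average negative log-probability that a language model assigns to each word in a passage. The sketch below assumes those per-word probabilities are already available; the function name and the two sample lists are hypothetical, and no particular detection tool is claimed to work exactly this way.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) for a sequence of words.

    token_probs holds the probability a language model assigned to each
    word that actually appears in the text (values in the range (0, 1]).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical probabilities, invented purely for illustration.
predictable_text = [0.90, 0.80, 0.85, 0.90]  # the model expected each word
surprising_text = [0.20, 0.05, 0.40, 0.10]   # the model was often surprised

print(perplexity(predictable_text))
print(perplexity(surprising_text))
```

In this sketch, the passage made of highly expected words yields the lower perplexity, which is the kind of signal a detector would read as more machine-like.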
Detection tools aim to exploit these differences by using perplexity to assess whether a piece of text is predictable enough to suggest it was written by a machine. At first, AI detection tools based on these measures worked fairly well, correctly flagging AI-written text most of the time. But this reliability wouldn’t last long.
As language models evolved, they quickly improved their ability to capture the more improbable aspects of human writing. The better AI has become at mimicking human language, the more its outputs overlap with genuine human text – meaning that today, AI-generated writing is often able to slip through the cracks in the latest detection tools.
On the surface, it might seem that this is simply a technical problem, solvable through improvements to detection tools that allow them to distinguish increasingly subtle differences between human- and AI-written text. But unfortunately, the results of recent research paint a different picture.
In 2023, a team of researchers at the University of Maryland approached the problem mathematically in an arXiv preprint. If an AI detection tool is as good as it could possibly be, they asked – is there a fundamental limit to its accuracy? The answer they arrived at was discouraging.
Since the overlap between human- and AI-written text is now so large, tools will always face a trade-off: the more AI writing they correctly identify, the more genuine human writing they will flag as AI-written by mistake. Conversely, the more the rules are relaxed to prevent these errors, the more AI text they will fail to identify.
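The trade-off can be sketched with a toy detector that flags any text whose score reaches a chosen threshold. The scores below are invented for illustration, with higher values meaning "more AI-like", and the two lists are made to overlap on purpose; none of the numbers come from the studies discussed here.

```python
# Hypothetical detector scores (higher = judged more AI-like).
human_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
ai_scores = [0.35, 0.45, 0.55, 0.65, 0.75, 0.85]


def rates(threshold):
    """Return (share of AI texts caught, share of human texts wrongly flagged)."""
    caught = sum(score >= threshold for score in ai_scores) / len(ai_scores)
    wrongly_flagged = sum(score >= threshold for score in human_scores) / len(human_scores)
    return caught, wrongly_flagged


# A strict threshold catches more AI text but flags more human writers;
# a lenient one spares human writers but lets more AI text through.
for threshold in (0.3, 0.5, 0.7):
    caught, wrongly_flagged = rates(threshold)
    print(f"threshold={threshold}: caught={caught:.2f}, wrongly flagged={wrongly_flagged:.2f}")
```

Because the two sets of scores overlap, no threshold in this sketch catches every AI text without also flagging some human writing.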
For universities and wider academic institutions, this presents an uncomfortable truth: no matter how much detection tools are improved to keep pace with the latest AI language models, there is a mathematical ceiling on their accuracy, and that ceiling falls as AI-generated text grows closer to human writing.
To date, the most comprehensive real-world test of these tools was a study published in the International Journal for Educational Integrity, in which a team tested 14 different detection tools currently in use, tasking them with distinguishing between human- and AI-written text. Every one of them scored below 80% accuracy, with just five scoring above 70%. In practice, this means that even the best tools currently available misjudge at least one in five texts, leaving students at real risk of being wrongfully accused of cheating.
The problem becomes even worse when considering the particular groups of human writers most likely to have their work incorrectly flagged as AI-generated. In another 2023 paper published in Patterns, researchers at Stanford University used detection tools to assess real student essays. While the tools worked well for native English speakers, they wrongly flagged around 60% of essays by students who speak English as a second language as AI-generated.
According to the team, this likely occurs because students are more careful when writing in a less familiar language, making them more likely to use simpler sentence structures and grammatically correct phrasing to ensure clarity. In turn, this makes their writing more predictable – so that they unintentionally mimic the patterns that appear in AI-generated text.
Similar rates of misclassification have also been reported for writing by neurodivergent students, including those with autism and ADHD. Altogether, this problem only exacerbates an already concerning systemic injustice. The message of these studies is abundantly clear: to preserve academic integrity in the age of generative AI, we will need to look beyond AI detection tools.
At a time when academic writing is almost trivially easy to produce using AI language models, how else can genuine academic integrity be upheld? The team at Jenni AI suggest that universities can help to combat the problem by focusing more on in-class writing assignments, ensuring work is free from plagiarism, and setting process-based assignments where students are required to submit regular drafts and research logs.
For students themselves, the focus should be on providing correct citations and credit, rather than making random tweaks to their writing out of fear of AI detection. Ultimately, academic integrity is based on honestly acknowledging sources and building on others’ ideas – not on outsmarting unreliable algorithms.
To put these concepts into practice, Jenni AI has developed a transparent, controlled workspace for drafting, citing, and proofreading, which aims to support human writers while keeping AI involvement visible, limited, and verifiable. The assistant has several key functions. It assists writers at the sentence level, autocompleting just one sentence at a time, with each suggestion requiring manual acceptance so that the user stays actively involved in writing and decision-making. Its suggestions are grounded in verified sources, using an integrated research library to avoid hallucinated or incorrect citations. The Jenni AI system also allows users to insert an AI Declaration clause that is in line with frameworks such as the OECD AI Principles and UNESCO’s Recommendation on the Ethics of Artificial Intelligence.
The platform also requires the user to control what, and how much, AI can contribute, ensuring the core ideas of any argument originate from genuine human thought. It supports multilingual writing, helping non-native English speakers and neurodivergent writers to communicate clearly.
The Jenni AI team are confident that, as AI tools come to be used more responsibly, policies will shift away from AI detection and towards encouraging transparency about how, and for what, AI was used during the writing process. With AI increasingly recognized as an assistant in the writing process, the standard for research quality will rely even more on clear thinking, sound arguments, and accurately supported evidence.