12 May 2026 · 6 min read

How AI Detectors Like Turnitin and GPTZero Actually Work

If you have used AI to help with an essay, you have probably worried about detectors like Turnitin, GPTZero, or Originality.ai. They feel like a black box: you paste text, and a number comes back telling you how "AI" it looks. Understanding what that number actually measures takes a lot of the fear out of it — and helps you write in a way that reads like you.

Detectors predict, they do not "know"

The first thing to understand is that no detector can see whether you used AI. There is no hidden watermark in most AI text and no record the detector can look up. Instead, a detector is a statistical model trained on large amounts of human writing and large amounts of AI writing. When you give it your text, it estimates how likely it is that text like yours came from an AI model. The output is a probability, not a fact.

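This is why detector results are described as "likelihood" or "confidence" scores. On tools that report a single document-level score, 90% does not mean 90% of your essay is AI. It means the model is fairly confident the writing pattern resembles AI-generated text it has seen before. Some tools instead report the share of text they flagged, but each flagged passage is still a prediction. Either way, that confidence can be wrong in both directions.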

Perplexity: how predictable your words are

The core signal most detectors lean on is called perplexity. In simple terms, perplexity measures how surprising each word is to a language model, given the words that came before it. Large language models generate text by favoring probable next words, so their writing tends to be smooth, predictable, and low-perplexity. Human writing wanders more: we pick unexpected words, change direction mid-thought, and vary our phrasing. That unpredictability reads as higher perplexity.

When your text is consistently smooth and predictable, a detector sees low perplexity and leans toward "AI." When your text has natural variety and the occasional surprising turn of phrase, perplexity rises and the detector leans toward "human."
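To make the idea concrete, here is a minimal sketch of the standard perplexity formula: the exponential of the average negative log-probability per token. The function name and the probability lists are assumptions for illustration. A real detector would get these probabilities from its own language model, and no vendor's actual scoring is shown here.

```python
import math

def perplexity(token_probs):
    """Perplexity of a passage from per-token probabilities.

    token_probs: the probability a language model assigned to each
    token that actually appeared (hypothetical inputs in this sketch).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average "surprise" per token, measured as negative log-probability.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Made-up probabilities, for illustration only.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model saw each word coming
surprising = [0.9, 0.05, 0.6, 0.02]   # a few unexpected word choices

print(perplexity(predictable))
print(perplexity(surprising))
```

The second list yields the higher value because a handful of low-probability words raises the average surprise, which is the pattern a detector reads as more human.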

Burstiness: how much your sentences vary

The second common signal is burstiness: the variation in sentence length and structure across a passage. Humans write in bursts: a long, winding sentence followed by a short one. Three words. Then back to something more complex. AI models, left to their own devices, tend to produce sentences of similar length and rhythm, paragraph after paragraph. Detectors treat low burstiness as a sign of AI.

This is why text that has been lightly paraphrased — swapping a few synonyms but keeping the same sentence shapes — often still gets flagged. The words changed, but the rhythm did not.
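One rough way to put a number on burstiness is the spread of sentence lengths relative to their average. The sketch below is an assumption for illustration: commercial detectors do not publish their formulas, and the `burstiness` function, the naive sentence splitter, and the sample texts are all hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count of each sentence, splitting naively after . ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Standard deviation of sentence length divided by the mean length."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)

original = ("The results were very clear. The method was quite simple. "
            "The data was fully clean.")
# Synonyms swapped, sentence shapes untouched.
paraphrased = ("The findings were very obvious. The approach was quite easy. "
               "The data was fully tidy.")
# Same ideas, restructured into one long sentence and two short ones.
varied = ("The results were clear, and after a second look the method "
          "turned out to be simple too. Clean data. That helped.")

for name, text in [("original", original),
                   ("paraphrased", paraphrased),
                   ("varied", varied)]:
    print(name, sentence_lengths(text), round(burstiness(text), 2))
```

The original and the synonym-swapped version have identical sentence lengths, so this measure cannot tell them apart. Only the restructured version moves the number.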

Why detectors get it wrong

Because detectors work on statistical patterns, they make mistakes. Two kinds matter to students:

False positives: genuine human writing flagged as AI. This happens most to non-native English writers, people who write in a plain, formal style, and anyone whose prose is naturally even and structured. It is a documented, serious problem.
False negatives: AI text that slips through as human. This happens when the writing has enough variety and unpredictability to look natural.

Different detectors also often disagree with each other. The same paragraph can score 15% on one tool and 80% on another, because they were trained on different data and weigh the signals differently. There is no single "official" score.
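A toy model shows how that kind of disagreement can arise. The sketch below feeds the same two measurements into two hypothetical detectors that differ only in their weights. Everything here (the logistic form, the weights, the feature values) is made up for illustration and does not describe how any real product works.

```python
import math

def toy_detector(perplexity, burstiness, weights):
    """Return a pseudo-probability that text is AI-written (illustrative only)."""
    w_perp, w_burst, bias = weights
    # Lower perplexity and lower burstiness push the score toward "AI".
    z = bias - w_perp * perplexity - w_burst * burstiness
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)

# Two hypothetical detectors with invented weights: (w_perp, w_burst, bias).
DETECTOR_A = (0.05, 2.0, 1.0)
DETECTOR_B = (0.01, 0.5, 1.5)

# The same paragraph, summarized by the same two made-up measurements.
features = {"perplexity": 40.0, "burstiness": 0.6}

score_a = toy_detector(weights=DETECTOR_A, **features)
score_b = toy_detector(weights=DETECTOR_B, **features)
print(f"Detector A: {score_a:.0%}")
print(f"Detector B: {score_b:.0%}")
```

Identical input, very different scores: the gap comes entirely from how each model was fitted, which is why a number from one tool says little about what another will report.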

What this means for you

The practical takeaway is not to chase a magic number. It is to make your writing genuinely read like a person: vary your sentence lengths, let your own voice and opinions show, and avoid the flat, evenly-paced phrasing that AI defaults to. That is exactly the kind of restructuring a tool like PenSmith does. And because detection is probabilistic, the only honest way to know where you stand is to check your own draft against a detector before you submit it, rather than trusting any guarantee.