What Perplexity and Burstiness Mean in AI Detection

Learn what perplexity and burstiness actually measure in AI detectors, why they matter, and how to write text that doesn't trip the alarm.

When an AI detector says your text "reads like ChatGPT wrote it," it's not running a magic spell. It's measuring two specific statistical properties of your prose: perplexity and burstiness. If you understand what those two things actually are, you can write (or revise) copy that doesn't trigger false positives. You'll also understand exactly why some rewrites work while others don't.

Here's the short version: perplexity measures how predictable each word choice is. Burstiness measures whether your sentence lengths vary the way a human's naturally would. Detectors flag text that scores low on both.

What perplexity measures

In natural language processing, perplexity measures how surprised a language model is by a sequence of words. Formally, it is the exponential of the average negative log-probability the model assigns to each token. A lower perplexity score means the text was very easy to predict: each word followed naturally from the ones before it. A higher perplexity score means the text surprised the model, which often signals less conventional word choices.
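To make the definition concrete, here is a minimal sketch of the calculation. It assumes you already have the probability a language model assigned to each token; the function name and the probability lists are made up for illustration and do not come from any particular detector.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities, invented for this example.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model expected every word
surprising = [0.2, 0.05, 0.3, 0.1]    # the model was caught off guard

print(perplexity(predictable) < perplexity(surprising))  # True
```

A real detector would get those probabilities from an actual language model, so the absolute numbers depend entirely on which model does the scoring.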

When a model like GPT-4 generates text, it samples each word from a probability distribution and, at typical settings, strongly favors the likeliest candidates. That keeps output coherent, but it also makes the text easy to predict in hindsight. A detector trained on AI-generated text learns to recognize this pattern: the whole paragraph feels like it could only go one way.

Here's a concrete example. Compare these two sentences:

Version A: "It is important to note that artificial intelligence has transformed the way businesses operate in today's digital landscape."

Version B: "Most marketing teams now use AI for copy, and a lot of them can't remember what the workflow looked like before."

A language model would likely assign a much lower perplexity score to Version A. Every word is a safe, expected choice. Version B uses a contraction, a casual quantifier ("a lot of them"), and an informal aside. The word choices are slightly less predictable, so the perplexity score goes up.

If you want a deeper look at how detectors use this under the hood, the breakdown of how AI content detectors work explains the model architecture involved.

What burstiness measures

Burstiness describes how much sentence length varies within a passage. Humans don't write in steady, even pulses. We write one long, meandering sentence that tries to get everything in, then we stop. Short one. Then another long one with a subordinate clause or two.

Language models tend to produce sentences that cluster around a comfortable middle length. They're rarely very short. They're rarely very long. The output is rhythmically flat, even when the content is interesting.

Burstiness is typically measured by looking at the variance in sentence length across a paragraph or page. High variance (a mix of short and long sentences) looks human. Low variance (sentences all running 18-25 words) looks generated.
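As a rough illustration, sentence-length variation can be computed in a few lines. This sketch assumes a naive splitter (break on periods, question marks, and exclamation points) and uses the coefficient of variation (standard deviation divided by mean) as the score. Both choices are assumptions made for the example, not the formula any specific detector uses.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count per sentence, using a naive split on ., !, and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence lengths (stdev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # one sentence has no variation to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)

flat = ("The tool checks each word in the text. "
        "The score shows how even the text is. "
        "The result tells you what to fix next.")
varied = ("We write one long, meandering sentence that tries to get "
          "everything in, then we stop. Short one. Then another long one "
          "with a subordinate clause or two.")

print(sentence_lengths(flat))    # [8, 8, 8]
print(sentence_lengths(varied))  # [15, 2, 10]
print(burstiness(flat) < burstiness(varied))  # True
```

The naive splitter will miscount abbreviations and decimals, so treat the output as a rough rhythm check and nothing more.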

Why flat rhythm is so detectable

The problem isn't that AI sentences are "bad." It's that they're consistent in a way real writers aren't. A human drafting a blog post might get impatient and fire off three words. Or they get on a roll and write a sentence that keeps going through a dependent clause and a parenthetical before finally landing. AI rarely does either.

This is also why simple paraphrasing often doesn't help. If you ask an AI to rewrite its own output, it typically produces similar sentence lengths the second time around. The rhythm stays flat because the underlying generation process hasn't changed.

Why both signals are used together

Neither perplexity nor burstiness is reliable on its own.

High perplexity could just mean someone used unusual vocabulary or wrote in a niche domain. A legal brief or a medical abstract might score high perplexity simply because the terminology is specialized, not because a human wrote it. Low burstiness could mean someone wrote in a deliberate, formal style, which is a legitimate choice for certain audiences.

Detectors look for the combination: text that is both very predictable at the word level and very uniform at the sentence level. That pattern rarely appears in human writing except by accident. In AI output, it appears consistently.
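The "both signals at once" logic can be sketched as a simple rule. The function name and the two thresholds below are hypothetical, chosen only to show the shape of the check; real detectors use their own models and cutoffs, which are generally not published.

```python
def looks_generated(perplexity_score, burstiness_score,
                    max_perplexity=20.0, max_burstiness=0.3):
    """Flag text only when BOTH scores are low.

    The default thresholds are invented for illustration.
    """
    low_perplexity = perplexity_score < max_perplexity
    low_burstiness = burstiness_score < max_burstiness
    return low_perplexity and low_burstiness

print(looks_generated(12.0, 0.15))  # True: predictable words, flat rhythm
print(looks_generated(12.0, 0.60))  # False: predictable words, varied rhythm
print(looks_generated(45.0, 0.15))  # False: surprising words, flat rhythm
```

Requiring both conditions is what keeps a formal but human style, or a jargon-heavy but generated passage, from being judged on one signal alone.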

It's also worth knowing that this combination creates real problems for false positives. Non-native English writers, people writing in a formal register, and anyone editing their work for clarity can all end up with text that scores low on both metrics. This is one of the reasons AI detectors flag human-written text more often than most people expect.

How to write with higher perplexity and burstiness

You don't need to make your writing weird. You need to make it yours.

On perplexity: Use words and phrases you'd actually say. Avoid the phrasing that sounds like a press release ("it is essential to consider," "plays a critical role in"). Be slightly specific where you'd otherwise be generic. Instead of "businesses can benefit from this approach," write "a three-person agency can set this up in an afternoon."

On burstiness: Read your draft aloud. If it has a metronomic beat, break it up. Add a two-word sentence somewhere. Let one sentence run longer than feels comfortable. The rhythm should feel a little uneven, because that's what human writers actually do.

On both at once: The fastest fix is usually to cut your most "AI-sounding" sentence and replace it with something direct and slightly informal. Not sloppy. Just written from a specific point of view rather than a neutral one.
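If reading aloud isn't practical, a small script can point you to the metronomic stretches in a draft. This is a sketch under stated assumptions: the function name, the naive sentence splitter, and the defaults (three or more consecutive sentences within three words of each other) are all arbitrary choices for the example.

```python
import re

def flat_runs(text, tolerance=3, min_run=3):
    """Find runs of consecutive sentences with near-identical word counts.

    Returns (start_index, run_length) pairs. A run continues while each
    sentence stays within `tolerance` words of the run's first sentence.
    """
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    runs, start = [], 0
    for i in range(1, len(lengths) + 1):
        # Close the current run at the end of the text or when the length jumps.
        if i == len(lengths) or abs(lengths[i] - lengths[start]) > tolerance:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = i
    return runs

draft = ("The tool checks each word in the text. "
         "The score shows how even the text is. "
         "The result tells you what to fix next. "
         "Fix it. "
         "Then write one much longer sentence that wanders through a "
         "clause or two before it finally lands.")

print(flat_runs(draft))  # [(0, 3)]
```

Here the first three sentences form the flat run, so that is where a short sentence or a longer one would do the most good.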

The free humanizer prompt at /humanizer-prompt is built around these principles. It targets both dimensions systematically, which is faster than hand-editing each paragraph alone.

What the scores can and can't tell you

Perplexity and burstiness scores are probabilistic signals, not verdicts. A low combined score means "statistically similar to AI output." It does not mean "definitely written by AI."

Different detectors weight these signals differently. Some also layer in other features: repetition patterns, syntactic uniformity, topic coherence across paragraphs. A text can score low on perplexity but high on burstiness and still pass, depending on the tool's thresholds and training data.

The score is also sensitive to the domain. A test on a coding tutorial will produce different numbers than the same algorithm run on a personal essay. Detectors trained primarily on web content sometimes mis-flag academic writing, and vice versa. Before trusting any single score, consider whether the detector is actually reliable for your use case.

None of this means you should ignore the scores entirely. A score of 95% "AI-generated" across multiple independent tools is meaningful data. A single tool giving you 62% is much less so.
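One way to act on that distinction is to look at agreement across tools instead of any single number. The sketch below is hypothetical throughout: the tool names, the 0.9 cutoff, and the three-tool minimum are invented for illustration, and each score is assumed to be a 0-to-1 "AI-generated" probability.

```python
import statistics

def summarize_scores(scores, threshold=0.9, min_tools=3):
    """Summarize 'AI-generated' scores (0 to 1) from several detectors.

    `scores` maps a tool name to its score. Strong agreement requires at
    least `min_tools` results, all at or above `threshold`.
    """
    if not scores:
        raise ValueError("need at least one score")
    values = list(scores.values())
    agreeing = sum(1 for v in values if v >= threshold)
    return {
        "tools": len(values),
        "median": statistics.median(values),
        "strong_agreement": len(values) >= min_tools and agreeing == len(values),
    }

many = summarize_scores({"tool_a": 0.95, "tool_b": 0.96, "tool_c": 0.97})
single = summarize_scores({"tool_a": 0.62})

print(many["strong_agreement"])    # True
print(single["strong_agreement"])  # False
```

Even strong agreement remains a statistical signal, so the summary is a prompt for a closer look, not a verdict.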

Frequently asked questions

Does "high perplexity" mean my writing is confusing?

No. In the context of AI detection, perplexity is a technical measurement about word predictability, not about clarity. High perplexity means your word choices were less statistically expected, which is a good thing from a detection standpoint. Clear, readable prose can still have high perplexity if you're making genuine word choices instead of defaulting to the most common phrase.

Can I increase my burstiness score by just adding short sentences randomly?

You can, but it tends to look forced. Some detectors are now trained on edited AI text as well, and "fake burstiness" (short sentences dropped in mechanically) doesn't always fool them. The better approach is to revise for content first. If a sentence is doing too much, split it. If a paragraph needs a quick transition, let it be quick. Burstiness that comes from genuine editing tends to look more authentic than burstiness you manufacture.

Why do some AI detectors give completely different scores for the same text?

Because they're measuring different things, or weighting them differently. One tool might focus heavily on perplexity using a GPT-2-based model. Another might weight syntactic patterns or paragraph-level features more heavily. There's no standard formula. This is why a single score from a single tool isn't worth much. Treating any detector result as a final answer is a mistake.

Does using AI to rewrite AI text fix the perplexity and burstiness problem?

Usually not. If you ask an AI to "make this sound more human," it tends to adjust word choice slightly but keeps the same rhythmic patterns. The underlying generation process produces similar sentence length distributions. You can sometimes get marginal improvements, but for consistent results, you need a human editor or a prompt specifically designed to target these signals (like the one at /humanizer-prompt).

Are there types of writing where low perplexity and low burstiness are expected and normal?

Yes. Legal contracts, technical specifications, and certain forms of academic writing are deliberately uniform and predictable. That's a feature, not a bug. Detectors calibrated on general web content often struggle with these genres. If your writing falls into one of these categories and gets flagged, the flag more likely means the detector is poorly suited to your text type than that your writing is AI-generated.