
Is Google punishing AI-written content? We checked 258 top-ranking pages.

Everyone has a number. Originality.ai says 17%. Ahrefs says 87%. So we ran our own test: 50 keywords, 258 ranking pages fetched and scored through GPTZero. Our number? 1.9% AI. That's not a typo, and it's not a contradiction. It's different tools measuring different things and calling them all "AI content." We also synthesized five industry studies, pulled fresh SERP data via DataForSEO, and scraped Reddit for operator sentiment. The finding isn't a percentage. It's a measurement crisis.

12 min read · Original research

TL;DR

We pulled 50 Google SERPs via DataForSEO, fetched 258 ranking pages, and ran every one through GPTZero. 96.5% scored human. Only 1.9% flagged as AI. Zero AI content at positions 1 or 2. Then we compared against five industry studies covering 1.5M+ pages: Ahrefs says 5% fully AI, Originality.ai says 17%, and Ahrefs also says 87% contain some AI. Our GPTZero run produced a sixth number — and widened the gap. The disagreement between tools isn't noise. It's the finding. Detection tools can't agree on what "AI-generated" means, and the numbers range from 1.9% to 87% depending on who's measuring and how.

The number depends on what you're measuring

Here's the core problem: five credible research teams studied roughly the same question and got numbers ranging from 4.6% to 86.5%. That's not noise. It's different definitions of "AI-generated" applied with different tools to different samples.

Before we show you the data, you need to understand the three thresholds:

Threshold | % of top results | What it means
Fully AI-generated | 4.6% | No human editing detected. Pure model output published as-is.
Majority AI-generated | 17.3% | Detection tool flags the content as primarily written by AI (>50% AI confidence).
Contains any AI | 86.5% | Even one paragraph, one section, or light AI editing registers. Includes human-led, AI-assisted workflows.

The difference between "5% of Google is AI" and "87% of Google is AI" isn't a contradiction. They're measuring different things. But headlines don't carry footnotes, so the discourse is a mess.
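To see how much the threshold alone matters, here is a minimal sketch. The scores are made up and the three cutoffs are our illustrative stand-ins for each tool's undisclosed internals, not any vendor's actual thresholds:

```python
# Illustrative only: hypothetical per-page AI-probability scores, not real data.
# The three cutoffs are stand-ins for the three definitions in the table above.
pages = [0.02, 0.04, 0.10, 0.20, 0.35, 0.55, 0.61, 0.72, 0.97, 0.99]

fully_ai    = sum(p >= 0.95 for p in pages) / len(pages)  # pure model output
majority_ai = sum(p > 0.50 for p in pages) / len(pages)   # >50% AI confidence
any_ai      = sum(p > 0.05 for p in pages) / len(pages)   # any AI involvement

print(f"fully AI: {fully_ai:.0%} | majority AI: {majority_ai:.0%} | any AI: {any_ai:.0%}")
# fully AI: 20% | majority AI: 50% | any AI: 80% -- same pages, three headline numbers
```

Same ten pages, three wildly different "AI content" rates. That is the 5% vs. 17% vs. 87% gap in miniature.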

Five studies, one table

We compiled every major published study from the last 18 months that measured AI content prevalence in Google's top results. Here they are, standardized:

Study | Sample | Tool | Finding
Ahrefs (2025) | 600K pages, 100K keywords | bot_or_not | 86.5% contain some AI; 4.6% fully AI; correlation with rank = 0.011
Originality.ai (ongoing) | 500 keywords, top 20, bimonthly | Originality.ai | 17.31% AI-generated (Sep 2025); up from 2.27% in 2019
Semrush (Nov 2025) | 42K blog pages, 20K keywords | GPTZero | Position #1 is 8x more likely human (80% human vs 9% AI)
SE Ranking / SEL (2025) | 2,000 AI articles, 20 new domains | N/A (controlled) | 28% in top 100 at month 1; collapsed to 3% by month 4
Ahrefs (Apr 2025) | 900K new pages | bot_or_not | 74.2% of new web content contains AI

Notice the tools: Ahrefs built their own (bot_or_not), Originality.ai uses their own product, Semrush used GPTZero. Each has different thresholds, different training data, different definitions of where "human-assisted" ends and "AI-generated" begins.

The Semrush study itself flags this: "AI detection tools are widely known to be inconsistent and can misclassify human and AI-written content, creating some possible fuzziness in these classifications."

The timeline: 2% to 17% in six years

Originality.ai has tracked AI content in Google's top-20 results bimonthly since February 2019. Their dataset is the only longitudinal tracker in the industry. The trajectory runs from 2.27% in 2019 to 17.31% in September 2025, and it is not a straight line.

Two things stand out. First: the March 2024 core update created a visible dip (8.48% to 7.43%), but the metric recovered fully within months and kept climbing. Second: the all-time high of 19.56% came in July 2025, before a slight pullback. Google's algorithmic corrections slow the growth but don't reverse it.

Meanwhile, 74.2% of new web pages published in April 2025 contained AI content (Ahrefs). Production is running far ahead of what actually ranks. Google's filter exists — it's just not keeping up.

The position gradient: humans dominate the top spot

The Semrush study (42,000 blog pages) reveals something the aggregate percentages hide: AI content is not evenly distributed across positions 1-10. It clusters lower.

Position #1 is 8x more likely to be human-written. By position 4, AI content has nearly doubled its presence. From position 5 onward, the gap narrows — AI content "holds its own" in the middle and bottom of page one.

The implication: if you're competing for the top 1-3 spots, human editorial quality still provides a clear edge. If you're competing for positions 5-10, AI content is already your peer group.

Pure AI on new domains collapses within 3 months

SE Ranking and Search Engine Land ran the most controlled experiment in this space: 2,000 AI-generated articles published across 20 brand-new domains with zero backlinks, zero authority, zero human editing. The results are unambiguous.

Month 1: 28% of pages ranked in Google's top 100. Indexed quickly. Impressions flowed. Then the cliff. By month 4, only 3% remained. The remaining 13 months showed no recovery.

Glenn Gabe calls this pattern "Mount AI" — sites scale up with AI content, surge in rankings, then crash as Google's quality systems catch up. His analysis: "Mt. AI isn't a punishment for using AI. It's a punishment for using AI as a replacement for editorial judgment rather than a supplement to it."

The 16-month experiment confirms: Google's quality detection has a lag (roughly 3 months), but it eventually catches pure AI content that lacks authority signals. The pages that survived? They had unique data, interactive elements, or genuine utility that AI alone couldn't generate.

What we found: 50 SERPs, 258 pages, GPTZero

We didn't want to just synthesize other people's numbers. We wanted our own. So we pulled live SERPs via DataForSEO for 50 keywords across five categories (10 each): informational how-to, product listicles, YMYL health/finance, commercial/transactional, and AI-adjacent topics. That gave us 406 unique top-10 URLs.
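For the curious, the pull looks roughly like this. A minimal sketch assuming DataForSEO's documented v3 Live Advanced request and response format; the credentials are placeholders and error handling is omitted:

```python
import requests

# Sketch of the SERP pull. Endpoint, payload, and response paths follow
# DataForSEO's documented v3 format; credentials are placeholders.
DFS_AUTH = ("DATAFORSEO_LOGIN", "DATAFORSEO_PASSWORD")  # HTTP Basic auth
ENDPOINT = "https://api.dataforseo.com/v3/serp/google/organic/live/advanced"

def top10_organic(keyword: str) -> list[dict]:
    payload = [{"keyword": keyword, "location_code": 2840, "language_code": "en"}]
    resp = requests.post(ENDPOINT, auth=DFS_AUTH, json=payload, timeout=30)
    resp.raise_for_status()
    items = resp.json()["tasks"][0]["result"][0]["items"]
    # Keep organic results at rank_group 1-10; ads, PAA boxes, etc. are filtered out.
    return [
        {"rank": it["rank_group"], "url": it["url"], "domain": it["domain"]}
        for it in items
        if it.get("type") == "organic" and it.get("rank_group", 99) <= 10
    ]
```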

We fetched the full body text from each URL, extracted the main content, and ran it through GPTZero's API — the same detector Semrush used in their 42,000-page study. Of the 406 URLs, 258 returned analyzable text (52 were skipped — Reddit, YouTube, PDFs — and 96 returned errors, mostly 403 blocks from sites like Fidelity and Scribbr).
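The scoring step, sketched under the same caveat: the endpoint and response fields are the ones GPTZero documents for v2 predict/text (and that we name in the methodology notes below); the API key is a placeholder.

```python
import time
import requests

GPTZERO_URL = "https://api.gptzero.me/v2/predict/text"
HEADERS = {"x-api-key": "YOUR_API_KEY"}  # placeholder

def score_page(text: str) -> dict:
    """Send extracted body text to GPTZero and keep the three fields we report."""
    resp = requests.post(GPTZERO_URL, headers=HEADERS, json={"document": text}, timeout=60)
    resp.raise_for_status()
    doc = resp.json()["documents"][0]
    return {
        "predicted_class": doc["predicted_class"],    # "ai" / "human" / "mixed"
        "ai_prob": doc["completely_generated_prob"],  # 0.0 - 1.0
        "confidence": doc["confidence_category"],     # "low" / "medium" / "high"
    }

def score_all(texts: list[str]) -> list[dict]:
    results = []
    for text in texts:
        results.append(score_page(text))
        time.sleep(2)  # 2-second pause between calls, matching our rate limit
    return results
```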

The result:

96.5%

of Google's top 10 scored as human-written

258 pages scored via GPTZero · 50 keywords · 5 categories · May 2026

Only 5 pages out of 258 (1.9%) were classified as AI-generated. Another 4 (1.6%) scored as mixed. The remaining 249 pages — 96.5% — scored as human with high confidence.

By category

Category | Pages scored | AI | Mixed | Human | Avg AI prob
Informational | 55 | 1 (2%) | 1 (2%) | 53 (96%) | 0.041
Listicles / Reviews | 53 | 1 (2%) | 0 | 52 (98%) | 0.033
YMYL (health/finance) | 57 | 2 (4%) | 2 (4%) | 53 (93%) | 0.053
Commercial | 48 | 0 (0%) | 0 | 48 (100%) | 0.037
AI-adjacent | 45 | 1 (2%) | 1 (2%) | 43 (96%) | 0.050
All categories | 258 | 5 (1.9%) | 4 (1.6%) | 249 (96.5%) | 0.043

Commercial pages (pricing pages, service comparisons, vendor landing pages) were 100% human according to GPTZero. Not a single flagged page. YMYL had the highest AI rate at 4%, but the flagged pages came from credit unions. Not content farms. Not SEO spam. Institutional content that was likely generated at scale.

By rank position

Zero AI content at positions 1 or 2. Not a single page ranking at the top two positions was flagged by GPTZero. The few AI-flagged pages appeared at positions 3-9 — the middle and bottom of page one. This aligns with Semrush's finding that position #1 is 8x more likely to be human, but our data is even more extreme: it's not 8:1 at position 1. It's infinity:0.

The five pages GPTZero flagged as AI

Domain | Keyword | Rank | AI prob | Category
fswb.bank | how to improve credit score | 5 | 0.996 | Informational
digitalocean.com | prompt engineering tips | 3 | 0.728 | AI-adjacent
navyfederal.org | best retirement investments | 9 | 0.668 | YMYL
microsoft.com | best CRM for small business | 6 | 0.616 | Listicles
hfcuvt.com | should I refinance my mortgage | 4 | 0.612 | YMYL

Look at that list. A regional bank. DigitalOcean. Navy Federal Credit Union. Microsoft. A Vermont credit union. These aren't SEO content farms. They're established institutions publishing content at scale — likely using AI-assisted workflows for their help centers and resource libraries. The "AI content problem" in Google's top 10, as measured by GPTZero, is institutional content teams automating routine educational pages. It's not what anyone imagines when they say "the SERPs are overrun with AI slop."

The SERP landscape: Reddit is everywhere

Beyond the detection scores, the domain composition of our 50 SERPs tells its own story. Reddit appeared in 33 of 50 SERPs — often at position 1 or 2. YouTube appeared in 15. Together, these two platforms dominate page one in 2026.

YMYL results are almost exclusively institutional. Search "early signs of cancer" and you get cancer.org, Hopkins Medicine, Mayo Clinic, Cancer Research UK. No blogs. No AI farms. Google's quality filter is strictest where the stakes are highest.

Why 1.9% when Originality.ai says 17%?

Our GPTZero result (1.9% AI) is dramatically lower than every other study. That's not a bug — it proves the thesis. The gap between 1.9% and 87% is the entire story of this article.

Why the divergence? Three reasons:

  1. Different tools, different thresholds. GPTZero, Originality.ai, and Ahrefs' bot_or_not use different models, different training data, and different confidence cutoffs. Where GPTZero calls something "human," Originality.ai might flag it as "AI-assisted." Neither is wrong — they're measuring different things.
  2. "Contains any AI" vs. "fully AI-generated." Ahrefs' 87% figure counts pages with any AI involvement — including human-led drafts that used AI for a paragraph or an outline. GPTZero's binary classification only flags pages that are predominantly machine-generated. These are different questions with legitimately different answers.
  3. False positive rates are real. GPTZero's false positive rate, like that of every AI detector, ranges from 2% to 45% depending on the study, and detectors show documented bias against non-native English speakers. Our 1.9% could be 0% with the false positives removed, or 5% with false negatives added. The tool's confidence score is high, but the tool itself is a black box. The back-of-envelope below makes this concrete.
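Assuming, purely for illustration, the low end of that false positive range applied uniformly to our sample:

```python
# Could false positives alone explain every flag in our scan?
pages_scored = 258
flagged_ai = 5        # the pages GPTZero classified as AI
fpr_low = 0.02        # low end of the documented 2%-45% false positive range

expected_false_flags = fpr_low * pages_scored
print(f"Expected false flags at a 2% FPR: {expected_false_flags:.1f}")  # ~5.2
# At the LOW end of the documented range, chance misfires alone could account
# for all five flags -- the "our 1.9% could be 0%" scenario above.
```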

As Semrush noted in their own study (which also used GPTZero): "AI detection tools are widely known to be inconsistent and can misclassify human and AI-written content." We used the same tool they did. We got an even lower number. That tells you more about the tool's variance than about the actual state of Google. (We covered this in detail in our evidence-based review of detector accuracy in 2026.)

What operators think (and why they're wrong)

We scraped Reddit for threads about AI content in search results. The sentiment is clear — and clearly inflated relative to the data.

"The SERPs are literally being overrun by AI-generated oatmeal written by a robot that has never experienced a single human emotion."

— r/content_marketing, 138 upvotes

"It's gotten to the point where I notice ChatGPT's linguistic style EVERYWHERE."

— r/ChatGPT, 10,834 upvotes

The r/ChatGPT thread — with over 10,000 upvotes and 1,683 comments — captures something real: people are noticing AI writing patterns in content they encounter daily. The top reply (7,910 upvotes) is a satirical response written in ChatGPT's style, which itself became the most upvoted comment in the thread. (This pattern of operator discourse diverging from measured reality is something we've seen before — in our 446-post analysis of GEO/AEO sentiment, the most-upvoted takes were also the most hostile to the data.)

But the perception is running far ahead of the measurement. Our GPTZero scan found 1.9% AI. Even the highest credible estimate (Originality.ai) is 17%. Neither is "overrun." Operators encounter AI content disproportionately in the niches where they do research (informational how-to, product reviews, AI tools) and extrapolate that experience to all of search.

The perception gap exists because:

  1. Operators search informational queries (higher AI prevalence) not YMYL queries (near-zero AI)
  2. AI content is more noticeable than human content — it has telltale patterns that pattern-matchers spot
  3. The 10,834-upvote thread shows how primed everyone is to see AI style now, creating confirmation bias
  4. Revenue drops from AI Overviews stealing clicks are real, but get conflated with "AI content ranking"

One operator on r/SEO put it precisely: "Google takes our content and serves it up in its AI Overview, clicks have now become a true mirage." (151 upvotes). The frustration is with Google's AI interface eating their traffic — not with AI content outranking them. But in the discourse, these become the same complaint.

What this actually means

After spending a week with this data, here's what the synthesis tells us:

1. "AI-generated" as a binary category is dead. When the answer ranges from 1.9% (our GPTZero scan) to 87% (Ahrefs, any AI involvement), the binary is useless. The real spectrum is editorial investment: how much human judgment, original research, unique expertise, and editing went into the final piece?

2. Google's filter works — with a lag. 74% of new pages published contain AI. Only 17% of top results do. The filter exists. It's just not instant. Pure AI content can rank for weeks or months before quality signals catch up. That's enough time for the "AI slop is ranking" perception to form, even though the long-term data shows it gets weeded out.

3. Detection tools are measuring their own thresholds, not reality. We now have six studies with answers ranging from 1.9% to 86.5% — a 45x gap. That's not a disagreement about the world. It's a disagreement about where to draw a line on a continuous spectrum. There is no ground truth for "is this page AI-generated?" when the page was drafted by GPT-4, rewritten by a human editor, fact-checked by a subject expert, and formatted by a CMS template. What percentage is "AI"?

4. Position #1 still rewards human editorial quality. This is the most actionable finding. Our data shows zero AI content at positions 1-2. Semrush found an 8:1 human advantage at #1. If you're competing for the top spot, invest in expertise, original research, and editorial voice. If you're competing for positions 5-10, you're already competing with AI content whether you know it or not.

5. The real risk isn't "Google detecting AI." It's "readers detecting laziness." Google doesn't penalize AI content. It penalizes low-quality content. But increasingly, readers themselves can tell when content lacks genuine expertise or editorial care — the 10,834-upvote Reddit thread proves the public is pattern-matching on AI style. The risk isn't algorithmic. It's reputational.

Methodology notes

Primary data collection. 50 keywords across 5 categories. SERPs pulled via DataForSEO Live Advanced endpoint (location_code 2840, US, English) on May 2, 2026. 406 unique organic URLs extracted (rank_group 1-10). Body text fetched via HTTP with 15-second timeout. Content extracted using HTML parsing (article/main element preference). Text truncated to 5,000 words max, minimum 50 words for GPTZero scoring. 258 pages successfully scored. 52 skipped (social media, video platforms, PDFs). 96 returned fetch errors (mostly 403 blocks from Cloudflare-protected sites like Fidelity, Scribbr, and Copyleaks).
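A sketch of that fetch-and-extract step, under stated assumptions: the timeout, element preference, and word limits come straight from the notes above, while the User-Agent header and the use of BeautifulSoup are our illustration choices, not necessarily the production setup.

```python
import requests
from bs4 import BeautifulSoup

MIN_WORDS, MAX_WORDS = 50, 5000  # scoring bounds from the methodology

def extract_main_text(url: str) -> str | None:
    """Fetch a page and return its main content, or None if unusable."""
    resp = requests.get(url, timeout=15, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()  # 403s from bot-protected sites land in our error bucket
    soup = BeautifulSoup(resp.text, "html.parser")
    # Prefer <article>, then <main>, then fall back to the whole <body>.
    node = soup.find("article") or soup.find("main") or soup.body
    if node is None:
        return None
    words = node.get_text(separator=" ", strip=True).split()
    if len(words) < MIN_WORDS:
        return None  # too short to score reliably
    return " ".join(words[:MAX_WORDS])  # truncate to 5,000 words
```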

Detection tool. GPTZero API v2 (predict/text endpoint). We chose GPTZero because Semrush used it for their 42,000-page study, making our results directly comparable. Each page scored for: predicted_class (ai/human/mixed), completely_generated_prob (0-1), and confidence_category (low/medium/high). 2-second rate limit between API calls. Total run time: ~20 minutes.

Known limitations. Our 258-page sample is smaller than the industry studies (Ahrefs: 600K, Semrush: 42K). The 96 fetch errors introduce survivorship bias: sites with aggressive bot protection (often enterprise sites) are underrepresented. GPTZero's false positive rate is 2% to 45% depending on the study, and it shows documented bias against non-native English. We used a single detection tool; running the same pages through Originality.ai or Ahrefs' bot_or_not would almost certainly produce different numbers, which is exactly the point.

One more thing

Here's what we kept coming back to throughout this research: we ran 258 pages through GPTZero and got 1.9%. Originality.ai runs their own sample and gets 17%. Ahrefs runs theirs and gets 87%. If six studies using different tools, thresholds, and definitions can't agree on whether a page is "AI-generated", why would you trust a single detector to make that call about your content?

The answer, increasingly, is that you wouldn't. And that's the world we're building tools for. ToHuman exists because the line between AI and human writing has become genuinely blurry — and the tools claiming to police that line can't even agree with each other.

Published May 2, 2026 by the ToHuman team.
