Blocking AI crawlers backfired: news publishers lost 23% of traffic

Large language models reshaped news consumption patterns starting in August 2024, but publishers who blocked AI bots experienced worse outcomes than those who didn't, according to research published December 31, 2025, by Rutgers Business School and The Wharton School.

Publisher traffic decline shown on analytics dashboard tracking visitor losses from AI summaries

Hangcheng Zhao from Rutgers Business School and Ron Berman from The Wharton School combined daily traffic data, human web-browsing records, robots.txt policies, job postings, and webpage content to document four effects related to predicted shifts in news publishing following the introduction of generative AI. Their working paper analyzed patterns from October 2022 through June 2025, revealing outcomes that contradict conventional assumptions about how publishers should respond to AI-powered platforms.

Traffic declined only after August 2024

Publisher traffic remained broadly stable through mid-2023 despite widespread predictions of immediate collapse following ChatGPT's November 2022 launch. Multiple change-point detection procedures identified persistent breaks in traffic patterns, most prominently in November 2023 and August 2024.

The research employed synthetic difference-in-differences analysis using log traffic in six-month windows before and after detected breakpoints, with the top 100 retail websites serving as the control group. Following the August 2024 break, traffic to news publishing websites decreased approximately 13.2% relative to retail sites, according to the study. Point estimates for the November 2023 break proved negative but statistically insignificant.
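The study's synthetic difference-in-differences design is more elaborate (it weights control units to match pre-period trends), but the core comparison can be sketched as a simple two-group, two-period difference-in-differences on log traffic. The visit counts below are entirely hypothetical, chosen only to illustrate the arithmetic:

```python
import math

# Hypothetical monthly visit counts (illustrative only, not the study's data).
# "news" is the treated group, "retail" the control; "pre"/"post" bracket
# the August 2024 breakpoint detected in the study.
visits = {
    ("news", "pre"): 1_000_000, ("news", "post"): 880_000,
    ("retail", "pre"): 2_000_000, ("retail", "post"): 2_030_000,
}

def did_log_effect(v):
    """2x2 difference-in-differences on log traffic:
    (treated post - treated pre) - (control post - control pre)."""
    treated = math.log(v[("news", "post")]) - math.log(v[("news", "pre")])
    control = math.log(v[("retail", "post")]) - math.log(v[("retail", "pre")])
    return treated - control

effect = did_log_effect(visits)
print(f"DiD estimate on log visits: {effect:.3f}")  # negative => relative decline
```

Because the outcome is in logs, the estimate reads approximately as a percentage change relative to the control group, which is how the study's −13.2% figure is expressed.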

"The traffic decline wasn't immediate. Publisher traffic remained broadly stable through mid-2023; a Synthetic Difference-in-Differences analysis shows a decline in traffic that started after August 2024 (−13%)," according to the LinkedIn post from Zhao announcing the findings.

This timeline challenges narratives suggesting ChatGPT or Google AI Overview triggered instant traffic collapses for publishers. The delayed impact indicates more complex mechanisms than simple user substitution from traditional search to AI interfaces.

Blocking AI bots reduced both total and human traffic

Approximately 80% of top news publishers block LLM access using the robots.txt file standard, with staggered adoption starting mid-2023. The researchers used staggered difference-in-differences analysis comparing blocking publishers to not-yet-blocking publishers to estimate effects.
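A typical publisher block takes the following general shape in robots.txt. The user-agent names below are real crawler identifiers used by OpenAI, Anthropic, Google (for AI training), and Common Crawl, though individual publishers' files vary widely:

```text
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```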

SimilarWeb traffic data showed that blocking LLM crawlers led to a persistent post-blocking decline of 23.1% in monthly visits (measured in logs). Comscore Web-Behavior panel data, which tracks actual human browsing rather than total traffic that may include bots, showed a post-blocking decline of 13.9% in monthly publisher visits.

"Blocking GenAI crawlers may backfire. About 80% of the top news publishers block LLM access using the robots.txt file standard. A staggered Difference-in-Differences analysis of the effect of blocking shows that traffic declined after blocking in both SimilarWeb traffic data (−23%) and Comscore data (−14%)," the study found.

These results imply blocking LLMs can have significant negative effects on large publishers by reducing both total traffic and human traffic, not merely removing bot visits mechanically. However, results proved heterogeneous when extending analysis to lower-traffic websites. Publishers averaging more than 10 daily visits in the Comscore panel experienced negative effects consistent with primary findings, while mid-sized publishers averaging 1-10 visits per day saw positive effects from blocking.

The differential impacts by publisher size suggest blocking strategies should vary based on traffic volume and audience composition. Large publishers appear to benefit from AI referrals despite concerns about content appropriation, while smaller publishers may gain protective benefits from restricting access.

Major news publishers already block AI crawlers at rates far exceeding other industries, with outlets including The New York Times, The Guardian, CNN, Reuters, The Washington Post, and Bloomberg implementing restrictions against multiple OpenAI crawlers. The pattern reflects journalism industry concerns about AI systems potentially replicating their content without compensation.


No newsroom job replacement yet

Editorial/content roles did not exhibit post-GenAI collapse despite predictions that AI tools would reduce demand for newsroom employees. Job posting data from Revelio Labs showed the share of new editorial and content-production job listings increased over time rather than decreased.

"No 'newsroom job replacement' yet. Data from job postings show that editorial/content roles do not exhibit a post-GenAI collapse; replacement or reduction of these jobs in the near term appears limited," according to the research findings.

The monthly number of producer-role postings fluctuated and exhibited a secular post-COVID decline in line with overall hiring, but researchers observed no discrete collapse in producer-role postings coinciding with GenAI expansion. The share of producer/editorial postings relative to total postings among newspaper publishers did not fall in the post-GenAI period and increased in several periods.

Two-way fixed effects difference-in-differences modeling found the treatment effect on editorial hiring after November 2022 proved positive rather than negative, indicating no disproportionate contraction in editorial roles relative to non-editorial hiring during the sample period.

This evidence aligns with broader research showing measured labor-market impacts from LLM adoption have been modest in the near term. Publishers appear not to respond primarily by reducing newsroom headcount despite widespread availability of AI writing tools.

Publishers shifted to richer content and more ads

News publishers did not scale up textual production following LLM introduction. Instead, they significantly increased rich content and advertising technologies relative to retail websites used as controls.

The number of interactive elements rose 68.1% and ads/targeting tech increased 50.1%, while article volume declined relative to retail sites. The study found no evidence that large publishers increased text volume; instead, they "richened" pages with additional multimedia components.

"Publishers aren't scaling text volume; instead, they're 'richening' pages. The number of interactive elements rises (+68%) and ads/targeting tech increases (+50%), while article volume declines relative to top retail websites used as controls," the researchers documented.

HTTP Archive page-structure metrics showed publishers shifted toward richer pages with more interactive components and greater use of advertising and targeting technologies. Site framework and layout elements increased approximately 70.2%, while article volume decreased 31.2% in post-November 2022 periods compared to retail sites.

Internet Archive URL histories confirmed patterns, showing growth concentrated in image-related URLs rather than text/article URLs. The number of unique image URLs increased substantially while text-based content URLs showed moderate decreases.

This pattern proves consistent with research linking monetization design to content incentives and evidence that multimedia and interactivity shape user engagement. Publishers appear to respond by enriching existing content with additional media and embedded components rather than scaling text output, potentially adapting to changing user preferences and traffic patterns.

Publishers already face substantial pressure from AI-powered discovery, with Google Discover now showing 51% AI Summaries in test markets according to December 2025 Marfeel data. The shift toward richer multimedia content may represent publishers' attempts to differentiate from text-focused AI summaries.

Complex responses to multifaceted pressures

The research provides early evidence that generative AI does not currently function as a direct substitute for traditional news production. Rather, the industry appears to be adjusting along multiple dimensions, including access control, hiring composition, and the format and quantity of content production.

"Some interesting initial findings show that the impact of LLMs is nuanced and sometimes surprising. For example, blocking LLM access using robots.txt may reduce both bot access and real audience demand," Zhao noted in the LinkedIn announcement.

The impact of strategic adjustments sometimes produces unforeseen outcomes. Publishers blocking AI crawlers expected to protect content from unauthorized training and use, but instead experienced traffic declines suggesting LLMs may drive discovery and referrals that benefit publishers despite content appropriation concerns.

Technical limitations compound strategic challenges. The robots.txt file standard operates as a voluntary protocol rather than an enforceable access control. Crawlers can ignore its directives without consequence, making the mechanism unreliable for publishers seeking to protect content.

Research from Kim et al. (2025) shows compliance falls with stricter robots.txt directives and some bot categories including AI-related crawlers rarely check robots.txt files. This imperfect compliance means even publishers actively blocking may still have content accessed by non-compliant crawlers.
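The advisory nature of the standard is visible in how it is consumed in practice. A minimal sketch using Python's standard urllib.robotparser (the domain and rules below are hypothetical) shows that a parser merely reports what the file requests; whether a crawler honors that answer is entirely up to the crawler:

```python
from urllib import robotparser

# robots.txt rules of the general shape many publishers deploy (hypothetical).
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The parser only reports what the file asks for; nothing in the protocol
# stops a non-compliant crawler from fetching the page anyway.
print(rp.can_fetch("GPTBot", "https://example-news.com/article"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example-news.com/article"))  # True
```

The compliance gap documented by Kim et al. arises precisely because this check is performed (or skipped) client-side by each crawler.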

The combination of automated systems and selective commercial partnerships creates asymmetric power relationships in which publishers have limited control over traffic sources while facing unpredictable revenue impacts. Google announced commercial partnerships with select major publishers on December 10, 2024, offering financial arrangements to offset potential traffic impacts from AI features, but these partnerships exclude smaller publishers facing similar or greater traffic challenges.

Publishers lose significant organic traffic when AI Overviews appear in search results. Research from Ahrefs examining 300,000 keywords found AI-generated summaries reduce organic clicks by 34.5% when present, compounding the challenges documented in the Wharton study.

The European Commission launched a formal antitrust investigation on December 9, 2024, examining whether Google violated EU competition rules by using publisher content for AI purposes without appropriate compensation. Independent publishers filed complaints alleging significant harm, including losses of traffic, readership, and revenue.

Publishers must navigate competing pressures: protecting intellectual property from unauthorized AI training while maintaining visibility in AI-powered search and discovery systems. Content creators cannot opt out from AI training and content crawling without losing visibility in general search results, creating forced participation in systems that may undermine their business models.

Methodological approach and data sources

The researchers constructed a high-frequency publisher panel by combining multiple data sources. SimilarWeb provided daily domain-level estimates of total visits for desktop and mobile from October 17, 2022, through July 1, 2025. The Comscore Web-Behavior Panel recorded desktop browsing behavior for a large sample of U.S. households during 2022-2024.

HTTP Archive supplied robots.txt rules and page-level HTML metadata tracking how the web is built and changes over time. The Internet Archive's Wayback Machine provided annual counts of unique URLs observed for each domain as a proxy for published content volume.

Revelio Labs job posting data via WRDS enabled analysis of hiring patterns, providing employer identifiers, job titles, occupation codes, locations and posting dates. The researchers constructed publisher-level monthly counts of new job postings by occupation category.

The sample started from domains with employer-linked job postings in Revelio that appeared in at least one traffic source. Domains were matched across sources using Revelio's firm-URL mapping at the domain level. After manual review, the final dataset included 30 URLs classified under NAICS 513110 as newspaper publishers.

For Comscore-based measures, the researchers restricted the sample to active panelists with at least four browsing sessions in each month of the calendar year, aggregating visits to the domain-month level. For analyses relying on panel data, coverage expanded to the 500 news-publisher domains with the highest Comscore traffic.

Limitations and future research directions

Several limitations qualify interpretation. Traffic measures combine modeled aggregates from SimilarWeb with a U.S. desktop panel from Comscore. Neither directly observes LLM-mediated consumption paths, including in-chat answers, referrals, or citation links.

Robots.txt measures stated restrictions rather than enforceable access control. Blocking decisions may also coincide with other time-varying publisher actions, including paywall changes, SEO strategy, site redesigns, or platform shocks, that could confound causal estimates.

The study covers an early phase of the technology, before newer interfaces and integrations, including more prominent AI summaries and search-chat products, have fully diffused. The patterns documented may intensify, attenuate, or shift toward new adjustments as AI capabilities and adoption change.

Future work can sharpen mechanisms by incorporating direct measures of LLM discovery and referral, along with richer firm-level data on AI access and licensing negotiations and their enforcement. Continued monitoring will help determine which direction these patterns take.

The researchers note their findings provide early evidence of some unforeseen impacts of LLM introduction on news production and consumption. The industry faces ongoing adaptation as technology platforms, regulatory frameworks and competitive dynamics continue developing.

Timeline

  • November 2022: ChatGPT launched, triggering predictions of immediate publisher traffic collapse
  • Mid-2023: Publishers begin blocking LLM access using robots.txt in staggered pattern, with approximately 80% of top publishers eventually implementing blocks
  • November 2023: First statistically detected change point in publisher traffic patterns, though decline not significant relative to retail controls
  • May 2024: Google AI Overview launched, expanding to more than 100 countries by mid-year
  • August 2024: Persistent traffic decline emerges for news publishers, with 13.2% decrease relative to retail websites
  • September 2024: Cloudflare launches AI Audit tools giving publishers detailed analytics on AI bot activity
  • December 9, 2024: European Commission launches formal antitrust investigation into Google's AI content practices
  • December 10, 2024: Google announces commercial partnerships with select major publishers offering financial arrangements to offset AI feature impacts
  • December 11, 2024: Google unleashes third core algorithm update, triggering severe ranking volatility and Discover traffic collapse
  • December 2024: Google Discover transformed into AI platform, with 51% of feed positions occupied by AI Summaries in test markets
  • December 31, 2025: Hangcheng Zhao and Ron Berman publish working paper documenting four major effects of LLMs on news publishing

Summary

Who: Researchers Hangcheng Zhao from Rutgers Business School and Ron Berman from The Wharton School analyzed data affecting news publishers, particularly the 30 major newspaper websites including outlets like CNN, The New York Times, BBC, The Guardian, Fox News, and Daily Mail.

What: The study documented four major findings: (1) publisher traffic declined 13.2% starting only in August 2024 rather than immediately after ChatGPT launch, (2) blocking AI crawlers via robots.txt reduced both total traffic by 23% and human traffic by 14% for large publishers, (3) editorial/content job postings did not decline and instead increased as share of total postings, and (4) publishers shifted from text production to richer multimedia content with 68% more interactive elements and 50% more advertising technologies.

When: The research analyzed patterns from October 2022 through June 2025, with key inflection points in November 2023 and August 2024. The working paper was published December 31, 2025.

Where: The study examined global publisher traffic patterns using SimilarWeb data and U.S. desktop browsing through Comscore panel data. Effects manifested across news publishing websites competing with AI-powered search and discovery features from platforms including Google, OpenAI, Anthropic, and Perplexity.

Why: The matter proves significant for the marketing community because it reveals counterintuitive outcomes from publisher strategies responding to AI threats. Blocking AI crawlers—intended to protect content—actually harmed traffic more than allowing access. Publishers face forced participation in systems that may undermine their business models while having limited control over distribution and monetization. The findings challenge assumptions about optimal publisher responses to AI-powered platforms and highlight asymmetric power relationships between technology platforms and content creators.