Anthropic this week updated its official support documentation to provide a clearer breakdown of the three web crawlers it operates, explaining the distinct purpose of each bot and spelling out the consequences for site owners who choose to block them. The update, first spotted on February 20 by Pedro Dias, who shared the observation on X, adds detail that had previously been absent from the company's public materials.
The timing matters. Publishers and webmasters have been wrestling with the question of AI crawlers for well over a year, and the documentation update arrives as scrutiny of how AI companies collect web data has reached a new level of intensity. Until now, Anthropic's guidance on this topic was comparatively thin. The revised page draws a clearer line between what each bot is for - and what a site owner actually gives up by blocking it.
Three bots, three functions
According to the updated documentation, Anthropic operates three distinct crawlers. Each carries a separate user-agent string, which in principle allows site owners to grant or deny access on a granular basis.
ClaudeBot is described as the crawler that collects web content which could potentially contribute to AI model training. According to Anthropic's documentation, "ClaudeBot helps enhance the utility and safety of our generative AI models by collecting web content that could potentially contribute to their training." When a site restricts ClaudeBot access, it signals to Anthropic that the site's future materials should be excluded from AI model training datasets. This is the bot at the center of most public debate, given the scrutiny Anthropic has faced over its crawling practices and the controversy over extreme crawl-to-referral ratios documented by Cloudflare.
Claude-User serves a different role. According to the documentation, "Claude-User supports Claude AI users. When individuals ask questions to Claude, it may access websites using a Claude-User agent." Blocking this bot prevents Anthropic's system from retrieving a site's content in response to a user query. The documentation notes that doing so "may reduce your site's visibility for user-directed web search" - a significant practical implication for publishers who care about appearing in AI-driven answers.
Claude-SearchBot, the third crawler, is focused on search result quality. According to Anthropic, "Claude-SearchBot navigates the web to improve search result quality for users. It analyzes online content specifically to enhance the relevance and accuracy of search responses." Blocking it prevents the system from indexing the site for search optimization, with the documentation warning this may reduce visibility and accuracy in user search results.
The separation of these three functions into distinct bots marks a meaningful step toward the kind of transparency that publishers and regulators have been asking for. Cloudflare's research has argued that Google's practice of routing search indexing and AI data collection through a single Googlebot is less transparent by comparison, a point Cloudflare raised directly with the UK's Competition and Markets Authority.
How to control access
Anthropic's documentation outlines two main mechanisms for controlling crawler behavior. Both rely on robots.txt, the three-decade-old web convention that was formally standardized as RFC 9309 in 2022.
To limit crawl frequency, Anthropic supports the non-standard Crawl-delay extension. The documentation gives the following example:
User-agent: ClaudeBot
Crawl-delay: 1
To block a bot entirely from a website, site owners must add the following to the robots.txt file in their top-level directory, and repeat the process for every subdomain they wish to exclude:
User-agent: ClaudeBot
Disallow: /
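The two snippets above can be sanity-checked locally before deployment. The sketch below uses Python's standard-library urllib.robotparser (not anything Anthropic ships) to confirm how a robots.txt combining both directives is interpreted for each of the three user agents; the URL is a placeholder.

```python
from urllib import robotparser

# Hypothetical robots.txt that throttles and blocks ClaudeBot site-wide
# while leaving the other two Anthropic crawlers unrestricted.
rules = """\
User-agent: ClaudeBot
Crawl-delay: 1
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# ClaudeBot matches the record and is denied; the other two agents match
# no record, so access falls through to the default (allowed).
print(rp.can_fetch("ClaudeBot", "https://example.com/article"))         # False
print(rp.can_fetch("Claude-User", "https://example.com/article"))       # True
print(rp.can_fetch("Claude-SearchBot", "https://example.com/article"))  # True
print(rp.crawl_delay("ClaudeBot"))                                      # 1
```

Note that user-agent matching in robots.txt is per-record, so blocking ClaudeBot says nothing about Claude-User or Claude-SearchBot; each bot a site owner wants to restrict needs its own record.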
The documentation is explicit on one technical limitation: blocking Anthropic's bots by IP address will not reliably work. According to the company, "Alternate methods like blocking IP address(es) from which Anthropic Bots operates may not work correctly or persistently guarantee an opt-out, as doing so impedes our ability to read your robots.txt file." Anthropic also notes that it does not currently publish IP ranges, as it uses service provider public IPs, though this may change in the future.
For reports of malfunctioning bots, Anthropic directs site owners to [email protected], requesting that any contact come from an email address associated with the relevant domain.
Principles the company says it follows
The documentation also lays out a set of stated principles for Anthropic's data collection practices. According to the company, "our collection of data should be transparent," and its crawling "should not be intrusive or disruptive." Anthropic says it aims for minimal disruption by being "thoughtful about how quickly we crawl the same domains and respecting Crawl-delay where appropriate."
Two additional commitments are stated directly: Anthropic's bots respect "do not crawl" signals by honoring industry standard directives in robots.txt, and they respect anti-circumvention technologies, specifically noting that the company "will not attempt to bypass CAPTCHAs for the sites we crawl."
These commitments are notable given the context. Cloudflare data published in late 2025 showed that Anthropic maintained the highest crawl-to-referral imbalance among major AI platforms - 38,000 pages crawled for every single page visit referred back to publishers in July 2025, down from a ratio of 286,930 crawls per referral in January of that year. Whether documented commitments translate into actual crawler behavior has been a point of contention; Reddit's lawsuit against Anthropic, filed on June 4, 2025, alleged that Anthropic continued accessing its platform more than 100,000 times after publicly claiming it had stopped.
The absence of a verification protocol for ClaudeBot - a gap documented by Cloudflare researchers - means site owners currently have no reliable way to confirm whether traffic claiming to be ClaudeBot is genuine. Anthropic has not addressed this specific issue in the updated documentation.
Why this matters for publishers and marketers
The decision to block or allow any one of these three bots now carries specific, named trade-offs - which is more than Anthropic's previous documentation offered. For marketing professionals and site owners, the distinction between ClaudeBot, Claude-User, and Claude-SearchBot has direct operational implications.
Blocking ClaudeBot addresses the training data concern but leaves the site accessible to Claude's real-time query system through Claude-User. Blocking Claude-User means the site will not surface in responses when Claude users ask questions - the digital equivalent of opting out of a search index. Blocking Claude-SearchBot similarly reduces the chances of appearing in Anthropic's search-optimized results.
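A common selective policy - opting out of training while staying visible in Claude's answers and search results - could be expressed in a single robots.txt file. The sketch below is illustrative, not from Anthropic's documentation; the empty Disallow lines are optional, since access is the default for any bot without a matching record:

```
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow:

User-agent: Claude-SearchBot
Disallow:
```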
This layered structure mirrors, to a degree, what OpenAI introduced when it updated its own crawler documentation in December 2025, separating GPTBot, OAI-SearchBot, and ChatGPT-User. Both companies appear to be moving toward a model where granular control is at least technically possible, even if the practical consequences of each choice remain weighted against publishers.
Research published in January 2026 added a cautionary dimension to that calculus. A study by Rutgers Business School and The Wharton School found that publishers who blocked AI crawlers via robots.txt saw total traffic fall by 23% - a finding that complicates simple decisions to opt out. The trade-off between protecting content from AI training and maintaining visibility in AI-powered discovery channels is not straightforward, and the Anthropic documentation does not resolve it.
Meanwhile, the broader infrastructure for managing AI crawlers is developing quickly. Cloudflare's Robotcop tool, launched in December 2024, moved beyond voluntary compliance by enforcing robots.txt directives at the network level. The pay-per-crawl service Cloudflare launched in July 2025 introduced a third path - neither blocking nor free access, but monetized access - as an option for publishers who want to participate in AI ecosystems without giving away content for nothing.
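The monetized-access idea rests on HTTP status 402 (Payment Required). As a rough illustration of the concept only - the bot names are Anthropic's real user agents, but the policy table and helper function below are hypothetical and have nothing to do with Cloudflare's actual implementation - an origin might decide per User-Agent whether to serve or charge:

```python
# Illustrative policy: charge the training crawler, serve the
# user-facing and search bots normally. Entirely hypothetical.
AI_CRAWLER_POLICY = {
    "claudebot": 402,         # training crawler: payment required
    "claude-user": 200,       # user-triggered fetches: serve normally
    "claude-searchbot": 200,  # search indexing: serve normally
}

def response_status(user_agent: str) -> int:
    """Return the HTTP status an origin might send for this User-Agent."""
    ua = user_agent.lower()
    for bot, status in AI_CRAWLER_POLICY.items():
        if bot in ua:  # substring match on the advertised bot token
            return status
    return 200  # ordinary visitors and unlisted bots

print(response_status("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))  # 402
print(response_status("Claude-User/1.0"))                          # 200
```

In Cloudflare's pay-per-crawl model this decision happens at the network edge rather than at the origin, but the trade it encodes - payment in exchange for crawl access - is the same.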
Anthropic's documentation update does not engage with these commercial frameworks directly. It frames the robots.txt mechanism as the primary tool available to site owners, consistent with industry norms. Over 35% of the world's top websites already block major AI crawlers, and the number continues to grow. For those who have not yet made a decision, Anthropic's clearer documentation provides at least a more informed starting point.
Timeline
- 1994: robots.txt introduced as an informal web standard for controlling crawler access
- 2022: robots.txt formally standardized as RFC 9309
- June 2024: Cloudflare introduces a feature to block AI scrapers and crawlers
- July 3, 2024: Cloudflare reveals AI bots accessed approximately 39% of top one million internet properties
- August 3, 2024: Over 35% of top 1,000 websites block OpenAI's GPTBot, a seven-fold increase from 2023
- December 12, 2024: Cloudflare launches Robotcop to enforce robots.txt policies at network level
- March 28, 2025: Google outlines how the robots.txt protocol could evolve
- June 4, 2025: Reddit files lawsuit against Anthropic over unauthorized Claude AI training
- July 1-2, 2025: Cloudflare launches pay-per-crawl service for content creators
- August 28-29, 2025: Cloudflare expands AI Crawl Control with customizable HTTP 402 responses
- September 1, 2025: Cloudflare data confirms Anthropic's crawl-to-referral ratio at 38,000:1 in July 2025
- December 9, 2025: OpenAI revises ChatGPT crawler documentation, separating training and search bots
- December 20, 2025: Cloudflare 2025 review: AI bots now 4.2% of web traffic as internet grows 19%
- January 1, 2026: Study finds publishers who blocked AI crawlers lost 23% of traffic
- January 6, 2026: Research documents AI agents masquerading as humans to bypass website defenses
- January 30, 2026: Cloudflare data shows Google's crawler accesses 3x more web content than competing AI crawlers
- February 20, 2026: Anthropic updates crawler documentation; Pedro Dias spots and shares the change on X
- February 25, 2026: Barry Schwartz reports on the update at Search Engine Roundtable
Summary
Who: Anthropic, the AI company behind the Claude chatbot, updated its official support documentation. The change was spotted by Pedro Dias and reported by Barry Schwartz at Search Engine Roundtable on February 25, 2026.
What: Anthropic clarified the roles of its three web crawlers - ClaudeBot (model training), Claude-User (real-time user queries), and Claude-SearchBot (search result quality) - and explained the specific consequences of blocking each one through robots.txt. The documentation also states that IP-based blocking is unreliable and reaffirms that Anthropic respects robots.txt directives and will not attempt to bypass CAPTCHAs.
When: The documentation was updated around February 20, 2026, and publicly flagged on February 25, 2026.
Where: The update appears on Anthropic's official support site. The documentation is directed at website owners globally who manage how automated systems access their content.
Why: The update comes as publishers, regulators, and researchers have placed AI crawler practices under intensifying scrutiny. Growing legal disputes, third-party data showing extreme crawl-to-referral imbalances, and the development of enforcement tools like Cloudflare's Robotcop have created pressure on AI companies to be more transparent about how their bots operate and what site owners can actually do to control access.