The user agent strings every SEO and site owner needs right now

A compact reference for identifying the major AI bots in production server logs has circulated on LinkedIn, bringing into one place the user agent strings that now determine who gets access to your content - and who gets blocked.

What the post actually shows

Aquib Israr, an independent SEO and GEO consultant, shared a technical post on LinkedIn on June 28, 2026, listing the full user agent (UA) strings for seven major web crawlers, formatted with operator URLs as they appear in production HTTP request headers. The list covers OpenAI's GPTBot, OAI-SearchBot, and ChatGPT-User; Anthropic's ClaudeBot and Claude-SearchBot; PerplexityBot; and Microsoft's Bingbot. Israr describes himself as operating across EdTech, FinTech, SaaS, and publishing verticals, with clients in MENA, Australia, and India.

The post is not an official announcement from any of these companies. It is a practitioner-compiled reference, the kind that circulates when a technical task - configuring robots.txt rules, writing server-side allow lists, auditing log files - requires precise string values that can otherwise take time to track down from multiple company documentation pages. That it attracted attention reflects where the industry is right now: site owners are making high-stakes decisions about AI crawler access, and those decisions depend on being able to correctly identify who is knocking.

The strings, according to the post, are as follows.

GPTBot (training): Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

OAI-SearchBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)

ChatGPT-User: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)

ClaudeBot: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Claude-SearchBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com/claudesearchbot)

PerplexityBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Bingbot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

Why the format details matter

User agent strings look uniform at a glance but carry meaningful structural differences between operators. The GPTBot, OAI-SearchBot, and ChatGPT-User strings all follow a pattern built on the Mozilla/5.0 and AppleWebKit/537.36 base - a convention shared with most modern browsers - before declaring compatibility and their specific bot identifier. The AppleWebKit/537.36 reference is a legacy artifact; it does not mean these crawlers use the WebKit rendering engine. It exists because many server-side rules were historically written to allow requests with those tokens.

ClaudeBot departs from that convention noticeably. Its string does not include AppleWebKit or a Chrome-like rendering base. It declares compatibility directly: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com). The operator contact is an email address rather than a URL - a different convention from OpenAI, which links to a dedicated documentation page for each bot. Claude-SearchBot, by contrast, does use the AppleWebKit base and links to the Anthropic page at https://www.anthropic.com/claudesearchbot.

Bingbot is the most elaborated string on the list. It appends a Chrome version identifier - Chrome/116.0.1938.76 Safari/537.36 - that does not appear in the OpenAI or Anthropic strings. This means server-side filters matching on "Chrome" in the user agent would catch Bingbot but not GPTBot, and filters matching on "bingbot/2.0" would isolate it from the AI model training crawlers entirely. These structural differences are what make a compiled reference useful; pattern-matching across crawlers in log analysis tools or web application firewalls requires knowing how the strings actually differ.
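The token differences described above can be turned into a small classifier for log analysis. What follows is a minimal sketch, not any operator's official matching logic: the function name identify_bot and the pattern table are illustrative, and the optional hyphen in the OAI-SearchBot pattern is a defensive assumption rather than documented behavior.

```python
import re
from typing import Optional

# Product tokens taken from the strings listed in the post.
# Each pattern requires the token to be followed by "/<version>".
# The optional hyphen in OAI-SearchBot is an assumption made for robustness.
BOT_PATTERNS = [
    ("GPTBot", re.compile(r"GPTBot/[\d.]+", re.IGNORECASE)),
    ("OAI-SearchBot", re.compile(r"OAI-?SearchBot/[\d.]+", re.IGNORECASE)),
    ("ChatGPT-User", re.compile(r"ChatGPT-User/[\d.]+", re.IGNORECASE)),
    ("ClaudeBot", re.compile(r"ClaudeBot/[\d.]+", re.IGNORECASE)),
    ("Claude-SearchBot", re.compile(r"Claude-SearchBot/[\d.]+", re.IGNORECASE)),
    ("PerplexityBot", re.compile(r"PerplexityBot/[\d.]+", re.IGNORECASE)),
    ("bingbot", re.compile(r"bingbot/[\d.]+", re.IGNORECASE)),
]


def identify_bot(user_agent: str) -> Optional[str]:
    """Return the name of the first listed bot whose token appears, else None."""
    for name, pattern in BOT_PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


GPTBOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
             "GPTBot/1.2; +https://openai.com/gptbot)")
BINGBOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "bingbot/2.0; +http://www.bing.com/bingbot.htm) "
              "Chrome/116.0.1938.76 Safari/537.36")
CLAUDEBOT_UA = "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"

print(identify_bot(GPTBOT_UA), identify_bot(BINGBOT_UA), identify_bot(CLAUDEBOT_UA))
```

Matching on the product token rather than on generic pieces such as "Chrome" or "AppleWebKit" keeps the rule specific to one crawler. A plain substring test for "Chrome" would hit the Bingbot string above and miss the GPTBot one, which is the asymmetry the paragraph describes.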

Three OpenAI bots, three different jobs

The presence of three distinct OpenAI entries on the list reflects a crawling infrastructure that has been documented in increasing detail over the past year. GPTBot, version 1.2 in the current string, is the training crawler. It collects content that may be used to build or refine OpenAI's generative models. OAI-SearchBot operates differently: it runs when ChatGPT performs a web search not directly triggered by a user, retrieving live content to ground responses. ChatGPT-User executes when a person explicitly asks ChatGPT to access or interact with a specific page.

The distinction matters operationally. When OpenAI revised its crawler documentation in December 2025, it removed robots.txt compliance language for ChatGPT-User, positioning it as a proxy for user browsing rather than an autonomous crawler - a framing that justified exempting it from robots.txt controls. A site that blocks OAI-SearchBot in its robots.txt may still receive ChatGPT-User requests without any recourse through the standard exclusion mechanism. GPTBot and OAI-SearchBot remain subject to robots.txt under current documentation. An analysis of more than 7 billion server log entries published in April 2026 found that OAI-SearchBot activity more than tripled following the GPT-5 launch in August 2025, registering a 3.5x increase, while GPTBot expanded 2.9x. ChatGPT-User events, conversely, dropped 28% from December 2025 through mid-March 2026.

Two Anthropic bots with very different reach

Anthropic clarified the roles of its three crawlers in February 2026. ClaudeBot, per that documentation, collects web content that could potentially contribute to AI model training. Claude-SearchBot navigates the web to improve search result quality. A third crawler, Claude-User, handles real-time user queries but does not appear in Israr's list, possibly because its string differs from the others in ways that make it less relevant to the specific use case of production testing and robots.txt configuration.

The structural difference between ClaudeBot and Claude-SearchBot in the strings is worth examining. ClaudeBot uses an email contact (+claudebot@anthropic.com) rather than a documentation URL. Claude-SearchBot uses a web URL (+https://www.anthropic.com/claudesearchbot). For site owners trying to verify crawler authenticity beyond the user agent string, the starting points differ: the Claude-SearchBot string points to a page that can be consulted, while the ClaudeBot string offers only a contact address. According to Anthropic's updated documentation, its bots respect robots.txt directives and the company will not attempt to bypass CAPTCHAs. That commitment has not been without controversy - Reddit's lawsuit against Anthropic, filed in June 2025, alleged that ClaudeBot-identified traffic accessed its platform more than 100,000 times after Anthropic publicly stated it had stopped crawling the site.

A practical limitation applies to both Anthropic strings. As documented by Cloudflare researchers and covered at PPC Land, no verification protocol existed for ClaudeBot at the time of writing, meaning the string can be spoofed without a cryptographic check. A server seeing a ClaudeBot user agent cannot confirm, using standard mechanisms, that the request actually originated from Anthropic's infrastructure. This is not unique to Anthropic - it applies broadly to the ecosystem - but it is a gap worth understanding when writing allow lists based solely on user agent strings.

Bingbot in a changing context

Bingbot's inclusion alongside AI training crawlers reflects a structural shift in how Microsoft's crawler is perceived. Bingbot is a traditional search crawler: it feeds the Bing search index and, by extension, Bing's AI-powered search features, including those now integrated with Copilot. Its user agent string, uniquely among those in the post, appends a specific Chrome version, Chrome/116.0.1938.76, followed by Safari/537.36. Chrome 116 was released in August 2023, so the string as listed carries a browser version that is nearly three years old. Whether this matters in practice depends on whether the server-side rule matching it is version-sensitive or uses a broader substring match.
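The difference between a version-sensitive rule and a broader match can be shown with two regular expressions. This is an illustrative sketch: the pattern names STRICT and LOOSE are made up for the example, and FUTURE_UA is a hypothetical string constructed to simulate a later Chrome version, not a value observed in any log.

```python
import re

# Bingbot string as listed in the post.
BINGBOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "bingbot/2.0; +http://www.bing.com/bingbot.htm) "
              "Chrome/116.0.1938.76 Safari/537.36")

# HYPOTHETICAL: the same string with a different Chrome version swapped in.
FUTURE_UA = BINGBOT_UA.replace("Chrome/116.0.1938.76", "Chrome/130.0.0.0")

# Version-sensitive rule: pins both the bot token and the exact Chrome build.
STRICT = re.compile(r"bingbot/2\.0.*Chrome/116\.0\.1938\.76")

# Broader rule: keys only on the bot token and accepts any version number.
LOOSE = re.compile(r"\bbingbot/\d+(?:\.\d+)*", re.IGNORECASE)


def matches(pattern: re.Pattern, user_agent: str) -> bool:
    """True if the pattern is found anywhere in the user agent string."""
    return pattern.search(user_agent) is not None


for label, ua in (("listed", BINGBOT_UA), ("hypothetical", FUTURE_UA)):
    print(label, "strict:", matches(STRICT, ua), "loose:", matches(LOOSE, ua))
```

The strict pattern stops matching as soon as the Chrome portion changes, while the loose one keeps identifying the crawler. For allow or block rules, keying on the bot token alone is usually the more durable choice.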

The robots.txt behavior for Bingbot is governed by standard web conventions: it reads and respects disallow rules. Unlike ChatGPT-User, Bingbot has not been documented as bypassing robots.txt for any category of request. This places it in a different category from the AI-specific crawlers, even if they share server log space and show up together in analytics dashboards.

The verification problem behind every string

A user agent string is a self-declaration. Nothing in the HTTP protocol requires a crawler to send an accurate one. Research documented by PPC Land in January 2026 showed that some AI agents were using spoofed user agent strings to bypass blocks; in one case, a single query to xAI's Grok triggered 16 requests from 12 IP addresses impersonating human browsers. Data from DataDome found that 80% of AI agents do not declare themselves properly when visiting websites.

For this reason, practitioners using the strings in Israr's post for anything beyond log analysis or robots.txt configuration should pair them with IP range verification. Google maintains daily-updated JSON files containing its crawlers' IP ranges, documented by PPC Land when Google shifted from weekly to daily refresh cycles. OpenAI publishes a similar list. Anthropic's verification infrastructure has been less consistently documented. The Google-Agent crawler announcement in March 2026 highlighted the same gap: Google was exploring a web-bot-auth protocol that would use cryptographic signatures to verify bot identity, a mechanism that does not yet have widespread adoption across the industry.

For most production configurations, the workflow is: match the user agent string to identify the crawler type, then confirm the originating IP against the published IP range file from the relevant operator. Relying on the string alone is sufficient for log analysis but insufficient for security-critical access decisions.
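The two-step workflow above can be sketched as follows. This is a minimal illustration under stated assumptions: the CIDR ranges are placeholders from the RFC 5737 documentation address space, not real operator ranges, and PUBLISHED_RANGES and verify_request are hypothetical names. In practice the ranges would be loaded from the file each operator publishes.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL ranges from RFC 5737 documentation space. Real values would
# come from the IP range file published by the relevant operator.
PUBLISHED_RANGES: Dict[str, List[str]] = {
    "GPTBot": ["192.0.2.0/24"],
    "bingbot": ["198.51.100.0/24"],
}


def verify_request(
    user_agent: str,
    remote_ip: str,
    ranges: Dict[str, List[str]] = PUBLISHED_RANGES,
) -> Tuple[Optional[str], bool]:
    """Return (claimed_bot, ip_verified) for a single request."""
    ua = user_agent.lower()
    # Step 1: identify the claimed crawler from its token in the user agent.
    claimed = next((bot for bot in ranges if bot.lower() in ua), None)
    if claimed is None:
        return None, False
    # Step 2: confirm the source address falls inside one of that bot's ranges.
    addr = ipaddress.ip_address(remote_ip)
    verified = any(addr in ipaddress.ip_network(cidr) for cidr in ranges[claimed])
    return claimed, verified


GPTBOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
             "GPTBot/1.2; +https://openai.com/gptbot)")

print(verify_request(GPTBOT_UA, "192.0.2.10"))   # address inside the placeholder range
print(verify_request(GPTBOT_UA, "203.0.113.5"))  # same string, address outside it
```

A request that claims a bot identity but fails the range check is a candidate for spoofing and can be logged or blocked separately from verified crawler traffic.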

What the robots.txt consequences actually are

The strings in Israr's list are most immediately actionable for robots.txt configuration. A site that wants to block OpenAI's training crawler but remain visible in ChatGPT search results would disallow GPTBot while leaving OAI-SearchBot unrestricted. A site that wants to block all model training from both OpenAI and Anthropic would disallow both GPTBot and ClaudeBot. A site that also wants to opt out of AI-powered search results across platforms would additionally disallow OAI-SearchBot and Claude-SearchBot.
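The second scenario, blocking training crawlers while leaving search crawlers unrestricted, can be checked with Python's standard urllib.robotparser module before a file is deployed. This is a sketch only: the robots.txt content and the example.com URL are illustrative, and a given crawler's real parser may interpret edge cases differently from Python's.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: disallow the two training crawlers, allow the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot_token: str, url: str = "https://example.com/article") -> bool:
    """True if the parsed rules let the given bot token fetch the URL."""
    return parser.can_fetch(bot_token, url)


for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-SearchBot"):
    print(bot, "allowed" if allowed(bot) else "blocked")
```

Adding OAI-SearchBot and Claude-SearchBot groups with "Disallow: /" would turn this into the third scenario. The check only confirms what the file says; it does not confirm that a crawler honors it.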

Research published in early 2026 by Rutgers Business School and The Wharton School complicated the opt-out calculus: publishers who blocked AI crawlers via robots.txt saw total traffic fall by 23%, a finding that reflects the dependency between content blocking and AI-powered discovery. Blocking the training crawler and the search crawler are not equivalent choices. The training block removes content from model datasets; the search block removes the site from AI-generated answers. The distinction matters for publishers who care about both data rights and audience reach.

A study by ProGEO.ai published March 31, 2026, scanning all 500 companies on the Fortune 500 list, found that among the 55 companies that had named any AI user agent in their robots.txt, the most frequently targeted was GPTBot, with 32 total directives - and those directives leaned toward restriction. Directives aimed at search crawlers, by contrast, skewed toward allowing access. That pattern reflects the same logic visible in the string list: different bots, different consequences, different policies.

Scale of what these strings are tracking

The volume of requests these user agent strings appear in has reached levels that are practically significant for server infrastructure. Kinsta's analysis of more than 10 billion requests across its managed WordPress hosting infrastructure, published in June 2026, found a ClaudeBot-identified crawler hitting add-to-cart URLs 3.75 million times in a single 24-hour period - roughly one request every 23 milliseconds, sustained around the clock. A separate misbehaving crawler generated 550 million requests in a single calendar month. These are not background noise. They are infrastructure events, and identifying which user agent string generated them is the first step in any response.
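The "one request every 23 milliseconds" figure follows directly from the daily total. A short calculation, with illustrative function names, makes the conversion explicit and reusable for other log volumes:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mean_interval_ms(requests: int, window_seconds: int = SECONDS_PER_DAY) -> float:
    """Average gap between requests, in milliseconds, over the window."""
    return window_seconds / requests * 1000


def requests_per_second(requests: int, window_seconds: int = SECONDS_PER_DAY) -> float:
    """Average request rate over the window."""
    return requests / window_seconds


daily_hits = 3_750_000  # the add-to-cart figure reported by Kinsta
print(round(mean_interval_ms(daily_hits), 2))     # 23.04
print(round(requests_per_second(daily_hits), 1))  # 43.4
```

Both numbers are averages over the full 24 hours; the source does not say how evenly the requests were distributed within that window.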

Cloudflare's data for the week ending June 5, 2026, showed bots accounting for 57.4% of all HTML web traffic. HUMAN Security's May 2026 data found that the rate at which sites block agentic traffic climbed to nearly 9% - up from 8.2% the previous month - even as total agentic traffic volume dipped 4.3% month over month. Against that backdrop, a reference list of user agent strings is not merely a convenience for SEO practitioners. It is a precondition for any informed access policy.

Microsoft Clarity introduced a violations detection feature on June 23, 2026, giving publishers visibility into which bots are ignoring robots.txt directives and targeting which specific content paths. The feature requires CDN integration to function, but once connected, it shows violations as a percentage of total bot requests - not just a raw count - making it possible to distinguish between compliant crawl traffic and active policy violations. That kind of per-bot, per-path visibility depends on the same underlying data as the strings in Israr's post: the user agent values the bots send when they arrive.
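The idea of reporting violations as a share of each bot's requests, rather than as a raw count, can be illustrated with a few lines of log arithmetic. This is not Clarity's implementation, which the source does not describe; the function name, the (bot, path) record shape, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def violation_rates(
    requests: Iterable[Tuple[str, str]],
    disallowed: Dict[str, List[str]],
) -> Dict[str, float]:
    """Percentage of each bot's requests that hit a path disallowed for it."""
    totals: Counter = Counter()
    violations: Counter = Counter()
    for bot, path in requests:
        totals[bot] += 1
        # A request counts as a violation if its path starts with any
        # prefix disallowed for that bot.
        if any(path.startswith(prefix) for prefix in disallowed.get(bot, [])):
            violations[bot] += 1
    return {bot: 100.0 * violations[bot] / totals[bot] for bot in totals}


# HYPOTHETICAL sample records: (bot token, requested path).
sample_log = [
    ("GPTBot", "/private/a"),
    ("GPTBot", "/blog/x"),
    ("GPTBot", "/blog/y"),
    ("GPTBot", "/private/b"),
    ("bingbot", "/blog/x"),
]
rules = {"GPTBot": ["/private/"]}

print(violation_rates(sample_log, rules))
```

A percentage separates a crawler that occasionally strays from one whose traffic is mostly outside policy, which a raw count cannot do when request volumes differ by orders of magnitude.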

Why this matters for marketing teams

For advertising and marketing professionals, the stakes of AI crawler identification extend beyond content access. As documented by PPC Land, OpenAI's bots - including ChatGPT-User, OAI-SearchBot, GPTBot, and ChatGPT Agent - accounted for approximately 69% of all observed AI-driven traffic by volume in 2026 data from HUMAN Security. Anthropic identities, including ClaudeBot and Claude-SearchBot, made up roughly 11%. The choices a site makes about which bots to allow directly shape which AI platforms can surface that site's content in responses.

Google-Agent, added to Google's official crawler documentation on March 20, 2026, formalized a new category: user-triggered fetchers that execute browsing tasks on behalf of users. Unlike autonomous crawlers, user-triggered fetchers bypass robots.txt, which means a site that has blocked Google's training and search crawlers can still receive requests from Google-Agent when a user with AI browsing enabled sends it there. That distinction - autonomous crawler versus user-triggered fetcher - does not appear explicitly in the user agent strings Israr listed, but it is the context in which those strings are increasingly deployed.

For a marketing team managing content strategy in 2026, knowing that OAI-SearchBot/1.0 and Claude-SearchBot/1.0 are the identifiers that determine AI search visibility, while GPTBot/1.2 and ClaudeBot/1.0 are the ones tied to training datasets, is foundational operational knowledge. The user agent strings are the entry point to every other decision in that stack.

Timeline

August 2023 - OpenAI launches GPTBot; 5% of top 1,000 websites block it at launch
August 2024 - Blocking of GPTBot reaches 35.7% of top 1,000 websites, a seven-fold increase
September 2024 - Google revamps its public crawler and fetcher documentation
November 2025 - Google launches dedicated crawling infrastructure documentation site; detailed by PPC Land
December 2025 - OpenAI revises ChatGPT crawler documentation, removing robots.txt compliance language for ChatGPT-User; covered by PPC Land
December 2025 - Cloudflare year-in-review data shows AI bots at 4.2% of all HTML requests globally
January 6, 2026 - Research documents AI agents using spoofed user agents to bypass site defenses; covered by PPC Land
February 20, 2026 - Anthropic updates crawler documentation; PPC Land covers the change on February 25
March 12, 2026 - Google engineers explain the SaaS architecture behind Googlebot; PPC Land coverage
March 20, 2026 - Google-Agent added to official crawler documentation; PPC Land covers the addition
March 31, 2026 - ProGEO.ai study finds only 7.4% of Fortune 500 companies have llms.txt; PPC Land coverage
April 21, 2026 - OAI-AdsBot documented in OpenAI's official crawler documentation; PPC Land coverage
April 24, 2026 - Analysis of 7 billion OpenAI log events shows 3.5x OAI-SearchBot surge after GPT-5; PPC Land coverage
June 2026 - Kinsta data finds ClaudeBot-identified crawler hitting WordPress cart pages 3.75 million times in 24 hours; PPC Land coverage
June 4, 2026 - HUMAN Security May 2026 data shows site blocking of agentic traffic climbing to nearly 9%; PPC Land coverage
June 23, 2026 - Microsoft Clarity launches robots.txt violations detection in its Bot Analytics dashboard; PPC Land coverage
June 28, 2026 - Aquib Israr publishes full production user agent strings for seven major AI and search crawlers on LinkedIn

Summary

Who: Aquib Israr, an independent SEO and GEO consultant with over six years of experience across EdTech, FinTech, SaaS, and publishing, serving clients in MENA, Australia, and India. The reference is relevant to webmasters, SEO practitioners, developers, and marketing professionals managing content access policies for AI crawlers.

What: A LinkedIn post listing the complete user agent strings for seven crawlers - GPTBot/1.2, OAI-SearchBot/1.0, ChatGPT-User/1.0, ClaudeBot/1.0, Claude-SearchBot/1.0, PerplexityBot/1.0, and bingbot/2.0 - formatted with their operator documentation URLs as they appear in production HTTP request headers.

When: The post was published on June 28, 2026.

Where: Published on LinkedIn. The strings are relevant to server log analysis, robots.txt configuration, web application firewall rules, and bot analytics dashboards used globally by site operators.

Why: User agent strings are the primary mechanism by which web crawlers identify themselves to servers, and they are the first data point in any decision to allow or block a crawler. With AI bot traffic accounting for a growing share of web requests - Cloudflare recorded bots at 57.4% of HTML traffic in early June 2026 - and with different bots carrying different consequences for content training, AI search visibility, and server load, accurate identification is a prerequisite for informed access policy. The reference compiles strings from multiple company documentation sources into a single practitioner-ready format.