Originality.ai today published its June 2026 update to a year-long tracking study of llms.txt, llms-full.txt, and ai.txt adoption, finding that the number of sites implementing these files has grown sharply while evidence that AI systems are reading them remains close to zero.
The study, which monitored more than 3 million websites, recorded 4,088 llms.txt instances in June 2025. By May 2026, that count had reached 36,120 - an 8.8x increase across twelve months. Including the two companion formats, llms-full.txt and ai.txt, the total number of adopting sites reached 38,980 according to Originality.ai.
The newer formats grew even faster from much smaller bases. According to the study, llms-full.txt expanded 107.1x - from 23 sites to 2,463 - over the same period. ai.txt grew 99.3x, from 4 sites to 397. By count, both remain a fraction of the llms.txt footprint: llms.txt accounts for 92.7% of the distribution, llms-full.txt for 6.3%, and ai.txt for 1%.
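The multiples and shares above can be re-derived from the raw counts. The short Python sketch below uses only the figures Originality.ai reports; the function and variable names are illustrative, and it assumes the combined total is a simple sum of the three counts, which matches the 38,980 figure.

```python
# Counts reported by Originality.ai: (June 2025, May 2026).
counts = {
    "llms.txt": (4_088, 36_120),
    "llms-full.txt": (23, 2_463),
    "ai.txt": (4, 397),
}


def growth_multiple(start: int, end: int) -> float:
    """Return how many times larger the end count is than the start count."""
    return end / start


def shares(latest: dict) -> dict:
    """Return each format's percentage share of the combined total."""
    total = sum(latest.values())
    return {name: 100 * n / total for name, n in latest.items()}


# Latest (May 2026) count per format, and the combined total.
latest = {name: end for name, (_, end) in counts.items()}
total_sites = sum(latest.values())

for name, (start, end) in counts.items():
    print(f"{name}: {growth_multiple(start, end):.2f}x")
print("combined total:", total_sites)
for name, pct in shares(latest).items():
    print(f"{name}: {pct:.2f}%")
```

The script prints two decimal places; the study rounds the same values to one decimal place or to a whole percentage.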
The request gap
Adoption growth, however, does not translate into use. A separate study by Ahrefs, published in May 2026 and drawing on server-log data from 137,000 domains, found that 97% of llms.txt files received zero requests in May 2026. The vast majority of these files went unread that month.
Even where AI retrieval bots - those that answer user queries inside AI search products - were active, they accounted for just 1.1% of requests to llms.txt files, according to Ahrefs. The training crawlers fared slightly better in aggregate but still showed low individual figures. GPTBot accounted for 4.51% of requests to these files, ClaudeBot for 0.80%, and DeepseekBot for 0.02%.
What were the top requesters? According to the Ahrefs data cited by Originality.ai, SEO audit tools led with 21.7% of requests, followed by unidentified bots at 14.9%, and general web crawlers at 13.1%. The AI assistants that llms.txt was designed to serve were largely absent from the logs.
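Site owners can run a similar breakdown on their own logs. The Python sketch below is not the Ahrefs methodology: the category map is a small hypothetical sample, the user-agent strings are simplified for illustration, and the input is assumed to be already parsed into (path, user-agent) pairs.

```python
from collections import Counter

# Hypothetical token -> category map. A real analysis needs a maintained
# bot list; these few entries are illustrative assumptions only.
CATEGORIES = {
    "GPTBot": "ai_training",
    "ClaudeBot": "ai_training",
    "ChatGPT-User": "ai_retrieval",
    "Claude-User": "ai_retrieval",
    "AhrefsSiteAudit": "seo_audit",
    "Googlebot": "general_crawler",
}


def classify(user_agent: str) -> str:
    """Map a user-agent string to a category by case-insensitive substring."""
    ua = user_agent.lower()
    for token, category in CATEGORIES.items():
        if token.lower() in ua:
            return category
    return "unidentified"


def llms_txt_request_shares(requests) -> dict:
    """Return each category's percentage share of requests to /llms.txt."""
    hits = Counter(classify(ua) for path, ua in requests if path == "/llms.txt")
    total = sum(hits.values())
    return {category: 100 * n / total for category, n in hits.items()}


# Invented sample rows: (requested path, simplified user-agent string).
sample_requests = [
    ("/llms.txt", "Mozilla/5.0 (compatible; AhrefsSiteAudit)"),
    ("/llms.txt", "Mozilla/5.0 (compatible; GPTBot)"),
    ("/index.html", "Mozilla/5.0 (compatible; Googlebot)"),
    ("/llms.txt", "some-unknown-client"),
    ("/llms.txt", "Mozilla/5.0 (compatible; AhrefsSiteAudit)"),
]

sample_shares = llms_txt_request_shares(sample_requests)
print(sample_shares)
```

User-agent strings can be spoofed, so a breakdown like this is indicative rather than conclusive.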
What the files are meant to do
llms.txt was proposed by data scientist Jeremy Howard in 2024. It is a plain-text markdown file placed at the root of a website - at the path /llms.txt - that gives AI models a structured summary of a site's purpose, its most important pages, and notes on how to prioritise content. Howard's rationale was practical: large language models have limited context windows and struggle to process most websites in full. A curated index placed at one accessible path lowers the cost for AI systems to understand a site's structure.
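As an illustration of what such a file can look like, the Python sketch below assembles a minimal llms.txt. The layout (a title heading, a short blockquote summary, then sections of annotated links) follows the publicly documented proposal as generally described; the site name, URLs, notes, and the helper function itself are hypothetical.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Assemble a minimal llms.txt document as markdown text.

    `sections` maps a section heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
llms_txt = build_llms_txt(
    title="Example Docs",
    summary="Developer documentation for the Example API.",
    sections={
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md",
             "Install and first request"),
            ("API reference", "https://example.com/docs/api.md",
             "Endpoints and parameters"),
        ],
    },
)
print(llms_txt)
```

The resulting text would be served at /llms.txt on the site's root.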
llms-full.txt extends this by providing a flattened, full-text version of an entire site's content in a single markdown file. Unlike llms.txt, which links outward to key pages, llms-full.txt contains the actual text of every important section in one document. Because of its size, it is better suited to smaller sites, documentation sets, or self-contained knowledge bases where the entire content can fit inside an AI model's context window.
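A rough way to judge whether a site is small enough for this format is to flatten its pages and estimate the token count. The Python sketch below is an assumption-heavy illustration: the page data is hypothetical, and the figure of four characters per token is a crude rule of thumb rather than any model's actual tokenizer.

```python
def build_llms_full(pages: dict) -> str:
    """Flatten {page title: page text} into a single markdown document."""
    parts = [f"# {title}\n\n{body.strip()}\n" for title, body in pages.items()]
    return "\n".join(parts)


def fits_context(text: str, context_tokens: int,
                 chars_per_token: float = 4.0) -> bool:
    """Estimate whether `text` fits in a context window of `context_tokens`.

    chars_per_token is an assumed heuristic, not a real tokenizer.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens


# Hypothetical two-page site.
pages = {"Intro": "Hello", "API": "World"}
full_text = build_llms_full(pages)
print(full_text)
print(fits_context(full_text, context_tokens=100))
```

For a real decision, the estimate should come from the tokenizer of the model that is expected to read the file.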
ai.txt, promoted by the company Spawning, takes a different approach. Rather than guiding AI models on how to interpret content, ai.txt issues directives on what crawlers can and cannot do with it - specifying file-type restrictions, terms for AI training, and contact details for licensing questions. According to Originality.ai, Spawning states that AI bots using its API will respect ai.txt, but most do not yet.
Why robots.txt does not solve the problem
Part of the rationale for these new files is that the existing standard - robots.txt - was not designed for AI. According to Originality.ai, robots.txt cannot distinguish between a crawler indexing a page for search and one collecting training data for a language model. It also cannot specify licensing terms or contact information, and it treats all crawlers the same regardless of their purpose.
More fundamentally, compliance with robots.txt is voluntary. AI bots can ignore it, spoof their user-agent strings to avoid detection, or - as some research has documented - declare themselves as human browsers rather than automated systems. A study cited by PPC Land in February 2026 found that Anthropic crawled 38,000 pages for every single page visit it referred back to publishers - a ratio that illustrates the asymmetry between crawl access and any benefit to site owners.
Blocking AI crawlers through robots.txt carries its own risks. Research published in April 2026 by Rutgers Business School and The Wharton School found that news publishers who blocked AI crawlers via robots.txt lost roughly 7% of weekly website traffic within six weeks - a decline that appeared in human browsing data, not just bot metrics.
Conflicting signals from the platforms
The June 2026 Originality.ai update dedicates significant attention to what it calls inconsistency from the major AI platforms. The pattern is the same at Google, OpenAI, and Anthropic: each company advises site owners to rely on robots.txt for crawler management, yet each also publishes its own llms.txt file for its developer documentation.
According to Google's published guidance, llms.txt "won't negatively or positively impact your visibility or rankings" in Search - a position PPC Land covered in detail when Google published its AI search guide on May 15, 2026. At the same time, Chrome Lighthouse added an optional agentic browsing audit for llms.txt, as PPC Land reported in May 2026. The Lighthouse audit flags server errors if a site has attempted an llms.txt implementation but the file returns an error; it marks the check as "Not Applicable" if no file exists, since the file remains optional. Google clarified its position again on June 15, 2026, when it added a note to its AI optimisation guide stating that llms.txt files are not required for Google Search, as noted in PPC Land's reporting on that documentation update.
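The audit behaviour described above amounts to a small decision rule. The Python sketch below is not Lighthouse's implementation; it assumes that a 404 response means no file exists, that 5xx responses are the server errors being flagged, and it leaves every other status unclassified.

```python
def llms_txt_audit_result(status_code: int) -> str:
    """Classify the HTTP status returned for /llms.txt.

    Assumed mapping for illustration, not Lighthouse's actual logic.
    """
    if status_code == 404:
        # No file exists: the check does not apply, since the file is optional.
        return "not_applicable"
    if 500 <= status_code <= 599:
        # The site attempted an implementation but the server returns an error.
        return "error"
    if 200 <= status_code <= 299:
        return "pass"
    # Redirects and other client errors are not covered by the description.
    return "unknown"


for code in (200, 404, 503):
    print(code, llms_txt_audit_result(code))
```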
OpenAI's crawler documentation does not mention llms.txt. According to Originality.ai, OpenAI directs site owners to robots.txt for managing how a website interacts with AI, while separately referencing llms.txt in its developer and API documentation. Anthropic's guidance on data crawling takes the same position - robots.txt for crawler management, llms.txt referenced in developer docs only. Both companies publish llms.txt for their own documentation sets.
The Originality.ai update describes this pattern as "concerning, not to mention confusing for users." As the study notes, each of the three major AI platforms publishes an llms.txt for its own documentation, yet none tells site owners to rely on the file for AI visibility or search ranking.
John Mueller, former Senior Webmaster Trends Analyst at Google, noted on Bluesky in June 2025 that "no AI system currently uses llms.txt" - a statement that now coexists with the Lighthouse audit, OpenAI's and Anthropic's own documentation files, and the 8.8x adoption growth documented by Originality.ai.
What the file is actually for
Given the data, Originality.ai concludes that llms.txt looks far more like agent-readiness infrastructure and AI-readable documentation than a search visibility tool. The practical use case is not ranking in AI Overviews or generative search features. It is providing autonomous agents - systems that navigate websites to complete tasks, book appointments, or retrieve documentation on a user's behalf - with a clean map of a site's structure before they begin.
That distinction matters. As PPC Land reported in May 2026, Google has begun separating its AI-related site guidance into two layers: one addressing how content surfaces inside generative search features, the other addressing how autonomous agents navigate site structure. The Lighthouse agentic audit fits the second category. Google's Google-Agent user-triggered fetcher, added to its official list of crawlers in March 2026, operates on behalf of users and bypasses robots.txt entirely - since it is not an autonomous crawler but a proxy for a user action.
iPullRank CEO Mike King argued in a May 2026 analysis, covered by PPC Land, that dismissing llms.txt because Google does not read it misses the point: systems including Claude, Perplexity, and various vertical agents have different retrieval architectures and may process the file. The challenge for site owners is that there is no consolidated guidance on which systems read llms.txt and under what conditions.
PPC Land's April 2026 coverage of a ProGEO.ai study found that only 7.4% of Fortune 500 companies - 37 out of 500 - had implemented llms.txt as of March 2026. robots.txt, by contrast, was present at 92.8% of those companies. The gap between the two figures reflects how long it takes for a web standard to move from experimental to expected: robots.txt dates to 1994 and reached RFC status in 2022.
Adoption challenges
Originality.ai's study attributes the slow corporate adoption to several overlapping factors. Many site owners have not heard of llms.txt at all. Those who have heard of it face conflicting guidance from the platforms most likely to consume it. And without clear evidence of measurable benefits - no traffic uplift, no documented improvement in AI citation rates - the incentive to implement and maintain an additional file is limited.
The situation for llms-full.txt is more acute. A comprehensive llms-full.txt file can run to hundreds of thousands of words on large sites, requiring ongoing maintenance as content changes. The effort is substantial, and the documented return is near zero for most deployments.
ai.txt faces a different constraint. According to Originality.ai, the standard's compliance depends entirely on individual AI companies choosing to respect it. Spawning, the company promoting the standard, confirms its own API respects ai.txt. Most AI systems do not yet. Until major platforms commit to reading and honouring ai.txt directives, it functions primarily as a visible declaration of intent rather than a functional access control.
A July 2025 article in PPC Land documented this stalemate in detail, citing Ahrefs analysis that confirmed no major LLM provider - not OpenAI, not Anthropic, not Google - was parsing llms.txt at that time. The Originality.ai data released today shows adoption has continued to climb in the months since, while the usage problem has not meaningfully changed.
The practical position as of mid-2026
Originality.ai's current guidance, as of the June 2026 update, is that llms.txt is worth implementing for sites with substantial documentation, APIs, or structured product information that agents might need to navigate. For those use cases, it provides a clean, low-overhead map that some systems - particularly coding assistants and documentation-aware agents - can consume on demand.
For general SEO and AI search visibility, it is not the tool. robots.txt and named-crawler controls remain the primary mechanisms for actual crawler management. The Originality.ai study recommends keeping both in place while monitoring how platform behaviour evolves.
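A named-crawler rule in robots.txt can be checked locally with Python's standard-library parser. In the sketch below the rules and the example.com URL are hypothetical, and the result only shows what a compliant crawler would do, since honouring robots.txt is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one named AI crawler, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str = "https://example.com/docs/") -> bool:
    """Return True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "Googlebot"):
    print(agent, allowed(agent))
```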
The underlying tension is structural. Site owners are being asked to implement files that the platforms most likely to benefit from them have not committed to reading, in an environment where blocking those same platforms carries documented traffic penalties. The 8.8x adoption growth Originality.ai documents is real. The usage gap documented by Ahrefs is equally real. Both are true at the same time, and nothing in the June 2026 data resolves the contradiction.
Timeline
- September 3, 2024 - Jeremy Howard proposes the llms.txt protocol as a markdown file placed at a site's root to help AI models navigate content
- June 2025 - Originality.ai records 4,088 llms.txt instances in its tracker; llms-full.txt at 23 sites, ai.txt at 4 sites; John Mueller notes on Bluesky that no AI system currently uses llms.txt
- July 2, 2025 - PPC Land reports llms.txt adoption stalls as major AI platforms decline to support the standard
- March 31, 2026 - ProGEO.ai publishes study finding 7.4% of Fortune 500 companies have implemented llms.txt
- May 5, 2026 - Chrome for Developers documentation, last updated this date, adds optional agentic browsing audit for llms.txt to Lighthouse
- May 15, 2026 - Google publishes AI search optimisation guide stating no requirement to produce llms.txt for search visibility
- May 2026 - Ahrefs publishes server-log research from 137,000 domains finding 97% of llms.txt files received zero requests; AI retrieval bots account for 1.1% of requests, while SEO audit tools account for 21.7%
- May 2026 - Originality.ai tracker reaches 36,120 llms.txt instances, 2,463 llms-full.txt instances, and 397 ai.txt instances across 3M+ websites
- June 15, 2026 - Google adds a note to its AI optimisation guide explicitly stating llms.txt files are not required for Google Search and carry no positive or negative effect on rankings
- July 2, 2026 - Originality.ai publishes June 2026 update to its llms.txt tracking study, documenting 8.8x growth in llms.txt instances over twelve months alongside the near-zero usage finding from Ahrefs
Related PPC Land coverage
- llms.txt adoption stalls as major AI platforms ignore proposed standard - July 2025 report documenting that no major LLM provider was parsing llms.txt files, based on Ahrefs server-log analysis at the time
- Only 7.4% of Fortune 500 have an llms.txt file, study finds - April 2026 coverage of ProGEO.ai research showing llms.txt adoption among the largest US companies mapped to the "innovators" segment of the diffusion curve
- Google adds llms.txt to Lighthouse as agentic web standards heat up - May 2026 report on Chrome Lighthouse adding an optional agentic audit for llms.txt, with context on the practitioner debate it reignited
- Google's new AI search guide challenges everything SEOs thought they knew - May 2026 analysis of Google's official AI search guide, which explicitly rates llms.txt as unnecessary for search visibility
- Mike King: Google's AI search guide serves platform over the open web - May 2026 coverage of iPullRank's counterargument that dismissing llms.txt on Google's terms ignores systems with different retrieval architectures
- Anthropic clarifies what its three web crawlers do - and how to block them - February 2026 article on Anthropic's documentation update separating ClaudeBot, Claude-User, and Claude-SearchBot, with context on the crawl-to-referral ratios that prompted it
- Blocking AI crawlers cost news publishers 7% of traffic, study finds - April 2026 coverage of Rutgers and Wharton research on the traffic cost of using robots.txt to block AI crawlers
- Blocking AI crawlers backfired: news publishers lost 23% of traffic - January 2026 report on an earlier version of the same Rutgers-Wharton study showing the traffic penalty from AI crawler blocks
Summary
Who: Originality.ai, the content verification company founded by Jonathan Gillham, published this tracking study; Ahrefs contributed the server-log research cited within it, covering 137,000 domains.
What: According to Originality.ai, llms.txt adoption grew 8.8x in twelve months, reaching 36,120 instances by May 2026, with a combined total of 38,980 sites across all three AI web standard formats. According to Ahrefs, 97% of those files received zero requests in May 2026, with SEO audit tools rather than AI assistants accounting for the largest share of requests where any occurred.
When: The Originality.ai tracking period ran from June 2025 to May 2026; the June 2026 update was published today, July 2, 2026.
Where: The study covers the open web, scanning more than 3 million websites. The Ahrefs server-log data drew on 137,000 domains globally.
Why: The findings matter because a growing number of site owners and SEO practitioners are allocating time to implement llms.txt files in the expectation that AI systems will consume them. The Originality.ai and Ahrefs data together show that adoption is accelerating while actual consumption by AI systems remains near zero, and that the major AI platforms send conflicting signals - advising robots.txt for crawler management while publishing their own llms.txt files for documentation purposes.