Google this month published new documentation on what it takes to appear inside AI-powered search features, and SEO consultant Dr. Marie Haynes walked through the document in a YouTube video released on May 15, 2026. The guide, aimed at website owners and publishers, addresses two technical mechanisms that now sit at the center of how Google surfaces content in AI Overviews and AI Mode: retrieval-augmented generation (RAG) and the query fan-out technique. It also delivers pointed guidance on commodity content, link building, structured data, and the emerging world of agentic search - topics that collectively reframe what it means to optimize for Google in 2026.
The video, posted to Haynes' YouTube channel under the series "Search News You Can Use," runs approximately 26 minutes and walks through the documentation section by section. Haynes has been practicing SEO since 2008 and is one of the field's more closely followed voices on algorithm behavior and Google's quality systems.
What the documentation says about changing search behavior
According to Google's new guide, as covered by Haynes, the company acknowledges that users are "increasingly gravitating to generative AI experiences." The company frames this shift not purely as a reduction in opportunities for publishers but as a potential opening for reaching higher-quality visitors. According to the documentation, "as we upgrade search to meet these changing expectations, this transformation offers new opportunities to reach people who might be more inclined to engage with your site, spend time with your content, or even convert to becoming a subscriber or making a purchase."
Whether that framing holds for most publishers is debatable. Industry traffic data has so far moved in one direction: down. AI Overviews now correlate with a 58% reduction in click-through rates for top-ranking pages, according to Ahrefs research from February 2026, and SISTRIX data from Germany places the monthly click loss from AI Overviews at 265 million. Position-one CTR in that market falls from 27% to 11% when AI Overviews are present.
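For readers who want to put the SISTRIX position-one figures on the same footing as a percentage reduction, the arithmetic can be sketched as below. This is only a restatement of the 27% and 11% figures quoted above; the function name `ctr_change` is illustrative, and the result is not a substitute for the Ahrefs number, which comes from a different dataset.

```python
from __future__ import annotations


def ctr_change(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    point_change = after - before
    relative_change = (after - before) / before * 100
    return point_change, relative_change


# Position-one CTR figures for Germany as quoted in the article.
points, relative = ctr_change(27.0, 11.0)
print(f"{points:+.0f} percentage points, {relative:+.1f}% relative")
# prints "-16 percentage points, -59.3% relative"
```

The distinction matters when comparing sources: a 16-point drop and a roughly 59% relative drop describe the same change.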
RAG: why websites still matter
The first technical mechanism the guide covers is retrieval-augmented generation. RAG is the process by which Google's AI features use the web - including individual websites - to ground their answers. The system does not generate responses from training data alone. Instead, it retrieves information from indexed pages to provide factually current, source-supported outputs. This is the technical reason why websites remain relevant even as the search results page increasingly answers questions directly: without crawlable, indexed content, the AI has nothing to ground its answers against.
Haynes frames this as a significant shift in what the index is for. Traditionally, the question the index answered was simple - which pages should a user visit? Now, the index serves primarily as grounding material for AI answers. Haynes also noted that Microsoft's Bing published a blog post around the same time reaching a similar conclusion about the changing role of the index.
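The retrieve-then-ground pattern described above can be illustrated with a toy example. The following is a minimal sketch under loud assumptions: the pages, URLs, and word-overlap scoring are invented for illustration, and nothing here reflects how Google's systems actually retrieve or rank sources. It only shows the general shape of RAG: look up relevant indexed text first, then hand that text to the model alongside the question.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, index: list[Page], k: int = 2) -> list[Page]:
    """Return up to k pages that share the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(page.text)), page) for page in index]
    # Drop pages with no overlap, then put the best matches first.
    relevant = [item for item in scored if item[0] > 0]
    relevant.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in relevant[:k]]


def build_grounded_prompt(query: str, sources: list[Page]) -> str:
    """Assemble the text a language model would receive: sources, then question."""
    lines = ["Answer using only the sources below and cite their URLs.", ""]
    for page in sources:
        lines.append(f"[{page.url}] {page.text}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


# Hypothetical three-page index.
INDEX = [
    Page("example.com/lawn-weeds", "Pull weeds by hand after rain, then overseed bare patches of lawn."),
    Page("example.com/tomatoes", "Tomato plants need six hours in direct sun."),
    Page("example.com/herbicide", "Selective herbicide targets broadleaf weeds without harming lawn grass."),
]

sources = retrieve("how to fix a lawn full of weeds", INDEX)
print(build_grounded_prompt("how to fix a lawn full of weeds", sources))
```

In this toy setup a page that is missing from the index can never appear in the prompt, which is the point the guide makes about crawlable, indexed content.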
Query fan-out: how Gemini breaks down a question
The second mechanism covered in Google's guide is the query fan-out. This is a technique in which a specialized model - described by Haynes as distinct from the publicly available versions of Gemini - generates a list of concurrent sub-questions from a single user query. When someone asks "how to fix a lawn full of weeds," the fan-out model might produce parallel queries including "best herbicides for lawns," "remove weeds without chemicals," and "how to prevent weeds in lawn." These run simultaneously, and the AI synthesizes their results into a single response.
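The fan-out pattern can be sketched in a few lines. This is a hedged illustration, not Google's implementation: the lookup table `FAN_OUT` stands in for the specialized model, the `RESULTS` table stands in for a search backend, and all URLs are hypothetical. What it does show accurately is the structure described above: one query becomes several sub-queries, they run concurrently, and the results are merged into one set of sources.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the fan-out model: a fixed lookup table.
FAN_OUT = {
    "how to fix a lawn full of weeds": [
        "best herbicides for lawns",
        "remove weeds without chemicals",
        "how to prevent weeds in lawn",
    ],
}

# Hypothetical stand-in for the search backend: canned result URLs.
RESULTS = {
    "best herbicides for lawns": ["example.com/herbicides", "example.com/lawn-care"],
    "remove weeds without chemicals": ["example.com/hand-weeding"],
    "how to prevent weeds in lawn": ["example.com/lawn-care", "example.com/overseeding"],
}


def fan_out(query: str) -> list[str]:
    """Expand a query into sub-queries, falling back to the query itself."""
    return FAN_OUT.get(query.lower().strip(), [query])


def search(sub_query: str) -> list[str]:
    """Return canned result URLs for one sub-query."""
    return RESULTS.get(sub_query, [])


def gather_sources(query: str) -> list[str]:
    """Run all sub-queries concurrently and merge results without duplicates."""
    sub_queries = fan_out(query)
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        # pool.map preserves the order of sub_queries in its results.
        result_lists = list(pool.map(search, sub_queries))
    merged: list[str] = []
    for urls in result_lists:
        for url in urls:
            if url not in merged:
                merged.append(url)
    return merged


print(gather_sources("how to fix a lawn full of weeds"))
```

Note that `example.com/lawn-care` answers two sub-queries but is listed once, which hints at why a single in-depth page can serve several fan-out branches.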
Haynes first heard about the fan-out technique at Google I/O last year, where she spoke with an engineer who worked on AI Mode. The documentation now makes the mechanism explicit. Some practitioners have responded to knowledge of the fan-out by creating individual content pieces targeted at each likely sub-query. Haynes pushes back on this approach. Creating large amounts of content to cover estimated fan-out queries is likely to produce what she and the Google documentation both describe as commodity content - the category Google is explicitly moving to reward less.
Commodity content: the central problem
The distinction between commodity and non-commodity content runs through the guide as its organizing principle. According to the documentation, "seven tips for first-time home buyers" is the kind of general information that exists all over the web - content that no specific user is seeking out from a specific source. That is commodity content.
Non-commodity content, as the guide defines it, starts with a unique point of view. According to Google, "our AI systems take a look at a variety of sources. So, it can be helpful to have a unique viewpoint that stands out." The second dimension is direct, first-hand experience. According to the documentation, "create the content yourself based on what you know about the topic and consider what in-depth experience you can bring to your content. Don't just recycle what others on the internet have already said or could be easily produced by a generative AI model."
This is a direct extension of the E-E-A-T framework - Experience, Expertise, Authoritativeness, and Trustworthiness - that Google has been building into its quality guidance since 2022, when the additional "E" for experience was introduced. The argument, as Haynes articulates it, is that experience is the one dimension of content quality that AI cannot replicate. Adding first-person framing to a page is not sufficient. What matters is the insight that comes from having actually done something.
Haynes offers a practical test from Danny Sullivan, Google's Search Liaison, who raised the question at the Google Search Central event in Toronto: if a site's content were removed from the web entirely, would anyone miss it? If not, it is likely commodity content. PPC Land has documented the broader consequences of this shift for publishers - sites that demonstrated measurable gains following Google's June 2025 core update shared characteristics centered around comprehensive, helpful content delivery rather than scaled production.
Scaled content and spam policies
The guide revisits Google's scaled content abuse spam policy, a topic Sullivan raised multiple times at the Toronto event. Scaled content - content produced in large quantities using automated or AI-assisted methods with the purpose of manipulating rankings - is explicitly against Google's guidelines. The documentation also notes that using AI to assist with content creation is permissible, provided the output meets Google's quality standards and does not violate spam policies.
According to Haynes, every site in Google's index carries an internal spam score that is not visible to the site owner, and sites with high spam scores face compounding disadvantages in ranking. She notes that core updates, which Google maintains are not about penalizing link building, nonetheless consistently harm sites with link profiles built for SEO purposes. Her interpretation is not that those sites are being penalized, but that the links that once helped are becoming progressively less useful as Google's quality signals grow more sophisticated.
Technical SEO: less central than it once was
On technical SEO, the guide advocates for technically sound sites but does not place primary weight on performance metrics. According to Haynes' reading of the documentation, the key question is whether Google can retrieve and render the content at all - crawlability, indexability, and rendering are the threshold requirements. Core Web Vitals that score in the medium range are not, in her view, a primary driver of ranking for most sites. The exception is e-commerce: studies have shown that every millisecond of page speed improvement corresponds to conversion rate gains, making performance optimization commercially justified in that category.
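Two of the threshold checks named above, crawl permission and indexability, can be approximated with Python's standard library. The sketch below is an assumption-laden illustration rather than a complete audit: the robots.txt content, URLs, and helper names are hypothetical, it works on text passed in rather than fetching anything, and it does not test rendering, which requires a browser or Google's own tools.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags include a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


# Hypothetical robots.txt and page markup.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/guides/lawn"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/private/report"))
print(has_noindex(PAGE))
```

A page that is blocked in robots.txt or carries an unintended noindex fails the threshold regardless of how strong its content is.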
Business agents: what Merchant Center now enables
One section of the guide points to the emerging business agent feature in Google Merchant Center. Haynes tested this with one client and found a search result that displayed the brand name alongside a button labeled "chat with the business agent." The information surfaced in the agent was pulled directly from the client's site. Nothing in it was inaccurate. Haynes interprets this as the likely future of search for most businesses: the website serves not as the destination but as the grounding layer - accurate information that AI assistants draw from when conversing with users.
This is consistent with the infrastructure Google has been building toward. Google launched the Universal Commerce Protocol on January 11, 2026, co-developed with Shopify, Etsy, Wayfair, Target, and Walmart, establishing a technical standard for AI agents to execute purchases across retail platforms. The protocol supports REST and JSON-RPC transport layers and is compatible with Agent2Agent and Model Context Protocol. Merchant Center is the central hub for product data that feeds these experiences. Google expanded UCP checkout into main search results on May 5, 2026, pushing a Buy button directly into standard search results for the first time.
Myth-busting: LLMs.txt and content chunking
The guide addresses two practices that have generated significant discussion in the SEO community and largely dismisses both. The first is LLMs.txt - a file format intended to give AI systems a curated, structured version of a site's content. According to Google's documentation, as relayed by Haynes, there is no requirement to produce LLMs.txt files for a site to appear in search. Language models are capable of reading standard HTML and determining whether text will be useful to users. Google's own decision to publish markdown versions of its developer documentation was, in Haynes' reading, for developers who wanted to feed those documents to their own AI agents - not a signal that site owners should follow suit.
Shopify has implemented LLMs.txt across its merchant sites. Haynes' interpretation is that this serves the Shopify agent's ability to interact with and modify those sites - a different use case from search visibility. The conclusion: unless there is a specific functional reason to produce LLMs.txt, it is not a priority.
The second practice the documentation dismisses is content chunking - the deliberate breaking of content into small, discrete units to make it easier for language models to extract specific answers. Haynes notes that this practice made some intuitive sense in the early days of AI-assisted search, when having individual sentences that precisely answered a query appeared to help. Current language models - Gemini, ChatGPT, Claude - are sophisticated enough to extract relevant passages from naturally flowing prose. Highly chunked content may actually work against the uniqueness and experiential qualities that Google now rewards.
Inauthentic mentions and link building
A section of the guide covers what Google calls "inauthentic mentions." According to the documentation, "just like the rest of Google search, our generative AI features can show what's being said about products and services across the web, including in blogs, videos, and forum discussions. However, seeking inauthentic mentions across the web isn't as helpful as it might seem."
Haynes interprets this as a reference to link building and the broader practice of creating content - lists, round-ups, blog posts - that places a brand in a favorable context without genuine third-party endorsement. The documentation appears to extend that concern from traditional search into AI feature inclusion.
Structured data: useful in narrow cases
On structured data and schema markup, the guide's position is more nuanced. Google deprecated FAQ rich results some time ago; structured data focused on FAQ schema is therefore of limited value for most sites. Haynes notes that Ahrefs published a study testing schema implementation across a large number of pages already appearing in AI Overviews and found no measurable ranking difference. In her own practice, schema receives relatively little attention for informational content. For e-commerce, the calculus is different. AI agents require accurate, structured product data - prices, availability, specifications - to operate effectively. Merchant Center product data serves this function, and schema that exposes frequently changing information to AI agents has genuine utility.
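For the e-commerce case, the kind of structured product data in question can be illustrated with schema.org's `Product` and `Offer` types expressed as JSON-LD. The sketch below is an assumption, not an excerpt from Google's guide: the product values are invented, the helper name is hypothetical, and a real listing would normally carry more properties than shown here.

```python
import json


def product_jsonld(name: str, sku: str, price: float, currency: str, in_stock: bool) -> str:
    """Build a schema.org Product snippet exposing price and availability."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical product; the output would sit inside a
# <script type="application/ld+json"> element on the product page.
print(product_jsonld("Selective Lawn Herbicide 1L", "HERB-001", 24.5, "USD", True))
```

Generating the markup from the same source as the storefront's price and stock data is one way to keep the frequently changing fields accurate.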
Agentic experiences and WebMCP
The section that generated the most attention in Haynes' commentary covers agentic experiences - specifically WebMCP, a protocol that allows agents to use tools and interact with the functional layer of a website, not just its content. At the Google Search Central event in Toronto, Haynes asked how much attention practitioners should pay to WebMCP. The answer from Google staff was effectively: don't. Their stated reasoning was to focus on what Google has already communicated - primarily UCP and the business agent.
Haynes is skeptical of that answer and has reasons for it. She notes that a client implemented WebMCP on their site following discussions about its potential. The implementation worked, but the user experience required giving an external agent explicit instructions along with API tokens before it could interact with the site - a friction level that makes the current version difficult to deploy practically. Six days after the Toronto event, Google published a blog post about creating agent-friendly sites. That post describes three ways agents perceive a site: screenshots of the interface, the HTML source, and the accessibility tree.
The accessibility tree is a structural representation of a page that exposes what is interactive - buttons, form fields, clickable elements - in a format optimized for automated agents. It is distinct from the visual design and from the raw HTML. Haynes built a tool using this framework that audits a site's accessibility tree and returns both a diagnostic report and a prompt that can be given to a language model for improvement planning.
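One narrow piece of such an audit can be approximated without a browser: finding buttons and links that expose no name an agent could read. The sketch below is not Haynes' tool and does not build a real accessibility tree, which browsers compute from the rendered page. It is a rough, assumption-based heuristic that treats `aria-label`, `aria-labelledby`, `title`, inner text, or a nested image's `alt` text as a usable name; the class and function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser

NAMING_ATTRS = ("aria-label", "aria-labelledby", "title")
CHECKED_TAGS = {"button", "a"}


class UnlabeledFinder(HTMLParser):
    """Record buttons and links that have no attribute name and no text."""

    def __init__(self) -> None:
        super().__init__()
        self.unlabeled: list[tuple[str, int]] = []  # (tag, line number)
        self._open: list | None = None  # [tag, has_attr_name, text, line]

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in CHECKED_TAGS:
            has_name = any((attr.get(n) or "").strip() for n in NAMING_ATTRS)
            self._open = [tag, has_name, "", self.getpos()[0]]
        elif tag == "img" and self._open is not None:
            # Alt text on a nested image counts toward the element's name.
            self._open[2] += attr.get("alt") or ""

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            open_tag, has_name, text, line = self._open
            if not has_name and not text.strip():
                self.unlabeled.append((open_tag, line))
            self._open = None


def find_unlabeled(html: str) -> list[tuple[str, int]]:
    """Return (tag, line) pairs for buttons and links without a readable name."""
    finder = UnlabeledFinder()
    finder.feed(html)
    finder.close()
    return finder.unlabeled


# Hypothetical markup: lines 2 and 4 expose nothing an agent could read.
SAMPLE = "\n".join([
    '<button aria-label="Close dialog"><svg></svg></button>',
    "<button><svg></svg></button>",
    '<a href="/cart"><img src="cart.png" alt="View cart"></a>',
    '<a href="/home"></a>',
    "<button>Book now</button>",
])

print(find_unlabeled(SAMPLE))
```

Icon-only controls are the typical offenders: they look fine in a screenshot but give an agent reading the accessibility tree nothing to act on.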
Google I/O, referenced in the video as an upcoming event at the time of recording, was expected to address changes to search and the movement toward agentic search. Several agentic capabilities are already live, according to Haynes: some agents can book services or compare products, and a voice agent can contact local businesses on a user's behalf to gather quotes before assembling a summary document.
Dos and don'ts: what the guide means in practice
The documentation, taken as a whole, draws a fairly clear line between practices Google's systems reward and practices that are either neutral or actively counterproductive. The following is drawn directly from the guide as analyzed by Haynes.
Do: produce content grounded in direct experience. The guide is explicit that content written from genuine first-hand knowledge has qualities that purely informational writing lacks. Publishing an account of attending an industry event, testing a product, or working through a specific problem with a client is the kind of material Google's quality systems are designed to surface. The question Haynes suggests asking: does this content contain something that could not have been produced by someone who had not actually done this?
Do: ensure Google can crawl, render, and index the site. Technical SEO is not irrelevant - it is foundational. If Googlebot cannot access the content, retrieve it, and render it correctly, nothing else matters. Crawl errors, noindex directives applied incorrectly, and JavaScript rendering failures are worth fixing. Beyond that threshold, however, the guide does not prioritize performance optimization for most sites. Core Web Vitals in the medium range are not, in the documentation's framing, a primary ranking driver except in e-commerce, where page speed has a demonstrated relationship with conversion rates.
Do: add high-quality images and video. The documentation specifically recommends these formats. Haynes notes that Google has begun placing short-form video content on the home screen of Google TVs, and that video is a format AI cannot easily replicate. A walkthrough, demonstration, or commentary delivered on camera carries experiential weight that text summaries of the same content typically do not.
Do: use structured data for e-commerce and rapidly changing information. Product schema that exposes accurate pricing, availability, and specifications is valuable because AI agents require current, structured data to function. Merchant Center product data feeds directly into UCP-powered experiences. For information-focused sites, schema adds limited value and the documentation does not emphasize it.
Do: make the site accessible to agents. Google's blog post on agent-friendly sites - published six days after the Toronto event - identifies three layers through which agents perceive a page: screenshots, HTML source, and the accessibility tree. Buttons that are too small, interactive elements that are not labeled, and functional components that do not expose correctly in the accessibility tree are worth auditing. This is not a ranking factor in the conventional sense, but it matters as agentic search features mature.
Don't: produce content at scale to target query fan-out sub-queries. The guide does not prohibit creating multiple pages on related topics, but Haynes is direct about the risk. If the purpose of producing those pages is to cover every likely sub-query rather than to bring genuine depth or a unique angle to each one, the result is almost certainly commodity content. Google's spam policies on scaled content apply regardless of whether the content is AI-generated or human-written.
Don't: build or seek inauthentic mentions. The documentation addresses this directly in the context of AI features as well as traditional search. Lists, round-ups, and link placements engineered to make a brand appear recommended when it is not carry a spam risk. In Haynes' account, every site in Google's index has an internal spam score that is not visible externally, and practices that inflate that score create compounding disadvantages across both traditional results and AI feature inclusion.
Don't: rely on LLMs.txt for search visibility. There is no evidence that producing an LLMs.txt file improves a site's appearance in Google's AI features. The format may have functional utility for other purposes - Shopify's implementation appears to serve its own agent infrastructure, not search. Unless there is a specific, non-search reason to produce one, the documentation does not support the investment.
Don't: chunk content into artificially small units. Writing content in discrete, answer-sized fragments to make extraction easier for language models works against the qualities Google now rewards. Natural, flowing prose that contains genuine analysis is harder to chunk but easier for sophisticated language models to understand - and more likely to demonstrate the non-commodity characteristics the guide is designed to reward.
Don't: overinvest in schema for informational content. Ahrefs' study of pages already appearing in AI Overviews found no measurable difference when schema was added. FAQ schema, in particular, lost its primary practical application when Google deprecated FAQ rich results. The time spent implementing schema on informational pages is, in most cases, better directed elsewhere.
Don't: disavow links expecting a recovery. Haynes is careful here: she does not think disavowing low-quality links restores whatever benefit those links may once have provided. The value of manipulative links tends to decay rather than invert. Sites with link profiles built for SEO purposes are not necessarily being penalized in the conventional sense, but the signals those links provide carry diminishing weight. The practical implication is to stop building links for SEO purposes rather than to attempt to reverse what has already been done.
What this means for marketers
The significance of this documentation for the marketing community lies in what it formally confirms rather than introduces. PPC Land's coverage of Google's new AI search guide noted that Google published the guide on May 15, 2026, with three specific areas of focus: non-commodity content, practical optimization for local, shopping, image, and video formats, and direct correction of widespread misinformation about AI search practices. The guide is notable for what it pushes back on: a significant portion of the SEO industry's response to AI search has involved tactics - LLMs.txt, content chunking, FAQ schema, fan-out targeting - that the documentation now explicitly rates as unnecessary or counterproductive.
Research from Cyrus Shepard published on May 7, 2026, covering 54 experiments, patents, and case studies, found that URL accessibility and search rank score 9.5 and 9.4 respectively on an evidence-weighted scale of AI citation factors - placing traditional SEO fundamentals at the top of the ranking. The finding that 38% of AI Overview citations come from top-10 organic results, according to Ahrefs data cited in that analysis, means that the path to AI citation visibility runs primarily through conventional quality signals, not AI-specific tactics.
The guide does not resolve the fundamental tension facing publishers: organic clicks are declining as AI Overviews serve users answers without requiring a site visit, while the same AI systems increasingly depend on those publishers' content as grounding material. What it does clarify is Google's stated view of what content is worth grounding against - and commodity content, at scale, is not it.
Timeline
- 2022: Google introduces the fourth "E" in E-E-A-T, adding "Experience" to its quality evaluation framework
- May 14, 2024: AI Overviews launch in Google Search, placing generative AI output directly into main results
- August 2024: Dr. Marie Haynes releases book "SEO in the Gemini Era" covering AI's impact on Google's ranking systems
- June 30 - July 17, 2025: Google's June 2025 core update completes after a 16-day rollout, with sites demonstrating first-hand experience content gaining visibility
- July 18, 2025: Marie Haynes publishes analysis of sites that improved after the June 2025 core update, identifying comprehensive, experience-based content as the shared characteristic
- July 2025: Google reports 65% surge in visual searches as AI Mode drives multimodal adoption; query fan-out technique described publicly
- December 11 - 29, 2025: Google's December 2025 core update runs 18 days, the third major update of the year
- December 2025: Marie Haynes explains in a published interview how Google is transforming into an AI agent that performs tasks on behalf of users
- January 11, 2026: Google launches the Universal Commerce Protocol with Shopify, Etsy, Wayfair, Target, and Walmart, alongside the Business Agent feature and Direct Offers ad format
- February 2026: Ahrefs research documents a 58% reduction in click-through rates for top-ranking pages when AI Overviews are present
- March 24, 2026: Google's March 2026 spam update goes live, applying globally across all languages
- May 5, 2026: Google expands UCP checkout into main search results, placing a Buy button directly in standard search results for the first time
- May 7, 2026: Cyrus Shepard publishes AI Citation Ranking Factors analysis covering 54 studies, finding URL accessibility and search rank as the top two citation factors
- May 15, 2026: Google publishes new documentation on ranking in AI search; Dr. Marie Haynes releases a video walkthrough on YouTube covering RAG, query fan-out, commodity content, agentic experiences, and WebMCP
Summary
Who: Dr. Marie Haynes, SEO consultant and researcher practicing since 2008, published a video analysis of new Google documentation on AI search ranking. The documentation itself is authored by Google and addresses website owners, publishers, and SEO practitioners.
What: Google released new guidance explaining the two primary technical mechanisms that determine visibility in AI search features - retrieval-augmented generation and the query fan-out technique - alongside guidance on non-commodity content, structured data, LLMs.txt, content chunking, inauthentic mentions, and the transition to agentic search experiences. Haynes walked through the document publicly, adding context from client work and attendance at Google events.
When: The YouTube video was published on May 15, 2026. The Google documentation it analyzes was released the same day.
Where: The video was published on YouTube under Haynes' "Search News You Can Use" series. The Google documentation is part of the company's developer and search help resources.
Why: The documentation is significant because it formally addresses - and in several cases dismisses - practices that have proliferated in the SEO industry as direct responses to AI search features. It confirms that traditional SEO fundamentals remain the primary path to AI citation, while clarifying that the index now functions primarily as grounding material for AI answers rather than a list of pages for users to visit. For advertisers and publishers, the implications extend into agentic commerce infrastructure and the changing role of websites as grounding layers for AI assistants rather than direct traffic destinations.