Cyrus Shepard, founder of Zyppy SEO, yesterday published a comprehensive analysis of the factors most associated with earning citations from AI search engines, distilling findings from 54 experiments, patents, and case studies published over the past two years. The resulting framework - 23 scored "AI Citation Ranking Factors" - offers the most consolidated evidence-based view to date of what separates cited content from content that AI engines overlook.
The analysis, published May 7, 2026, on Shepard's Substack newsletter Zyppy Signal, covers AI engines including ChatGPT, Gemini, and Perplexity. It arrives at a moment when the value of being cited by these platforms is increasingly measurable. According to the research, a study by Seer Interactive found that appearing in Google's AI Overviews results in 120% more organic clicks per impression and a 41% increase in paid clicks compared with when a brand is not cited - a finding that puts concrete commercial weight behind what had previously been treated as a matter of visibility alone.
URL accessibility and search rank top the list, scoring 9.5 and 9.4 respectively on Shepard's evidence scale. That two foundational SEO criteria occupy the first two positions is itself the central message of the analysis: winning traditional search and earning AI citations are not competing objectives. According to the research, Ahrefs found that 38% of AI Overviews citations come from the top 10 Google results, and the overlap only grows as the window extends beyond the top 10. The finding provides a direct answer to a question many practitioners have wrestled with since AI search features began reshaping traffic patterns.
The scoring methodology
Each of the 23 factors was scored using three weighted criteria. Repeatability - how many independent studies documented the same finding, and how consistent the outcomes were - formed the first criterion. Strength of evidence formed the second, with a 50-million-query study carrying substantially more weight than a 10-query case study. Official support from platform documentation, technical specifications, or patents formed the third. Shepard then assigned the scores by hand, using AI to help refine the weighting, a process he describes as combining aggregation with judgment.
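A minimal sketch of how three weighted criteria might combine into a single 0-10 score follows; the weights are illustrative assumptions, since the exact formula isn't published.

```python
# Illustrative three-criteria weighted score. The weights are assumed,
# not Shepard's published values.

def factor_score(repeatability: float, evidence_strength: float,
                 official_support: float,
                 weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Combine three criteria (each rated 0-10) into one 0-10 score."""
    criteria = (repeatability, evidence_strength, official_support)
    return round(sum(w * c for w, c in zip(weights, criteria)), 1)

# A factor replicated across many strong studies but with little
# official platform documentation:
print(factor_score(repeatability=9.0, evidence_strength=9.5,
                   official_support=5.0))  # 8.4
```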
The methodology matters because it distinguishes this analysis from looser frameworks circulating in the industry. Each factor is grounded in published research rather than anecdotal observation, and Shepard is explicit that correlation, not causation, is what the scores reflect. "These aren't 'Ranking Factors' in the traditional sense. Instead, they are features correlated with AI citations across multiple studies," according to the publication. The acknowledgment is significant. Much of the industry discussion around AI visibility has struggled to separate genuine signal from noise, and Shepard's insistence on published evidence as the baseline sets a different standard.
What the top factors reveal
Fan-out rank scores 9.3 - just below search rank - and its position near the top reflects something technically important about how AI engines construct their answers. Rather than retrieving content for a single query, AI engines perform multiple supplementary searches, called fan-out queries, to ground their responses. Ranking highly for these related queries is therefore nearly as important as ranking for the primary one. This mechanism, well-documented in the literature Shepard cites, explains why topic cluster ranking also appears in the top tier at 8.9. A site that ranks across multiple related queries has a compounding probability advantage.
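The compounding advantage is straightforward probability: if a page has some chance of being retrieved for each related query, its chance of surfacing in at least one retrieval pass grows with every additional query it ranks for. A minimal sketch, with hypothetical per-query probabilities and assuming independent retrieval passes:

```python
# Compounding citation probability across fan-out queries.
# The per-query retrieval probabilities below are hypothetical.

def citation_chance(per_query_probs: list[float]) -> float:
    """P(retrieved for at least one query) = 1 - prod(1 - p_i)."""
    miss = 1.0
    for p in per_query_probs:
        miss *= 1.0 - p
    return 1.0 - miss

print(citation_chance([0.4]))                  # 0.40 - primary query only
print(citation_chance([0.4, 0.3, 0.25, 0.2]))  # ~0.75 - plus three fan-out queries
```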
Preview controls score 9.2, tied with query-answer match. The preview controls finding is specifically relevant to publishers who have deployed AI scraper-blocking tools or who have applied directives such as "nosnippet" and "data-nosnippet" to limit how search engines display their content. According to the analysis, limiting the visibility of specific text can lower AI visibility. Cloudflare's AI-blocking protections, which many publishers adopted as a response to content scraping without traffic return, may therefore carry a direct cost in citation probability. The tradeoff is real and the evidence for it is consistent across studies.
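Publishers who want to check their own exposure can look for these directives in a page's markup. A rough audit sketch, assuming the requests and beautifulsoup4 packages and a placeholder URL:

```python
# Quick audit of preview-limiting directives on a single page.
import requests
from bs4 import BeautifulSoup

def preview_control_audit(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    robots = soup.find("meta", attrs={"name": "robots"})
    directives = (robots.get("content", "") if robots else "").lower()
    return {
        "nosnippet": "nosnippet" in directives,
        "max_snippet_limited": "max-snippet" in directives,
        "data_nosnippet_blocks": len(soup.find_all(attrs={"data-nosnippet": True})),
    }

print(preview_control_audit("https://example.com/article"))
```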
Query-answer match, also at 9.2, reflects the documented pattern of "semantic closeness" between AI-generated answers and the pages they cite. This means page titles, subheadings, and body content should closely mirror both the search query and the kind of answer an AI engine would construct in response to it. It is a more demanding form of relevance than matching keywords - it asks whether the content would plausibly serve as a source for an AI-generated sentence.
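One rough way to operationalize semantic closeness is a similarity score between the query and a candidate passage. The sketch below uses crude bag-of-words cosine similarity so it runs as-is; a production system would substitute dense sentence embeddings.

```python
# Bag-of-words cosine similarity as a crude stand-in for the dense
# embeddings a real retrieval system would use.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closeness(query: str, passage: str) -> float:
    return cosine(Counter(query.lower().split()),
                  Counter(passage.lower().split()))

print(closeness("how much protein do adults need",
                "Adults need 0.8 grams of protein per kilogram of body weight"))
```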
Intent-format match at 9.0 follows directly from this logic. AI engines tend to cite pages whose format suits the query type. According to the analysis, "best"-type queries prefer listicles or comparison tables, while "how-to" queries reward step-by-step guides. This isn't a new idea in SEO, but the evidence confirms it extends into AI citation behavior as well.
Answer near the top
One of the more operationally specific findings is answer near the top, which scores 8.8. According to the research, AI engines do not treat all text on a page equally. The analysis cites work by Dan Petrovic showing that Google's Gemini applies a strict retrieval cap per URL, meaning content near the top of the page is more likely to make the cut when the system is deciding what to extract. Several other studies corroborate this pattern. The implication is that pages structured with the most important information in the first scrollable section are better positioned for AI citation than those that bury their main claims below long introductions or background sections.
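A toy illustration of the cap dynamic, with an assumed character budget since the actual figure isn't restated in the analysis:

```python
# Why top-of-page content wins under a per-URL retrieval cap.
RETRIEVAL_CAP_CHARS = 4000  # assumed budget, not Gemini's actual figure

def retrievable_portion(page_text: str, cap: int = RETRIEVAL_CAP_CHARS) -> str:
    """The slice of the page an engine would consider under the cap."""
    return page_text[:cap]

page = "KEY ANSWER: protein needs are 0.8 g/kg.\n" + "background paragraph. " * 2000
print("KEY ANSWER" in retrievable_portion(page))  # True: the claim sits inside the cap
print("KEY ANSWER" in retrievable_portion("background paragraph. " * 2000 + page))  # False: buried
```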
AI-ready structure scores 8.6. This refers to how well-organized a page is for AI extraction - not in the sense of breaking content into tiny chunks, but in providing clear headings, sections, and tables that help AI systems parse meaning before they begin retrieval. According to the research, AI engines typically break pages into sections before processing, and unclear organization increases the difficulty of that process. The finding aligns with longstanding accessibility and readability guidance, suggesting the practices that help human readers also help AI systems.
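As a sketch of what heading-based sectioning looks like in practice (the exact chunking logic engines use isn't public), a parser might split a page on its h2/h3 headings, assuming beautifulsoup4:

```python
# Split an HTML page into (heading, body) sections before retrieval.
from bs4 import BeautifulSoup

def split_into_sections(html: str) -> list[tuple[str, str]]:
    soup = BeautifulSoup(html, "html.parser")
    sections = []
    for heading in soup.find_all(["h2", "h3"]):
        body = []
        for sibling in heading.find_next_siblings():
            if sibling.name in ("h2", "h3"):  # next section starts
                break
            body.append(sibling.get_text(" ", strip=True))
        sections.append((heading.get_text(strip=True), " ".join(body)))
    return sections

html = ("<h2>Dosage</h2><p>0.8 g per kg of body weight.</p>"
        "<h2>Sources</h2><p>Summary of meta-analyses.</p>")
print(split_into_sections(html))
```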
Factual specificity and explicit phrasing
Factually specific content scores 8.3, and its companion factor, explicit phrasing, scores 8.1. The distinction between the two is subtle but meaningful. Factual specificity means using verifiable, concrete claims rather than vague generalizations - the difference between "adults need a lot of protein" and a statement giving exact grams per kilogram of body weight. Explicit phrasing means expressing those claims without hedging. According to the analysis, a sentence like "some people prefer magnesium glycinate, while others use citrate or threonate" is weaker than a definitive statement identifying a single best choice.
Both factors appear because AI engines are constructing answers they need to justify with citations. A page that mirrors the precision of a good AI answer - specific, committed, and verifiable - is more likely to be the page cited. This has implications for editorial decisions. Content strategies built around cautious, both-sides framing may be less well-suited to AI citation environments than content that makes clear, evidenced claims.
Cites sources and self-contained passages both score 8.0. The cites-sources finding means pages that reference their own evidence base - linking to studies, naming sources, or indicating how conclusions were reached - appear more frequently in AI citations across studies. Self-contained passages means that individual sentences or blocks of text should be interpretable without requiring context from the surrounding content. According to the research, if a passage says "this ingredient has better evidence" without naming the ingredient or the evidence, an AI engine must parse meaning from elsewhere on the page. If it says the ingredient is "supported by 137 scientific studies for heart health," the information requires no further context to extract.
Content visibility scores 7.6 and reflects the established principle that text hidden behind JavaScript, tabs, or expandable divs is less accessible to both search engines and AI systems. Modern pages increasingly use these techniques for design purposes, but according to the analysis, content that isn't visible in the HTML is less likely to be cited.
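A simple self-test is to check whether a key claim appears in the raw server response rather than only after JavaScript executes. A sketch, with a placeholder URL and phrase, assuming the requests package:

```python
# Does a key phrase exist in the raw HTML, or only after JS rendering?
import requests

def visible_in_html(url: str, key_phrase: str) -> bool:
    raw = requests.get(url, timeout=10).text
    return key_phrase.lower() in raw.lower()

# False here, while the phrase shows in a browser, suggests the content
# is injected by JavaScript and harder for AI systems to cite.
print(visible_in_html("https://example.com/article", "137 scientific studies"))
```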
Freshness, brand trust, and length
Freshness scores 7.0 and behaves similarly to traditional search: its importance varies by query type. Questions about recent events require more current information than questions about established practices. Brand and entity trust scores 6.8, a figure Shepard notes is likely to rise. According to the analysis, AI engines increasingly seek credible sources, and what they already know about a brand may influence how much they trust it. The note echoes a pattern observed independently on PPC Land, where research has documented that AI-mediated search increasingly concentrates citation probability in established, high-authority sources.
Content length scores 6.7, with a caveat: while many studies found longer content performed better, the evidence was inconsistent. Several researchers noted that very long content reduced the probability of AI systems retrieving all of it - a practical constraint given the per-URL retrieval cap documented under the answer-near-the-top factor.
Language scores 6.3. The studies documented a clear bias toward the language and sometimes location of the question. A French query from France is more likely to return French citations. This has operational significance for multilingual content strategies and for publishers serving non-English markets.
The bottom tier
Entity consistency at 5.8 and structured data at 5.6 both score in the middle of the range. Entity consistency refers to using stable, consistent naming conventions for brands, people, and products across a page. Structured data's position at 5.6 reflects what Shepard describes as one of the more contested questions in the field. Large language models don't typically ingest schema as training data, yet nearly every study that examined the relationship between schema markup and AI citations found a positive, if small, relationship. The consistency of that finding across studies earns it a non-trivial score despite the mechanism remaining unclear.
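Schema markup of this kind is typically embedded as JSON-LD. A minimal sketch of Article markup with placeholder values:

```python
# Minimal schema.org Article markup as JSON-LD. All values are placeholders;
# embed the output in a <script type="application/ld+json"> tag.
import json

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2026-05-07",
}

print(json.dumps(article_schema, indent=2))
```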
Known source scores 5.4 - the phenomenon where an AI system cites a URL from its own training data, bypassing the usual grounding process. This can lead to citations of URLs that no longer exist, and it's more typical of ChatGPT and Perplexity than Gemini. Domain authority scores 5.0, with studies finding a relationship but typically a weak one. At the bottom, LLMs.txt scores 2.0. Shepard notes he couldn't find any credible evidence that maintaining an LLMs.txt file influences AI citations in any measurable way.
Industry context
The analysis arrives as the marketing industry grapples with a structural shift in how citations and traffic intersect. AI Overviews now correlate with a 58% reduction in click-through rates for top-ranking pages, according to Ahrefs research from February 2026. In Germany, SISTRIX data shows the feature costs the market 265 million organic clicks per month, with position-one CTR collapsing from 27% to 11% when AI Overviews are present.
Against that backdrop, being cited - rather than merely ranking - increasingly determines whether a page receives any click at all. Seer Interactive's analysis of 42 client organizations found that brands cited in AI Overviews earned 35% more organic clicks and 91% more paid clicks compared to non-cited brands, even as overall CTRs declined sharply across the category.
Shepard's analysis was produced by downloading nearly every AI citation experiment, study, explainer, and patent published over the past two years, then narrowing to the 54 most salient and helpful. The work acknowledges contributions from a dozen researchers including Dan Petrovic, Rand Fishkin, Kevin Indig, Lily Ray, and Britney Muller. Shepard describes the exercise as something he "couldn't have done without the contributions of the AI researchers and thought leaders in our industry."
Microsoft's grounding framework, published in February 2026, formalized many of the same principles Shepard documents - structured content, authoritative signals, and retrievable passages - under the banner of Generative Engine Optimization. Shepard's analysis provides an evidence-scored version of that framework, separating factors with consistent multi-study support from those resting on thinner ground.
The practical summary Shepard offers is: "Relevance, Trust, Topical Authority, and Extractability - all signals that should align with current SEO thinking." The conclusion, backed by 54 studies, is that practitioners don't need to abandon what works in search to compete in AI citation environments. They need to execute the same fundamentals more precisely - and pay closer attention to how their content is structured, where key information sits on the page, and how clearly individual passages convey verifiable claims.
A 7-step AI Citation Checklist based on the same data was announced for release the following week.
Timeline
- Past two years: Multiple academic papers, studies, and experiments on AI citation factors published across the industry
- February 2026: Ahrefs publishes research showing AI Overviews correlate with 58% reduction in click-through rates
- February 12, 2026: Microsoft positions grounding as the infrastructure powering AI search, introduces Generative Engine Optimization terminology via Bing Webmaster Tools
- March 1, 2026: SISTRIX data documents 265 million monthly clicks lost in Germany due to AI Overviews
- March 10, 2026: Semrush and LinkedIn analysis of 89,000 URLs cited by ChatGPT, Google AI Mode, and Perplexity published
- April 9, 2026: Cyrus Shepard publishes analysis of 400+ websites identifying five features predicting Google traffic outcomes in 2026
- May 7, 2026: Cyrus Shepard publishes the AI Citation Ranking Factors analysis on Zyppy Signal, covering 54 studies and scoring 23 factors
- Week of May 11, 2026: 7-step AI Citation Checklist based on the same research announced for release
Summary
Who: Cyrus Shepard, founder of Zyppy SEO and author of the Zyppy Signal newsletter, with acknowledgments to a dozen researchers including Dan Petrovic, Rand Fishkin, Kevin Indig, Lily Ray, Britney Muller, Ann Smarty, Andrea Volpini, David McSweeney, Metehan Yesilyurt, Michael King, Dawn Anderson, and Oshen Davidson.
What: A scored analysis of 23 factors associated with earning citations from AI search engines, derived from 54 experiments, patents, and case studies. The top factors are URL accessibility (9.5), search rank (9.4), fan-out rank (9.3), preview controls (9.2), and query-answer match (9.2). The lowest-scoring factor is LLMs.txt (2.0). The overall finding is that traditional SEO fundamentals remain the strongest predictors of AI citation probability, with some technical adjustments required.
When: Published May 7, 2026, on the Zyppy Signal Substack newsletter. The research draws on studies published over approximately the past two years. A follow-up 7-step checklist was announced for release the following week.
Where: Published on Substack via the Zyppy Signal newsletter. The underlying studies covered AI engines including ChatGPT, Gemini, and Perplexity, with research drawn from multiple countries and query types.
Why: The publication addresses a documented gap between the volume of AI citation research being produced and the rate at which practitioners are implementing findings. According to the analysis, most marketers are not yet acting on available research, either because of uncertainty about what matters most or because of information overload. By consolidating 54 sources into a single scored framework, the analysis attempts to reduce that friction and provide a prioritized starting point.