Screaming Frog SEO Spider 22.0 introduces semantic similarity analysis
Screaming Frog releases version 22.0 with AI-powered semantic analysis, content clustering and search features.

Screaming Frog announced version 22.0 of its SEO Spider tool on June 10, 2025, marking the most significant update to the software's artificial intelligence capabilities since its inception. The release is codenamed internally "knee-deep."
The UK-based search marketing agency's latest version introduces semantic similarity analysis that uses large language model embeddings to identify duplicate and potentially off-topic content across websites. This development moves beyond the traditional text matching methods that have dominated content analysis for over a decade.
Semantic analysis capabilities replace traditional duplicate detection
The software now employs vector embeddings to capture semantic meaning and relationships between words. According to the company's announcement, "This makes it possible to identify similar pages with different phrases but overlapping themes, covering the same subject multiple times, which can cause cannibalisation or inefficiencies in crawling and indexing."
Dan Sharp, founder of Screaming Frog, explained that the feature requires an existing AI provider integration (OpenAI, Gemini, or Ollama) to capture vector embeddings of pages. The system processes these embeddings to generate semantic similarity scores ranging from 0 to 1, with scores above 0.95 considered semantically similar by default.
The similarity threshold can be adjusted down to 0.5, providing flexibility for different analysis requirements. The lower 'Duplicate Details' tab displays all semantically similar URLs alongside the content analyzed, while the main Content tab shows the closest semantically similar address for each URL.
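The announcement does not publish the scoring formula behind the duplicate filter. As a minimal sketch, assuming cosine similarity between page embeddings (the measure the release names for its semantic search) and using made-up URLs with toy three-dimensional vectors, the default 0.95 threshold could be applied like this:

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold=0.95):
    """Return (url_a, url_b, score) for every pair scoring above the threshold."""
    pairs = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > threshold:
            pairs.append((url_a, url_b, score))
    # Highest-scoring (most similar) pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical data: real embeddings have hundreds or thousands of dimensions.
pages = {
    "/tigers/population": [0.9, 0.1, 0.0],
    "/tigers/numbers-in-the-wild": [0.88, 0.15, 0.02],
    "/careers/maternity-leave": [0.0, 0.2, 0.95],
}
result = similar_pairs(pages)
```

Lowering `threshold` toward 0.5, the minimum the article mentions, would surface looser matches at the cost of more false positives.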
Low relevance content detection identifies topical outliers
Version 22.0 introduces automated detection of pages that deviate from a website's overall content theme. The system calculates this by averaging embeddings across all crawled pages to identify the "centroid" and measuring deviation from this average.
According to the documentation, "Measuring the deviation of page embeddings from a site embedding is something that was hinted at within the Google leak, and SEOs have been playing with this concept to find outliers." The feature identifies pages that fall below the relevance threshold under the 'Low Relevance Content' filter.
For Screaming Frog's own website, the system identified blog content about Olympic torch coverage, maternity leave posts, and login pages as outliers compared to the site's technical SEO focus. While these pages serve legitimate purposes, the analysis demonstrates how content themes can be quantified algorithmically.
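The centroid approach described above can be sketched in a few lines. This is an illustration only: the URLs and vectors are invented, cosine similarity is assumed as the deviation measure, and the 0.5 cut-off is a hypothetical value, since the article does not state the tool's actual relevance threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average the vectors dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def low_relevance(embeddings, threshold=0.5):
    """Return (url, score) for pages whose similarity to the site centroid is below threshold."""
    site_vector = centroid(list(embeddings.values()))
    scored = [(url, cosine_similarity(vec, site_vector)) for url, vec in embeddings.items()]
    # Least relevant pages first.
    return sorted((item for item in scored if item[1] < threshold), key=lambda item: item[1])


# Hypothetical crawl: four on-topic pages and one topical outlier.
pages = {
    "/seo-spider/user-guide": [1.0, 0.1, 0.0],
    "/seo-spider/tutorials": [0.9, 0.2, 0.0],
    "/log-file-analyser": [0.95, 0.0, 0.1],
    "/blog/crawl-budget": [1.0, 0.1, 0.05],
    "/blog/olympic-torch": [0.0, 0.1, 1.0],
}
outliers = low_relevance(pages)
```

Because the outlier itself is included in the average, a site with many off-topic pages will pull the centroid toward them, which is one reason a flagged page still needs human review.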
Content cluster visualization maps semantic relationships
The update includes a two-dimensional visualization tool accessible through 'Visualisations > Content Cluster Diagram.' This feature plots URLs from crawls based on embeddings data, clustering semantically similar content together.
The company provided examples showing how semantic relationships mirror natural taxonomies. Tiger population pages clustered tightly together, with Liger hybrids positioned between Tiger and Lion content, followed by other big cats like Leopards, Jaguars, and Cheetahs as neighboring clusters.
The visualization tool includes adjustable settings for sampling, dimension reduction, clustering algorithms, and color schemes. It integrates with the software's segment functionality, allowing analysis of specific website sections. The company indicated plans to enhance these diagrams with additional crawl data for deeper insights.
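The article does not name the dimension reduction or clustering algorithms behind the diagram. Purely as an illustration of the clustering step, the sketch below runs a basic k-means (a swapped-in, assumed technique) over hypothetical points that have already been reduced to two dimensions; the coordinates and page groupings are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Assign each 2D point to one of k clusters; returns one label per point."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly pick the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center wins.
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical 2D coordinates after dimension reduction.
points = [
    (0.0, 0.0), (0.1, 0.1), (0.05, 0.0),  # e.g. tiger pages
    (1.0, 1.0), (1.1, 0.9),               # e.g. lion pages
    (3.0, 0.0), (3.1, 0.1),               # e.g. cheetah pages
]
labels = kmeans(points, k=3)
```

In a plot, each label would map to a colour, so pages on the same subject appear as a tight, single-coloured group.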
Semantic search functionality transforms keyword analysis
A new semantic search tab enables users to enter queries and identify the most relevant pages within a crawl using vector similarity calculations rather than keyword matching. The functionality vectorizes search queries and calculates cosine similarity between queries and page content.
This approach aligns with modern search engine methodologies and large language model content retrieval systems. Marketing professionals can use this feature for keyword mapping, internal linking strategies, and competitor analysis against specific search terms.
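The query-to-page ranking described above can be sketched as follows. The page vectors, URLs, and query vector are hypothetical; in practice the query would be embedded with the same model used for the crawled pages, a step omitted here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, page_vectors, top_k=3):
    """Rank pages by cosine similarity to the query and return the top_k matches."""
    scored = [(url, cosine_similarity(query_vector, vec)) for url, vec in page_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings for three crawled pages.
pages = {
    "/guides/xml-sitemaps": [0.8, 0.6, 0.0],
    "/guides/robots-txt": [0.6, 0.8, 0.1],
    "/blog/office-party": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of a query such as "how to build a sitemap".
query = [0.9, 0.4, 0.0]
results = semantic_search(query, pages, top_k=2)
```

Note that no keyword overlap is checked at any point: a page ranks highly only because its vector points in a similar direction to the query's.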
The tool includes an 'Embedding Display' filter that can be switched to 'Centroid' mode. This reveals outlier pages and identifies the most representative page, the one closest to the average embedding across the entire website.
AI integration improvements expand automation capabilities
The software introduces multiple prompt targets, allowing users to create advanced prompts with several target elements simultaneously. Marketing teams can now run AI prompts against URLs matching specific segments, reducing credit consumption by targeting relevant content only.
New segmentation options based on 'Issues' enable precise automation. Teams can generate image alt text exclusively for images missing this attribute rather than processing every image on a website. URL Details data can now be referenced in AI prompts for enhanced flexibility.
Custom endpoint configuration allows integration with private LLM APIs and alternative AI providers including DeepSeek, Microsoft Copilot, and Grok. Users can customize model parameters, headers, and content length limits to prevent token exceeded errors on lengthy pages.
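How the tool enforces its content length limit is not described. One simple way to cap the text sent to a model, shown here as a sketch with a hypothetical function name and a character-based limit (real limits are counted in tokens, which vary by model), is to trim at the last whole word:

```python
def truncate_content(text, max_chars=8000):
    """Trim text to at most max_chars characters without splitting a word."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so the final word is not cut in half.
    last_space = cut.rfind(" ")
    return cut[:last_space] if last_space > 0 else cut


page_text = "alpha beta gamma"
trimmed = truncate_content(page_text, max_chars=12)
```

A character cap is only a rough proxy for a token limit, so a conservative value leaves headroom for the prompt itself.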
The update adds support for Anthropic's Claude through the API Access configuration, expanding the available AI model options for content analysis and generation tasks.
Advanced export and column management features
Version 22.0 introduces an advanced column configurator enabling bulk selection, hiding, and reordering of columns. The 'Multi Export' option allows users to select any combination of tabs, bulk exports, or reports for single-click export functionality.
Teams can save common export sets as presets for manual use, scheduling, or command-line interface operations. This functionality enables Export for Looker Studio from manual crawls rather than exclusively through scheduling.
Bulk exports can now be consolidated into a single Google Sheets or Excel workbook, with an individual tab for each export type.
Multiple sitemap processing and Google Sheets integration
List mode now supports uploading multiple XML sitemaps without requiring a Sitemap Index file. The software can also process URLs directly from Google Sheets, with options to input Google Drive credentials for accessing private spreadsheets.
This Google Sheets integration enables automation potential through associated add-ons and app scripts. The feature works in scheduling and command-line interface modes for enterprise workflow integration.
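Conceptually, uploading several sitemaps without a Sitemap Index amounts to merging their URL lists. The sketch below shows that idea with Python's standard XML parser; the function name and sample documents are hypothetical, and it makes no claim about how the SEO Spider implements the feature.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemaps(sitemap_documents):
    """Collect unique <loc> URLs from several sitemap XML strings, keeping first-seen order."""
    seen = set()
    urls = []
    for document in sitemap_documents:
        root = ET.fromstring(document)
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
            url = (loc.text or "").strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Two hypothetical sitemaps that share one URL.
sitemap_a = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""
sitemap_b = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/contact/</loc></url>
</urlset>"""
crawl_list = urls_from_sitemaps([sitemap_a, sitemap_b])
```

Deduplicating while preserving order keeps the resulting list predictable when the same URL appears in more than one sitemap.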
APIs mode enables data retrieval without crawling
A new 'APIs' mode allows users to upload URLs and retrieve data from APIs without crawling, which speeds up API-focused analysis. API data can also now be requested while a crawl is paused, not only once it has completed.
Configuration modifications for GA4 and Google Search Console now prompt users to remove existing data and request updated information. Users can right-click any URL to request data from connected APIs, with requests taking priority in the queue for immediate display.
Technical specifications and compatibility
The software maintains compatibility with existing AI provider integrations while expanding support for additional services. Image and text-to-speech generation capabilities work with the OpenAI and Gemini integrations, enabling automated content creation during crawl operations.
All visualizations now support external browser display for improved performance at scale. A configuration difference window, accessible through 'Ctrl + Shift + C', enables quick comparison between the current and default configurations.
The Moz API integration has been updated to version 3, adding metrics including link propensity, spam score, and brand authority alongside the existing Domain Authority and Page Authority measurements. The Majestic API integration now supports Trust Flow Topics selection.
Marketing community implications
The semantic analysis capabilities address long-standing challenges in content audit processes. Traditional duplicate content detection often missed conceptually similar pages that used different terminology but covered identical topics.
The introduction of semantic clustering and outlier detection provides data-driven approaches to content strategy decisions. Marketing teams can identify content gaps, redundancies, and topical drift that manual analysis might overlook.
Semantic search functionality transforms keyword research and internal linking strategies. Instead of relying on exact keyword matches, teams can identify content relevance based on semantic similarity to target queries.
The AI integration improvements reduce operational costs through targeted automation. Running prompts only against relevant segments prevents unnecessary API calls while maintaining comprehensive analysis coverage.
Enhanced export capabilities streamline reporting workflows, particularly for agencies managing multiple client accounts with standardized reporting requirements. The Google Sheets integration enables real-time collaboration and automated URL list management.
Industry expert commentary
Marketing technologist Mic King highlighted the significance of the semantic features in social media posts following the announcement. According to King, "I'm especially excited about the native semantic similarity features that help you not only pull embeddings into the crawl, but also allow you to do analysis directly in the tool! You can find semantically similar pages, low relevance pages, cluster pages semantically, and do semantic look ups based on queries."
King emphasized the software's continued leadership position, stating "The team continues to lead the frontier of SEO software and this version is no different." He advised immediate download of the new version for practitioners seeking advanced semantic analysis capabilities.
The Screaming Frog announcement garnered significant engagement across social media platforms, with users expressing interest in features like CSV export for content clusters and integration possibilities with existing workflows.
Summary
Version 22.0 represents a significant advancement in technical SEO analysis capabilities, introducing semantic understanding that mirrors modern search engine technologies. The integration of vector embeddings, content clustering, and semantic search provides marketing professionals with sophisticated tools previously available only through custom development or enterprise-level solutions.
The announcement timing, just hours before this report on June 10, 2025, demonstrates the rapid pace of innovation in SEO tooling as artificial intelligence capabilities become more accessible to marketing practitioners.
Timeline
- June 10, 2025: Screaming Frog SEO Spider version 22.0 released with semantic analysis features
- June 10, 2025: Marketing technologist Mic King endorses the update on social media
- May 19, 2025: Google launched NotebookLM mobile applications for AI research assistance (Related: NotebookLM mobile app extends research capabilities)
- May 2, 2025: Google unveiled server-side tagging and measurement tools (Related: Google enhances measurement capabilities)
- April 15, 2025: Google updated quality rater guidelines for AI content evaluation (Related: Quality guidelines address AI content)
- January 5, 2025: Analysis positioned SEO professionals to lead AI implementation (Related: SEO expertise aligns with AI systems)
- September 30, 2024: Google's Danny Sullivan discussed SEO future and AI integration (Related: Google addresses SEO evolution)
- September 14, 2024: Google promoted AI tools for SEO optimization (Related: Google endorses AI for SEO)