Google's site quality scoring system revealed through endpoint discovery

Mark Williams-Cook, Director at Candour Agency, disclosed on December 15, 2024, that his team discovered a Google endpoint that revealed extensive internal data about how the search engine evaluates and ranks websites. The finding, which was reported to Google's Vulnerability Reward Program, resulted in a $13,337 payout due to its high exploitation risk.

The discovery provided access to over 2 terabytes of data covering 90 million queries, exposing more than 2,000 properties Google uses to classify queries and websites. According to Williams-Cook's presentation at SearchNorwich, the data revealed previously unknown details about Google's site quality scoring system and various ranking mechanisms.

Site Quality Scoring System

Among the most notable findings was evidence of Google's site quality scoring system, which assigns values between 0 and 1 to websites at the subdomain level. The data showed that sites must achieve a minimum quality score of 0.4 to be eligible for enhanced search features such as featured snippets and People Also Ask boxes.

According to the revealed data, Google calculates site quality scores based on several factors:

Frequency of brand-specific searches combined with other search terms
Selection rate of the website in search results, particularly when not in the top position
Prevalence of the website's name or brand in anchor text across the web

Content Consensus Evaluation

The data exposed Google's method for evaluating content reliability through what it calls a "consensus score." The system counts passages within content that agree with, contradict, or remain neutral to the general consensus on a topic. This scoring mechanism appears to influence ranking positions differently depending on query types.

For queries classified as "debunking queries," such as "is the earth flat," Google strongly favors content that aligns with scientific consensus. However, for political and more subjective topics, the system intentionally maintains a mix of consensus, neutral, and non-consensus viewpoints in the results.

Query Classification System

The discovered data revealed Google's internal classification of search queries into eight distinct categories, termed "Refined Query Semantic Classes." A significant portion of queries fell into categories labeled as "short fact" or "BOOLEAN," representing questions with straightforward yes/no or true/false answers.

Click Probability Prediction

The data confirmed the existence of a click probability prediction system for organic search results. This system builds expectations about how likely users are to click on specific results, with these predictions potentially influencing ranking positions. The predictions can be modified through changes to page titles and other elements.

YMYL Query Handling

The data validated Google's special treatment of Your Money or Your Life (YMYL) queries, which involve topics that could impact users' health or finances. According to the findings, Google applies different ranking criteria and stricter evaluation standards for these queries compared to general informational searches.

Industry Response

The search marketing community has responded with significant interest to these findings. Aleyda Solis, a prominent SEO consultant, highlighted the importance of understanding these newly revealed ranking mechanisms. Barry Schwartz, founder of Search Engine Roundtable, noted the significance of having concrete evidence for previously theoretical ranking factors.

Implications for Search Rankings

The discoveries suggest that traditional SEO metrics may be becoming less reliable indicators of ranking potential. According to Williams-Cook's analysis, sites with high domain authority but low brand recognition face increasing challenges in achieving prominent rankings, particularly following Google's Helpful Content Update.

These findings provide unprecedented insight into Google's ranking mechanisms and may influence how websites approach their search optimization strategies. However, the search engine's algorithms continue to evolve, and these insights represent a snapshot of current systems rather than permanent features.

The team behind the discovery is developing classifier models to help analyze queries based on the revealed data, with plans to release tools through their CoreUpdates newsletter. This development could provide SEO practitioners with new ways to understand and adapt to Google's ranking mechanisms.

This discovery represents one of the most significant insights into Google's ranking systems since the search engine's inception, offering concrete evidence of various ranking factors that were previously only theoretical. As the search industry digests these findings, they are likely to influence SEO strategies and content development approaches across the web.