Google Search Central released a video yesterday addressing a common concern among webmasters: why some pages appear as "Discovered - currently not indexed" in Google Search Console. Martin Splitt, a Developer Advocate on Google's Search Relations team, explained the reasons behind this status and what website owners can do about it. The video, part of the SEO Made Easy series, aims to demystify how Google's crawling and indexing systems work.

According to Splitt, the "Discovered - currently not indexed" status is not necessarily an error or a problem that requires immediate attention. It simply indicates that Google has found the URL but has not yet crawled or indexed it. This status is part of the normal process of how pages move through Google's indexing system.

Splitt outlined the journey of a webpage through Google's systems. Initially, Googlebot, the search engine's web crawler, discovers a URL through various means such as sitemaps or links from other pages. At this point, the URL is added to a to-do list of pages to be crawled and potentially indexed later. However, much like a personal to-do list, not everything gets done immediately.
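To picture that journey, imagine the "to-do list" as a simple work queue: discovery adds URLs, and crawling drains them only as fast as capacity allows. The toy Python sketch below (all names hypothetical, and nothing like Google's actual implementation) shows how a URL can carry the "Discovered" status purely because it hasn't been dequeued yet:

```python
from collections import deque

# Toy model of the discover -> crawl -> index journey Splitt describes.
# All names are hypothetical; Google's real systems are far more complex.
crawl_queue = deque()  # the "to-do list" of discovered URLs
status = {}            # URL -> Search Console-style status label

def discover(url):
    """A URL found via a sitemap or a link is queued, not crawled yet."""
    status[url] = "Discovered - currently not indexed"
    crawl_queue.append(url)

def crawl_some(capacity):
    """Crawl only as many URLs as capacity allows; the rest keep waiting."""
    for _ in range(min(capacity, len(crawl_queue))):
        url = crawl_queue.popleft()
        # After crawling, a page is either indexed or not.
        status[url] = "Indexed"  # or "Crawled - currently not indexed"

discover("https://example.com/new-product")
discover("https://example.com/another-page")
crawl_some(capacity=1)  # limited capacity leaves one URL "Discovered"
print(status)
```

In this model, nothing is wrong with the waiting URL; it simply hasn't reached the front of the queue, which mirrors Splitt's first explanation.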

The first reason a page might remain in the "Discovered - currently not indexed" state is simply that Googlebot hasn't gotten around to crawling it yet. Splitt emphasized that patience is often the solution in these cases, as Googlebot will eventually process these URLs. Once crawled, the page will either move to the "Crawled - currently not indexed" status or become fully indexed.

However, Splitt also addressed situations where pages remain in the "Discovered" state for extended periods. He highlighted two main reasons for this: technical issues related to the server hosting the website, and quality concerns about the content itself.

On the technical side, Splitt explained that if Googlebot has previously encountered performance issues when crawling a site, it might adopt a more cautious approach to avoid overwhelming the server. For example, if a website with thousands of new product pages experiences slowdowns when Googlebot attempts to crawl more than a few pages simultaneously, the crawler might spread its efforts over a longer period. This can result in some pages remaining in the "Discovered" state for longer.
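One way to visualize this cautious behavior is a crawler that widens the gap between requests whenever the server responds slowly or with errors. The sketch below is a simplified illustration of adaptive pacing, not Googlebot's actual scheduling logic:

```python
import time
import urllib.error
import urllib.request

def polite_crawl(urls, base_delay=1.0, slow_threshold=2.0):
    """Illustrative back-off loop: slow down when the server struggles.

    A simplified sketch of adaptive pacing, not Googlebot's real scheduler.
    """
    delay = base_delay
    for url in urls:
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                resp.read()
            ok = True
        except urllib.error.URLError:
            ok = False  # includes HTTPError for 4xx/5xx responses
        elapsed = time.monotonic() - start
        if not ok or elapsed > slow_threshold:
            delay = min(delay * 2, 60.0)        # back off on errors or slowness
        else:
            delay = max(delay / 2, base_delay)  # recover when responses are healthy
        time.sleep(delay)
```

Under a scheme like this, a struggling server stretches the crawl out over days or weeks, which is exactly when queued pages linger in the "Discovered" state.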

To diagnose potential server issues, Splitt recommended checking the Crawl Stats report in Google Search Console, particularly the response breakdown. Webmasters should look for signs of slow responses or HTTP 500 errors when Googlebot attempts to crawl. If such issues appear, Splitt suggested consulting the hosting provider to address the performance problems.
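Outside of Search Console, similar signals often show up in a site's own access logs. As a rough illustration, the sketch below scans a log for Googlebot requests that returned HTTP 5xx; the log path and format are assumptions, so adjust the pattern to your server's configuration, and remember that a "Googlebot" user agent string can be spoofed:

```python
import re

# Rough sketch: find Googlebot requests that got an HTTP 5xx response.
# Assumes a common combined log format such as:
#   1.2.3.4 - - [date] "GET /page HTTP/1.1" 500 1234 "-" "...Googlebot..."
LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<code>\d{3}) ')

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = LINE.search(line)
        if m and m.group("code").startswith("5"):
            print(f"Googlebot got HTTP {m.group('code')} for {m.group('path')}")
```

If the same URLs repeatedly return server errors to Googlebot, that corroborates what the Crawl Stats report shows and gives the hosting provider something concrete to investigate.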

The more common reason for pages remaining in the "Discovered" state, according to Splitt, is content quality. He explained that if Google Search detects a pattern of low-quality or thin content on a website, it might choose not to proceed with crawling and indexing certain URLs. In some cases, Google might even skip crawling these URLs altogether, leaving them in the "Discovered" state.

For webmasters concerned about pages stuck in this state, Splitt recommended focusing on improving content quality. He suggested reworking the content to provide more value and ensuring that internal linking effectively connects this content to other parts of the website. Splitt referenced a previous episode on internal linking for more detailed information on this topic.
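As a rough first pass at auditing internal linking, a webmaster could count how many internal links point at each page, since URLs with few or no inbound links are candidates for better integration. Below is a minimal sketch using the third-party requests and beautifulsoup4 packages, with a hypothetical site and page list:

```python
from collections import Counter
from urllib.parse import urljoin, urlparse

import requests                # pip install requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

SITE = "https://example.com"   # hypothetical site
pages = [SITE + "/", SITE + "/about", SITE + "/products"]  # pages to scan

inbound = Counter()
for page in pages:
    html = requests.get(page, timeout=10).text
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        target = urljoin(page, a["href"]).split("#")[0]
        if urlparse(target).netloc == urlparse(SITE).netloc:
            inbound[target] += 1  # count internal links pointing at target

# Pages you expect to be indexed but that receive few internal links
# may be worth linking more prominently.
for url, count in inbound.most_common():
    print(count, url)
```

In practice, the page list would come from a sitemap, and the counts are only a proxy; the point, per Splitt, is to make valuable pages easy to reach from the rest of the site.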

Throughout the video, Splitt emphasized that it's normal and often acceptable for some pages on a website not to be indexed. Google rarely indexes all content from a site, and this doesn't necessarily indicate a problem. However, for pages that webmasters believe should be indexed, Splitt advised carefully examining the quality of the content and addressing any potential server performance issues.

The video also touched on the scale at which these issues typically become significant. Splitt noted that server performance problems are most common for sites with very large numbers of pages, typically in the millions, though he cautioned that smaller sites can also experience server issues that impact crawling and indexing.

Splitt's explanation provides valuable insights into Google's indexing process and offers webmasters a clearer understanding of how to interpret and address the "Discovered - currently not indexed" status in Google Search Console. By focusing on content quality and server performance, website owners can work towards improving their site's visibility in Google Search results.

Key points from the video

  • "Discovered - currently not indexed" status is not necessarily an error
  • Googlebot adds discovered URLs to a to-do list for future crawling
  • Pages may remain in this state due to Googlebot's crawling queue
  • Technical issues with server performance can delay crawling
  • Low-quality or thin content is a common reason for pages to remain unindexed
  • Improving content quality and internal linking can help resolve indexing issues
  • Server performance should be monitored, especially for large websites
  • Not all pages on a website need to be indexed by Google