For several years, one of the questions webmasters have asked most often concerns the notorious Google index and their site's place within it. Is my site included? Has it been removed? Has that new page been indexed yet? What about that other one?
Fortunately for everyone, last month Google set out to answer some of these questions by adding a new feature to its Webmaster Tools.
Found under the Health section of your Webmaster Tools account, the new Index Status report tells you how many of your site's pages Google has included in its index.
Initially you are given a graph showing the total number of URLs from your site that have been added to Google's index over the last year. Most sites will see a steady increase in the number indexed over time.
The Advanced tab gives you access to far more useful information. As well as the total number of pages indexed, it shows the total pages crawled, the pages crawled but not indexed, and the attempted page crawls which were blocked.
The data is broken down as follows:
Total Indexed – the total number of URLs from the site added to the Google Index.
Ever Crawled – the cumulative total number of URLs on your site which Google has ever accessed.
Not Selected – URLs which Google has chosen not to include in its index. This is often because the URLs redirect to other pages or contain content which is substantially similar to other pages.
Blocked by Robots – URLs which Google has attempted to crawl but was denied access to because they are blocked in the site's robots.txt file.
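To see in advance whether a given URL is likely to end up in the Blocked by Robots count, the site's robots.txt rules can be tested locally. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules and the example.com URLs are hypothetical, and Python's matcher is not guaranteed to interpret every rule exactly as Googlebot does, so treat the result as a first check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used purely for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Parse the rules from a string so the example needs no network access.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the parsed rules disallow this URL for the user agent."""
    return not parser.can_fetch(user_agent, url)


print(is_blocked("http://www.example.com/private/page.html"))
print(is_blocked("http://www.example.com/blog/post.html"))
```

With these example rules, the first URL is reported as blocked and the second as allowed.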
It is important to note that the figures provided are all totals. The figure for a particular day means that, at that point in time, that many pages were indexed or had been crawled. It does not mean that number of pages was indexed on that day. This matters most for older sites with a large number of pages, which may see significant differences between the number of pages crawled and the number of pages indexed.
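Because each figure is a total at a point in time, the number of pages gained or lost between two readings has to be worked out by subtraction. A rough sketch of that calculation is below. The dates and counts are invented for illustration, and the readings are assumed to have been copied by hand from the chart.

```python
from typing import List, Tuple


def period_changes(totals: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Convert a series of totals into the change since the previous reading."""
    changes = []
    for (_, previous), (date, current) in zip(totals, totals[1:]):
        changes.append((date, current - previous))
    return changes


# Hypothetical "Total Indexed" readings, not real data.
total_indexed = [
    ("2012-07-01", 1200),
    ("2012-07-08", 1250),
    ("2012-07-15", 1245),
]

for date, change in period_changes(total_indexed):
    print(date, change)
```

Here the total of 1,250 on the second date represents a gain of 50 pages since the first reading, not 1,250 pages indexed that week.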
But what if your graph doesn't show that steady increase? What if it is showing spikes and valleys? Whilst a spiking and dropping graph is the first indicator of possible indexation problems, the important thing is to assess how and when the graph spikes.
Any variations in the charts may well be explained by changes you have made to your site.
Changing your URL structure, or setting up a large number of redirects or canonical URLs, could well cause a rise in the "Not Selected" count as well as a spike and drop in your total indexed count. Adding lots of new content which is being indexed for the first time will also cause variation in the charts.
It is important to assess any variations and see whether there are legitimate causes behind them. If you have no clear idea why these counts have changed, that is a fairly clear indication that there are technical issues with your site which need to be addressed.
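One simple way to decide which points on the chart deserve a closer look is to flag any reading that differs from the previous one by more than a chosen percentage, then compare those dates with your own record of site changes. The sketch below assumes hand-copied readings and an arbitrary 20% threshold. Both the figures and the threshold are illustrative assumptions, not values suggested by Google.

```python
def flag_swings(totals, threshold=0.2):
    """Return (label, relative change) for readings that moved by at least
    `threshold` (0.2 = 20%) compared with the previous reading."""
    flagged = []
    for (_, previous), (label, current) in zip(totals, totals[1:]):
        if previous == 0:
            continue  # no meaningful percentage change from zero
        change = (current - previous) / previous
        if abs(change) >= threshold:
            flagged.append((label, round(change, 2)))
    return flagged


# Hypothetical "Total Indexed" readings, not real data.
readings = [("week 1", 1000), ("week 2", 1050), ("week 3", 1500), ("week 4", 900)]

print(flag_swings(readings))
```

For these readings, weeks 3 and 4 are flagged. If neither lines up with a redesign, a batch of redirects or a large content upload, that is the point at which to start looking for technical problems.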
The most useful function of the new feature is that it allows webmasters to identify trends and discover whether Google is indexing their content. If Google is shown to be having difficulty indexing the site correctly, this can be the first indicator that the site has technical issues with canonicalization, duplicate content or other elements of its structure.
However, only once Google reveals exactly which pages are and are not indexed will this tool be able to fully solve indexation problems.