What's your process for removing pages from your website?
These days site reliability has much more impact on search listings in Google than it used to. My own observations suggest the Penguin update put far more emphasis on assessing site reliability than we realised at the time. Sites with less-than-perfect reliability are seeing listings that had been solid for years suddenly drop.
If you look in Google Webmaster Tools, you can see the list of crawl errors it reports. These include the HTTP status code 404 (Not Found).
So the question is: when a page is removed from your website, what happens?
Most CMSs will update site links so the page is no longer linked to. Google (and Bing) XML sitemaps will probably be updated. The website sitemap page will probably be updated. Your site does all this automatically - right?
But this still leaves Google with a listing for that page. Sooner or later, it will ask for it again. What happens then?
On many sites, Google will get a 404 error. This is BAD. Boo to 404!
A 404 status code merely says "I can't find it." It could indicate a minor malfunction, a damaged page that will soon be restored, or a page that is gone for good. Google has no way of knowing which, so it has to come back later and try again. If it keeps getting 404, does that mean the page is gone for good (and why are you wasting Google's time?), or does it mean you've got a second-rate website which doesn't work properly? Obviously you don't want Google removing a listing the instant it gets a 404. That would be a terrible punishment for what may be a temporary glitch. So poor old Google has to keep hammering the site, asking for the same page, until it finally gives up.
THERE IS A BETTER WAY
Answering every missing page with a 404 is sloppy server configuration. We have had a better way of doing things for 20 years. You need to use the HTTP status code system properly:
- If the page has moved to a new location, put a 301 (permanent redirect) in place. This tells Google to replace the old URL with the new one. You can also do this if the page is gone, but equivalent copy exists somewhere else in the site.
- If the page is gone and there is no replacement, put a 410 (Gone) in place. This tells Google the removal is deliberate and permanent, so it can remove the URL and stop asking for it.
Google likes both these methods, and will reward your site for using them. Besides, it's just doing things properly, as the web was intended to be done.
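To make the two rules concrete, here is a minimal sketch of a site that answers with 301, 410 or 404 depending on what happened to the page. It is written as a small Python WSGI application using only the standard library. The paths in `MOVED`, `GONE` and `LIVE` are made-up examples, and on a real site you would more likely set this up in your web server or CMS configuration than in application code.

```python
# Hypothetical example data: old paths that moved, and paths removed for good.
MOVED = {"/old-pricing": "/pricing"}
GONE = {"/summer-sale-2012"}
LIVE = {"/": "Home", "/pricing": "Pricing"}


def app(environ, start_response):
    """Answer each request with the status code that matches the page's fate."""
    path = environ.get("PATH_INFO", "/")

    if path in MOVED:
        # The page has a new home: permanent redirect to the new URL.
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]

    if path in GONE:
        # The page was removed on purpose and has no replacement.
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been permanently removed.\n"]

    if path in LIVE:
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [LIVE[path].encode("utf-8")]

    # Anything else really is unknown, so 404 is the honest answer here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found.\n"]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()` on the result. The point of the sketch is that 404 is kept for URLs the site has never heard of, while every page you deliberately moved or removed gets an explicit answer.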
Don't move or remove a page from your site without setting the correct status code. Sites that fail in this respect will see their search listings suffer. Don't let that be you. Know your HTTP status code policies, and ensure you use status codes properly.
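One way to check your own site against this policy is to keep a list of the URLs you have moved or removed and ask the server what it returns for each. The sketch below does that with Python's standard library, without following redirects, so you see the raw status code rather than the final page. The function names (`fetch_status`, `verdict`, `audit`) and the wording of the verdicts are my own illustration, not part of any tool.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Return the raw status code of a HEAD request, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def verdict(status):
    """Judge the status code returned for a URL that was moved or removed."""
    if status == 301:
        return "ok: permanent redirect"
    if status == 410:
        return "ok: gone"
    if status == 404:
        return "fix: ambiguous 404, use 301 or 410"
    if status == 200:
        return "check: page is still being served"
    return "check: unexpected status %d" % status


def audit(urls, fetch=fetch_status):
    """Map each retired URL to a verdict; `fetch` can be swapped out for testing."""
    return {url: verdict(fetch(url)) for url in urls}
```

Calling `audit(["http://www.example.com/old-pricing"])` with your own retired URLs returns a dictionary of URL to verdict. Some servers handle HEAD differently from GET, so treat the result as a first pass and confirm anything surprising in a browser or in Webmaster Tools.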
The full list of HTTP status codes is at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
Here's a basic video from Google on errors it may report in Webmaster Tools. http://www.youtube.com/watch?v=tH1gQoCd05g&feature=plcp