Is Google reading your webpages as you surf?
This is a call for feedback.
Google Webmaster Tools reports errors about your site(s) when it has trouble reading your webpages. Most of the time these are notifications about broken links: Google tells you the page it can't find and lists all the places where it found links to that page. You occasionally see it looking for pages inside secure sections of the site that it can't access, usually reported as 403 (access forbidden) errors. I have noticed that in many cases the only links to these pages are themselves inside secure pages that Google shouldn't have been able to see in the first place.
This suggests Google is somehow able to bypass website security and get inside things like a CMS. The only way I can see this happening is if Google is reading the contents of webpages inside your browser, which would mean Google is reading everything you read as you surf the web. Under this scenario Google would find links to internal pages while you were editing one in your CMS, and would also have access to any other links your CMS put onscreen. Google's statements about this are contradictory. At https://support.google.com/accounts/answer/54068?hl=en Google state that they do not store copies of the pages you read:
"We store information related to search queries you entered into Google, the results that appeared, and the pages you visit, such as the URL, but don't store a snapshot of the page itself in your account."
But in the same document they then state:
"[You can] view and search across webpages you've visited in the past, including Google searches."
This clearly says you can view and search non-Google pages, and Google can't run a search on a page without a copy of it. Notice that they only state they don't store a copy in your account. That doesn't mean they don't store a copy elsewhere, nor does it say that they don't read and process the page while you're reading it.
I have no proof that Google is doing this, but I'd be interested to know if others have seen Google trying to access pages which can only be found inside secure pages. If others have seen this, it has implications for SEO, and even bigger ones for privacy.
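If you want to check your own site, one place to look besides the Webmaster reports is your server's access log. Below is a rough Python sketch, not a definitive tool. It assumes your server writes the common "combined" log format, and that your secure section lives under /admin/ (a made-up prefix; substitute your own). The function name and the sample log lines are invented for illustration. It also matches on the user-agent string alone, which anyone can fake, so treat a hit as a reason to look closer (for example with a reverse DNS lookup on the IP address) rather than as proof.

```python
import re

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_forbidden_hits(lines, secure_prefix="/admin/"):
    """Return (time, path) for each 403 served to a Googlebot user-agent
    on a path under secure_prefix. The prefix is a hypothetical default."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if (
            "Googlebot" in match["agent"]
            and match["status"] == "403"
            and match["path"].startswith(secure_prefix)
        ):
            hits.append((match["time"], match["path"]))
    return hits


# Invented sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [10/Oct/2013:13:55:36 +0000] "GET /admin/edit?page=42 HTTP/1.1" '
    '403 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2013:13:56:01 +0000] "GET /admin/edit?page=42 HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [10/Oct/2013:14:02:10 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for when, path in googlebot_forbidden_hits(SAMPLE):
    print(when, path)
```

To run it against a real log, pass an open file instead of SAMPLE, since the function accepts any iterable of lines. If a reported path is linked only from inside your secure pages, that is the kind of case I'm asking about.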
So - have you seen Google trying to read pages it shouldn't even know exist?