Googlebot has occasionally been crawling and indexing one of our sites with a bad query string parameter. I'm not sure where it is getting this parameter from (there don't appear to be any sites linking to us with bad links, and nothing on our site is inserting the bad value). The bad parameter causes the site to throw a 500 error, as we expect.
I was under the impression that Google would not index pages that return a 500 error, but it turns out that it does. So now I have two questions:
1) Why would Googlebot be inserting random bad query string values? (I don't really care about the answer to this question, but if we could do something to avoid that, it would solve our problem.)
2) Why would Google index a page that returns a 500 error?
Here is one of the erroneous links that Googlebot created and that Google has indexed:
The bad parameter is gb=baqhuxts. The parameter 'gb' is expected to be an integer. If you remove that parameter from the query string, you get a normal catalog page.
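For what it's worth, the fix I'm planning on our side is to validate the parameter and answer with a 404 instead of letting the bad value blow up into a 500, so crawlers treat the bad URL as a dead end. This is just a minimal sketch assuming a Flask-style handler; the `/catalog` route and handler names are hypothetical, not our actual code:

```python
from flask import Flask, abort, request

app = Flask(__name__)

@app.route("/catalog")
def catalog():
    # 'gb' is expected to be an integer; a value like 'baqhuxts'
    # currently bubbles up as an unhandled exception (HTTP 500).
    raw = request.args.get("gb")
    gb = None
    if raw is not None:
        try:
            gb = int(raw)
        except ValueError:
            # Answer with 404 so the bad URL looks like a dead end
            # rather than a transient server error.
            abort(404)
    # ... render the catalog page, optionally filtered by gb ...
    return "catalog page (gb=%r)" % gb
```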
Regarding nofollow and robots.txt solutions: [ REDACTED ]
I realize now that I am a moron and put a meta tag telling search robots to index the page. That was a dumb thing to do. I'm removing those. :-(
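In the meantime I'm also planning to explicitly mark the error responses themselves as noindex, since the X-Robots-Tag response header is equivalent to the robots meta tag. Again, only a rough sketch assuming Flask; the handler name is made up:

```python
from flask import Flask

app = Flask(__name__)

@app.errorhandler(500)
def internal_error(exc):
    # Serve the error page but tell crawlers not to index it;
    # the X-Robots-Tag header is the HTTP equivalent of
    # <meta name="robots" content="noindex"> in the HTML.
    return "Internal server error", 500, {"X-Robots-Tag": "noindex"}
```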
If you search Google for 'baqhuxts', you will find that it has indexed 10 pages with this bad parameter, yet each of these pages returns a 500 error. Does anyone have insight into why Google believes these are valid pages to index?