ansaurus

Question

Google sees something that it shouldn't see. Why?

Answer 1

+3 A:

If you use a sitemap generator to submit your site to search engines, you'll want to exclude those URLs there as well. Generators typically build their link lists by crawling your folders and checking your logs, so that is likely where Google got the links.

Nerdling 2009-03-13 20:15:34

Answer 2

+3 A:

Better check what URI has been requested ($_SERVER['REQUEST_URI']) and redirect if it was /index.php.
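A minimal sketch of that check, written in Python rather than PHP purely for illustration. The function name `redirect_target` and the choice of "/" as the canonical path are assumptions, not anything from the question; `request_uri` stands in for the value you would read from $_SERVER['REQUEST_URI'].

```python
from urllib.parse import urlsplit


def redirect_target(request_uri, canonical_path="/"):
    """Return the path to 301-redirect to, or None to serve the page normally.

    Hypothetical helper: `canonical_path` is an assumed canonical location.
    """
    # REQUEST_URI includes the query string, so compare only the path part.
    path = urlsplit(request_uri).path
    if path == "/index.php":
        # This sketch drops any query string; map it to a clean URL
        # instead if your pages depend on it.
        return canonical_path
    return None
```

When the function returns a path, the handler would respond with status 301 and a Location header set to that path; otherwise it renders the page as usual.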

Gumbo 2009-03-13 20:15:46

Answer 3

+10 A:

You'd be surprised at how pervasive and quick the Google bots are at indexing site content. That, combined with the many CMSs that create unintended pages and links, makes it likely that those links were exposed at some point; that is the most likely culprit. It's also possible your administration area isn't as secure as you think and the Google bot got through that way.

The well-behaved, and Google-recommended, things to do here are:

If possible, create 301 redirects from your query-string-style URLs to your canonical-style URLs. That's you saying, "Hey there, web bot/browser, the content that used to be at this URL is now at this other URL."
Block the query string content in your robots.txt. That's like telling the spiders or other automated programs, "Hey, please don't look at this stuff. These aren't the URLs you're looking for."
Google apparently now allows you to specify a canonical URL via a <link /> tag in the head of your page. Consider adding these.
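One way to sanity-check the robots.txt step before deploying it is to run the rules through Python's standard `urllib.robotparser`. This is only a sketch: the `Disallow: /index.php` rule and the sample paths are assumptions about what the query-string URLs look like, and the standard-library parser does plain prefix matching, so it approximates rather than reproduces how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block the script that serves the query-string URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""


def is_crawlable(url_path, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Report whether a well-behaved crawler may fetch url_path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url_path)
```

With the assumed rules, `is_crawlable("/index.php?page=about")` is False while `is_crawlable("/about/")` is True.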

As to whether doing the well-behaved things is the "right" thing to do re: Google rankings ... who knows. Only "Google" knows how their algorithms work now and how they will work in the future, and by Google, I mean a bunch of engineers and executives with conflicting goals on how search should work.

Alan Storm 2009-03-13 20:55:39

Canonical URL via <link /> is the way to go. Or a sitemap.

spoon16 2009-03-14 08:26:16

Answer 4

+1 A:

Changing robots.txt will not help, since the page is already indexed.

The best option is to use a permanent redirect (301).

If you want to remove a page once it has been indexed by Google, the only way, more or less, is to make it return a 404 Not Found response.

stefpet 2009-03-13 21:27:31

Answer 5

+7 A:

Google now offers a way to specify a page's canonical URL. You can use the following code in your HTML to tell Google your canonical URL:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

You can read more about canonical URLs in Google's blog post on the subject: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html According to the blog post, Ask.com, Microsoft Live Search and Yahoo! all support the canonical tag.
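If you want to confirm that your pages actually emit the tag, a rough check could look like the following. It uses Python's standard `html.parser`; the class and function names (`CanonicalLinkFinder`, `find_canonical`) are made up for this sketch, and it only reports the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here as well.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical
```

Feeding it a page whose head contains the tag shown earlier in this answer should return the href value; a page without the tag returns None.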

sjstrutt 2009-03-13 22:56:10

I did not know that! Very cool.

David Grayson 2009-03-14 09:04:55

Answer 6

+1 A:

Is it possible you're posting a form to a similar URL and Google is simply picking it up from the page source?

MK_Dev 2009-03-13 23:06:18
