We implemented a rating system on a site a while back that works through a link to a script. However, with the vast majority of ratings on the site sitting at 3/5 and the votes spread very evenly across 1-5, we're beginning to suspect that search engine crawlers and the like are getting through. The URLs used look like this:

http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3

When we started, we added the following to our robots.txt:

User-agent: *
Disallow: /rate

Is this incorrect, or are Googlebot and others simply ignoring our robots.txt?

+2  A: 

You should use POST for actions which change things, as search engines usually do not submit forms. Additionally, this will prevent users who download your website recursively (e.g. with wget) from submitting tons of votes.

Depending on your site, handling voting through JavaScript might be a solution, too.

Regarding your robots.txt: It has to be in the root path - i.e. http://www.thesite.com/robots.txt - and if your rating system is at /blah/rate you need to use Disallow: /blah/rate instead of Disallow: /rate
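To check the path matching yourself, here is a minimal sketch using Python's standard urllib.robotparser, which does plain prefix matching on the URL path. The helper name `allowed` and the two rule lists are just illustrative, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# URL from the question
RATE_URL = "http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3"

# Rule as originally written: only matches paths that start with /rate
original_rules = ["User-agent: *", "Disallow: /rate"]

# Rule using the full path of the rating script
full_path_rules = ["User-agent: *", "Disallow: /path/to/the/page/rate"]

print(allowed(original_rules, RATE_URL))   # True: the crawler is still allowed in
print(allowed(full_path_rules, RATE_URL))  # False: the rating URL is now blocked
```

With the original rule, only URLs such as http://www.thesite.com/rate are blocked, which is why the nested rating links stay crawlable.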

ThiefMaster
Confirms my suspicions. I will look into modifying the rating so it either uses POST or only accesses the /rate script at the root, so that /rate is actually correct. Thanks all
chrism
Please use POST. Using GET for any operation that is not read-only is not a good idea unless it requires you to be logged in (e.g. a 'delete' link in an administration area would be OK).
ThiefMaster
I will, but as a quick fix for now (well closing the stable gate after the horse has bolted anyway) I will get it to execute the script at the root.
chrism
A: 

User-agent: *
Disallow: /path/to/the/page/rate

You have to use the full path.

Might want to read up here a bit: http://www.javascriptkit.com/howto/robots.shtml

RandyMorris
A: 

Looks incorrect to me. You're only disallowing access to http://www.thesite.com/rate (and pages below it IIRC). Plus some crawlers ignore robots.txt!

Better to make it so that ratings are only ever altered in response to a POST, rather than a GET. Search engines never use POST.
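As a rough illustration of the POST-only idea, here is a minimal WSGI sketch in Python. Everything in it is an assumption for the example: the names `rate_app`, `votes` and `simulate` are made up, the in-memory list stands in for whatever storage the site really uses, and the `uid` and `value` fields simply mirror the query parameters from the question.

```python
import io
from urllib.parse import parse_qs

votes = []  # hypothetical in-memory store; a real site would use its database


def rate_app(environ, start_response):
    """Record a rating only when the request method is POST."""
    text = [("Content-Type", "text/plain")]
    if environ["REQUEST_METHOD"] != "POST":
        # GET requests (crawlers, wget, link prefetchers) never change anything
        start_response("405 Method Not Allowed", text + [("Allow", "POST")])
        return [b"Ratings must be submitted with POST.\n"]

    # Read and decode the form body
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    uid = form.get("uid", [""])[0]
    value = form.get("value", [""])[0]

    # Accept only ratings from 1 to 5 with a non-empty uid
    if not uid or value not in {"1", "2", "3", "4", "5"}:
        start_response("400 Bad Request", text)
        return [b"Invalid rating.\n"]

    votes.append((uid, int(value)))
    start_response("200 OK", text)
    return [b"Rating recorded.\n"]


def simulate(method, body=b""):
    """Call rate_app with a fake request and return the response status."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    rate_app(environ, start_response)
    return captured["status"]
```

Calling `simulate("GET")` gives a 405 response and leaves `votes` untouched, while `simulate("POST", b"uid=abcdefghijk&value=3")` records the vote. The rating link on the page would then need to become a small form (or a JavaScript request) that submits with POST.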

Donal Fellows