views:

419

answers:

8

First of all, I searched as best I could and read all SO questions that seem relevant, but nothing specifically answered this. This is not a duplicate, afaik.

Obviously, if anonymous voting on a website is allowed, there is no foolproof way to prevent someone from voting more than once.

However, I am wondering if someone with experience can aid me in coming up with a reasonably reliable way of tracking unique visitors and recording votes against those credentials.

Currently I am ensuring that only one vote per item/session combo is allowed; however, this is easily circumvented by restarting the browser, changing browsers/computers, or clearing your session data.

Recording against IP seems the next reasonable solution but I wonder if this will get false positives too often (multiple people on same LAN behind a NAT will have same external IP, etc).

Is there a middle ground to be had here or some other method/combination I am overlooking?

+3  A: 

If you're not looking at authenticating voters, then you're going to be getting some duplicate votes no matter what you use. I'd use a cookie, and have done with it for the anonymous users.

UserVoice allows both anonymous voting and voting when logged in, but then allows the admin to filter out anonymous votes - a nice solution to this problem.

Mr. Matt
I am well aware that without authentication it's impossible to guarantee no duplicates, I am simply looking at a way to reduce them.
bjeanes
+8  A: 

The simplest answer is to use a cookie. Obviously it's vulnerable to people clearing their cookies, but anonymous voting is inherently approximate anyway.

In practice, unless the topic being voted on is in some way controversial or inflammatory, people aren't going to have a motive behind rigging the vote anyway.

IP is more 'reliable' but will produce an unacceptably high level of collisions due to NATs.

How about a more unique identifier composed of IP + user-agent (maybe a hash)? That effectively means for each IP, each exact OS/browser version pair gets 1 vote, which is a lot closer to 1 vote per person. Most browsers provide detailed version information in the user-agent -- I'm not sure, but my gut feel is that this would prevent the majority of collisions caused by NATs.
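
To make the IP + user-agent idea concrete, here is a minimal Python sketch. The names `voter_fingerprint` and `VoteStore` are hypothetical, the store is an in-memory stand-in for a real table, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def voter_fingerprint(ip, user_agent):
    """Hash IP and user-agent into one opaque identifier (hypothetical helper)."""
    # A newline cannot appear inside an IP address, so it keeps the two
    # fields unambiguous when they are joined.
    raw = f"{ip}\n{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class VoteStore:
    """In-memory stand-in for a votes table keyed by (item, fingerprint)."""

    def __init__(self):
        self._seen = set()

    def record_vote(self, item_id, ip, user_agent):
        """Return True if the vote was recorded, False if it is a repeat."""
        key = (item_id, voter_fingerprint(ip, user_agent))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Two people behind the same NAT with different browser versions get separate fingerprints; two people with identical setups still collide.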

The only place that would still produce lots of collisions is a corporate environment with a standardised network, where everyone is using an identical machine.

ben_h
Not a bad idea. It will still have some collisions due to identical browsers (for instance, everyone on my network is using the same version of Leopard and Safari) but would be much more reliable than IP alone.
bjeanes
+10  A: 

Many users in China have to share one IPv4 address with hundreds of others; HP/Compaq/DEC has almost 50 million addresses. IPv6 doesn't help, as everyone gets addresses by the billion. A person just is not the same as an IP address, and that notion is becoming ever more false.

There are just no proper ways to do this on the Internet. Persons are simply a concept unknown on the Internet, and any idea to introduce the concept is unlikely to succeed. (Too many governments would not want this to happen, for instance.)

Of course, you can relate the number of votes per IP to the number of repeat page visits from that IP, especially in combination with cookie tracking. This works best if you estimate that number before you start the voting period. If the top 5% most popular articles are typically read 10 times from a single IP, it's likely 10 people share that IP and they should get 10 votes. Cookies can be used to prevent them from stealing each other's votes, but on the whole they can't skew your poll. (Note: this fails in small communities where a large group of voters comes from a small number of IPs; in particular, this happens around universities.)
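
A rough Python sketch of that estimate, under a simplifying assumption: instead of looking only at the top 5% of articles, it takes the highest read count any single article received from an IP as the head count for that IP. The function names and the log format are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_vote_quota(visits):
    """Estimate how many people share each IP.

    visits: iterable of (ip, article_id) page views logged before voting opens.
    Returns {ip: quota}, where quota is the highest number of times any one
    article was read from that IP (a simplification, see the text above).
    """
    reads = Counter(visits)  # (ip, article_id) -> read count
    quota = defaultdict(int)
    for (ip, _article), count in reads.items():
        quota[ip] = max(quota[ip], count)
    return dict(quota)


def may_vote(ip, votes_so_far, quota, default=1):
    """Allow a vote while the IP is under its quota; unknown IPs get `default`."""
    return votes_so_far.get(ip, 0) < quota.get(ip, default)
```

Here `votes_so_far` would be the per-item tally of votes already accepted from each IP; a cookie check would still sit in front of this to stop one person using up the whole quota.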

MSalters
+2  A: 

Anything based on IP addresses isn't an option. NAT has been mentioned, but only in the context of home users. There are many larger installations that use NAT - some corporations can have thousands of users pooled behind a single IP address. There are also ISPs that use proxy servers for their users - another case where many thousands of users can appear to your application as a single address. Adding unique UA combinations to this won't help, as there isn't enough variation.

A persistent cookie is going to be your best bet - and you'll have to live with the fact that it is easy to game. At least when the cookie is persistent (as opposed to session based) you'll catch the majority of users who run a single browser.
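
For illustration, a persistent voter cookie could be issued along these lines using Python's standard `http.cookies` module. The cookie name `voter_id`, the one-year lifetime, and the helper name are assumptions, and the value is unsigned, so this is only a sketch.

```python
import uuid
from http.cookies import SimpleCookie

ONE_YEAR = 60 * 60 * 24 * 365  # cookie lifetime in seconds (arbitrary choice)


def voter_cookie(existing_cookie_header):
    """Return (voter_id, set_cookie_value).

    set_cookie_value is None when the browser already sent a voter_id,
    otherwise it is the value for a Set-Cookie response header.
    """
    jar = SimpleCookie()
    jar.load(existing_cookie_header or "")
    if "voter_id" in jar:
        return jar["voter_id"].value, None

    voter_id = uuid.uuid4().hex
    out = SimpleCookie()
    out["voter_id"] = voter_id
    out["voter_id"]["max-age"] = ONE_YEAR  # persistent, unlike a session cookie
    out["voter_id"]["path"] = "/"
    out["voter_id"]["httponly"] = True
    return voter_id, out["voter_id"].OutputString()
```

Votes would then be recorded against `voter_id`, with one row per item and voter.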

If you really want to rely on the results, you are going to have to add some form of identification in the process (like e-mail validation, which is still gameable).

At the end of the day any internet survey is going to have flaws (like: http://www.time.com/time/arts/article/0,8599,1894028,00.html), and you'll have to live with this.

lstoll
Yup, I can live with duplicates, but reducing them is better. If people really want to game the system, they can and will. This is just a reduction measure.
bjeanes
+11  A: 

I'd collect as much data about the session as possible without asking any questions directly (browser, OS, installed plugins, all with version numbers, IP address, etc.) and hash it.

Record the hash and increment a counter if you want multiple votes to be allowed. Include a timestamp (daily, hourly etc) in the salt to make votes time sensitive, say 5 votes per day.
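
A minimal Python sketch of this scheme follows. The attribute names, the `strftime` bucket format, and the in-memory counter are assumptions made for illustration; a real site would persist the counts.

```python
import hashlib
from collections import Counter
from datetime import datetime


def session_hash(attributes, when, bucket="%Y-%m-%d"):
    """Hash session attributes together with a time bucket used as the salt."""
    salt = when.strftime(bucket)  # e.g. "2024-01-01" for a daily bucket
    # Sort the keys so the same attributes always hash the same way.
    canonical = "\n".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(f"{salt}\n{canonical}".encode("utf-8")).hexdigest()


class LimitedVotes:
    """Allow at most max_per_bucket votes per hash within one time bucket."""

    def __init__(self, max_per_bucket=5, bucket="%Y-%m-%d"):
        self.max_per_bucket = max_per_bucket
        self.bucket = bucket
        self.counts = Counter()

    def vote(self, attributes, when):
        h = session_hash(attributes, when, self.bucket)
        if self.counts[h] >= self.max_per_bucket:
            return False
        self.counts[h] += 1
        return True
```

With the defaults this gives 5 votes per day. Passing `bucket=""` and `max_per_bucket=1` removes the time component and yields one vote ever per hash.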

mlambie
Hashing lots of user info is a brilliant idea! I don't want a time based salt though as currently the voting model is one up/down vote ever.
bjeanes
A: 

Two ideas not mentioned yet are:

  • Asking for the user's email address and emailing them a verification link
  • Using a captcha

Obviously the former can be circumvented with disposable email addresses and so on, but gives you an audit trail, and provides a significant hurdle to casual/bot vote-stuffing. A good captcha likewise severely limits vote-stuffing, but with all the usual caveats surrounding their use.

grahamparks
Well, users are anonymous not for anonymity's sake but for ease's sake. I don't want to have them fill in any forms for each vote, or at all really.
bjeanes
+1  A: 

Use a persistent cookie to allow only one vote per item,

and record the IP. If there are more than 100 (1,000? 10,000?) requests in less than X mins, then "soft block" the IP.

The "soft block": don't show a page saying "your IP has been blocked"; instead show your "thank you for your vote" page and DON'T record the vote in your DB. You can even increase the counter for that IP only. You want to prevent them from knowing that you are blocking their IP.
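
The soft block could look roughly like this in Python. The class and function names are hypothetical, the threshold and window are placeholders for the "100 requests in X mins" above, and a plain list stands in for the DB.

```python
import time
from collections import defaultdict, deque


class SoftBlocker:
    """Sliding-window request counter per IP."""

    def __init__(self, max_requests=100, window_seconds=600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Register a request and report whether the IP is under the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) <= self.max_requests


def handle_vote(blocker, votes, ip, item_id, now=None):
    """Always answer with the same page; only store the vote if allowed."""
    if blocker.allow(ip, now):
        votes.append((ip, item_id))  # stand-in for the DB insert
    return "Thank you for your vote"
```

Blocked requests still count toward the window, so a client that keeps hammering stays blocked, and the response never reveals the difference.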

Victor P
I was previously doing that, but something like curl doesn't use cookies, so they could game the votes by just running curl in a while loop. I settled on a user agent + IP combo. Nothing will ever be 100% without tying votes to some sort of user record, but this seems to have been working well enough.
bjeanes