I have a site where users are entering data of some products they buy.

How do I ensure the correctness of data entered via crowdsourcing (letting users vote on and edit products) while minimizing the amount of work the administrator has to do? I'm looking for how-tos, best practices, etc.

+1  A: 

Make sure you log the IP address with every action made, since malicious users or bots can trample on session data or cookies. Doing this helps ensure that a single entity cannot skew results or do anything drastic by appearing to be multiple users.
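A minimal sketch of that idea in Python, assuming each logged action is a hypothetical `(ip, product_id, value)` tuple (the names are illustrative, not from any particular framework). It counts at most one vote per IP per product, so repeated submissions from one address cannot inflate a tally:

```python
from collections import defaultdict

def tally_one_vote_per_ip(actions):
    """Tally votes, keeping only the latest vote per (ip, product_id).

    `actions` is an iterable of (ip, product_id, value) tuples in the
    order they were logged. This record layout is an assumption.
    """
    latest = {}
    for ip, product_id, value in actions:
        # A later vote from the same IP overwrites the earlier one.
        latest[(ip, product_id)] = value

    tallies = defaultdict(lambda: defaultdict(int))
    for (_ip, product_id), value in latest.items():
        tallies[product_id][value] += 1
    return {pid: dict(counts) for pid, counts in tallies.items()}

log = [
    ("203.0.113.1", "p1", "red"),
    ("203.0.113.1", "p1", "red"),
    ("203.0.113.1", "p1", "red"),
    ("203.0.113.2", "p1", "blue"),
    ("203.0.113.3", "p1", "blue"),
]
print(tally_one_vote_per_ip(log))
```

Bear in mind that many users can share one IP (NAT, proxies), so this is a rough filter rather than a real identity check.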

Sam152
+2  A: 

What sort of data are you collecting?

You're talking about crowdsourcing, and thus (I assume) aggregating data across this crowd. As your users are entering data about products they buy, I suspect you're going to be gathering product attributes and prices.

Some possible approaches: if your users are entering non-numerical data (e.g. colours), just record the mode (the most commonly entered value).
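For the non-numerical case, something like the following would do. It is only a sketch; the function name and the lowercase/strip normalization are my own assumptions about how you would want to compare entries:

```python
from collections import Counter

def most_common_entry(entries):
    """Return the mode of the entries, or None if there are none.

    Entries are stripped and lowercased first so that "Red" and
    "red " count as the same value (an assumed normalization).
    """
    normalized = [e.strip().lower() for e in entries if e.strip()]
    if not normalized:
        return None
    value, _count = Counter(normalized).most_common(1)[0]
    return value

print(most_common_entry(["Red", "red ", "blue"]))
```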

If they're entering numeric data, discard outliers, i.e. bin the lowest and highest results and average the rest. You could do this for prices, say; this is the approach that electronic exchanges use for resolving closing prices out of many trades.
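The "bin the extremes and average the rest" step is a trimmed mean. A rough sketch, where the `trim` parameter (how many values to drop from each end) is an assumption you would tune for your data:

```python
def trimmed_mean(values, trim=1):
    """Average the values after dropping the `trim` lowest and highest."""
    if trim < 0:
        raise ValueError("trim must be non-negative")
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

# The 0.01 and 999.0 entries are dropped before averaging.
print(trimmed_mean([0.01, 2.0, 2.5, 3.0, 999.0]))
```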

Depending on your application, you may want to have a historical bias towards the most recent entries.
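One way to get a bias towards recent entries is to weight each entry by its age. The sketch below uses exponential decay with a half-life; that choice, and the 30-day default, are assumptions for illustration rather than anything your application requires:

```python
def recency_weighted_mean(entries, half_life_days=30.0):
    """Weighted mean where an entry's weight halves every half_life_days.

    `entries` is a list of (age_in_days, value) pairs (assumed layout).
    """
    if not entries:
        raise ValueError("no entries")
    weights = [0.5 ** (age / half_life_days) for age, _value in entries]
    total = sum(weights)
    weighted = sum(w * value for w, (_age, value) in zip(weights, entries))
    return weighted / total

# Today's entry counts twice as much as the one from 30 days ago.
print(recency_weighted_mean([(0, 10.0), (30, 20.0)]))
```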

But this all depends on your application, and how much storage and crunching of data you're prepared to do.

Brian Agnew
Actually, we are collecting lists of ingredients of edible products, and we identify each ingredient entered. We would either have the situation where some ingredients appear/disappear in the product, or total garbage input. What we really need is the full correct list, not just the most common entities...
miceuz
A: 

At a high level, data gathered from the 'crowd' can carry an associated correctness value. Looking at SO, an answer or response from someone with 1000+ rep has more weight than one from a casual user. Look for validation and triangulation: if it's a single voice in the crowd that you're listening to, then it's probably not worth that much. If other voices join in, then you know you're onto something; again, in SO terms, we all get a chance to upvote questions.
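To make that a bit more concrete, here is a rough sketch of reputation-weighted consensus. The log-scaled weight and the `min_supporters` threshold are assumptions of mine, not how SO actually scores anything:

```python
import math
from collections import defaultdict

def weighted_consensus(submissions, min_supporters=2):
    """Pick the value with the most reputation-weighted support.

    `submissions` is a list of (reputation, value) pairs (assumed layout).
    A value backed by fewer than `min_supporters` users is ignored, so a
    single voice never wins on its own. Returns None if nothing qualifies.
    """
    weight = defaultdict(float)
    supporters = defaultdict(int)
    for reputation, value in submissions:
        # Log scale: high-rep users count more, but not overwhelmingly so.
        weight[value] += 1.0 + math.log10(max(reputation, 1))
        supporters[value] += 1

    qualified = [v for v in weight if supporters[v] >= min_supporters]
    if not qualified:
        return None
    return max(qualified, key=lambda v: weight[v])

print(weighted_consensus([(1500, "sugar, salt"), (10, "sugar, salt"), (5, "garbage")]))
```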

I've recently seen some really good iPhone apps which rely on crowdsourcing for their data, and then validate it by asking other users if it's correct.

MrTelly