I can't find the name for the type of problem I have, and I was wondering if someone knows what kind of statistics it involves. I'm not even sure it's a problem that can be optimized.

I'd like to optimize over three variables, or more precisely over the combination of two of them. The first is a Likert-scale average, the second is the number of times the item was rated on that scale, and the third is the item ID. The Likert scale is [1,2,3,4].

For example, the row "3.25, 200, item1" would mean that item1 was rated 200 times and got an average rating of 3.25.

I have a bunch of items and I'd like to find the high-value ones. For instance, an item at 4,1 (average 4, rated once) would suck because, while it has the highest possible average, it was rated only once. And an item at 1,1000 would also suck, for the inverse reason: it has plenty of ratings, but the lowest possible average.

Is there a way to optimize this with a simple heuristic? Someone told me to look into confidence bands, but I am not sure how that would work. Thanks!

A:

Basically, you want to ignore scores with fewer than x ratings, where x is a threshold you can estimate from the variance of your data.
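A minimal sketch of that filter in Python, assuming each item is stored as a (mean, count, item ID) tuple like the rows in the question. The helper name filter_by_min_ratings and the sample items are hypothetical, and min_ratings=50 is only a placeholder for x, not a recommended value.

```python
from typing import List, Tuple

# Each item is (average Likert score, number of ratings, item ID), as in the question.
Item = Tuple[float, int, str]


def filter_by_min_ratings(items: List[Item], min_ratings: int) -> List[Item]:
    """Drop items rated fewer than min_ratings times (hypothetical helper)."""
    return [item for item in items if item[1] >= min_ratings]


items = [(3.25, 200, "item1"), (4.0, 1, "item2"), (1.0, 1000, "item3")]
# Placeholder threshold standing in for x.
kept = filter_by_min_ratings(items, min_ratings=50)
# Among the surviving items, rank by average score, highest first.
ranked = sorted(kept, key=lambda item: item[0], reverse=True)
print([item_id for _, _, item_id in ranked])  # ['item1', 'item3']
```

The 4,1 item is dropped before ranking, so its perfect average can no longer push it to the top.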

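I would recommend estimating the standard deviation of your data (the square root of its variance) and choosing how large a margin of error you can tolerate. The standard error of an average of n ratings is the standard deviation divided by √n, and at 95% confidence the margin of error is about 1.96 standard errors. Solving that for n gives the minimum number of ratings needed to stay within your bound. See: http://en.wikipedia.org/wiki/Standard_error_(statistics)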
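A minimal sketch of that recipe, assuming you still have the raw individual ratings (not just the per-item averages) to estimate the standard deviation from. It uses the normal approximation, where a 95% margin of error is about 1.96 standard errors; the helper names and the sample ratings are hypothetical.

```python
import math
import statistics
from typing import Sequence

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def estimate_sd(ratings: Sequence[int]) -> float:
    """Sample standard deviation of raw Likert ratings (needs at least 2 ratings)."""
    return statistics.stdev(ratings)


def standard_error(sd: float, n: int) -> float:
    """Standard error of the average of n ratings with standard deviation sd."""
    return sd / math.sqrt(n)


def min_ratings_for_margin(sd: float, margin: float, z: float = Z_95) -> int:
    """Smallest n with z * sd / sqrt(n) <= margin, i.e. n >= (z * sd / margin) ** 2."""
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical raw ratings pooled across items.
ratings = [3, 4, 3, 3, 4, 2, 3, 4]
sd = estimate_sd(ratings)
print(min_ratings_for_margin(sd, margin=0.1))
```

Rounding up with math.ceil matters: rounding down would give a sample size that just misses the requested margin.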

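For example, if your data has standard deviation 0.5 and you want to be 95% sure that the true average is within 0.1 of your estimate, you need 1.96 × 0.5 / √n ≤ 0.1, i.e. n ≥ (1.96 × 0.5 / 0.1)^2 ≈ 96.04, so 97 ratings. Bounding only the standard error itself at 0.1, i.e. (0.5/0.1)^2 = 25 ratings, corresponds to roughly 68% confidence.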
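The question also mentions confidence bands. One common way to turn that idea into a single ranking score, sketched here as an option and not as this answer's method, is to rank items by the lower end of an approximate 95% confidence interval for their average: mean - 1.96 * sd / sqrt(count). Because the question records only each item's average and count, this sketch assumes one pooled standard deviation for all items (0.5, borrowed from the example above). The normal approximation is rough for items with very few ratings, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Item = Tuple[float, int, str]  # (average Likert score, number of ratings, item ID)

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution
POOLED_SD = 0.5  # assumption: one standard deviation shared by all items


def lower_confidence_bound(mean: float, count: int,
                           sd: float = POOLED_SD, z: float = Z_95) -> float:
    """Lower end of an approximate (normal) confidence interval for an item's average."""
    return mean - z * sd / math.sqrt(count)


def rank_by_lower_bound(items: List[Item]) -> List[Item]:
    """Sort items so the one with the highest lower bound comes first."""
    return sorted(items, key=lambda item: lower_confidence_bound(item[0], item[1]),
                  reverse=True)


items = [(3.25, 200, "item1"), (4.0, 1, "item2"), (1.0, 1000, "item3")]
for mean, count, item_id in rank_by_lower_bound(items):
    # prints: item1 3.18, then item2 3.02, then item3 0.97
    print(item_id, round(lower_confidence_bound(mean, count), 2))
```

With this score, 200 ratings averaging 3.25 outrank a single rating of 4, and no threshold has to be picked by hand.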

Doug