views: 282 · answers: 5

This is literally about comparing cakes. My friend is having a cupcake party with the goal of determining the best cupcakery in Manhattan. Actually, it's much more ambitious than that. Read on.

There are 27 bakeries, and 19 people attending (with maybe one or two no-shows). There will be 4 cupcakes from each bakery, if possible including the staples -- vanilla, chocolate, and red velvet -- and rounding out the 4 with wildcard flavors. There are 4 attributes on which to rate the cupcakes: flavor, moistness, presentation (prettiness), and general goodness. People will provide ratings on a 5-point scale for each attribute for each cupcake they sample. Finally, each cupcake can be cut into 4 or 5 pieces.
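For a sense of scale, a quick back-of-the-envelope using only the numbers above: 27 bakeries times 4 cupcakes is 108 cupcakes, and at 4-5 pieces each, every person can taste at most 22-28 pieces. So each cupcake gets rated by only a handful of people, which is why the aggregation procedure matters.

```python
# Back-of-the-envelope from the numbers in the question:
# 27 bakeries, 4 cupcakes each, 19 tasters, 4-5 pieces per cupcake.
bakeries, cupcakes_each, people = 27, 4, 19
total_cupcakes = bakeries * cupcakes_each  # 108 cupcakes in play

for pieces in (4, 5):
    tastes = total_cupcakes * pieces       # total pieces available
    # 432 pieces -> ~22.7 per person; 540 pieces -> ~28.4 per person
    print(pieces, "pieces/cupcake:", tastes, "pieces,",
          round(tastes / people, 1), "tastes per person")
```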

The question is: what is a procedure for coming up with a statistically meaningful ranking of the bakeries for each attribute, and for each flavor (treating "wildcard" as a flavor)? Specifically, we want to rank the bakeries 8 times: for each of the 4 flavors we want to rank the bakeries by goodness (goodness being one of the attributes), and for each of the 4 attributes we want to rank the bakeries across all flavors (i.e., aggregating over all flavors). The grand prize goes to the top-ranked bakery for the goodness attribute.

Bonus points for generalizing this, of course.

This is happening in about 12 hours so I'll post as an answer what we ended up doing if no one answers in the meantime.

PS: Here's the post-party blog post about it: http://gracenotesnyc.com/2009/08/05/gracenotes-nycs-cupcake-cagematch-the-sweetest-battle-ever/

A: 

If you can write SQL, you could make a little database and write some queries. It should not be that difficult.

e.g. select sum(score) / count(score) as finalscore, bakery, flavour from ratings group by bakery, flavour

Emiswelt
+1  A: 

Break the problem up into sub-problems.

What's the value of a cupcake? A basic approach is "the average of the scores." A slightly more robust approach may be "the weighted average of the scores." But there may be complications beyond that... a cupcake with 3 goodness and 3 flavor may be 'better' than one with 5 flavor and 1 goodness, even if flavor and goodness have equal weight (IOW, a low score may have a disproportionate effect).
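One possible scoring function along those lines (a sketch only; the weights and the penalty factor are made-up knobs, not anything prescribed in the thread) is a weighted average that gets discounted whenever any single rating is very low:

```python
def cupcake_score(ratings, weights=None, low_penalty=0.5):
    """Weighted average of attribute ratings (1-5 scale), discounted
    when any single rating bottoms out -- so a low score has a
    disproportionate effect, as described above.

    `weights` and `low_penalty` are illustrative, not prescribed.
    """
    weights = weights or {attr: 1.0 for attr in ratings}
    avg = (sum(weights[a] * s for a, s in ratings.items())
           / sum(weights.values()))
    if min(ratings.values()) <= 1:  # any rock-bottom rating halves the score
        avg *= low_penalty
    return avg

# With equal weights, 3-goodness/3-flavor beats 5-flavor/1-goodness:
print(cupcake_score({"flavor": 3, "goodness": 3}))  # 3.0
print(cupcake_score({"flavor": 5, "goodness": 1}))  # 1.5
```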

Make up some sample cupcake scores (specifics! Cover the normal scenarios and a couple weird ones), and estimate what you think a reasonable "overall" score would be if you had an ideal algorithm. Then, use that data to reverse engineer the algorithm.

For example, a cupcake with goodness 4, flavor 3, presentation 1 and moistness 4 might deserve a 4 overall, while one with goodness 4, flavor 2, presentation 5, and moistness 4 might only rate a 3.

Next, do the same thing for the bakery. Given a set of cupcakes with a range of scores, what would an appropriate rating be? Then, figure out the function that will give you that data.

The "goodness" ranking seems a bit odd: it sounds like a general rating, so it already *is* the overall score -- why calculate another one?

If you had time to work with this, I'd always suggest capturing the raw data, and using that as a basis to do more detailed analysis, but I don't think that's really relevant here.

kyoryu
Thanks; good comments! You're right about the goodness rating; it's already meant to be an overall rating for the cupcakes. In other words, rather than defining a function from the individual ratings to an overall rating, we're letting the people rating the cupcakes do that with their "goodness" ratings. Good point about capturing the raw data; we'll certainly do that. I'll add a link to my friend's blog post about this when it's all said and done!
dreeves
+2  A: 

Perhaps reading about voting systems will be helpful. PS: don't take whatever is written on Wikipedia as "good fish". I have found factual errors in advanced topics there.
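One of the simplest schemes from that literature is the Borda count: each rater ranks the candidates, and a candidate in position i of an n-long ballot earns n - 1 - i points. A minimal sketch (hypothetical bakery names, not data from the party):

```python
from collections import defaultdict

def borda(ballots):
    """Borda count: each ballot lists candidates best-first;
    position i on an n-long ballot earns n - 1 - i points."""
    points = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for i, candidate in enumerate(ballot):
            points[candidate] += n - 1 - i
    return sorted(points.items(), key=lambda kv: -kv[1])

# three hypothetical tasters ranking three hypothetical bakeries
ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(borda(ballots))  # [('A', 5), ('B', 3), ('C', 1)]
```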

zvrba
Yeah, I can't decide whether to worry about the incentives at all. It's maybe not such an issue as we'll be rating them "blind" (not knowing which bakery which cupcake is from). (Btw, I hadn't known the idiom "good fish". I would've said "don't take it as gospel".)
dreeves
It's a Norwegian idiom... I googled a bit and it doesn't appear to be an English idiom :/ You got the point anyway :)
zvrba
+1  A: 

Perhaps this is too general for you, but this type of problem can be approached using Conjoint Analysis. An R package for implementing this is bayesm.

MRE
+3  A: 

Here's what we ended up doing. I made a huge table to collect everyone's ratings at http://etherpad.com/sugarorgy (Revision 25, in case it gets vandalized now that I'm adding this public link) and then used the following Perl script to parse the data into a CSV file:

#!/usr/bin/env perl
# Grabs the cupcake data from etherpad and parses it into a CSV file.

use strict;
use warnings;
use LWP::Simple qw(get);

my $content = get("http://etherpad.com/ep/pad/export/sugarorgy/latest?format=txt");
$content =~ s/^.*BEGIN_MAGIC\s*//s;  # keep only the text between the magic markers
$content =~ s/END_MAGIC.*$//s;
my $bakery = "none";
for my $line (split('\n', $content)) {
  next if $line =~ /sar kri and deb/;  # skip the header line of raters' names
  if ($line =~ s/bakery\s+(\w+)//) { $bakery = $1; }  # remember current bakery
  $line =~ s/\([^\)]*\)//g;        # strip out stuff in parens
  $line =~ s/^\s+(\w)(\w)/$1 $2/;  # separate flavor and attribute codes, eg "Vg" -> "V g"
  $line =~ s/\-/\-1/g;             # a bare dash (no rating) becomes -1
  $line =~ s/^\s+//;               # trim leading whitespace
  $line =~ s/\s+$//;               # trim trailing whitespace
  $line =~ s/\s+/\,/g;             # whitespace-separated -> comma-separated
  print "$bakery,$line\n";
}

Then I did the averaging and whatnot in Mathematica:

data = Import["!~/svn/sugar.pl", "CSV"];

(* return a bakery's list of ratings for the given type of cupcake *)
tratings[bak_, t_] := Select[Drop[First@Select[data, 
                        #[[1]]==bak && #[[2]]==t && #[[3]]=="g" &], 3], #!=-1&]

(* return a bakery's list of ratings for the given cupcake attribute *)
aratings[bak_, a_] := Select[Flatten[Drop[#,3]& /@ 
                        Select[data, #[[1]]==bak && #[[3]]==a&]], #!=-1&]

(* overall rating for a bakery *)
oratings[bak_] := Join @@ (tratings[bak, #] & /@ {"V", "C", "R", "W"})

bakeries = Union@data[[All, 1]]

SortBy[{#, oratings@#, Round[Mean@oratings[#], .01]}& /@ bakeries, -#[[3]]&]
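For anyone without Mathematica, the overall-ranking step above can be sketched in Python too. This assumes (as the script's output suggests) that each CSV row is bakery, flavor code, attribute code, then the individual ratings, with -1 marking a missing rating; `cupcakes.csv` is a hypothetical filename:

```python
import csv
from collections import defaultdict

def overall_ranking(csv_path):
    """Mean 'goodness' ('g') rating per bakery across all flavors,
    ignoring -1 (missing) ratings; returns (bakery, mean) best-first."""
    ratings = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            bakery, flavor, attr, *scores = row
            if attr == "g":
                ratings[bakery] += [int(s) for s in scores if int(s) != -1]
    return sorted(((b, sum(r) / len(r)) for b, r in ratings.items()),
                  key=lambda kv: -kv[1])

# usage: overall_ranking("cupcakes.csv")
```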

The results are at the bottom of http://etherpad.com/sugarorgy.

dreeves