I have some data that looks something like this...

+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
|   Red    |   Blue   |   Green  |
|  Yellow  |   Blue   |   Pink   |
|  Black   |   Grey   |   Blue   |
+----------+----------+----------+

I need to go through this data and find the 3 most common colours.

The raw data is in CSV and there's likely to be thousands more rows. (link)

What's the best way of doing this?

+2  A: 

Loop through all the values while keeping a count of each one of them in an array (word => count). After you've done that, find the keys with the highest values.
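A minimal sketch of that idea, using the three sample rows from the question as an in-memory array. The `$rows`, `$counts` and `$mostCommon` names are illustrative assumptions rather than part of the answer, and the `??` operator assumes PHP 7 or later:

```php
<?php
// Hypothetical sample data: the three rows shown in the question.
$rows = [
    ['Red', 'Blue', 'Green'],
    ['Yellow', 'Blue', 'Pink'],
    ['Black', 'Grey', 'Blue'],
];

// Build the word => count array by visiting every value once.
$counts = [];
foreach ($rows as $row) {
    foreach ($row as $word) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
}

// Find the key(s) holding the highest count.
$max = max($counts);
$mostCommon = array_keys($counts, $max);

print_r($mostCommon);
```

With this sample, `Blue` appears three times and every other colour once, so `$mostCommon` contains only `Blue`. For a top three rather than a single winner, sorting the counts in descending order is probably simpler than repeated `max()` calls.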

Matti Virkkunen
+7  A: 

There's no magic... one row at a time, one column at a time.

And count each color.

Paulo Santos
+2  A: 

If the number of possible colors is manageable, just use an associative array:

$histo = array();

// $cells is assumed to hold every cell value, however you're reading them
foreach ($cells as $color) {
  if (!isset($histo[$color]))
    $histo[$color] = 1;
  else
    $histo[$color]++;
}

// reverse sort by value; arsort() sorts in place and returns a bool,
// so don't assign its result back to $histo
arsort($histo);

// now the first three colors in $histo are the most common ones.

Kip
+1  A: 

If you're doing the processing in PHP and not a database, and the file contains purely color names, I'd go with something like:

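$colors = array();

$fh = fopen('data.txt', 'r'); // fopen() requires a mode argument
while (($row = fgetcsv($fh)) !== false) { // omitting length/delimiter arguments
    foreach ($row as $field) {
        if ($field === null) continue; // a blank line comes back as array(null)
        if (!isset($colors[$field])) {
            $colors[$field] = 0; // initialise on first sight to avoid an undefined-key warning
        }
        $colors[$field]++;
    }
}
fclose($fh);

arsort($colors); // sort in descending order, in place (arsort() returns a bool)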

After that the top 3 colors will be the first elements in $colors.
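One way to pull out just those first three is `array_slice`. The sketch below assumes the counting has already been done; the counts in `$colors` are made up for illustration and `$top3` is a hypothetical name:

```php
<?php
// Hypothetical counts, standing in for the result of the counting loop.
$colors = ['Red' => 4, 'Blue' => 9, 'Green' => 2, 'Pink' => 7];

// Sort by count, highest first, keeping the colour => count pairing.
arsort($colors);

// Take the first three entries. String keys survive array_slice() anyway;
// passing true also protects keys that PHP stored as integers.
$top3 = array_slice($colors, 0, 3, true);

print_r(array_keys($top3));
```

Note that colours tied on the same count come out in whatever order the sort leaves them, so a tie for third place is cut off arbitrarily.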

Marc B