views:

113

answers:

4

I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort a products that are pulled from the database into halfway appropriate groups. I have tried simply dividing the highest price into 4, and basing each group off that. I also tried standard deviations based around the mean. Both could result with price ranges that no product would fall into, which isn't a useful filtering option.

I also tried take quartiles of the products, but my problem is that the price ranges from $1 items to $4,000. The $4,000 almost never sell, and are far less important, but they keep skewing my results.

Any thoughts? I should have paid more attention in stats class ...

Update:

I ended up combining methods a bit. I used the quartile/bucket method, but hacked it a bit by hardcoding certain ranges within which a greater number of price groups would appear.

//Price range algorithm

sort($prices);

//Divide the number of prices into four groups
$quartilelength = count($prices)/4;

//Round to the nearest ...
$simplifier = 10;

//Get the total range of the prices
$range = max($prices)-min($prices);

//Assuming we actually are working with multiple prices
if ($range>0 )
{
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($prices) > 10) 
    {
        $priceranges[0] = floor($prices[floor($quartilelength)]/$simplifier)*$simplifier;
    }

    // Always grab the median price
    $priceranges[1] = floor($prices[floor($quartilelength*2)]/$simplifier)*$simplifier;

    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($this->data->prices) > 10)
    {
        $priceranges[2] = floor($prices[floor($quartilelength*3)]/$simplifier)*$simplifier;
    }
}
A: 

What exactly are you looking for as your end result (could you give us an example grouping)? If your only goal is for all groups to have a significant number of important enough products, then, even if you come up with the perfect algorithm that works for your current data set that does not mean it will work with tomorrow's dataset. Depending on the number of sets of groups you need I would simply make arbitrary groups that fit your needs instead of using an algorithm. Ex. ($1 - $25, $25-100, $100+). From a consumer's perspective my mind naturally distributes products into 3 difference price categories (cheap, midrange and expensive).

Justin Lucas
A: 

I think you're thinking too much.

If you know your products, and you like fine grained results, I would simply hard code those price ranges. If you think $1 to $10 makes sense for what you are selling, put it in, you don't need an algorithm. Just do a check so that you only show ranges that have results.

If you don't know your products, I would just sort all the products by price, and divide it into 4 groups of equal number of products.

nute
+1  A: 

Here is an idea, following the line of thought of my comment:

I assume you have a set of products, each of them tagged by a price and a sales volume estimate (as a percent from the total sales). First, sort all products by their price. Next, start splitting: traverse the ordered list, and accumulate sales volume. Each time you reach about 25%, cut there. If you do so 3 times, it will result in 4 subsets having disjoint price ranges, and a similar sales volume.

Eyal Schneider
A: 

Here is an idea: basically you would sort the price into buckets of 10, each price as the key in the array, the value is a count of how many products are at the given price point:

public function priceBuckets($prices)
{    
    sort($prices);

    $buckets = array(array());
    $a = 0;

    $c = count($prices);
    for($i = 0; $i !== $c; ++$i) {
        if(count($buckets[$a]) === 10) {
            ++$a;
            $buckets[$a] = array();
        }

        if(isset($buckets[$a][$prices[$i]])) {
            ++$buckets[$a][$prices[$i]];
        } else if(isset($buckets[$a - 1][$prices[$i]])) {
            ++$buckets[$a - 1][$prices[$i]];
        } else {
            $buckets[$a][$prices[$i]] = 1;
        }
    }

    return $buckets;
}

//TEST CODE
$prices = array();

for($i = 0; $i !== 50; ++$i) {
    $prices[] = rand(1, 100);
}
var_dump(priceBuckets($prices));

From the result, you can use reset and end to get the min/max of each bucket

Kinda brute force, but might be useful...

tsgrasser
This is similar to my approach where I took the quartiles, except you chose 10 groups rather than 4. I think this is one of the most promising approaches, my only problem being that it results in odd price ranges, even if they are a good representation of the data. In other words, I might end up with prices ranges like $15.47 to $152.87. Each bucket might have an even distribution, but the price boundaries are arbitrary and confusing.
Dave W.