views:

71

answers:

2

Greetings!

I feel like this problem is related to the bin packing problem, as well as potentially to the set partitioning problem... I just want to bounce this off of someone before I head down the path too deeply.

I have input data (in a datafile) as follows:

entry_one 55
entry_two 56
entry_three 61
entry_four 62
entry_five 62
entry_six 68
entry_seven 72
entry_eight 73
entry_nine 78
entry_ten 79
entry_eleven 84
entry_twelve 85
entry_thirteen 91
entry_fourteen 92
entry_fifteen 99
entry_sixteen 100
entry_seventeen 121
entry_eighteen 125
entry_nineteen 127
entry_twenty 161

With this data I want to have an algorithm that: groups the entries into groups such that the entries's associated numerical values within a group are within X (in my case, X is 16.) So for example, one arrangement could be:

group one:
 entry_one
 entry_two
 entry_three
 entry_four
 entry_five
 entry_six

group two:
 entry_seven
 entry_eight
 entry_nine
 entry_ten
 entry_eleven
 entry_twelve

group three:
 entry_thirteen
 entry_fourteen
 entry_fifteen
 entry_sixteen

group four:
 entry_seventeen
 entry_eighteen
 entry_nineteen

group five:
 entry_twenty

This particular arrangement was achieved using a naive greedy algorithm in which I started with the lowest value (entry_one's 55), and allowed all values that were under 55+16 to be part of that group. I then started with the very next entry which was not in that group (entry_seven's 72) and allowed all values that were under 72+16 to be part of that group (group two), and so on in that order.

I believe that although a naive greedy algorithm works, it is unlikely to give me an optimal grouping/categorization, where I define "optimal grouping" such that the total number of groups is what is being minimized (in my case, this is for job scheduling, so I want to group like work as best as possible to minimize changeover.)

Any thoughts, modules, algorithms, sample code out there that people can suggest?

Thanks!

EDIT: I thought I should add how this is different from the bin packing problem. In the bin packing problem the optimization is: "given these bins of fixed size, with these objects of fixed size, how can I stuff the most of these fixed size objects into these bins without overflowing each bin." In my case, what I have is bins of infinite size but of filtered entry, so that if an object does "match" the signature for a bin, it can be inserted into said bin, but what we want is to minimize the total number of bins that we need to create.

+1  A: 
  1. Start at the point nearest the midpoint
  2. select all values within half the range of it
  3. Use your existing solution going out in both directions.
  4. Repeat from step 2 using every other point in the original selection.
  5. select the best result.


Worst case this is an O(n2) problem as you can replace step 4 with "repeat with all points".

BCS
I don't understand how starting the greedy algorithm in the middle (arbitrarilly) changes the results materially. It would be different if the answer was: "for every entry you are going to make a pass through your algorithm where that entry is the starting position, and where you are then randomly jumping out in both directions." But I don't think that's what you meant.
earino
Starting in the middle is just to give an edge free span to start with. The important thing is to run the greedy solution starting from each point in some span so that every possible result is found (I think this will work). You might get the same result by starting from the low end and stepping through the points until you find another point that give you the first result again. And worst comes to worst, run every start point.
BCS
A: 

If I understand the problem right, then the greedy algorithm is in fact optimal. Let G be the greedy solution and let S be any solution. Consider the first group where G differs from S. Since G is a greedy solution, G's group is larger than S's. For example,

G : [1, 2, 3][4, 5, 6, 7][8, 9, 10][11, 12]...
S : [1, 2, 3][4, 5, 6][7, 8, 9][10, 11, 12]...

We can derive a new solution S' from S that agrees with G on a longer prefix without introducing a new group. Just steal some items from the next group. The resulting groups are valid because one is already in G and the other is a subgroup of a valid group.

S': [1, 2, 3][4, 5, 6, 7][8, 9][10, 11, 12]...

By induction, it follows that G is in fact optimal.

Dave