Hey folks, I'm trying to figure out an algorithm that will help me group an assortment of files of varying sizes into, say, n groups of approximately equal size.

Any ideas on how to achieve this?

+2  A: 

K-means might help you. It's a good starting point for researching more advanced clustering algorithms, but given that your problem is one-dimensional, k-means should be more than enough.

fortran
How would you solve this problem using k-means? The OP wants "groups of approximately equal size", not clusters containing items of similar size.
bubaker
Hmmm... are you sure of that? If the OP is using "size" in two different senses, he ought to be more explicit :-/ Anyway, I said it was a good start for finding more suitable clustering methods.
fortran
+5  A: 
Find the target group size. This is the sum of all sizes divided by n.
Create a list of the file sizes.
Sort the list in decreasing order of size.
for each group
    while the remaining space in the group is bigger than the first element of the list
        take the first element of the list and move it to the group
    for each remaining element
        find the element for which the difference between the group size and the target group size is minimal
    move this element to the group

This doesn't produce optimal results, but it is easy to implement and gives good results in practice. Finding the optimal solution is NP-hard, so in general it requires an exhaustive search.
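One way to read the steps above is the following Python sketch. The function name `group_files` is made up for illustration, and two details are assumptions rather than part of the answer: the closing element is only added when it actually brings the group closer to the target, and the last group simply takes whatever is left.

```python
def group_files(sizes, n):
    """Greedily split sizes into n groups with roughly equal totals."""
    target = sum(sizes) / n                  # target group size
    remaining = sorted(sizes, reverse=True)  # largest first
    groups = []
    for _ in range(n - 1):
        group, total = [], 0
        # Take the largest remaining items while they still fit under the target.
        while remaining and total + remaining[0] <= target:
            total += remaining[0]
            group.append(remaining.pop(0))
        # Then look for the one item that brings the total closest to the target.
        if remaining:
            best = min(remaining, key=lambda s: abs(total + s - target))
            # Assumption: only add it if that beats leaving the group as it is.
            if abs(total + best - target) < abs(total - target):
                remaining.remove(best)
                group.append(best)
        groups.append(group)
    groups.append(remaining)  # assumption: the last group takes the leftovers
    return groups


print(group_files([8, 7, 6, 5, 4, 3, 2, 1], 3))
# [[8, 4], [7, 5], [6, 3, 2, 1]]
```

Here every group happens to hit the target of 12 exactly; with less convenient sizes the last group absorbs the rounding error, so it may end up noticeably larger or smaller than the others.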

drhirsch
+1  A: 

Your implicit optimization goal is most likely to minimize n, the number of groups, for a given maximum group size. Then you have exactly the bin packing problem, which is closely related to the cutting stock problem.
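If that reading is right, a common heuristic for bin packing is first-fit decreasing. This is a minimal sketch, not the Netlib code; the function name `first_fit_decreasing` and the `capacity` parameter (the maximum total size per group) are assumptions made for illustration.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack sizes into as few bins as the heuristic manages, each bin <= capacity."""
    bins = []  # each bin is a list of the sizes placed in it
    for size in sorted(sizes, reverse=True):  # largest items first
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds capacity {capacity}")
        for b in bins:
            # Put the item into the first bin that still has room for it.
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins


print(first_fit_decreasing([8, 7, 6, 5, 4, 3, 2, 1], 12))
# [[8, 4], [7, 5], [6, 3, 2, 1]]
```

Like the greedy grouping, this is a heuristic: it does not guarantee the minimum number of bins, but it is simple and usually gets close.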

Netlib has this Fortran code to solve the more general multiple knapsack problem (items have profit as well as cost/weight values).

bubaker