views:

94

answers:

3

I have a number of chunks of data.

For arguments sake we'll say they are

File 1 - 150Kb
File 2 -  50Kb
File 3 -  70Kb
File 4 -  60Kb
File 5 -  70Kb
File 6 - 100Kb
File 7 -  90Kb

For transmission, I can package these up into a max payload of 300Kb.

If you just iterate through them in order you'd get

TRANS1: 150Kb + 50Kb + 70Kb = 270Kb - the next file puts you over the limit of 300Kb
TRANS2: 60Kb + 70Kb + 100Kb = 230Kb - the next file puts you over the limit of 300Kb
TRANS3: 90Kb

So three seperate transmissions.

But if you re-organise them, you can send

Files 1,2,6 together = 300Kb
Files 3,4,5,7 together = 290Kb

So you cut the number of separate transmissions needs. Since there's a monetary cost associated with each transmission (these transmissions are actually API calls to a 3rd party system, where we're billed per API Call) we'd like to keep the number of separate payload sends to a minimum.

Is there any sorting algorithm out there around this kind of optimization, that will take a list of weighted objects, and sort/group them so you end up with fewest number of groups

For reference, I'd be coding this up in .NET but if you can provide and example in another language or psuedo-code implementation/description, that'd be great also.

Thanks, Eoin C

+1  A: 

What you are looking for is the knapsack algorithm or a modification to it. The problem is not solveable efficiently, but you can get close to the perfect solution.

You may use this google search for implementations.

Thariama
To which to add, if the actual `n` being dealt with is only 7 or so, *asymptotic* efficiency isn't really a problem.
AakashM
@AakashM: right :)
Thariama
A: 

This reminds me of the Knapsack problem. I have not followed up on it, but I remember it is an NP-complete problem : read 'hard'. You basically need to find a compromise between "right" and "fast".

That does not mean impossible, but this clearly an area where perfection is the enemy of good.

I would start with the "Big Rocks First" approach, run a couple of simulations. Keep the best solution. Use a bounded time to break the search off and use the best one found so far.

Peter Tillemans
The original knapsack problem can be solved in time NumberOfItems * AvailableCapacity. But his problem is a little bit different. I think to find the optimal solution he has to check all possibilites in contrast to the knapsack problem.
Petar Minchev
Yes, but that's difficult to do in a bounded time as is always necessary in "real life". That's why you must come up with a reasonable solution fast, see if you can find better ones, and use the best you found when the time is up. In practical cases the number of items will be reasonable, but if my wife goes shopping, you'll need a hadoop cluster.
Peter Tillemans
Yeah, that's true. I just wanted to say that his problem(Bin packing) is harder than the original Knapsack one.
Petar Minchev
+3  A: 

Your problem is exactly the Bin packing problem which is unfortunately NP-Complete:( If the number of packets is pretty small you can bruteforce every possible combination.

Otherwise, the dynamic programming solution I propose will not give the optimal answer because it will assume you always group consecutive packets. But it will perform fast and give something close to the truth. I use recursion with memoization. It will be good to sort the packets in increasing order at the beginning.

const int INF = 1000000;
const int MAXSIZE = 300;
int DP[NumberOfPackets][MaxPayload];

int solve(int packetNum, int sizeUsed)
{
   if (packetNum == NumberOfPackets)
      return 0;

   if (DP[packetNum][sizeUsed] != -1)
      return DP[packetNum][sizeUsed];

   int res = INF;

   //Try to put the packet in the current group
   if (sizeUsed + size[packetNum] <= MAXSIZE) 
      res = min(res, solve(packetNum + 1, sizeUsed + size[packetNum]));

   //Try to start another group with the current packet
   res = min(res, solve(packetNum + 1, size[packetNum]) + 1);

   return DP[packetNum][sizeUsed] = res;
}

int answer = solve(1, size[0]);
Petar Minchev
actually this looks like a slightly better match for what I'm doing than the knapsack one. Thanks Petar.
Eoin Campbell
@Eoin: Not 'slightly better'. Bin-packing is an exact match. Knapsack and bin-packing are different! Bin-packing is strongly NP-Complete i.e. no pseudo polynomial time algorithms. That is not the case with Knapsack.
Moron