views:

124

answers:

5

I need to write an application that will take a list of files (some large, some small) and fit them onto DVDs (or CDs, or whatever) as efficiently as possible. The whole point of this application is to use up as much of the 1st disc before moving onto the 2nd disc, filling the 2nd disc up as much as possible before moving onto the 3rd disc, etc.

(Note: The application doesn't have to do the actual burning to the DVD, it just has to figure out the best possible fit).

I initially thought I had a good game-plan by generating a permutation of the files and then checking each combination to see what fits the best. (My request for help on this can be found HERE)

But the more files there are, the longer it takes... exponentially. So I wanted some of your opinions on how to best achieve this.

I feel helpless with this project.

Any ideas? And, as always, C# code is always appreciated.

Thanks!

+11  A: 

What you're facing is related to the knapsack problem. The linked wikipedia page has lots more information, including suggested ways of solving it.

Jon Skeet
Thanks for the reply. Looks like I'll have to do a lot more studying, as most of that is over my head at the moment.
Jason Thuli
Actually I think this problem is a simplified version of the knapsack problem: you only have the weight to consider, not the value, so it should be much easier to solve
Thomas Levesque
+1  A: 
for each file
 is there enough room this dvd?
   yes, store it here
   no, is there room on another already allocated dvd?
     yes, store it there
     no, allocate another dvd and store it there
Gabriel
Thanks, but it's not quite as simple as that. Just because the first file CAN fit, doesn't mean it's part of the overall best possible combination of files. I need to check all of the files in groups to find the best fit.
Jason Thuli
That won't really solve the problem, that will only fill up a bunch of DVDs but it won't do the Tetris Math on it.
Rangoric
interesting... i figured i missed something... and i assume you don't get the answer by sorting them from large to small before each iteration, right? that was just my next thought... i love learning how much i don't know :)
Gabriel
Rangoric, Tetris Math is a perfect description. Gabriel, I'm with you! I consider myself a decent developer, but this one is stumping me! Sorting won't work. Consider: 2 large files may fill up everything except 1 byte of the DVD (which is a great fit), but if there are 3 medium files that can fill it up perfectly, that is technically a better fit... and it works in the opposite direction too.
Jason Thuli
@Rangoric: interesting. what if after i'm done i start asking 'which n files can be replaced by one file?' and swap them, with increasing n? would that converge? i'll go read about it!
Gabriel
Here's a thought: Maybe the fact that this problem isn't too easy is reflected by the fact that disks require degragging? I'm thinking of DVDs as drive cylinders...
Gabriel
@Gabriel It would take quite a few rounds, as there is the possibility that you can't fit them into the theoretical smallest number of disks because the file sizes won't allow it. Even if you do it 1 disk at a time you can get good result, but it won't be a best fit for all disks.
Rangoric
+8  A: 

Simple algorithm:

  1. Sort the file list by file size
  2. Find the largest file smaller than the remaining free space on the DVD, and add it to the DVD.
  3. If the remaining DVD free space is smaller than any remaining files, start a new dvd.
  4. Repeat from 2.
dthorpe
Sorting won't work. Consider: 2 large files may fill up everything except 1 byte of the DVD (which is a great fit), but if there are 3 medium files that can fill it up perfectly, that is technically a better fit... and it works in the opposite direction too. I need to fill it up as completely as possible before moving on to the next disc.
Jason Thuli
@Jason Thuli But you need to ask yourself what is your goal; the maximum number of bytes on a DVD or the minimum number of DVD's? this formula satisfies the second condition but not the first (but do you really need the first?)
Scott Chamberlain
i was just going to add that very same thing as I tried to work it out on my own for fun... my first idea, which isn't likely the best way, is to ask yourself "is there 1 file that takes up all the space?" and then "how about 2?" and just keep going
Gabriel
My concern on the 2nd bullet point. Finding the "largest file smaller than the remaining free space" might not fit as well as multiple smaller files. So, if I were to use the multiple smaller files, I would have less overall bytes remaining to fit onto disc #2. Now, in this example, if I were to use the largest fittable file, I would have more bytes remaining for disc #2, which means those extra bytes could possibly require a disc #3. Granted, there is a very, VERY slim chance of this, but I want to be as efficient as possible.
Jason Thuli
@Jason As you said in another answer comment, the math for the perfect solution is pretty heady. Mathematically perfect is great, but when simple gets you 95% of the way to perfect, how much value is in that last, most difficult 5%? Mathematically perfect has value when you're talking about mfg 10 million DVDs, but not when you're worried about eliminating one 50 cent DVD-R from a backup file set of 20 DVDs. How much effort is worth saving that 50 cents?
dthorpe
You also need to take into account the DVD file system overhead per file, block sizes, etc. If you have 30MB left, you probably can't put 3 10MB files in there because there will be some bytes lost to the file directory entries and block size remainders.
dthorpe
dthorpe. I completely agree with you. And I'm sure I'll be using your approach until I can figure out that last 5%. I just wanted to see if anybody else out there can point me in the right direction. And good point about the DVD overhead, I never thought of that.
Jason Thuli
I'm going to retract my concern with this solution. After sitting down and drawing things out on paper, my concern doesn't make sense. This solution should work perfectly. Thanks!
Jason Thuli
Brain = fried. ;-)
Jason Thuli
Woops. typo in step 3. if the free space is SMALLER than any remaining file (not greater). Edited.
dthorpe
@Jason Thuli if you have a chance of identical files, IIRC DVDs optimize them, so that and the file system overhead are both things to keep in mind.
Rangoric
+1  A: 

While thats a cool problem to solve in a program for certain applications... however in your application, why not just use WinRAR or some other archiving program that has the capability to split up the archive into specific sized file chunks. You could make each chunk the size of a DVD and then just burn away.

EDIT: one issue you would run into is that if one of your files is greater than the size of your media, you are not going to be able to burn that file.

Jonathan S.
Thanks for the reply, but WinRAR kind of acts like HJSplit. It essentially makes 1 giant file, and splits that file into smaller segments. That means I wouldn't be able to view any of the contents without having access to the remainder of the giant file. In my problem, the files need to be free and accessable.
Jason Thuli
this is true... but what about the scenario where your file is larger than the size of the media?
Jonathan S.
A: 

How about if you started by putting as many of the largest files you can onto one DVD and then filling it up with as many of the smallest files that you can (starting with the smallest).

Repeat this process with the remaining files for each disk.

I'm not sure that's going to give you perfect coverage/distribution but I think it might go some way to solving your needs.

Chris Schumann