views:

263

answers:

4

The question is: In a k-way merge, how many merge operation will we perform. For example: 2-way merge:2 nodes 1 merge; 3 nodes 2 merge; 4 nodes 3 merge. So we get M(n)=n-1.

What the the M(n) when k is arbitrary?

+1  A: 

2-way merges are most efficient when merging equal-sized blocks, so the most efficient k-way merge based on 2-way merges is to first merge block 1 with block 2, block 3 with block 4, and so on, then merge the first two resulting blocks, and so on. This is basically how mergesort works, and results in O(kn log k) time, assuming each of the k blocks contains n items. But it's only perfectly efficient if all blocks have exactly n items, and k is a power of 2, so...

Instead of performing k separate merge passes, you can use a single pass that uses a heap containing the first item of each block (i.e. k items in total):

  1. Read the lowest item from the heap (O(log k) time)
  2. Write it out
  3. Remove it from the heap
  4. If the block that that item came from is not yet exhausted, place the next item from it into the heap (O(log k) time again).
  5. Repeat until the heap is empty.

If there are a total of kn items, this always takes O(kn log k) time regardless of how they are distributed amongst blocks, and regardless of whether k is a power of 2. Your heap needs to contain (item, block_index) pairs so that you can identify which block each item comes from.

j_random_hacker
A: 

yes, The heap way may be more effective. But what's the answer to the orginal question? I found there may be no answer about that since it is maybe not a full k-way tree, so 4-way could regress to 3-way, 2-way.

Fair enough, I didn't answer the original question. The answer to that question is "M(k) = k - 1 always", as I explain in my 2nd answer. This applies regardless of what k is.
j_random_hacker
+1  A: 

OK, to answer the original question as stated:

To merge k blocks using a sequence of 2-way merges always requires exactly k - 1 merges, since regardless of what pair of blocks you choose to merge at any point in time, merging them reduces the total number of blocks by 1.

As I said in my original answer, which pairs of blocks you choose to merge does impact the overall time complexity -- it's better to merge similar-sized blocks -- but it doesn't affect the number of 2-way merge operations.

j_random_hacker
A: 

thanks hacker. In your original solution, it seems you use a heapsort to merge the blocks?

And I know if the each block doesn't contain the same number items, the merge is not effective. But Is there a formula to calculate that? just like 2-way which is (n-1)

That's the subject of computational complexity theory (Big-Oh notation and so on) which is quite mathematical. Usually you need to specify a precise algorithm before you can work out its complexity.
j_random_hacker
As an example, the worst approach is to merge block 1 and 2, then merge the result with block 3, then the result of that with block 4 etc. Suppose each block has n items; in this case, the 1st merge takes 2n - 1 steps, the 2nd takes 3n - 1 steps, etc. Adding these gives a total of O(n*k^2) steps.
j_random_hacker
P.S.: On SO, instead of responding to an answer with a separate answer, it's usually better to write a comment. Also you can upvote an answer that you find helpful ;)
j_random_hacker
I don't think he has enough reputation yet to leave comments...
chrispy