views:

27

answers:

1

Can anyone point me to a good reference on External Memory Mergesort? I've read the wiki page but am having trouble understanding it exactly. An animation might help but I can't seem to find one.

Basically, I know that you have a certain number of blocks on disk, and you can fit a certain number of blocks in memory. Lets say you have 32 blocks on disk and 4 blocks in memory. In the first pass you read 4 blocks into memory at a time, sort them in memory, and write them back out do disk. So at this point you have 8 sorted runs of 4 blocks. How does the merging work? Since I have 4 blocks in memory (assume I have one more for output) I think I should be able to merge 4 of those 8 runs at a time, and then merge the next 4 runs. And then in the last pass I want to merge the whole thing. But don't you have to read each block from disk each time? So how does this not become a n^2 solution?

A: 

I think I get it now. When you merge you still have to read all of the blocks on disk (let's call that B) but you don't have to read them B times. You only read them B log B times.

JPC