tags:

views:

91

answers:

3

I need to be able to sort multiple intermediate result sets and enter them to a file in sorted order. Sort is based on a single column/key value. Each result set record will be list of values (like a record in a table)

  1. The intermediate result sets are got by querying entirely different databases.
  2. The intermediate result sets are already sorted based on some key(or column). They need to be combined and sorted again on the same key(or column) before writing it to a file.
  3. Since these result sets can be massive(order of MBs) this cannot be done in memory.

My Solution broadly :

To use a hash and a random access file . Since the result sets are already sorted, when retrieving the result sets , I will store the sorted column values as keys in a hashmap.The value in the hashmap will be a address in the random access file where every record associated with that column value will be stored.

Any ideas ?

+5  A: 

Have a pointer into every set, initially pointing to the first entry

Then choose the next result from the set, that offers the lowest entry

Write this entry to the file and increment the corresponding pointer

This approach has basically no overhead and time is O(n). (it's Merge-Sort, btw)

ziggystar
I second both merge-sort suggestions. It's a perfect fit for the problem.
Carl Smotricz
A: 

Sounds like you are looking for an implementation of the Balance Line algorithm.

Jeroen van Bergen
+2  A: 

If you've got 2 pre-sorted result sets, you should be able to iterate them concurrently while writing the output file. You just need to compare the current row in each set: Simple example (not ready for copy-and-paste use!):

ResultSet a,b;
//fetch a and b
a.first();
b.first();
while (!a.isAfterLast() || !b.isAfterLast()) {
  Integer valueA = null;
  Integer valueB = null;

  if (a.isAfterLast()) {
    writeToFile(b);
    b.next();
  }
  else if (b.isAfterLast()) {
    writeToFile(a);
    a.next();
  } else {
    int valueA = a.getInt("SORT_PROPERTY");
    int valueB = b.getInt("SORT_PROPERTY");
    if (valueA < valueB) {
      writeToFile(a);
      a.next();
    } else {
      writeToFile(b);
      b.next();
    }
  }



}
FRotthowe
I like this idea. But this assumes just 2 result sets. I can have 2 or MORE result sets at a time. Porting this code for multiple result sets will be cumbersome. Plus this forces me to execute all result sets at once(increasing memory) instead of doing it sequentially.
Zenil
Of course this is only a simple example. For more result sets, you could use a sorted queue to keep track of the next element for each result set.Plus, as long as your database supports cursors, memory consumption (on the java side) won't be an issue.
FRotthowe