I need to be able to sort multiple intermediate result sets and enter them to a file in sorted order. Sort is based on a single column/key value. Each result set record will be list of values (like a record in a table)
- The intermediate result sets are got by querying entirely different databases.
- The intermediate result sets are already sorted based on some key(or column). They need to be combined and sorted again on the same key(or column) before writing it to a file.
- Since these result sets can be massive(order of MBs) this cannot be done in memory.
My Solution broadly :
To use a hash and a random access file . Since the result sets are already sorted, when retrieving the result sets , I will store the sorted column values as keys in a hashmap.The value in the hashmap will be a address in the random access file where every record associated with that column value will be stored.
Any ideas ?