I want to write the "random" output of my result set (about 1.5 million rows) to a file in sorted order. I know I can use ORDER BY in my query, but that is "expensive". Is there an algorithm for writing result-set rows to a file so that the content ends up sorted, and would I gain any performance from it? I'm using Java 1.6, and the query has multiple joins.

+4  A: 

Define an index on the sort criteria in your table; then you can use the ORDER BY clause without problems and write the file as the rows come from the result set.
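A minimal JDBC sketch of that approach, written in Java 6 syntax to match the question: the database sorts via ORDER BY and Java only copies rows to the file in arrival order, so Java never holds the whole result set (provided the driver actually streams rows instead of buffering them). The class `OrderedExport`, the tab-separated output format and the fetch size of 1000 are illustrative assumptions, not part of the answer.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class OrderedExport {

    // Runs a query that already contains ORDER BY and writes the rows in the
    // order the database returns them; no sorting happens on the Java side.
    static int export(Connection conn, String orderedSql, String path)
            throws SQLException, IOException {
        Statement stmt = conn.createStatement();
        try {
            // Only a hint: whether rows really arrive in batches of this size
            // (rather than all at once) depends on the JDBC driver.
            stmt.setFetchSize(1000);
            ResultSet rs = stmt.executeQuery(orderedSql);
            int columns = rs.getMetaData().getColumnCount();
            Writer out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(path), "UTF-8"));
            try {
                return writeRows(rs, columns, out);
            } finally {
                out.close();
            }
        } finally {
            stmt.close(); // closing the Statement also closes its ResultSet
        }
    }

    // Writes every remaining row as one tab-separated line; SQL NULL becomes "".
    static int writeRows(ResultSet rs, int columns, Writer out)
            throws SQLException, IOException {
        int rows = 0;
        while (rs.next()) {
            for (int c = 1; c <= columns; c++) {
                if (c > 1) {
                    out.write('\t');
                }
                String value = rs.getString(c);
                out.write(value == null ? "" : value);
            }
            out.write('\n');
            rows++;
        }
        return rows;
    }
}
```

Pass your existing join query with an ORDER BY on the indexed sort column; `writeRows` is kept separate so the copying loop can be exercised without a database.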

If your query has multiple joins, create the proper indexes for the joins and for the sort criteria. You can sort the data in your program, but you'd be wasting time. That time would be far more valuable spent learning how to properly tune and use your database, rather than reinventing sorting algorithms already present in the database engine.

Grab your database's profiler and check the query's execution plan.

Vinko Vrsalovic
+1  A: 

In my experience, sorting on the database side is usually as fast as or faster than sorting in the client, especially if the column you sort on is indexed.

NR
A: 

If you're reading from a database, getting sorted output shouldn't be so 'expensive' if you have appropriate indexes.

But sometimes, with complex queries, it's very hard for the SQL optimiser to apply indexes. In that case, the DB simply accumulates the results in a temporary table and sorts it for you, transparently.

It's very unlikely that you could match the level of optimisation put into your DB engine; but if your problem arises because you're doing some postprocessing of the data that negates any sorting done by the DB, then you have no alternative but to sort it yourself.
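If you do end up sorting yourself and the data may not fit in memory, the textbook technique is an external merge sort; database engines typically use some variant of it when a sort spills to disk. The following is a minimal sketch of the textbook version, not any engine's actual implementation. It assumes each row has already been serialized to a single line of text with no embedded newline or carriage-return characters (`readLine` splits on both) and that plain `String` order of the whole line is the desired order; `ExternalSort` and `maxLinesInMemory` are hypothetical names, and the syntax is kept Java 6 compatible.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class ExternalSort {
    // Phase 1: sort chunks of at most maxLinesInMemory lines in memory and spill
    // each sorted chunk ("run") to a temp file. Phase 2: k-way merge the runs.
    static void sort(Iterator<String> lines, int maxLinesInMemory, Writer out)
            throws IOException {
        List<File> runs = new ArrayList<File>();
        List<String> buffer = new ArrayList<String>();
        try {
            while (lines.hasNext()) {
                buffer.add(lines.next());
                if (buffer.size() >= maxLinesInMemory) {
                    runs.add(spill(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) runs.add(spill(buffer));
            merge(runs, out);
        } finally {
            for (File run : runs) run.delete(); // runs are not needed once merged
        }
    }

    // Sorts one chunk and writes it, one element per line, to a new temp file.
    private static File spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        File run = File.createTempFile("sort-run", ".txt");
        Writer w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(run), "UTF-8"));
        try {
            for (String line : buffer) {
                w.write(line);
                w.write('\n');
            }
        } finally {
            w.close();
        }
        return run;
    }

    // A min-heap holds the current first line of every run; repeatedly emit the
    // smallest one and refill the heap from the run it came from.
    private static void merge(List<File> runs, Writer out) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<Head>();
        List<BufferedReader> readers = new ArrayList<BufferedReader>();
        try {
            for (File run : runs) {
                BufferedReader r = new BufferedReader(
                        new InputStreamReader(new FileInputStream(run), "UTF-8"));
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.write('\n');
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }

    // The next unread line of one run, ordered by the line's text.
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }

        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }
}
```

Memory use is bounded by `maxLinesInMemory` plus one buffered line per run, at the cost of writing and reading every row to disk one extra time, which is exactly the work an indexed ORDER BY can avoid.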

Again, the easiest would be to use the DB: simply write to a temporary table with an appropriate index and dump from there.

If you're certain that the data will always fit in RAM, you can sort it in memory. It's the only case in which you might be able to beat the DB engine, simply because you know you won't need disk access.

But that's a lot of 'ifs'. Better to stay with your DB.
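If the rows really do fit in memory, the in-memory route can be as small as the following sketch (Java 6 syntax; `InMemorySort` and the one-`String[]`-per-row representation are hypothetical). It assumes the rows have already been read from the `ResultSet` into a list and that the sort column is never null. Note that `String.compareTo` orders by UTF-16 code units, which will not necessarily match the database's collation, so the file may not come out in the order an ORDER BY would produce.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class InMemorySort {

    // Sorts the rows by one column, then writes them as tab-separated lines.
    static void sortAndWrite(List<String[]> rows, final int keyColumn, Writer out)
            throws IOException {
        Collections.sort(rows, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                return a[keyColumn].compareTo(b[keyColumn]);
            }
        });
        // Collections.sort is stable: rows with equal keys keep their input order.
        for (String[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    out.write('\t');
                }
                out.write(row[c]);
            }
            out.write('\n');
        }
    }
}
```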

Javier
Stress that 'might', please: if the data fits in RAM, the database knows (or can be told) that too, and you're back at square one.
Vinko Vrsalovic
A: 

If you need the data sorted, someone has to do it: either you or the database. It's certainly easier effort-wise to add the ORDER BY to the query, but there's no reason you can't sort it in memory on your side. The easiest way is to put the data into a sorted collection (TreeSet, TreeMap) using a Comparator that sorts on the column you need, then write out the sorted data.
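One caveat with TreeSet: it treats two elements as duplicates whenever its Comparator returns 0, so a TreeSet that compares rows on the sort column alone silently drops every row whose key equals one already present. A TreeMap from key to a list of rows avoids that. A minimal sketch in Java 6 syntax (`SortedBuckets` and the `String[]` row representation are hypothetical):

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class SortedBuckets {

    // Maps each sort-key value to every row carrying it, so rows whose keys
    // compare equal are all kept instead of being treated as duplicates.
    private final TreeMap<String, List<String[]>> byKey;
    private final int keyColumn;

    SortedBuckets(int keyColumn, Comparator<String> keyOrder) {
        this.keyColumn = keyColumn;
        // A null comparator makes TreeMap use natural String ordering.
        this.byKey = new TreeMap<String, List<String[]>>(keyOrder);
    }

    // Call once per row while iterating the ResultSet.
    void add(String[] row) {
        List<String[]> bucket = byKey.get(row[keyColumn]);
        if (bucket == null) {
            bucket = new ArrayList<String[]>();
            byKey.put(row[keyColumn], bucket);
        }
        bucket.add(row);
    }

    // TreeMap.values() iterates in key order, so the output is sorted.
    void writeTo(Writer out) throws IOException {
        for (List<String[]> bucket : byKey.values()) {
            for (String[] row : bucket) {
                for (int c = 0; c < row.length; c++) {
                    if (c > 0) {
                        out.write('\t');
                    }
                    out.write(row[c]);
                }
                out.write('\n');
            }
        }
    }
}
```

Rows within one bucket stay in the order they were added. With `String.CASE_INSENSITIVE_ORDER` as the comparator, for instance, keys "A" and "a" share a single bucket.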

Alex Miller
There's no reason you can't, but performance-wise you really shouldn't unless you have a very special case (the data fits in memory, or a very weird database schema leads to a query plan that cannot be fixed).
Vinko Vrsalovic