views:

163

answers:

7

Let's assume I have a table with field of type VARCHAR. And I need to get data from that table sorted alphabetically by that field.

What is the best way (for performance): add order by field to the SQL-query or sort the data when it's already fetched?

I'm using Java (with Hibernate), but I can't tell anything about DB engine. It could be any popular relational database (like MySQL or MS Sql Server or Oracle or HSQL DB or any other).

The amount of records in table can vary greatly but let's assume there are 5k records.

UPD: how well does 2nd level hibernate cache (EHCache for example) support sorted data?

+3  A: 

Sort the data in the database - that's (part of) what it's there for. The database engine is probably better at sorting this data than you are.

Dominic Rodger
+7  A: 

If this field is indexed, then the average DB would be much more efficient in this task than Java. Also note that you normally wouldn't retrieve all those rows at once if it's for pure display, but rather retrieve a subset of it so that it can be shown by pagination. You can do this at DB level as well. Sorting the data in Java would require the entire table being hauled into Java's memory, you don't want to do that.


In Hibernate you can order the results using Criteria#addOrder() and paginate using Criteria#setFirstResult() and Criteria#setMaxResults(). E.g.

List users = session.createCriteria(User.class)
    .addOrder(Order.asc("username"))
    .setFirstResult(0) // Index of first row to be retrieved.
    .setMaxResults(10) // Amount of rows to be retrieved.
    .list();
BalusC
@BalusC: thanks for this point. It's not indexed unfortunately, but I'll remember for the further cases.
Roman
Even then, doing this in a decent DB is more efficient than doing this in Java. That's a fact. The DB is designed for exactly those data arranging and grouping purposes. Benefit its powers.
BalusC
A: 

My solution would be create index for the sort column and write query with order by clause.

Sujee
A: 

What is the best way (for performance): add sort by field to the SQL-query or sort the data when it's already fetched?

It's ORDER BY, not sort by.

It's a matter of tradeoff: sorting on client side is distributed which means less impact on the server. However, it can require more client resources.

If the field is not indexed, to return the whole sorted, recordset the server will need to do the following things:

  1. Fetch the whole recordset
  2. Sort it
  3. Send it over the network to the client

, while sorting on the client side requires only points 1 and 3 (which are the least resource-intensive).

If you server needs to serve hundreds of clients simultaneously and your clients need the whole recordsets, then most probably sorting on the client side will be more efficient.

If the field is indexed, the database can return the data already sorted from that index. However, this will require additional table lookups to get the other fields.

Also, if you don't want the whole recordset but only some top fields (like in ORDER BY LIMIT or SELECT TOP … ORDER BY), the whole recorset will not need to be fetched and transmitted over the network. In this case, ordering on database side will likely be more efficient.

Quassnoi
Thanks, my bad, will correct. Last time I did it almost 2 years ago.
Roman
A: 

For only 5 thousand records, it doesn't really make much difference, but I'd sort it the database; even if there's no index on the field, it's probably at least as fast as doing it afterwards.

Artefacto
+1  A: 

Pro sorting in the Database:

  1. Speed. If you have an index on the order by condition, the databasae shouldn't have to sort at all, and for maximum performance you could use a clustered index.
  2. Ease of use. An order by in the sql query is easier to write and maintain than a Java Comparator.

Pro sorting in the application:

  1. Customizability. Maybe you want to sort by more elaborate criteria, then a custom sort in Java will be more flexible.
  2. Reproducibility. If you code for different databases, their Collating rules will probably differ. Maybe that's a problem, and you want one particular odering. In Java, you can write a Custom Collator to make sure the output from all databases is ordered the same way.
wallenborn
A: 
  • do you usually extract only a subset of that data ? -> a good back end design (indexing and and/or partitioning) helps you extracting that subset ordered faster; then an "order by" on the db is matter of instants.
  • tables always contain a few rows of data ? then an "order by" on the db is matter of instants

and even if you don't(can't) optimize your database you should (almost) always prefer to leave that kind of op.s to the b.e.

Murko