ResultSet and Select * Performance

views:

594

answers:

+4 Q:

ResultSet and Select * Performance

I am refactoring some Spring JDBC code in which some of the costlier queries do "SELECT * FROM..." - and was about to start checking which columns were actually needed and just SELECT x , y FROM.. them. But reading through the ResultSet class is seemed that most data is lazily loaded. When you do a ResultSet.next() it moves the cursor in the database (Oracle 10g in this application) and when you do a ResultSet.getXX() it retrieves that column. So my thought was that if you do a "SELECT * " but only retrieve the columns you want you are not really taking a performance hit. Am I thinking about this correctly? The only place I can think of where this hurts you is inside the database because it is storing the query results in memory and has to use more memory then it would if only a few rows are selected, but if it's actually only storing pointers to the columns that hit the query then even this wouldn't be the case.

Thoughts?

NOTE : this only applies to standard ResultSet, I know CachedResultSet acts differently.

+6 A:

I would be surprised if going from "SELECT *" to "SELECT A,B,C" gave you any meaningful performance improvement, unless you had a huge number of columns that you didn't need.

This is all very dependent on your database, your driver and your application, and most generalisations are going to be pretty meaningless.

The only reliable answer you're going to get from this is by benchmarking it - try "SELECT *", try "SELECT A,B,C", and see if there's improvement worth chasing.

skaffman 2009-07-24 16:58:20

Wow, i always thought we sould not use select *,

Rakesh Juyal 2009-07-25 07:41:27

You shouldn't, as a rule. But the question wasn't about whether we should use it or not, but what its performance is like.

skaffman 2009-07-25 09:16:48

+3 A:

I do know that in an application I was involved with, at large data volumes (and large table sizes) changing from select * to select x, y did buy us a small performance gain. However, I would strongly recommend, as did skaffman, that you use a profiling tool, such as Oracle's built in profiler or an external profiler, and do large data sets to normalize out noise (like network traffic, hard drive spin up, sun spots, etc)

aperkins 2009-07-24 17:02:32

I've seen better behavior with `SELECT A,B,C` than `SELECT *` when a `BLOB` column wasn't included.

seth 2009-07-24 17:04:51

+3 A:

In the environments that I've worked in, as a rule SELECT * is simply never used. I believe skaffman & aperkins are probably correct about performance gain being small. This is one of those things where as a database developer I have a strong opinion that you should always name the columns you're retrieving, but I guess there may be no real basis for this.

Hmmm... I guess, from a maintainability perspective, one could argue that naming the columns you're retrieving serves to self-document your code a bit. SELECT * doesn't give another developer as much information to work with down the line. Whether or not that & the small performance benefit justifies the extra typing I'm not sure.

codemonkey 2009-07-24 17:07:36

+1 for the second paragraph.

banjollity 2009-07-25 08:20:22

+2 A:

I'm with @skaffman and others on this - minor gains at best. If you think about how Oracle retrieves data and remember it's block I/O, then regardless of the columns you ask for in your client, the database is going to fetch the entire block the record is found in anyway. If your client always retrieves the entire record (e.g., doing the SELECT * in SQL*Plus), there could be a performance gain, but in your situation where the data is only transmitted if you ask for it, then probably not much.

"SELECT *" can be evil for apps that are compiled. If the table changes, your code could break. That's why I wouldn't use it.

EDIT: Mulling over all the excellent responses here:

Justin makes great points about certain situations where significant performance improvements can arise.
Codemonkey makes the good point about self-documenting code.
Aperkins and skaffman make one of the best suggestions of all: try it out, measure, and see for your own situation what the effects are.

+1's all around... What I don't see is anyone falling over themselves recommending using "SELECT *" at all. If it's easy to specify the exact columns you need, I'd fix the code to do so.

DCookie 2009-07-24 17:09:49

+4 A:

Depending on the table structure, the Oracle version, and the indexes involved, it is entirely possible that changing the set of columns you are selecting would substantially improve performance by changing query plans for the better. For most queries, the performance benefits may well be minimal, but overall it is generally good practice to name columns explicitly.

The simplest case where performance will be improved will occur when you have a "covered index" that the optimizer could use. If all the columns you are selecting and all the columns you are filtering by are part of a single index, that index is a covered index for the query. In that case, Oracle can avoid ever reading the data from the table and can just read the index.

There are other cases where performance will be improved as well. The optimizer may be able to perform table elimination if you have queries there are interim joins that don't affect the eventual output. If you are selecting all the columns, that optimization isn't possible. If you have tables with chained rows, eliminating columns can also eliminate the need to fetch the additional blocks where the eliminated columns reside. If there are LONG and LOB columns in the table, not selecting those columns would also result in large improvements.

Finally, eliminating columns will generally reduce the amount of space Oracle will require to sort and hash results before shipping them over the wire. And even though the ResultSet may lazily load data in the application server's RAM, it is probably not able to lazily fetch columns over the network. If you select all the columns from the table, the JDBC driver likely has to fetch at least 1 complete row at a time (more likely it is fetching 10 or 100 rows per network round-trip). And since the driver doesn't know when the data is fetched what columns are going to be requested, you'll have to ship all the data over the network.

Justin Cave 2009-07-24 18:03:27

I don't think that the columns you SELECT by have anything to do with the query tuning/plan - that is only effected by what comes after the WHERE statement.

Gandalf 2009-07-24 18:15:50

@Gandalf - no, Justin is correct when all of the columns referenced in the SELECT are available from the index structure (in certain cases). Oracle doesn't have to read the table to obtain the column values

dpbradley 2009-07-24 19:01:38

Thanks for the clarification, I can see how Oracle could get all it needed from just the index. Unfortunately that's not the case for these queries, but something I can definitely try to take advantage of in the future.

Gandalf 2009-07-28 18:01:56

I have never noticed any performance gains between one and the other when switching statements around - I'm fairly certain Oracle grabs an entire row's contents first anyway, regardless of wild card or column specification. There are much larger factors on performance to be checked into before this (indexes, hard drive speed, etc.).

As a coding practice I'd avoid "SELECT *". Specifying the specific columns does make the intent of each query a bit more obvious. It makes for good self-documenting code. Writing out the column names also helps me to wrap my head around what exactly I plan to do with a query as I'm writing it.

JohnRegner 2009-07-24 20:04:50

ansaurus

tags:

views:

answers:

ResultSet and Select * Performance

related questions