views:

797

answers:

20

Is there any performance issue in using SELECT * rather than SELECT FiledName, FiledName2 ... ?

A: 

Supposedly yes. I'm whole-heartedly told at work that I should never use SELECT *. In fact, it's in our policy not to use it because a) it means there's ambiguity in what is used and what's available just by looking at the query and b) it's slower as the SQL server has to find each column it needs and return them.

I've never seen any proof of this, however.

EDIT: Also, if a stored procedure is compiled on the server and uses SELECT *, when the underlying table structure changes supposedly it won't select back the newly introduced columns as SQL compiles SELECT * down to the individual columns.

Kezzer
There is no query plan difference between the two, but if the table gains an extra column then the query does not execute as it originally did and it returns more data than is probably needed.
StingyJack
The DB has to "find each column" whether you name them explicitly or use *. In the case where you want all columns, the performance is identical. However, disallowing * forces you to think about what you really need from the DB, which is a good practice.
Adam Bellaire
+2  A: 

The only performance issue will be if your application only needs a subset of the fields returned by select *. There is no performance difference in the database as they are effectively the same thing.

Andrew Hare
+2  A: 

I don't know about computing performance but in terms of read/maintain-ability (i.e. Human Performance) we don't use select * at my shop. Everything is explicitly selected.

Jason Punyon
+1  A: 

If all fields are specified, then no, there shouldn't be a meaningful difference performance-wise. But if you just want a couple specific fields from a table with a dozen columns, it's slower.

There are readability and maintainability issues with SELECT *. It makes sense to use specific field names all the time, even if you want to select all fields.

Mehrdad Afshari
A: 

SELECT * is translated to SELECT Field1 , Field2 .... etc before it is run so they are effectively the same thing. No difference in performance.

However, readability and maintaiability are better when its SELECT Field1 , Field2 ..

Learning
+6  A: 

If you need a subset of the columns, you are giving bad help to the optimizer (cannot choose for index, or cannot go only to index, ...)

Some database can choose to retrieve data from indexes only. That thing is very very helpfull and give an incredible speedup. Running SELECT * queries does not allow this trick.

Anyway, from the point of view of application is not a good practice.


Example on this:

  • You have a table T with 20 columns (C1, C2, ..., C19 C20).
  • You have an index on T for (C1,C2)
  • You make SELECT C1, C2 FROM T WHERE C1=123
  • The optimizer have all the information on index, does not need to go to the table Data

Instead if you SELECT * FROM T WHERE C1=123, the optimizer needs to get all the columns data, then the index on (C1,C2) cannot be used.

In joins for multiple tables is a lot helpful.

FerranB
I thought indexes were only relevant in JOIN, WHERE, and GROUP BY clauses. Someone can correct me if I'm wrong, but how do columns in the select clause prevent the optimizer from selecting an index?
Juliet
@Princess I've updated the post with an example
FerranB
+5  A: 

Every time you do a select *, there is may be an additional query to get the list of columns. In high transaction environments this could become a visible overhead, but once every now and then will make no difference.

Also, when inserting records, never use select * in an insert in case columns are added.

ck
i just wanted to ask you - when you specify a field in a table explicitly in select, server checks if the field really exists, so there is still additional query or am i wrong?
empi
This is not entirely accurate (well for some databases at least), most top tier databases prepare a plan for a query an cache it, so whether you use * or col list the list of columns is still queried at plan compile time. The query cache is invalidated when DDL changes occur on the table.
Pop Catalin
+2  A: 

I'm not a DBA, but from what I recall learning from our DBA, the reasoning (at least with SQL Server) is that the DB caching algorithms don't cache '*' queries well, but if you are running the same query with exact columns specified multiple times, it will cache that well.

I'm sure a more knowledgeable DBA could go into the exact details of how the caching mechanism works, but that's why there is a performance hit.

NOTE: Caching performance only works if the query is going to be run multiple times, especially in a small time frame, so otherwise you would see no performance difference.

Jay S
A: 

Technically, it would depend on the relational database management system you are using. I think the performance hit would be microseconds. If you are absolutely trying to squeeze the last bit of performace out of your system, I would say don't use them.

I personally use it all the time.

tyndall
+2  A: 

Hi,

this is what i've learned in the past.

I’ve also seen far too much use of SELECT *. There is almost never a good reason to do this. If you have this in your applications, take it out and select the columns you need. In fact, it’s often better to select all columns by name than to use the asterisk because of the processing the engine has to do in some cases to determine the column names.

I've found this online. Hope this helps you out. It is true that a Select * needs to find it's columnames (especially when using joins). I know we never use it.

Greetz, SD

Sem Dendoncker
sorry about the oneliner :(
Sem Dendoncker
Even if you give the columns explicitly, does the processing engine need to do some extra work to make sure the columns in the SELECT clause actually exist?
Juliet
Good question. I would have to look that one up.
Sem Dendoncker
+9  A: 

Take a look at this post:

http://stackoverflow.com/questions/321299/what-is-the-reason-not-to-use-select

and these:

link text
link text
link text
link text

Fred
That post has more to do with maintainability than performance. I agree with the answer to that post that select * is an anti-pattern but this question was about performance and whether or not there is a difference.
Andrew Hare
A: 

If you are only using a subset of the fields, then the performance difference can be substantial. See the following example, which involves retrieving 1,411,771 rows from CScout's analysis of the Linux code.

$ time sh -c "echo 'select * from IDS' | mysql cslinux >/dev/null"
real    0m5.622s
user    0m2.580s
sys     0m0.532s

$ time sh -c "echo 'select EID from IDS' | mysql cslinux >/dev/null"
real    0m4.492s
user    0m0.716s
sys     0m0.096s

An this is not even measuring the performance impact on the server.

Diomidis Spinellis
Nice. I guess to in order to gete a "fair" result, you should include more that one column in the second select statement.
splattne
He should, in fact, name all of the fields on the table to make a fair test.
jmucchiello
With all fields specified there should be no perceptible performance difference (the overhead is simply getting the names of the table's columns).
Diomidis Spinellis
A: 

If your embedding sql in code then your should always use the long form for clarity, not performance. For ad-hoc queries the select * syntax is ESSENTIALLY no less efficient than specifying column names unless you have a massive number of columns which you shouldn't unless you're denormalizing.

I should get 1 point for using 2 unlesses in a sentence and still habving it make sense!! :)

klyde
+1  A: 

Performance, not much. It's just a bit clumsy: in a table, say, with 10 columns, joined on two other tables or even more, particularly with large result sets, SELECT * can return dozens of columns, often with mostly unused or even useless data. In terms of the hit on the DBMS, there wouldn't be much, but all that data still needs to travel across the wire somehow; network bandwidth and consequent latencies certainly add up. I've seen this first-hand in high-volume environments. It definitely matters.

Aside from the bandwidth issues, you can also run into ambiguous column-naming problems (disambiguating usually means removing SELECT * anyway, so you might as well do it from the start), and it's also considered good practice to be explicit about the needs of the code inside the code; doing so helps in lots of ways -- with debugging, with collaborating, etc.

Christian Nunciato
A: 

SELECT * requires SQL to find all the column names however this is not the biggest performance hit by a long shot.

The biggest performance hit of a SELECT * statement is when you are performing a query which requires a Nonclustered index to evaluate. Even if the nonclustered index is a covering index of every column, SQL will still look up the primary key and get the values from the clustered index.

On top of this, if you only need one or two columns, you have the network bottleneck to deal with due to returning a bigger result set than required.

John
A: 

I will echo what others have said regarding "select *" retrieving a list of columns as part of the request handling. By contrast, you can also select columns by ordinal, which saves even more time since the RDBMS engine doesn't even need to look up the column to determine the position of the column to retrieve. I find this highly useful for aggregate queries.

For example: select count(1) from ... versus select count(*) from ...

In this example, the RDBMS only needs to know that it needs the count of the first column, and ZING, it's off. In the (unfortunately) more common select count(*), the RDBMS retrieves a list of all the columns, and then verifies each row to determine if it's valid for counting (as opposed to validating the 1st column only).

This works great most of the time. I'm pretty sure most DB systems count NULL values in the count, but you should watch out for this and verify before assuming.

YMMV, void where prohibited, etc.!

Dan Coates
A: 

Performance This will always be bad if you do not NEED all the columns. Returning more data than is needed will throttle the database and your lan/wan bandwidth.

Readability Knowing which columns are in the view, procedure can be very useful , SELECT * is not helpful at all and I would consider it counter-productive.

*Testing If you make a schema change, all your code that use SELECT * within should be invalidated because any tests you write to check metadata should check the output of the view, proc.

*Of course, assuming you have tests in place as all good DB Dev's should have :)

Coolcoder
+1  A: 

If you use select * in a join then you are automatically sending more information than you need becasue the join field(s) are repeated. This is a waste of processing time and network resources and can cause performance problems. Further not specifying the fields means that your application can break when new fields are added especially if they are fields that the user is not intended to see but which are there for auditing or database type processing. Select * in an insert is always a bad idea as somewhere along the line some somen who is less than smart may actually change the order of the columns in the table .

HLGEM
A: 

I agree with just about all the answers except certain performance claims. If you are actually going to use all the columns in the table, I'd argue the SELECT * version is a smidgen faster. Here's why:

Take these two queries on a table where there is a unique index on (id,x):

SELECT x,y,z,w FROM tab WHERE id='abc' ORDER BY s

SELECT x,y,z,w FROM tab WHERE id='abc' 
AND x in ('a','b','c','d','e','f','g','h',...)
ORDER BY ('a','b','c','d','e','f','g','h',...)

Which is faster? If the 'x in' clause names all the values for x in the table for id 'abc' then the first query is probably faster. Now let's rename these fields:

SELECT field_name, field_type, field_offset, field_len
FROM internal_field_catalog
WHERE table_name = 'abc'
ORDER BY field_order

So when retrieving the data, the SELECT * allows the engine to do (the equivalent of) a single memcpy to move the row data to the result set and when retrieving the field data it probably is selected faster.

All I'm saying is there is an edge case where SELECT * is perfectly useful and probably faster. One reason you might always need all columns from a table is when storing object persistence in an RDBMS (for some reason). To every rule of thumb there is an exception.

jmucchiello
I accept this , but I think other reasons "not to use it " counter this example use of SELECT *.
Coolcoder
+1  A: 

Maybe. It depends largely on the database engine, how it stores stuff, how many rows are coming back, how many other columns there are and the sizes of the other columns.

If you are using a row-based database (i.e. most of them) which stores all the columns together (almost all do, except for BLOBs which are often stored separately, especially the larger ones), then doing SELECT * has little impact on the server itself - it has to fetch the entire row anyway.

On the other hand, if you're sending the data over a network (or even locally, as it'll impact the size of buffers used etc), then it may help to have fewer columns, as there will be fewer bytes to send back. This difference could be dwarfed anyway by server performance if the query is in any way difficult (e.g. requires IO).

If you have big blobs in the rows, SELECT * is not very clever - otherwise, it's unlikely to make much difference, but could.

There are a few "column based" database engines knocking around - they are completely different - for them, "SELECT *" is a total performance killer; be sure to avoid it. The chances are, if you're using one, you are completely aware of this though (typically they're used for very large datawarehouse apps).

For me, the main advantage of not using "SELECT *" is maintainability. You get no surprises when someone adds extra columns to the table; your query "fails fast" when someone removes one of the columns you were using. It makes the code more self-documenting as someone can casually see what columns you wanted.

MarkR