tags:

views:

2211

answers:

18

I've seen a number of people claim that you should specifically name each column you want in your select query.

Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?

Even considering the question from 9/24, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.

One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone. Which, as we know, won't occur until development is completely done.

That said, is there an overriding issue to not use SELECT *?

+14  A: 

One major reason is that if you ever add/remove columns from your table, any query/procedure that is making a SELECT * call will now be getting more or less columns of data than expected.

ahockley
You should never write code that depends on the number of columns returned anyway.
le dorfier
+26  A: 

If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.

Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.

It also adds overhead to the plan execution caching since it has to fetch the meta data about the table to know what columns are in *.

Bob
Good answer, but I'd change the "code will break" to "code MAY break." That's the real trouble here, the "select *" use doesn't ALWAYS produce a breaking change. And when the break does happen is usually highly decoupled from the use that ends up broken.
BQ
If someone is referencing columns ordinally in their code, they're in trouble regardless of whether they use SELECT * or not. The plan execution overhead is trivial, and wouldn't matter anyway once the plan is cached.
MusiGenesis
Then the programmer error lies in writing code that depends on the sequence of the columns. You never need to do that.
le dorfier
@doofledorfer - never say never. It's faster to access ordinal columns, and it is practical at times. It's a bigger error to use select * than there is to use ordinal access.
Robert Paulson
This was the first reason I thought of in answer to the question. Therefore, it MUST be the correct one.
dviljoen
+1  A: 

Even if you use every column but address the row array by numeric index you will have problems if you add another row later on.

So basically it is a question of maintainability! If you don't use the * selector you will not have to worry about your queries.

tharkun
+1  A: 

Selecting only the columns you need keeps the dataset in memory smaller and therefor keeps your application faster.

Also, a lot of tools (e.g. stored procedures) cache query execution plans too. If you later add or remove a column (particularly easy if you're selecting off a view), the tool will often error when it doesn't get back results that it expects.

Soldarnal
+8  A: 

There are a few reasons:

  1. If the number of columns in a database changes and your application expects there to be a certain number...
  2. If the order of columns in a database changes and your application expects them to be in a certain order...
  3. Memory overhead. 8 unnecessary INTEGER columns would add 24 bytes of wasted memory. That doesn't sound like a lot, but this is for each query and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which add up much more quickly.
  4. Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.

Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.

R. Bemrose
+5  A: 

If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place that you reference the missing field. If you instead include all the columns in your query, you application will break in the (hopefully) one place where you initially get the data, making the fix easier.

That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).

MusiGenesis
The question isn't 'is select * ever desirable', so the 2nd part of your answer is irrelevant. The question states that using 'select *' should be preferable, which of course is complete bollocks.
Robert Paulson
Yes, my 2nd part is irrelevant. OQ changed the question to state SELECT * is preferable, and yeah that's kind of bollocksy.
MusiGenesis
Ah yeah sorry - question changed it's direction after your answer.
Robert Paulson
That's alright. Even Mozart was an editor (http://stackoverflow.com/questions/292682/understanding-dijkstras-mozart-programming-style#292704). My original post suggested that use of SELECT * led to cannibalism. :)
MusiGenesis
lol .. Mozart did some pretty good PR
Robert Paulson
+2  A: 

You might join two tables and use column A from the second table. If you later add column A to the first table (with same name but possibly different meaning) you'll most likely get the values from the first table and not the second one as earlier. That won't happen if you explicitly specify the columns you want to select.

Of course specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.

Kaniu
+1  A: 

I think it's ok when your doing "...exists(select * from ..." Or, does that actually incur some overhead?

dotjoe
See comment on the elected answer of the duplicate thread...
PhiLho
I guess "exists(select 1..." would be better, but does * actually get expanded in an exists clause?
dotjoe
+2  A: 

I actually noticed a strange behaviour when I used select * in views in SQLServer2005

Run that query and you will see what I mean

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go


select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [D] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'

select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest

Compare the results of last 2 select statements. I believe what you will see is a result of Select * referencing columns by index instead of name

If you rebuild the view it will work fine again.

EDIT

I have added a separate question to look into that behaviour in more details

kristof
+1  A: 

It makes your code more ambiguous and more difficult to maintain; because you're adding extra unused data to the domain, and it's not clear which you've intended and which not. (It also suggests that you might not know, or care.)

le dorfier
+1  A: 

To answer you question directly: Do not use "SELECT *" when it makes your code more fragle to changes to the underlying tables. Your code should break only when a change is made to the table that directly affects requirments of your program.

Your application should take advantage of the abstraction layer that Relational access provides.

Metro
+2  A: 

I understand where you're going regarding premature optimization, but that really only goes to a point. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?

As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).

Jim B-G
+50  A: 

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a sql statement, the sql execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your sql joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your app's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).


Writing correct Sql is just as easy as writing Select *. So the real lazy person writes proper sql because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

Robert Paulson
BTW, this was an awesome answer. Thanks!
Chris Lively
Glad you liked it. I wouldn't have considered the angle of 'select *' as an anti-pattern if you hadn't asked if it was premature optimization. So good question as well. :-)
Robert Paulson
In your first section, point #5 should read "select * gives you all *columns* from all tables in the join". In your second section, points #2 and #5 are not necessarily true, and should not be listed as reasons to not use "select *".
jimmyorr
Robert Paulson
Awesome post! As a sidenote, I believe the only place where SELECT * is actually valid is inside EXISTS() subqueries (where it's optimized to SELECT 1 anyway) and in debug code within stored procedures which normally never runs unless a special param is passed.
Joe Pineda
I'd Argue that #3(Brittle code) isn't really true. Depending on the implementation, Select * might make it LESS brittle, but I don't see how it could be more so.
JohnFx
@JohnFx, I guess you define brittle differently. Brittle is normally defined as 'breaks easily'. Having unknown or hard-to-find dependencies because each piece of code will use different columns means I can't easily change anything at the data level without full regression .. which seems brittle.
Robert Paulson
I JohnFX is pointing out that while "select *" might make it harder to track down breakage, it actually makes it (slightly) less likely to happen in the first place. Hence less brittle. I agree, but (like JohnFX, I suspect) don't think that really counters all of the other valid arguments against select * in almost all situations.
mavnn
@mavnn, w.r.t. brittleness, I fear this is a devolving into a semantics issue on my choice of the word brittle. My last word is to say it makes little difference anyways. The only scenario is renamed / removed columns. You are just moving the break from when the sql is executed (explicit) versus breaking when the results are consumed. The way in which the query result is consumed can vary, and the code may or may not silently fail, but the sql execution engine will definitely fail with invalid sql. So did select * help you? IMO explicit failure closer to the DB for a DB issue is better. Thx
Robert Paulson
Dogs run pretty fast - I've never beaten one in a footrace.
Erik Forbes
+7  A: 

(1) In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.

(2) Even if you now need every column in the table, more could be added later which will be pulled down everytime you run the query and could hurt performance. It hurts performance because (A) you are pulling more data over the wire; and (B) Because you might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index.) rather than doing a lookup in the table itself

When TO use select *

When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if were writing an DB management app that needed to display the entire contents of the table (whatever they happened to be) you might use that approach.

JohnFx
Thank you! I was looking for someone to state that.
Kev
Another time to use `SELECT *` would be when you're doing test queries using the db client.
cdmckay
That seems like a strange exception given the context of the question. Other than saving some typing, what is the advantage of doing this for test queries?
JohnFx
A: 

I don't use SELECT * simply because it is nice to see and know what fields I am retrieving.

lkessler
+1  A: 

Generally bad to use 'select *' inside of views because you will be forced to recompile the view in the event of a table column change. Changing the underlying table columns of a view you will get an error for non-existant columns until you go back and recompile.

SomeMiscGuy
+1  A: 

Just to add one thing that no one else has mentioned. Select * returns all the columns, someone may add a column later that you don't necessarily want the users to be able to see such as who last updated the data or a timestamp or notes that only managers should see not all users, etc.

Further, when adding a column, the impact on existing code should be reviewed and considered to see if changes are needed based on what information is stored in the column. By using select * , that review will often be skipped because the developer will assume that nothing will break. And in fact nothing may explicitly appear to break but quereis may now start returning the wrong thing. Just because nothing explicitly breaks, doesn't mean that there should not have been changes to the queries.

HLGEM
+1  A: 

When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, whereever he is. Just a thought.

orbfish