views:

1250

answers:

18

I'm using MS SQL Server 2005. Is there a difference, to the SQL engine, between

SELECT * FROM MyTable;

and

SELECT ColA, ColB, ColC FROM MyTable;

When ColA, ColB, and ColC represent every column in the table?

If they are the same, is there a reason why you should use the 2nd one anyway? I have a project that's heavy on LINQ, and I'm not sure if the standard SELECT * it generates is a bad practice, or if I should always be a .Select() on it to specify which cols I want.

EDIT: Changed "When ColA, ColB, and ColC are all the columns to the table?" to "When ColA, ColB, and ColC represent every column in the table?" for clarity.

+2  A: 

When you select each field individually, it is more clear which fields are actually being selected.

Ikke
+32  A: 

Generally, it's better to be explicit, so Select col1, col2 from Table is better. The reason being that at some point, an extra column may be added to that table, and would cause unneeded data to be brought back from the query.

This isn't a hard and fast rule though.

swilliams
In many cases, it also causes dependent code to break. This is especially true in environments with lazy developers who do not use column lists on INSERT statements.
Pittsburgh DBA
+16  A: 

1) The second one is more explicit about which columns are returned. The value of the 2nd one then is how much you value explicitly knowing which columns come back.

2) This involves potentially less data being returned when there are more columns than the ones explicitly used as well.

3) If you change the table by adding a new column, the first query changes and the second does not. If you have code like "for all columns returned do ..." then the results change if you use the first, but not the 2nd.

Josh
+3  A: 

Some reasons not to use the first statement (select *) are:

  1. If you add some large fields (a BLOB column would be very bad) later to that table, you could suffer performance problems in the application
  2. If the query was a Join-Query with two or more table, some of the fields could have the same name. It would be better to assure that your field names are different.
  3. The purpose of the query is clearer with the second statement from an programming esthetics viewpoint
splattne
+7  A: 

You should specify an explicit column list. SELECT * will bring back more columns than you need creating more IO and network traffic, but more importantly it might require extra lookups even though a non-clustered covering index exists (On SQL Server).

Mitch Wheat
Yes! This is the best reason to be explicit--covering indexes.
Michael Haren
It's interesting that my reply posted at the same time contains more info than the accepted answer and yet has a huge diff in votes? What gives?
Mitch Wheat
Fastest gun in the west...
Josh
The accepted answer seems to satisfy more people's [mis]conceptions about the difference between * and col1, col2, col3.
ProfK
+1  A: 

A quick look at the query execution plan shows that the querys are the same.

The general rule of thumb is that you will want to limit your queries to only the fields that you need returned.

TGnat
+2  A: 

Its good for forward-compatiblity.

When you use

SELECT * FROM myTable

and in "myTable" are 3 columns. You get same results as

SELECT Column1, Column2, Column3 FROM myTable

But if you add new column in future, you get a diferent results.

Of course, if you change name one of existing column, in first case you get results and in the second case you get a error ( I think, this is correct behaviour of application ).

TcKs
I'm not sure that breaking when future changes occur is the definition of forward compatibility.
I think the case TcKs makes is that breaking is better than pretending to work (but not working).
Greg D
I agree, the app should break if you rename the column and access results by column name. Using * isn't any better in protecting yourself against a rename if you still access the results by column name instead of index.
Josh
@Greg D Yes, that's exactly, what I meant.
TcKs
+1  A: 

For LinqToSql, if you plan to modify those records later, you should pull the whole record into memory.

David B
+2  A: 

SELECT * is a bad practice in most places.

  • What if someone adds a 2gb BLOB column to that table?
  • What is someone adds really any column to that table?

It's a bug waiting to happen.

"SELECT *" would not return the 2 gigs, so that is irrelevant.It is the addition of columns impacting dependent code, along with the readability and explicitness that really matter.
Pittsburgh DBA
I agree mostly, except for the bug part. I find I've avoided bugs, by just getting the whole row and only having one place (Data access layer class) to handle that data. I usually handle possible issues there. the data model classes with the link to SQL are a great place to tweak stuff.
stephenbayer
@ P DBA: Did I say that it would return the 2 gigs? I was merely asking if the code would handle the change in the table as illustrated in the second bullet.@SB You might code around changes to the schema, most devs don't and things break. Why leave yourself open to the chance that it would?
+1  A: 

It depends on what you mean by "difference". There is the obvious syntax difference, but the real difference is one of performance.

When you say SELECT * FROM MyTable, you are telling the SQL query engine to return a data set with all of the columns from that table, while SELECT ColA, ColB, ColC FROM MyTable tells the query engine to return a data set with only ColA, ColB, and ColC from the table.

Say you have a table with 100 columns defined as CHAR[10]. SELECT * will return 100 columns * 10 bytes worth of data while SELECT ColA, ColB, ColC will return 3 columns * 10 bytes worth of data. This is a huge size difference in the amount of data that is being passed back across the wire.

Specifying the column list also makes it much clearer what columns you are interested in. The drawback is that if you add/remove a column from the table you need to ensure that the column list is updated as well, but I think that's a small price compared to the performance gain.

Scott Dorman
In this case, there is no difference in performance. I believe that you missed this key piece of information in the first part of the question:"When ColA, ColB, and ColC are all the columns to the table?"Since he wants all of the columns, there is no performance difference. Other issues? Yes.
Pittsburgh DBA
@Pittsburgh DBA: I saw that but read it differently...as ColA, ColB, and ColC are all columns in the table not that they were the full set of columns in the table. Either way, the information in my response is still relevant (but in this _specific_ case, it's a wash.)
Scott Dorman
A: 

selecting each column is better than just * because in case you add or delete a new row you HAVE to look at the code and take a look what you were doing with the retrieved data.
Also, it helps you understand your code better and allows you to use aliases as column names (in case you're performing a join of tables with a column sharing the name)

sebastian
+6  A: 

I'm going to get a lot of people upset with me, but especially if I'm adding columns later on, I usually like to use the SELECT * FROM table. I've been called lazy for this reason, because if I make any modifications to my tables, I'd like not to track down all the stored procs that use that table, and just change it in the data access layer classes in my application. There are cases in which I will specify the columns, but in the case where I'm trying to get a complete "object" from the database, I'd rather just use the "*". And, yes, I know people will be hating me for this, but it has allowed me to be quicker and less bug free while adding fields to my applications.

stephenbayer
I don't think you have to be too defensive. It's a valid answer to a stylistically subjective question. :)
Greg D
"Quicker" on the development side, sure. However, you may be wasting I/O on nearly every one of your SELECT statements.
Pittsburgh DBA
I'm with you, and I have yet to come across a performance problem that can be traced back to "select *" vs. "select column1, column2...". I'm sure it happens, but I'd rather deal with that exception when it happens than defensively code against it when I don't need to 99% of the time.
JasonS
I am with you, too. In certain circumstances, asterisk obviously does make code cheaper to maintain (esp. if you have a decent IDE and/or good coding conventions), which more often than not is much more important than performance-related issues.
Yarik
+2  A: 

If your code relies on certain columns being in a certain order, you need to list the columns. If not, it doesn't really make a difference if you use "*" or write the column names out in the select statement.

An example is if you insert a column into a table.

Take this table: ColA ColB ColC

You might have a query:

SELECT *
FROM myTable

Then the code might be:

rs = executeSql("SELECT * FROM myTable")
while (rs.read())
    Print "Col A" + rs[0]
    Print "Col B" + rs[1]
    Print "Col C" + rs[2]

If you add a column between ColB and ColC, the query wouldn't return what you're looking for.

Jim
+1  A: 
SELECT * FROM MyTable

select * is dependent on the column order in the schema so if you refer to the result set by the index # of the collection you will be looking at the wrong column.

SELECT Col1,Col2,Col3 FROM MyTable

this query will give you a collection that stays the same over time, but how often are you changing the column order anyways?

Scott Cowan
+3  A: 

A couple things:

  • A good number of people have posted here recommending against using *, and given several good reasons for those answers. Out of 10 other responses so far only one doesn't recommend listing columns.
  • People often make exceptions to that rule when posting to help sites like StackOverflow, because they often don't know what columns are in your table or are important to your query. For that reason, you'll see a lot of code here and elsewhere on the web that uses the * syntax, even though the poster would tend to avoid it in his own code.
Joel Coehoorn
+4  A: 

The two sides of the issue are this: Explicit column specification gives better performance as new columns are added, but * specification requires no maintenance as new columns are added.

Which to use depends on what kind of columns you expect to add to the table, and what the point of the query is.

If you are using your table as a backing store for an object (which seems likely in the LINQ-to-SQL case), you probably want any new columns added to this table to be included in your object, and vice-versa. You're maintaining them in parallel. For this reason, for this case, * specification in the SELECT clause is right. Explicit specification would give you an extra bit of maintenance every time something changed, and a bug if you didn't update the field list correctly.

If the query is going to return a lot of records, you are probably better off with explicit specification for performance reasons.

If both things are true, consider having two different queries.

Jeff Paulsen
A: 

An example as to why you never (imho) should use SELECT *. This does not relate to MSSQL, but rather MySQL. Versions prior to 5.0.12 returned columns from certain types of joins in a none-standard manner. Of course, if your queries defines which columns you want and in which order you have no problem. Imagine the fun if they don't.

(One possible exception: Your query SELECTs from just one table and you identify columns in your programming language of choice by name rather than position.)

Berserk
A: 

Using "SELECT *" optimizes for programmer typing. That's it. That's the only advantage.

Andy Lester