views:

873

answers:

18

If I have two queries

SELECT Id, Forename, Surname
FROM Person
WHERE PersonName Like(‘%frank%’)

And

SELECT *
FROM Person
WHERE PersonName Like(‘%frank%’)

Which query will run faster? Is the where clause / table joining the biggest factor, or the number of columns returned?

I’m asking because I’m building a series of objects that map to database tables. Typically each object will have at least these three methods:

Select – Selects everything

List – Selects enough that is can be used to populate a dropdown list

Search – Selects everything that is viewable in the results, typically about 6 or so columns.

If each query returned exactly the same set of columns the maintenance and testing of the code should be more trivial. The database is unlikely to get above 50,000 rows in any given table, so if the difference in performance is small then I’ll take the saving in development time. If the performance is going to drop through the floor, then I’ll work at in from a different angle.

So, for ease of development, is SELECT * sensible, or is it naive?

+21  A: 

You better avoid SELECT *

  • It leads to confusion when you change the table layout.
  • It selects unneeded columns, and your data packets get larger.
  • The columns can get duplicate names, which is also not good for some applications
  • If all the columns you need are covered by an index, SELECT columns will only use this index, while SELECT * will need to visit the table records to get the values you don't need. Also bad for performance.
Quassnoi
I agree with most of your points but "The columns can get duplicate names" - that's news to me. How so?
paxdiablo
@Pax: if you have a self-join inside the query, or join two tables that have some columns with same names.
Quassnoi
@Pax Select * From Person p, Contact c Where c.PersonID = p.PersonID; this will normally have [Name] in the 2 tables for example
balexandre
I stand corrected - thanks for the elucidation. I have to admit I didn't consider multi-table queries. +1 for impugning your character :-)
paxdiablo
I would add that if some of the excluded columns(but listed when using SELECT *) are of type TEXT or similar it will become extremely slow. Additionally using SELECT * sometimes may lead to some unexpected results like here http://stackoverflow.com/questions/321468/select-from-table-vs-select-cola-colb-etc-from-table-interesting-behaviour-in-s (I guess that could be one of the examples of "It leads to confusion when you change the table layout.")
kristof
A: 

Sure. Better name the columns you want to retrieve.

Luixv
A: 

If person only has Id, Forename, and Surname, the queries should be equivalent. However, the query time will grow proportionately to the number of column (really amount of data) returned.

Also, if query will only ever need those three columns, you should only ask for those three. If you SELECT * and you change your schema later, you're basically just adding extra processing to all of your queries with not real added benefit.

Jacob Adams
A: 

I would visit this question on why using the "Select * " construct is not preferred.

In my experience selecting 3 columns versus select * in a 3 column table might not have a noticeable impact performance wise but as tables get larger and wider you will notice a performance difference.

brendan
A: 

Generally, in any situation, you want to stay away from using

SELECT * FROM TABLE

in your code. Doing so can lead to several issues, only one of which is performance. Two others I can think of off the top of my head are resource utilization (if you're selecting columns you don't need, or somebody adds columns later...you're bringing back data and wasting memory) and code readability (if somebody sees SELECT * FROM in your code...they're not necessarily going to know which columns are actually being used in your application).

Just a couple of things to think about...but the best practice is NOT to use it.

Justin Niessner
A: 

Yes it does. Basically:

  • More data has to be transfered from your database server
  • The database server has to fetch more data

You shouldn't use select *

Razzie
A: 

In addition to the other answers, consider that SELECT * will return data from all tables in the query. Start adding other tables through JOINs, and you'll start seeing things you don't want to see.

I believe I've also seen cases where SELECT * requires data actually be fetched from a joined table, as opposed to only using the indexes on that table to help narrow down the overall result set. I can't think of an example of that, though.

John Saunders
+1  A: 

If remember correctly from college (and its been awhile), selecting * is not prefered, but not that bad -- until you start joining. When you get into the relational alegbra of creating the joined tuples, every column adds to time, so I would definately avoid it if possible.

dsrekab
+4  A: 

SELECT * is usually never a good idea. It may not slow down your DBMS fetch a lot but it will probably result in more data being transmitted over the network than you need.

However, that's likely to be swamped into insignificance by the use of the LIKE '%frank%' clause which is basically non-indexable and will result in a full table scan.

You might want to consider cleaning up the data as it enters the database since that will almost certainly make subsequent queries run much faster.

If you're after frank, then make sure it's stored as frank and use:

select x,y,z from table where name = 'frank'

If you want to get franklin as well, use:

select x,y,z from table where name like 'frank%'

Both of these will be able to use an index on the name column, "%frank%" will not.

paxdiablo
A: 

There are multiple dimensions to this. For once the * will make your code more fragile. When in later versions you change table layouts code that relies on column order might break - or might not but read or modify the wrong columns if the data types still match which can be a really nasty problem!

Moreover if you always request all columns you will require more memory on your database client and the database server for the unneeded columns. This can be really expensive if the table contains long character fields, very many fields and/or BLOBs. Selecting unnecessary columns will also thrash the server's cache by flooding it with superflous contents that is never looked at by a client.

So in general you should not use it. Most object relational mapping tools generate SQL that contains all column names anyway, so during development this is probably not an issue anyway. I personally only tend to use * for quick ad-hoc queries that I have to type manually.

Daniel Schneller
+1  A: 

The number of columns in the table does not affect the performance of your query. The number of columns operated upon in the query will.

Note the following example from the Oracle concepts manual:

Row Format and Size Oracle stores each row of a database table containing data for less than 256 columns as one or more row pieces. If an entire row can be inserted into a single data block, then Oracle stores the row as one row piece. However, if all of a row’s data cannot be inserted into a single data block or if an update to an existing row causes the row to outgrow its data block, then Oracle stores the row using multiple row pieces. A data block usually contains only one row piece for each row. When Oracle must store a row in more than one row piece, it is chained across multiple blocks.

When a table has more than 255 columns, rows that have data after the 255th column are likely to be chained within the same block. This is called intra-block chaining. A chained row’s pieces are chained together using the rowids of the pieces. With intra-block chaining, users receive all the data in the same block. If the row fits in the block, users do not see an effect in I/O performance, because no extra I/O operation is required to retrieve the rest of the row.

HOWEVER: If there are 400 columns, I would bet that most rows will not fit in one block and hence you will see a lot more 'db file sequential read' than normally required. As well, I remember that Steve Adams (or someone long ago) mentioning that there is an additional cost for accessing a column "further down the list" - sorry don't have that link.

hypoxide
Does this apply to MS Sql as well?
ilivewithian
+2  A: 

Regardless of performance issues, it is good practice to always enumerate all fields in your queries.

  • What if you decide to add a TEXT or BLOB column in the future that is used for a particular query? Your SELECT * will return the additional data whether you need it or not.
  • What if you rename a column? Your SELECT * will always work, but the relying code will be broken.
Ozgur Ozcitak
A: 

This is the correct way and the most optimal. The reason is that your only gathering the data needed so it takes up the correct space (What you need) in storing the data before you get your results.

SELECT Id, Forename, Surname
FROM Person
WHERE PersonName Like(‘%frank%’)

This is incorrect as it takes up unused fields which takes up more space to run your query which slows down your results. Even if you get lucky and use all the fields in your query it's best to list them individually. This will clarify the query and what data is to be returned to any other developer who might need to modify the query in the future.

SELECT *
FROM Person
WHERE PersonName Like(‘%frank%’)
Phill Pafford
+1  A: 

For small projects, you can usually get away with select *. It's "right" to not do that, though. You won't notice any appreciable speed difference for one table in a non-index query... the only thing you're appreciably doing is using more bandwidth for columns you don't read.

That said, you will notice a difference in index-only queries where you're hitting the full table when you only needed to hit the index. This will especially crop up when you're doing joins.

Select * does have uses though, and if you use it properly (say, in combination with a cache, making sure it's select table.*, and addressing results by column name) you can reduce queries made by your application.

Autocracy
+3  A: 

I'm going to go against the flow here and say you should go with the select *. I think that premature optimization is the root of a lot of problems, and you may well find that it doesn't affect your performance when you get to real utilization. Of course, by the book it is slower, it must be, but that doesn't mean the difference is important in practice.

Something to be aware of, though, is that some SQL engines (MS-SQL for sure) will cache the select *, so if you are using a prepared statement, or a view or stored procedure that has it, and change the table schema, it won't pick up on the change unless the view or sp is recompiled, so that is a good reason to avoid doing it if you aren't running these queries dynamically.

And of course, this varies by database engine, so a little load testing would be in order to make sure the hit isn't obviously large.

Yishai
A: 

the only time i use "select * " is not event really a "select *"

specifically:

select count(*) from table

is not that same as

select count(ID) from table

the first returns the number of rows in the table
but the second returns the number of rows with a NOT NULL ID value.

a subtle distinction but worth remembering.

ShoeLace
Another acceptable usage of SELECT * is in the subquery of an EXISTS clause.
onedaywhen
A: 

SELECT * will be slower since it has to transfer more data. Also because of some other reasons already mentioned. It really becomes a problem when joining tables since you start adding many more columns, when really all you want to do is join so you can filter.

If you really want to use * , specify the table you want all the columns from, like SELECT Person.* FROM Person...

That will narrow down the amount of data returned and makes it a little more readable.

Brent Baisley
A: 

Let me play devils advocate and suggest a scenario where SELECT * is a better choice. Suppose you are creating a user interface where you take the results of the dataset and display it in some form of table or grid. You could build the columns in the UI to match the columns in the dataset and do the SELECT * FROM MyView.

By using a View in the database you have complete control over what columns are returned by the query and the UI can by dynamic enough to display all of the columns. Changes to the view would be reflected immediately in the UI without recompiling and re0 Obviously I would suggest following the previous advice and specify all of the columns in the view definition.

Just thought I would add that as sometimes people get dogmatic about following certain rules and forget that context matters.

Darrel Miller