views: 43

answers: 6

I am looking to improve the performance of a query which selects several columns from a table. I was wondering if limiting the number of columns selected would have any effect on the performance of the query.

+1  A: 

Limiting the number of columns has no measurable effect on the query. Almost universally, an entire row is fetched to cache. The projection happens last in the SQL pipeline.

The projection part of the processing must happen last (after GROUP BY, for instance) because it may involve creating aggregates. Also, JOIN, WHERE and ORDER BY processing may require many columns, often more columns than are finally returned in the result set. It's hardly worth adding an extra step to the query plan to project early in the hope of saving a little I/O.

Check your query plan documentation. There's no "project" node in the query plan. It's a small part of formulating the result set.

To get away from "whole row fetch", you have to go for a columnar ("Inverted") database.
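For what it's worth, you can see this with SQLite's planner (a rough sketch with made-up table and column names; other row-store engines behave similarly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable summary.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

narrow = plan("SELECT a FROM t WHERE b = 'x'")
wide = plan("SELECT a, b, c FROM t WHERE b = 'x'")
print(narrow)  # a full scan of t
print(wide)    # the identical full scan; projection changes nothing here
```

With no helpful index, the access path is the same full scan whether you project one column or all of them.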

S.Lott
+3  A: 

Reducing the number of columns would, I think, have only a very limited effect on the speed of the query itself, but a potentially larger effect on the transfer speed of the data: the less data you select, the less needs to be transferred over the wire to your application.
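A rough sketch of that transfer cost, using SQLite and invented names (it measures only the payload of the result set, not protocol overhead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)", [("x" * 1000,)] * 100)

def result_bytes(sql):
    # Rough size of the result set the client receives.
    return sum(sum(len(str(v)) for v in row) for row in conn.execute(sql))

narrow = result_bytes("SELECT id FROM t")          # just the keys
wide = result_bytes("SELECT id, payload FROM t")   # keys plus 1 KB per row
print(narrow, wide)  # the wide result is several hundred times larger
```

The server may read the whole row either way, but dropping columns shrinks what your application actually has to receive.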

tvanfosson
+3  A: 

I might be misunderstanding the question, but here goes anyway:

The absolute number of columns you select doesn't make a huge difference. However, which columns you select can make a significant difference depending on how the table is indexed.

If you are selecting only columns that are covered by the index, then the DB engine can use just the index for the query without ever fetching table data. If you use even one column that's not covered, though, it has to fetch the entire row (key lookup) and this will degrade performance significantly. Sometimes it will kill performance so much that the DB engine opts to do a full scan instead of even bothering with the index; it depends on the number of rows being selected.

So, if by removing columns you are able to turn this into a covering query, then yes, it can improve performance. Otherwise, probably not. Not noticeably anyway.

Quick example for SQL Server 2005+ - let's say this is your table:

ID int NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
Name varchar(50) NOT NULL,
Status tinyint NOT NULL

If we create this index:

CREATE INDEX IX_MyTable
ON MyTable (Name)

Then this query will be fast:

SELECT ID
FROM MyTable
WHERE Name = 'Aaron'

But this query will be slow(er):

SELECT ID, Name, Status
FROM MyTable
WHERE Name = 'Aaron'

If we change the index to a covering index, i.e.

CREATE INDEX IX_MyTable
ON MyTable (Name)
INCLUDE (Status)

Then the second query becomes fast again because the DB engine never needs to read the row.
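SQLite has no INCLUDE clause, but its planner reports covering-index use, so the same effect can be sketched there with a composite index standing in for the covered one (names follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE MyTable (
    ID     INTEGER PRIMARY KEY,   -- rowid alias, stored in every index
    Name   TEXT NOT NULL,
    Status INTEGER NOT NULL)""")
conn.execute("CREATE INDEX IX_MyTable ON MyTable (Name)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is a readable summary.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# ID is in every index entry, so this query never touches the table:
covered = plan("SELECT ID FROM MyTable WHERE Name = 'Aaron'")

# Status is not in the index, so each match needs an extra row lookup:
not_covered = plan("SELECT ID, Name, Status FROM MyTable WHERE Name = 'Aaron'")

# Widen the index so it covers the second query as well:
conn.execute("DROP INDEX IX_MyTable")
conn.execute("CREATE INDEX IX_MyTable ON MyTable (Name, Status)")
covered_again = plan("SELECT ID, Name, Status FROM MyTable WHERE Name = 'Aaron'")

print(covered)        # ... USING COVERING INDEX IX_MyTable ...
print(not_covered)    # ... USING INDEX IX_MyTable ... (no COVERING)
print(covered_again)  # ... USING COVERING INDEX IX_MyTable ...
```

Which columns you select, relative to the index, is what flips the plan between index-only access and per-row lookups.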

Aaronaught
+1 for showing the code and adding an index that demonstrates how INCLUDE works
SQLMenace
A: 

It can depend on the server you're dealing with (and, in the case of MySQL, the storage engine). Just for example, there's at least one MySQL storage engine that does column-wise storage instead of row-wise storage, and in this case more columns really can take more time.

The other major possibility would be if you had segmented your table so that some columns were stored on one server and other columns on another (aka vertical partitioning). In this case, retrieving more columns might involve retrieving data from different servers, and it's always possible that the load is imbalanced so different servers have different response times. Of course, you usually try to keep the load reasonably balanced, so that should be fairly unusual, but it's still possible (especially if, for example, one of the servers handles some other data whose usage varies independently of the rest).

Jerry Coffin
A: 

Yes. If your query can be covered by a non-clustered index, it will be faster, since all the data is already in the index and the base table (if you have a heap) or the clustered index never needs to be touched.

SQLMenace
A: 

To demonstrate tvanfosson's point that there is a "transfer" cost, I ran the following two statements against an MSSQL 2000 DB from Query Analyzer.

SELECT datalength(text) FROM syscomments

SELECT text FROM syscomments

Both queries returned 947 rows, but the first took 5 ms and the second 973 ms.

Also, because both queries touch the same column, I would not expect indexing to be a factor here.

Conrad Frix