I've got an application that generates data for reports that look like:

                   | age < 30  | age >= 30 |   asian   | hispanic
-----------------------------------------------------------------
clients in prog A  |           |           |           |
-----------------------------------------------------------------
clients in prog B  |           |           |           |
-----------------------------------------------------------------
number clients     |           |           |           |
-----------------------------------------------------------------
number children    |           |           |           |

The queries are sometimes very long, and I'd like to optimize them.

I don't have permissions on the server to run the query analyzer (and I read that it's often better not to use its suggestions). The longest sprocs take ~35 seconds to execute.

From what I've read, the things to avoid when optimizing queries are:

  • Select *
  • exists
  • distinct
  • cursors
  • having

I have a few questions about the task at hand:

  • how much of a difference am I looking at by changing Select * into Select colA, colB ... ? Is it really worth the trouble?
  • how can I optimize if exists( ... )? Is if( Select Count(query ) > 0 ) a good optimization?
  • If I am really going to return all of the columns in a table, is it okay to use Select * ?

I don't want to post these queries because they are so long and terrible, but what other suggestions might you be able to offer? I'm trying to use re-usable functions and temporary tables wherever possible to ease the strain both on my brain and on the server.

+1  A: 

Can you post the query?

Here are just some pointers, since you are not showing any code.

In general, EXISTS is faster than COUNT(*) because EXISTS returns the moment it finds a match, whereas COUNT(*) continues until it has reached the end of the result set.

SELECT col1, col2 is better than SELECT * because if all of the requested columns are in a nonclustered index, then the base table/clustered index won't even be touched. This is even more true now that indexes can have included columns. You will also use less bandwidth if you return only the columns that you need.
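To make the covering-index point concrete, here is a minimal T-SQL sketch. The table, column and index names are made up for illustration (nothing here comes from the asker's schema), and whether the optimizer actually picks the index depends on the data and statistics.

```sql
-- Hypothetical table; all names are illustrative assumptions.
CREATE TABLE #Clients (
    ClientId   INT          NOT NULL PRIMARY KEY,  -- clustered index by default
    Program    CHAR(1)      NOT NULL,
    Age        INT          NOT NULL,
    Ethnicity  VARCHAR(20)  NOT NULL,
    Notes      VARCHAR(MAX) NULL                   -- wide column the report never reads
);

-- Nonclustered index keyed on Program, with Age and Ethnicity
-- stored at the leaf level as included columns.
CREATE NONCLUSTERED INDEX IX_Clients_Program
    ON #Clients (Program)
    INCLUDE (Age, Ethnicity);

-- Every column referenced here is in IX_Clients_Program,
-- so the query can be answered from the index alone.
SELECT Program, Age, Ethnicity
FROM #Clients
WHERE Program = 'A';

-- SELECT * also asks for Notes, which is not in the index,
-- so the clustered index has to be read as well.
SELECT *
FROM #Clients
WHERE Program = 'A';

DROP TABLE #Clients;
```

Comparing the actual execution plans of the two SELECT statements should show the difference, if you have permission to view plans.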

If I am really going to return all of the columns in a table, is it okay to use Select * ?

What if someone adds 4 columns to the table later on? Now you will be returning those 4 columns as well.

SQLMenace
+1  A: 

1) how much of a difference am I looking at by changing Select * into Select colA, colB ... ? Is it really worth the trouble?
That can make quite a big difference. It's good practice in general to specify the fields you want and ONLY those fields. For example, if you do a SELECT * to return 50 fields when you only need 2 of them, and those 2 fields are included in a suitable index, then with an explicit column list all the data can be provided from the index without having to look up the rest of the row from the data pages. So this is much better.

2) how can I optimize if exists( ... )? Is if( Select Count(query ) > 0 ) a good optimization?
No, SELECT COUNT() is worse. EXISTS is the most performant way to do this kind of thing, as it is optimised to stop checking as soon as it finds the first matching record, whereas COUNT() will keep going until it has found them all, which is unnecessary. I wouldn't class EXISTS in the bad camp with cursors at all, to be honest.
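A minimal sketch of the two forms side by side, using a made-up temp table (the names and rows are assumptions for illustration only). Some optimizers may rewrite the COUNT form into an existence check, but that is not something to rely on.

```sql
-- Hypothetical data; names are illustrative assumptions.
CREATE TABLE #Clients (
    ClientId INT     NOT NULL PRIMARY KEY,
    Program  CHAR(1) NOT NULL
);
INSERT INTO #Clients (ClientId, Program) VALUES (1, 'A');
INSERT INTO #Clients (ClientId, Program) VALUES (2, 'A');
INSERT INTO #Clients (ClientId, Program) VALUES (3, 'B');

-- EXISTS only needs to find one matching row before it can answer.
IF EXISTS (SELECT 1 FROM #Clients WHERE Program = 'A')
    PRINT 'Program A has clients';

-- As written, this asks for the full count of matching rows
-- and only then compares it to zero.
IF (SELECT COUNT(*) FROM #Clients WHERE Program = 'A') > 0
    PRINT 'Program A has clients';

DROP TABLE #Clients;
```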

3) If I am really going to return all of the columns in a table, is it okay to use Select *?
Well, if you truly want them all, then it doesn't matter as much. That assumes that if more columns are added in future, you want those returned as well, which could break existing code when the result set suddenly changes.

AdaTheDev
As for number 3: if you have any joins at all, SELECT * returns repeated columns (the join fields) and sends extra data across the network for no reason at all. SELECT * should not be used on production systems.
HLGEM
A: 

You will not get much of a benefit from changing from SELECT * to SELECT column1, column2, .... However, you should do it because it is good coding practice. If someone changes the column order or the number of columns in the future, it could cause your reports to break, depending on how they are built.

How about another approach? If you are able to add non-clustered indexes on your tables, I would suggest looking into that. Specifically, look at your EXISTS sub-queries and see whether the columns in the WHERE clause have an index on them. If they do not, then you will be doing a full table scan every time the EXISTS returns false, and possibly up to a full table scan even when it returns true (it depends on where the matching row sits in the table). The non-clustered indexes will allow the sub-queries to quickly find any results in your table. Sometimes you have to use inefficient queries, but if you can optimize your table structure through indexes then they have much less of an impact on your speed.

Also, for your EXISTS sub-queries, is it ever the case that you will have at most 1 result? If so, then you might want to try doing a left join to the table. That probably won't help if you do not have an index on the column sets on both the left and right of your join, but if you do, it should be pretty helpful, as you would basically scan your right-hand table 1 time instead of once per row.
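As a rough sketch of that idea: index the column the EXISTS sub-query filters on, so each probe can be an index seek rather than a scan. The tables and names below are hypothetical.

```sql
-- Hypothetical tables; names are illustrative assumptions.
CREATE TABLE #Clients (
    ClientId INT NOT NULL PRIMARY KEY,
    Age      INT NOT NULL
);
CREATE TABLE #Children (
    ChildId  INT NOT NULL PRIMARY KEY,
    ClientId INT NOT NULL
);

-- Index on the column used in the sub-query's WHERE clause.
CREATE NONCLUSTERED INDEX IX_Children_ClientId
    ON #Children (ClientId);

-- For each client, the sub-query can look up ClientId in the index
-- instead of reading all of #Children.
SELECT c.ClientId
FROM #Clients AS c
WHERE EXISTS (SELECT 1
              FROM #Children AS ch
              WHERE ch.ClientId = c.ClientId);

DROP TABLE #Children;
DROP TABLE #Clients;
```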
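A sketch of the left-join variant, assuming a hypothetical #Enrollment table where the primary key guarantees at most one row per client (so the join cannot duplicate client rows):

```sql
-- Hypothetical tables; names are illustrative assumptions.
CREATE TABLE #Clients (
    ClientId INT NOT NULL PRIMARY KEY,
    Age      INT NOT NULL
);
CREATE TABLE #Enrollment (
    ClientId INT     NOT NULL PRIMARY KEY,  -- at most one row per client
    Program  CHAR(1) NOT NULL
);

-- One pass over both tables; a NULL on the right side means "no match",
-- which is what NOT EXISTS would have reported.
SELECT c.ClientId,
       c.Age,
       CASE WHEN e.ClientId IS NULL THEN 0 ELSE 1 END AS IsEnrolled
FROM #Clients AS c
LEFT JOIN #Enrollment AS e
       ON e.ClientId = c.ClientId;

DROP TABLE #Enrollment;
DROP TABLE #Clients;
```

If the right-hand table could hold several rows per client, the join would return one row per match, so this rewrite only fits the at-most-one case.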

RandomBen
+1  A: 

I'm trying to use re-usable functions and temporary tables wherever possible to ease the strain both on my brain and on the server

Assuming you mean user-defined functions, they're not always good for performance. Seeking to ease the strain on your brain can come at the expense of increasing the strain on the server. Ones that are purely scalar (i.e. they take a value, manipulate it and return another value) should be fine, but ones that scan tables can usually run quicker when their logic is used in the stored procedure directly. As an example, a function that scans Table X for occurrences of value Y and returns a count will run slower (because of the repeated calls to it) than a SQL statement containing a join that can do every value's count in one go.
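Here is a hedged sketch of that comparison. The schema and the function dbo.ChildCount are invented for illustration; permanent tables are used because T-SQL functions cannot read temp tables, and GO is the batch separator used by SSMS/sqlcmd.

```sql
-- Hypothetical schema; all names are illustrative assumptions.
CREATE TABLE dbo.Clients  (ClientId INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Children (ChildId  INT NOT NULL PRIMARY KEY,
                           ClientId INT NOT NULL);
GO

-- Scalar function that queries dbo.Children for a single client.
CREATE FUNCTION dbo.ChildCount (@ClientId INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*) FROM dbo.Children WHERE ClientId = @ClientId);
END;
GO

-- Function version: the function is typically invoked once per client row.
SELECT c.ClientId,
       dbo.ChildCount(c.ClientId) AS NumChildren
FROM dbo.Clients AS c;

-- Set-based version: one join and GROUP BY computes every client's count.
-- COUNT(ch.ChildId) ignores the NULLs from unmatched rows, giving 0.
SELECT c.ClientId,
       COUNT(ch.ChildId) AS NumChildren
FROM dbo.Clients AS c
LEFT JOIN dbo.Children AS ch
       ON ch.ClientId = c.ClientId
GROUP BY c.ClientId;
```

Both queries return the same rows; the difference is in how much work the server does to produce them.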

You should also check if there are indices on the relevant source tables and whether they are being used.

CodeByMoonlight
A: 

For counting, an efficient form is SELECT COUNT(1) FROM table (or 0, or 123, or any other simple constant); SQL Server treats it the same as COUNT(*).

You should change to SELECT field1, field2, ... for manageability, too. SELECT * is slower, and later you may run into problems when code, views or tables (or several of them) change.

Dercsár