views: 1470

answers: 13

I've always taken the approach of first deploying the database with a minimal set of indexes and then adding/changing indexes as performance dictates.

This approach works reasonably well. However, it still doesn't tell me where I could improve performance. It only tells me where performance is so bad that users complain about it.

Currently, I'm in the process of refactoring database objects on a lot of our applications.

So should I not bother to look for performance improvements since "premature optimization is the root of all evil"?

When refactoring application code, the developer is constantly looking for ways to improve the code quality. Is there a way to constantly be looking for improvements in database performance as well? If so, what tools and techniques have you found to be most helpful?

I've briefly played around with the "Database Engine Tuning Advisor" but didn't find it to be helpful at all. Maybe I just need more experience interpreting the results.

+1  A: 

Profile your queries - not the obvious ones, but the complex ones that access several different tables, views, etc., and/or the ones that return many rows from different tables.

That will tell you exactly where you should focus.

BlackTigerX
A: 

It seems that you're talking about MS SQL.

Start the Profiler and record the most common queries you run on the database. Then run those queries with the Execution Plan turned on and you will see what (if anything) is slowing your queries down. You could then go and optimize the queries or add more indexes on your fields.
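A minimal sketch of that second step (the table and parameter names are made up - substitute a query captured by your trace). Run it with "Include Actual Execution Plan" enabled in Management Studio, and the statistics output shows where the reads and time go:

    -- dbo.Orders and @CustomerId are hypothetical stand-ins for one of your captured queries.
    SET STATISTICS IO ON;    -- reports logical/physical reads per table
    SET STATISTICS TIME ON;  -- reports parse/compile and execution times

    DECLARE @CustomerId int;
    SET @CustomerId = 42;

    SELECT OrderId, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;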

SQL Server Books Online will give you a good overview of both the profiling and query analysis functionality.

Ilya Kochetov
+1  A: 

Profiling is key, but when using a profiling set you MUST be sure that it is an accurate, representative set of test data, otherwise the tuning tools will not be able to give you an accurate picture of what is needed.

Also, the management objects with fragmentation and usage reporting in SQL Server 2005 are very helpful!
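For what it's worth, a sketch of pulling that fragmentation and usage information from the SQL Server 2005 dynamic management views (run it in the database you are checking; the index names come from joining to sys.indexes):

    -- Fragmentation per index (LIMITED is the cheapest scan mode).
    SELECT  OBJECT_NAME(ps.object_id)  AS table_name,
            i.name                     AS index_name,
            ps.avg_fragmentation_in_percent,
            ps.page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN    sys.indexes AS i
              ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    ORDER BY ps.avg_fragmentation_in_percent DESC;

    -- How often each index has been read vs. written since the last service restart;
    -- rarely-read, frequently-updated indexes float to the top.
    SELECT  OBJECT_NAME(us.object_id)  AS table_name,
            i.name                     AS index_name,
            us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM    sys.dm_db_index_usage_stats AS us
    JOIN    sys.indexes AS i
              ON i.object_id = us.object_id AND i.index_id = us.index_id
    WHERE   us.database_id = DB_ID()
    ORDER BY us.user_seeks + us.user_scans + us.user_lookups;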

Mitchel Sellers
+1  A: 

After you profile, put the queries you see as troublesome into SQL Query Analyzer and display the execution plan. Identify portions of the queries that are performing costly table scans and re-index those tables to minimize this cost.
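As a rough sketch (the table, column, and index names here are invented): if the plan shows a costly scan driven by a filter on a particular column, a nonclustered index on that column - optionally covering the selected columns with INCLUDE on SQL Server 2005 and later - usually turns the scan into a seek.

    -- Hypothetical example: the plan shows a table scan on dbo.OrderDetail
    -- for queries filtering on ProductId and returning Quantity and UnitPrice.
    CREATE NONCLUSTERED INDEX IX_OrderDetail_ProductId
        ON dbo.OrderDetail (ProductId)
        INCLUDE (Quantity, UnitPrice);  -- covers the query so no bookmark lookups are needed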

Try these references:

Optimizing SQL
How to Optimize Queries

Arthur Miller
+4  A: 

SQL Server Execution Plan!!! Go here: http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/

MarlonRibunal
+6  A: 

My approach is to gather commands against the server or database into a table using SQL Server Profiler. Once you have that, you can query based on the max and avg execution times, max and avg cpu times, and (also very important) the number of times that the query was run.

Since I try to put all database access code in stored procedures it's easy for me to break out queries. If you use inline SQL it might be harder, since a change to a value in the query would make it look like a different query. You can try to work around this using the LIKE operator to put the same types of queries into the same buckets for calculating the aggregates (max, avg, count).

Once you have a "top 10" list of potential problems you can start looking at them individually to see if either the query can be reworked, an index might help, or making a minor architecture change is in order. To come up with the top 10, try looking at the data in different ways: avg * count for total cost during the period, max for worst offender, just plain avg, etc.
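As a sketch of that aggregation (assuming the trace was saved to a table called dbo.TraceResults here - the name is arbitrary, but TextData, Duration, and CPU are standard trace columns; Duration is reported in microseconds by SQL Server 2005):

    SELECT TOP 10
        CONVERT(nvarchar(200), TextData)  AS query_bucket,  -- crude bucketing of similar statements
        COUNT(*)                          AS executions,
        AVG(Duration)                     AS avg_duration,
        MAX(Duration)                     AS max_duration,
        AVG(CPU)                          AS avg_cpu,
        COUNT(*) * AVG(Duration)          AS total_cost     -- the "avg * count" measure above
    FROM dbo.TraceResults
    WHERE TextData IS NOT NULL
    GROUP BY CONVERT(nvarchar(200), TextData)
    ORDER BY COUNT(*) * AVG(Duration) DESC;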

Finally, be sure to monitor over different time periods if necessary. The database usage might be different in the morning when everyone is getting in and running their daily reports than it is at midday when users are entering new data. You may also decide that even though some nightly process takes longer than any other query it doesn't matter since it's run during off hours.

Good luck!

Tom H.
A: 

You might want to check the internal and external fragmentation of the current indexes and either drop and re-create them or reorganize them.
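A minimal sketch with an invented index name, using the SQL Server 2005 syntax (the 10%/30% fragmentation thresholds often quoted for choosing between the two are rules of thumb, not hard limits):

    -- Light-weight, online defragmentation of the leaf level:
    ALTER INDEX IX_OrderDetail_ProductId ON dbo.OrderDetail REORGANIZE;

    -- ...or a full rebuild, the modern equivalent of dropping and re-creating the index:
    -- ALTER INDEX IX_OrderDetail_ProductId ON dbo.OrderDetail REBUILD;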

Charles Graham
A: 

Make sure you are profiling using production volumes - in terms of number of rows and load. The queries and their plans behave differently under different load/volume scenarios.

Simon Munro
+1  A: 

Of course you have to profile your queries and look at the execution plan. But the two main things that come up over and over again are: filter out as much as you can as soon as you can, and try to avoid cursors.

I saw an application where someone downloaded an entire database table of events to the client and then went through each row one by one, filtering based on some criteria. There was a HUGE performance increase from passing the filter criteria to the database and having the query apply the criteria in a WHERE clause. This is obvious to people who work with databases, but I have seen similar things crop up. Also, some people have queries that populate a bunch of temp tables full of rows that they don't need, which are then eliminated in a final join of the temp tables. Basically, if you do the eliminating in the queries that populate the temp tables, then there is less data for the rest of the query and the whole query runs faster.
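A trivial sketch of the difference (all names invented) - the filtered version only sends matching rows across the network and lets the optimizer use an index on the filter columns:

    -- Anti-pattern: pull everything and filter on the client.
    SELECT EventId, EventDate, EventType, Payload FROM dbo.Events;

    -- Better: let the database apply the criteria.
    DECLARE @EventType int, @StartDate datetime;
    SET @EventType = 3;
    SET @StartDate = '20080101';

    SELECT EventId, EventDate, EventType, Payload
    FROM dbo.Events
    WHERE EventType = @EventType
      AND EventDate >= @StartDate;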

Cursors are obvious. If you have a million rows and go row by row then it will take forever. Doing some tests, if you connect to a database even with a "slow" dynamic language like Perl and perform some row by row operation on a dataset, the speed will still be much greater than a cursor in the database. Do it with something like Java/C/C++ and the speed difference is even bigger. If you can find/eliminate a cursor in the database code, it will run much faster... If you must use a cursor, rewriting that part in any programming language and getting it out of the database will probably yield huge performance increases.

One more note on cursors: beware code like SELECT @col1 = col1, @col2 = col2, @col3 = col3 WHERE id = @currentid inside a loop that goes through IDs and then executes statements on each column. Basically this is a cursor as well. Not only that, but using real cursors is often faster than this, especially STATIC and FORWARD_ONLY ones. If you can change the operation to be set-based it will be much faster. That being said, cursors have their place for some things, but from a performance perspective there is a penalty to using them over set-based approaches.
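A sketch of that pattern and its set-based replacement (table and column names are made up):

    -- Row-by-row "hidden cursor": one SELECT and one UPDATE per id.
    DECLARE @CurrentId int, @Amount money;
    SELECT @CurrentId = MIN(OrderId) FROM dbo.Orders;
    WHILE @CurrentId IS NOT NULL
    BEGIN
        SELECT @Amount = TotalDue FROM dbo.Orders WHERE OrderId = @CurrentId;
        UPDATE dbo.Orders SET TotalDue = @Amount * 1.1 WHERE OrderId = @CurrentId;
        SELECT @CurrentId = MIN(OrderId) FROM dbo.Orders WHERE OrderId > @CurrentId;
    END

    -- Set-based equivalent: one statement, and the engine handles the looping internally.
    UPDATE dbo.Orders SET TotalDue = TotalDue * 1.1;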

Also be wary of the execution plan. Sometimes it estimates operations that take seconds to be very expensive and operations that take minutes to be very cheap. When viewing an execution plan, make sure to double-check everything, perhaps by inserting some SELECT 'At this area', GETDATE() statements into your code.
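A sketch of those checkpoints - bracketing the suspect step with timestamps gives a wall-clock number to compare against what the plan's cost estimates suggest:

    DECLARE @t0 datetime;
    SET @t0 = GETDATE();

    -- ... run the step you suspect here ...

    SELECT 'suspect step' AS checkpoint_name, DATEDIFF(ms, @t0, GETDATE()) AS elapsed_ms;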

Cervo
+4  A: 

"premature optimization is the root of all evil"

In terms of database programming, I think this quote is nonsense. It is extremely expensive to re-write your whole application because your developers don't care to write efficient code the first time. All T-SQL code should be thought of in terms of how it will affect database performance second (data integrity is, of course, first). Performance should trump everything except data integrity.

Yes, there are optimizations you shouldn't do until you have issues, but some things should be done as a matter of course and not fixed later. Once you understand how bad code affects efficiency, it takes no more time to write code that has a good chance of being efficient than to write code that will not be. Cervo's discussion of cursor code is one example. Set-based actions are almost always much faster than cursor solutions, so cursors should never be written initially when a set-based solution will do. It almost always takes me less time to write a set-based solution than it would to write a cursor, but the only way to get that way is to never write cursors.

And there is no reason to ever use SELECT * instead of specifying your field names. In MS SQL you can drag those names over from the Object Explorer, so you can't tell me it's too hard to do that. By specifying only the fields you actually need, you save network resources, database server resources, and web server resources. So why should a programmer ever take the lazy option of SELECT * and worry about optimizing later?

The same goes for indexes. You say you use a minimal set of indexes. Depending on how you define minimal, that could be OK, but it is critical to have indexes on all foreign keys, and I wouldn't want to push a database that didn't have indexes on the few fields that show up most often in WHERE clauses. If your users are outside clients and not internal, they won't complain about how slow your site is - they will go elsewhere. It only makes business sense to plan for efficient database access from the start.
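For example (invented names), indexing a foreign key column as a matter of course costs one line and pays off in every join back to the parent table and in every DRI check:

    -- OrderHeader.CustomerId references Customers(CustomerId); index it up front.
    CREATE NONCLUSTERED INDEX IX_OrderHeader_CustomerId
        ON dbo.OrderHeader (CustomerId);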

One of my main concerns about failing to consider efficiency from the beginning is that the first couple of times things are too slow, companies tend to just throw more equipment at the issue rather than performance tune. By the time people start performance tuning, you have a several-gigabyte or larger database with many unhappy customers who are getting timeouts more often than results. At that point, often almost everything in the database has to be re-written, and in the meantime you are losing customers. I remember providing support at one company where it literally took the customer service reps ten minutes to move from one screen of a commercial application to another while they were trying to help already disgruntled customers on the phone. You can imagine how many customers the company lost due to poorly designed database queries in a commercial product that we could not change.

HLGEM
I agree with this - database systems involve bits of metal going around and back and forth. These bits of metal are not getting exponentially faster the way ICs are, and getting the query wrong can make orders of magnitude of difference to the performance.
ConcernedOfTunbridgeWells
Indeed. I replaced cursors with set-based code that took the performance from minutes to milliseconds, and sometimes from hours to seconds. There is no need to write this kind of code to begin with.
HLGEM
A: 

Generally, the tips here:

http://www.sql-server-performance.com/

have been high quality and useful for me in the past.

Sean
A: 

My advice would be to start with techniques applicable to all databases and then try the ones specific to MS SQL.

Optimizing SQL is difficult, and there are no hard and fast rules. There are very few generic guidelines that you can follow, such as:

  • 95% of performance improvements will come from the application, not from server or database engine configuration.
  • Design for correctness first, tweak for performance later
  • Reduce trips to the database
  • Try to express things in a way that fits your data model
  • Ignore generic advice about performance - yes, at some point you'll find a system or SQL statement where one of those rules does not apply.

But the key point is that you should always apply the 80-20 rule, which means that in any system you need to tweak 20% (often much less) of your code for the biggest performance gains. That's where the vendor-provided tools usually fail, as they typically cannot guess the application/business context of execution.

A: 

My advice is that "premature optimization is the root of all evil" is absolute nonsense in this context.

In my view it's all about design - you need to think about concurrency, hotspots, indexing, scaling and usage patterns when you are designing your data schema.

If you don't know what indexes you need and how they need to be configured right off the bat without doing profiling you have already failed.

There are millions of ways to optimize query execution that are all well and good but at the end of the day the data lands where you tell it to.