views: 88
answers: 4

I'd like some suggestions for online resources (blogs, guides, etc - not forums) to help me become good at designing high performance SQL Server databases that operate with large amounts of data and have heavy loads in terms of data turnover and queries per minute.

Suggestions?

EDIT

The load I'm talking about is mainly in terms of data turnover. The main table has up to a million rows, with about 30 fields of varying size. It gains roughly 30,000-40,000 new rows per day, and at least 200,000 existing rows are updated with new data every day. These updates happen on a continuing basis throughout the day. On top of this, all changes and updates need to be pulled from the database throughout the day to keep a large Lucene index up to date.

+1  A: 

http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=high+performance+database

This is a subject better explored first with books, as it is highly technical and complex.

I will point out that the people who created this website include several who work with very large databases. You can learn a lot from them. http://lessthandot.com/

HLGEM
+2  A: 

You might try the SQL Server samples on CodePlex or DatabaseAnswers.com.

tvanfosson
+1  A: 

Here are some resources on troubleshooting and optimizing SQL Server performance that I've found really helpful:

http://updates.sqlservervideos.com/2009/09/power-up-with-sql-server-sql-server-performance.html

In particular, effective use of indexes can be a huge performance booster. I think that most web applications, in most circumstances, do a lot more reading than writing. Also, the sargability of an expression can have a serious impact on performance.
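To illustrate sargability (the table and column names here are made up, just to show the pattern): if there is an index on OrderDate, wrapping the column in a function prevents an index seek, while an equivalent range predicate allows one:

    -- Non-sargable: the function on the column forces the engine to evaluate every row
    SELECT OrderID, Total
    FROM dbo.Orders
    WHERE YEAR(OrderDate) = 2009;

    -- Sargable: a range on the bare column lets the optimizer seek on an index over OrderDate
    SELECT OrderID, Total
    FROM dbo.Orders
    WHERE OrderDate >= '20090101' AND OrderDate < '20100101';

Both return the same rows, but only the second can use an index seek on OrderDate.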

RMorrisey
+1  A: 

Sounds like a fairly manageable load on a moderate server. You haven't said what kind of read operations are happening while these inserts and updates are going on (other than the extractions for Lucene), or the size (byte-wise/data-type-wise) of the data - the cardinality you have given seems fine.

At this point, I would recommend just using regular SQL Server best practices:

- determine an appropriate schema (normalize, then denormalize only if necessary)
- review execution plans
- use the index tuning wizard
- use the DMVs to find unused indexes and remove them (a sketch of such a query follows below)
- choose clustered indexes carefully to manage page splits
- choose data types and sizes carefully
- use referential integrity and constraints where possible to give the optimizer as much help as possible

Beyond that, it's a matter of performance counters and making sure your hardware/software installation is tuned.
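As a rough starting point for the unused-index check (table and index names will be your own, and bear in mind the usage stats reset whenever the instance restarts), something along these lines against sys.dm_db_index_usage_stats lists nonclustered indexes that have never been read since the last restart:

    -- Nonclustered indexes in the current database with no seeks, scans or lookups recorded.
    -- Run against a representative period of activity before dropping anything.
    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name                   AS index_name,
           ISNULL(us.user_updates, 0) AS writes,
           ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id = i.object_id
          AND us.index_id = i.index_id
          AND us.database_id = DB_ID()
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
      AND i.index_id > 1   -- skip heaps and clustered indexes
      AND ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
    ORDER BY table_name, index_name;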

In many/most cases, you'll never need to go beyond that to actually re-engineer your architecture.

However, even after all that, if the read load is heavy, the inserts and updates can cause locking issues between reads and writes, and then you are looking at architectural decisions for your application.
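Before going down the re-architecture path, one option worth evaluating (assuming SQL Server 2005 or later, and keeping in mind the extra tempdb version-store load it brings) is row versioning, so readers stop blocking behind writers:

    -- Readers see the last committed version of a row instead of waiting on writers' locks.
    -- The database name is a placeholder; the ALTER needs a moment of exclusive access to take effect.
    ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;

Whether that's appropriate depends on whether your reads can tolerate seeing the last committed version of a row rather than in-flight changes.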

Also, the million rows and 200k updates a day wouldn't worry me - but you mention Lucene (i.e. full text indexing), so presumably some of the columns are rather large. Updating large columns and exporting them obviously takes far longer - and far more bandwidth and IO. 30 columns in a narrow million row table with traditional data type columns would be a completely different story. You might want to look at the update profile and see if you need to partition the table vertically to move some columns out of the row (if they are large, they will already be stored out of row) to improve the locking behavior.
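For concreteness, a vertical partitioning sketch might look like this (all names invented) - the point is keeping the narrow, frequently updated columns in one table and the big Lucene-bound text in a 1:1 side table:

    -- Narrow, frequently updated columns stay in the main table
    CREATE TABLE dbo.Items (
        ItemID      int      NOT NULL PRIMARY KEY,
        Status      tinyint  NOT NULL,
        LastUpdated datetime NOT NULL
        -- ... other small columns ...
    );

    -- Large text pulled for the Lucene feed lives in a 1:1 side table
    CREATE TABLE dbo.ItemText (
        ItemID int           NOT NULL PRIMARY KEY
               REFERENCES dbo.Items (ItemID),
        Body   nvarchar(max) NOT NULL
    );

That way the heavy daily update traffic on the small columns doesn't touch the pages holding the large text, and the Lucene extraction only reads the side table.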

So the key thing when you have a heavy read load: inserts and updates need to be as fast as possible, lock as little as possible (avoiding lock escalation), and update as few indexes as you can afford while still supporting the read operations.
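If table-level lock escalation does turn out to be the bottleneck, SQL Server 2008 lets you control it per table (shown here with the hypothetical table from the sketch above; on 2005 and earlier the equivalent knobs are the instance-wide trace flags 1211/1224):

    -- Escalate only to the partition level rather than the whole table (SQL Server 2008+)
    ALTER TABLE dbo.Items SET (LOCK_ESCALATION = AUTO);

    -- Or turn table-level escalation off entirely for this table (watch lock memory if you do)
    ALTER TABLE dbo.Items SET (LOCK_ESCALATION = DISABLE);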

If the read load is so heavy that the inserts/updates start to conflict with it, but it does not require 100% up-to-date information (say a 5 or 15 minute delay is not noticeable), you can maintain a read-only version of the database - either an identical copy kept current through replication, or one that is indexed differently for performance, denormalized, or modeled differently (like a dimensional model). Perhaps your Lucene indexes can also contain additional information so that the expensive read operations all stay in Lucene - i.e. Lucene becomes covering for many large read operations, reducing your read load on the database to the essential reads which support the inserts/updates (these are typically small reads) and the transactional part of your app. For example, a customer service information screen would use the regular database, while your hourly dashboard would use the secondary database.
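If you do go the slightly-stale read-only copy route, replication or log shipping to a second server are the heavier options; within a single server, a database snapshot that you drop and recreate every few minutes is a simpler way to get "N minutes old" reads (an Enterprise Edition feature; the names and path below are invented, and the logical NAME must match the source database's data file):

    -- Point-in-time, read-only view of the database; recreate on a schedule for slightly stale reporting reads
    CREATE DATABASE MyAppDb_Reporting
    ON ( NAME = MyAppDb_Data,
         FILENAME = 'D:\Snapshots\MyAppDb_Reporting.ss' )
    AS SNAPSHOT OF MyAppDb;

Queries against the snapshot still share IO with the source database, so a separate replicated/denormalized copy is the better fit once the reporting load itself gets heavy.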

Cade Roux
Many thanks for the detailed response - will have to go through this to digest it all.
Nathan Ridley