views:

130

answers:

3

Hi Guys,

I am working on a data driven web application that uses a SQL 2005 (standard edition) database.

One of the tables is rather large (8 million+ rows, with about 30 columns). The size of the table obviously affects the performance of the website, which selects items from the table through stored procs. The table is indexed, but performance is still poor due to the sheer number of rows - and this is part of the problem: the table is read about as often as it is written, so we can't add or remove indexes without making one of the operations worse.

The goal I have here is to increase the performance when selecting items from the table. The table has 'current' data and old / barely touched data. The most effective solution we can think of at this stage is to separate the table into two, i.e., one for old items (before a certain date, say 1 Jan 2005) and one for newer items (on or after 1 Jan 2005).

We know of things like Distributed Partitioned Views - but all of these features require Enterprise Edition, which the client will not buy (and no, throwing hardware at it isn't going to happen either).

+3  A: 

You can always roll your own "poor man's partitioning / DPV," even if it doesn't smell like the right way to do it. This is just a broad conceptual approach:

  1. Create a new table for the current year's data - same structure, same indexes. Adjust the stored procedure that writes to the main, big table so that it writes to both tables (just temporarily). I recommend making the logic in the stored procedure say IF CURRENT_TIMESTAMP >= '[some whole date without time]' - this will make it easy to backfill any data in the new table that pre-dates the change to the procedure that starts logging there (see the first sketch after this list).

  2. Create a new table for each year in your history by using SELECT INTO from the main table. You can do this in a different database on the same instance to avoid the overhead in the current database. Historical data isn't going to change I assume, so in this other database you could even make it read only when it is done (which will dramatically improve read performance).

  3. Once you have a copy of the entire table, you can create a view that references just the current year, another view that references 2005 to the current year (by using UNION ALL between the current table and those tables in the other database that are >= 2005), and another that references all three sets of tables (those mentioned, plus the tables that pre-date 2005) - see the second sketch after this list. Of course you can break this up even more, but I just wanted to keep the concept minimal.

  4. Change your stored procedures that read the data to be "smarter" - if the date range requested falls within the current calendar year, use the smallest view that is only local; if the date range is >= 2005 then use the second view, else use the third view. You can follow similar logic with stored procedures that write, if you are doing more than just inserting new data that is relevant only to the current year.

  5. At this point you should be able to stop inserting into the massive table and, once everything is proven to be working, drop it and reclaim some disk space (and by that I mean freeing up space in the data file(s) for reuse, not performing a shrink db - since you will use that space again).
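
Below is a rough T-SQL sketch of steps 1 and 2, just to make the idea concrete. Every object name in it (dbo.BigTable, dbo.Items_2009, dbo.usp_InsertItem, ArchiveDB, the ItemDate column) is a placeholder, and the cutover date is only an example - adapt it to your own schema:

    -- Step 1: current-year table with the same structure / indexes, plus a dual-write proc.
    -- dbo.Items_2009, dbo.BigTable, dbo.usp_InsertItem and ItemDate are placeholder names.
    CREATE TABLE dbo.Items_2009
    (
        ItemID   INT          NOT NULL PRIMARY KEY,
        ItemDate DATETIME     NOT NULL,
        ItemData VARCHAR(100) NOT NULL
        -- ... remaining columns and indexes copied from dbo.BigTable
    );
    GO

    -- Adjust the existing write proc so it writes to both tables for now
    ALTER PROCEDURE dbo.usp_InsertItem
        @ItemID   INT,
        @ItemDate DATETIME,
        @ItemData VARCHAR(100)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- keep writing to the big table so nothing breaks mid-migration
        INSERT dbo.BigTable (ItemID, ItemDate, ItemData)
        VALUES (@ItemID, @ItemDate, @ItemData);

        -- also write to the new table once the cutover moment has passed
        IF CURRENT_TIMESTAMP >= '20090101'   -- example "whole date without time"
        BEGIN
            INSERT dbo.Items_2009 (ItemID, ItemDate, ItemData)
            VALUES (@ItemID, @ItemDate, @ItemData);
        END
    END
    GO

    -- Step 2: one table per historical year, built in a separate database (ArchiveDB)
    SELECT *
    INTO ArchiveDB.dbo.Items_2004
    FROM dbo.BigTable
    WHERE ItemDate >= '20040101' AND ItemDate < '20050101';
    -- repeat per year, add indexes, then: ALTER DATABASE ArchiveDB SET READ_ONLY;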
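
And a matching sketch for steps 3 and 4 - again, all of the view, table, and procedure names are hypothetical, and the year boundaries are only illustrative:

    -- Step 3: layered views over the current table and the historical tables
    CREATE VIEW dbo.vwItems_CurrentYear
    AS
        SELECT ItemID, ItemDate, ItemData FROM dbo.Items_2009;
    GO

    CREATE VIEW dbo.vwItems_2005_To_Current
    AS
        SELECT ItemID, ItemDate, ItemData FROM dbo.Items_2009
        UNION ALL
        SELECT ItemID, ItemDate, ItemData FROM ArchiveDB.dbo.Items_2008
        UNION ALL
        SELECT ItemID, ItemDate, ItemData FROM ArchiveDB.dbo.Items_2007
        UNION ALL
        SELECT ItemID, ItemDate, ItemData FROM ArchiveDB.dbo.Items_2006
        UNION ALL
        SELECT ItemID, ItemDate, ItemData FROM ArchiveDB.dbo.Items_2005;
    GO

    CREATE VIEW dbo.vwItems_All
    AS
        SELECT ItemID, ItemDate, ItemData FROM dbo.vwItems_2005_To_Current
        UNION ALL
        SELECT ItemID, ItemDate, ItemData FROM ArchiveDB.dbo.Items_Pre2005;
    GO

    -- Step 4: a "smarter" read proc that picks the smallest view covering the range
    CREATE PROCEDURE dbo.usp_GetItems
        @FromDate DATETIME,
        @ToDate   DATETIME
    AS
    BEGIN
        SET NOCOUNT ON;

        IF @FromDate >= '20090101'
            SELECT ItemID, ItemDate, ItemData
            FROM dbo.vwItems_CurrentYear
            WHERE ItemDate >= @FromDate AND ItemDate < @ToDate;
        ELSE IF @FromDate >= '20050101'
            SELECT ItemID, ItemDate, ItemData
            FROM dbo.vwItems_2005_To_Current
            WHERE ItemDate >= @FromDate AND ItemDate < @ToDate;
        ELSE
            SELECT ItemID, ItemDate, ItemData
            FROM dbo.vwItems_All
            WHERE ItemDate >= @FromDate AND ItemDate < @ToDate;
    END
    GO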

I don't have all of the details of your situation but please follow up if you have questions or concerns. I have used this approach in several migration projects including one that is going on right now.

Aaron Bertrand
Thanks for the answer. We were originally under the impression that the historical data was still updatable, but have recently found out that we can make it read-only. So your answer sounds like a good option, cheers :)
Scozzard
+1  A: 

Rebuild all your indexes - this will boost query performance. There are good write-ups on how to do the rebuild and on the effect of rebuilding clustered and non-clustered indexes.
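
For illustration, a minimal sketch of what that looks like on SQL Server 2005 - dbo.BigTable is a placeholder table name, and note that a plain REBUILD takes the index offline on Standard Edition (online rebuilds need Enterprise):

    -- Check fragmentation first (dbo.BigTable is a placeholder)
    SELECT i.name, s.avg_fragmentation_in_percent, s.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i
      ON i.object_id = s.object_id AND i.index_id = s.index_id;

    -- Heavily fragmented indexes: rebuild (offline on Standard Edition)
    ALTER INDEX ALL ON dbo.BigTable REBUILD;

    -- Lightly fragmented indexes: reorganize (always online, can be interrupted safely)
    ALTER INDEX ALL ON dbo.BigTable REORGANIZE;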

Secondly, defragment the drive on which the database files are stored.

Finally, it is worth working through a more complete list of database tuning workarounds.

HotTester
For index maintenance tasks like rebuild / reorganize you should use one of these utilities that help you take the guesswork out of which indexes to rebuild and which to reorganize. Michelle Ufford's script is here (watch her blog for a new version coming soon): http://sqlfool.com/2009/06/index-defrag-script-v30/ and Ola Hallengren's script is here: http://ola.hallengren.com/
Aaron Bertrand
We frequently reorganize and rebuild indexes appropriately as part of routine maintenance. The indexes are as optimized as they can be. The drive has been as defragmented as it will get - the system is a 24-hour system with no load balancing, so we can't take the system offline long enough to do a full defragment of the drive. I know that's not ideal, but them's the breaks in a not-ideal world.
Scozzard
What are the autogrowth settings on the file(s)? File system fragmentation shouldn't really be a performance issue unless your autogrowth setting is really small or the disk layout was otherwise poorly planned.
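A quick sketch of how to check and adjust that - MyDatabase and MyDatabase_Data are placeholder names:

    -- Inspect file sizes and growth increments for the current database
    SELECT name, size, growth, is_percent_growth
    FROM sys.database_files;

    -- Switch a data file from a tiny increment to a larger fixed growth (example values)
    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 512MB);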
Aaron Bertrand
+1  A: 

performance is poor due to the sheer amount of rows in the table

8 million rows doesn't sound all that crazy. Did you check your query plans?
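
If it helps, one way to capture a plan for a problem call (the proc name and parameters below are placeholders):

    -- Return the estimated XML plan instead of executing the statement
    SET SHOWPLAN_XML ON;
    GO
    EXEC dbo.usp_GetItems @FromDate = '20090101', @ToDate = '20090201';
    GO
    SET SHOWPLAN_XML OFF;
    GO

    -- Or execute the query and return the actual plan alongside the results
    SET STATISTICS XML ON;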

the table is as equally read as updated

Are you actually updating an indexed column or is it equally read and inserted to?

(and no, throwing hardware at it isn't going to happen either)

That's a pity because RAM is dirt cheap.

Jonas Elfström
You're right, 8 million rows isn't crazy. However, the table is the most commonly used in the system, frequently hit for both reads and writes. The query plans for reads and writes are each miserable because the indexing tries to optimise the other - in other words, we are at an impasse.
Scozzard
Re: Read/Updated - yes, my bad, I meant write (inserted). Our write operations into that table are responsible for 30% of our SQL timeouts in the system. Unfortunately, as this is a synchronous web application, we don't have the luxury of just lengthening the timeout, as performance is already a common complaint and this would just make it worse. We realise that the current structure is not going to continue 'working' and are looking for the best alternative structure which won't involve rebuilding the application from scratch (which, while ideal, won't happen).
Scozzard
Re: RAM - yes, you're right. Unfortunately in the contracting world, we can advise as much as we would like, but ultimately the decision is the client's. They have only just bought new hardware for it, and a new refresh (or even upgrade) is 3 years away. At some point the DB structure also needs to be corrected for the amount of old vs. current data in there, which is redundant - we'd like to try a fix rather than a patch.
Scozzard
Thanks for the clarifications. I have a feeling you won't get a much better answer than @AaronBertrand gave unless you provide us with the table structure, indexes, and maybe even query plans.
Jonas Elfström