
Currently, I am developing a product that performs fairly intensive calculations using MS SQL Server 2005. At a high level, the architecture of my product is based on the concept of "runs": each time I do some analytics, the results get stored in a series of run tables (~100 tables per run).

The problem I'm having is that when the number of runs grows to about 1,000 after a few months, performance on the database really seems to drop off; specifically, simple operations like checking for the existence of a table or creating a view can take one to two seconds.

I've heard that using multiple filegroups, which I'm not currently doing, could help. Is this true, and if so, why/how would that help? I'm also open to other suggestions, even ones like "use fewer tables". I just want to speed the database up and hopefully get it into a state where it will scale.

A: 

It could, if you place the filegroups on separate drives - physical drives, not just logical ones - so that I/O is not slowing you down so much.
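A minimal T-SQL sketch of what that looks like, assuming a database named MyAnalyticsDb and a second physical drive mounted as E: (both names and all sizes are illustrative):

    -- Add a filegroup backed by a file on a different physical drive,
    -- then create new run tables on it.
    ALTER DATABASE MyAnalyticsDb ADD FILEGROUP RunData;

    ALTER DATABASE MyAnalyticsDb
    ADD FILE
    (
        NAME = RunData1,
        FILENAME = 'E:\SQLData\MyAnalyticsDb_RunData1.ndf',
        SIZE = 1GB,
        FILEGROWTH = 512MB
    )
    TO FILEGROUP RunData;

    CREATE TABLE dbo.RunResults
    (
        RunId int NOT NULL,
        MetricValue float NULL
    ) ON RunData;

The benefit only materializes if the new file really lives on separate spindles; multiple files on the same physical disk won't reduce I/O contention.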

Per Hornshøj-Schierbeck
A: 

Putting the filegroups on different physical drives is what will give you the biggest performance boost. You can also split up where the indexes are housed so that table writes and index accesses hit different disks. There's a lot you can do with partitioning, but that general concept is where the biggest speed impact comes from.
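For the index-placement part, a hedged sketch, assuming a filegroup named RunIndexes already exists on another disk and reusing the illustrative table from above:

    -- House a nonclustered index on its own filegroup so index reads
    -- and table writes hit different physical disks.
    CREATE NONCLUSTERED INDEX IX_RunResults_RunId
        ON dbo.RunResults (RunId)
        ON RunIndexes;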

Nick Craver
A: 

It can help with performance by moving certain tables/elements to distinct file areas/portions of the disk. This can reduce, to a certain extent, the amount of external fragmentation impacting the database.

I would also use tools such as SQL Trace to determine why queries etc. are slowing down - there can be other factors, such as out-of-date query statistics or stored procedure recompiles, that are easier to fix and can give you greater gains in performance.
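As a complement to a trace, the DMVs that ship with SQL Server 2005 can surface the most expensive cached queries; a rough sketch (one possible approach, not necessarily what was meant here):

    -- Top CPU consumers among cached plans.
    SELECT TOP 20
        qs.execution_count,
        qs.total_worker_time,
        qs.total_logical_reads,
        st.text AS batch_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;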

MikeJ
+1  A: 

About 1000 of what? Single row writes? Multiple row transactions? Deletes?

A general tip would be to place the data files and log files on separate physical drives. SQL Server keeps track of every write to the log, so having them on different drives should generally give you better performance.
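A minimal sketch of that layout at database-creation time (the database name, sizes, and drive letters are assumptions):

    -- Data file on one physical drive, log file on another.
    CREATE DATABASE MyAnalyticsDb
    ON PRIMARY
    (
        NAME = MyAnalyticsDb_Data,
        FILENAME = 'D:\SQLData\MyAnalyticsDb.mdf',
        SIZE = 2GB
    )
    LOG ON
    (
        NAME = MyAnalyticsDb_Log,
        FILENAME = 'L:\SQLLogs\MyAnalyticsDb.ldf',
        SIZE = 1GB
    );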

But SQL Server tuning depends on what the application is actually doing. There are general tips, but you have to measure your own workload...

Alejandro Mezcua
+3  A: 

In terms of performance, the big gain in using separate files/filegroups is that it lets you spread your data across multiple physical disks. This is beneficial because with several disks, multiple data requests can be handled simultaneously (parallel is generally faster than serial). All other things being equal, this would tend to benefit performance, but the question of how much depends on your particular data set and the queries you're running.

From your description, the slow operations you're concerned about are creating tables and checking for the existence of tables. If you are generating 100 tables per run, then after 1000 runs you have 100,000 tables. I don't have much experience with creating that many tables in a single database, but you may be pressing the limits of the system tables that track the database schema. In this case, you might see some benefit by spreading your tables across more than one database (these databases could still all live within the same instance of SQL Server).
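On the existence-check side, a narrow catalog lookup tends to stay cheap even with a very large number of objects; a small sketch, with a hypothetical table name:

    -- OBJECT_ID is a lightweight way to test whether a specific table exists.
    IF OBJECT_ID(N'dbo.Run_1234_Results', N'U') IS NOT NULL
        PRINT 'Table exists';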

In general, the SQL Profiler tool is the best starting point for finding slow queries. There are data columns which indicate the CPU and IO cost of each SQL batch, which should point you to the worst offenders. Once you have found the problem queries, I would use the Query Analyzer to generate query plans for each of these queries, and see if you can tell what's making them slow. Do this by opening a query window, entering your query, and hitting Ctrl+L. A complete discussion of what might be slow would fill an entire book, but good things to look for are table scans (very slow for large tables) and inefficient joins.
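If you want numbers rather than just the plan, a simple way to compare a suspect query before and after a change is to turn on the session's I/O and CPU statistics; the query below is only a placeholder:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT COUNT(*) FROM dbo.RunResults WHERE RunId = 1234;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;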

In the end, you may be able to improve things simply by rewriting your queries, or you may have to make broader changes to the table schema. For instance, maybe there's a way to create only one or a few tables per run, instead of 100. More specifics about your particular setup would help us give a more detailed answer.

I also recommend this website for lots of tips on how to make things faster:

http://www.sql-server-performance.com/

Charlie
+1  A: 

When you talk about 100 tables per run, do you actually mean that you're creating new SQL tables? If so, I think that the architecture of your application may be the issue. I can't imagine a situation where you would need that many new tables as opposed to reusing the same few tables multiple times and simply adding a column or two to differentiate between runs.
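A sketch of the kind of schema this suggests: one shared set of tables keyed by a run identifier, instead of new tables per run (all names here are illustrative):

    CREATE TABLE dbo.Run
    (
        RunId int IDENTITY(1,1) PRIMARY KEY,
        StartedAt datetime NOT NULL DEFAULT GETDATE()
    );

    CREATE TABLE dbo.RunResult
    (
        RunId int NOT NULL REFERENCES dbo.Run (RunId),
        MetricName varchar(100) NOT NULL,
        MetricValue float NULL,
        CONSTRAINT PK_RunResult PRIMARY KEY (RunId, MetricName)
    );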

If you're already reusing the same group of tables and new runs just mean additional rows in those tables, then the issue could simply be that the new data over time is hurting performance in one of several ways. For example:

  1. The tables/indexes could be fragmented after a while. Make sure that all of your tables have a clustered index. Check for fragmentation using sys.dm_db_index_physical_stats and issue ALTER INDEX with the REBUILD option if needed to defragment them (a sketch follows this list).
  2. The tables could simply be too large, so that queries that were inefficient but unnoticeable on small tables are now obviously slow on the larger ones. Look into proper indexes on the tables to improve performance.
  3. SQL Server will cache query plans (especially for stored procedures), but if the data in a table changes significantly over time that query plan may no longer be appropriate. Look into sp_recompile for your stored procedures to see if that's needed.
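For the fragmentation check in point 1, a minimal sketch (the index and table names are placeholders):

    -- Find indexes in the current database with heavy fragmentation...
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30;

    -- ...then rebuild the offenders.
    ALTER INDEX PK_RunResult ON dbo.RunResult REBUILD;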

#2 is the culprit that I see most often in real world situations. Developers tend to develop using only a small set of test data and overlook proper indexing because you can do almost anything with a table of 20 rows and it will look fast.

Hope this helps

Tom H.
A: 

Split the tables across separate physical drives. If you have that much disk I/O, you need a decent I/O solution: RAID 10, fast disks, and the logs and DBs split onto separate drives.

Re-examine your architecture - can you use multiple databases? If you create thousands of tables in one go, you will soon hit some interesting bottlenecks that I've not had to deal with before. Multiple DBs should solve that. Think about having one "controlling" DB containing all your main meta-data, and then satellite DBs containing the actual data.
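A rough sketch of that "controlling" database idea, with a metadata table that records which satellite database holds each run (all names are hypothetical):

    CREATE TABLE dbo.RunCatalog
    (
        RunId int NOT NULL PRIMARY KEY,
        DatabaseName sysname NOT NULL,
        CreatedAt datetime NOT NULL DEFAULT GETDATE()
    );

    -- The application can then build three-part names such as
    -- RunDb_07.dbo.Results when it needs a particular run's data.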

You don't mention any specs about your server - but we saw a decent increase in performance when we went from 8GB to 20GB RAM.

Valerion