As part of my role at my firm, I've been forced to become the DBA for our database. Some of our tables have row counts approaching 100 million, and many of the things I know how to do in SQL Server (like joins) simply break down at that volume of data. I'm left with a couple of options:

1) Go out and find a DBA with experience administering VLDBs. This is going to cost us a pretty penny and come at the expense of other work that we need to get done. I'm not a huge fan of it.

2) Most of our data is historical data that we use for analysis. I could simply create a copy of our database schema and start collecting data from scratch, putting any analysis of our current data on hold until I find a proper way to solve the problem (this is my current "best" solution).

3) Reach out to the developer community to see if I can learn enough about large databases to get us through until I can implement solution #1.

Any help that anyone could provide, or any books you could recommend would be greatly appreciated.

A: 

DBAs are worth their weight in gold, if you can find a good one. They specialize in doing the very thing that you are describing. If this is a one-time problem, maybe you can subcontract one.

I believe Microsoft offers a similar service. You might want to ask.

Robert Harvey
A: 

You'll want to get a DBA in there, at least on contract, to performance-tune the database.

Joining to a 100-million-record table shouldn't bring the database server to its knees. My company's customers do it many hundreds (possibly thousands) of times per minute on our system.

mrdenny
+1  A: 

Here are a few thoughts, but none of them are quick fixes:

  1. Develop an archival strategy for the data in your large tables. Create tables with formats similar to the existing transactional tables and copy the data out into them on a periodic basis (there's a small sketch of this after the list). If you can get away with whacking the archived data out of the transactional system, so much the better.

  2. Develop a relational data warehouse to store the large data sets, complete with star schemas consisting of fact tables and dimensions (a tiny example follows the list). For an introduction to this approach there is no better book (IMHO) than Ralph Kimball's The Data Warehouse Toolkit.

  3. For analysis, consider using MS Analysis Services to pre-aggregate the data for fast querying.

  4. Of course, you could also look at your indexing strategy within the existing database. Be careful with any changes: you could add indexes that improve querying at the cost of insert and transactional performance (an example appears below).

  5. You could also research partitioning in SQL Server (sketched below).

  6. Don't feel bad about bringing in a DBA on contract basis to help out...
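
For point 1, here is a minimal sketch of a batched archive job, assuming a hypothetical dbo.Orders table with an OrderDate column and a dbo.OrdersArchive table already created with the same columns (substitute your own names). Working in small batches keeps lock times and transaction log growth under control:

    DECLARE @batch int;
    SET @batch = 10000;

    WHILE 1 = 1
    BEGIN
        -- Move rows older than two years, 10,000 at a time: the OUTPUT clause
        -- copies each deleted row straight into the archive table.
        DELETE TOP (@batch) FROM dbo.Orders
        OUTPUT deleted.* INTO dbo.OrdersArchive
        WHERE OrderDate < DATEADD(year, -2, GETDATE());

        IF @@ROWCOUNT < @batch BREAK;   -- last (partial) batch done
    END

If you only want to copy the historical rows rather than remove them, the same batching idea works with a plain INSERT ... SELECT.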
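
To make the fact table / dimension idea in point 2 concrete, here is a tiny hypothetical star schema (all table and column names are made up): the fact table holds the measures plus foreign keys into the dimension tables that describe them.

    -- Dimension tables: descriptive attributes to slice the facts by.
    CREATE TABLE dbo.DimDate (
        DateKey       int      NOT NULL PRIMARY KEY,   -- e.g. 20090714
        CalendarDate  datetime NOT NULL,
        CalendarYear  smallint NOT NULL,
        CalendarMonth tinyint  NOT NULL
    );

    CREATE TABLE dbo.DimCustomer (
        CustomerKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerName nvarchar(100)     NOT NULL
    );

    -- Fact table: one narrow row per event, keyed to the dimensions.
    CREATE TABLE dbo.FactSales (
        DateKey     int   NOT NULL REFERENCES dbo.DimDate (DateKey),
        CustomerKey int   NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        Quantity    int   NOT NULL,
        SalesAmount money NOT NULL
    );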
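
For point 4, this is the kind of change involved, with hypothetical table and column names. The INCLUDE columns let a frequent analysis query be answered from the index alone, but every insert and update to the table now has to maintain the index too, which is the trade-off mentioned above.

    -- Supports queries that filter on CustomerID and return OrderDate and Amount.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, Amount);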
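
For point 5, a rough sketch of what table partitioning looks like (an Enterprise Edition feature in SQL Server 2005/2008); the names, boundary dates, and filegroup mapping are placeholders. Partitioning by date also makes it cheap to switch old partitions out to an archive table later.

    -- Split rows into yearly ranges on OrderDate.
    CREATE PARTITION FUNCTION pfOrderYear (datetime)
        AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

    -- Map every partition to PRIMARY for brevity; each could get its own filegroup.
    CREATE PARTITION SCHEME psOrderYear
        AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.OrdersPartitioned (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL,
        Amount    money    NOT NULL
    ) ON psOrderYear (OrderDate);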

To me, your best bet would be to begin investigating moving that data out of the transactional system if it is not needed for day-to-day use.

Of course, you are going to need to pick up some new skills for dealing with these amounts of data. Whatever you decide to do, make a backup first!

One more thing you should do is ensure that your I/O is spread appropriately across as many spindles as possible. Your data files, log files, and SQL Server tempdb data files should all be on separate drives with a database system that large.
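
For example, a sketch of how the file layout might be checked and rearranged (the database name, logical file names, and drive letters are placeholders):

    -- Check the current layout first.
    SELECT name, physical_name FROM sys.master_files;

    -- Point the user database's log file at its own drive. The change applies
    -- the next time the database starts, after the file has been moved there.
    ALTER DATABASE BigDb
        MODIFY FILE (NAME = BigDb_log, FILENAME = 'E:\Logs\BigDb_log.ldf');

    -- Put tempdb on its own drive; tempdb is recreated at the new location
    -- the next time the instance restarts.
    ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, FILENAME = 'F:\TempDb\tempdb.mdf');
    ALTER DATABASE tempdb
        MODIFY FILE (NAME = templog, FILENAME = 'F:\TempDb\templog.ldf');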

jn29098
Thanks, this is exactly the jumping-off point I was hoping for.
Jonathan Beerhalter