We have a very large table (> 77M records and growing) running on SQL Server 2005 64-bit Standard Edition, and we are seeing some performance issues. Up to a hundred thousand records are added daily.

Does anyone know if there is a limit to the number of records SQL Server Standard Edition can handle? Should we be considering a move to Enterprise Edition, or are there some tricks we can use?

Additional info:

The table in question is pretty flat (14 columns), there is a clustered index with 6 fields, and two other indexes on single fields.

We added a fourth index using the 3 fields that appear in the select of one problem query and did not see any difference in the estimated performance (the query is part of a process that has to run in the off hours, so we don't have metrics yet). These fields are part of the clustered index.

+4  A: 

http://msdn.microsoft.com/en-us/library/ms143432.aspx

You've got some room to grow.

As far as performance issues, that's a whole other question. Caching, sharding, normalizing, indexing, query tuning, app code tuning, and so on.

Tom Ritter
+5  A: 

The first thing I'd look at is indexing. If you use the execution plan generator in Management Studio, you want to see index seeks or clustered index seeks. If you see scans, particularly table scans, you should look at indexing the columns you generally search on to see if that improves your performance.
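As a rough sketch of that approach (the table and column names below are placeholders, not the poster's actual schema), you can compare the plan and I/O before and after indexing the searched columns:

    -- Hypothetical names: dbo.BigTable, SearchCol, SomeCol, OtherCol.
    SET STATISTICS IO ON;

    DECLARE @value int;
    SET @value = 42;

    -- A table scan here suggests SearchCol isn't usefully indexed.
    SELECT SomeCol, OtherCol
    FROM dbo.BigTable
    WHERE SearchCol = @value;

    -- A nonclustered index on the searched column (covering the selected columns)
    -- typically turns that scan into an index seek.
    CREATE NONCLUSTERED INDEX IX_BigTable_SearchCol
        ON dbo.BigTable (SearchCol)
        INCLUDE (SomeCol, OtherCol);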

You should certainly not need to move to Enterprise edition for this.

mwigdahl
Good answer because it gets right to the point on the next step - find out what's going on. Many other answers are "try this" advice, often unlikely to help and expensive in time and/or $$.
le dorfier
+1  A: 

Standard should be able to handle it. I would look at indexing and the queries you use with the table. You want to structure things in such a way that your inserts don't cause too many index recalcs, but your queries can still take advantage of the index to limit lookups to a small portion of the table.

Beyond that, you might consider partitioning the table. This will allow you to divide the table into several logical groups. You can do it "behind-the-scenes", so it still appears in SQL Server as one table even though it is stored separately, or you can do it manually (create a new 'archive' or yearly table and manually move over rows). Either way, only do it after you've looked at the other options first, because if you don't get it right you'll still end up having to check every partition. Also: partitioning requires Enterprise Edition, so that's another reason to save it as a last resort.
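If you do eventually go the partitioning route, a minimal sketch looks something like this (the date column, boundary values, and object names are hypothetical, and this needs Enterprise Edition):

    -- Partition a hypothetical CreatedDate column by year.
    CREATE PARTITION FUNCTION pfBigTableByYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

    CREATE PARTITION SCHEME psBigTableByYear
    AS PARTITION pfBigTableByYear ALL TO ([PRIMARY]);

    -- Rebuilding the clustered index on the partition scheme spreads the rows
    -- across partitions while queries still see a single table.
    CREATE CLUSTERED INDEX IX_BigTable_Clustered
        ON dbo.BigTable (CreatedDate, OtherKeyCol)
        WITH (DROP_EXISTING = ON)
        ON psBigTableByYear (CreatedDate);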

Joel Coehoorn
+1  A: 

In and of itself, 77M records is not a lot for SQL Server. How are you loading the 100,000 records? Is that a batch load each day, or through some sort of OLTP application? And is the performance issue the adding of the data, or is it the querying that's giving you the most problems?

If you are adding 100K records at a time and the records being added force the clustered index to reorganize your table, that will kill your performance quickly. More details on the table structure, indexes, and type of data inserted would help.

Also, the amount of RAM and the speed of your disks will make a big difference; what are you running on?

EJB
A: 

What type of disks do you have?

You might monitor some disk counters to see if requests are queuing.

You might move this table to another drive by putting it in another filegroup. You can also do the same with the indexes.
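As a rough sketch (the database name, file path, filegroup, and index names below are made up), rebuilding the clustered index on a new filegroup moves the table's data onto that drive:

    -- Add a new filegroup and a data file on another drive.
    ALTER DATABASE MyDb ADD FILEGROUP BigTableFG;
    ALTER DATABASE MyDb ADD FILE
        (NAME = BigTableData, FILENAME = 'E:\SQLData\BigTableData.ndf', SIZE = 10GB)
        TO FILEGROUP BigTableFG;

    -- Rebuilding the clustered index ON the new filegroup relocates the table itself;
    -- nonclustered indexes can be rebuilt onto other filegroups the same way.
    CREATE UNIQUE CLUSTERED INDEX IX_BigTable_Clustered
        ON dbo.BigTable (KeyCol1, KeyCol2)
        WITH (DROP_EXISTING = ON)
        ON BigTableFG;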

Sam
+2  A: 

Hi,

Do you really need to have access to all 77 million records in a single table?

For example, if you only need access to the last X months worth of data, then you could consider creating an archiving strategy. This could be used to relocate data to an archive table in order to reduce the volume of data and subsequently, query time on your 'hot' table.

This approach could be implemented in the standard edition.
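One hedged sketch of such an archive sweep (the table, column, cutoff, and batch size are all hypothetical, and the archive table would need a matching structure) moves old rows out in small batches:

    -- Move rows older than 12 months into an archive table, 10,000 at a time.
    DECLARE @cutoff datetime;
    SET @cutoff = DATEADD(month, -12, GETDATE());

    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000)
        FROM dbo.BigTable
            OUTPUT deleted.* INTO dbo.BigTable_Archive
        WHERE CreatedDate < @cutoff;

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to archive
    END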

If you do upgrade to the Enterprise edition you can make use of table partitioning. Again depending on your data structure this can offer significant performance improvements. Partitioning can also be used to implement the strategy previously mentioned but with less administrative overhead.

Here is an excellent White paper on table partitioning in SQL Server 2005

http://msdn.microsoft.com/en-us/library/ms345146.aspx

I hope what I have detailed is clear and understandable. Please do feel free to contact me directly if you require further assistance.

Cheers,

John Sansom
Possibly, but there are many, many other more likely questions that don't appear to have been asked yet.
le dorfier
+2  A: 

[there is a clustered index with 6 fields, and two other indexes on single fields.]

Without knowing any details about the fields, I would try to find a way to make the clustered index smaller.

With SQL Server, all the clustered-key fields will also be included in all the non-clustered indices (as a way to do the final lookup from non-clustered index to actual data page).

If you have six fields at 8 bytes each = 48 bytes per key, multiply that by two more indices times 77 million rows (roughly 7 GB of key data alone) and you're looking at a lot of wasted space, which translates into a lot of I/O operations (and thus degrades performance).

For the clustered index, it's absolutely CRUCIAL for it to be unique, stable, and as small as possible (preferably a single INT or such).
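For example (these names are purely hypothetical, and this only works if your data and queries allow it), the wide key could be demoted to a unique nonclustered index and replaced with a narrow surrogate as the clustered key:

    -- Add a narrow, ever-increasing surrogate key.
    ALTER TABLE dbo.BigTable ADD Id int IDENTITY(1,1) NOT NULL;

    -- After dropping the existing wide clustered index:
    ALTER TABLE dbo.BigTable
        ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id);

    -- Keep the original six columns unique and searchable via a nonclustered index;
    -- each of its rows now carries only the 4-byte Id instead of the 48-byte key.
    CREATE UNIQUE NONCLUSTERED INDEX IX_BigTable_NaturalKey
        ON dbo.BigTable (Col1, Col2, Col3, Col4, Col5, Col6);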

Marc

marc_s
Simply not true. A clustered index needn't be unique, stable, and size is irrelevant because the whole record is always available.
le dorfier
the clustered index MUST BE unique - and its size DOES MATTER - all its fields are included in all non-clustered indices.
marc_s
the size inside the clustered index itself doesn't matter - true. but the fields of the clustered index are included in every entry of every single non-clustered index --> you want to minimize that by all means.
marc_s
Here's a link to a blog post about Kim Tripp's "Indexing best practices" training: http://blogs.technet.com/josebda/archive/2009/03/17/indexing-best-practices-for-sql-server-2008.aspx. One of the main points: "Clustered Index keys: Unique, narrow, static, ever increasing"
marc_s
"Ever increasing" that depends -- if you are primarily using a lot of RANGE (e.g. BETWEEN) queries then it's generally better to make that the clustered index (poor man's partitioning) than a semantically meaningless ID column... But it depends on the app! That's why we do Database design :D
Matt Rogish
Yes - it's a general guideline and a good idea - but there might be cases as you mention where something else is even better!
marc_s
A: 

Initially I wanted to agree with Marc. The width of your clustered index seems suspect, as it will essentially be used as the key to perform lookups on all your records. The wider the clustered index, the slower the access, generally. And a six field clustered index feels really, really suspect.

Uniqueness is not required for a clustered index. In fact, the best candidates for fields in the clustered index are often ones that are not unique and are used in joins. For example, in a Persons table where each Person belongs to one Group and you frequently join Persons to Groups while accessing batches of people by group, Person.group_id would be an ideal candidate for this particular use case.

Mark Canlas
+2  A: 

Agreeing with Marc and Unkown above ... 6 fields in the clustered index is way too many, especially on a table that has only 14 columns. You shouldn't have more than 3 or 4, and ideally only 1 or 2. You may know that the clustered index is the actual table on the disk, so when a record is inserted, the database engine must sort it and place it in its sorted, organized place on the disk. Nonclustered indexes are not; they are supporting lookup 'tables'. My VLDBs are laid out on the disk (CLUSTERED INDEX) according to the 1st point below.

  1. Reduce your clustered index to 1 or 2 fields. The best choices are the IDENTITY (INT), if you have one, or a date field that reflects the order in which rows are added to the database, or some other field that is a natural sort of how your data is being added. The point is that you are trying to keep that data at the bottom of the table ... or have it laid out on disk in the best (90%+) way that you'll read the records out. This way there is no reorganizing going on, and it takes one and only one hit to get the data into the right place for the best read. Be sure to put the removed fields into non-clustered indexes so you don't lose the lookup efficacy. I have NEVER put more than 4 fields on my VLDBs. If you have fields that are updated frequently and they are included in your clustered index, OUCH, that's going to reorganize the record on the disk and cause COSTLY fragmentation.
  2. Check the fillfactor on your indexes. The larger the fillfactor number (100), the more full the data pages and index pages will be. In relation to how many records you have and how many records you are inserting, you will change the fillfactor (+ or -) of your non-clustered indexes to allow for fill space when a record is inserted. If you change your clustered index to a sequential data field, then this won't matter as much on the clustered index. Rule of thumb (IMO): 60-70 fillfactor for high writes, 70-90 for medium writes, and 90-100 for high reads/low writes. Dropping your fillfactor to 70 means that for every 100 records on a page, 70 records are written, which leaves free space for 30 new or reorganized records. It eats up more space, but it sure beats having to DEFRAG every night (see 4 below and the maintenance sketch after this list).
  3. Make sure statistics exist on the table. If you sweep the database to create statistics using "sp_createstats 'indexonly'", SQL Server will create statistics on all the indexes that the engine has flagged as requiring statistics. Don't leave off the 'indexonly' attribute though, or you'll add statistics for every field, which would not be good.
  4. Check the table/indexes using DBCC SHOWCONTIG to see which indexes are getting fragmented the most. I won't go into the details here, just know that you need to do it. Then, based on that information, adjust the fillfactor up or down in relation to how much and how fast the indexes are changing over time.
  5. Set up a job schedule that will do online (DBCC INDEXDEFRAG) or offline (DBCC DBREINDEX) defragmentation of individual indexes. Warning: don't do DBCC DBREINDEX on a table this large outside of a maintenance window, because it will bring the apps down ... especially on the CLUSTERED INDEX. You've been warned. Test and test this part.
  6. Use the execution plans to see what SCANS and FAT PIPES exist, adjust the indexes, then defrag and rewrite stored procs to get rid of those hot spots. If you see a RED object in your execution plan, it's because there are no statistics on that field. That's bad. This step is more art than science.
  7. At off-peak times, run UPDATE STATISTICS WITH FULLSCAN to give the query engine as much information about the data distributions as you can. Otherwise, do the standard UPDATE STATISTICS (with the standard 10% scan) on tables during the weeknights, or more often as you see fit from your observations, to make sure the engine has enough information about the data distributions to retrieve the data efficiently.
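A minimal sketch of the maintenance commands from the points above (database, table, and index names are placeholders; test the timings against your own data first):

    -- 4. Check fragmentation on all of the table's indexes.
    DBCC SHOWCONTIG ('dbo.BigTable') WITH ALL_INDEXES;

    -- 5. Online defrag of a single index, or an offline rebuild with a lower
    --    fillfactor (see point 2) during a maintenance window.
    DBCC INDEXDEFRAG (MyDb, 'dbo.BigTable', 'IX_BigTable_SomeIndex');
    DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_SomeIndex', 70);

    -- 3. Create statistics on indexed columns only.
    EXEC sp_createstats 'indexonly';

    -- 7. Full-scan statistics update during off-peak hours.
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;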

Sorry this is so long, but it's extremely important. I've only given you minimal information here, but it will help a ton. There are gut feelings and observations behind the strategies in these points that will require your time and testing.

No need to go to Enterprise Edition. I did, though, in order to get the partitioning features spoken of earlier, but ESPECIALLY to have much better multi-threading capabilities with searching and online defragging and maintenance ... Enterprise Edition is much, much better and friendlier with VLDBs. Standard Edition doesn't handle doing DBCC INDEXDEFRAG with online databases as well.

SnapJag