views: 665
answers: 10

Is there some hard and fast rule about how big is too big for a SQL table?

We are storing SCORM tracking data in a name/value pair format, and there could be anywhere from 4-12 rows per user per course. Down the road, is this going to be a bad thing, since there are hundreds of courses and thousands of users?

+2  A: 

Not really. It all depends on your business needs, and you'll have to choose a database product that supports your estimated row count.

Otávio Décio
+6  A: 

I personally have had tables in production with 50 million rows, and that is small compared with what I have heard of. You might need to optimize your structure with partitioning, but until you test your system in your environment you shouldn't waste time doing that. What you described is pretty small, IMHO.

I should add that I was using SQL Server 2000 & 2005; each DBMS has its own sizing limitations.

JoshBerke
+5  A: 
Too Many == Just Enough + 1
Paul Tomblin
(nothing constructive) hahaha that's an awesome answer :D
Anders
+4  A: 

The magic number is billions. Until you get to billions of rows of data, you're not talking about very much data at all.

Do the math.

4-12 rows per user per course,... hundreds of courses and thousands of users?

400,000 to 1,200,000 rows. Let's assume 1000 bytes per row.

That's 400MB to 1.2GB of data. You can buy 100GB drives for $299 at the Apple store. You can easily spend more than $299 of billable time sweating over details that don't much matter anymore.

Until you get to 1TB of data (1,000 GB), you're not talking about much data at all.

S.Lott
Or an 80GB from Newegg for $33.99
Tmdean
100gb drive for $299? Maybe 5 years ago! Today you can get 1Tb+ for $100!
rmeador
Yeah, but he said "at the Apple store". You can hardly get a mouse for under $100 there.
P Daddy
The point is that gloriously overpriced storage is cheap. Cheap storage is really cheap. Hand-wringing over storage is a waste of money.
S.Lott
+5  A: 

100 (courses) * 1000 (users) * 10 (records) is only a million. That's the low end, but a decent database ought to handle it okay.

What sounds iffy is the name/value pair format. That will limit your ability to index things correctly, which will be critical to good performance.
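
A rough sketch of what that layout and the one index that really helps might look like; the table and column names here are assumptions, not anything from the question:

    -- Hypothetical name/value (EAV) tracking table.
    CREATE TABLE ScormTracking (
        UserId   INT           NOT NULL,
        CourseId INT           NOT NULL,
        Name     VARCHAR(100)  NOT NULL,
        Value    VARCHAR(4000) NULL
    );

    -- About the only broadly useful index: lead with the columns every lookup
    -- filters on. Anything that filters or sorts on Value still scans, because
    -- Value means something different for every Name.
    CREATE CLUSTERED INDEX IX_ScormTracking_UserCourseName
        ON ScormTracking (UserId, CourseId, Name);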

Joel Coehoorn
+2  A: 

No, there isn't really any hard rule about how many rows you can have in a table; it depends a lot on how much data there is in the rows and how well the data can be indexed.

A quick estimate on the figures that you stated gives something like tens of millions of rows. That's certainly not too much, but it's enough that it could be a problem if you aren't a bit careful.

Perhaps the table could be normalized? Do the same names occur a lot? If so, you could put the names in a separate table and store just the id in the big table.
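
For example, a minimal sketch of that idea (table and column names are assumptions, re-using the hypothetical name/value shape described in the question):

    -- Lookup table with one row per distinct attribute name.
    CREATE TABLE TrackingName (
        NameId INT IDENTITY(1, 1) PRIMARY KEY,
        Name   VARCHAR(100) NOT NULL UNIQUE
    );

    -- The big tracking table then stores a small integer instead of repeating
    -- the same name string millions of times.
    CREATE TABLE ScormTracking (
        UserId   INT           NOT NULL,
        CourseId INT           NOT NULL,
        NameId   INT           NOT NULL REFERENCES TrackingName (NameId),
        Value    VARCHAR(4000) NULL
    );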

Guffa
+1  A: 

I don't think there is really a limit here other than drive space. But please add good indexes while the table is small, because when the table is huge, indexes take a lot longer to add. Also, if you have bad indexes, queries will slow down as the table grows, and people will complain when there is really nothing wrong except a poor (or missing) index.

Jojo
+2  A: 

I once worked on a web form system with over 300 million rows in its name/value pair table. Many of the forms had over 300 rows per form submission. Performance wasn't too bad actually, but it was a total PITA to query from! My SQL-writing ability definitely improved over the life of that gig.

But IMHO, if you have any say, get rid of it in favor of a standard normalized table.
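
To illustrate the querying pain (not taken from that system; the table is the hypothetical one from the question and the SCORM 1.2 element names are just examples), pulling three attributes back out of a name/value table means pivoting:

    -- With a normalized table this would be a plain
    -- "SELECT Score, LessonStatus, SessionTime FROM ..." query.
    SELECT
        t.UserId,
        t.CourseId,
        MAX(CASE WHEN t.Name = 'cmi.core.score.raw'     THEN t.Value END) AS Score,
        MAX(CASE WHEN t.Name = 'cmi.core.lesson_status' THEN t.Value END) AS LessonStatus,
        MAX(CASE WHEN t.Name = 'cmi.core.session_time'  THEN t.Value END) AS SessionTime
    FROM ScormTracking AS t
    GROUP BY t.UserId, t.CourseId;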

John MacIntyre
A: 

I've worked on databases where we tried to create tables with 2B rows of data - that didn't work; we got to 500M and re-designed. One of the biggest gotchas of working with such a large table was the time taken to do deletions - I often see the approach where old records are archived and then deleted from the main table. If the table is big enough, that deletion will run for many hours as the indexes are rebuilt.

Not sure where the cut-off is, but gut feel indicates that a table with more than 10M rows is probably too big. Our approach was to partition data by date, so we ended up with a table for a week of data, another summary table for months, and another summary for years - very common in data warehousing. BTW, this was on SQL 7.0; interested to know if the DBs are better at this type of stuff yet?
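
For reference, SQL Server 2005 (released well after the SQL 7.0 setup described above) added native table partitioning, so the same partition-by-date idea can live inside one logical table. A rough sketch, where the object names, dates, and column layout are made up:

    -- Partition the tracking data by month on a date column.
    CREATE PARTITION FUNCTION pfTrackingByMonth (DATETIME)
        AS RANGE RIGHT FOR VALUES ('2009-01-01', '2009-02-01', '2009-03-01');

    CREATE PARTITION SCHEME psTrackingByMonth
        AS PARTITION pfTrackingByMonth ALL TO ([PRIMARY]);

    CREATE TABLE TrackingData (
        UserId     INT           NOT NULL,
        CourseId   INT           NOT NULL,
        RecordedAt DATETIME      NOT NULL,
        Name       VARCHAR(100)  NOT NULL,
        Value      VARCHAR(4000) NULL
    ) ON psTrackingByMonth (RecordedAt);

    -- Ageing out a month becomes a metadata operation instead of a multi-hour DELETE:
    -- ALTER TABLE TrackingData SWITCH PARTITION 1 TO TrackingDataArchive;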

MrTelly
On Oracle you use partitioning. Data with different dates go to different partitions. Old partitions can be archived on tapes and dropped with something like "ALTER TABLE DROP PARTITION" in seconds.
jva
+3  A: 

No hard and fast rule, but there is a hard and fast way to get a number.

Write a program to populate your table with dummy data that roughly approximates the expected form of the actual data (e.g. similar regularity, characters, patterns, etc.). Run performance tests against it using your actual queries with the dummy data, gradually increasing the number of rows in the table, perhaps in steps of 1,000 or 10,000 rows.

At the point where query performance (e.g. queries completed per second) becomes unacceptable, you'll have your "too big" number of rows.
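
A very rough T-SQL sketch of that loop, assuming a hypothetical ScormTracking name/value table like the one described in the question and a made-up "representative" query; real dummy data should mimic your actual value distributions much more closely:

    SET NOCOUNT ON;
    DECLARE @batch INT, @start DATETIME;
    SET @batch = 0;

    WHILE @batch < 100                      -- 100 batches of 10,000 rows = 1M rows
    BEGIN
        -- Flood the table with another batch of fake rows.
        INSERT INTO ScormTracking (UserId, CourseId, Name, Value)
        SELECT TOP 10000
               ABS(CHECKSUM(NEWID())) % 5000,          -- fake user ids
               ABS(CHECKSUM(NEWID())) % 500,           -- fake course ids
               'cmi.core.lesson_status',
               'completed'
        FROM master.dbo.spt_values a CROSS JOIN master.dbo.spt_values b;

        -- Time a representative query at the current table size.
        SET @start = GETDATE();
        SELECT COUNT(*) FROM ScormTracking WHERE UserId = 42 AND CourseId = 7;
        PRINT 'Rows: ' + CAST((@batch + 1) * 10000 AS VARCHAR(20))
            + ', ms: ' + CAST(DATEDIFF(ms, @start, GETDATE()) AS VARCHAR(20));

        SET @batch = @batch + 1;
    END;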

Triynko
You can be creative generating the dummy data. If a table column consists of English text, flood it with random words from a dictionary. If it contains names, download a list of names, cross them to produce fake full names, then flood the table with them at the expected frequency.
Triynko
+1 Nice practical tip there.
Wayne Koorts