views:

417

answers:

7

Hi,

I've been asked to do some performance tests for a new system. It is only just running with a few clients, but as they expect to grow, these are the numbers I work with for my test:

200 clients, 4 years of data, and the data changes every 5 minutes, so for every 5 minutes there is 1 record per client. That means 365*24*12 = 105,120 (roughly 105,000) records per client per year, which comes to about 80 million records for my test. The table has one FK to another table, one PK (uniqueidentifier) and one index on the clientID.
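As a quick sanity check of that arithmetic (plain T-SQL, just re-deriving the figures above):

```sql
-- Row-count estimate: one record per client every 5 minutes.
SELECT 365 * 24 * 12           AS rows_per_client_per_year,     -- 105,120
       365 * 24 * 12 * 4       AS rows_per_client_4_years,      -- 420,480
       365 * 24 * 12 * 4 * 200 AS rows_total_for_200_clients;   -- 84,096,000 (~80 million)
```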

Is this something SQL Server laughs about because it isn't scared at all, is this getting too much for one quad-core 8 GB machine, is this on the edge, or.....

Has anybody had any experience with these kinds of numbers?

+4  A: 

The software can handle it, but can your server? Well, it depends.

Are you just archiving it? Sure, store a hundred billion rows, it doesn't care; the performance problems come in when you're querying the data. The larger it gets, the more space you need in terms of total storage and for operations (deletes, rollback segments, etc.) on that data, preferably in memory, but on a fast tempdb drive if not.

What matters more to SQL Server than processor on large data sets (though processor certainly affects the time it takes, not the threshold of query/data it can handle) is memory and space (both HD and RAM, since it'll overflow to tempdb for large operations); that is speaking in terms of capacity. For performance you need disk IO, memory and processor power all together.

The short answer to "can it handle it" is yes, provided you have enough space. Does it handle it fast enough? That depends on what kind of queries you're running and how much performance matters.

One last thing, don't forget to take a look at other questions here on optimizing large tables.

Nick Craver
+1, I like the IT DEPENDS :)
Mark Schultheiss
+1  A: 

Too many, really :). I'm responsible for a web site which has 2 million registered users. Some of our tables have more than 100 million records, and we can achieve great performance with 4 million daily page views. But I must admit caching with a good architecture is the main reason that things aren't getting ugly.

hakan
+2  A: 

If you are after ultimate high performance, I'd design the PK to not be a uniqueidentifier. If you need to merge data sets, I'd go with an INT IDENTITY + SMALLINT (or even a tinyint) to determine the originating location. You don't say much about your design, but there are issues trying to use uniqueidentifier as a clustered index.
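As a rough sketch of that idea (all table and column names below are invented for illustration), the clustering key could be the INT IDENTITY plus a SMALLINT location column, with a uniqueidentifier kept only as an ordinary non-clustered column if a globally unique handle is still needed:

```sql
-- Illustrative only: dbo.Reading and its columns are placeholders, not the poster's actual schema.
CREATE TABLE dbo.Reading
(
    ReadingId   INT IDENTITY(1,1) NOT NULL,
    LocationId  SMALLINT          NOT NULL,  -- identifies the originating site when data sets are merged
    ClientId    INT               NOT NULL,
    ReadingTime DATETIME          NOT NULL,
    Value       DECIMAL(18,4)     NOT NULL,
    RowGuid     UNIQUEIDENTIFIER  NOT NULL DEFAULT NEWID(),  -- optional, non-clustered, only if a GUID is still required
    CONSTRAINT PK_Reading PRIMARY KEY CLUSTERED (LocationId, ReadingId)
);

CREATE NONCLUSTERED INDEX IX_Reading_ClientId ON dbo.Reading (ClientId, ReadingTime);
```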

Given the proper server hardware, most decent designs will do just fine. Don't plan on running anything except the OS and SQL Server on the machine. The main concern is RAM: for best performance, you'll need enough RAM for the entire database, indexes, etc., and that is beyond what the OS will use up. I've even seen massive servers help bad designs run very well.

KM
+1 for recommending **NOT** to use Uniqueidentifier as PK - actually, it should not be used as **clustering index** - that's the key.
marc_s
+3  A: 

SQL Server will have no trouble storing this many records.

If you have designed your indexes properly, and your database is properly normalized, you will have absolutely no problem accessing an arbitrary number of records. Oftentimes, people make poor design decisions early on when their database has no information in it, and you never know about it, because everything is fast for small "n".

So while I will say that SQL Server can handle what you're doing, I would also say that now would be a good time to sit back and take a look at how your queries are performing with SQL Server Profiler. Is everything still fast? Do you see excessive scanning or hashing in your frequent queries which is leading to degraded performance? If so, now's the time to analyze and fix those issues.
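Alongside Profiler, one rough way to spot indexes that are being scanned far more than they are seeked is the index-usage DMV (a sketch; run it in the database under test):

```sql
-- Indexes with high scan counts relative to seeks are candidates for a closer look.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_scans DESC;
```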


As an aside, people really like to think of size limitations based on number of rows and columns. Try to go a step further and talk about bytes, because ultimately, bytes are what are being scanned in a reporting query, and bytes are what are being stored to disk.

Dave Markle
+2  A: 

The PK field should be as small as possible and not be random - a GUID sucks here. The main problems are:

  • The PK is used in all foreign keys to reference the row, so a large PK uses more space = more IO.
  • A random PK means inserts happen all over the place = many page splits = inefficient index usage.

How bad is that? I know of scenarios where you lose 80% of the speed there.
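One way to see how badly those random-key page splits have fragmented a clustered index is the physical-stats DMV (a sketch; `dbo.Reading` is a placeholder object name):

```sql
-- Fragmentation caused by page splits from random (GUID) keys shows up here.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Reading'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id;
```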

Otherwise - no problem. I have a table in excess of 800 million rows, and things are super fast there ;) Naturally you need to have decent queries, decent indices, and obviously that does not run on a single 5400 RPM green hard disc to be efficient - but given proper IO, no stupid queries and some decent indices, SQL does not balk at a couple of billion rows.

So, while "it depends", the generic answer is that large tables are not a problem... ...unless you do MASS deletes. Deleting half the table will be a HUGH transaction, which is why partitioning is nice for stuff like accounting - one partition table per year means I can get rid of a year data without a DELETE statement ;)

TomTom
+1 for recommending NOT to use Uniqueidentifier as PK - actually, it should not be used as clustering index - that's the key.
marc_s
I always use a bigint, and a uniqueidentifier doesn't seem ok, but why exactly is it bad as a clustered index (is a PK always a clustered index?)
Michel
@Michel, why do you always use a bigint? A regular int will do 2 billion. Do you have any tables that big???? Size your columns properly: a bigint will double your index size vs. an int, and a bigint wastes cache memory and fits fewer items on a page. Every index pulls in the PK, so a big PK is duplicated many times.
KM
@KM I think it's the 'everything is fast for small n' thing...
Michel
@Michel said `but why exactly is it bad as a clustered index (is a PK always a clustered index?)` a clustered index defines the actual physical ordering of the table, like a paper phone book the clustered index is the last name + first name + middle initial. No, a PK is not always clustered, but if you have a surrogate key that is sequential in nature, it is a good choice for a clustered index. ...more...
KM
...more... If you insert your data and your clustered index is a uniqueidentifier (which is basically random), it does not insert the new rows at the end of the table, like it would with an identity. By the nature of a clustered index (physical location), you will need to squeeze new rows in between existing rows. Sometimes there is enough padding space to do it; other times you need to push multiple rows around.
KM
Is it better then to use a sequential GUID?
Michel
A little. It does not change the fact that you use a 128-bit = 16-byte wide primary key, which is a lot slower than an int (4 bytes) as it will use a lot more index pages.
TomTom
Thanks, as you might have guessed I'm not a (SQL Server) database expert, but I'll dive a bit more into it. Is a 16-byte key four times slower than a 4-byte one, or is that too simple a conclusion?
Michel
A: 

Even MS Access can laugh at a half million row table (depending on row size).

If you don't have any queries to profile, think of the table as a file. The row count isn't the important number compared to the output of sp_spaceused.

If you do have some queries, think of the table as a data structure. How can the query be fulfilled with the minimum amount of IO? Use the query plan, and SET STATISTICS IO ON.
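For example (the table name and predicate below are placeholders, not the poster's schema):

```sql
-- Size of the table in pages/KB rather than rows.
EXEC sp_spaceused 'dbo.Reading';

-- Measure how many pages a query actually reads, then inspect its plan.
SET STATISTICS IO ON;
SELECT COUNT(*)
FROM dbo.Reading
WHERE ClientId = 42
  AND ReadingTime >= '2010-01-01';
SET STATISTICS IO OFF;
```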

David B
+1  A: 

SQL Server can handle terabytes worth of data. The kicker is that you have to design correctly and have the right equipment mix. You might need partitioning, for instance. You definitely do need to think about every millisecond of performance on every query and avoid poorly performing designs and querying techniques like EAV tables, correlated subqueries, cursors, and "like '%sometext%'".
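To illustrate the last point (hypothetical table and indexed column): a leading wildcard prevents an index seek and forces a scan of every row, while an equivalent prefix search on the same indexed column can be satisfied with a seek:

```sql
-- Leading wildcard: cannot seek on an index over Name, scans the whole index/table.
SELECT * FROM dbo.Client WHERE Name LIKE '%smith%';

-- Prefix search: can use an index seek on Name.
SELECT * FROM dbo.Client WHERE Name LIKE 'smith%';
```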

If you expect your database to be that large, then buy and read, cover to cover, a book on performance tuning before starting the design. Bad design kills database performance, and it is extremely hard to correct once you have 80,000,000 records.

I also suggest you find a DBA with experience with high-performance, high-volume databases. This is a whole new game design-wise, and it needs to be thought out from the beginning.

Good for you for doing this kind of testing now before the system has that number of records.

HLGEM