Hello,

I'm very new to performance engineering, so I have a very basic question.

I'm working on a client-server system with a SQL Server backend. The application is a huge tax-related application whose performance must be tested at peak load, meaning there should be roughly 10 million tax returns in the system when we run the scenarios for creating and submitting tax returns. A proportional number of users will also need to be created.

Now I'm hearing in meetings that we need to create 10 million records to test performance and run scenarios with 5,000 users, and I just don't think that is feasible.

When I suggest creating a smaller dataset and extrapolating from it for capacity planning, the most common answer I hear is that we need the full 10 million records because a smaller data set cannot tell us how the database or the network will behave.

So how does one plan capacity and test the performance of a large enterprise application without creating a peak-level data set or running the peak number of scenarios?

Thanks.

+1  A: 

Refer to this and this.

HotTester
Nice links. Maybe add some text indicating these are for .NET environments.
MikeJ
A: 

Have a peek at "The Art of Application Performance Testing" by Ian Molyneaux (O'Reilly, 2009).

rleir
A: 

Your test data is ideally a realistic variety of records. But for first approximations you could have just a few unique records, and duplicate them until you have the desired size. Then use ApacheBench to roughly approximate the traffic.
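A minimal sketch of the duplication idea (not rleir's actual script; the table, column names, and connection string are all hypothetical): keep copying a small seed table back into itself in SQL Server until it reaches the target row count.

```python
# Sketch: inflate a seed table by repeatedly copying it into itself.
# Assumes a hypothetical dbo.TaxReturns table with an identity key column.
import pyodbc

TARGET_ROWS = 10_000_000

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=testdb;DATABASE=TaxTest;Trusted_Connection=yes;"
)
cur = conn.cursor()

while True:
    cur.execute("SELECT COUNT(*) FROM dbo.TaxReturns")
    if cur.fetchone()[0] >= TARGET_ROWS:
        break
    # Re-insert every existing row (identity column omitted so new keys are
    # assigned); the row count roughly doubles on each pass.
    cur.execute("""
        INSERT INTO dbo.TaxReturns (TaxpayerId, TaxYear, FilingStatus, TotalIncome)
        SELECT TaxpayerId, TaxYear, FilingStatus, TotalIncome
        FROM dbo.TaxReturns
    """)
    conn.commit()

conn.close()
```

As Eric J. notes below, heavily duplicated values will skew index selectivity, so vary at least the indexed columns if you can.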

rleir
If you exactly duplicate records, you don't really get true insight into performance. For example, keys that work well in real life with mostly unique values won't work as well with lots of duplicate values.
Eric J.
+1  A: 

I would take a look at Redgate's SQL Data Generator. It does a good job of generating representative data.

Mitch Wheat
+3  A: 

Personally, I would throw as much data and traffic at it as you can. Forget what traffic you "think you need to handle"; just see how much traffic you CAN handle and go from there. Knowing the limits of your system is more valuable than simply knowing it can handle 10 million records.

Maybe it does handle 10 million, but at 11 million it dies a horrible death. Or maybe it's well written and will scale to 100 million before it dies. There's a very distinct difference between the two, even though both pass the "10 million test".
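As a rough illustration of that approach, the sketch below keeps ramping concurrent users until response times or errors fall off a cliff. The endpoint, step sizes, and failure threshold are all placeholders; a real test would use a proper load tool (such as the ApacheBench suggestion above) rather than a thread-per-user loop.

```python
# Sketch: ramp up concurrency until the system "dies a horrible death".
# URL, user steps, and the 10 s threshold are hypothetical.
import concurrent.futures
import time
import urllib.request

URL = "http://test-server/tax-return/create"  # hypothetical endpoint

def one_request() -> float:
    try:
        start = time.perf_counter()
        urllib.request.urlopen(URL, timeout=30).read()
        return time.perf_counter() - start
    except Exception:
        return float("inf")  # treat errors/timeouts as failed requests

for users in (50, 100, 500, 1000, 2000, 5000, 10000):
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        times = list(pool.map(lambda _: one_request(), range(users)))
    avg = sum(times) / len(times)
    print(f"{users} concurrent users: average response {avg:.2f}s")
    if avg > 10:  # arbitrary "unacceptable" threshold
        print(f"Breaking point is somewhere around {users} users")
        break
```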

Chad
Not a bad idea, but I believe it's often more important to test with "realistic" data than to test how much your system can handle. But of course that depends on the expected use.
sleske
@sleske, well you should be testing with realistic data. But unrealistic volumes are quite beneficial.
Chad
Yes, we are going to try to find the point where the system breaks above 10 million. At the moment we can't even get to our goal :)
A: 

To help generate data, look at Ruby's faker gem and Perl's Data::Faker. I have had good luck with them in generating large data sets for testing. Redgate's SQL Data Generator is good too.
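For illustration, here is the same idea using Python's faker package (a close analogue of the Ruby and Perl libraries mentioned). The field names are invented; the resulting CSV could then be bulk-loaded with bcp or BULK INSERT.

```python
# Sketch: generate a large CSV of plausible-looking taxpayer records.
# Column names are hypothetical; adjust row count and fields to your schema.
import csv
from faker import Faker

fake = Faker()

with open("taxpayers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "ssn", "address", "email", "income"])
    for _ in range(100_000):  # scale up to whatever volume you need
        writer.writerow([
            fake.name(),
            fake.ssn(),
            fake.address().replace("\n", ", "),
            fake.email(),
            fake.pydecimal(left_digits=6, right_digits=2, positive=True),
        ])
```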

MikeJ
Thank you. We are looking at this tool.
+2  A: 

Now I'm hearing in meetings that we need to create 10 million records to test performance and run scenarios with 5,000 users, and I just don't think that is feasible.

Why do you think so?

Of course you can (and should) test with limited amounts of data, but you also really, really need to test with a realistic load, which means testing with the amount (and type) of data that you will use in production.

This is just a special case of a general rule: for system or integration testing, you need a scenario that is as close as possible to production; ideally you copy/clone a live production system, data, config and all, and use that for testing. That is actually what we do (when we technically can and the client agrees). We just run a few SQL scripts to randomize personal data in the test data set, to prevent privacy concerns.
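That randomization step might look something like the following hypothetical Python/pyodbc version of such a script. Table and column names are invented, a real script would cover every table holding personal data, and set-based SQL would be faster at this scale.

```python
# Sketch: overwrite identifying fields in a cloned production database with
# fake values, keeping keys and non-personal columns intact so the data
# volume and distribution stay realistic.
import pyodbc
from faker import Faker

fake = Faker()
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=testdb;DATABASE=TaxTestClone;Trusted_Connection=yes;"
)
cur = conn.cursor()

cur.execute("SELECT TaxpayerId FROM dbo.Taxpayers")
ids = [row[0] for row in cur.fetchall()]

for taxpayer_id in ids:
    cur.execute(
        "UPDATE dbo.Taxpayers SET FullName = ?, Ssn = ?, Street = ? "
        "WHERE TaxpayerId = ?",
        (fake.name(), fake.ssn(), fake.street_address(), taxpayer_id),
    )
conn.commit()
conn.close()
```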

There are always issues that crop up because production data is somehow different from what you tested on, and this is the only way to prevent (or at least limit) these problems.

I've planned and implemented reporting and imports, and they invariably break or misbehave the first time they're exposed to real data, because there are always special cases or scaling problems you didn't expect. You want that breakage to happen during development, not in production :-).

In short:

Bite the bullet, and (after having done all the tests with "toy data"), get a realistic dataset to test on. If you don't have the hardware to handle that, then you don't have the right hardware for your tests :-).

sleske
+1 for this. In some apps, testing with anything less than 10 *billion* records is useless. A fake test can only get you fake results.
Aaronaught
Thank you so much. I guess we are going to have to get as close to the load as possible. Your answer was really helpful.
You're welcome :-). I'm glad the answer helped you.
sleske