views: 196

answers: 4
A Windows Forms application of ours pulls records from a view on SQL Server through ADO.NET and a SOAP web service, displaying them in a data grid. We have had several cases with ~25,000 rows, which works relatively smoothly, but a potential customer needs many times that many rows in a single list.

To figure out how well we scale right now, and how (and how far) we can realistically improve, I'd like to implement a simulation: instead of displaying actual data, have the SQL Server send fictional, random data. The client and transport side would be mostly the same; the view (or at least the underlying table) would of course work differently. The user specifies the number of fictional rows (e.g. 100,000).

For the time being, I just want to know how long it takes for the client to retrieve and process the data, up to the point where it is just about ready to display it.

What I'm trying to figure out is this: how do I make the SQL Server send such data?

Do I:

  1. Create a stored procedure that has to be run beforehand to fill an actual table?
  2. Create a function that I point the view to, thus having the server generate the data 'live'?
  3. Somehow replicate and/or randomize existing data?

The first option sounds to me like it would yield the results closest to the real world. Because the data is actually 'physically there', the SELECT query would perform quite similarly to one on real data. However, it taxes the server with an otherwise meaningless operation. The fake data would also be backed up, as it would live in the same database as the real data, unless, of course, I delete the fake data after each benchmark run.

The second and third options tax the server while running the actual simulation, thus potentially giving unrealistically slow results.


In addition, I'm unsure how to create those rows, short of using a loop or cursor. I can use SELECT top <n> random1(), random2(), […] FROM foo if foo actually happens to have <n> entries, but otherwise I'll (obviously) only get as many rows as foo happens to have. A GROUP BY newid() or similar doesn't appear to do the trick.
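
One set-based possibility might be to cross join a large catalog view with itself, so that the FROM clause yields far more rows than needed, and then take TOP (n) with per-row random expressions; I haven't verified this, and the column names below are just placeholders:

DECLARE @rowCount int;
SET @rowCount = 100000;

SELECT TOP (@rowCount)
    NEWID()                          AS RandomGuid,  -- different value on every row
    ABS(CHECKSUM(NEWID())) % 100000  AS RandomInt    -- pseudo-random integer 0-99999
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;                     -- far more than 100,000 combinations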

A: 

If you want results you can rely on, you need to make the testing scenario as realistic as possible, which makes option 1 by far your best bet. As you point out, if you get results that aren't good enough with the other options, you won't be sure whether that was due to the different database behaviour.

How you generate the data will depend to a large degree on the problem domain. Can you take data sets from multiple customers and merge them into a single mega-dataset? If the data is time series then maybe it can be duplicated over a different range.
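
As a rough sketch of that duplication idea (the table and column names here are placeholders, not taken from your schema), you could copy the existing rows with the date column shifted so the duplicates land in a different range:

-- Placeholder names: actualData is the real table, benchmark the copy target,
-- CreatedOn the time-series column; each duplicate is shifted back by one year.
INSERT INTO benchmark
SELECT DATEADD(year, -1, CreatedOn), OtherColumn1, OtherColumn2
FROM actualData;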

Rob Walker
A: 

The data is typically CRM-like, i.e. contacts, projects, etc. It would be fine to simply duplicate the data (e.g., if I only have 20,000 rows, I'll copy them five times to get my desired 100,000 rows). Merging, on the other hand, would only work if we never deploy the benchmarking tool publicly, for obvious privacy reasons (unless, of course, I apply a function to each column that renders the original data irreversibly unintelligible, similar to a hashing function, but without changing the value's size too much).
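
Something along these lines is roughly what I have in mind for the obfuscation (the column name is a placeholder, and the binary-to-hex CONVERT style needs SQL Server 2008 or later, so on 2005 an equivalent hex-conversion helper would be required; a sketch only):

-- Replace each value with an irreversible digest, trimmed to roughly the original length.
UPDATE benchmark
SET ContactName = LEFT(CONVERT(varchar(64), HASHBYTES('MD5', ContactName), 2),
                       LEN(ContactName));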

To populate the rows, perhaps something like this would do:

-- Keep copying rows from the real table until the benchmark table holds at least
-- 100,000 of them; the last pass may overshoot by up to one full copy of actualData.
WHILE (SELECT COUNT(*) FROM benchmark) < 100000
  INSERT INTO benchmark
  SELECT TOP 100000 * FROM actualData;
Sören Kuklau
+2  A: 

For test data for CRM-type tables, I highly recommend fakenamegenerator.com; you can get 40,000 fake names for free.

SqlACID
+1  A: 

You didn't mention whether you're using SQL Server 2008. If you use 2008 and you use Data Compression, be aware that random data will behave very differently from (and slower than) real data, because random data is much harder to compress.
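
For reference, on 2008 enabling compression on the benchmark table (using the table name from earlier in the thread) is a one-liner; consider it a sketch, since your exact setup may differ:

-- SQL Server 2008 only: page-compress the table before re-running the benchmark.
ALTER TABLE benchmark REBUILD WITH (DATA_COMPRESSION = PAGE);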

Quest Toad for SQL Server and Microsoft Visual Studio Data Dude both have test data generators that will put fake "real" data into records for you.

Brent Ozar
Good point. I've re-tagged my post; we're still using 2005.
Sören Kuklau