I have about 4 days to load 4 million rows of data, and the stored procedures appear to take about 500 ms/row, so I am suggesting we temporarily repurpose two computers: one as the SQL Server and one to feed it. Each machine has 4 CPUs with 2 cores per CPU and plenty of memory.
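
(Back-of-envelope: 4,000,000 rows at roughly 500 ms each is about 2,000,000 seconds of stored-procedure time, or around 23 days if done serially, so the load has to run heavily in parallel to have any chance of fitting into 4 days.)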

There are two competing ideas on how to load the data.

  1. keep using a VM on the import computer, which limits me to 4 cores.
  2. ditch the VM for the load and use all 8 cores.

The database will be on the actual computer, no VM.

I will get one shot at showing that using the VM considerably impacts performance, but right now that is just a gut feeling and I need to prove it. So I am going to suggest we do a test load with a smaller dataset: rather than loading about 12 GB of data we will do perhaps only 6 GB, and get numbers.

But, since I have one shot at this, I want to get a reasonable guess at how many threads can be used in both environments.

My guess is 4 threads/core, since database transactions involve a great deal of time spent waiting, but I don't know whether that would be too many threads on the VM. I am not certain how to determine a reasonable number of threads to suggest.

The limiting factor may not even be the computer running the load program; the database machine may turn out to be the real bottleneck, and 32 concurrent database connections may be too much for an 8-core machine.
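
Here is roughly how I picture the numbers relating (just a sketch; the server name and catalog are placeholders, and Max Pool Size is the standard ADO.NET connection-string setting for putting a hard ceiling on pooled connections if 32 turns out to be too many for the database box):

    using System;

    class ConnectionBudget
    {
        static void Main()
        {
            int cores = Environment.ProcessorCount;   // 8 on the bare-metal box, 4 inside the VM
            int threadsPerCore = 4;                   // the guess being tested
            int workers = cores * threadsPerCore;     // 32 (bare metal) or 16 (VM)

            Console.WriteLine("{0} worker threads -> up to {0} concurrent connections", workers);

            // If the database box can't handle that many connections, the pool
            // can be capped independently of the feeder's thread count.
            string connStr = "Data Source=DBSERVER;Initial Catalog=LoadTest;" +
                             "Integrated Security=True;Max Pool Size=32";
            Console.WriteLine(connStr);
        }
    }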

So, is my guess of 4 threads per core reasonable in both environments?

The OS on the computers is Windows Server 2008 R2 and the VM would be using Hyper-V.

UPDATE:

In case it matters, I am on .NET 4 and use the parallel framework (PLINQ) heavily, so I can adjust the number of concurrent threads simply by changing a value in my configuration file.
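
For reference, the feeder is shaped roughly like this (a simplified sketch; the appSettings key name, ReadRows, and LoadRow are placeholders for my real code):

    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Linq;

    class Feeder
    {
        static void Main()
        {
            // The degree of parallelism comes from the config file, so it can be
            // changed between test runs without recompiling.
            int dop = int.Parse(ConfigurationManager.AppSettings["DegreeOfParallelism"]);

            ReadRows()
                .AsParallel()
                .WithDegreeOfParallelism(dop)   // e.g. 16 inside the VM, 32 on bare metal
                .ForAll(LoadRow);               // one stored-procedure call per row
        }

        static IEnumerable<string> ReadRows()
        {
            yield return "example row";         // stand-in for the real 4M-row source
        }

        static void LoadRow(string row)
        {
            // stand-in for the ~500 ms stored-procedure call
        }
    }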

UPDATE 2:

I had an idea today to speed things up: we will benchmark using one 4-core VM, but SQL Server will use a 24 GB virtual drive (RAM drive), since the computer has 48 GB of RAM, so the database will be completely in memory. That way I should be able to feed it from several threads as fast as the feeder program can go. If that isn't fast enough, we will use all 12 cores to feed it, but they expect that won't be needed. So I should hopefully be able to load all the data in 1 day instead of a bit over a month; once I get some benchmarks I will let everyone know. I may also try feeding it from a smaller desktop to see whether there is a penalty for using a VM, but there will only be one VM on this box, so 8 cores will sit unused. The OS on the Hyper-V box is Windows Server Core, btw.

A: 

There are so many variables involved that I would say your guess is as good as any. You have thought it out reasonably and testing will be your best bet to see if you need to adjust for the big run.

klabranche
A: 

Make a small test, and run it with 1, 2, 3, 4, 5, and 6 threads per core. Graph it out. That should show the sweet spot pretty clearly...
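
Something like this, maybe (a rough sketch; LoadRow stands in for whatever your real per-row stored-procedure call is):

    using System;
    using System.Diagnostics;
    using System.Linq;

    class ThreadCountBenchmark
    {
        static void Main()
        {
            int cores = Environment.ProcessorCount;
            int[] sampleRows = Enumerable.Range(0, 1000).ToArray();   // small fixed batch

            for (int perCore = 1; perCore <= 6; perCore++)
            {
                int dop = cores * perCore;
                Stopwatch sw = Stopwatch.StartNew();

                sampleRows.AsParallel()
                          .WithDegreeOfParallelism(dop)
                          .ForAll(LoadRow);   // same call the real load would make

                sw.Stop();
                Console.WriteLine("{0} threads/core ({1} total): {2:F1} rows/sec",
                                  perCore, dop, sampleRows.Length / sw.Elapsed.TotalSeconds);
            }
        }

        static void LoadRow(int row)
        {
            // stand-in for the per-row stored-procedure call (~500 ms each)
        }
    }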

bwawok
I would like to do that, but I don't think I will be given the opportunity, as I won't have the time; I expect I will get one shot at comparing the two options.
James Black