Let's say at your job your boss says,

That system over there, which has lost all institutional knowledge but seems to run pretty good right now, could we dump double the data in it and survive?

You're completely unfamiliar with the system.

It's in SQL Server 2000 (primarily a database app).

There's no test environment.

You might be able to hijack it on the weekends if you needed to run a benchmark.

What are the 3 things you'd do to convince yourself, and then your manager, that you could take on that extra load? And if you couldn't do it on the same hardware, how much extra hardware (measured in dollars) would it take to satisfy that request?

To address the response from doofledorfer: your assumptions are almost all 180 degrees off. But that's my fault for an ambiguous question.

  1. One of the main servers runs 7x24 at a 70% baseline and spikes from there, and no one knows what it is doing (see the sysprocesses sketch after this list).

  2. This isn't an issue of buy-in or whining... Our company may not have much of a choice in the matter.

  3. Because this is being externally mandated, delays in implementation could result in huge fines, so large meetings to assess risk are almost impossible. There is one risk: that dumping double the data would take the system down for the existing customers.
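
On point 1, even a plain sysprocesses snapshot would be a start -- this is only a sketch (the spid cutoff and the ordering are assumptions), but it shows roughly who is doing the work on that server:

    -- Sketch (SQL Server 2000): who is doing the work on the mystery server?
    SELECT spid,
           DB_NAME(dbid)  AS database_name,
           hostname,
           program_name,
           cpu,           -- cumulative CPU time (ms) for the connection
           physical_io,   -- cumulative reads + writes for the connection
           status,
           cmd
    FROM master.dbo.sysprocesses
    WHERE spid > 50       -- assumption: skip system sessions
    ORDER BY cpu DESC

    -- For any suspicious spid, DBCC INPUTBUFFER(<spid>) shows the last batch it submitted.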

I was hoping someone would say something like: take the system offline Sunday night at midnight and run SQLIO tests to see how close the storage subsystem is to saturation. Things like that.
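
Along those lines, a rough in-database companion to a weekend SQLIO run is to read the cumulative file I/O statistics before and after the test window. Sketch only; the database name and file id below are placeholders:

    -- Sketch (SQL Server 2000): cumulative I/O statistics for one database file.
    -- Take a reading before and after the benchmark window; if IoStallMS grows much
    -- faster than NumberReads + NumberWrites, the storage subsystem is probably
    -- already close to saturation.
    DECLARE @db int, @file int
    SELECT @db = DB_ID('YourBigDb')   -- placeholder database name
    SET @file = 1                     -- placeholder data file id

    SELECT DbId,
           FileId,
           NumberReads,
           NumberWrites,
           BytesRead,
           BytesWritten,
           IoStallMS                  -- total ms spent waiting on I/O for this file
    FROM ::fn_virtualfilestats(@db, @file)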

A: 

It mostly depends on the current data volume. If doubling means going from 2GB to 4GB, just do it. If it means going from 1TB to 2TB, you've got some planning to do.

I'd collect some info using Performance Monitor and provide it to help make an educated decision.
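
For example (just a sketch, and the counter list is an assumption about what matters here), the same counters Performance Monitor exposes can be read from master.dbo.sysperfinfo, so a scheduled job could log them without anyone sitting at the console:

    -- Sketch (SQL Server 2000): snapshot a few headline counters. The "/sec" counters
    -- are stored as running totals, so you need two samples over a known interval to
    -- turn them into rates.
    SELECT object_name, counter_name, instance_name, cntr_value
    FROM master.dbo.sysperfinfo
    WHERE counter_name IN ('Batch Requests/sec',
                           'Transactions/sec',
                           'Buffer cache hit ratio',
                           'Page life expectancy',
                           'User Connections')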

Kyle West
+2  A: 
  1. Set up a test environment, even if I have to do it on my laptop.

  2. Enable some kind of logging on the production system to get an idea of the volume of transactions in addition to the volume of data (see the counter-sampling sketch after this list).

  3. Read the source code as I run stress tests on my laptop with increasing amounts of data.
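
For item 2, a minimal sketch of that sampling, assuming the cumulative 'Transactions/sec' counter and a five-minute window are good enough to start with:

    -- Sketch (SQL Server 2000): estimate transaction volume from two samples of the
    -- cumulative Transactions/sec counter.
    DECLARE @v1 bigint, @v2 bigint, @seconds int
    SET @seconds = 300                        -- 5-minute sample window (placeholder)

    SELECT @v1 = cntr_value
    FROM master.dbo.sysperfinfo
    WHERE object_name LIKE '%:Databases%'
      AND counter_name = 'Transactions/sec'
      AND instance_name = '_Total'

    WAITFOR DELAY '00:05:00'

    SELECT @v2 = cntr_value
    FROM master.dbo.sysperfinfo
    WHERE object_name LIKE '%:Databases%'
      AND counter_name = 'Transactions/sec'
      AND instance_name = '_Total'

    SELECT (@v2 - @v1) / @seconds AS avg_transactions_per_sec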

Having said that, I sympathize with this assignment, because it's unfair. It's like asking someone in a boat if the boat can float with twice the cargo -- but you can't get out of the boat or take it out of its regular service.

Bill Karwin
Very good analogy. I love it. Seriously, I'm going to use that like 50 times by the end of next year and never give you credit. ;-) mod +2
Unfortunately, I don't doubt you will have plenty of opportunity to use the analogy. This kind of hopeless assignment is very common.
Bill Karwin
Hmm. I wonder if I can get the "good analogy" badge? :-)
Bill Karwin
+2  A: 

You've just described a typical Agile project. Your answer should be:

  1. I don't know, and I won't be able to tell without testing.
  2. In addition to data volume, there might be issues with usage patterns, application interactions, database and server tuning, etc.
  3. So let's work through a basic list of risk factors, and how we might resolve them.
  4. Once we've done that, let's work through them in inverse order of risk, and make a stop/continue decision as we develop the results.
  5. etc.

Without management buy-in and participation at least at that level, any other answer you might give is high-risk wishing, and "3 most important" is a non sequitur.

I'd be optimistic unless your current system is substantially loaded already. Most servers should run at less than 50% capacity on all resources, or else be on life-support. And I expect you wouldn't be having the conversation if the existing server were already dealing with load issues; although "seems to run pretty good right now" is imprecise enough to be worrisome.
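
One cheap sanity check on headroom (a sketch only; what counts as "too often" will vary) is how frequently sessions sit blocked during normal hours:

    -- Sketch (SQL Server 2000): how many sessions are blocked right now?
    -- Run periodically under normal load; consistently non-zero counts suggest
    -- there is little headroom left for doubled data.
    SELECT COUNT(*) AS blocked_sessions
    FROM master.dbo.sysprocesses
    WHERE blocked <> 0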

le dorfier
A: 

It depends what you mean by "double the data".

If that is going to affect one table only (say, the product table), then you are probably safe, as most queries referring to that table are likely to roughly double in execution time (assuming you do not reference the same table twice in a query).

The problem arises if you double the amount of data in all the tables, as execution time can then grow much faster than linearly (a join between two doubled tables can take roughly four times as long), which can lead to some serious issues.
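
A quick way to see which case you are in (sketch only; sysindexes row counts are approximate) is to list rows per user table and see where the data actually lives:

    -- Sketch (SQL Server 2000): approximate row counts per user table.
    -- indid 0 = heap, indid 1 = clustered index; rows is an estimate, not an exact count.
    SELECT o.name  AS table_name,
           i.rows  AS approx_rows
    FROM sysobjects o
    JOIN sysindexes i ON o.id = i.id
    WHERE o.xtype = 'U'
      AND i.indid IN (0, 1)
    ORDER BY i.rows DESC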

But in general, I would support the answer by doofledorfer.

kristof
Double the data means each source table in the database will have twice the rows (customers, accounts, locations, stuff like that), and the tables that get results added every day will have twice the rows added each day. IOW, it won't double immediately, but the daily addition will double.
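
A sketch of how that growth could be baselined (the table name is a placeholder): capture sizes now, capture them again in a few days, and double the daily delta to project the new rate:

    -- Sketch (SQL Server 2000): capture current sizes so the doubled daily growth can be projected.
    EXEC sp_spaceused                      -- whole database: reserved / data / index / unused
    EXEC sp_spaceused 'DailyResults'       -- one daily-append table (placeholder name)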