Are there any industry standard formulas or rules of thumb for determining:

  1. Application bandwidth usage/requirements
  2. Database growth requirements

I have recently started managing a new .NET 3.5/SQL Server project and would like to take a more structured approach than I have in the past when determining exactly what my application needs in terms of storage and bandwidth. If anyone out there has any pointers I would greatly appreciate it!

A: 

Irreverently I would point you at Parkinson's Law of Data.

However, for each table in a database I try to get an idea of the average record size (especially when dealing with variable-length fields like varchars) and then multiply that by the number of records I expect to be added over a year. Then I add the totals for all the tables together, round up to one significant digit, and double the result. That leaves plenty of room for overhead and growth.

 round_up_to_one_sig_digit(sum(average_table_row_size 
                             * num_rows_in_one_year)) * 2
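
A minimal sketch of that rule of thumb in Python; the helper name and the per-table figures in the example are made up for illustration:

    import math

    def round_up_to_one_sig_digit(x):
        """Round up so only the leading digit is significant, e.g. 4300 -> 5000."""
        if x <= 0:
            return 0
        magnitude = 10 ** math.floor(math.log10(x))
        return math.ceil(x / magnitude) * magnitude

    def estimate_yearly_storage_bytes(tables):
        """tables: iterable of (average_row_bytes, rows_added_per_year) per table."""
        raw = sum(avg_row * rows for avg_row, rows in tables)
        return round_up_to_one_sig_digit(raw) * 2  # double for overhead and growth

    # Hypothetical example: two tables
    print(estimate_yearly_storage_bytes([(512, 100_000), (2048, 20_000)]))  # 200000000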

A similar approach works for network capacity, but you will run into some peculiarities of humans and networks: they don't all log in at evenly spaced intervals, so you get peaks during the day/evening and valleys in the early morning. You also don't want to exceed about 80% of your network capacity, or performance just plain tanks with collisions, etc.
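
A back-of-the-envelope sketch of that peak-hour sizing; the concurrency and per-user traffic figures are pure assumptions you would replace with your own measurements:

    # Illustrative only: both inputs below are assumptions, not measurements.
    peak_concurrent_users = 200      # assumed busiest-hour concurrency
    kbps_per_user = 50               # assumed average traffic per active user

    peak_demand_kbps = peak_concurrent_users * kbps_per_user
    required_capacity_kbps = peak_demand_kbps / 0.8   # keep peak under ~80% utilisation
    print(required_capacity_kbps)    # 12500.0 kbps, i.e. roughly a 12.5 Mbps link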

CodeSlave
+1  A: 

I am not an SQL Server expert, but in general, for database sizing, the best way forward is to understand the schema a little. For example, are there partitions in the database? Are there a lot of indexes?

Now multiply the number of records coming into the database in each transaction by the frequency of transactions per hour. This gives the total number of records arriving per hour. Multiply that by the average row size, and you have the size of the database without partition and index overhead. To calculate the partition overhead, you need to understand the type of partition (range partition, hash partition, etc.) and the number of partitions that will be created per hour or per day, then add up the space overhead for those partitions. Usually this number needs to be bumped up by about 50% to estimate the size of the database.

For the network, there are many ways to do it. I run Ethereal to capture the network traffic. Once you capture traffic, it becomes interesting to see the seasonality of the data: when the peak hours are, what the maximum bandwidth usage is at the busy hours, and so on. Then you need a good tool to do the forecasting, one that handles the seasonality in the data, understands the trend, and forecasts approximately what will happen if you increase the load. A simple graph and a fitted line using y = mx + c will also help you here.
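
A rough sketch of the hourly sizing and the y = mx + c trend fit described above; every input number here is a hypothetical placeholder, not a measurement:

    import numpy as np

    # --- Database sizing (hypothetical inputs) ---
    records_per_txn = 20          # assumed rows written per transaction
    txns_per_hour = 1_500         # assumed transaction frequency
    avg_row_bytes = 400           # assumed average row size

    raw_bytes_per_hour = records_per_txn * txns_per_hour * avg_row_bytes
    estimated_bytes_per_hour = raw_bytes_per_hour * 1.5   # +50% for partition/index overhead
    print(f"~{estimated_bytes_per_hour / 1e6:.1f} MB per hour")

    # --- Bandwidth trend: fit y = m*x + c to observed daily peak usage (Mbps) ---
    days = np.array([1, 2, 3, 4, 5, 6, 7])
    peak_mbps = np.array([3.1, 3.4, 3.3, 3.9, 4.2, 4.1, 4.6])  # sample measurements
    m, c = np.polyfit(days, peak_mbps, 1)
    print(f"Projected peak on day 30: {m * 30 + c:.1f} Mbps")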

Shamik
+1  A: 

Disclosure first: I work for Quest Software, a company that makes performance management and capacity planning software.

There are a lot of products out there to answer those needs. Quest makes a few, like Spotlight for SQL Server, Spotlight for IIS, Capacity Manager for SQL Server, and so on. There's no single formula or rule of thumb because each component in the system reacts differently to load, and each thing you're storing scales differently.

For example, if you're storing sales data in a data warehouse, your sales data will grow pretty linearly. It's a simple formula of:

(Days Open) * (Transactions Per Day) * (Items Per Transaction)

When you first open your shop, the transactions per day is fairly low, but as word spreads about your business, the transactions per day rises. If you start carrying more items (like Amazon going from books to everything), your items per transaction may rise as well - but not necessarily. Over time, as your reporting needs grow, you'll implement aggregate tables to include data about your customers, demographics, etc., and that'll change the amount of data you store as well.
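
As an illustration of how that formula plays out when the transaction rate grows over the first year (all of the numbers below are invented):

    # Hypothetical projection of the sales line-item row count; every input is an assumption.
    days_open = 365
    starting_txns_per_day = 50
    monthly_growth = 1.05         # assumed 5% month-over-month growth in transactions
    items_per_txn = 3             # assumed average basket size

    total_rows = 0.0
    for day in range(days_open):
        txns_today = starting_txns_per_day * (monthly_growth ** (day / 30))
        total_rows += txns_today * items_per_txn

    print(f"~{total_rows:,.0f} line-item rows in the first year")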

On the other hand, if you're building a web filtering application, the formula revolves around the number of employees each company has. People will surf roughly the same amount over time, but the formula is affected by whether you're going to hire more people or lay people off.

If you nail down a formula to predict your data growth, that formula isn't necessarily useful to predict, say, your CPU needs or your bandwidth needs.

As a result, each capacity planning product has its own set of formulas. For example, Capacity Manager has something like half a dozen different formulas just to predict disk growth, and that's not even talking about CPU or memory needs. In a big shop, you'll find that different formulas work better for different types of data. Generally, I've found that it's more efficient to buy a product off the shelf that has all these formulas built in rather than reinvent the wheel and become an expert in predictive formulas. (I know, I know, you'd expect me to say that because I work for a vendor, but I bought Capacity Manager as a DBA before I came to work for Quest, heh.)

Brent Ozar