views:

798

answers:

4

I was wondering what you do when developing a new application in terms of estimating database size.

E.g. I am planning to launch a website, and I am having a hard time estimating what size I could expect my database to grow. I don't expect you to tell me what size my database will be, but I'd like to know if there are general principles in estimating this.

E.g. When Jeff developed StackOverflow, he (presumably) guesstimated his database size and growth.

My dilemma is that I am going for a hosted solution for my web application (its about cost at this stage), and preferably don't want to shoot myself in the foot by not purchasing enough SQL Server space (they charge a premium for this).

+4  A: 

If you have a database schema, sizing is pretty straightforward ... it's just estimated rows * avg row size for each table * some factor for indexes * some other factor for overhead. Given the ridiculously low price of storage nowadays, sizing often isn't a problem unless you intend to have a very high traffic site (or are building an app for a large enterprise).

For my own sizing exercises, I've always created an excel spreadsheet listing:

  • col 1: each table that will grow
  • col 2: estimated column size in bytes
  • col 3: estimated # of rows (per year or max, depending on application)
  • col 4: index factor (I always set this to 2)
  • col 5: overhead factor (I always set this to 1.2)
  • col 6: total column (col 2 X 3 X 4 X 5)

The sum of column 5, plus the initial size of your database without growth tables, is your size estimate. You can get much more scientific, but this is my quick and dirty way.

Jess
+1. a better explanation than mine!
Mitch Wheat
you type too fast! that's almost exactly what I said :)
warren
Thanks for all your help guys, i think this gives me a good starting place.
Ash M
A: 

The cost of estimating is likely to be larger than the cost of the storage

Most hosting providers sell capacity by the ammount used at the end of each month, so just let it run

TFD
A: 

Determine:

  • how many visitors per day, V
  • how many records of each type will be created per visit, N1, N2, N3...
  • the size of each record type, S1, S2, S3...

EDIT: forgot index factor which a good rule of thumb is 2 times

Total growth per day = 2* V * (N1*S1 + N2*S2 + N3*S3 + ...)

Mitch Wheat
A: 

My rules-of-thumb to follow are

  • how many users do I expect?
  • what content can they post?
  • how big is a user record?
  • how big is each content item a user can add?
  • how much will I be adding?
  • how long will those content items live? forever? just a couple weeks?

Multiply the user record size times the number of users; add the number of users times the content item size; multiply by two (for a convenient fudge factor).

warren
Good answer - the fudge factors are the most important part of database sizing ... as we said when I was a consultant "if the # looks too low, just throw in another fudge factor".
Jess