For some reason my MDF file is 154 GB, even though I only loaded 7 GB worth of data from flat files. Why is the MDF file so much larger than the actual source data?

More info:

Only a few tables, with ~25 million rows. No large varchar fields (the biggest is varchar(300); most are smaller than varchar(50)). The tables are not very wide (< 20 columns). Also, none of the large tables are indexed yet; the tables with indexes have fewer than 1 million rows. I don't use char, only varchar, for strings. Datatype is not the issue.

It turned out to be the log file, not the MDF file. The MDF file is actually 24 GB, which seems more reasonable, though still big IMHO.

UPDATE:

I fixed the problem with the LDF (log) file by changing the recovery model from FULL to SIMPLE. This is okay because this server is only used for internal development and ETL processing. In addition, before changing to SIMPLE I had to shrink the log file. Shrinking is not recommended in most cases; however, this was one of those cases where the log file should never have grown so big, so fast. For further reading see this
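For reference, this is roughly what that looks like in T-SQL (a minimal sketch; the database name ETL_Dev and logical log file name ETL_Dev_log are placeholders - check yours with sys.database_files):

    -- Switch the recovery model from FULL to SIMPLE so the log
    -- truncates automatically at checkpoints.
    ALTER DATABASE [ETL_Dev] SET RECOVERY SIMPLE;
    GO

    USE [ETL_Dev];
    GO

    -- Shrink the log file back down (target size here is 1024 MB).
    -- Find the logical file name with: SELECT name FROM sys.database_files;
    DBCC SHRINKFILE (N'ETL_Dev_log', 1024);
    GO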

+2  A: 

There could be a lot of reasons: maybe you are using char(5000) instead of varchar(5000), bigint instead of int, or nvarchar when all you need is varchar. Maybe you have a lot of indexes per table; these all add up. Maybe your autogrow settings are wrong. You are sure this is the MDF and not the LDF file, right?
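A quick way to check which file is big (a minimal sketch; sys.database_files reports sizes in 8 KB pages, hence the conversion):

    -- List each file for the current database with its size in MB.
    -- type_desc is ROWS for data files (MDF/NDF) and LOG for log files (LDF).
    SELECT name,
           type_desc,
           size * 8 / 1024 AS size_mb
    FROM sys.database_files;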

SQLMenace
Also, watch out for a dodgy fill factor on indexes - I've encountered indexes on more than one occasion with a fill factor of 10% rather than the intended 90%. :)
Will A
Also, Index Fragmentation could be a factor. http://www.sqlmag.com/article/tsql3/automatic-reindexing.aspx
David Stratton
I feel silly. It is the log file.
subt13
Haha - silly! :p That's a big log file for a comparatively small amount of data - how did you load it - using SSIS?
Will A
Haha, well I'm attempting to do some ETL work on the source data with SQL Server. I guess scanning 25 million rows has a big impact on the log file.
subt13
Scanning as in reading, or scanning from another source and writing into the database? Reading data should have no impact on the log file.
Will A
@Will For starters, I loaded this table using the SQL Server Import/Export Wizard. I am getting the source data "fixed" so that I can add a proper natural key and primary key. In order to do this I have to delete bad rows, run various update statements against it, identify and delete duplicates (usually by inserting into a temp table and joining to it with a delete statement, as sketched below), and add a few columns. There are about 15 or so selects/inserts/updates/deletes that I will have to run against these 25 million rows, in addition to adding proper indexes.
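A rough sketch of that temp-table dedupe step (the table and column names here are hypothetical):

    -- Keep one row per natural key in a temp table...
    SELECT NaturalKey, MIN(RowId) AS keep_id
    INTO #keepers
    FROM dbo.StagingTable
    GROUP BY NaturalKey;

    -- ...then delete every duplicate that isn't the keeper.
    DELETE s
    FROM dbo.StagingTable AS s
    JOIN #keepers AS k
      ON s.NaturalKey = k.NaturalKey
     AND s.RowId <> k.keep_id;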
subt13
Aha - I can see why your log file would get that large, then - just watch for it continuing to grow into the future. :)
Will A
Thanks. I plan on it!
subt13
Make sure you are backing up the log file. From BOL: "Under the full recovery model or bulk-logged recovery model, if the transaction log has not been backed up recently, backup might be what is preventing log truncation. If the log has never been backed up, you must create two log backups to permit the Database Engine to truncate the log to the point of the last backup. Truncating the log frees space for new log records. To keep the log from filling up again, take log backups frequently."
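A sketch of what that looks like (the database name and backup path are placeholders):

    -- Back up the transaction log; once backed up, the inactive portion
    -- can be truncated and reused.
    BACKUP LOG [ETL_Dev]
    TO DISK = N'D:\Backups\ETL_Dev_log.trn';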
HLGEM
Yes, you are right. The transaction log had never been backed up. Fixed now and I updated my question.
subt13
+3  A: 

Because the MDF was allocated at 154 GB, or has grown to 154 GB through various operations. A database file is at least as large as the data in it, but it can exceed the used amount by any amount.

An obvious question is: how did you measure the amount of data in the database? Did you use sp_spaceused? Did you check sys.allocation_units? Or did you guess?
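For example (sp_spaceused is the simplest check; the per-table query against the allocation metadata is a sketch, with pages converted at 8 KB each):

    -- Database-wide totals: allocated size vs. space actually used.
    EXEC sp_spaceused;

    -- Approximate per-table allocation, largest first.
    SELECT o.name AS table_name,
           SUM(au.total_pages) * 8 / 1024 AS allocated_mb,
           SUM(au.used_pages)  * 8 / 1024 AS used_mb
    FROM sys.allocation_units AS au
    JOIN sys.partitions AS p
      ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
      OR (au.type = 2 AND au.container_id = p.partition_id)
    JOIN sys.objects AS o
      ON p.object_id = o.object_id
    WHERE o.is_ms_shipped = 0
    GROUP BY o.name
    ORDER BY allocated_mb DESC;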

If the used size is indeed 7 GB out of 154 GB, then you should leave it as it is. Somebody sized the database this way, or it grew to that size, and it is likely to grow back. If you believe that the growth or pre-sizing was accidental, the previous point still applies and you should leave it as is.

If you are absolutely positive the overallocation is a mistake, you can shrink the database, with all the negative consequences of shrinking.
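In that case, something along these lines (the database name is a placeholder; note that shrinking fragments indexes, so plan to rebuild them afterwards):

    -- Shrink the database, leaving 10 percent free space.
    DBCC SHRINKDATABASE (N'ETL_Dev', 10);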

Remus Rusanu
Good info. I'm not a DB admin, but I will read up on some of this. Thanks.
subt13
A: 

Either AUTO_SHRINK is not enabled, or the initial size was set to a larger value.
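Both are easy to check (a sketch; substitute your own database name):

    -- Check whether AUTO_SHRINK is enabled (it is off by default).
    SELECT name, is_auto_shrink_on
    FROM sys.databases
    WHERE name = N'ETL_Dev';

    -- Current size and growth settings per file, in 8 KB pages.
    SELECT name, size, growth, is_percent_growth
    FROM sys.master_files
    WHERE database_id = DB_ID(N'ETL_Dev');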

ydpages