I'm working with SQL Server 2005 and have a single table:

int Code1,
int Code2, 
real Val1,
real Val2,
real Val3,

Code1 & Code2 serve as the primary key and are part of the clustered index (the only index). Each column occupies 4 bytes (so each row occupies 20 bytes).
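
In DDL terms, the table looks roughly like this (a sketch - the table and constraint names are placeholders):

CREATE TABLE dbo.MyTable (           -- real table name omitted
    Code1 int  NOT NULL,
    Code2 int  NOT NULL,
    Val1  real NOT NULL,
    Val2  real NOT NULL,
    Val3  real NOT NULL,
    -- composite primary key, enforced by the (only) clustered index
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (Code1, Code2)
)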

There are 24.5 million records in the table, fill factor is 100%, the index occupies 2MB, and the page size is 4k.

Assuming that each page is filled with as many records as possible, each page should hold 204 records, which is 4080 bytes (99.6% page fill).

So I would expect the table to occupy around 500MB on disk (20 bytes * 24.5M records), but in fact it occupies 773MB.

I tried shrink and reindex but the table size didn't change.
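
(On SQL Server 2005, the shrink and reindex would typically be something along these lines - a sketch with placeholder names, not necessarily the exact commands used:)

-- reclaim unused space in the database files
DBCC SHRINKDATABASE (MyDatabase)
GO
-- rebuild all indexes on the table
ALTER INDEX ALL ON dbo.MyTable REBUILD
GO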

I'm no SQL expert, can anybody help?

A: 

You mentioned that the primary key is part of the clustered index. Is it not the whole clustered index?

Just a thought, but if the clustered index is not unique (by which I mean actually declared explicitly as UNIQUE or PRIMARY KEY), then SQL Server needs to add a hidden "uniquifier" to tell duplicate key values apart, which takes up an extra 4 bytes in each row that duplicates an existing key.

You can also end up with additional overhead in the rows if you enable row versioning. If the data was inserted or updated while snapshot isolation or read-committed snapshot was turned on, each affected row carries a 14-byte versioning tag: an 8-byte pointer into the version store plus a 6-byte transaction sequence number (XSN).
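
A quick way to check whether either form of row versioning is enabled (a sketch; run it while connected to the database in question):

SELECT name,
       snapshot_isolation_state_desc,   -- ALLOW_SNAPSHOT_ISOLATION setting
       is_read_committed_snapshot_on    -- 1 if READ_COMMITTED_SNAPSHOT is on
FROM sys.databases
WHERE name = DB_NAME()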

Side note: Why are you using a FILLFACTOR of 100? If the data never changes, that's alright, but otherwise it's going to kill performance due to page splits.
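
If the data does change, rebuilding with some headroom would look something like this (placeholder table name; 90 is just an illustrative value):

-- leave 10% free space in leaf pages for future inserts/updates
ALTER INDEX ALL ON dbo.MyTable REBUILD WITH (FILLFACTOR = 90)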

Aaronaught
+3  A: 

First of all, the page size in SQL Server is 8 KB, and it cannot be changed; it's a system setting which you have no control over.

Of these 8192 bytes, you as a user have roughly 8060 at your disposal - the rest is headers and control structures and so forth.

So in your case, with each row occupying 20 bytes, you should be able to fit 403 rows per page (8060 / 20). That gives you roughly 60,795 data pages at 8 KB apiece = 486 MB.
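
You can compare that back-of-the-envelope figure with what SQL Server actually reports for the table:

EXEC sp_spaceused 'your-table-name-here'   -- reports reserved, data, index and unused space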

However: for performance reasons, SQL Server doesn't allocate each page as it's needed - it pre-allocates a given size for your database. When you create a new database in SQL Server Management Studio, you'll see that by default SQL Server allocates 3 MB of space and grows the file by 1 MB whenever more space is needed. These settings are changeable - you didn't mention what yours are.
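
You can check the current file sizes and growth settings with a query along these lines (run in the database in question; size and growth are stored as counts of 8-KB pages unless growth is a percentage):

SELECT name,
       size * 8 / 1024 AS size_mb,       -- size is stored in 8-KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files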

Also, for performance reasons, SQL Server typically won't "return" unused data pages to the operating system. That's a rather costly operation, and there's a good chance it will need those pages again at some point. The same goes for index pages - if you ever had another index on that table (even just to try something out) and it used a number of pages, those won't be returned to the OS by default.

Also, depending on how the data was inserted into the table, there might be some "holes" in the data structure - not every page will be filled to 100%. And to keep the B-tree balanced, SQL Server might even choose to split pages in two before they're completely full.

So all in all: yes, theoretically and mathematically, your database should be roughly 486 MB for data and 2 MB for index - but how bad is it really, if the file is 770+ MB in size instead? Does it really hurt?


With the following T-SQL script, which queries the DMVs (Dynamic Management Views), you can gain deep and detailed insight into your table's index structures: how many pages are used at each level of each index, and how full your data pages actually are - very useful and helpful to know!

SELECT 
    t.name AS 'Table name',
    i.name AS 'Index name',
    ips.index_type_desc,
    ips.alloc_unit_type_desc,
    ips.index_depth,
    ips.index_level,
    ips.avg_fragmentation_in_percent,
    ips.fragment_count,
    ips.avg_fragment_size_in_pages,
    ips.page_count,
    ips.avg_page_space_used_in_percent,
    ips.record_count,
    ips.ghost_record_count,
    ips.version_ghost_record_count,
    ips.min_record_size_in_bytes,
    ips.max_record_size_in_bytes,
    ips.avg_record_size_in_bytes,
    ips.forwarded_record_count
FROM 
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
INNER JOIN  
    sys.tables t ON ips.object_id = t.object_id
INNER JOIN  
    sys.indexes i ON ips.index_id = i.index_id AND ips.object_id = i.object_id
WHERE
    t.name = 'your-table-name-here'
ORDER BY
    ips.avg_fragmentation_in_percent, ips.fragment_count
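
One caveat: the 'DETAILED' scan mode reads every page at every index level, so on a 24.5-million-row table this query can take a while and cause significant I/O. If you only need the fragmentation figures, passing 'SAMPLED' or 'LIMITED' as the last parameter is much cheaper.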
marc_s
A: 

Others have correctly mentioned that the page size is 8k, but the amount available for data is 8096 bytes; the 8060 figure is the maximum length of a single row stored on a page (without using LOB or SLOB storage). (The difference was described as architectural insurance when the page format was designed.)

There are various overheads that can apply, from the row uniquifier to the NULL bitmap - Microsoft publishes guides on how to calculate the size of a clustered table or a heap:

Clustered Index : http://msdn.microsoft.com/en-us/library/ms178085(SQL.90).aspx

Heap : http://msdn.microsoft.com/en-us/library/ms189124(SQL.90).aspx

On the topic of shrink, also known as 'evil': read Paul Randal's description of shrink, and then avoid using it where possible: http://www.sqlskills.com/BLOGS/PAUL/post/Why-you-should-not-shrink-your-data-files.aspx

Andrew
+2  A: 

I will try to estimate your table size; note that I am using a 90% rule-of-thumb fill.

Row header                   4  bytes
Fixed data size             20  bytes (2 X 4 bytes for int + 3 x 4 bytes for real)
Variable size columns count  2  bytes
NULL bitmap columns count    2  bytes
Total for one row           28  bytes
Available page size       8060  bytes
Page header                 96  bytes
Rows per page (max)        284  (Available page size - Page Header) / Total for one row
Rule of thumb page fill     90% 
Rows per page (expected)   255 
Number of rows               2.45E+07 
Number of pages          96079 
Pages per MB               128 
Total MB                   751 
Damir Sudarevic
Available page size is 8096 bytes; 8060 is the maximum for a single row only, and the 96-byte page header sits outside that allocation - in your maths you deducted it from a figure that already excludes it. 8096 bytes of data + 96 bytes of header = 8192 bytes, i.e. 8k. (Per Microsoft's estimation formula, which adds 2 bytes per row for the slot array entry, that gives 8096 / (28 + 2) = 269 rows per page.)
Andrew
A: 

Having a FILLFACTOR of 100% doesn't mean that every page is filled exactly to capacity - it just means that SQL Server will try to do that where it can, and only for leaf-level pages.

Also, you do need to ask serious questions about future performance versus space usage. With that number of records, too tight a fill factor means that every new insert, or even an update, can trigger a rather massive rearrangement of pages, and depending on usage that can also mean escalating deadlocks. I'm not saying you may not have a good reason to pack tightly and worry about disk space, but you need to ask these questions seriously. Buying a bigger disk is reasonably cheap these days.

ZXX