views:

156

answers:

2

I've been trawling books online and google incantations trying to find out what fill factor physically is in a leaf-page (SQL Server 2000 and 2005).

I understand that its the amount of room left free on a page when an index is created, but what I've not found is how that space is actually left: i.e., is it one big chunk towards the end of the page, or is it several gaps through that data.

For example, [just to keep the things simple], assume a page can only hold 100 rows. If the fill-factor is stated to be 75%, does this mean that the first (or last) 75% of the page is data and the rest is free, or is every fourth row free (i.e., the page looks like: data, data, data, free, data, data, data, free, ...).

The long and short of this is that I'm getting a handle on exactly what happens in terms of physical operations that occur when inserting a row into a table with a clustered index, and the insert isn't happening at the end of the row. If multiple gaps are left throught a page, then an insert has minimal impact (at least until a page split) as the number of rows that may need to be moved to accomodate the insert is minimised. If the gap is in one big chunk in the table, then the overhead to juggle the rows around would (in theory at least) be significantly more.

If someone knows an MSDN reference, point me to it please! I can't find one at the moment (still looking though). From what I've read it's implied that it's many gaps - but this doesn't seem to be explicitly stated.

A: 

This is the first time I've thought of this, and I'm not positive about the conclusion, but,

Since the smallest amount of data that can be retrieved by SQL Server in a single Read IO is one complete page of data, why would any of the rows within a single page need to be sorted in the first place? I'd bet that they're not, so that even if the gap is all in one big gap at the end, new records can be added at the end regardless of whether that's the right sort order. (if there's no reason to sort records on a page in the first place)

And, secondly, thinking about the write side of thge IO, I think the smallest write chunk is an entire page as well, (even the smallest change requires the entire page be written back to disk). This means that all the rows on a page could get sorted in memory every time the page is written to, so even if you were inserting into the beginning of a sorted set of rows on a dingle page, the whole page gets read out, the new record could be inserted into it's proper slot in the set in memory, and then the whole new sorted page gets written back to disk...

Charles Bretana
+2  A: 

From MSDN:

The fill-factor setting applies only when the index is created, or rebuilt. The SQL Server Database Engine does not dynamically keep the specified percentage of empty space in the pages. Trying to maintain the extra space on the data pages would defeat the purpose of fill factor because the Database Engine would have to perform page splits to maintain the percentage of free space specified by the fill factor on each page as data is entered.

and, further:

When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations. When frequent page splits occur, the index can be rebuilt by using a new or existing fill factor value to redistribute the data.

SQL Server's data page consists of the following elements:

  • Page header: 96 bytes, fixed.
  • Data: variable
  • Row offset array: variable.

The row offset array is always stored at the end of the page and grows backwards.

Each element of the array is the 2-byte value holding the offset to the beginning of each row within the page.

Rows are not ordered within the data page: instead, their order (in case of clustered storage) is determined by the row offset array. It's the row offsets that are sorted.

Say, if we insert a 100-byte row with cluster key value of 10 into a clustered table and it goes into a free page, it gets inserted as following:

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 8190 ]   Free space
[8190 - 8191 ]   Row offset array: [96]

Then we insert a new row into the same page, this time with the cluster key value of 9:

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 295  ]   Row 9
[296  - 8188 ]   Free space
[8188 - 8191 ]   Row offset array: [196] [96]

The row is prepended logically but appended physically.

The offset array is reordered to reflect the logical order of the rows.

Given this, we can easily see that the rows are appended to the free space, starting from the beginning on the page, while pointers to the rows are prepended to the free space starting from the end of the page.

Quassnoi
Yes I'd seen that, but that doesn't say how the fill-factor is initially distributed.
Chris J
@Quassnoi, I think he's asking about how things would happen when a new row is 'inserted' onto the empty space on a page. Whether the gaps are sprinkled throughout, or all at the end of the physical page, (he thinks) would affect the performance of the insert...
Charles Bretana
@Charles -- "thinks" is a bit strong :-) I'm reading conflicting blogs where some people say it will, others say it won't. I'm just trying to get a deeper understanding of what's going on...
Chris J
@rangerchris, you are correct, sorry, poor choice of words!. perhaps 'he questions' would have been better...
Charles Bretana
I'd have to guess that the extra space is all in the back. How can you acceptably "scatter" empty spaces throughout a page if, for example, a row contains two varchar(100) columns? I like Charles Bretana's guess (work gets done on each disk write). Hopefully someone who *knows* will add to these comments.
Philip Kelley