views:

578

answers:

5

I have a table myTable with a unique clustered index on myId, with a fill factor of 100%. It's an integer, starting at zero (but it's not an identity column for the table). I need to add a new type of row to the table. It might be nice if I could distinguish these rows by using negative values of myId.

Would having negative values incur extra page splitting and slow down inserts?

Extra Background: This table exists as part of the ETL for a data warehouse that gathers data from disparate systems. I now want to accommodate a new type of data. One way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered together. This will also avoid major key changes or extra columns in the schema.

Answer Summary: A fill factor of 100% will normally slow down inserts, but not inserts that happen sequentially, and that includes the sequential negative inserts.

+2  A: 

Not enough to notice for any reasonable system.

Page splits happen when a page is full, whether at the start or at the end of the range. As long as you perform regular index maintenance...

Edit, after Fill factor comments:

After a page split with 90 or 100 FF, each page will be 50% full. FF = 100 only means a split will happen sooner (probably on the first insert).

With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.

However, from BOL, FILLFACTOR

Adding Data to the End of the Table

A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.

So does fill factor matter for strictly monotonic keys...? Especially if it's low-volume writes.
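One way to check this empirically is to watch SQL Server's page-split counter around a batch of monotonic inserts. This is only a sketch; the table, columns, and data are hypothetical, and the counter is server-wide, so run it on a quiet instance:

```sql
-- Snapshot the cumulative page-split counter (server-wide).
SELECT cntr_value AS splits_before
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page Splits%';

-- A batch of strictly decreasing keys (hypothetical table/columns).
INSERT INTO dbo.myTable (myId, payload)
VALUES (-1, N'new source'), (-2, N'new source'), (-3, N'new source');

-- With inserts only at the ends of the key range, the delta should
-- stay near zero regardless of the index's fill factor.
SELECT cntr_value AS splits_after
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page Splits%';
```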

gbn
I think you may have missed the point. The problem is a fillfactor of 100% ...
Mitch Wheat
the fillfactor thing was an addition after this answer
cindi
ah, I didn't realise that.
Mitch Wheat
+1  A: 

No, not at all. Negative values are just as valid INTs as positive ones. No problem. Basically, internally, they're all just 4 bytes' worth of zeroes and ones :-)
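A quick way to see this (plain T-SQL, no assumptions beyond the INT type itself) is to look at the raw bytes; a negative value is just the two's-complement encoding of the same 4 bytes:

```sql
-- -1 stored as an INT is simply 0xFFFFFFFF.
SELECT CONVERT(binary(4), 1)  AS pos_one,  -- 0x00000001
       CONVERT(binary(4), -1) AS neg_one;  -- 0xFFFFFFFF
```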

Marc

marc_s
I think you have missed the point. The problem is a fillfactor of 100% ...
Mitch Wheat
the fillfactor thing was an addition after this answer
cindi
Why? I don't see how the fillfactor has any impact on negative INT values.....
marc_s
Exactly. The problem has nothing to do with negative clustered index values. The question should be "will I get page splits with a 100% fill factor?" Answer: yes.
Mitch Wheat
+1  A: 

You are asking the wrong question!

If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted or even modified, page splits can occur because there is likely no room on the existing index data page to write the change.

Even with regular index maintenance, a fill factor of 100% is counter productive on a table where you know inserts are going to be performed. A more usual value would be 90%.
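As a sketch (the index and table names here are hypothetical), setting a 90% fill factor is a one-line rebuild:

```sql
-- Rebuild the clustered index, leaving 10% free space on each leaf page.
ALTER INDEX PK_myTable ON dbo.myTable
REBUILD WITH (FILLFACTOR = 90);
```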

Mitch Wheat
Agreed - you raise a good point. This is however totally unrelated to whether you're storing positive or negative INT values.....
marc_s
The inserts are usually going to be appends, i.e. have a myId value greater than the existing values. Does the page creation not behave differently in this scenario?
cindi
@cindi. No, on a rebuilt clustered index, you will get page splits anyway.
Mitch Wheat
A page split happens, leaving each page 50% full. 90 vs 100 only decides how soon it happens. Anyway, with a strictly monotonic key things may be different: please see my updated answer.
gbn
+1  A: 

I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.

Why do you need to introduce a negative ID?

An integer primary key, for example, should uniquely identify a row; its sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.

If you need to flag/identify the newly inserted records then create a column specifically for this purpose.

This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an Identity data type, although not essential), thereby avoiding issues with page splits (on insert) altogether.
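For example (a sketch; the column and constraint names are hypothetical), the flag could be added without touching the key at all:

```sql
-- Tag the new kind of row explicitly instead of overloading
-- the sign of myId; existing rows default to the old type.
ALTER TABLE dbo.myTable
ADD SourceType tinyint NOT NULL
    CONSTRAINT DF_myTable_SourceType DEFAULT (0);
```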

Also, to confirm if I may, a fill factor of 100% for a clustered index primary key (identity integer for example), will not cause page splits for sequential inserts!

John Sansom
+2  A: 

Besides the practical administration points you already got and the dubious use of negative ids to represent data model attributes, there is also a valid question here: given a table with int ids from 0 to N, where would newly inserted negative values go, and would they cause additional splits?

The initial rows will be placed on the clustered index leaf pages: the row with id 0 on the first page and the row with id N on the last page, filling the pages in between.

When the first row with a value of -1 is inserted, it will sort ahead of the row with id 0 and as such will add a new page to the tree (it will actually allocate an extent of 8 pages, but that is a different point) and will link that page in at the front of the leaf-level linked list of pages. This will NOT cause a page split of the former first page. Further inserts of values -2, -3 etc. will go to the same new page, placed in the proper position (-2 ahead of -1, -3 ahead of -2, etc.) until the page fills. Further inserts will then add a new page ahead of this one to accommodate further new values.

Inserts of positive values N+1, N+2 will go on the last page and be placed in it until it fills; then they'll cause a new page to be added and will start filling that page.

So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well: an insert at either end of the cluster should not split a non-leaf page either. I am not discussing the impact of updates here, of course (they can cause splits if they increase the length of a variable-length column).
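If you want to verify this on your own data (the table name here is assumed), sys.dm_db_index_physical_stats will show whether the leaf level actually fragmented after loading the negative ids, or merely grew new pages at the front:

```sql
-- DETAILED mode also reports page fullness per index level.
SELECT index_level,
       page_count,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL, 'DETAILED');
```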

Lately there has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page-split performance hit is visible during inserts, then you'll probably be hit worse by the 'mitigation' measures, because you'll create artificial page-latch hot spots that are far worse, as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation, which impacts data access time. I'd say that is best mitigated with an off-peak periodic index maintenance operation (reorganize). Avoid premature optimizations; always measure first.
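A minimal sketch of that off-peak mitigation (index and table names hypothetical); REORGANIZE compacts and defragments pages without the cost of a full rebuild:

```sql
-- Defragment the leaf level in place; always an online operation.
ALTER INDEX PK_myTable ON dbo.myTable REORGANIZE;
```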

Remus Rusanu
This probably came closest to answering the question that was in my head and that I eventually got onto the page.
cindi