views:

451

answers:

4

After going thru some tutorials on SQL SERVER 2008's new feature SPARSE COLUMN, I have found that it doesn't take any space if the column value is 0 or null but when there is a value, it takes 4 times the space a regular(non sparse) column holds.

If my understanding is correct, then why I will go for that at the time of database design? And if I use that, then at what situation so I be?

Also out of curiosity, how come no space get's reserve when a column is defined as sparse column(I mean to say, what is the internal implementation for that)

Thanks in advance

+2  A: 
 * Storing a null in a sparse column takes up no space at all.

 * To any external application the column will behave the same

 * Sparse columns work really well with filtered indexes as you will only
  want to create an index to deal with the non-empty attributes in the column.

 * You can create a column set over the sparse columns that returns an xml
  clip of all of the non-null data from columns covered by the set.  The 
  column set behaves like a column itself. Note: you can only have one
  column set per table.

 * Change Data Capture and Transactional replication both work, but not the 
   column sets feature.

Downsides

* If a sparse column has data in it it will take 4 more bytes than a normal 
  column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique 
  identifier rises form 16 bytes to 20 bytes.

* Not all data type can be sparse: text, ntext, image, timestamp, user-defined 
  data type, geometry, or geography or varbinray (max) with the FILESTREAM 
  attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the 
  typo)

* computed columns can't be sparse (although sparse columns can take part in a
  calculation in another computed column)

* You can't apply rules or have default values.

* Sparse columns cannot form part of a clustered index.  If you need to 
  do that use a computed column based on the sparse column and create the 
  clustered index on that (which sort of defeats the object).

* Merge replication doesn't work.

* Data compression doesn't work.

* Access (read and write) to sparse columns is more expensive, but I 
  haven't been able to find any exact figures on this.

Reference

rahul
Thanks .. I already covered that site. But even in that site failed to answer the questions which I have asked here. In which situation should I go with sparse column when it takes 4 times extra space!
priyanka.sarkar
+8  A: 

It doesn't use 4x the amount of space to store a value, it uses a (fixed) 4 extra bytes per non-null row.

  • So a non-null bit field is 1 bit + 4 bytes = 4.125 bytes. But if 99% of these records are NULL, it is still a net savings.

  • A non-null GUID (UniqueIdentifier) field is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.

So the "expected savings" depends strongly on what kind of field we're talking about, and your estimate of what ratio will be null vs non-null. Variable width-fields (varchars) are probably a little more difficult to calculate.

This Books Online Page has a table showing what percentage of different data types would need to be null for you to end up with a benefit.

So when should you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

  • A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
  • A "4rd Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.
BradC
A: 

You're reading it wrong - it never takes 4x the space.

Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.

"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."

mbourgon
A: 

Hello Guys, May I know why Sparse columns which are having non null values takes 4Bytes of extra space..I am finding out why it is taking 4 bytes of extra space.. Please help me out?

Vinni