views:

30

answers:

2

For simplicity suppose I have one varchar(50) per row and these are my rows:

1) The quick brown fox jumps over the lazy dog 2) I like brown foxes 3) fox 4) fox

So the page compression would find the word fox for instance and put it in the dictionary. Would only rows 3 and 4 benefit and rows 1 and 2 would not because they contain other data as well? This is what seems to be happening empirically. I have a bunch of XML data with highly repeating segments (angle brackets, repeated attribute names, etc) which compresses fantastically if I save it out to a CSV and zip it as a control case but I get almost 0% compression in SQL server. I've confirmed that SOME compression does take place by looking at sys.dm_db_index_physical_stats page_count and compressed_page_count so I know there's nothing in my structure inherently preventing compression altogether like too large rows.

The example here only includes whole columns and not subsets hence why I am asking explicitly.

http://blogs.msdn.com/sqlserverstorageengine/archive/2008/01/18/details-on-page-compression-page-dictionary.aspx

A: 

See Page Compression Implementation

As I understand, the value 'fox' will be placed in the dictionary and the dictionary value will replace all occurences on the page, so all rows should be affected.

Is your data stored in (n)varchar columns or in XML columns?

Remus Rusanu
A: 

I forgot to mention that I confirmed that in addition to the empirical testing I did I confirm this with Microsoft. Yes, Dictionary Data Compression is for WHOLE column or at least PREFIX matches only.

ss2k