I read somewhere that putting HTML attributes in a consistent order can improve the compression ratio of an HTML document. (I think it was in a Google or Yahoo recommendation for faster sites.) If I recall correctly, the recommendation was to put the most common attributes first (e.g. id) and then the rest in alphabetical order.

I'm a bit confused by this. For example, if an id attribute came right after every p tag, each id would hold a unique value. The repeated string would then be limited to <p id=" (say there were <p id="1"> and <p id="2">). Because id values must be unique, I would expect this to hurt compression rather than help it.

Am I wrong?

If I needed to go through a static web page with randomly ordered attributes, what logic should I use to organize attributes to achieve maximum compression?

NOTE: I'm talking about gzip compression (if that matters): http://www.gzip.org/algorithm.txt

+3  A: 

Your aim would be to encourage repeated content. So <p class="foo" id="a">bar</p>...<p class="foo" id="b">bof</p> might indeed be easier to compress than <p id="a" class="foo">bar</p>...<p id="b" class="foo">bof</p>, and both would typically compress easier than <p class="foo" id="a">bar</p>...<p id="b" class="foo">bof</p>.
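If you want to check this on your own markup rather than take it on trust, here is a minimal sketch that builds the three variants above and compares their deflated sizes. It assumes Python's zlib module as a stand-in for gzip (same DEFLATE algorithm, slightly different header); the function names, the element count and the generated ids are all made up for illustration, and the numbers you get will depend entirely on your real pages.

```python
import random
import zlib


def build_page(order: str, count: int = 200, seed: int = 0) -> bytes:
    """Build a hypothetical page of <p> elements.

    order is "class_first", "id_first", or "mixed" (order picked at
    random per element, using a fixed seed so runs are repeatable).
    """
    rng = random.Random(seed)
    parts = []
    for i in range(count):
        class_first = f'<p class="foo" id="p{i}">text {i}</p>'
        id_first = f'<p id="p{i}" class="foo">text {i}</p>'
        if order == "class_first":
            parts.append(class_first)
        elif order == "id_first":
            parts.append(id_first)
        elif order == "mixed":
            parts.append(rng.choice([class_first, id_first]))
        else:
            raise ValueError(f"unknown order: {order}")
    return "\n".join(parts).encode("utf-8")


def deflate_size(data: bytes) -> int:
    """Size in bytes after DEFLATE at the highest compression level."""
    return len(zlib.compress(data, 9))


def compare_orders(count: int = 200) -> dict:
    """Map each ordering to a (raw_size, compressed_size) pair."""
    result = {}
    for order in ("class_first", "id_first", "mixed"):
        page = build_page(order, count)
        result[order] = (len(page), deflate_size(page))
    return result


if __name__ == "__main__":
    for order, (raw, packed) in compare_orders().items():
        print(f"{order:12s} raw={raw:6d} deflated={packed:6d}")
```

The raw size is the same for all three variants, so any difference in the deflated column comes purely from attribute order. Swap build_page for something that reads your actual files before drawing conclusions.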

But really, the difference is going to be minuscule. You'd be much better off just writing your markup in the most readable fashion for your own benefit and letting mod_deflate get on with its job. You're going to have to go a long way to save even a single TCP packet with this kind of micro-optimisation, and second-guessing the compressor at a micro level can often generate unexpected, possibly negative results.

For some elements readability might well also mean putting the ‘common’ attributes first, e.g. type is usually the first attribute listed on an <input>. Typically you'll work out your own attribute order style, and if it's consistent I suppose that'll save you a few bytes here and there. I wouldn't choose raw alphabetical as the consistent order, though. All that has going for it is that it's what Canonical XML will produce.

Even google.com's front page, infamous for its dedication to shaving off bytes at the expense of readability, basic validation and every kind of good practice, doesn't bother to use one consistent order for attributes.

bobince