views: 228
answers: 6

Are there any benefits to omitting the closing body and html tags?

Why is the google.com homepage missing the closing body and html tags?

  • edit - with regard to bandwidth, it's an extremely small amount, really. Say 10 million hits at roughly 10 bytes saved each, that's only about 100 MB (a quick check of the arithmetic is sketched after these notes)... Are there reasons other than bandwidth?

  • edit 2 - and nope, Google (Yahoo, Microsoft, and others) are not W3C-validator compliant... when it comes to saving bandwidth en masse vs. W3C compliance, I guess the latter is what gets sacrificed?
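A quick shell check of the arithmetic in the first edit (the hit count and bytes-saved figures are the hypothetical ones from above, not measurements):

    # 10,000,000 hits x ~10 bytes saved each, expressed in MB
    echo "10000000 * 10 / 1000000" | bc    # 100 MB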

+7  A: 

Think of how many times that page is served every day. Even small reductions in size can be significant at that volume of traffic.

(Although I work for Google, please don't treat this as an "official answer from Google" - it's a guess, but an educated one.)

Jon Skeet
+4  A: 

Apart from the gain in bandwidth, there aren't.

Just because they do it doesn't mean you should.

graham.reeds
+5  A: 

You're thinking of that as a standalone thing. If they've done several different things that save bandwidth, then it all adds up. Each one might not be significant on its own, but if they have 5 or 10 optimisations then it's significant in total.

Also, those ten bytes may well reduce the data size enough to make it take one less TCP/IP packet, which will have significantly higher savings than simply reducing the size of one packet.

John Burton
+9  A: 

Take a look here: http://code.google.com/intl/fr-FR/speed/articles/optimizing-html.html

Kaaviar
But I can't see what makes the closing tags optional when I look at the DTD...
graham.reeds
Those "O"s in the element definitions in the DTD say whether the opening and closing tags are optional. Technically, the HTML, HEAD, and BODY tags -- opening and closing -- are all optional.
cHao
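For reference, the relevant declarations, roughly as they appear in the HTML 4.01 Strict DTD; the two `O`s after each element name mark the start and end tags as omissible:

    <!ELEMENT HTML O O (%html.content;)    -- document root element -->
    <!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->
    <!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->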
+2  A: 

Adding to Jon Skeet's answer, this page shows there are 3 billion searches on Google per day. Don't know how accurate it is, but I would guess it's in the ball park.

</body></html> is 14 characters, and at 3 billion searches per day it amounts to approximately 39.12 GB of data per day ignoring compression, or around 26 GB if we take gzipping into account.
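For what it's worth, a shell sketch of that arithmetic (the 3-billion figure is the estimate quoted above, not a confirmed number):

    # 14 bytes x 3 billion searches per day, expressed in GiB (1024^3 bytes)
    echo "14 * 3000000000 / 1024^3" | bc -l    # ~39.12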

However, Google might actually send the closing tags for body and html to older browsers by sniffing their user agents. I cannot confirm or deny this, but looking at the source served to modern browsers (Safari, Firefox, and Chrome) shows that the closing tags are missing in all of them.

So if you think it works for you, then go for it :). There's no harm in implementing this; whether it's a meaningful micro-optimization depends on the scale you're operating at.

Anurag
Even the oldest browsers will, I think, happily accept those tags being missing, so I doubt they bother sniffing user agents for this particular issue.
Chris
“</body></html> is 14 characters” — although they’re gzipping the page as well (I assume), so source characters don’t map directly to bytes in the file sent. I’ll bet it doesn’t save as much as you’ve estimated when gzipping’s taken into account.
Paul D. Waite
@Paul - yes, I mentioned "ignoring compression" in my answer. For me, the uncompressed byte size is `8113`. The gzipped content is `5585` bytes, which, mapped directly to keep things simple, gives approximately a one-third reduction. That would translate to `26 GB` instead of `39`. Yes, you're right that it's a big drop once compression is included, but `26 GB` per day is still not an insignificant number by any means.
Anurag
@Anurag: doh, that you did, sorry. On compression, I’ve saved the HTML source of google.com to my machine, and it comes to 17,207 bytes. (Not sure why I’m getting more code than you? I signed out of Google before saving.) When I gzip it on the command line, it goes down to 6,859 bytes. When I add `</body></html>` to the source and gzip that, it goes up to 6,865 bytes. So I reckon your one-third estimate is about right.
Paul D. Waite
@Paul - Don't save it on the filesystem as that will make the size filesystem dependent. Get the byte size and see the gzipped size with curl - `curl www.google.com | gzip | wc -c`
Anurag
@Anurag — huh! I had no idea. There is indeed a bit of a difference between curl's output and both files' sizes once saved.
Paul D. Waite
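Pulling the measurement approach from this exchange into one place, a minimal shell sketch (exact numbers will vary with location, sign-in state, and the user agent curl presents):

    # raw byte count of the HTML as served
    curl -s www.google.com | wc -c

    # approximate size over the wire with Content-Encoding: gzip
    curl -s www.google.com | gzip | wc -c

    # the same page with the closing tags restored, for comparison
    { curl -s www.google.com; printf '</body></html>'; } | gzip | wc -c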
+2  A: 

I think JB is on the right track. Their home page is about 8750 bytes (for me), meaning that if they can get 1460 bytes of payload per TCP segment, it fits in six packets. The point is not so much making it 8750 bytes rather than 8760 as making it six packets rather than seven.
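A back-of-the-envelope check of that packet count, assuming a typical 1460-byte MSS (a 1500-byte Ethernet MTU minus 40 bytes of TCP/IP headers) and the 14-byte `</body></html>` figure from the earlier answer:

    # ceiling division: TCP segments needed with and without the closing tags
    echo $(( (8750 + 1459) / 1460 ))         # 6 segments as served
    echo $(( (8750 + 14 + 1459) / 1460 ))    # 7 segments with </body></html> added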

This would make a cute Google interview question. :-)

Why does the number of packets matter? Because on modern internet connections it's the latency that matters, not the bandwidth: a full packet is barely any slower to deliver than a 1-byte packet. The effect is especially important at the start of a connection, when the TCP congestion window is still opening up.

Furthermore, the more packets there are, the greater the chance that one of them will be lost, perhaps causing a very long delay while it is retransmitted. The chance is not very high, but if you're already as tightly tuned as they are, it's worth it.

So should you do it? I would say generally not, unless you're confident that you really are already sending just a handful of packets for the page in question. (You should measure it from a realistic client location.) Knowing that the page validates is worthwhile.

poolie