There are plenty of articles on the web detailing why you might not want to use Apache's default inode-mtime-size format for ETags.

But I have yet to read anything on what might have motivated the inclusion of inode for Apache in the first place. On the face of it, it only seems useful if one needs to be able to differentiate between octet-for-octet facsimiles of the same resource, but this is surely counter to the very purpose of ETags.

Apache's authors are not known for sloppy handling of internet standards, so I feel I must be missing something. Can anyone elaborate?

EDIT: I ask this here rather than on ServerFault.com because I'm implementing a web server rather than administering one. To read more about why it's a bad idea, see e.g. here or here. All such articles recommend the same thing: remove inodes from your ETags. The question is: is there any advantage whatsoever to having them there?

+1  A: 

It seems like the kind of thing that could easily come from guessing wrong about what the common case is, or from defaulting to correctness over performance whenever there's a shred of doubt.

Allow me to make up a story about how it might have gone:

They decide early that a hash/checksum on the contents is a bad idea for performance reasons. "Who knows how big the file might be? We can't recalculate those all the time..." So they decide size and date get you pretty close.

"But wait," person A says, "nothing guarantees you don't have a file size collision. In fact, there are cases, such as firmware binaries, when the file size is always the same, and it's entirely possible that several are uploaded from a dev machine at the same time, so these aren't enough to distinguish between different contents."

Person B: "Hmm, good point. We need something that's intrinsically tied to the contents of the file. Something that, coupled with the modified time, can tell you for certain whether it's the same contents."

Person A: "What about the inode? Now, even if they rename the files (maybe they change 'recommended' to a different file, for example), the default ETag will work fine!"

Person B: "I dunno, inode seems a bit dangerous."

Person A: "Well, what would be better?"

Person B: "Yeah, good question. I guess I can't think what specifically is wrong with it, I just have a general bad feeling about it."

Person A: "But at least it guarantees you'll download a new one if it's changed. The worst that happens is you download more often than you need to, and anybody who knows they don't have to worry about it can just turn it off."

Person B: "Yeah, that makes sense. It's probably fine for most cases, and it seems better than the easy alternatives."
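To make Person A's worry concrete, here is a minimal sketch of the collision in the made-up story: two firmware images with different contents but the same size and the same modification time. The `metadata_etag` helper and the file names are hypothetical, and the hex field layout is only loosely modelled on the inode-mtime-size idea; it is not claimed to be Apache's exact format.

```python
import os
import tempfile


def metadata_etag(path, use_inode):
    """Build an ETag from stat() metadata only (hypothetical helper)."""
    st = os.stat(path)
    fields = [st.st_mtime_ns, st.st_size]
    if use_inode:
        fields.insert(0, st.st_ino)
    return '"' + "-".join(format(v, "x") for v in fields) + '"'


def collision_demo():
    """Two different files, same size, same mtime."""
    with tempfile.TemporaryDirectory() as root:
        v1 = os.path.join(root, "firmware_v1.bin")
        v2 = os.path.join(root, "firmware_v2.bin")
        # Same length, different bytes.
        for path, payload in ((v1, b"\x01" * 1024), (v2, b"\x02" * 1024)):
            with open(path, "wb") as f:
                f.write(payload)
        # Force identical timestamps, as if both were uploaded together.
        stamp = os.stat(v1).st_mtime_ns
        os.utime(v1, ns=(stamp, stamp))
        os.utime(v2, ns=(stamp, stamp))
        return {
            "without_inode_equal": metadata_etag(v1, False) == metadata_etag(v2, False),
            "with_inode_equal": metadata_etag(v1, True) == metadata_etag(v2, True),
        }


if __name__ == "__main__":
    print(collision_demo())
```

Without the inode, the two different files get the same tag, so a client that cached one could be told its copy is still current after the URL starts pointing at the other. With the inode, the tags differ and the client re-downloads.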

Disclaimer: I don't have any inside knowledge about what the Apache implementers could have been thinking. This is all just hand-wavy guessing, and trying to make up a plausible story. But I've certainly seen this kind of thing happen often enough.

You never know what it was that you didn't think of (in this case, that redundant load-balanced servers serving the same files would turn out to be more typical than size+time collisions). The load balancer isn't part of Apache, which makes such an oversight easier to make.
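The load-balancer case can be sketched the same way, with two files in a temporary directory standing in for the copies held by two backend servers. The `etag` helper and the file names are assumptions for illustration; the copy is byte-identical and is given the same modification time, which is roughly what a careful deployment sync would produce.

```python
import os
import shutil
import tempfile


def etag(path, use_inode=True):
    """Metadata-only ETag in the spirit of inode-mtime-size (hypothetical)."""
    st = os.stat(path)
    parts = []
    if use_inode:
        parts.append(format(st.st_ino, "x"))
    parts.append(format(st.st_mtime_ns, "x"))
    parts.append(format(st.st_size, "x"))
    return '"' + "-".join(parts) + '"'


def demo():
    """Return ETags for two identical copies, with and without the inode."""
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "server_a.bin")
        b = os.path.join(root, "server_b.bin")
        with open(a, "wb") as f:
            f.write(b"same bytes on every backend")
        shutil.copyfile(a, b)
        # Give the copy exactly the same timestamps as the original.
        st = os.stat(a)
        os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))
        return (etag(a), etag(b), etag(a, use_inode=False), etag(b, use_inode=False))


if __name__ == "__main__":
    with_a, with_b, without_a, without_b = demo()
    print("with inode:   ", with_a, with_b)
    print("without inode:", without_a, without_b)
```

The inode-based tags differ even though the contents are identical, so a client bounced between backends would see its cached copy invalidated each time. Dropping the inode makes the tags match.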

Plus, the failure mode here is that you didn't make perfectly efficient use of the cache (NOT that you got wrong data), which is arguably better, though annoying. Which suggests that even if they did think of it, they could reasonably assume somebody with enough interest to set up a load balancer would also be ok with tuning their configuration details.

PS: It's not about standards. Nothing specifies how you should calculate the ETag, just that it should be enough to tell whether the contents have changed, with high probability.

GrumpyOldTroll