If served over HTTP, the file extension has no meaning. The only information that matters is the Content-Type header field, which specifies the media type of the resource.
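To make this concrete, here is a small Python sketch that takes the media type solely from a Content-Type value, using the standard library's `email.message.Message` header parser. The function name `media_type_from_headers` is illustrative, not part of any library. The URL is passed in only to show that it is ignored.

```python
from __future__ import annotations

from email.message import Message


def media_type_from_headers(url: str, content_type: str) -> tuple[str, str | None]:
    """Return (media type, charset) taken only from the Content-Type value.

    The url argument is deliberately unused: over HTTP, the extension in
    the path says nothing about how the body should be interpreted.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() returns the lowercased "type/subtype" part;
    # get_param() returns None when the parameter is absent.
    return msg.get_content_type(), msg.get_param("charset")


# A ".txt" URL whose server declares HTML is still treated as HTML.
print(media_type_from_headers("http://example.com/report.txt", "text/html; charset=utf-8"))
```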
But when a file is opened from a local filesystem, there is no such header, so the media type is normally inferred from the file extension.
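Python's standard `mimetypes` module is one example of this kind of extension-based guessing. The sketch below uses a hypothetical helper, `local_media_type`, that looks only at the filename suffix, so a file without an extension gets no type at all.

```python
import mimetypes


def local_media_type(filename):
    """Guess a media type purely from the filename's extension (hypothetical helper)."""
    # guess_type() returns (type, encoding); no file content is read.
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


for name in ["index.html", "notes.txt", "index"]:
    print(name, "->", local_media_type(name))
```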
Edit: I think the reason .html is used even for XHTML is that XHTML is just HTML with XML syntax, and everyone is used to .html for HTML documents. (In practice, most XHTML documents are actually served as HTML, because the media type text/html denotes HTML regardless of what the document type declaration says.)
But again: extensions are not necessary when a resource is requested over HTTP. There, the Content-Type header field tells the client which media type the resource should be interpreted as. So in theory you could use whatever extension you want, or no extension at all (useful when content negotiation is used).
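As a minimal sketch of content negotiation on an extensionless URL, assuming Python's built-in `http.server`, the hypothetical path `/page` below returns the same XHTML body as either `application/xhtml+xml` or `text/html`, depending on the client's `Accept` header. The client learns the type only from `Content-Type`. The Accept check is a deliberately naive substring test that ignores q-values; a real server would parse them properly.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical well-formed XHTML body that can be sent under either media type.
BODY = (b'<html xmlns="http://www.w3.org/1999/xhtml">'
        b"<head><title>t</title></head><body><p>hi</p></body></html>")


class NegotiatingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The path has no extension; the media type is picked from the
        # Accept header (naively, ignoring q-values) and sent as Content-Type.
        accept = self.headers.get("Accept", "")
        if "application/xhtml+xml" in accept:
            media_type = "application/xhtml+xml"
        else:
            media_type = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", media_type + "; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Vary", "Accept")  # the response depends on Accept
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_content_type(port, accept):
    """Request /page with the given Accept header and return the declared media type."""
    req = urllib.request.Request(f"http://127.0.0.1:{port}/page",
                                 headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get_content_type()


# Port 0 lets the OS choose a free port.
server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

negotiated = fetch_content_type(port, "application/xhtml+xml,text/html;q=0.9")
fallback = fetch_content_type(port, "text/html")
print(negotiated, fallback)

server.shutdown()
server.server_close()
```

The `Vary: Accept` header tells caches that the representation at `/page` depends on the request's `Accept` value, which is what makes serving different media types from one extensionless URL safe to cache.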