The project I work on takes random HTML files, converts them to XHTML as best as it can, and wraps them with some XML metdata. The DOCTYPE is stripped out as the resulting XML file is not an XHTML document. However when retrieving the wrapped XHTML from the XML file the DOCTYPE should be reinserted.
Because these are random HTML files they could contain any content, but I would prefer to not have to store or determine the original DTD. I figured that I should the Frameset DTD as it was just a superset of the Transitional DTD and would be valid for all content. However when using the W3C XHTML Validator with the same document, using the Transitional DTD passes but using the Frameset DTD fails.
I've stripped down the document to the minimum with which I can reproduce the problem. Here is the Frameset version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
And here is the Transitional version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
Please explain why this is happening, and how I should proceed.