I'm using PHP's `DOMDocument` to parse and normalize user-submitted HTML. The `loadHTML()` method parses the content, and `saveHTML()` then gives me a well-formed result:
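```php
$dom = new DOMDocument();
// Parse the (possibly malformed) fragment; libxml repairs missing end tags.
$dom->loadHTML('<div><p>Hello World');
// Serialize the repaired document back to an HTML string.
$well_formed = $dom->saveHTML();
echo $well_formed;
```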
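Note that `loadHTML()` always builds a complete document: with libxml2's HTML parser, a bare fragment ends up nested inside implied `<html>` and `<body>` elements, so those wrappers are real nodes in the tree and not just something `saveHTML()` adds on output. The sketch below (assuming PHP 7 or later) makes that visible by walking up from the parsed `<div>` to the root element. The exact set of implied elements may vary with the libxml2 version.

```php
<?php
// Sketch: show that loadHTML() places a fragment inside implied wrapper
// elements. Assumes PHP 7+; the exact wrappers depend on libxml2.

$dom = new DOMDocument();

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML('<div><p>Hello World');
libxml_clear_errors();

// Walk from the parsed <div> up to the root element, recording each name.
$path = [];
$node = $dom->getElementsByTagName('div')->item(0);
while ($node instanceof DOMElement) {
    array_unshift($path, $node->nodeName);
    $node = $node->parentNode; // becomes the DOMDocument after the root
}

echo implode(' > ', $path), "\n";

// libxml2 normally also attaches a default DOCTYPE node to the document
// (suppressible with the LIBXML_HTML_NODEFDTD option).
echo $dom->doctype !== null ? "doctype present\n" : "no doctype\n";
```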
`DOMDocument` does a beautiful job of parsing the fragment and adding the appropriate closing tags. The problem is that I'm also getting a bunch of tags I don't want, such as `<!DOCTYPE>`, `<html>`, `<head>` and `<body>`. I understand that every well-formed HTML document needs these tags, but the HTML fragment I'm normalizing is going to be inserted into an existing valid document.
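One commonly used workaround, sketched here under the assumption of PHP 7 or later, is to let `loadHTML()` build the full document as before and then serialize only the children of the implied `<body>`, passing each node to `saveHTML()` (the optional node argument has been available since PHP 5.3.6). `normalize_fragment()` is a hypothetical helper name used for illustration. PHP 5.4+ also accepts the `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` options for `loadHTML()`, but `LIBXML_HTML_NOIMPLIED` is reported to mis-nest fragments that have more than one top-level element, which is why this sketch reads the body's children instead.

```php
<?php
// Sketch: normalize an HTML fragment with DOMDocument but return only the
// markup libxml placed inside the implied <body>, so the <!DOCTYPE>,
// <html>, <head> and <body> wrappers never reach the output.
// Assumes PHP 7+; normalize_fragment() is a hypothetical helper name.

function normalize_fragment(string $fragment): string
{
    $dom = new DOMDocument();

    // Keep libxml warnings about malformed user input out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // defensive: the parser created no <body> element
    }

    // Serialize each child of <body> on its own and concatenate the results.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

echo normalize_fragment('<div><p>Hello World'), "\n";
```

Called on the fragment from the question, this should print only the repaired `<div>` and its paragraph. Because every top-level node is serialized in turn, fragments such as `<p>a</p><p>b</p>` keep their sibling structure.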