I need to post-process some HTML that is badly structured, for example:
<html>
  <body>...</body>
  <body>...</body>
</html>
What's the best way to transform this HTML so that the contents of the second body appear inside the first, without the extra body tags themselves? I don't want this rule to change anything else in the document.
I've thought of matching on the html tag and handling everything from there with explicit apply-templates calls, but that seems a little sloppy to me. I know how to match the spurious bodies ("body[position() > 1]"), but I'd like some ideas on how best to write the transform.
Edit: I do need to apply other templates to children of all of these elements, so a simple copy won't work.
I would also like to preserve comments and processing instructions. I want the transform to act as an identity transform for pretty much the entire document, except for these multiple bodies and some other minor edits, which I am already doing successfully.
Edit 2: It is important to keep the children of the second body element in the above example. In the output they should be children of the first body tag, placed after its existing child nodes.
Edit 3: Here is some illustrative input/output (not checked for validity):
<html>
  <!-- Look at my comments -->
  <head>
    <title>My title!</title>
    <!-- Commentary -->
  </head>
  <body>
    <p>Something <b>bold</b></p>
  </body>
  <body>
    <!-- heh -->
    <p>Some bozo put my parent in here.</p>
  </body>
  <body>
    <p>More stuff here</p>
  </body>
</html>
needs to be:
<html>
  <!-- Look at my comments -->
  <head>
    <title>My title!</title>
    <!-- Commentary -->
  </head>
  <body>
    <p>Something <b>bold</b></p>
    <!-- heh -->
    <p>Some bozo put my parent in here.</p>
    <p>More stuff here</p>
  </body>
</html>
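For concreteness, the requirements above could be expressed roughly as follows. This is only a minimal XSLT 1.0 sketch, not a settled solution. It assumes the elements are in no namespace (for XHTML the patterns would need a namespace prefix) and that every body is a direct child of html. It pairs an identity template with one template for the first body and an empty template for the spurious ones, so any other templates still apply to the children.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copies elements, attributes, text,
       comments and processing instructions unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- First body: copy it with its own content, then append the
       child nodes of every later sibling body. apply-templates is
       used (not copy-of) so other templates still fire on them. -->
  <xsl:template match="html/body[1]">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:apply-templates select="following-sibling::body/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Spurious bodies: output nothing here, because their children
       have already been emitted inside the first body. -->
  <xsl:template match="html/body[position() &gt; 1]"/>

</xsl:stylesheet>
```

Attributes on the extra body elements are dropped in this sketch, and the whitespace-only text nodes that sat between the bodies are still copied into html, so the output indentation will not match the hand-written example exactly.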