views:

760

answers:

2

HTML Tidy gives this as output for some reason:

<?xml version="1.0" encoding="utf-16"?>
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 11 February 2007), see www.w3.org" />
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5" />

...rest of document

So there are two XML declarations, and both declare the wrong encoding (UTF-16 instead of UTF-8). Is there a way to remove the second declaration, change the encoding to UTF-8, and also remove the DOCTYPE with XSL?

+2  A: 

Yes. Create a template that matches the first child element you want to accept and then have it just output the content of that element.

Hank Gay
...which results in an error because I have a DTD in my XML file...
Dr. Hfuhruhurr
In really rough pseudo-XSL: `<xsl:template match="html"><xsl:copy-of select="."/></xsl:template>`. That shouldn't give you a DTD in the output file. Or is your XSL engine complaining about the one in your input file? Mine complained about the double `<?xml` directives, but you can `sed` that out.
Hank Gay
Yes, it was complaining about the DTD in the input file, so no parsing was done... (thanks anyway)
Dr. Hfuhruhurr
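The `sed` suggestion in the comments above could be extended to cover all three problems before any XSL engine sees the file. The following is only a sketch, not something posted in this thread. The function name `clean_tidy_header` is hypothetical. It assumes the declarations and the DOCTYPE each start at the beginning of a line, as in the output shown in the question. It also assumes the file's bytes are actually ASCII-compatible despite the `utf-16` label; a file that really is UTF-16 would need converting first, for example with `iconv`.

```shell
# Hypothetical helper: reads Tidy's output on stdin, writes a cleaned copy to stdout.
# Assumes ASCII-compatible bytes and that each declaration starts its own line.
clean_tidy_header() {
  sed \
    -e '/^<?xml /d' \
    -e '/^<!DOCTYPE[^>]*>$/d' \
    -e '/^<!DOCTYPE/,/>$/d' |
  {
    # Emit a single UTF-8 declaration, then pass the rest through unchanged.
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    cat
  }
}
```

The first `sed` expression drops every existing XML declaration, the second drops a DOCTYPE that fits on one line, and the third drops a DOCTYPE that spans several lines, up to the first line ending in `>`. It could be used as `clean_tidy_header < tidy-output.xml > cleaned.xml`, where both file names are placeholders.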
+3  A: 

I think that it would be better to fix the original problem. Do you use the HTML Tidy library?

Try setting `output-encoding` to `utf8` and `add-xml-decl` to `false`. The DOCTYPE node can be suppressed by setting the `doctype` property to `omit`.

0xA3
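For the command-line `tidy` rather than the library, the three settings named in this answer could go in a configuration file. This is a minimal sketch: the file name `tidy.conf` is an assumption, and the option spellings should be checked against the installed Tidy version.

```text
# tidy.conf (hypothetical file name)
# Write the output as UTF-8 instead of UTF-16.
output-encoding: utf8
# Do not emit an <?xml ...?> declaration.
add-xml-decl: no
# Leave the DOCTYPE out of the output.
doctype: omit
```

Tidy would then be run with something like `tidy -config tidy.conf input.html`, where `input.html` is a placeholder. Note that this omits the XML declaration entirely rather than rewriting it to UTF-8.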