I use XSLT to transform an XML document, which I then load onto an ASP.NET website. However, if the XML contains '<' characters, the XML becomes malformed.

<title><b> < left arrows <b></title>

If I use disable-output-escaping="yes", the XML cannot be loaded and I get the error "Name cannot begin with the '' character".

If I do not disable output escaping, the escaped characters are disregarded and the text appears as it is:

<title><b> < left arrows <b></title>

I want the bold tags to work, but I also want to escape the '<' character. Ideally

<b>&lt; left arrows</b>

is what I want to achieve. Is there any solution for this?

+2  A: 

The XML should contain the escaped sequence for the less than sign (&lt;), not the literal < character. The XML is malformed and any XML parser must reject it.

In XSLT you could generate that sequence like this:

<xsl:text>&amp;lt;</xsl:text>
Gerco Dries
But I suppose that would hardcode the '<' character in the XSL. Are there any other ways to do it?
A: 

From what I understand, the input contains HTML and literal < characters. In that case, disable-output-escaping="yes" will preserve the HTML tags but produce invalid XML and setting it to no means the HTML tags will be escaped.

What you need to do is leave disable-output-escaping set to "no" (which is the default, so you don't actually have to add it) and add an XSLT rule that copies the HTML tags. For instance:

<xsl:template match="*">
 <xsl:copy>
  <xsl:copy-of select="@*" />
  <xsl:apply-templates />
 </xsl:copy>
</xsl:template>
Josh Davis
Thanks for the input. However, I have another problem: the tags are enclosed in a CDATA section. How can I still do a template match on the HTML tags, given that I have no control over the XML document?
A: 

I came up with a solution, prompted by Josh's answer above. Thanks, Josh. I tried to use the match template, but the HTML tags are placed within a CDATA section, so I had difficulty matching them. There might be a way to do it, but I gave up on that.

What I did was use test="contains($text, $replace)", where $replace is the '<' character. On top of that, I added a condition to check whether the substring after the '<' is a relevant HTML tag, that is, whether it is actually a <b> or </b>. If it's just a '<' character that doesn't belong to any HTML tag, I convert the '<' to its escaped form, &amp;lt;. Basically, that solved my problem. I hope this is useful to anyone who encounters the same problem.
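
For anyone who wants to see the shape of it, here is a minimal XSLT 1.0 sketch of the approach described above. It is not the exact stylesheet I used: the template name escape-stray-lt, the parameter name text, and the assumption that the markup sits as plain text (for example inside a CDATA section) in a title element are all made up for illustration. It only recognises <b> and </b>; any other '<' is written out as ordinary text, so the serializer escapes it.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Assumed input shape: <title><![CDATA[<b> < left arrows </b>]]></title> -->
  <xsl:template match="title">
    <title>
      <xsl:call-template name="escape-stray-lt">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </title>
  </xsl:template>

  <!-- Hypothetical helper: walks the string one '<' at a time. -->
  <xsl:template name="escape-stray-lt">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <xsl:variable name="rest" select="substring-after($text, '&lt;')"/>
        <!-- Plain text before the '<' is written with normal escaping. -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:choose>
          <!-- The '<' opens a recognised tag: write the whole tag out raw. -->
          <xsl:when test="starts-with($rest, 'b&gt;') or starts-with($rest, '/b&gt;')">
            <xsl:value-of disable-output-escaping="yes"
                          select="concat('&lt;', substring-before($rest, '&gt;'), '&gt;')"/>
            <xsl:call-template name="escape-stray-lt">
              <xsl:with-param name="text" select="substring-after($rest, '&gt;')"/>
            </xsl:call-template>
          </xsl:when>
          <!-- A stray '<': emit it as text so it is serialized in escaped form. -->
          <xsl:otherwise>
            <xsl:text>&lt;</xsl:text>
            <xsl:call-template name="escape-stray-lt">
              <xsl:with-param name="text" select="$rest"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <!-- No '<' left: write the remainder with normal escaping. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats. disable-output-escaping is an optional feature in XSLT 1.0 and only has an effect when the processor serializes the result itself, so this may not work if the output goes straight into a DOM. Also, because the tags are written as raw text, nothing checks that every <b> has a matching </b>; unbalanced tags in the input would still produce malformed output.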