views:

199

answers:

6

I'm using XSLT to transform an XML document to text. The text nodes of the XML document contain literal < characters, which of course bombs the transformation. Is there any way to get an XSLT transformation to work with a < character in a text node? In this case, all such characters are followed by whitespace.

+1  A: 

Use entities instead of the character

<myTextTag> 1 &lt; 2, and 4 &gt; 2. This is how numbers work</myTextTag>

And there should be an option in your API to convert them on transformation/output
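If the XML is being produced by code, the substitution does not have to be done by hand. A minimal sketch in Python, assuming the text arrives as a plain string (the variable names are only illustrative), using the standard library's `xml.sax.saxutils.escape`:

```python
from xml.sax.saxutils import escape

# Raw text as it might arrive from some upstream source (illustrative value).
raw = "1 < 2, and 4 > 2. This is how numbers work"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
element = "<myTextTag>" + escape(raw) + "</myTextTag>"
print(element)
```

The matching `unescape` function in the same module reverses the substitution if the raw text is needed again.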

Aiden Bell
A: 

As long as the generated XML document replaces < with &lt; and > with &gt;, any properly implemented XML parsing API will load the document and turn the entities back into < and > characters.
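As a rough illustration of that round trip, here is a sketch using Python's standard `xml.etree.ElementTree`. The element name is borrowed from the example earlier in the thread; nothing here is specific to the questioner's setup.

```python
import xml.etree.ElementTree as ET

# Build the element through the API rather than by concatenating strings.
elem = ET.Element("myTextTag")
elem.text = "1 < 2, and 4 > 2"

# Serializing escapes the special characters automatically.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the serialized form turns the entities back into characters.
parsed = ET.fromstring(serialized)
print(parsed.text)
```

The text node seen by the consumer, XSLT included, holds the literal characters again; the entities exist only in the serialized file.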

Jherico
I think the error is in reading XML that contains the literal '<', not in the output, which as you say should be transformed.
Aiden Bell
But again, any reasonable XML formatting API will automatically do the translation. It almost sounds as if the questioner is constructing his XML by concatenating strings and writing the buffer to disk. Which is about as efficient as writing down 1's and 0's longhand on paper.
Jherico
01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01110011 01101111 00100000 01110111 01110010 01101111 01101110 01100111 00100000 01110111 01101001 01110100 01101000 00100000 01110100 01101000 01100001 01110100 00111111
Aiden Bell
A: 

If it's well-formed XML, the < character should already be escaped as the &lt; entity. XML has a small set of predefined entities (&lt;, &gt;, &amp;, &apos; and &quot;) that every XML processor must recognize. Check out the recommendation:

All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them.

This should all be transparent to XSLT and it very well might be the case that whatever/whoever is generating the XML is not doing it according to the recommended standards.

Scott Saad
A: 

If you are outputting text, you can use, IIRC, some variant of <xsl:text disable-output-escaping="yes">...</xsl:text>. I don't have anything "to hand", though... perhaps try &lt; in the middle?

Marc Gravell
A: 

Agh, bad luck. To be well-formed, a literal < may not appear in text unless it starts a tag. A literal > is tolerated by parsers, but it is usually escaped as well.

Not that that helps you. If you can't fix the source, I suggest pre-processing the incoming data to either replace the offending characters with entity references, as described in other answers, or enclose the offending sections in CDATA sections. You can perhaps use domain knowledge with regular expressions or tokenization to correct the fields that might have bad data.
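For the case described in the question, where every stray < is followed by whitespace, the pre-processing could be sketched roughly as below. This is a heuristic, not a general repair: the function name and sample input are hypothetical, and it assumes the file has no CDATA sections whose content must stay untouched.

```python
import re
import xml.etree.ElementTree as ET

def escape_bare_lt(xml_text: str) -> str:
    """Replace every '<' that is immediately followed by whitespace with '&lt;'.

    A real tag can never have whitespace right after '<', so markup is
    left alone. Text inside CDATA sections would also be rewritten, which
    is why this sketch assumes there are none.
    """
    return re.sub(r"<(?=\s)", "&lt;", xml_text)

# Hypothetical almost-XML input with a stray '<' in the running text.
broken = "<doc><item>if a < b then stop</item></doc>"
fixed = escape_bare_lt(broken)
print(fixed)

# The repaired string now parses, and the text node holds the literal '<'.
root = ET.fromstring(fixed)
print(root.find("item").text)
```

Once the file has been repaired this way, the XSLT processor sees an ordinary text node and needs no special handling.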

I don't think you can make XSLT work with badly formed XML.

Brabster
Thanks. For now I'll use the &lt; and &gt; strings/entities/whatevers in the source XML file, and look into CDATA usage at a later date.
chernevik
A: 

If your XML file has literal < characters in the running text, then you don't have an XML file. You have something that's almost an XML file. Either fix the process that creates the file, or pre-process it to fix it.
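One way to confirm which side of that line a file is on is to hand it to a parser before it ever reaches the stylesheet. A small sketch in Python, where `is_well_formed` is a hypothetical helper rather than a library function:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<t>1 < 2</t>"))     # literal '<' in running text
print(is_well_formed("<t>1 &lt; 2</t>"))  # escaped form
print(is_well_formed("<t>4 > 2</t>"))     # literal '>' in running text
```

Running a check like this as the first step of the pipeline makes it clear whether the problem lies in the input file or in the transformation.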

Ned Batchelder
Helpful precision, thank you.
chernevik