ansaurus

Question

How can I escape HTML character entities when using ColdFusion function XMLFormat()?

Answer 1

+1 A:

It's tough when you have some HTML partially converted, and then need to do the rest...

You could replace all the "&" signs temporarily, run the XMLFormat, then convert the "&" signs back.

<cfscript>
// replace & signs with a temp placeholder
myHTML = replace(myHTML, "&", "*amp*", "all");

// format for XML
myHTML = XMLFormat(myHTML);

// replace placeholders with & signs
myHTML = replace(myHTML, "*amp*", "&", "all");
</cfscript>

If it works, you could make this one step by wrapping this logic in a single function.
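
A minimal sketch of that single function, assuming the same placeholder approach as above. The function name xmlFormatKeepAmpersands and the "*amp*" token are made up for illustration; the token must be something that can never occur in the real content, or the last step would turn it into a stray "&".

```cfml
<cfscript>
// Hypothetical helper: runs XMLFormat() on everything except existing "&" characters.
function xmlFormatKeepAmpersands(html) {
    // Assumption: this token never appears in the content being converted.
    var placeholder = "*amp*";
    // Hide the ampersands so XMLFormat() cannot turn them into &amp;
    var result = replace(html, "&", placeholder, "all");
    // Escape the remaining special characters (<, >, quotes, ...)
    result = XMLFormat(result);
    // Put the original ampersands back
    return replace(result, placeholder, "&", "all");
}

myHTML = xmlFormatKeepAmpersands(myHTML);
</cfscript>
```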

Dan Sorensen 2010-02-02 22:10:10

Answer 2

+1 A:

How about simply not using the &mdash; escape in the source string and instead including the character in situ?

Edit:

I'm gonna guess that the HTML content stored in the database is not known to be XHTML compliant and hence to put it in an XML document you have no choice but to either place it in a CDATA section or encode it correctly. There is an assumption that placing it in an XML document like this is useful and that it can be properly decoded at the consuming end. This will be true of either approach if a typical XML DOM is used at the consumer.

So this leads me to this question: what's actually wrong with &amp;mdash;? After all, < will result in &lt; and so on. When retrieved from a DOM by the consumer, the resulting string will be returned to using &mdash; and < and so on, so when it is subsequently used as HTML all will be well.
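
A rough sketch of that round trip, assuming the consumer reads the element text through an XML DOM (here ColdFusion's own xmlParse() stands in for it). The sample string and variable names are invented for the illustration.

```cfml
<cfscript>
// Invented sample content containing the entity from the question
myHTML = "<p>Before &mdash; after</p>";

// Escape everything, including the ampersand that starts &mdash;
xmlString = "<content>" & XMLFormat(myHTML) & "</content>";

// Parse the document as a consumer would and read the element text back
doc = xmlParse(xmlString);
roundTripped = doc.xmlRoot.xmlText;

// Display the recovered markup as literal text for comparison with myHTML
writeOutput(htmlEditFormat(roundTripped));
</cfscript>
```

If the parser unescapes the text as a standard XML DOM should, roundTripped holds the original markup again, &mdash; included.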

AnthonyWJones 2010-02-02 22:13:03

This is existing content for a client which I am not at liberty to edit.

Eric Belair 2010-02-04 15:41:15

Answer 3

+4 A:

You have a few options. A lot depends on how this content is going to be used. It would be extremely helpful to include a desired output document, as well as indicate where this xml is being used.

If you don't want to mess with the content of the HTML at all, you could always use CDATA, like this:

<cfxml variable="myXML">
    <cfoutput><content><![CDATA[#myHTML#]]></content></cfoutput>
</cfxml>
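
One caveat with CDATA: the section ends at the first "]]>", so content that happens to contain that sequence would close it early. A common workaround, sketched here with string concatenation and xmlParse() rather than cfxml, is to split the sequence across two CDATA sections. The variable names safeHTML and xmlString are only illustrative.

```cfml
<cfscript>
// Split any "]]>" in the content so it cannot terminate the CDATA section
safeHTML = replace(myHTML, "]]>", "]]]]><![CDATA[>", "all");

// Wrap the content in CDATA and parse the result into an XML object
xmlString = "<content><![CDATA[" & safeHTML & "]]></content>";
myXML = xmlParse(xmlString);
</cfscript>
```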

Also, I know you say you don't want to convert the remaining ampersands but I just don't see how this is so. Either the HTML content is a string you want to process -- in which case, all of it should be escaped so that it can be unescaped later -- or it's valid XML that you want to be part of the document. I mean, when you process the contents of the <content> tag later on, you will run into problems if the ampersands aren't escaped.

Jordan Reiter 2010-02-02 22:16:38

I am getting the content out of a SQL Server database and putting it in an XML document so that it can be imported (along with a lot of other meta data) into a CMS. CDATA is not an option....

Eric Belair 2010-02-04 15:40:36

@Eric: Why is CDATA not an option?

AnthonyWJones 2010-02-04 16:11:33

What kind of CMS? Basically none of this makes sense. If you're importing the text, then all of it must be escaped, including the &mdash;. &amp;mdash; is totally valid and should not throw an exception in the CFXML tag. You are probably doing something wrong.

Jordan Reiter 2010-02-04 16:41:27

@Jordan, I believe it's Interwoven. @Anthony, I'm not sure why CDATA is not an option, but I think the CMS import script - out of my control - is not set up to handle it.

Eric Belair 2010-02-05 05:41:29

Okay, so Interwoven is going to import all of the text between the <content></content> tags? Is it then going to unescape it into HTML? If so, then yes, you HAVE to XMLFormat everything.

Jordan Reiter 2010-02-05 17:30:59

Answer 4

A:

For the time being, I'm simply going to replace all less-than and greater-than characters with "&lt;" and "&gt;", respectively.
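
In code, that workaround might look like the following sketch, which assumes the content is in the same myHTML variable used in the other answers:

```cfml
<cfscript>
// Escape only the angle brackets; ampersands and existing entities are left as they are
myHTML = replace(myHTML, "<", "&lt;", "all");
myHTML = replace(myHTML, ">", "&gt;", "all");
</cfscript>
```

Note that XML itself predefines only five entities (amp, lt, gt, apos and quot), so anything like &mdash; left in the text will only parse if the consuming side declares it or tolerates it.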

Eric Belair 2010-02-04 15:43:34

Answer 5

A:

In this specific use case, you can use URLEncodedFormat() to preserve the natural form of the content, and then use URLDecode() on the way out.

<cfxml variable="content">
    <content><cfoutput>#URLEncodedFormat(myHTML)#</cfoutput></content>
</cfxml>
<cfset xml = xmlParse(content)>
<cfoutput>#URLDecode(xml.content.xmltext)#</cfoutput>

I'm not recommending this as a best practice, only that it would work in the scenario posed by the question.

jalpino 2010-02-07 20:57:06
