ansaurus

Question

Coldfusion XMLFormat() not converting all characters

Answer 1

+3 A:

Are you sure you output the file in the right encoding? You can't just do

<cffile action="write" file="foo.xml" output="#xml#" />

because the result will very likely not be in the character set your XML is in: without a `charset` attribute, `cffile` uses the server's default (JVM) file encoding. Unless otherwise noted (by an encoding declaration), XML files are treated as UTF-8, so you should do:

<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
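A minimal sketch of keeping the declaration and the file encoding in agreement. `xmlBody` is a hypothetical variable holding the markup without its own `<?xml ?>` declaration, and the sample content is illustrative only:

```cfml
<!--- Hypothetical sample content; in practice xmlBody comes from your own code. --->
<cfset xmlBody = "<items><item>Caf#Chr(233)#</item></items>">

<!--- Declare UTF-8 explicitly so readers of the file know what to expect. --->
<cfset fullXml = '<?xml version="1.0" encoding="UTF-8"?>' & Chr(10) & xmlBody>

<!--- Parse first: XmlParse() throws if the string is not well-formed. --->
<cfset doc = XmlParse(fullXml)>

<!--- Write with the same charset the declaration announces (cffile needs an absolute path). --->
<cffile action="write" file="#ExpandPath('foo.xml')#" output="#fullXml#" charset="utf-8" />
```

Reading the file back should then use `charset="utf-8"` as well, exactly as in the `read` example above.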

Tomalak 2009-11-06 20:40:23

I'm trying to use cfheader and cfcontent to serve the xml document as an actual xml document.

Jason 2009-11-06 20:58:05

So there is no save to/load from disk step involved on the server side? If that's the case, how is the file served (check with HeaderSpy, for example)? Do the encoding in the file's XML declaration and the encoding it is served with match?

Tomalak 2009-11-06 21:51:24

Also, have you considered DOM functions (`XmlNew()` et al.) to build the file, instead of string concatenation and `XmlFormat()`?
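A minimal sketch of the DOM approach, with hypothetical element names and sample text. Values assigned to `XmlText` and `XmlAttributes` are escaped when the document is serialized with `ToString()`, so `XmlFormat()` isn't needed for `<`, `>` or `&`:

```cfml
<cfscript>
  // Build the document node by node instead of concatenating strings.
  doc = XmlNew();
  doc.xmlRoot = XmlElemNew(doc, "items");

  item = XmlElemNew(doc, "item");
  item.XmlAttributes["id"] = "1";
  // Hypothetical sample content containing characters that need escaping.
  item.XmlText = "Caf#Chr(233)# <special> & ""quoted""";
  ArrayAppend(doc.xmlRoot.XmlChildren, item);

  // Serialize; the serializer escapes the text and attribute values.
  xmlString = ToString(doc);
</cfscript>
```

The resulting string still has to be written or served with a charset that matches its declaration.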

Tomalak 2009-11-06 21:53:12

Answer 2

A:

Also, don't forget to put <cfprocessingdirective pageencoding="utf-8"> at the top of your template.

rparente 2009-11-06 20:44:59

This is probably useless, as it only describes the encoding the CFML source file itself is in. Most of the time, there is no need to set `pageencoding`.

Tomalak 2009-11-06 21:56:48

Answer 3

+1 A:

I feel that this is a bug in XMLFormat. I'm not sure who the original author of the snippet below is, but here is an approach that catches the extra characters with a regex:

  <cfscript>
      // Replace every character outside 7-bit ASCII with a hex numeric
      // character reference (é becomes a reference such as &#xe9;).
      function escapeHighChars(text) {
        var i = 1; // ReFind() positions start at 1
        var tmp = '';
        while(ReFind('[^\x00-\x7F]',arguments.text,i))
        {
          i = ReFind('[^\x00-\x7F]',arguments.text,i); // discover high chr and save its numeric string position.
          tmp = '&##x#FormatBaseN(Asc(Mid(arguments.text,i,1)),16)#;'; // obtain the high chr and convert it to a hex numeric chr.
          arguments.text = Insert(tmp,arguments.text,i); // insert the new hex numeric chr right after the high chr.
          arguments.text = RemoveChars(arguments.text,i,1); // delete the redundant high chr from string.
          i = i+Len(tmp); // adjust the loop scan for the new chr placement, then continue the loop.
        }
        return arguments.text;
      }
  </cfscript>

  <cfset myText = escapeHighChars(xmlFormat(myText))>

kevink 2009-11-06 20:48:03

Answer 4

A:

If you're trying to return your XML directly to the browser, you might want to try something like this to have the user download it:

<cfheader name="Content-Disposition" value="attachment; filename=export.xml">
<cfcontent type="text/xml; charset=utf-8" variable="#CharsetDecode(someXMLPacket, 'utf-8')#" reset="true">

Or, if you want it returned as a web page (à la REST), this should do the trick:

<cfcontent type="text/xml; charset=utf-8" variable="#CharsetDecode(someXMLPacket, 'utf-8')#" reset="true">

The charset belongs in the content type, and `variable` expects binary data, so the string is converted with `CharsetDecode()` first.
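If you would rather not convert the packet to binary, another option (a sketch, with `someXMLPacket` holding the XML string) is to reset the output buffer and write the string yourself, letting the charset in `type` set the encoding of the response:

```cfml
<!--- reset="true" discards output buffered so far, so nothing precedes the XML
      declaration; keeping the tags on one line avoids stray whitespace, and
      cfabort stops the template so later code in it adds nothing to the document. --->
<cfcontent type="text/xml; charset=utf-8" reset="true"><cfoutput>#someXMLPacket#</cfoutput><cfabort>
```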

hope that helps

LucasS 2009-11-10 20:58:06

Answer 5

A:

Unfortunately, XMLFormat is just not an all-inclusive solution. It has a very limited list of characters that it will replace (see the `XmlFormat` documentation).

You'll need to do custom encoding of characters that are invalid for XML but not covered by XMLFormat.

It's definitely not very efficient, but a potential solution would be to loop over the content of typically suspect fields (anything user-generated, for starters) character by character, checking each character code, and if it's above 255, either omit the character or properly encode it.
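A rough sketch of that loop, as a hypothetical `encodeForXmlText()` function meant to run on `XmlFormat()`'s output. Besides encoding characters above 255, it drops the C0 control characters that XML 1.0 does not allow at all (everything below 32 except tab, line feed and carriage return), since not even a numeric reference makes those valid. Characters outside the Basic Multilingual Plane are stored as two UTF-16 halves, so the sketch passes those halves through unchanged rather than encoding each half separately:

```cfml
<cfscript>
  // Hypothetical helper: run it on the result of XmlFormat(), never before,
  // or the "&" of each reference it creates would be escaped again.
  function encodeForXmlText(text) {
    var result = "";
    var ch = "";
    var code = 0;
    var i = 0;
    for (i = 1; i LTE Len(arguments.text); i = i + 1) {
      ch = Mid(arguments.text, i, 1);
      code = Asc(ch);
      if (code GTE 55296 AND code LTE 57343) {
        // half of a surrogate pair (character outside the BMP): keep as-is
        result = result & ch;
      } else if (code GT 255) {
        // write as a decimal numeric character reference, e.g. &#8217;
        result = result & "&##" & code & ";";
      } else if (code GTE 32 OR code EQ 9 OR code EQ 10 OR code EQ 13) {
        result = result & ch;
      }
      // anything else is a C0 control character that XML 1.0 forbids: dropped
    }
    return result;
  }
</cfscript>
```

Usage would be `myText = encodeForXmlText(XmlFormat(myText))`. Building the result by concatenation is slow on large strings, which matches the efficiency caveat above.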

Adam Tuttle 2009-11-11 21:35:30

First, non-ASCII characters aren't the issue per se, since XML was designed with Unicode in mind and is assumed to be UTF-8 text unless otherwise noted. Second, the sneaky Windows characters that tend to cause the most trouble are below 255; the troublesome quotation marks, in particular, are 145-148.
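If those quotes arrive as code points 145-148 (typically Windows-1252 bytes that were decoded as ISO-8859-1), one option, sketched here with a hypothetical `fixWindowsQuotes()` function, is to map them to the characters Windows-1252 actually means by them (U+2018, U+2019, U+201C, U+201D) before building the XML:

```cfml
<cfscript>
  // Map mis-decoded Windows-1252 curly quotes to their Unicode equivalents.
  function fixWindowsQuotes(text) {
    var bad = [145, 146, 147, 148];      // code points left by the wrong decoding
    var good = [8216, 8217, 8220, 8221]; // left/right single, left/right double quote
    var i = 0;
    for (i = 1; i LTE ArrayLen(bad); i = i + 1) {
      arguments.text = Replace(arguments.text, Chr(bad[i]), Chr(good[i]), "all");
    }
    return arguments.text;
  }
</cfscript>
```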

Sixten Otto 2009-11-12 06:46:07

Answer 6

A:

This was a huge issue for me as well, and it turns out the charset is the main factor: you need to explicitly specify the correct charset.

In my case I had foreign-language text inside the XML, and it wasn't parsed correctly until I put in the correct charset.

crosenblum 2009-12-22 02:08:38
