I'm not too fluent with the Perl XML libraries (actually, I really suck at understanding encoding in general). All I'm doing is taking a string that may contain characters such as "à" and putting it in an XML file, but when I open the file, I get an encoding error at the line containing such a character.

So I just need a lightweight way to take a string and encode it for XML.

+2  A: 

Your XML should specify UTF-8 encoding. For example:

<?xml version="1.0" encoding="UTF-8" ?>

There's a lot of good information at UTF-8 and Unicode Standards.

Your Perl program should also set its output filehandle to the UTF-8 encoding so that it writes the data correctly. See the Perl documentation for open, for instance.
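As a minimal sketch of that advice, the snippet below opens the output file with an :encoding(UTF-8) layer so that Perl encodes characters to UTF-8 bytes on the way out. The file name, the element name and the sample text are made up for illustration, and `use utf8;` is there only because the sample string is typed as a literal in the source.

```perl
use strict;
use warnings;
use utf8;  # only because this source file itself contains literal "é" and "à"

my $path = 'example.xml';   # hypothetical output file name
my $text = 'déjà vu';       # sample character string (no XML-reserved characters)

# The :encoding(UTF-8) layer turns characters into UTF-8 bytes on output.
open(my $fh, '>:encoding(UTF-8)', $path)
    or die "Cannot open $path: $!";
print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print {$fh} "<note>$text</note>\n";   # <note> is a hypothetical element name
close($fh) or die "Cannot close $path: $!";
```

This only covers the encoding of the file. The sample text contains no reserved characters, so no escaping is done here.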

The only XML-specific escaping you need is for the XML reserved characters: &, < and >, plus the quote characters " and ' inside attribute values. See Where can I get a list of the XML document escape characters? on Stack Overflow.

You can use Perl's XML::Code or a similar module to escape the XML-specific characters.
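To show what that escaping amounts to, here is a hand-rolled sketch that replaces the five reserved characters with their predefined entities. The escape_xml name is hypothetical and is not the interface of XML::Code or any other module; a real module is still the safer choice.

```perl
use strict;
use warnings;

# Map each XML-reserved character to its predefined entity.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper, for illustration only.
sub escape_xml {
    my ($string) = @_;
    # A single pass, so the "&" of an inserted entity is never escaped again.
    $string =~ s/([&<>"'])/$entity{$1}/g;
    return $string;
}

print escape_xml(q{Tom & "Jerry" <cat>}), "\n";
# prints: Tom &amp; &quot;Jerry&quot; &lt;cat&gt;
```

Characters such as "à" pass through untouched: they are not reserved in XML and only need the file to be written in the declared encoding.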

Larry K
Answer could be a lot better: ① Sloppy hypertext copy (use of "here", anaemic link texts). ② Potentially confusing information: `use utf8;` is in fact only necessary to tell Perl that the programmer has encoded the Perl source code in UTF-8. This is a different concept than writing files with an encoding which was asked in the question. ③ Inaccurate terminology: Escaping is needed for *delimiters* (XML §4.6). ④ Lack of permalink to the module.
daxim
I've fixed all of that. I think you missed the biggest problem: you shouldn't be writing out XML by hand, so you should never have to even think about this. Use a module that does it all for you. :) Oh, and get a little more rep and you can fix things too :)
brian d foy
+1  A: 

Example using LibXML, which is the standard big hammer for XML. Not lightweight, but your problem really is a familiar nail and at least we're not spending much time writing code, either.

use XML::LibXML ();
XML::LibXML::Document->new('1.0', 'UTF-8')->createTextNode($text)->toString; # returns properly encoded fragment

See method toFile for writing into a file.
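A possible sketch of the toFile route, assuming XML::LibXML is installed: build a tiny document, append the text to a root element, and let the library handle both escaping and encoding. The element name, the file name and the sample text are invented for the example.

```perl
use strict;
use warnings;
use utf8;              # this source file contains a literal "à"
use XML::LibXML ();

my $text = 'voilà: 1 < 2 & done';     # sample character string
my $path = 'libxml-example.xml';      # hypothetical output file name

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('note');   # hypothetical element name
$root->appendText($text);                 # "<" and "&" are escaped on serialization
$doc->setDocumentElement($root);

# toFile serializes the document in its declared encoding (UTF-8 here).
$doc->toFile($path);
```

Nothing in this version is escaped or encoded by hand, which is the point of using the module.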

daxim