Anyone know of any Perl module to escape text in an XML document?

I'm generating XML which will contain text that was entered by the user. I want to correctly handle the text so that the resulting XML is well formed.

+4  A: 

Use XML::Code.

From CPAN:

XML::Code escape()

Normally any content of the node will be escaped during rendering (i.e. special symbols like '&' will be replaced by the corresponding entities). Call escape() with a zero argument to prevent it:

        my $p = XML::Code->new('p');
        $p->set_text("&#8212;");
        $p->escape(0);
        print $p->code(); # prints <p>&#8212;</p>
        $p->escape(1);
        print $p->code(); # prints <p>&amp;#8212;</p>
joe
+2  A: 

XML::Entities:

use XML::Entities;
my $a_encoded = XML::Entities::numify('all', $a);

Edit: XML::Entities::numify only converts named entities to numeric ones; it does not escape raw text. Use HTML::Entities and encode_entities($a) instead.
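A minimal sketch of the HTML::Entities approach, assuming the goal is XML rather than HTML output. By default encode_entities() also turns non-ASCII characters into HTML named entities (such as &eacute;), which XML does not predefine, so passing an explicit set of unsafe characters as the second argument is probably safer. The variable names and sample input are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical user input, for illustration only.
my $user_text = q{x < y & "z"};

# Restrict escaping to the XML-significant characters; the default
# set would also produce HTML-only named entities for non-ASCII text.
my $escaped = encode_entities($user_text, q{<>&"'});

print "<foo>$escaped</foo>\n";   # prints <foo>x &lt; y &amp; &quot;z&quot;</foo>
```

This only covers escaping a string; building the surrounding markup by string interpolation is still the kind of hand-rolled XML that other answers here warn against.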

hovenko
XML::Entities::numify seems only to convert named XML entities to numeric XML entities.
coldeq
You are right, my mistake. It is possible to use HTML::Entities and encode_entities instead.
hovenko
+6  A: 

I am not sure why you need to escape text that is in an XML file. If your file contains:

<foo>x < y</foo>

The file is not an XML file, despite the proliferation of angle brackets. An XML file must contain well-formed data, meaning something like this:

<foo>x &lt; y</foo>

or

<foo><![CDATA[x < y]]></foo>

Therefore, either:

  1. You are not asking for escaping data in an XML file. Rather, you want to figure out how to put character data in an XML file so the resulting file is valid XML; or

  2. You have some data in an XML file that needs to be escaped for some other reason.

Care to elaborate?

Sinan Ünür
To the person who downvoted: What exactly was wrong with what I said above?
Sinan Ünür
People get mad when you remind them that their pseudo-XML is not actually real XML. It is amusing... and sad. Anyway, I upvoted you :)
jrockway
My question would be #1. I didn't realise my question wasn't clear. I'll update the question to clarify.
coldeq
+3  A: 

I personally prefer XML::LibXML, the Perl binding for libxml2. One advantage is that it uses one of the fastest XML processing libraries available. Here is an example of creating a text node:

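use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', $some_encoding);
my $element = $doc->createElement($name);
$element->appendText($text);           # special characters are escaped on output
$doc->setDocumentElement($element);    # attach the element as the document root
my $xml_fragment = $element->toString();
my $xml_document = $doc->toString();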

And, never, ever create XML by hand. It's gonna be bad for your health when people find out what you did.

zakovyrya
Point taken. I shouldn't have created the XML by hand (they were simple XML documents when I started). I'll need to get around to rewriting those bits of code.
coldeq
I've accepted this answer not for the XML::LibXML recommendation (I used XML::Writer) but for pointing out that it is not good practice to create XML by hand.
coldeq
A: 

After checking out XML::Code as recommended by Krish, I found that this can be done with the XML::Code set_text() method. E.g.,

use XML::Code;
my $text = XML::Code->new('=');
$text->set_text(q{> & < " ' "});
print $text->code(); # prints &gt; &amp; &lt; " ' "

Passing '=' creates a text node which, when printed, doesn't contain tags. Note: this only works for text data. It won't correctly escape attributes.
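For comparison, here is a rough sketch of how the same text, plus an attribute value, might be written with XML::Writer, which escapes both character data and attribute values as it writes them. The element name, attribute name and input strings are invented for the example, and it assumes a version of XML::Writer recent enough to support OUTPUT => 'self' and to_string.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical element, attribute, and input values for illustration.
my $user_text = q{x < y & z};
my $user_attr = q{say "hi" & <bye>};

# OUTPUT => 'self' keeps the generated XML inside the writer object.
my $writer = XML::Writer->new(OUTPUT => 'self');
$writer->startTag('foo', note => $user_attr);  # attribute value is escaped
$writer->characters($user_text);               # text content is escaped
$writer->endTag('foo');
$writer->end();

my $xml = $writer->to_string;
print $xml;
```

Because the writer does the escaping, the calling code never has to decide which characters are special in which context.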

coldeq
A: 

Use XML::Generator:

require XML::Generator;

my $xml = XML::Generator->new( ':pretty', escape => 'always,apos' );

print $xml->h1( " &< >non-html plain text< >&" );

This prints the content inside the tags fully escaped, so it cannot conflict with the markup.

muenalan