Hi, I have an XML file I want to parse:

<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

Firefox parses it perfectly, but XML::Simple seems to corrupt some of the data. I have a Perl program like this:

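use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# The XML is built as raw bytes: \x{c3}\x{bb} is the UTF-8 encoding of "û" (U+00FB).
my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$content .= "<tag>\x{c3}\x{bb}</tag>\n";

print "input:\n$content\n";

my $xml = XML::Simple->new;
my $data = $xml->XMLin($content, KeepRoot => 1);

print "data:\n";
print Dumper $data;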

and I get:

input:
<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

data:
$VAR1 = {
          'tag' => "\x{fb}"
        };

This isn't what I expected. I think there are some encoding issues. Am I doing something wrong?

UPD: I had assumed that XMLin returned text as UTF-8 bytes (like the input). I just added

encode_utf8($data->{'tag'});

and it worked.
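The behaviour in the update can be sketched with the core Encode module alone. This is a minimal illustration, not XML::Simple's actual internals: it assumes, as the Dumper output above suggests, that XMLin hands back decoded character strings rather than UTF-8 bytes. The variable names are made up for the example. Note that Encode's encode_utf8 returns the encoded string rather than modifying its argument, so its return value has to be used or assigned.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Raw UTF-8 bytes for "û", as they appear in the input document.
my $bytes = "\x{c3}\x{bb}";

# Assumption: this mirrors what the parser returns, a decoded
# character string holding the single character U+00FB.
my $chars = decode('UTF-8', $bytes);

# Encoding again yields the original two bytes, ready to be written as UTF-8.
my $encoded = encode_utf8($chars);

printf "chars:   length %d, code point U+%04X\n", length($chars), ord($chars);
printf "encoded: length %d, bytes %s\n",
    length($encoded),
    join(' ', map { sprintf('%02x', ord($_)) } split(//, $encoded));
```

The first line of output reports one character with code point U+00FB, the second reports two bytes, c3 bb. So "\x{fb}" in the dump is the same "û", just in decoded form.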

A: 

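Hexadecimal FB (decimal 251) is the code point of the "û" character in Latin-1 and Unicode (U+00FB); strictly speaking it is not ASCII, which only covers 0 to 127. Could you please elaborate on what you expected to get in the data structure, and what led you to conclude that what you got was "corrupt"?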

DVK
My script parses some XML (containing characters like that) and generates another XML document. That second document seemed to be malformed, because the parser failed on those characters.
pacefist
I thought that XMLin returned text in UTF-8, and I generated my own XML with 'encoding=utf-8'.
pacefist
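For the generated XML described in the comments above, one possible approach is to let an output layer do the encoding instead of calling encode_utf8 on each value. The following is only a sketch under assumptions: $text is a hypothetical stand-in for a value such as $data->{'tag'}, and an in-memory buffer stands in for the real output file.

```perl
use strict;
use warnings;

# Hypothetical value, standing in for $data->{'tag'} as returned by XMLin.
my $text = "\x{fb}";

my $doc = qq{<?xml version="1.0" encoding="UTF-8" ?>\n<tag>$text</tag>\n};

# Write through an encoding layer so characters become UTF-8 bytes on output.
# An in-memory buffer is used here in place of a real file name.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "cannot open buffer: $!";
print {$out} $doc;
close $out or die "cannot close buffer: $!";

printf "%d characters in, %d bytes out\n", length($doc), length($buffer);
```

The byte count is one higher than the character count, because "û" is the only character that needs two bytes in UTF-8. With a real file, the same layer can be given to open with a file name in place of the buffer reference.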