ansaurus

Question

Problem with simpleXML and entity not being defined

Answer 1

+2 A:

I think this is an encoding problem. PHP (SimpleXML in this particular case) does not like the Danish Ø you've got in that forenames tag. You could try to re-encode the whole file in UTF-8 and, while doing so, replace the escaped entity in the tag with the literal character. Afterwards you can load the file, now free of escaped characters, into SimpleXML.


KB22 2009-09-15 12:29:13
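A minimal sketch of the preprocessing this answer describes. It is written in Python rather than PHP purely for illustration, and the helper names are hypothetical. It assumes the feed is ISO-8859-1 with a quoted encoding declaration. It decodes the bytes, replaces HTML named entities with the characters they stand for (using Python's html.entities.html5 table), and re-encodes the result as UTF-8:

```python
import html.entities
import re
from xml.sax.saxutils import escape

# The five entities every XML parser predefines; leave them as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity such as &Oslash; (numeric references like &#216; do not match).
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The encoding pseudo-attribute of the XML declaration, single or double quoted.
LATIN1_DECLARATION = re.compile(r"""encoding=(["'])iso-8859-1\1""", re.IGNORECASE)


def entities_to_literals(text: str) -> str:
    """Replace HTML named entities with the characters they stand for."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        chars = html.entities.html5.get(name + ";")
        if name in XML_PREDEFINED or chars is None:
            return match.group(0)  # valid XML already, or not an HTML entity
        # Re-escape anything special to XML ("<", "&", quotes) so the result stays well-formed.
        return escape(chars, {'"': "&quot;", "'": "&apos;"})
    return NAMED_ENTITY.sub(replace, text)


def latin1_feed_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: ISO-8859-1 XML bytes in, UTF-8 bytes with HTML entities inlined out."""
    text = raw.decode("iso-8859-1")
    # The declaration must name the new encoding, or the parser would misread the bytes.
    text = LATIN1_DECLARATION.sub(r"encoding=\1utf-8\1", text, count=1)
    return entities_to_literals(text).encode("utf-8")
```

The five XML entities are kept as-is, so text like &amp; still means a literal ampersand after the conversion.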

Not sure what you mean. This XML file is encoded as ISO-8859-1 (<?xml version="1.0" encoding="iso-8859-1"?>).

Maarten 2009-09-15 12:32:13

Right: use UTF-8 instead of ISO-8859-1.

Nerdling 2009-09-15 12:50:13

Yep, and use utf8_encode() for the actual conversion of the text.

KB22 2009-09-15 12:58:15

That would make sense if I were the author, but I'm on the parsing end, so to speak ;-)

Maarten 2009-09-15 13:00:59

You've got the file, so you can read it line by line and re-encode it, can't you? I once happened to write an XML filter application for a Japanese customer, and believe me, doing this extra step before the actual parsing paid off... ;)

KB22 2009-09-15 15:29:09
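The line-by-line filter described here might look like the following Python sketch (not PHP; the function and file names are hypothetical). It handles only the encoding half of the problem. It assumes an ASCII-compatible source encoding such as ISO-8859-1 and rewrites the declaration on the first line. Entities like &Oslash; are left untouched.

```python
import re
from typing import Iterable, Iterator

# The encoding pseudo-attribute of the XML declaration, single or double quoted.
DECLARATION = re.compile(r"""encoding=(["'])[A-Za-z0-9._-]+\1""")


def transcode_lines(lines: Iterable[bytes],
                    src_encoding: str = "iso-8859-1") -> Iterator[bytes]:
    """Re-encode an XML file to UTF-8 one line at a time (ASCII-compatible sources only)."""
    for index, raw in enumerate(lines):
        text = raw.decode(src_encoding)
        if index == 0:
            # The XML declaration can only appear at the very start; point it at UTF-8.
            text = DECLARATION.sub(r"encoding=\1utf-8\1", text, count=1)
        yield text.encode("utf-8")


# Usage (file names are placeholders):
# with open("feed.xml", "rb") as src, open("feed-utf8.xml", "wb") as dst:
#     dst.writelines(transcode_lines(src))
```

Streaming the file this way keeps memory use flat even for large feeds, since only one line is held at a time.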

Answer 2

+1 A:

HTML-style encoding of Latin-1 characters (like &Oslash;, the entity for Ø) is what has broken the XML parser. If you're in control of the data, you need to escape it using XML-style numeric character references (Ø just happens to be &#216;).

squeeks 2009-09-15 12:31:55
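If you would rather keep the file's ISO-8859-1 declaration, a rough way to apply the escaping this answer recommends is to rewrite every HTML named entity as numeric character references. The sketch below is in Python with a hypothetical function name. It uses Python's full html.entities.html5 table instead of a hand-made one, so less common names are covered too. Unknown names and the five XML entities are left alone.

```python
import html.entities
import re

# Already valid in XML, so never rewritten.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity such as &Oslash; (numeric references like &#216; do not match).
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_numeric(text: str) -> str:
    """Rewrite HTML named entities as XML numeric character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        chars = html.entities.html5.get(name + ";")
        if name in XML_PREDEFINED or chars is None:
            return match.group(0)  # already valid XML, or not an HTML entity
        # A few HTML5 entities expand to two code points: emit one reference per character.
        return "".join(f"&#{ord(ch)};" for ch in chars)
    return NAMED_ENTITY.sub(replace, text)
```

Because the replacements are plain ASCII, the rest of the file and its encoding declaration can stay exactly as they are.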

Thanks. So this is actually a broken XML file?

Maarten 2009-09-15 12:39:07

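Yes. A conforming XML parser has to reject any entity it has no declaration for, and only &amp;, &lt;, &gt;, &quot; and &apos; are predefined. So it breaks when it expects XML-style encoding of non-ASCII characters (a numeric reference or the literal character) and is given HTML-style named entities like &Oslash; instead.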

squeeks 2009-09-15 12:41:45
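A quick way to see the difference is a small Python check. It uses xml.etree.ElementTree, an Expat-based parser standing in for SimpleXML here, and try_parse is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text: str):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


# HTML-style named entity: undefined in XML without a DTD, so parsing fails.
print(try_parse("<forenames>B&Oslash;IE</forenames>"))
# XML-style numeric reference for the same character: parses fine.
print(try_parse("<forenames>B&#216;IE</forenames>"))  # BØIE
```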

OK. As I said, I'm just parsing this. I looked at the table from Björn's answer and it works for my first example, but the next problem is this entity, which is not in that table: . Is there a more robust solution?

Maarten 2009-09-15 12:48:11

XSLT transforming the document before you pass it off to an XML parser would be one solution.

squeeks 2009-09-15 12:54:46

Answer 3

+3 A:

Björn 2009-09-15 12:32:37

Thanks so much for the table, Björn, it saved my ass!

FFish 2010-03-02 13:50:52

Answer 4

A:

Try using this line:

<forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x>

and read this about CDATA

lg 2009-09-15 12:33:00
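Here is what a parser actually returns for the CDATA version, sketched in Python with xml.etree.ElementTree for illustration (only the forenames element is parsed). The entity is not decoded, so you still need a separate step such as html.unescape afterwards:

```python
import html
import xml.etree.ElementTree as ET

doc = "<forenames><![CDATA[B&Oslash;IE]]></forenames>"

# Inside CDATA the parser does not interpret "&", so the entity text survives as-is.
raw = ET.fromstring(doc).text
print(raw)  # B&Oslash;IE

# Turning it into the real character is a separate step after parsing.
decoded = html.unescape(raw)
print(decoded)  # BØIE
```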

OK, but this is not my XML; I'm just parsing it.

Maarten 2009-09-15 12:38:28

Before parsing, you should wrap the content of every element that contains "strange" characters in a CDATA section.

lg 2009-09-15 12:50:42
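Here is a rough regex-based sketch of that pre-parse step, in Python with a hypothetical function name. It assumes simple leaf elements without attributes or nested markup. It is a workaround, not a substitute for getting the source fixed:

```python
import re

# A leaf element (no attributes, no child tags) whose text contains a named
# entity other than the five predefined XML ones.
LEAF_WITH_ENTITY = re.compile(
    r"<(?P<tag>[A-Za-z_][\w.-]*)>"
    r"(?P<text>[^<]*&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z][A-Za-z0-9]*;[^<]*)"
    r"</(?P=tag)>"
)


def wrap_entities_in_cdata(xml_text: str) -> str:
    """Wrap the text of leaf elements containing HTML entities in CDATA sections."""
    def replace(match: re.Match) -> str:
        tag, text = match.group("tag"), match.group("text")
        if "]]>" in text:
            return match.group(0)  # "]]>" would end the CDATA section early
        # Nothing inside CDATA is expanded any more (not even &amp;), so decode
        # the text after parsing, e.g. with html.unescape.
        return f"<{tag}><![CDATA[{text}]]></{tag}>"
    return LEAF_WITH_ENTITY.sub(replace, xml_text)
```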

If it has this error in it, then it's not well-formed XML to begin with. It's up to you to tell the original authors to fix it, or to do this sort of check prior to parsing and wrap the invalid chunks.

Nerdling 2009-09-15 12:51:12

Indeed, just send them an email to discuss this.

Maarten 2009-09-15 13:04:24
