Hello.

When I try to write UTF-8 strings into an XML file using DOMDocument, it writes hexadecimal character references instead of the characters themselves.

For example:

&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;

instead of: ירושלים

Any ideas how to resolve the issue?

+2  A: 

Apparently passing the documentElement as $node to saveXML works around this, although I can't say I understand why.

e.g.

$dom->saveXML($dom->documentElement);

rather than:

$dom->saveXML();

Source: http://www.php.net/manual/en/domdocument.savexml.php#88525
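
A possible explanation, offered as an assumption rather than something the linked comment confirms: when a node is passed, the serializer seems to fall back to UTF-8 output instead of consulting the document's (missing) encoding. Note that the node form also omits the XML declaration. A minimal sketch comparing the two calls, with variable names of my own choosing:

```php
<?php
// Load a document that declares no encoding.
$dom = new DOMDocument();
$dom->loadXML('<root/>');
$dom->documentElement->appendChild(new DOMText('ירושלים'));

// Whole document: includes the XML declaration; non-ASCII text
// comes out as numeric character references.
$whole = $dom->saveXML();

// Root element only: no XML declaration; the text is written as UTF-8.
$nodeOnly = $dom->saveXML($dom->documentElement);

echo $whole, "\n", $nodeOnly, "\n";
```

If you need the declaration in the output file, you would have to write it yourself when using the node form.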

Paul Annesley
A: 

When I created the DOMDocument for writing, I added the following parameters:

$dom = new DOMDocument('1.0', 'utf-8');

These parameters caused the UTF-8 string to be written as is.

ufk
+1  A: 

Ok, here you go:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'));
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

will work fine, because in this case, the document you constructed will retain the encoding specified as the second argument:

<?xml version="1.0" encoding="utf-8"?>
<root>ירושלים</root>

However, once you load XML that does not declare an encoding into the document, you lose whatever you declared in the constructor, which means:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXml('<root/>'); // missing prolog
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

will not have an encoding of utf-8:

<?xml version="1.0"?>
<root>&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;</root>

So if you loadXML something, make sure it declares its encoding in the prolog:

$dom = new DOMDocument();
$dom->loadXml('<?xml version="1.0" encoding="utf-8"?><root/>');
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

and it will work as expected.
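
If you cannot change the XML you are loading, another option may be to set the document's encoding property after loading. This is a sketch under that assumption, not something tested against every PHP/libxml version:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root/>');   // no prolog, so no encoding is recorded
$dom->encoding = 'UTF-8';   // assumption: set the encoding explicitly after loading
$dom->documentElement->appendChild(new DOMText('ירושלים'));

$xml = $dom->saveXML();
echo $xml;
```

With the encoding set this way, the declaration should carry encoding="UTF-8" and the text should be written as-is rather than as character references.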

Gordon
Thank you for the descriptive answer.
ufk