SimpleXML converts all text to UTF-8 if the source XML declaration specifies another encoding, so all text in the resulting SimpleXMLElement is automatically UTF-8.

In my case the source has the following XML declaration:

<?xml version="1.0" encoding="windows-1251" ?>

What should I do to get normal output? As you can imagine, right now I get strange symbols.

Thanks.

A: 

Maybe a stupid answer, but just don't use SimpleXML. Just use DOM.
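A minimal sketch of the DOM route, assuming PHP's libxml build can read windows-1251 (standard builds normally can). The XML string is built in code only so the example is self-contained. Note that DOM also hands back node text as UTF-8; calling saveXML() on the whole document writes it back out in the encoding named in its declaration.

```php
<?php
// Build a windows-1251 encoded XML document (normally read from a file).
$cp1251Text = iconv('UTF-8', 'windows-1251', 'Привет');
$xml = '<?xml version="1.0" encoding="windows-1251" ?>'
     . '<root><title>' . $cp1251Text . '</title></root>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Node values come back as UTF-8, just like with SimpleXML.
$title = $dom->getElementsByTagName('title')->item(0)->textContent;

// Serializing the whole document uses the declared encoding (windows-1251).
$serialized = $dom->saveXML();

echo $dom->encoding, PHP_EOL;
```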

Frank Heikens
A: 

Try using iconv() to convert the encoding.

A: 

With the iconv() function you can convert from one encoding to another; the //TRANSLIT option might help.

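<?php
// Raw XML data; the file name is only an example
$xml = file_get_contents('data.xml');

// convert string from UTF-8 to ISO-8859-1
//$xml = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $xml);

// convert string from windows-1251 to UTF-8
$xml = iconv("windows-1251", "UTF-8//TRANSLIT", $xml);

// the XML declaration must now say UTF-8, otherwise the parser
// would decode the already converted bytes as windows-1251 again
$xml = str_ireplace('encoding="windows-1251"', 'encoding="UTF-8"', $xml);
?>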
andreas
A: 

My advice is to use UTF-8 as the encoding of your .php source files and, if possible, as the output encoding too. With gzip compression, the size difference between windows-1251 and UTF-8 responses is minimal (even for mostly Cyrillic text), and UTF-8 is better in many ways. As you said, SimpleXML converts windows-1251 to UTF-8 on XML import, so you then don't have to worry about encodings at all.
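The flow described above can be sketched like this. It assumes the page is served with an explicit UTF-8 Content-Type header (that header line is added for illustration; the question does not show how the page is served), and the windows-1251 XML is built in code only to keep the sketch self-contained.

```php
<?php
// Send the charset before any output so the browser decodes UTF-8 correctly.
header('Content-Type: text/html; charset=utf-8');

// windows-1251 source data (normally loaded from a file or a remote feed).
$cp1251Text = iconv('UTF-8', 'windows-1251', 'Привет');
$xml = '<?xml version="1.0" encoding="windows-1251" ?>'
     . '<root><title>' . $cp1251Text . '</title></root>';

$doc = simplexml_load_string($xml);

// SimpleXML has already converted the text to UTF-8; no iconv() needed.
$title = (string) $doc->title;

echo $title, PHP_EOL;
```

The same six Cyrillic letters take 6 bytes in windows-1251 and 12 bytes in UTF-8, which is a quick way to see which encoding a string is actually in.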

If you have to use windows-1251 for output, use something like:

iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "windows-1251");
ob_start("ob_iconv_handler");
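If only a few values need to go out as windows-1251, converting them explicitly with iconv() is an alternative to an output-buffer handler. A minimal sketch, with a hard-coded UTF-8 string standing in for a value read from SimpleXML:

```php
<?php
// A UTF-8 value, as SimpleXML would return it.
$title = 'Привет';

// Every character here exists in windows-1251; append //TRANSLIT to the
// target encoding if the data may contain characters that do not.
$forOutput = iconv('UTF-8', 'windows-1251', $title);

header('Content-Type: text/html; charset=windows-1251');
echo $forOutput, PHP_EOL;
```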

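One catch with UTF-8 PHP source files is character classes in regular expressions: /[ю]/ won't work as you might expect, but /(ю)/ will. Without the u modifier PCRE works on bytes, so [ю] is a class of the two bytes that encode ю in UTF-8 and matches either one on its own; /[ю]/u treats ю as a single character.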
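A small check of the difference, assuming the PHP file itself is saved as UTF-8. In UTF-8, ю is the byte pair D1 8E and я is D1 8F, so a byte-oriented class [ю] also matches я through their shared first byte.

```php
<?php
// Without the u modifier PCRE matches bytes, so [ю] means "byte D1 or byte 8E".
$byteClass = preg_match('/[ю]/', 'я');   // 1: the shared lead byte D1 matches

// With u the pattern and subject are treated as UTF-8 characters.
$charClass = preg_match('/[ю]/u', 'я');  // 0

// A group (or a plain literal) must match the full two-byte sequence D1 8E.
$group = preg_match('/(ю)/', 'я');       // 0

var_dump($byteClass, $charClass, $group);
```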

ash108