ansaurus

Question

Answer 1

+5 A:

Use iconv(). It will allow you to use native characters in UTF-8 as well - no need for HTML entities.

$data = iconv("ISO-8859-1", "UTF-8", $text);

When converting from UTF-8 to another character set, append //IGNORE or //TRANSLIT to the target charset to drop or transliterate characters that cannot be represented.

Alternatively, the mb_* functions shown by @Gumbo will work as well.
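As a rough illustration of both directions, here is a minimal sketch. The sample strings and variable names are made up for the example, and the result of the reverse conversion depends on the iconv implementation PHP was built against, so no output is claimed for it.

```php
<?php
// "caf\xE9" is "café" encoded as ISO-8859-1 (0xE9 is é). Sample data only.
$latin1 = "caf\xE9";

// ISO-8859-1 -> UTF-8: every Latin-1 byte has a Unicode code point,
// so no //IGNORE or //TRANSLIT suffix is needed in this direction.
$utf8 = iconv("ISO-8859-1", "UTF-8", $latin1);

// UTF-8 -> ISO-8859-1: the euro sign (U+20AC) does not exist in Latin-1.
// //TRANSLIT asks iconv to substitute an approximation; what comes back
// (or whether the call fails and returns false) varies between iconv
// implementations, so check the return value before using it.
$price = "10 \u{20AC}";
$back = @iconv("UTF-8", "ISO-8859-1//TRANSLIT", $price);
if ($back === false) {
    // Assumed fallback for this sketch: keep the original string.
    $back = $price;
}
```

The forward conversion turns the single byte 0xE9 into the two-byte UTF-8 sequence 0xC3 0xA9.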

Pekka 2010-10-21 18:31:26

+1, possibly add `//TRANSLIT` to prevent characters that can't be represented in ISO-8859-1 from breaking the string.

Wrikken 2010-10-21 18:44:50

@Wrikken good point, added.

Pekka 2010-10-21 18:48:56

Um, the character set of the ISO 8859-1 is a subset of the Unicode character set. So there is no need to ignore or transliterate anything because there is no difference: charset(ISO 8859-1) \ charset(Unicode) = ∅.

Gumbo 2010-10-21 18:52:21

@Gumbo of course, I wasn't thinking. Fixed, cheers

Pekka 2010-10-21 19:10:26

Don't forget to *also* modify any `META` tag that gives the charset, since it will probably be inaccurate afterwards.

Ignacio Vazquez-Abrams 2010-10-21 19:14:48

@Ignacio Vazquez-Abrams: An XML feed probably doesn’t have a `META` element – at least not those I know of.

Gumbo 2010-10-21 19:24:42

Brilliant, thanks Pekka.

Liam Bailey 2010-10-21 20:21:18

Answer 2

+1 A:

You can also use utf8_encode or mb_convert_encoding:

$desc = utf8_encode($desc);
// OR
$desc = mb_convert_encoding($desc, 'UTF-8', 'ISO-8859-1');

Both will convert the encoding from ISO 8859-1 to UTF-8.
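For what it's worth, utf8_encode() is deprecated as of PHP 8.2, so the mbstring variant is probably the safer choice on current versions. A small sketch, assuming the mbstring extension is loaded; the helper name latin1_to_utf8 and the sample text are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 string to UTF-8 via mbstring.
function latin1_to_utf8(string $text): string
{
    // Argument order is (string, to_encoding, from_encoding).
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// "na\xEFve caf\xE9" is "naïve café" in ISO-8859-1 (sample data only).
$desc = latin1_to_utf8("na\xEFve caf\xE9");

// Sanity check that the result is well-formed UTF-8.
$isValid = mb_check_encoding($desc, 'UTF-8');
```

Passing the source encoding explicitly avoids relying on mbstring's configured internal encoding.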

Gumbo 2010-10-21 18:34:26


ISO-8859-1 to XML-acceptable UTF-8
