I've got an XML file from which I've extracted the following text -

The Sansa Clip+ MP3 player gives you more to enjoy. Enjoy up to 2,000 songs†† with an 8GB* player, FM radio, long-life battery and voice recorder. PLUS now even more! Expand your enjoyment when you add in preloaded content cards** into the new memory card slot, including slotRadio™ and slotMusic™ cards**. Or, save your own music, podcasts, and audio books onto a microSD™/microSDHC™ memory card** to expand your play.It’s brought to you by SanDisk with awesome sound to enjoy your music. Just clip it on and enjoy more music with an incredible 15 hours† battery-fueled fun. See what you’re listening to with the bright, easy-to-read screen and intuitively searchable menus. Color your world in red, blue or sleek black undertones.

Why does it display on my webpage as below and how can I fix it automatically? Thanks.

The Sansa Clip+ MP3 player gives you more to enjoy. Enjoy up to 2,000 songs††with an 8GB* player, FM radio, long-life battery and voice recorder. PLUS now even more! Expand your enjoyment when you add in preloaded content cards** into the new memory card slot, including slotRadio™ and slotMusic™ cards**. Or, save your own music, podcasts, and audio books onto a microSD™/microSDHC™ memory card** to expand your play.It’s brought to you by SanDisk with awesome sound to enjoy your music. Just clip it on and enjoy more music with an incredible 15 hours†battery-fueled fun. See what you’re listening to with the bright, easy-to-read screen and intuitively searchable menus. Color your world in red, blue or sleek black undertones.

NOTE: I tried preinheimer's suggestion,

First I tested it with a text file which worked well.

$content = file_get_contents("test.txt");

echo htmlentities($content);

But when I tried the same thing dynamically it didn't work and left the text just the same.

$content = $responseTemp->Items->Item->EditorialReviews->EditorialReview[$j]->Content;

echo htmlentities($content);

They both contain the same text, but for some reason the dynamic version doesn't work.

ANOTHER UPDATE: I tried Juan's suggestion, which is a slight improvement but still doesn't reproduce the text correctly: it replaces many characters with a question mark. Here's what it gives me,

The Sansa Clip+ MP3 player gives you more to enjoy. Enjoy up to 2,000 songs?? with an 8GB* player, FM radio, long-life battery and voice recorder. PLUS now even more! Expand your enjoyment when you add in preloaded content cards** into the new memory card slot, including slotRadio? and slotMusic? cards**. Or, save your own music, podcasts, and audio books onto a microSD?/microSDHC? memory card** to expand your play.It?s brought to you by SanDisk with awesome sound to enjoy your music. Just clip it on and enjoy more music with an incredible 15 hours? battery-fueled fun. See what you?re listening to with the bright, easy-to-read screen and intuitively searchable menus. Color your world in red, blue or sleek black undertones.

FINAL UPDATE: Aha, my mistake. I replaced $myOutputEncoding with 'utf-8' in Juan's example and added the following in the head tags to get it working,

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+3  A: 

It sounds like a character set issue. Luckily, I wrote an article that got published today. http://phpadvent.org/2009/character-sets-by-paul-reinheimer

Check for a character set in the XML document (should be at the top, probably UTF-8), then try serving your page with the same character set.
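As a rough sketch of "serving the page with the same character set", assuming the XML turns out to be UTF-8: declare UTF-8 in the Content-Type header and tell the escaping function which charset it is reading. The helper name escapeUtf8ForHtml is made up for illustration and is not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): escape UTF-8 text for HTML output.
function escapeUtf8ForHtml(string $text): string
{
    // Passing 'UTF-8' explicitly stops PHP from guessing how to interpret the bytes.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Declare the charset in the HTTP header so the browser decodes the page as UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// "\u{2122}" is the trademark sign; written as an escape so the example
// does not depend on the encoding this file is saved in.
echo escapeUtf8ForHtml("slotRadio\u{2122} & slotMusic\u{2122} cards");
```

Only the HTML-special characters (&, <, >, quotes) are escaped here; the multibyte characters pass through untouched and rely on the declared charset to display correctly.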

preinheimer
I just checked, the XML file doesn't seem to have a character set. It's a response from Amazon's AWS.
usertest
How about the Content-Type response header? It sometimes includes a charset.
gnarf
Thanks for the suggestion, it was UTF-8
usertest
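For anyone wanting to do the header check from the comments above in code, here is a minimal sketch. It assumes you already have the Content-Type header value as a string; how you get it depends on your HTTP client. The function name charsetFromContentType is hypothetical.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header value.
// Returns the charset in upper case, or null if the header does not name one.
function charsetFromContentType(string $contentType): ?string
{
    // Matches charset=utf-8 as well as charset="utf-8", case-insensitively.
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $matches)) {
        return strtoupper($matches[1]);
    }
    return null;
}
```

If it returns null, the header gives no hint and you are back to checking the XML declaration or guessing.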
+2  A: 

Since you don't know the original encoding, you can try guessing it with mb_detect_encoding, like so:

$content = $responseTemp->Items->Item->EditorialReviews->EditorialReview[$j]->Content;
$encoding = mb_detect_encoding( $content );

$encodedText = mb_convert_encoding( $content, $myOutputEncoding, $encoding );

where $myOutputEncoding is the encoding your page is served in (for example 'UTF-8'). Then when you output $encodedText it should show the text correctly.
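A slightly more defensive variant of the same idea, as a sketch only: pass mb_detect_encoding an explicit candidate list and strict mode, and always convert to UTF-8. The helper name toUtf8 and the default candidate list are assumptions for illustration; adjust the candidates to whatever your source could plausibly send. It needs the mbstring extension, like the code above.

```php
<?php
// Hypothetical helper: convert text of uncertain encoding to UTF-8.
// The default candidate list is an assumption, not something the source specifies.
function toUtf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode (third argument) only accepts an encoding in which
    // the whole byte sequence is valid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        // Assumption: if nothing matches, treat the text as the first candidate.
        $detected = $candidates[0];
    }
    return mb_convert_encoding($text, 'UTF-8', $detected);
}
```

Usage would be along the lines of echo toUtf8((string) $content); with the page itself declared as UTF-8. Detection is still a guess, so if the source tells you its encoding, prefer that over detecting.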

Juan
How do I decide what the encoding should be?
usertest