I am using the Last.fm API to fetch some artist info. I save the info in a DB and then display it on my webpage. But characters like “ (left double quote) are shown as â€œ.

Example artist info: http://www.last.fm/music/David+Penn

and I got the first line as "Producer, arranger, dj and musician from Madrid-Spain. He has his own record company “Zen Recordsâ€, and".

My DB is UTF-8, but I don't know why this error still occurs.

+2  A: 

This seems to be a character encoding error. Confirm that you are reading the webpage using the correct encoding and are showing the results in the correct encoding.
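To make the symptom concrete, here is a minimal Python sketch (not the asker's PHP code) of what probably happens: the UTF-8 bytes of the curly quote get interpreted with a single-byte Western encoding. The choice of cp1252 as the wrong encoding is an assumption; it matches the characters seen in the question.

```python
# The left double quotation mark U+201C, encoded as UTF-8.
original = "\u201c"
utf8_bytes = original.encode("utf-8")   # three bytes: 0xE2 0x80 0x9C

# Wrong: read those three bytes as if each were one cp1252 character.
misread = utf8_bytes.decode("cp1252")

# Right: decode with the same encoding that produced the bytes.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))   # three separate characters instead of one quote
print(repr(correct))
```

If the garbage on the page has this three-characters-per-quote shape, some step in the chain is treating UTF-8 bytes as a single-byte encoding.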

Sjoerd
What encoding should I use?
Arsheep
+1  A: 

You should be using UTF-8 all the way through. Check that:

  1. your connection to the database is UTF-8 (using mysql_set_charset);

  2. the pages you're outputting are marked as UTF-8 (<meta http-equiv="Content-Type" content="text/html;charset=utf-8">);

  3. when you output strings from the database, you HTML-encode them using htmlspecialchars() and not htmlentities().

htmlentities() HTML-encodes all non-ASCII characters, and by default assumes you are passing it bytes in ISO-8859-1. So if you pass it “ encoded as UTF-8 (bytes 0xE2, 0x80, 0x9C), you'd get &acirc;&#128;&#156; instead of the expected &ldquo; or &#8220;. This can be fixed by passing in utf-8 as the optional $charset argument.
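The effect described above can be reproduced outside PHP. The following is a rough Python emulation, not PHP's actual implementation: `entity_encode` is a hypothetical helper that decodes bytes with an assumed charset and turns every non-ASCII character into an entity (it deliberately ignores `&`, `<` and `>`).

```python
from html.entities import codepoint2name

def entity_encode(raw: bytes, assumed_charset: str) -> str:
    """Hypothetical helper: decode raw with assumed_charset, then
    replace each non-ASCII character with a named or numeric entity."""
    out = []
    for ch in raw.decode(assumed_charset):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)

raw = "\u201c".encode("utf-8")                # bytes 0xE2 0x80 0x9C

wrong = entity_encode(raw, "iso-8859-1")      # charset assumed incorrectly
right = entity_encode(raw, "utf-8")           # charset stated correctly

print(wrong)   # &acirc;&#128;&#156;
print(right)   # &ldquo;
```

With the wrong charset each byte becomes its own entity; with the right one the three bytes collapse into a single character first.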

However, it's usually easier to just use htmlspecialchars() instead, as this leaves non-ASCII characters alone, as raw bytes instead of HTML entity references. This results in a smaller page output, so it is preferable as long as you're sure the HTML you're producing will keep its charset information (which you can usually rely on, except in contexts like sending snippets of HTML in a mail or something).
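As a rough illustration of the size argument, the sketch below uses Python's `html.escape` as a stand-in for htmlspecialchars() (an analogy, not the same function) and compares it with a version where the curly quotes are written as entity references. The sample string is made up.

```python
import html

text = "\u201cZen Records\u201d & friends"

# Escape only the markup-significant ASCII characters; the curly quotes
# are left in the string untouched.
escaped = html.escape(text)

# The same output with the curly quotes spelled out as entity references.
as_entities = escaped.replace("\u201c", "&ldquo;").replace("\u201d", "&rdquo;")

raw_size = len(escaped.encode("utf-8"))
entity_size = len(as_entities.encode("utf-8"))
print(raw_size, entity_size)
```

Each curly quote costs three bytes as raw UTF-8 but seven as `&ldquo;` or `&rdquo;`, so the raw form is shorter whenever the page's charset is declared reliably.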

htmlspecialchars() does have an optional $charset argument too, but setting it to utf-8 is not critical since that results in no change of behaviour over the default ISO-8859-1 charset. If you are producing output in old-school multibyte encodings like Shift-JIS you do have to worry about setting this argument correctly, but today that's quite rare as most sane people use UTF-8 in preference.
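One way to see why the charset makes no difference here: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for `&`, `<`, `>` or a quote. The Python sketch below checks this with a hypothetical `escape_bytes` helper built on `html.escape`; it illustrates the byte-level property and is not PHP's code.

```python
import html

def escape_bytes(raw: bytes, assumed_charset: str) -> bytes:
    """Hypothetical helper: decode, escape the HTML-special ASCII
    characters, then re-encode with the same assumed charset."""
    return html.escape(raw.decode(assumed_charset)).encode(assumed_charset)

raw = "\u201cTom & Jerry\u201d <b>".encode("utf-8")

as_latin1 = escape_bytes(raw, "iso-8859-1")   # charset assumed incorrectly
as_utf8 = escape_bytes(raw, "utf-8")          # charset stated correctly

# Both runs produce byte-for-byte identical output.
print(as_latin1 == as_utf8)

# The reason: no byte of the encoded curly quote falls in the ASCII range.
print(all(b >= 0x80 for b in "\u201c".encode("utf-8")))
```

The same check is not guaranteed to pass for encodings whose multi-byte sequences reuse ASCII-range bytes, which is where the argument starts to matter.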

bobince