Hello, I have a MySQL database table that stores country names and currency symbols. The CHARSET has been correctly set to UTF8.

This is example data inserted into the table:

insert into country ( country_name, currency_name, currency_code, currency_symbol) values 
('UK','Pounds','GBP','£');

When I look in the database - the pound symbol appears fine - but when I retrieve it from the database and display it on the website - a weird square symbol shows up with a question mark inside instead of the pound symbol.

Edit: In my.cnf the character set was set to latin1. I changed it to utf8, then logged in as root and ran \s, which returned:

Server characterset:    utf8
Client characterset:    utf8

Collations

-- Database
SELECT default_collation_name
  FROM information_schema.schemata
 WHERE schema_name = 'swipe_prod';

THIS DOES NOT RETURN ANYTHING

-- Table
SELECT table_collation
  FROM information_schema.tables
 WHERE TABLE_NAME = 'country';

THIS RETURNS utf8_general_ci

-- Columns
SELECT collation_name
  FROM information_schema.columns
 WHERE TABLE_NAME = 'country';

THIS RETURNS 7 ROWS but all have either null or utf8_general_ci

PHP CODE

<?php
$con = mysql_connect("localhost","user","123456");
mysql_select_db("swipe_db", $con);
$result = mysql_query("SELECT * FROM country where country_name='UK'");
while($row = mysql_fetch_array($result))
{
  echo $row['country_name'] . " " . $row['currency_symbol'];
}
mysql_close($con);
?>

Please advise. Thanks

+4  A: 

When you see that "weird square symbol with a question mark inside" otherwise known as the REPLACEMENT CHARACTER, that is usually an indicator that you have a byte in the range of 80-FF (128-255) and the system is trying to render it in UTF-8.

That entire byte range is invalid for single-byte characters in UTF-8, but those bytes are all very common in Western encodings such as ISO-8859-1.
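
To make that concrete, here is a small sketch in Python (used only for illustration, it has nothing to do with your PHP code). The helper name `decode_as` is made up for this example. It decodes the single byte A3 once as ISO-8859-1 and once as UTF-8:

```python
# Byte 0xA3 is the pound sign in ISO-8859-1, but on its own it is not valid UTF-8.
raw = b"\xa3"


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD (REPLACEMENT CHARACTER) for invalid input."""
    return data.decode(encoding, errors="replace")


as_latin1 = decode_as(raw, "iso-8859-1")  # the pound sign
as_utf8 = decode_as(raw, "utf-8")         # the replacement character

print(repr(as_latin1))
print(repr(as_utf8))
```

A browser that is told the page is UTF-8 does roughly the same substitution when it meets a stray byte like this.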

When I view your page and manually switch the character encoding from UTF-8 to ISO-8859-1 (in Firefox with View >> Character Encoding >> Western (ISO-8859-1)) then the POUND SIGN displays properly.

So, what's wrong then? It's hard to say - there are dozens of places where this can be fouled up. But most likely it's at the database level. Setting the CHARSET on the table to UTF8 is generally not enough. All of your charsets and collations need to be in order before characters will move around the system properly. A common pitfall is improperly set connection charsets, so I'd start there.
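
As a rough sketch of how a connection charset can cause this, the following Python snippet simulates the round trip. It assumes the column really holds UTF-8, that the server converts results to the connection charset before sending them, and that the browser decodes the page as UTF-8. The function name `simulate_page` and its parameters are invented for the example, this is not how MySQL is actually driven:

```python
def simulate_page(stored: bytes, connection_charset: str,
                  page_charset: str = "utf-8") -> str:
    """Simulate: utf8 column -> connection charset on the wire -> browser decode."""
    text = stored.decode("utf-8")            # what the column logically contains
    wire = text.encode(connection_charset)   # bytes sent over the connection
    # The browser decodes the bytes using the page's declared charset.
    return wire.decode(page_charset, errors="replace")


stored = bytes.fromhex("c2a3")  # the pound sign stored as UTF-8

broken = simulate_page(stored, "latin-1")  # connection left at a Western charset
fixed = simulate_page(stored, "utf-8")     # connection charset matches the page

print(repr(broken))
print(repr(fixed))
```

With a latin1 connection the single byte A3 reaches a UTF-8 page and turns into the replacement character. With a utf8 connection the bytes pass through unchanged.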

Let me know if you need more guidance.

EDIT

To check what bytes are actually stored for that value, run this query.

SELECT hex( currency_symbol )
  FROM country
 WHERE country_name = 'UK'

If you see A3 then you know the character is stored as ISO-8859-1. This means the problem occurs during or before writing to the DB.

If you see C2A3 then you know the character is stored as UTF-8. This means the problem occurs after reading from the DB and before writing to the browser.
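
If it helps, the two cases above can be written down as a tiny Python check. The function name `classify_pound_hex` is hypothetical, and it only knows about the pound sign:

```python
def classify_pound_hex(hex_value: str) -> str:
    """Report which encoding of the pound sign a HEX() result corresponds to."""
    data = bytes.fromhex(hex_value)
    if data == "\u00a3".encode("utf-8"):        # C2 A3
        return "utf-8"
    if data == "\u00a3".encode("iso-8859-1"):   # A3
        return "iso-8859-1"
    return "unknown"


print(classify_pound_hex("A3"))
print(classify_pound_hex("C2A3"))
```

Anything that comes back as "unknown" would need a closer look at the actual bytes.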

EDIT 2

-- Database
SELECT default_collation_name
  FROM information_schema.schemata
 WHERE schema_name = 'your_db_name';

-- Table
SELECT table_collation
  FROM information_schema.tables
 WHERE TABLE_NAME = 'country';

-- Columns
SELECT collation_name
  FROM information_schema.columns
 WHERE TABLE_NAME = 'country';
Peter Bailey
Thanks Peter - I looked into the article and tried a few suggestions like SET character_set_client = utf8; set character_set_connection = utf8; - but that does not seem to fix this. I tried the simple solution of Tom Gullen and that seems to work just fine - For a quick fix, I have replaced the pound symbol in the database with the &pound; entity - I'm sure for other symbols I will run into this problem again
Gublooo
If you're going to go the route of HTML-entitization - don't do it a character at a time - just use PHP's htmlentities function. `echo htmlentities( $text );`. But I **strongly** suggest you fix your charsets instead
Peter Bailey
Provided some diagnostic information for you.
Peter Bailey
Thanks for guiding me through this Peter - I ran the query and it returned C2A3 - I also checked the character sets, which I've edited into my question above, and all seem properly set to utf8
Gublooo
Can you check all your collations too? And post those as well?
Peter Bailey
I have added it to the main question - Thanks
Gublooo
Those are all the available collations in the system. What I need to see is the collations for the different database entities in question. I put the queries you can run in my answer.
Peter Bailey
Thanks for providing the queries - appreciate it - I have updated the question above with the result from those queries - Thanks
Gublooo
Ok, that all looks fine. You need to start eliminating stuff. Try making a page that does nothing but select that column and echo it.
Peter Bailey
Peter - I created a simple php page which simply selects and prints this currency_symbol - you can view it here - http://www.didyouswipe.com/currency.php I've added the php code to the question above
Gublooo
One other thing I noted was on my local host on windows when I run \s on mysql I see the following 2 variables also set Db characterset:utf8 and Conn.characterset: utf8 - but when I run the same on my prod (Linux) - these two variables are not shown. Not sure if that makes any difference
Gublooo
Try adding `mysql_set_charset( 'utf8', $con );` before the select query.
Peter Bailey
:) That fixed the problem
Gublooo
Just need to figure out how to set this mysql_set_charset( 'utf8', $con ); in Zend framework - that's what I've used to build the application
Gublooo
Here: http://framework.zend.com/manual/en/zend.db.adapter.html#zend.db.adapter.connecting.parameters
Peter Bailey
Thanks for all your help and time - really appreciate it - I will look into that - meanwhile I will close this question - thanks again
Gublooo
A: 

You can check out here. I'm not sure if it works, but give it a try, because this is weird; I've never seen something like it before.

tazphoenix
A: 

Try using

&pound;

Instead of the £ symbol. Alternatively, where you print the £ sign, do:

<%=replace(stringWithPoundInIt,"£","&pound;")%>
Tom Gullen
That's a bandaid. That doesn't fix his database-to-browser character encoding problem.
Peter Bailey
Thanks Tom - this solution seems pretty simple and fixed the problem in hand. But I'm not sure if we have such alternatives for other symbols like Euro symbol € or cuban currency ₱ or Israeli currency ₪
Gublooo
A full list of entities: http://www.w3schools.com/tags/ref_entities.asp But Peter is right, it is a band aid solution. However, on all my websites now I always use the HTML entity for currency symbols, as these for some reason seem to screw up more often than other symbols, especially when copying and pasting code between different editors.
Tom Gullen
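
On the question of whether other currency symbols have such alternatives: any character can be written as a numeric character reference, even where no named entity exists. A minimal sketch in Python (for illustration only; `to_ascii_html` is a made-up helper name, and it does not escape `<`, `>` or `&`):

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


symbols = "\u00a3 \u20ac \u20b1 \u20aa"  # pound, euro, peso, shekel
encoded = to_ascii_html(symbols)
print(encoded)
```

This is still the same band-aid as above. It only sidesteps the charset mismatch rather than fixing it.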