I have characters like these on our web site: Fémnyomó

That is a street address, entered in another language (though I do not know which). Here's the db setup:

mysql 4.1.2log
charset cp1252 West European (latin1)

I'm using PHP without the mbstring extension (though I do no string conversions on this address; I just echo it).

If I changed the MySQL charset from cp1252 to UTF-8 and made sure I used things like header( 'Content-Type: text/html; charset=UTF-8' ); would that improve my situation? Or is the data hosed because it was saved in the cp1252 charset, and there is nothing I can do? The original database was created in 2002 and has been used and expanded since. We've upgraded servers and re-imported dumps, but I'm ashamed to admit I never gave charsets much thought.

If I'm hosed, I'll probably just remove the text in those fields, but I'd like to support Unicode going forward. If I issue ALTER DATABASE database_name DEFAULT CHARACTER SET utf8; will that make sure future multibyte characters are saved correctly, taking at least storage out of the equation (leaving me to worry about PHP)?

Thanks -

A: 

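1) Set the database default charset to UTF-8 (this changes the default for tables created afterwards; existing tables keep their charset until they are converted):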

ALTER DATABASE database_name DEFAULT CHARACTER SET utf8;

2) Issue this before any query on the page:

mysql_query("set names 'utf8'");

3) Use this header:

header( 'Content-Type: text/html; charset=UTF-8' );

4) Insert this meta tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

5) Also, read this: http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
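The garbled value in the question has the shape you get when UTF-8 bytes are read back as cp1252: "é" is the bytes C3 A9 in UTF-8, and those two bytes displayed as cp1252 are "Ã" and "©". If that is what happened here (an assumption, not something confirmed in this thread), the text is not lost and the step can be reversed. The sketch below illustrates the idea in Python; `repair_mojibake` is a hypothetical helper name, and it only handles a single round of mis-decoding.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as cp1252'.

    Returns the input unchanged when it does not fit that pattern.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # assume the text was fine and leave it alone.
        return text


garbled = "Fémnyomó ut 1"            # value shown in the question
fixed = repair_mojibake(garbled)
assert fixed == "Fémnyomó ut 1"

# Text that is already correct, or plain ASCII, passes through unchanged.
assert repair_mojibake("Fémnyomó ut 1") == "Fémnyomó ut 1"
assert repair_mojibake("Main Street 5") == "Main Street 5"
```

The fallback branch is what makes this reasonably safe to run over a mixed column: a correctly stored "é" becomes the lone byte E9 in cp1252, which is not valid UTF-8, so the value is returned as-is. Python's cp1252 codec and MySQL's latin1 differ on a few undefined byte values, so try any such repair on a backup copy first.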

shamittomar
@shamittomar - thanks. I've done all 5 steps but I'm still getting the same result. To eliminate possibilities, I'm connecting with the mysql monitor on localhost (this box is running MySQL 5.1.41-3ubuntu12.3-log) and have set these variables to utf8 (confirmed with `show variables`): character_set_client, character_set_connection, character_set_database, character_set_results, character_set_server, character_set_system. The result is the same. This comes straight from the mysql monitor: Fémnyomó ut 1
Hans
That link in step 5 is probably going to end up being what I do. I'll probably just have to go through these addresses and change them manually (after converting everything to UTF-8). Is there any danger after I convert the db to UTF-8 that I'll end up with these kinds of characters from the old latin1 charset it used to be in?
Hans
For the danger you mentioned, have a backup of the database. If you screw up anything, you can restore.
shamittomar