Hi,

I have a webapp that stores French text -- which potentially includes accented characters -- in a MySQL database. When data is retrieved directly through PHP, accented characters become gibberish. For instance: qui r�fl�te la liste.

Hence, I use htmlentities() (or htmlspecialchars()) to convert the string to HTML entities, and all is fine. However, when I output data that contains both accented characters and HTML elements, things get more complicated. For instance, <strong> is converted to &lt;strong&gt; and is therefore not understood by the browser.

How can I simultaneously get accented characters displayed correctly and my HTML parsed correctly?

Thank you!

+2  A: 

Maybe you could take a look at utf8_encode() and utf8_decode().
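
As a rough illustration of what this suggestion does: utf8_encode() converts ISO-8859-1 (Latin-1) bytes to UTF-8. It is deprecated as of PHP 8.2, so the sketch below uses mb_convert_encoding() for the same conversion. The helper name latin1_to_utf8 and the sample string are made up for this example, and it assumes the mbstring extension is available and that the incoming bytes really are Latin-1.

```php
<?php
// Convert a string held as ISO-8859-1 (Latin-1) bytes into UTF-8.
// Assumption: the input really is Latin-1. If it is already UTF-8,
// converting it again double-encodes the accented characters.
function latin1_to_utf8(string $text): string
{
    // Same conversion utf8_encode() performs, without the PHP 8.2 deprecation.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// "\xE9" and "\xE8" are the single-byte Latin-1 codes for e-acute and e-grave.
$raw = "qui r\xE9fl\xE8te la <strong>liste</strong>";

// The HTML tags pass through untouched; only the accented bytes change.
echo latin1_to_utf8($raw), PHP_EOL;
```

Because only the character encoding changes, markup such as <strong> reaches the browser as-is, which is what htmlentities() was preventing.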

OcuS
We had to do this when we encountered Polish characters in our SQL database; hopefully there is something similar for MySQL.
hypoxide
utf8_encode() did the trick. Thanks!
David Chouinard
+2  A: 

You should use UTF-8 encoding for storing the data in the database; then everything should work as expected and no htmlentities() will be required.

Make sure every aspect is UTF-8: the database, the table encoding and collation, and the connection, on both the client and the server side. Things might work even if not everything is UTF-8, but they might fail horribly when you do a backup and restore. That is why I recommend UTF-8 across the board.
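
The connection is the piece that is easiest to miss, so here is a minimal sketch of setting it from PHP with mysqli. The host, credentials, database, table, and column names are all hypothetical, and it assumes a server recent enough to offer utf8mb4 (MySQL 5.5.3 or later); on older servers 'utf8' would be the value to try.

```php
<?php
// Hypothetical credentials and schema, for illustration only.
$db = new mysqli('localhost', 'app_user', 'secret', 'app_db');

// Ask the server to exchange data with this client as UTF-8.
// 'utf8mb4' is MySQL's full UTF-8 character set.
if (!$db->set_charset('utf8mb4')) {
    die('Could not set connection charset: ' . $db->error);
}

// Tell the browser that the page itself is UTF-8 as well.
header('Content-Type: text/html; charset=utf-8');

$result = $db->query('SELECT body FROM articles');
while ($row = $result->fetch_assoc()) {
    // Stored HTML such as <strong> is echoed untouched.
    // Assumption: the stored markup is trusted; escape it otherwise.
    echo $row['body'];
}
```

If the data is stored as UTF-8 but the connection defaults to Latin-1, the server converts the text on the way out, which would explain text that looks correct in the database and garbled on the page.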

yhager
+1 see this SO answer: http://stackoverflow.com/questions/1344692/i-need-help-fixing-broken-utf8-encoding/1348521#1348521 for a check list.
martin clayton
@martin clayton, thanks for the link. Everything on the checklist is being respected... Furthermore, the data is correctly stored in the database as UTF-8 (i.e. no weird characters when I query the database directly). Any thoughts on what could cause the problem? (Also, accented characters hard-coded in the HTML display properly without using HTML entities.)
David Chouinard
Per OcuS's suggestion, I used utf8_encode() and everything works OK. Anyways, still intriguing. Thanks for your help guys!
David Chouinard
Note that if you have already written data to your db with bad encoding settings, it is already corrupted, and changing the encoding afterwards will not help. Fixing malformed data requires a rather involved 'mysqldump' and restore.
yhager
A: 

You could set the collation of the database fields containing the accented characters to utf8_general_ci to support them.

You can also set the collation of the database itself, so that all new fields use it by default.
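
For reference, the statements involved might look like the ones this small PHP sketch prints. The helper name utf8_collation_statements and the names app_db and articles are placeholders, not anything from the question. Note that the table statement converts every text column of the table in one go, rather than a single field.

```php
<?php
// Build the MySQL statements that apply utf8_general_ci to a database
// and to one existing table. The names passed in are placeholders.
function utf8_collation_statements(string $database, string $table): array
{
    // Backtick-quote an identifier, doubling any embedded backticks.
    $quote = fn(string $name): string => '`' . str_replace('`', '``', $name) . '`';

    return [
        // Sets the default for tables created later in this database.
        'ALTER DATABASE ' . $quote($database)
            . ' CHARACTER SET utf8 COLLATE utf8_general_ci',
        // Converts the text columns of one existing table.
        'ALTER TABLE ' . $quote($table)
            . ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci',
    ];
}

foreach (utf8_collation_statements('app_db', 'articles') as $sql) {
    echo $sql, ';', PHP_EOL;
}
```

CONVERT TO rewrites the stored column data, so taking a backup before running it on a real table is a sensible precaution.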

Veger