Hi guys,

I've been having a lot of trouble with character sets and encodings while writing a multilingual web app in PHP. The problems show up in several places: in the shell, inside PHP itself, and in the database. I want the whole application to be UTF-8 throughout, so that I no longer have to worry about converting anything back and forth. Can anyone recommend a good tutorial or online book that explains what to consider when making such a decision, and also HOW to actually tackle these problems in PHP, MySQL and the shell?

I know it's a broad question, so I'd really appreciate any recommendations you might have.

Thanks in Advance

+2  A: 

I'd start with the PHP UTF-8 Cheatsheet

Don't forget to set the character set on your database connections!
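
For the connection part, here is a minimal sketch of one way to do it with PDO, assuming a MySQL server new enough to support the utf8mb4 character set. The helper name utf8_dsn and the host and database names are made up for illustration.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that pins the connection
// character set, so the client and server agree on UTF-8 from the start.
function utf8_dsn(string $host, string $db): string
{
    // "charset" is a standard key of the PDO MySQL DSN.
    // utf8mb4 is an assumption here: it is MySQL's full 4-byte UTF-8.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $db);
}

// Placeholder host and database names.
$dsn = utf8_dsn('localhost', 'myapp');

// Uncomment with real credentials to open the connection:
// $pdo = new PDO($dsn, 'user', 'secret');

// With mysqli, the equivalent step would be:
// $mysqli->set_charset('utf8mb4');
```

Setting the charset when the connection is opened, rather than in a later query, means no statement ever runs under the server's default encoding.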

And make sure you choose UTF-8 as the default encoding for whatever IDE you're using.

Peter Bailey
Thanks a lot Peter. Very useful stuff from the dropsend guys.
Ali
+1  A: 

Also remember to declare the character set of your output with header('Content-Type: text/html; charset=utf-8'). It can also be very helpful to set the accept-charset attribute on all of your <form> tags. Also make sure you pass the character set explicitly whenever you call htmlspecialchars. Avoid htmlentities like the plague! With UTF-8 output there is no need to turn accented characters into entities.
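
Roughly, the output side could look like the sketch below. The helper name e is hypothetical, and ENT_QUOTES is my own choice of flag rather than something required.

```php
<?php
// Declare the response encoding. This has to run before any output is sent,
// so guard it in case something has already been printed.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: escape text for HTML and name the charset explicitly
// instead of relying on the default of the running PHP version.
function e(string $text): string
{
    // Only the HTML special characters are escaped; accented and other
    // non-ASCII characters pass through untouched as UTF-8.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

A form would then carry the attribute directly, for example <form method="post" accept-charset="utf-8">, and every value echoed into the page would go through e().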

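Also, it is really important to note that UTF-8 can encode every character in Unicode, so no language requires UTF-16 instead. What does change is the size: most characters in Asian languages take three bytes in UTF-8, and some take four, so byte-oriented functions like strlen will not count characters, and any storage limited to three bytes per character will reject the four-byte ones. UTF-8 is also compatible with plain ASCII, but not with the upper half of Latin character sets such as ISO-8859-1.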
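
To see how many bytes characters from Asian scripts actually occupy in UTF-8, here is a small sketch that uses only core PHP functions. It assumes PHP 7 or later for the "\u{...}" escape syntax, and the two sample characters are my own picks.

```php
<?php
// Two sample characters written as Unicode escapes (PHP 7+ syntax).
$kanji = "\u{65E5}";   // U+65E5, a common CJK character
$rare  = "\u{20000}";  // U+20000, a CJK character outside the 16-bit range

// strlen() counts bytes, not characters, so it shows the encoded size.
$kanjiBytes = strlen($kanji);
$rareBytes  = strlen($rare);

// bin2hex() shows the raw UTF-8 byte sequence.
echo $kanjiBytes, ' bytes: ', bin2hex($kanji), "\n";
echo $rareBytes, ' bytes: ', bin2hex($rare), "\n";

// preg_match() with the /u modifier only succeeds on valid UTF-8,
// which makes it a cheap validity check for incoming strings.
$isValid = preg_match('//u', $kanji . $rare) === 1;
```

Both characters encode cleanly; the second one simply needs four bytes.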

shadowhand
Thank you shadowhand. Your note about UTF-8 and Asian languages was insightful. I'll keep that in mind.
Ali