Character encoding has always been a problem for me. I always tried to fix it with little patches, but in reality that never solves the problem. So I am looking for a robust solution that handles all of these problems. I want to learn how big apps (Facebook, Google, other multilingual Ajax apps and APIs) deal with this. I want a solution that will fix all my character encoding problems. I use PHP, MySQL, HTML and JavaScript to build my application, so the solution should cover all of these languages together. A full configuration would be perfect, but if there is a long document, I can read that too. I need help. Thank you. I cannot pass strings (text) correctly through all these languages.

Also, I pull data from external APIs. How should I handle those?
A:

It's pretty easy if you just stick to using Unicode everywhere.

  • set MySQL table encodings to UTF-8
  • make sure you're talking to the database in UTF-8 by running SET NAMES utf8
  • save all your source code in UTF-8
  • when manipulating strings in PHP which may contain UTF-8 characters, use the mb_ functions
  • send HTTP Content-Type headers denoting that the content is in UTF-8
  • JavaScript strings are Unicode internally (stored as UTF-16), so once the page and any data you load are correctly declared as UTF-8, you should have no worries there
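Why the mb_ functions matter: for UTF-8 data, PHP's strlen() counts bytes, while mb_strlen($s, 'UTF-8') counts characters. The minimal JavaScript sketch below (the helper name measure is hypothetical) measures the same string three ways, which also shows that JavaScript's own length counts UTF-16 code units rather than characters.

```javascript
// Measure one string three different ways.
function measure(str) {
  return {
    utf8Bytes: new TextEncoder().encode(str).length, // byte count of the UTF-8 encoding (what strlen() sees)
    utf16Units: str.length,                          // JavaScript's built-in length: UTF-16 code units
    codePoints: Array.from(str).length,              // Unicode code points (what mb_strlen() counts)
  };
}

console.log(measure("abc"));          // utf8Bytes 3, utf16Units 3, codePoints 3
console.log(measure("h\u00e9llo"));   // "héllo": utf8Bytes 6, utf16Units 5, codePoints 5
console.log(measure("\u{1F600}"));    // one emoji: utf8Bytes 4, utf16Units 2, codePoints 1
```

Plain ASCII gives the same answer every way, which is why byte-based code often appears to work until the first accented character or emoji shows up.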

The thing is that different technologies default to different character encodings. Unfortunately, strings carry no encoding metadata; they're just sequences of bytes. Unless told otherwise, the receiver of a string can only make a best guess at which encoding that sequence is in. Whenever you connect two pieces of anything, you need to make sure they're using the same encoding (or you need to explicitly convert from one encoding to the other). Always assume that you have to declare the encoding somewhere; exactly how that's done depends on the technology.
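To make the "strings are just bytes" point concrete, here is a minimal JavaScript sketch (Node.js or a browser) that decodes the same bytes with the right and the wrong encoding, then explicitly converts text from a legacy encoding to UTF-8. Latin-1 (ISO-8859-1) is only an example legacy encoding, and the helpers decodeLatin1 and encodeLatin1 are hypothetical, written by hand so the sketch doesn't depend on which encodings the runtime's TextDecoder supports.

```javascript
// "é" (U+00E9) encoded as UTF-8 is two bytes: 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode("\u00e9");

// Latin-1 decoding: each byte becomes the code point with the same value.
// (Spreading is fine for short inputs like these.)
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

// Latin-1 encoding: only code points up to U+00FF can be represented.
function encodeLatin1(str) {
  return Uint8Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) {
      throw new RangeError(`U+${cp.toString(16).toUpperCase()} is not representable in Latin-1`);
    }
    return cp;
  });
}

// The receiver guesses right: the bytes are read as UTF-8 and give back "é".
const correct = new TextDecoder("utf-8").decode(utf8Bytes);

// The receiver guesses wrong: the same two bytes read as Latin-1 become "Ã©".
const mojibake = decodeLatin1(utf8Bytes);

// Explicit conversion: Latin-1 bytes from a legacy source are decoded with the
// encoding they were written in, then re-encoded as UTF-8 (0xE9 -> 0xC3 0xA9).
const latin1Bytes = encodeLatin1("\u00e9");
const converted = new TextEncoder().encode(decodeLatin1(latin1Bytes));

console.log(correct, mojibake, Array.from(converted));
```

The bytes never changed between the two decodes; only the receiver's assumption did, which is why each hop (database connection, HTTP header, external API) needs the encoding stated or converted explicitly.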

deceze
chryss
@chryss Hadn't seen that retroactively added bit about APIs. I think it falls into what I have written already though. :)
deceze
Well, yes, but I am the "explicit is better than implicit" kind. Nice answer in any event.
chryss
@chryss Well sure, it never hurts. :)
deceze
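Important: MySQL's utf8 character set stores at most three bytes per character, so it cannot hold characters above the BMP, which need four bytes in UTF-8. On MySQL 5.5.3 or later, use utf8mb4 instead; this ensures that you do not lose characters above the BMP.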
Sorin Sbarnea
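The comment above concerns characters outside the Basic Multilingual Plane. As a hedged illustration in JavaScript (the function name findNonBmpChars is hypothetical), this sketch lists such characters in a string; these are the values a column using MySQL's 3-byte utf8 can reject or truncate, depending on the server's SQL mode.

```javascript
// Characters above U+FFFF (outside the BMP) take 4 bytes in UTF-8,
// more than MySQL's 3-byte "utf8" character set can store.
function findNonBmpChars(str) {
  const found = [];
  for (const ch of str) { // for...of iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp > 0xffff) {
      found.push(`U+${cp.toString(16).toUpperCase()}`);
    }
  }
  return found;
}

console.log(findNonBmpChars("plain text"));       // []
console.log(findNonBmpChars("caf\u00e9"));        // [] ("é" is only 2 bytes in UTF-8)
console.log(findNonBmpChars("smile \u{1F600}"));  // [ 'U+1F600' ]
```

A check like this only detects the problem; the fix is still to store and transmit the data as utf8mb4.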