I have a website that allows users from around the world to submit profiles. Somewhere between storing/retrieving/displaying the characters, they are not rendering correctly. I'm not sure which step is having problems, but here is a breakdown of what is happening.

When I do a SELECT from my PostgreSQL DB via the psql command line interface, I see some characters such as the following appearing, which makes me believe they are saving correctly:

  • å

However, on my website, I'm seeing the above characters appearing as follows, respectively:

  • â��
  • â�¦
  • Ã¥

I have tried changing the encoding in the header, with no luck, from:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

to:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

I'm just looking for some insight on any PHP settings / functions, PostgreSQL settings, HTML charsets, or anything else that I should be looking into to make sure everything displays properly for my users.

A: 

My guess is that the problem is in the browser or the web server: those are UTF-8 byte sequences being misread as Latin-1. If the web server sends an HTTP Content-Type header that declares Latin-1, which many do by default, that overrides anything in the document itself. The web server needs to either declare no character set (in which case the meta tag in the document is consulted) or declare the correct one, which is UTF-8. If that is impractical, a workaround is to emit numeric character references (&#NNN;) for every character outside the 0-127 ASCII range when generating the HTML.
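
As a rough illustration of the numeric-reference workaround, here is a minimal sketch. It assumes the mbstring extension is available and that the input string is valid UTF-8; the sample text and variable names are made up for the example.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// "\xC3\xA5" is the UTF-8 encoding of "å" (U+00E5).
$text = "Sm\xC3\xA5";

// Conversion map: [first code point, last code point, offset, mask].
// Here: every code point from U+0080 upward, no offset, keep all bits.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Replace each non-ASCII character with a decimal reference.
$ascii_only = mb_encode_numericentity($text, $convmap, 'UTF-8');

echo $ascii_only, "\n"; // Sm&#229;
```

The result contains only ASCII bytes, so it displays the same whichever charset the server declares. Note that this call does not escape markup characters such as < or &.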

silverpie
+1  A: 

You probably need to set your client encoding in Postgres. http://developer.postgresql.org/pgdocs/postgres/multibyte.html

Also, you may have to do it in the HTTP header (rather than just the meta tag). If you're using PHP, you would call:

header("Content-Type: text/html; charset=UTF-8");

Be sure to use the same client encoding when reading AND writing to the db.

Jay
I neglected to mention this when asking, but I'm using the CakePHP framework's built-in HtmlHelper::meta method, which takes care of everything for me. Thanks for pointing that out for anyone else who comes across this question and needs to do the same in their PHP app.
Matt Huggins
+3  A: 
* �
* �
* å

This pattern indicates that the UTF-8 bytes are first misread as ISO-8859-1 and then converted from ISO-8859-1 to UTF-8 a second time.
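
To make that pattern concrete, the following sketch reproduces it for "å" alone. It assumes the mbstring extension is available; the variable names are illustrative and nothing here is taken from the asker's actual code.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
$utf8 = "\xC3\xA5"; // "å" (U+00E5) encoded as UTF-8: two bytes, C3 A5

// The faulty step: the two bytes are taken to be two ISO-8859-1
// characters ("Ã" and "¥"), and each one is encoded to UTF-8 again.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // c3a5
echo bin2hex($garbled), "\n"; // c383c2a5, which a UTF-8 page shows as "Ã¥"

// Reversing the faulty step recovers the original bytes.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
echo bin2hex($repaired), "\n"; // c3a5
```

Comparing hex dumps like this at each stage (after the query, after escaping, just before output) is one way to find where the extra conversion happens.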

First of all, your content-type header is fine. Keep it UTF-8.

Something in the code path between querying the data from the DB and writing the output to the response is incorrectly using ISO-8859-1. That includes the query step itself, so I would start there. Check whether pg_set_client_encoding helps:

pg_set_client_encoding($connection, 'UTF8');

Other steps are described here. Hope this helps.

BalusC
Thanks for the help! I needed to set the client encoding via pg_set_client_encoding(), as you pointed out. After that, I also had to change various calls to htmlentities() to pass "UTF-8" as the 3rd parameter, and everything looks good now. Much appreciated!
Matt Huggins
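
For anyone hitting the same htmlentities() issue mentioned in the comment above, here is a small sketch of the difference the third parameter makes. Older PHP versions (before 5.4) assumed ISO-8859-1 when no charset was passed; the example passes it explicitly both ways so the behaviour is the same on any version. The variable names are illustrative.

```php
<?php
$name = "\xC3\xA5"; // "å" (U+00E5) encoded as UTF-8

// Charset declared as ISO-8859-1: each of the two bytes is treated
// as a separate character and gets its own entity.
$wrong = htmlentities($name, ENT_QUOTES, 'ISO-8859-1');

// Charset matches the actual encoding of the string.
$right = htmlentities($name, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // &Atilde;&yen;
echo $right, "\n"; // &aring;
```

The first result renders as "Ã¥" in the browser, which matches the symptom in the question even when the database connection and the Content-Type header are both correct.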