
I am running a fairly standard LAMP stack.

The problem is that UTF-8 characters render correctly only intermittently. About 50% of the time the non-ASCII UTF-8 characters render correctly (e.g. with the appropriate diacritical marks), but the other 50% of the time I get a '?' instead. If I reload the page, sometimes that fixes the problem and sometimes it does not. It happens with all browsers on all platforms, which suggests a MySQL or Apache problem, but I have not been able to figure it out.

The database itself is in UTF-8 format, and I have never seen the problem while browsing it in phpMyAdmin.

I issue a `SET NAMES utf-8` command upon opening the database (and have tried changing that to a `SET CHARSET utf-8` command), with no luck.

What's confusing me is that it is intermittent, happening in streaks, e.g. it will happen on 30 pages in a row (even if they are just reloads), and then clear up for 10 pages, and then happen again for a few pages, etc.

You can try to see the problem by hitting the 'list' button here: http://latin-words.com/list_vocab.php (though it may take many reloads either to make it happen or to make it go away).

Server configuration: Ubuntu 9.10, MySQL 5.1.37, PHP 5.2.10, Apache 2.2.12

Any hints would be greatly appreciated.

A: 

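Edit:
For searchers' sake: as it turned out in the comments, the problem was actually issuing `SET NAMES utf-8;` (incorrect) instead of `SET NAMES utf8;` (correct). MySQL's name for the character set has no hyphen, so the hyphenated form is rejected with an "Unknown character set" error. That doesn't mean my much more obscure reason posted below cannot also be the cause ;)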
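For reference, here is a minimal PHP sketch of the corrected connection setup. It assumes the mysqli extension (the original code may well have used the older `mysql_*` functions). It calls `mysqli::set_charset('utf8')`, which the PHP manual recommends over issuing `SET NAMES` by hand, and then reads back the `character_set_*` variables, as suggested in the comments. The function names `connectUtf8`, `connectionCharsets` and `charsetsAreUtf8` are hypothetical, as are any connection parameters you pass in.

```php
<?php
// Minimal sketch, assuming the mysqli extension. Helper names and
// connection parameters are hypothetical, not from the original code.

// Open a connection and switch its character set to utf8.
function connectUtf8($host, $user, $pass, $dbname)
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if (mysqli_connect_error()) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // MySQL spells it "utf8" (no hyphen); "utf-8" is rejected as an unknown
    // character set, and the connection keeps whatever it had before.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('set_charset failed: ' . $db->error);
    }
    return $db;
}

// Read the character_set_* session variables into a name => value array.
function connectionCharsets(mysqli $db)
{
    $vars = array();
    $res = $db->query("SHOW VARIABLES LIKE 'character_set%'");
    while ($row = $res->fetch_row()) {
        $vars[$row[0]] = $row[1];
    }
    $res->free();
    return $vars;
}

// True when client, connection and results all use a utf8 variant
// (newer servers may report "utf8mb3" or "utf8mb4").
function charsetsAreUtf8(array $vars)
{
    $names = array('character_set_client', 'character_set_connection',
                   'character_set_results');
    foreach ($names as $name) {
        if (!isset($vars[$name]) || strpos($vars[$name], 'utf8') !== 0) {
            return false;
        }
    }
    return true;
}
```

On each request you could call `charsetsAreUtf8(connectionCharsets($db))` and log whenever it returns false. With the hyphenated `utf-8` spelling, that check would flag the connection as still using the server default (for example `latin1`).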


Sounds like a problem with locales and iconv. Try to determine which locale the web server process uses at a moment when all is well and at a moment when it doesn't work anymore (use `$currentlocale = setlocale(LC_ALL, 0);` or `$currentlocale = setlocale(LC_CTYPE, 0);` to get the current locale).
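A minimal sketch of that check follows. It assumes the code runs at the top of every request so that good and bad page loads can be compared in the error log. Passing the string `"0"` to `setlocale()` only reports the current setting without changing it. The helper name `currentLocales` is hypothetical.

```php
<?php
// Minimal sketch: report the current locale without modifying it.
// Passing "0" to setlocale() only queries the setting.
function currentLocales()
{
    return array(
        'LC_ALL'   => setlocale(LC_ALL, '0'),
        'LC_CTYPE' => setlocale(LC_CTYPE, '0'),
    );
}

// Hypothetical usage: write one line per request to the PHP error log,
// so "good" and "bad" page loads can be compared afterwards.
error_log('locale: ' . json_encode(currentLocales()));
```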

Wrikken
I am not sure I understand your answer. Do you think Apache is intermittently changing the locale?
gniss
@Wrikken It shows a locale of `C` whether or not the problem is present.
gniss
Hmm, weird, could've sworn that was it. The '?'s here are not normally a wrong-character-set issue (as in: claiming to send one, but serving another), but rather the result of casting to an insufficient character set, so characters get substituted. Do you have any places in your code where you're switching character sets (`iconv`, `mb_convert_encoding`, etc.)? The only other place would be a wrong character set for the connection to the database (ascii, latin1, etc.). Do you get any error when you issue `SET NAMES utf8;`? Does `show variables like '%char%';` produce the expected output?
Wrikken
You are wonderful! Thank you. It was the UTF-8 vs. UTF8 problem. I have no idea why it sometimes worked, but that seems to have cleared it up.
gniss