Seriously, I'm lost in the UTF-8 world. Here is my situation (everything is happening on a Mac):

  • I get a web service response with Perl + LWP and store it in a MySQL database; the response is encoded in UTF-8, and I use the DBI module to store the data in a MySQL UTF-8 table (utf8_general_ci collation);
  • when I get strings from the database with a CI model, the displayed output is garbled, like UTF-8 characters displayed as ASCII; converting the string with iconv or mb_convert_encoding does not help, I just get more gibberish in PHP's output;
  • when I select a string via DBI and print it to the console with Perl, I see properly encoded hieroglyphs.

So the question is: how can I make my PHP scripts show the hieroglyphs in the proper encoding?

A: 

It's not that your page has a different encoding, is it?

You could add:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />

to your page and see if your characters still come out garbled.

GrayB
The meta header doesn't help; I tried that.
nweb
You might check the initial HTTP header with Firebug or similar. The encoding of the page will be set there as well. Does it look right when viewed inside MySQL?
GrayB
The page encoding is OK (UTF-8). What do you mean by "viewed inside MySQL"?
nweb
A: 

Verify at each step that the encoding really went through correctly. Diagnostic tools include Devel::Peek, Devel::StringInfo and any MySQL front-end that can display data as a hexdump. As a beginner, you are better off expressing your intentions explicitly. The only downside is that your code becomes a little more verbose than usual.

Use the decoded_content method to get the response as a Perl native (character) string. Encode it to UTF-8 using the encode call. Switch off mysql_enable_utf8 in the database driver, and insert the data as usual.
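The decode-at-input, encode-at-output discipline described above can be sketched in a language-neutral way. The snippet below is only an illustration in Python, not the Perl code itself: the sample bytes, the variable names and the hexdump helper are all assumptions made for the example. It shows the kind of byte-level check a hexdump gives you at each step.

```python
def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in data)


# Assumed sample: the UTF-8 bytes for a two-character string,
# standing in for what a web service might return.
raw_response = b"\xe6\x97\xa5\xe6\x9c\xac"

# Step 1: decode at the boundary, so the program works with characters.
text = raw_response.decode("utf-8")

# Step 2: encode explicitly just before handing the data to storage.
to_store = text.encode("utf-8")

# Verify: two characters in memory, six bytes on the wire.
print(len(text), "characters")
print(hexdump(to_store))
```

If the hexdump taken at the storage step differs from the one taken at the input step, the corruption happened somewhere in between.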

I cannot help with the PHP part.

daxim
Yeah, the PHP part is weird, since I upload and download the data with Perl in the proper encoding.
nweb
A: 

As the others mention, you need to check every step of the whole stack; everything needs to be in UTF-8: input, storage, output. Debug each step individually and see in which transformation things get lost. UTF-8 has to be set not only on your column and table, but sometimes also on the connection itself (in MySQL, for example, by issuing SET NAMES utf8 after connecting).
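To see why a mismatched connection setting produces gibberish, here is a minimal Python sketch of the usual failure mode, offered as an illustration rather than a diagnosis of this particular setup. It assumes one layer misreads UTF-8 bytes as Latin-1 and re-encodes them; Python's latin-1 codec is a simplified stand-in for whatever the misconfigured layer actually uses, and looks_double_encoded is a hypothetical heuristic helper, not a library function.

```python
original = "\u65e5\u672c"  # assumed two-character sample string
utf8_bytes = original.encode("utf-8")

# A layer that wrongly assumes Latin-1 treats each byte as one character...
misread = utf8_bytes.decode("latin-1")
# ...and encoding that back to UTF-8 yields "double-encoded" bytes.
double_encoded = misread.encode("utf-8")


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: undoing one Latin-1 round trip gives different, valid UTF-8."""
    try:
        repaired = data.decode("utf-8").encode("latin-1")
        repaired.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return repaired != data


print(looks_double_encoded(utf8_bytes))      # clean data
print(looks_double_encoded(double_encoded))  # damaged data

# Undoing the extra round trip recovers the original text.
repaired_text = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired_text == original)
```

A check like this only tells you that a round trip went wrong somewhere; comparing hexdumps at input, storage and output is still what pins down which step is responsible.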

I hate to promote my own site here, but in this case I am referring to an extensive article that I have written on building Unicode LAMP applications:

http://ferdychristant.com/blog/articles/DOMM-7LDBXK

I hope you find it useful.

Ferdy