views:

105

answers:

2

For some convoluted reasons best left behind us, I require direct access the contents of a WordPress database. I'm using MySQL 5.0.70-r1 on Gentoo with WordPress 2.6, and Perl 5.8.8 ftr.

So, sometimes we get high-order characters in the blog, we have quite a few authors contributing too, for the most part these characters end up in WordPress's database in wp_posts.post_content or wp_postmeta.meta_value, Wordpress is displaying these correctly on its site, but the database stores it using single byte encoding that I can't figure out how to convert to the correct string. Today's example:

The blog shows this, and doesn't even seem to escape any chars in the HTML,

   Hãhãhães  

but the database, when viewed via the MySQL prompt, has,

   Hãhãhães

So clearly this is some kind of double-byte encoding issue, but I don't know how I can correct it. I need to be able to pull that second string from the database (because that's what it gives me) and convert it to the first one, and I need to do so using Perl.

Also, just to help unmuddy any waters, I took these strings and printed out the ASCII codes for each character using Perl's ord() function.

Here is the output of the "wrong" string:

H = 72
à = 195
£ = 163
h = 104
à = 195
£ = 163
h = 104
à = 195
£ = 163
e = 101
s = 115

This is the correct string, that I need to produce in my script:

H = 72
ã = 227
h = 104
ã = 227
h = 104
ã = 227
e = 101
s = 115
+1  A: 

How about SET NAMES 'utf8'?

jholster
"SET NAMES indicates what character set the client will use to send SQL statements to the server"//mmm.. i'm not sending sql to the server... my problem is in the data the server is returning, and how to reencode it//also, from the mysql docs on set names, i found character_set_results, character_set_client, though SETting these to 'utf8' doesn't seem to affect the select query to produce the badly encoded string above. I also tried merely "charset utf8;" which also seemed to have no effect
qbxk
Could it be issue with your terminal? (Assuming you are using one.)
jholster
no the terminal is not the problem, that's a red herring. the issue is that i don't know how to convert this single-byte encoded string with the correct double-byte encoded string
qbxk
Corrected the syntax `UTF-8 → 'utf8'`. This maybe gives you some ideas: http://www.phpwact.org/php/i18n/utf-8/mysql
jholster
+1  A: 

I fixed it... Thanks to those who read and/or tried.

my $dbh = DBI->connect('mysql:etc:etc');
$dbh->{mysql_enable_utf8}++;  #<---- solution

That's all, sigh...

Not sure about the MySQL prompt thing, because I don't really care, but I'm sure it's a similar solution, to make sure MySQL is returning results in double-bytes to its prompt. Though see my comment above, "set character_set_*" variables didn't seem to affect it though.

qbxk