I'm having serious trouble displaying UTF-8 data retrieved from MySQL in a Linux-based C++ application. UTF-8 text is shown as question marks.

The application uses the MySQL C API. So I passed the UTF-8 option after mysql_init() and before mysql_real_connect():

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, "utf8");

and

mysql_options(&mysql, MYSQL_INIT_COMMAND, "SET NAMES utf8");

But without luck. The text is still displayed as question marks. I made a few tests with a Perl script (I'm more familiar with it ;) ), and the text is displayed correctly if I specify the UTF-8 option for the connection:

$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');

Any idea how to display UTF-8 data in the C++ application correctly?

+1  A: 

You don't need to set the charset options like that to get the result you want. They just help the DB do reasonable things with sorting and such.

I suspect that you are indeed getting your data in UTF-8 format, but just aren't processing it correctly. Passing UTF-8 around in C is the easiest thing in the world. Getting it to print out correctly can be more of a challenge, but of course that's not really MySQL's problem.
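One way to test that suspicion is to look at the raw bytes of a fetched column before anything prints them. The sketch below is only an illustration: `diagnose`, `is_valid_utf8`, and `ByteDiagnosis` are hypothetical names, not part of the MySQL C API, and it assumes you copy a column value (for example `row[0]` from `mysql_fetch_row()`) into a `std::string` first.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8. Rejects truncated
// sequences, overlong forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }           // plain ASCII byte
        std::size_t len = 0;
        unsigned long cp = 0;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                         // stray continuation or bad lead byte
        if (i + len > n) return false;             // sequence cut off at end of data
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;    // overlong encodings
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && cp < 0x10000) return false;
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogates
        i += len;
    }
    return true;
}

enum class ByteDiagnosis { AsciiOnly, ValidUtf8, NotUtf8 };

// Hypothetical helper: classify the bytes of one column value.
ByteDiagnosis diagnose(const std::string& s) {
    bool non_ascii = false;
    for (unsigned char c : s) {
        if (c >= 0x80) { non_ascii = true; break; }
    }
    if (!non_ascii) return ByteDiagnosis::AsciiOnly;
    return is_valid_utf8(s) ? ByteDiagnosis::ValidUtf8 : ByteDiagnosis::NotUtf8;
}
```

If a value that should contain non-ASCII text comes back as `AsciiOnly` with literal `?` bytes (0x3F) in it, the characters were most likely lost in a conversion before they reached your program. `ValidUtf8` points at the display side instead, and `NotUtf8` suggests the data is arriving in some other encoding.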

Based on your tagging for this post, I assume you're running this program on Linux. If so, you should simply be able to print it out to the console (printf(), cout, whatever) to get the correct representation, since Linux consoles almost always default to UTF-8 these days. Check your LANG environment variable.
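If you would rather check the locale from inside the program than from the shell, something like the following rough sketch can do it. The function names are made up for illustration, and the name match is only a heuristic: it looks for "UTF-8" or "utf8" in the locale name and assumes the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG.

```cpp
#include <cctype>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: the locale name that governs character handling,
// using the POSIX precedence LC_ALL > LC_CTYPE > LANG. Empty values are
// treated as unset. Returns "" if none of the variables is set.
std::string effective_ctype_locale() {
    for (const char* name : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        const char* value = std::getenv(name);
        if (value != nullptr && *value != '\0') return value;
    }
    return "";
}

// Hypothetical helper: heuristic check that a locale name such as
// "en_US.UTF-8" or "de_DE.utf8" selects UTF-8.
bool names_utf8(std::string locale) {
    for (char& ch : locale) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return locale.find("utf-8") != std::string::npos ||
           locale.find("utf8") != std::string::npos;
}
```

Calling `names_utf8(effective_ctype_locale())` at startup gives a quick hint about whether the terminal is expected to interpret your output as UTF-8. A stricter check on POSIX systems would be `setlocale(LC_ALL, "")` followed by `nl_langinfo(CODESET)`.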

When dealing with Unicode, it can be helpful to write test programs that fetch just a very small amount of non-ASCII data (a single character is best), print only that, and redirect the program's output to a file. Then look at the file in a hex editor and compare the bytes with what that character should look like in UTF-8, and at least in UCS-2LE as well, to see whether you're just seeing the wrong encoding.
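To know what to look for in the hex editor, it helps to have the expected byte patterns at hand. Here is a small sketch that produces them; the helper names (`hex_bytes`, `encode_utf8`, `encode_ucs2le`) are invented for this example and are not from any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of `s` as two lowercase hex digits,
// separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one 16-bit code unit as UCS-2 little-endian
// (low byte first).
std::string encode_ucs2le(char16_t unit) {
    std::string out;
    out += static_cast<char>(unit & 0xFF);
    out += static_cast<char>(unit >> 8);
    return out;
}
```

For "é" (U+00E9), `hex_bytes(encode_utf8(0x00E9))` gives `c3 a9` and `hex_bytes(encode_ucs2le(0x00E9))` gives `e9 00`. A lone `3f` in the file is a literal question mark, which means the character was already replaced before your program wrote it out.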

I'm the maintainer of MySQL++, and can tell you that MySQL++ deals with UTF-8 quite naturally on Linux, but we don't play any games to get it to do that. I don't see why straight C API code shouldn't behave just as naturally. You might try building MySQL++ on your system and running the examples, as they include UTF-8 tests. Run resetdb to set things up, then simple1 to show the UTF-8 data that resetdb put in the test DB. See README-examples.txt in the distribution for more details.

I'm not telling you to switch to MySQL++, just using it as a known-working test. Once you get it working, you can modify those examples to work against your own DB, to see if it then breaks.

Warren Young
Thanks for your answer, Warren. I already found a way to display UTF8 data correctly in the C++ application. I just set those options in the my.cnf:

[mysqld]
init_connect='SET NAMES utf8; SET collation_connection = utf8_general_ci;'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake

And it helped :)
michael
I thought of a way you can get that behavior: if the table's default charset isn't UTF-8. In that case, what you've done is force a conversion on every access. Slow and heavy-handed. Better to store the data in UTF-8 to begin with.
Warren Young