I have a Flex application that uses UTF-8 encoding. It sends data back to the server (PHP), which writes it to MySQL (UTF-8 charset, utf8_general_ci collation). I have no problems at all writing umlauts to or reading them from the database.

I only realized, by looking at the data with phpMyAdmin, that the umlauts are somehow converted to:

ö => Ã¶, ü => Ã¼, etc.

As I said, I had no problems at all. The strange thing is that when I write umlauts directly into the database with phpMyAdmin, they are displayed correctly.

Now I am printing a PDF, and I need to call utf8_decode() on all values to display them correctly. However, those entered manually into the DB (which are displayed correctly in phpMyAdmin) do not get decoded.

I assume that those are not written to the DB in UTF-8 then, since decoding malforms them?

  1.) But why are UTF-8-encoded values displayed in this strange way in the DB in the first place?
  2.) How can I enter data into MySQL with phpMyAdmin in UTF-8 encoding? (I have set the connection to utf8.)

Thx, Martin

A: 

There are so many different places to set the character set in MySQL, it's wonderful.

Sounds like you're not actually storing UTF-8, but instead storing UTF-8 strings as if they were latin1. If they are in some way converted back to UTF-8 when you read from the database, they will still show up correctly in your application.
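If that is what is happening, it would also explain the PDF symptom from the question. Here is a rough sketch in Python, not PHP: `utf8_decode_like` is a hypothetical stand-in that only approximates what PHP's utf8_decode() does (UTF-8 in, ISO-8859-1 out, '?' for anything it cannot handle). The two byte strings are assumptions about what a latin1 connection would hand back for each kind of row.

```python
def utf8_decode_like(raw: bytes) -> bytes:
    """Approximation of PHP's utf8_decode(): UTF-8 bytes in, ISO-8859-1 bytes out."""
    text = raw.decode("utf-8", errors="replace")      # invalid input becomes U+FFFD
    return text.encode("latin-1", errors="replace")   # unrepresentable becomes b'?'

# Assumed: a row written by the application comes back as the original UTF-8 bytes.
app_row = "ö".encode("utf-8")
# Assumed: a row typed into phpMyAdmin comes back as a single Latin-1 byte.
manual_row = "ö".encode("latin-1")

print(utf8_decode_like(app_row))     # b'\xf6'
print(utf8_decode_like(manual_row))  # b'?'
```

The first value ends up as the single Latin-1 byte for "ö", which is what a Latin-1 PDF font expects. The second was never UTF-8 on the wire, so decoding it destroys it.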

Are you setting your connections to UTF-8, like so?

SET CHARACTER SET utf8;
SET SESSION character_set_server = utf8;
SET character_set_connection = utf8;
Henning
A: 

The fundamental fact you have to keep in mind when talking about this kind of problem is this: bytes and text are two different things, and whenever you convert between them you have to use the correct character encoding i.e. the same that was/will be used for the reverse conversion and one that supports all the characters that are being used.
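To make the bytes-versus-text point concrete, here is a small Python sketch (Python is used only for illustration; the same thing happens in any language). It encodes two umlauts to bytes and then decodes them once with the matching encoding and once with the wrong one.

```python
# Text -> bytes -> text, once with the matching encoding and once with a mismatch.
text = "ö ü"

raw = text.encode("utf-8")      # each umlaut becomes two bytes in UTF-8
right = raw.decode("utf-8")     # same encoding used for the reverse conversion
wrong = raw.decode("latin-1")   # mismatch: every byte turns into its own character

print(raw)    # b'\xc3\xb6 \xc3\xbc'
print(right)  # the original text again
print(wrong)  # two characters per umlaut, starting with a capital A with tilde
```

No data is lost in the `wrong` case; the bytes are the same, they are just being read with the wrong table.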

The problem is that with every additional conversion and every additional application that is involved, there is a chance for things to go wrong. Web apps are the worst possible case in this regard since there are always multiple conversions (usually 2*(number of applications-1)) and several different applications involved - at the very least: the web app, the browser and the DB. In your case, PHPMyAdmin as well.

It is hard to tell which conversion went wrong when there are so many. However, it looks like your problems are caused by phpMyAdmin, since it displays umlauts as two characters, which is typical for applications that try to interpret UTF-8 encoded bytes as Latin-1. Now the question is whether the erroneous conversion happens when phpMyAdmin gets the data from the DB or when it sends the data to your browser. What is the encoding declared by phpMyAdmin in the headers of its HTML pages? Do you have the option of accessing the DB through a non-web app such as DbVisualizer? If so, do that, since it eliminates one conversion (and thus potential for error).
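One way to check whether a displayed string shows that typical pattern is to reverse the suspected misreading and see whether valid UTF-8 comes out. This is a heuristic sketch in Python; `undo_latin1_misread` is a made-up helper name, and it assumes strict Latin-1 (some tools use Windows-1252 instead, which differs for a few bytes).

```python
def undo_latin1_misread(shown: str):
    """Return the repaired text if `shown` looks like UTF-8 read as Latin-1, else None."""
    try:
        # Turn the characters back into the bytes they came from,
        # then read those bytes as UTF-8.
        repaired = shown.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not reversible this way, so probably not this kind of damage
    return repaired if repaired != shown else None

print(undo_latin1_misread("Ã¶"))   # ö
print(undo_latin1_misread("ö"))    # None
```

A correctly decoded "ö" returns None because its single Latin-1 byte is not valid UTF-8 on its own, so the check distinguishes the two cases.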

Michael Borgwardt
A: 

Here's one possibility:

It sounds like phpMyAdmin is displaying UTF-8 data as Latin-1. Check the Content-Type header that phpMyAdmin is putting out. If you have Firefox with the webdev toolbar, you can see the headers directly by going to Information -> View Response Headers, or Information -> View Page Information.

Peter Bailey
A: 

I struggled with the same problem for a long time. Run this query as soon as you connect to the database and your web application will display characters as they appear in phpmyadmin:

SET NAMES 'utf8'

For some reason MySQL is set up on my systems to assume input and output are encoded as latin1, which means when I send it utf8 input it stores it in the database incorrectly, but because the conversion is reversed for output, the mess is undone and it displays correctly in the browser (except when using phpmyadmin, which displays it faithfully). This is only true when the conversion results in characters that are permitted by the character set used in the database field where it is stored, so you can get errors unless you stop this conversion from happening with the above query.
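The round trip described above can be modelled in a few lines of Python. This is a simplified sketch, not how MySQL is implemented: `store` and `fetch` are hypothetical helpers, the column is assumed to be utf8, and Python's "latin-1" codec stands in for MySQL's latin1.

```python
COLUMN_CHARSET = "utf-8"  # assumed charset of the table column

def store(client_bytes: bytes, connection_charset: str) -> bytes:
    """Server reads the client's bytes as connection_charset, then converts for the column."""
    return client_bytes.decode(connection_charset).encode(COLUMN_CHARSET)

def fetch(stored_bytes: bytes, connection_charset: str) -> bytes:
    """Server converts the column bytes back to connection_charset for the client."""
    return stored_bytes.decode(COLUMN_CHARSET).encode(connection_charset)

sent = "ö".encode("utf-8")        # what the application actually sends

bad = store(sent, "latin-1")      # connection wrongly assumed to be latin1
good = store(sent, "utf-8")       # connection declared as utf8 (the SET NAMES case)

print(bad)    # b'\xc3\x83\xc2\xb6'  (double-encoded, four bytes in the table)
print(good)   # b'\xc3\xb6'          (plain UTF-8, two bytes in the table)

# The same wrong assumption on the way out undoes the damage for the application.
round_trip = fetch(bad, "latin-1").decode("utf-8")
print(round_trip)
```

In this model the application never notices the problem, because the two wrong conversions cancel out. Only a client that reads the column with the correct charset sees the double-encoded bytes.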

anon