Hi,

I have a strange problem with some documents on my webpage.

My data is stored in a MySQL database, UTF-8 encoded. When I read the values, my webpage displays:

Rezept : Gem�se mal anders (Gem�selaibchen)

I need ü / ü!

The content in the database is "Gemüse ...".

The raw data in my error_log looks like this:

[title] => Rezept : Gemüse mal anders (Gemüselaibchen)

The webpage header is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
<!--[if IE]>
  <link rel="stylesheet" href="http://www.dev-twitter-gewitter.com/css//blueprint/ie.css" 
        type="text/css" media="screen, projection">
<![endif]-->

<meta name="text/html; charset=UTF-8" content="Content-Type" />
+9  A: 

You have to set the encoding of your web page.

There are three ways to set the encoding:

  1. HTML/XHTML: Use an HTTP header:

    Content-Type: text/html; charset=UTF-8
    
  2. HTML: Use a meta element (also possible for XHTML, but somewhat unusual):

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    
  3. XHTML only: Set the encoding in the XML declaration (preferred for XHTML):

    <?xml version="1.0" encoding="UTF-8"?>
    

If you want to verify the problem first:

First change the encoding manually using your browser. If that works you can set it in your HTML file. Make sure you reset the manual encoding to automatic detection, otherwise it'll work on your workstation, but not on your users' workstations!

A PHP specialty: If you use the mb_* functions, make sure your internal encoding is set to UTF-8, too! Those functions assume this encoding by default.

You can enforce the internal encoding using mb_internal_encoding at the top of every file.
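
As a minimal sketch of what the internal encoding affects (assuming the mbstring extension is loaded; the sample string and variable names are only for illustration): the mb_* functions count characters according to the internal encoding, while plain strlen counts bytes.

```php
<?php
// Assumption: the mbstring extension is available.
mb_internal_encoding('UTF-8');

// "Gemüse" with ü written as explicit UTF-8 bytes (0xC3 0xBC),
// so the example does not depend on how this file itself is saved.
$word = "Gem\xC3\xBCse";

$bytes = strlen($word);    // counts bytes
$chars = mb_strlen($word); // counts characters in the internal encoding

echo "bytes: $bytes, characters: $chars\n";
```

The two counts differ because ü takes two bytes in UTF-8.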

Finally: None of this helps if your code isn't actually UTF-8 encoded! If it is, check whether any helper functions might destroy the UTF-8 encoding.

DR
Thanks, when I switch to "ISO" in my browser it looks good, but I want UTF-8, which is what the browser detects (;
ArneRie
In that case please add further information to your question. Most importantly: which language you use, and the print/write statements. It looks like there is an additional layer of encoding in your application.
DR
Updated my answer
DR
Since when does PHP have ANY knowledge of encodings, outside of the mb_* functions? I was under the impression that wouldn't be there until PHP6.
Michael Madsen
Yes, you are right, removed the statement.
DR
You should change the order: First HTTP, then the XML declaration, then the META declaration.
Gumbo
Also with the database, make sure that both the client and server side comms are UTF8 as well as the data field.
Jauder Ho
+2  A: 

Do this:

header('Content-Type: text/html; charset=utf-8');

before outputting any content.

RichieHindle
Note that according to the spec, HTTP header names are separated by a hyphen and the first letter of each word is capitalized. So not `Content-type`, but rather `Content-Type`.
Geert
@Geert: Thanks - fixed.
RichieHindle
@Geert: “Field names are case-insensitive.” http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
Gumbo
@Geert, @Gumbo: You're both right. "Be liberal in what you accept, and conservative in what you send." -- Jon Postel
RichieHindle
A: 

utf8_encode fixed my problem. I am not sure why (; the data in the database is UTF-8, the website is UTF-8 ...

ArneRie
Did you see the addition to my answer?
DR
See my answer for the reason you need to do extra work, and what to do about it. Having to utf8_encode everything from the database manually is not a proper solution.
Michael Madsen
@ArneRie: Please delete this. Use comments to make a comment, or edit your question to add details. The lower section of the page is for actual answers only, don't use it like a forum. Thanks.
Tomalak
@Tomalak: In a way this IS a valid answer. Not the best one, though; see Michael Madsen's comment and answer.
DR
Just saying the data is UTF-8 doesn’t make the data UTF-8.
Gumbo
+7  A: 

MySQL needs to know you want the output as UTF-8 - it's likely configured to send as latin1, so your browser sees the invalid UTF-8 byte sequences and outputs the "not a character" glyph.

Send the query "SET NAMES utf8" immediately after opening the MySQL connection, or change the configuration (if possible).
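
The effect can be reproduced without a database. The following is a rough sketch, assuming the mbstring extension is loaded; mb_convert_encoding only stands in for the conversion MySQL performs on a latin1 connection, and the variable names are made up for illustration.

```php
<?php
// What is stored in the table: "Gemüse" as UTF-8 (ü = 0xC3 0xBC).
$stored = "Gem\xC3\xBCse";

// Stand-in for a latin1 connection: the text is converted to ISO 8859-1
// before it is sent, so ü arrives as the single byte 0xFC.
$received = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

// A lone 0xFC is not valid UTF-8, hence the replacement glyph in the browser.
$isValidUtf8 = mb_check_encoding($received, 'UTF-8');

// Converting back by hand is what utf8_encode amounts to; it treats the symptom only.
$repaired = mb_convert_encoding($received, 'UTF-8', 'ISO-8859-1');

echo bin2hex($received), "\n";
echo $isValidUtf8 ? "valid UTF-8\n" : "not valid UTF-8\n";
echo $repaired === $stored ? "round trip ok\n" : "round trip failed\n";
```

With the connection set to UTF-8, neither conversion is needed.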

Michael Madsen
+4  A: 

That Unicode replacement character � only appears when the encoding is incorrect. So in your case you declared your data as UTF-8 encoded but it wasn’t (at least the part you quoted). The ü encoded in ISO 8859-1 is 0xFC, but that’s an invalid octet in UTF-8.

So you need to make sure that your data is actually encoded with UTF-8. There are functions that can check if a given string is UTF-8, e.g. mb_detect_encoding or this is_utf8 function.
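
A small sketch of such a check, assuming the mbstring extension is loaded. It uses mb_check_encoding rather than mb_detect_encoding, because it answers the yes/no question directly; the helper name looks_like_utf8 is made up for this example.

```php
<?php
// Hypothetical helper (name chosen for this example): true if $s is valid UTF-8.
// Assumption: the mbstring extension is available.
function looks_like_utf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

$latin1 = "Gem\xFCse";     // ü as ISO 8859-1: the single byte 0xFC
$utf8   = "Gem\xC3\xBCse"; // ü as UTF-8: the bytes 0xC3 0xBC

var_dump(looks_like_utf8($latin1));
var_dump(looks_like_utf8($utf8));
```

The first call reports false and the second true, which matches the 0xFC explanation above.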

Gumbo
A: 

You should check the HTTP headers too, especially how your webserver is configured. I had a similar issue in the past that was caused by the Apache configuration: it was set up to always send an encoding in the Content-Type header, and that overrode the encoding given in the <meta> tag, because the HTML page and the webserver disagreed on that value.

bluebrother
+1  A: 

The problem is likely that the connection to the database uses latin1. As far as I know, this is the default in many MySQL setups.

That means, even if you store the data as utf-8 in the database you will get it as latin1 when you fetch it, as the charset is converted on the fly to match the connection.

You have two options:

1. Change the default connection character set to be utf-8

This could cause trouble if other applications hosted on the same database server expect iso-8859-1 from the database, because changing the config changes the behaviour for all users of the MySQL server.

2. Change the connection charset after each connect to the database

If you use PHP 5 (5.2.3 or later) you can use the built-in function:

mysql_set_charset('utf8');

See http://php.net/manual/en/function.mysql-set-charset.php for more details.

If you are on PHP 4 you can do this by a simple SQL query like so:

mysql_query("SET NAMES 'UTF8'");

See http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html for more details.
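
Note that the mysql_* functions shown above were removed in PHP 7. On newer PHP versions the same idea could look roughly like the sketch below. It assumes the PDO MySQL driver; the function name, host, database name, and credentials are placeholders, and utf8mb4 (MySQL's full UTF-8 character set) is used here instead of utf8.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection charset.
function utf8_dsn(string $host, string $dbname): string
{
    // The charset part plays the same role as SET NAMES after connecting.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

$dsn = utf8_dsn('localhost', 'recipes'); // placeholder host and database name
echo $dsn, "\n";

// Connecting is left commented out because it needs a real server and credentials:
// $pdo = new PDO($dsn, 'user', 'password');
// With mysqli, the equivalent call would be: $mysqli->set_charset('utf8mb4');
```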

pcguru