I recently changed some of my pages to be displayed via ajax and I am having some confusion as to why the utf8 encoding is now displaying a question mark inside of a box, whereas before it wasn't.

For example, the original page was index.php. The charset was explicitly set to UTF-8 in the <head>, and I then used PHP to query the database.

Here is the original index.php page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <title>Title here</title>
</head>
<body class='body_bgcolor'  >

<div id="main_container">
    <?php
        // Data displayed via PHP: simply a SELECT statement that outputs the HTML.
    ?>
</div>

However, when I made the change to add a menu that populated the "main_container" via ajax all the utf8 encoding stopped working. Here's the new code:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html lang="en">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <title>Title here</title>
    </head>
    <body class='body_bgcolor'  >
<a href="#" onclick="display_html('about_us');"> About Us </a>

    <div id="main_container"></div>

The "display_html()" function calls the javascript page which uses jquery ajax call to retrieve the html stored inside a php page, then places the html inside the div with an id of "main_container". I'm setting the charset in jquery to be utf8 like:

$.ajax({
        async: false,
        type: "GET", 
        url: url, 
        contentType: "charset=utf-8", 
        success: function(data)
            { 
                $("#main_container").html(data);
            }
});

What am I doing wrong?

A: 

The problem is not your HTML page!

Your problem is inside the PHP page, which you have unintentionally saved in the wrong way!

...the utf8 encoding is now displaying a question mark inside of a box....

This question mark is the BOM (Byte Order Mark).

Read this article about it!

aSeptik
A: 

Encoding is more than specifying the meta tag and content type - the files themselves must really be in the encoding you specify, or you'll get mojibake.

Check that everything is using UTF-8: your database, database connection, and table columns. Check that any static files you are including are also encoded in UTF-8.

mdma
Ok, I changed all my files, including all the includes, from ANSI to UTF-8. I looked at the database, but I'm not sure I'm looking in the right spot. Under "COLLATION" the columns say "Latin1_swedish_ci", and I'm not sure what this means. How else can I check whether the db tables/columns are utf-8? Also, after I saved all the files as utf-8 I now get an error stating `"Cannot send session cache limiter - headers already sent"`, but I didn't change any code other than saving the files as utf-8, and everything worked before I did this...not sure what's going on. Thanks for your help.
Ronedog
Like the other comments are saying, this is to do with the byte order mark. See the other links from other posters, and also http://forum.mamboserver.com/showthread.php?t=42814
mdma
Ok, I've read all those other posts. Basically they are saying there is a space or newline character either before the `<?php` or after the `?>` tag. I've gone through all my code and all the includes and still get the same result. I even removed every bit of code from the index.php page so it looked like this: `<?php session_start(); ?>`...no spaces anywhere before or after...no includes of other files that could affect it. When I reload the page, I get the same error message. HOWEVER, I decided to resave the index.php page as ANSI, and the error went away, but the encoding was no good???
Ronedog
If you save index.php as UTF-8 and then open it in a hex editor, you will see that there are indeed characters before <?php: the byte order mark. It's metadata and is not shown in Notepad/WordPad, but PHP has trouble with it. Use an editor that can save UTF-8 without the BOM and the problem will go away.
mdma
Thanks for your help...any idea on an editor? All I've ever used is notepad, notepad2, dreamweaver, wordpad. will any of these work?
Ronedog
mdma
Thanks...found this post: `http://people.w3.org/rishida/blog/?p=102` that said Dreamweaver can remove this. So, I tried it. How do I actually know if the characters are being removed if I can't see them?
Ronedog
Notepad++ has a hex editor you could use to view the file data. The link I gave mentions that Notepad++ also has explicit support for not saving the BOM.
mdma
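One way to confirm that the BOM is really gone, without opening a hex editor, is to look at the first three bytes of the file. A minimal Node.js sketch, assuming Node is available; `hasUtf8Bom` and `stripUtf8Bom` are hypothetical helper names, not part of any library, and the sample buffer only simulates a saved file:

```javascript
// The UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: true when the buffer starts with the UTF-8 BOM.
function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf.subarray(0, 3).equals(UTF8_BOM);
}

// Hypothetical helper: returns the buffer without a leading BOM, if there is one.
function stripUtf8Bom(buf) {
  return hasUtf8Bom(buf) ? buf.subarray(3) : buf;
}

// Simulate a PHP file saved "with BOM" and the same file after stripping it.
const withBom = Buffer.concat([
  UTF8_BOM,
  Buffer.from("<?php session_start(); ?>", "utf8"),
]);
const withoutBom = stripUtf8Bom(withBom);

console.log(hasUtf8Bom(withBom));    // true
console.log(hasUtf8Bom(withoutBom)); // false
```

For a real file, the same check could be run on the result of `require("fs").readFileSync("index.php")`.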
Thanks for your time mdma. I got Notepad++ installed, opened all the files, saved them with "Encode in UTF-8 without BOM" and the session error went away!...However, the original problem with the question mark inside a box is still there. What's weird is that CONSTANT vars that have utf8 encoding display fine, but if I pull the value out of the DB I get the ? with a box. But when I look in the database it's displaying the text how it should be? If I process each field out of the db through php's utf8_encode() then it works right, but I didn't have to do this before and I'd like to avoid the rewrite?
Ronedog
This again is an encoding mismatch. PHP thinks you're giving it UTF-8 when really it's getting something else. You mentioned that one of your columns wasn't set to utf-8 (latin1_swedish_ci IIRC); this should also be set to utf8. See http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html, and research SET NAMES.
mdma
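The question mark in a box is what a UTF-8 decoder shows for bytes that are not valid UTF-8. A minimal Node.js sketch of that kind of mismatch, assuming the database connection delivers Latin-1 bytes (as the `latin1_swedish_ci` collation suggests); the sample string and variable names are illustrative only:

```javascript
// "é" is the single byte E9 in Latin-1, but the two bytes C3 A9 in UTF-8.
const text = "\u00e9ditez"; // "éditez"

// What a Latin-1 connection would send for this string.
const latin1Bytes = Buffer.from(text, "latin1");

// The page is declared as UTF-8, so those bytes get decoded as UTF-8.
// E9 followed by "d" is not a valid UTF-8 sequence, so it is replaced with U+FFFD.
const misread = latin1Bytes.toString("utf8");
console.log(misread.includes("\uFFFD")); // true

// Rough equivalent of PHP's utf8_encode(): read the bytes as Latin-1, re-encode as UTF-8.
const repairedBytes = Buffer.from(latin1Bytes.toString("latin1"), "utf8");
console.log(repairedBytes.toString("utf8") === text); // true
```

Converting each field this way only papers over the mismatch; making the connection and columns UTF-8 removes the need for it.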
Ok, I'm sure I'm frustrating you a bit...don't mean to, just a newbie...thought this was as simple as changing the html charset and saving the php files. I'll go to work at the rest of my app...as pointed out on the oreillynet.com site you sent me to, I think my whole configuration needs to be thoroughly looked at, as I likely have other issues causing this. I'll report back when completed. Again thanks for your help.
Ronedog
I'm not frustrated, and you're welcome to the help. I hope you get it all sorted.
mdma
It finally works. Thank you for guiding me. To sum up what I did: 1st, saved all docs as "utf8 without BOM". 2nd, updated php.ini and Apache's httpd.conf to a default charset of utf8 (see oreillynet.com link). 3rd, set the DB charset to utf8 and any collation on columns to "utf8_unicode_ci". 4th, checked the SQL inserts to convert the input data into utf8. The reason it didn't work before was that the database was outputting Latin characters while PHP was expecting utf8, and the files were saved in ANSI. Thank you mdma!!!! I now feel comfortable that the app is only using utf8, unless u know of something else I should check.
Ronedog
It sounds like you have covered all the bases. Congratulations, that you got it working. We've covered quite a lot of ground - it might be worthwhile reading through the comments again, now that you see the complete picture.
mdma
Thanks, I did that and it has really helped me. One thing I discovered though is a bit confusing and thought you might know. The word `éditez l'océan` being output in the browser is correct, however when I look at the value in the database via phpmyadmin it displays like this in the same browser: `éditez l'océan` Is this a problem with the DB?. The database is UTF8 for charset, and collation is utf8_general_ci. Everything is displaying ok in the app, but I'm just wondering why this is, or if this will cause problems down the road...which I'd much rather figure out now. Thanks.
Ronedog
A: 

You wrote

The "display_html()" function calls the javascript page which uses jquery ajax call to retrieve the html stored inside a php page

What do you mean by "the html stored inside a php page"? If you want to load data and display it as the content of a <div>, the loaded data should be formatted accordingly; that is, it should be a real fragment of HTML code. Moreover, together with 'contentType' it would be a good idea to specify 'dataType' as "html" or "text". If you don't specify anything, the latest version of jQuery will "intelligently try to get the results, based on the MIME type of the response". If you know the 'dataType', it is better to specify it. And if you use ajax, also use the default 'async: true' and not 'false'.
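A sketch of the call with those suggestions applied, assuming jQuery is loaded on the page and the PHP page returns an HTML fragment. `buildAjaxOptions` is a hypothetical helper used only to keep the example self-contained, and the URL in the usage comment is a placeholder. The sketch also leaves out `contentType`, since that setting describes the body of the request being sent rather than the response, so it does not influence how a GET response is decoded:

```javascript
// Hypothetical helper: builds the settings object that would be passed to $.ajax().
function buildAjaxOptions(url, onSuccess) {
  return {
    type: "GET",
    url: url,
    dataType: "html", // treat the response as an HTML fragment
    // async is left at its default (true), so the browser is not blocked
    success: onSuccess,
  };
}

// In the browser, with jQuery loaded:
// $.ajax(buildAjaxOptions("about_us.php", function (data) {
//   $("#main_container").html(data);
// }));
```

The response charset itself is decided by what the server sends, so this only tidies the client side of the request.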

You should also check whether the jQuery.load method (see http://api.jquery.com/load/) is the best choice for you. With this method you can load a full HTML page if required and display only a part of it: $('#main_container').load('ajax/about_us.html #container');

As for UTF-8 encoding, don't forget to save the file so that it really is UTF-8 encoded. Use the corresponding option of your editor (in Notepad choose "Save As" and then choose "UTF-8" rather than "ANSI" as the encoding).

Oleg