Why are extended ASCII characters (â, é, etc.) getting replaced with <?> characters?

I attached a screenshot. I am using PHP to pull the data from MySQL, and some of these locations have extended characters. The font is Arial.

You can see the screen shot here: http://img269.imageshack.us/i/funnychar.png/

Still happening after the suggestions. Here is what I did:

My Firefox (View->Encoding) is set to UTF-8 after adding the line; however, the text inside the option tags still shows the funny character instead of the actual accented one. What should I look for now?

UPDATE: I have the following in the PHP program that is giving me those <?> characters...

ini_set( 'default_charset', 'UTF-8' );

And right after my zend db object creation, I am setting the following query:

$db->query("SET NAMES utf8;");

I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.

Also STATUS is reporting:

Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 4 days 20 hours 59 min 41 sec

Looking at the source of the page, I see <option value="Br�l� Lake"> Br�l� Lake

OK, NEW UPDATE: I changed everything in my PHP and HTML to:

and

header('Content-Type: text/html; charset=latin1');

Now it works, what gives?? How do I convert it all to UTF-8?

+15  A: 

That's what the browser does when it doesn't know the encoding to use for a character. Make sure you specify the encoding of the text you send to the client, either in an HTTP header or in a meta tag in the markup.

In HTML:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In PHP (before any other content is sent to the client):

header('Content-Type: text/html; charset=utf-8');

I'm assuming you'll want UTF-8 encoding. If your site uses another encoding for text, then you should replace UTF-8 with the encoding you're using.

One thing to note about using HTML to specify the encoding is that the browser will restart rendering a page once it sees the Content-Type meta tag, so you should include the <meta> tag immediately after the opening <head> tag in your page so the browser doesn't do more processing than it needs.
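As a rough sketch of the two declarations above used together: the HTTP header goes out before any page content, and the meta tag is the first element inside the head. The function name renderUtf8Page and the sample page content are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): builds a minimal page
// whose charset <meta> tag is the first element inside <head>.
function renderUtf8Page(string $title, string $bodyHtml): string
{
    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
        . "<title>" . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
        . "</head>\n<body>\n" . $bodyHtml . "\n</body>\n</html>\n";
}

// The HTTP header has to be sent before any page content is echoed.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderUtf8Page('Locations', '<p>Br&ucirc;l&eacute; Lake</p>');
```

The header and the meta tag should name the same encoding, and that encoding has to be the one the page's bytes are really in.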

Another common charset is "iso-8859-1" (Latin-1), which you may want to use instead of UTF-8. You can find more detailed info from this awesome article on character encodings and the web. You can also get an exhaustive list of character encodings here if you need a specific type.


If nothing else works, another (rare) possibility is that you may not have a font installed on your computer with the characters needed to display the page. I've tried repeating your results on my own server and had no luck, possibly because I have a lot of fonts installed on my machine so the browser can always substitute unavailable characters from one font with another font.

What I did notice by investigating further is that if text is sent in an encoding different than the encoding the browser reports as, Unicode characters can render unexpectedly. To work around this, I used the HTML character entity representation of special characters, so â becomes &#226; in my HTML and é becomes &#233;. Once I did this, no matter what encoding I reported as, my characters rendered correctly.

Obviously you don't want to modify your database to HTML encode Unicode characters. Your best option if you must do this is to use a PHP function, htmlentities(). You should use this function on any data-driven text you expect to have Unicode characters in. This may be annoying to do, but if specifying the encoding doesn't help, this is a good last resort for forcing Unicode characters to work.
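A minimal sketch of that last resort, assuming the text coming out of the database is UTF-8. The variable names and the sample location are illustrative; the encoding argument has to match the real encoding of the input, otherwise htmlentities() cannot decode it.

```php
<?php
// "Brûlé Lake" written as explicit UTF-8 bytes so the example does not
// depend on how this source file itself is saved.
$name = "Br\xC3\xBBl\xC3\xA9 Lake";

// Pass the encoding explicitly; it must be the real encoding of $name.
$escaped = htmlentities($name, ENT_QUOTES, 'UTF-8');

echo '<option value="' . $escaped . '">' . $escaped . "</option>\n";
// <option value="Br&ucirc;l&eacute; Lake">Br&ucirc;l&eacute; Lake</option>
```

Because the output is pure ASCII, it renders the same whichever charset the page declares.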

Dan Herbert
Was just about to post this exact answer...
James
Me too. Kudos on your typing-speed =)
David Thomas
A quick way to debug this possibility is to manually change the encoding in the browser. E.g. use View->Character Encoding in Firefox.
Matthew Flaschen
My Firefox is set to UTF-8 after adding this line; however, the text inside the option tags is still showing that character. It's stored perfectly in MySQL with the accented characters.
Mike Curry
Mike, I added some SQL info to my answer below.
Peter Bailey
The htmlentities stuff is an unnecessary complication: utf8 can encode anything, and in fact it's likely that latin1 can encode any of the characters that are necessary for the application. That question mark/diamond thing generally indicates malformed encoding. If it were a character the browser couldn't find a font for, it would be a hollow "tofu" box.
d__
+1 got yourself a good answer there Dan
alex
@Matthew Flaschen and the other +4 - I use View->Character Encoding, and the page reloads, and Firefox sets it back to UTF-8 as it should be.
Mike Curry
+3  A: 

There is no such standard called "extended ASCII", just a bunch of proprietary extensions.

Anyway, there are a variety of possible causes, but it's not your font. You can start by checking the character set in MySQL, and then see what PHP is doing. As Dan said, you need to make sure PHP is specifying the character encoding it's actually using.

Matthew Flaschen
+1  A: 

Simplest fix

ini_set( 'default_charset', 'UTF-8' );

This way you don't have to worry about manually sending the Content-Type header yourself.

EDIT

Make sure you are actually storing data as UTF-8 - sending non-UTF-8 data to the browser as UTF-8 is just as likely to cause problems as sending UTF-8 data as some other character set.

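-- Replace 'your_table' with the table you want to inspect.
SELECT table_collation
  FROM information_schema.`TABLES`
 WHERE table_name = 'your_table';

-- Replace 'your_schema' with the name of your database.
SELECT default_character_set_name
     , default_collation_name
  FROM information_schema.`SCHEMATA`
 WHERE schema_name = 'your_schema';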

Check those values.

Peter Bailey
It's neither necessary nor sufficient to change the table encoding. The important thing is to tell MySQL which encoding to transmit results in ("show variables like character_set_results"). MySQL is able to correctly transmit as utf8 the data from latin1 tables (and the other way around within certain limits).
d__
My tables are all UTF-8.
Mike Curry
A: 

You should encode all special chars into HTML entities instead of depending on the charset.

htmlentities() will do the work for you.

VVS
+3  A: 

As others have mentioned, this is a character-encoding question. You should read Joel Spolsky's article about character encoding.

Setting

header('Content-Type: text/html; charset=utf-8');

will fix your problem if your PHP page is writing UTF-8 characters to the browser. If the text is still garbled, it's possible your text is not UTF-8; in that case you need to use the correct encoding name in the Content-Type header. If you have a choice, always use UTF-8 or some other Unicode encoding.
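One way to find out whether the text really is UTF-8 is to test the bytes before sending them. The sketch below relies on PCRE's /u modifier, which refuses subjects that are not well-formed UTF-8; the helper name isValidUtf8 and the sample strings are assumptions made for illustration.

```php
<?php
// Hypothetical helper: true when $text is well-formed UTF-8.
// With the /u modifier, preg_match() fails on malformed UTF-8 input.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$utf8   = "Br\xC3\xBBl\xC3\xA9 Lake"; // "Brûlé Lake" encoded as UTF-8
$latin1 = "Br\xFBl\xE9 Lake";         // the same text encoded as ISO-8859-1

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($latin1)); // bool(false)
```

A string that fails this check and is sent with charset=utf-8 is the kind of input that browsers show with replacement characters.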

Mr. Shiny and New
+1  A: 

There are two transmission encodings, PHP<->browser and MySQL<->PHP, and they need to be consistent with each other. Setting up the encoding for MySQL<->PHP is dealt with in the answers to the questions below:

The quick answer is "SET NAMES UTF8".
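For illustration, here is a sketch of setting the MySQL<->PHP encoding when the connection is opened. It uses PDO rather than the Zend DB object from the question, and it assumes the pdo_mysql extension is available; the helper names, host, database name and credentials are all placeholders.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that names the connection charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection and also issues SET NAMES,
// mirroring the query the question runs right after creating its db object.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    $pdo = new PDO(buildMysqlDsn($host, $dbName), $user, $password);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('SET NAMES utf8');
    return $pdo;
}

echo buildMysqlDsn('localhost', 'example_db'), "\n";
// mysql:host=localhost;dbname=example_db;charset=utf8
```

Whichever mechanism is used, the charset named here should agree with the one declared to the browser.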

The slow answer is to read the articles recommended in the other answers. It's a lot better to understand what's going on and make one precise change than to apply trial and error until things seem to work. This isn't just a cosmetic UI issue: bad encoding configurations can mess up your data very badly. Think about the Simpsons episode where Lisa gets chewing gum in her hair, which Marge tries to get out by putting peanut butter on.

d__
A: 

I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.

If your original data was latin1, then inserting it into a UTF-8 database won't convert it to UTF-8, AFAIK. It will insert the same bytes but now treat them as UTF-8, which breaks the accented characters.

If you've got a SQL dump, I'd suggest running it through a tool to convert to UTF-8. Notepad++ does this pretty well - simply open the file, check that the accented characters are displaying correctly, then find "convert to UTF-8" in the menu.
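The same conversion can be sketched in PHP with iconv(), assuming the iconv extension is enabled and the input really is ISO-8859-1. The helper name latin1ToUtf8 and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: re-encodes ISO-8859-1 (latin1) bytes as UTF-8.
// Only call it on text that really is latin1; running it on text that is
// already UTF-8 would double-encode the accented characters.
function latin1ToUtf8(string $latin1Text): string
{
    $converted = iconv('ISO-8859-1', 'UTF-8', $latin1Text);
    if ($converted === false) {
        throw new RuntimeException('Conversion from ISO-8859-1 failed.');
    }
    return $converted;
}

$latin1 = "Br\xFBl\xE9 Lake";   // "Brûlé Lake" as latin1 bytes
$utf8   = latin1ToUtf8($latin1);

echo bin2hex($latin1), "\n"; // 4272fb6ce9204c616b65
echo bin2hex($utf8), "\n";   // 4272c3bb6cc3a9204c616b65
```

The hex dumps show the point of the conversion: each accented character is one byte in latin1 and two bytes in UTF-8.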

DisgruntledGoat
Is it too late if I do a SQL dump now?
Mike Curry
No, I think it will still be okay - but you can of course try it and see. Do a SQL dump, then convert to utf8 and check the characters are displaying correctly. If they are, then inserting back into the db should be fine. You will still need to run "SET NAMES UTF8" on every page load, as someone else said.
DisgruntledGoat