I'm still learning the ropes with PHP & MySQL and I know I'm doing something wrong here with how character sets are set up, but can't quite figure out from reading here and on the web what I should do.

I have a standard LAMP installation with PHP 5 and MySQL 5, all set up with the defaults. When some of my users submit comments to our database, some characters show up incorrectly - mostly apostrophes and em dashes at the moment. In MySQL, apostrophes show up as ’. They display on the page this way too (I'm using htmlentities to output user comments).

In phpMyAdmin it says my MySQL Charset is UTF8-Unicode.

In my database my tables are all set up with the default Latin1-Swedish-ci.

My web pages all have meta http-equiv="Content-Type" content="text/html; charset=utf-8"

When I look at the site's http headers I see: Content-Type: text/html

Like a newbie, I hadn't considered character sets at all until things started looking odd on some of my pages. So does it make most sense for me to convert everything to utf-8 and will this affect my PHP code? Or should I try to get it all into Latin? And do I have to go into the database and replace these odd codes, or will they magically display once I set up the charsets properly? All the fiddling I've done so far hasn't helped (I set the http headers to utf-8, and also tried latin).

A: 

Sorry for not understanding all of your question. But when part of the question is "UTF-8 or not?", the answer is: "UTF-8, of course!"

Nikolai Ruhe
+2  A: 

If you really want to understand these issues, I would start by reading this article at mysql.com. Basically, you want every piece of the puzzle to expect UTF-8 unicode. On the PHP side, you want to do something like:

<?php header("Content-type: text/html; charset=utf-8");?>
<html>
  <head>
     <meta http-equiv="Content-type" content="text/html; charset=utf-8">

And when you run your insert queries you want to make sure both the table's character encoding and the encoding that you're running the queries in are UTF-8. You can accomplish the latter by running the query SET NAMES utf8 right before you run an insert query.
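
As a rough sketch of the connection side, assuming the mysqli extension is available: the helper name connectUtf8 and all of its arguments are made up for illustration, so swap in your own. mysqli's set_charset('utf8') has the same effect as sending SET NAMES utf8 and also lets the client library know which encoding is in use.

```php
<?php
// Hypothetical helper: the name and the credentials passed in are placeholders.
function connectUtf8($host, $user, $password, $database)
{
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_error) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    // Switch the connection to utf8 before any INSERT or SELECT runs.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not switch to utf8: ' . $db->error);
    }
    return $db;
}
```

You would call it once per request, for example $db = connectUtf8('localhost', 'user', 'secret', 'mydb'); with your real credentials, and run every query through the returned connection.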

Matt Bridges
Sounds like I need to properly commit to UTF-8, buy a ring, start talking about kids... Thanks for the link - I'm reading it now.
mandel
Don't worry, once you're entirely committed, UTF-8 is very easy-going.
Pekka
+1  A: 

http://www.phpwact.org/php/i18n/charsets

That site gave me a lot of good advice on how to make everything play nice in UTF-8.

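I also recommend switching from htmlentities to htmlspecialchars, as it is more UTF-8 friendly: it only escapes the handful of characters that are special in HTML and leaves multi-byte UTF-8 sequences alone.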
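
Here is a small sketch of the difference, assuming the comment arrives as UTF-8 bytes. Before PHP 5.4 both functions assumed ISO-8859-1 unless told otherwise, so the example passes the charset explicitly to make the old default visible on any PHP version. The variable names are only for illustration.

```php
<?php
// "It’s" with the right single quote (U+2019) written as its three UTF-8 bytes.
$comment = "It\xE2\x80\x99s";

// Treating the input as ISO-8859-1 handles each byte of the quote as a
// separate character, so the first byte (0xE2) becomes &acirc;.
$asLatin1 = htmlentities($comment, ENT_QUOTES, 'ISO-8859-1');

// Told the truth about the encoding, htmlentities emits a proper entity.
$asUtf8Entities = htmlentities($comment, ENT_QUOTES, 'UTF-8');

// htmlspecialchars only escapes & < > " ' and leaves the quote bytes untouched.
$asUtf8Special = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $asLatin1, "\n";
echo $asUtf8Entities, "\n";   // It&rsquo;s
echo $asUtf8Special, "\n";
```

Either of the last two calls is safe on a UTF-8 page; the first one is what garbles the output.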

The main point is to make sure everything is talking the same language. Your database, your database connection, your PHP files and your pages should all be in UTF-8 (each page should have a meta tag and an HTTP header saying so).
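
To see what happens when one layer speaks a different language, here is a minimal sketch that reproduces the ’ from the question. It assumes the mbstring extension is installed; MySQL's latin1 is really Windows-1252, which is why that name is used below.

```php
<?php
// The right single quote (U+2019) is three bytes in UTF-8.
$utf8Quote = "\xE2\x80\x99";

// Misread those three bytes as Windows-1252 characters and write the
// result back out as UTF-8, which is what a mismatched layer ends up doing.
$garbled = mb_convert_encoding($utf8Quote, 'UTF-8', 'Windows-1252');

echo bin2hex($utf8Quote), "\n";  // e28099
echo $garbled, "\n";             // ’
```

One character has turned into three, which matches the garbling described in the question.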

PHP-Steven
Yes, switching to htmlspecialchars solved the output issue on my pages. Thanks for the link!
mandel
Can you do me a favor then and "accept" my answer?
PHP-Steven
Accepted. This solved the immediate problem. If I could accept two answers, I'd also accept Matt Bridges's, as it was very useful for the second part of my problem.
mandel
Oh, and oddly enough, we're located in the same town.
mandel
A: 

You definitely want to sort things out now rather than later. One of the most important programming rules is not to keep going with a bad idea - don't dig yourself in any deeper!

Every latin1 character also exists in utf-8, so you can convert your tables to utf-8 without editing the stored data by hand. MySQL will re-encode it for you.
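
A minimal sketch of what that conversion statement could look like. The helper name buildConvertToUtf8Sql and the table name comments are invented for the example, and utf8_unicode_ci is just one reasonable collation choice. The helper only builds the SQL; it does not run anything.

```php
<?php
// Hypothetical helper: returns (does not execute) the statement that asks
// MySQL to re-encode every text column of one table as utf8.
function buildConvertToUtf8Sql($table)
{
    // Double any backticks so the table name stays inside its quotes.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

echo buildConvertToUtf8Sql('comments'), ";\n";
```

Take a backup before running the result. Rows that already look garbled may have been stored as UTF-8 bytes in a latin1 column, and a straight conversion can leave those looking just as wrong, so check a few affected rows afterwards.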

It's then important to check that everything is speaking utf-8. Set the HTTP headers in Apache or use a meta tag; either one tells the browser that the HTML output is utf-8.

With this in mind, you need to make sure all of the data you send really is utf-8! Configure your IDE to save php/html files as utf-8. Finally make sure that PHP is using a utf-8 connection to MySQL - issue this query after connecting:

SET NAMES 'utf8';
David Caunt