When I display contents from the database, I get this:

��Some will have a job. Others will want one. They are my people, they are my clients and they are being denied their rights.

This text was entered by the user in a textarea with TinyMCE. How can I use preg_replace() to replace the special characters in this sentence with ' ' (a space), except for the characters < and >?

A:

This article is well worth a read. Dealing with UTF-8 characters is something we all go through at some point. The trick is to catch them before they go into the database, or to fix the database so that they aren't broken on the way in. Once they're in there, though, it's more difficult.
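Here is a minimal PHP sketch of catching broken input before it is stored. It assumes, without confirmation from the question, that the garbage comes from bytes that are not valid UTF-8, such as Windows-1252 curly quotes (0x93, 0x94) pasted from a word processor. When a page is declared as UTF-8, browsers typically render such bytes as the replacement character �. The helper name is_valid_utf8() is hypothetical. It relies on the fact that preg_match() with the /u modifier returns false when the subject is not well-formed UTF-8.

```php
<?php
// Hypothetical helper: returns true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (instead of 0 or 1)
// when the subject is not valid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Assumed example input: "\x93" and "\x94" are Windows-1252 curly quotes,
// which are not valid UTF-8 on their own.
$good = "\u{201C}Some will have a job.\u{201D}"; // proper UTF-8 curly quotes
$bad  = "\x93Some will have a job.\x94";

var_dump(is_valid_utf8($good)); // bool(true)
var_dump(is_valid_utf8($bad));  // bool(false)
```

You could then reject such input, or convert it from its real encoding, before it reaches the database.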

Chuck Vose
I already have this <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
unknown
It takes a lot more than just the content type. The problem is almost certainly that you've stored UTF-8 characters in the database, but the database only accepts non-Unicode characters (which is the default). So you've got broken Unicode in the database, and setting your content type on output isn't going to make it any less broken.
Chuck Vose
A: 

There is another interesting post about this here on Stack Overflow.

Hope that helps,

Ramon Araujo
A: 

As Chuck mentioned above, it is a database problem. If you only want to display non-Unicode (i.e. Latin) characters, then yes, preg_replace is the way to go. You will need to know the character sets well enough to filter out what you don't want.
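If you do go the preg_replace() route, here is a minimal sketch. It assumes that "special characters" means anything outside printable ASCII. The function name strip_non_ascii() is hypothetical. Because < and > are printable ASCII, they are kept, and so are TinyMCE's HTML tags. Each run of other bytes becomes a single space.

```php
<?php
// Hypothetical helper: replaces every run of bytes outside printable ASCII
// (0x20-0x7E) and common whitespace (tab, CR, LF) with a single space.
// '<' (0x3C) and '>' (0x3E) fall inside the printable range, so they survive.
// No /u modifier: the pattern works byte by byte, so it still runs when the
// input contains invalid UTF-8 (with /u, preg_replace() would return null).
function strip_non_ascii(string $s): string
{
    return preg_replace('/[^\x20-\x7E\t\r\n]+/', ' ', $s);
}

echo strip_non_ascii("\x93\x94Some will have a job. <p>Ok</p>"), PHP_EOL;
// prints " Some will have a job. <p>Ok</p>"
```

Note that this also throws away legitimate accented or non-Latin text (for example, "é" becomes a space), which is why fixing the database encoding is usually the better option.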

But if you want everything to display properly, i.e. with no garbage characters, change the relevant parts of the database to accept UTF-8.

For example, if you are using MySQL, try changing the column and table encoding so they accept UTF-8. In older MySQL versions, the default character set is latin1 (collation latin1_swedish_ci); try changing it to utf8 (collation utf8_general_ci). Hope that explains my point.
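In MySQL, a single ALTER TABLE ... CONVERT TO CHARACTER SET statement converts a table and all of its text columns. The sketch below is a small PHP helper that builds that statement; the helper build_convert_statement() and the table name posts are hypothetical. Run the resulting SQL against your own table, ideally after taking a backup. There are two caveats. First, CONVERT TO re-encodes the data already in the columns, so it will not by itself repair text that was mangled on the way in. Second, MySQL's utf8 stores at most three bytes per character, so utf8mb4 is needed for characters such as emoji.

```php
<?php
// Hypothetical helper: builds a MySQL statement that converts a table and all
// of its character columns to the given character set and collation.
// Defaults follow the answer above (utf8 / utf8_general_ci).
function build_convert_statement(
    string $table,
    string $charset = 'utf8',
    string $collation = 'utf8_general_ci'
): string {
    // Quote the table name as a MySQL identifier; embedded backticks are doubled.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET {$charset} COLLATE {$collation}";
}

echo build_convert_statement('posts'), PHP_EOL;
// prints "ALTER TABLE `posts` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci"
```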

ongkybeta