Hi folks,

Got a little issue where my client is pasting in content from Word into my little text editor in a CMS.

The double quotes are coming back encoded in what looks like some form of UTF.

Any ideas on whether I can strip or replace these using PHP when they get displayed from my MySQL table?

Here is the link to the page that spits out the dodgy characters, you can see the 'black diamonds of doom' which are causing the headaches.

http://linq.milkbarstudios.com/news_detail.php?id=3

Any suggestions would be gratefully accepted!

+2  A: 

This sounds like a bug in your code. When handling text data, you must always know which encoding it is in and convert back and forth as necessary. So when the browser sends you UTF-8, you must decode the string correctly before you send it to the database (MySQL does support UTF-8 in text columns). That way, the original text will be preserved. Of course, you must do the same when you render the page for the browser: set the charset to UTF-8, make sure the bytes you send actually are UTF-8, etc.
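A minimal sketch of that round trip in PHP, not a definitive fix. It assumes the mbstring extension is enabled and that any input which is not valid UTF-8 arrived as Windows-1252, which is only a guess about content pasted from Word. The helper names `to_utf8` and `escape_for_html` are hypothetical. The database connection would also need to use UTF-8 (for example via `mysqli::set_charset`), which is left out here to keep the sketch self-contained.

```php
<?php
// Hypothetical helper (not from the thread): make sure a submitted string is UTF-8.
// Assumption: anything that is not valid UTF-8 is treated as Windows-1252.
function to_utf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper: escape text for an HTML page that is served as UTF-8.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Declare the page encoding in the HTTP header, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Bytes 0x93 and 0x94 are the curly double quotes in Windows-1252.
$stored = to_utf8("\x93Hello\x94 & goodbye");
echo escape_for_html($stored), "\n";
```

Converting once on input and escaping on output means the smart quotes stay in the text as real Unicode characters instead of being removed.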

Aaron Digulla
There's a UTF-8 meta tag on the site but nothing in the HTTP headers to declare the character set.
Greg K
@Aaron I'm not a PHP developer; I've had to pick this up from someone else as a favour. I could do this in ColdFusion with no problem, but I'm up against time to find a fix, hence the post on here and the one below with my 'fix'. All I want to do is remove or replace any characters that 1. cause my page to fail W3C validation checks and 2. display as black diamonds. Any pointers on whether I need to add something to my base page code, and whether it's a PHP issue or something worse, would be far more appreciated.
Simon Hume
It's an encoding issue. It's tragic that frameworks still allow you to receive or send data without specifying its encoding. Therefore, to fix the issue for real, you must manually make sure that you read the data with the correct encoding and that you send it with the correct encoding. ... According to the PHP manual (http://www.php.net/manual/en/intro.unicode.php), PHP doesn't support Unicode yet. Sweet.
Aaron Digulla
A: 

I was actually looking for PHP to replace the dodgy characters.

In the end I found this, which fixes it perfectly:

$output = preg_replace('/[^\x20-\x7F]+/', '', $output);
Simon Hume
This isn't a good solution. You are destroying all non-ASCII characters. The smart quote `“` is not a “dodgy” character, it's a perfectly normal Unicode character amongst tens of thousands of others that also won't work.
bobince
Agreed. Let's just hope no one from outside the USA ever stumbles over this page...
Aaron Digulla
Can anyone provide me with a better solution then? And for the record, I'm from the UK.
Simon Hume
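One less destructive option, sketched under assumptions rather than offered as the definitive answer: if plain ASCII punctuation really is wanted, replace only the specific characters Word tends to insert and leave everything else alone. The helper name `straighten_word_punctuation` is hypothetical, the list of characters is an illustrative selection, and the input is assumed to be valid UTF-8 already.

```php
<?php
// Hypothetical helper (not from the thread): swap common Word punctuation
// for ASCII look-alikes. Assumes $text is already valid UTF-8.
function straighten_word_punctuation(string $text): string
{
    $map = [
        "\xE2\x80\x9C" => '"',   // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"',   // U+201D right double quotation mark
        "\xE2\x80\x98" => "'",   // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'",   // U+2019 right single quotation mark
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
    ];
    return strtr($text, $map);
}

// A curly-quoted word containing an accented letter (U+00E9).
$sample = "\xE2\x80\x9CCaf\xC3\xA9\xE2\x80\x9D";

// Targeted replacement: quotes become ASCII, the accented letter survives.
echo straighten_word_punctuation($sample), "\n";

// Blanket strip from the answer above: every non-ASCII byte is removed,
// so the accented letter disappears along with the quotes.
echo preg_replace('/[^\x20-\x7F]+/', '', $sample), "\n";
```

The first call keeps the accented letter while the second drops it, which is the data loss the comments above warn about.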