In my LAMP app, users sometimes cut and paste input into my web forms from other applications such as MS Word.

All of my web pages are set, via the content-type tag, to display as UTF-8. My PHP script saves the web form data into a MySQL table whose character encoding is set to UTF-8.

There is an apostrophe character that displays correctly in the HTML, but when I view the MySQL table from a Linux command prompt, it displays as ’.

If both the HTML page and the MySQL table are using the same encoding, why is that character rendered differently?
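
For illustration only, and not as a diagnosis of this particular setup: one common way the sequence ’ arises is when the UTF-8 bytes of the curly apostrophe (U+2019) are decoded as Windows-1252 somewhere along the way. The short Python sketch below reproduces that effect; Python is used here purely for demonstration, and the variable names are made up for the example.

```python
# Illustration: the curly apostrophe's UTF-8 bytes, misread as Windows-1252.
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe

utf8_bytes = apostrophe.encode("utf-8")  # the three bytes e2 80 99
misread = utf8_bytes.decode("cp1252")    # the same bytes read as Windows-1252

print(utf8_bytes.hex())  # raw bytes in hex
print(ascii(misread))    # escaped form of the three misread characters

# Undoing the misreading recovers the original character, which suggests
# the stored bytes themselves can be intact even when the display is wrong.
recovered = misread.encode("cp1252").decode("utf-8")
print(recovered == apostrophe)
```

If the bytes in the table really are e2 80 99, the data is valid UTF-8 and the difference lies in how the viewing program interprets or renders it.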

A: 

Perhaps the reason is that your Linux command prompt uses a font that does not support that character.

By default, Windows' cmd also uses a font that doesn't support it. Relax, though: your data is stored the way you wanted it.

andyk
That's certainly a possibility I haven't yet explored, so I love that answer. I'm having a really difficult time wrapping my mind around these character encoding issues. It seems that every program the text passes through has the ability to misinterpret and even alter the encoding. In my case, the content from the web form ends up in an email client, where it is not being displayed correctly. So the path is something like clipboard -> HTML form -> Apache -> PHP -> MySQL -> PHP -> email server -> email client. Really, it could be going wrong anywhere. How do you troubleshoot these issues?
Jason Salsiccia
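
One possible way to approach the troubleshooting question above, offered as a sketch rather than a recipe: capture the text at each hop in the chain and dump its actual code points and bytes, then look for the first hop where they change. The helper name `describe` is hypothetical, and Python is used only for illustration; the same idea applies in PHP or any other language.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes in hex) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "UNKNOWN")
        utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        rows.append((code_point, name, utf8_hex))
    return rows


# Example: inspect a string containing a curly apostrophe.
for row in describe("it\u2019s"):
    print(row)
```

A single curly apostrophe should show up as one character, U+2019, with the bytes e2 80 99. If a hop reports three separate characters instead, the text was probably re-decoded with the wrong character set just before that point.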
I run them through a function that replaces them with similar plain characters before displaying the text. E.g., those fancy double quotes from Word are replaced by simple ones. It's safer this way, I think, because you never know what OS, browser, device, or installed fonts your users have.
andyk
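
A minimal sketch of the kind of replacement function described in the comment above. The mapping and the name `plain_punctuation` are assumptions made for this example, not the commenter's actual code, and Python is used only for illustration.

```python
# Hypothetical mapping from common Word "smart" punctuation to plain ASCII.
# Extend or trim it to suit the data you actually receive.
WORD_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def plain_punctuation(text):
    """Replace the mapped characters; everything else passes through unchanged."""
    return text.translate(WORD_PUNCTUATION)


print(plain_punctuation("\u201cIt\u2019s fine\u2026\u201d"))
```

Note that this trades fidelity for robustness: the original typography is lost, so it only suits cases where plain ASCII punctuation is acceptable.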
Another way is to validate the input before putting it into the database. (If you are using a well-known WYSIWYG text editor on your form, it usually has a 'Paste from Word' feature that does this automatically.) This way you don't have to worry about anything at display time, because the text is already in a safe enough state.
andyk
It's a very different case (involving a very long discussion) if you are actually expecting East Asian characters, which I assume you are not.
andyk
Anyway, the problem here is that you shouldn't expect non-standard characters to display correctly in a command prompt. Every application has its own default way of rendering text. Some support non-standard characters, some do not.
andyk