views: 528
answers: 3

I have an HTML form, and some users are copy/pasting text from MS Word. When there are single quotes or double quotes, they get translated into funny characters like:

â€™ and ’

The database column is collation utf8_general_ci.

How do I get the appropriate characters to show up?

Edit: Problem solved. Here's how I fixed it:

Ran mysql_query("SET NAMES 'utf8'"); before adding to/retrieving from the database (thanks to Donal's comment below).

Somewhat oddly, the PHP function urlencode($text) was being applied when displaying the text, so that had to be removed.

I also made sure that the headers for the page and the ajax request/response were all utf8.
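The steps above can be sketched as follows (a minimal sketch, not the asker's exact code; the mysql_query() call needs a live MySQL connection, so it is left as a comment):

```php
// Sketch of the fix described above:
//
//   mysql_query("SET NAMES 'utf8'");                   // before reads/writes
//   header('Content-Type: text/html; charset=UTF-8');  // before any output
//
// The stray urlencode() was the other culprit: it percent-escapes every
// byte of a multi-byte character, so a curly quote reaches the browser
// as literal %-sequences instead of text:
echo urlencode("\xE2\x80\x99");  // the UTF-8 bytes of ’ → %E2%80%99
```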

+2  A: 

Check the encoding that the page uses. Encode it using UTF-8 as well, and add a meta tag describing the encoding:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Marius
A: 

We have a PHP function that tries to clean up the mess with smart quotes. It's somewhat messy itself, since it grew organically as cases popped up during prototype development. It may be of some help, though:

function convert_smart_quotes($string) {
    // UTF-8 byte sequences for Word's "smart" punctuation
    $search = array(chr(0xe2) . chr(0x80) . chr(0x98),  // ‘ left single quote
                    chr(0xe2) . chr(0x80) . chr(0x99),  // ’ right single quote
                    chr(0xe2) . chr(0x80) . chr(0x9c),  // “ left double quote
                    chr(0xe2) . chr(0x80) . chr(0x9d),  // ” right double quote
                    chr(0xe2) . chr(0x80) . chr(0x93),  // – en dash
                    chr(0xe2) . chr(0x80) . chr(0x94),  // — em dash
                    // mojibake forms (the same bytes misread as Windows-1252)
                    'â€™', 'â€œ', 'â€' . chr(0x9d), 'â€"',
                    '  ');                              // double space

    $replace = array("'", "'", '"', '"', ' - ', ' - ',
                     "'", '"', '"', ' - ', ' ');

    return str_replace($search, $replace, $string);
}
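The core mapping can be checked in isolation (the sample input is made up, with the smart quotes written as explicit UTF-8 bytes):

```php
// Map the UTF-8 smart-quote bytes to their ASCII equivalents.
$search  = array("\xE2\x80\x98", "\xE2\x80\x99",   // ‘ ’
                 "\xE2\x80\x9C", "\xE2\x80\x9D");  // “ ”
$replace = array("'", "'", '"', '"');

// “Hello,” it’s done  →  "Hello," it's done
echo str_replace($search, $replace,
                 "\xE2\x80\x9CHello,\xE2\x80\x9D it\xE2\x80\x99s done");
```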
Mike A.
I've done this myself, but I think it's a bad idea. If you have a text process or any other kind of process that corrupts your data, fix the process so it doesn't corrupt the data, don't just make piecemeal corrections to the output.
d__
+1  A: 

This looks like a classic case of Unicode (most likely UTF-8) characters being interpreted as iso-8859-1. There are several places along the way where the characters can get corrupted.

First, the client's browser sends the data; it can corrupt the data if it can't properly convert the characters to the page's character encoding. Then the server reads the data and decodes the bytes into characters; if the client and server disagree about the encoding used, the characters will be corrupted. Then the data is stored in the database, where there is again potential for corruption. Finally, when the data is written to the page for display, the browser may misinterpret the bytes if the page doesn't adequately indicate its encoding.
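That misinterpretation can be reproduced directly (assuming the mbstring extension is available; browsers usually treat iso-8859-1 as its superset Windows-1252, which is what makes € and ™ appear):

```php
// The UTF-8 bytes of ’ (U+2019), read as if they were Windows-1252,
// become the three characters â € ™ — the classic mojibake.
$utf8Quote = "\xE2\x80\x99";
echo mb_convert_encoding($utf8Quote, 'UTF-8', 'Windows-1252');  // â€™
```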

You need to ensure that you are using UTF-8 throughout. The default for web pages is iso-8859-1, so your pages should be served with a Content-Type header that names UTF-8, or with the meta tag

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

(make sure you really are serving the text in that encoding).
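One quick sanity check on the bytes you are about to serve (assumes the mbstring extension):

```php
// Valid UTF-8 passes; a lone Windows-1252 quote byte does not.
var_dump(mb_check_encoding("\xE2\x80\x99", 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding("\x92", 'UTF-8'));          // bool(false)
```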

By using UTF-8 throughout the process you will avoid these problems with any working web browser and database.

Mr. Shiny and New
+1, there's no one local fix for these problems, the important thing is the mindset of being encoding-aware wherever you're transmitting or storing text.
d__