I have a site with a form on it. The form POSTs to a PHP script, which then inserts the data into my database. The page has charset=UTF-8 in its <meta> tag, and the database is set up to use UTF-8. However, when I copy and paste characters from MS Word into the field, the output is garbled.

For example, the quotes in

I am using "Microsoft Word" ''''

become

I am using “Microsoft Word†????

in the database.

Anyone have any idea why this might occur?

A: 

Run a SET NAMES utf8 query right after connecting, and get rid of all recoding functions in your code.
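
To see why the connection charset matters, here is a minimal PHP sketch of one common way this kind of garbling arises: UTF-8 bytes get read as if each byte were a Windows-1252 character (MySQL's latin1 is Windows-1252) and are then converted to UTF-8 a second time. The helper names misreadAsCp1252() and undoMisread() are made up for illustration, and the mbstring extension is assumed to be loaded. It only imitates the conversion; it does not talk to MySQL.

```php
<?php
// Sketch only: imitates a connection that treats incoming UTF-8 bytes as Windows-1252.
// misreadAsCp1252() and undoMisread() are hypothetical helper names.
// Assumes the mbstring extension is available.

function misreadAsCp1252(string $utf8Bytes): string
{
    // Interpret every byte as one Windows-1252 character, then encode the result as UTF-8.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

function undoMisread(string $garbled): string
{
    // Map each character back to its single Windows-1252 byte, restoring the original bytes.
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$leftQuote = "\u{201C}";                  // Word's left curly quote
$garbled   = misreadAsCp1252($leftQuote);

echo bin2hex($leftQuote), "\n";           // e2809c
echo $garbled, "\n";                      // “
echo bin2hex(undoMisread($garbled)), "\n"; // e2809c
```

The garbled output matches the start of the broken string in the question, which suggests the text is being encoded twice somewhere between the form and the table.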

Col. Shrapnel
A: 

Hmm... maybe first, before sending the data to the DB, reject any whitespace characters such as spaces, tabs, and newlines:

if (preg_match("/\s/", $text)) {
    die("Please do not enter any spaces, tabs or new lines!");
}

Something like this.

I'm not sure, but I think I came across this in PHP form tutorials.

eva wins
A: 

Not a real answer, but a suggestion. First try the grandma (30-second) test: paste the MS Word text into a good text editor such as EditPad Pro or Notepad++. If everything appears as expected, copy it again from there and paste it into your form.

In other words, don't copy and paste text directly from MS Word.

microspino
A: 

I had a simple Java webapp that didn't specify any character set or encoding anywhere, and ran into the same problems. In my case, the following changes produced the desired behavior:

  1. Change the db schema definition to use UTF-8 (using MySQL).
  2. Change the db connection URL to specify UTF-8 (using MySQL Connector/J).
  3. Change the app server configuration to interpret request parameter data as UTF-8.
  4. Change all generated HTML pages to specify UTF-8.
Eric Rath
A: 

Are you posting from a <textarea> or a WYSIWYG form? The WYSIWYG JavaScript could be doing its own encoding.

Have you tried it in different browsers? It could be a bug with a particular browser. Also, try setting the headers in PHP, instead of with a meta tag, as your server may be sending conflicting headers.

header('Content-Type: text/html; charset=utf-8'); 

What happens if you save the $_POST data to a file? Does the encoding look OK?

file_put_contents('post.log', print_r($_POST, true));
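
If the log is hard to judge by eye, it may help to look at the raw bytes instead. The following is a small sketch; describeBytes() is a made-up helper name, not a PHP built-in.

```php
<?php
// Sketch only: describeBytes() is a hypothetical helper for inspecting raw bytes.
function describeBytes(string $value): string
{
    // bin2hex() yields two hex digits per byte; split them into pairs for readability.
    return implode(' ', str_split(bin2hex($value), 2));
}

echo describeBytes("\u{201C}"), "\n"; // e2 80 9c  (left curly quote encoded as UTF-8)
echo describeBytes("\x93"), "\n";     // 93        (the same quote as a Windows-1252 byte)
```

A curly quote that arrives as e2 80 9c was sent as UTF-8, while a lone 93 points to Windows-1252, so logging the posted field this way shows which encoding the browser actually used.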

Then what happens if you copy the text from Word into a text file and insert the file's contents into the database?

$db_query = 'INSERT INTO table SET col="' . mysql_real_escape_string(file_get_contents('input.txt')) . '"';
dave1010
A: 

try

<form action="form_action.php" accept-charset="UTF-8">
David Morrow
+1  A: 

Here's what I propose you do to find where the problem lies.

  1. MySQL uses the latin1 charset by default to store and transfer data. To change that, create your database with charset utf8 and collation utf8_unicode_ci (see http://dev.mysql.com/doc/refman/5.0/en/create-database.html).

    CREATE DATABASE example DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;

  2. Tell MySQL to handle incoming and outgoing data as UTF8. Before any SQL queries are sent to MySQL, the command SET NAMES UTF8; must be issued. This tells MySQL to treat all data sent to and from the server as UTF8. It needs to be set only once per connection. You can set it with mysql_query("SET NAMES 'UTF8'"); for example.

  3. Make sure you're actually using UTF8. Although you might have specified UTF8 in the <meta> tag, you might actually be sending the content in another charset. To make sure you're sending UTF8-encoded content, add header('Content-Type: text/html; charset=utf-8'); to your PHP file.
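
As a rough cross-check on step 3 from the receiving side, the script could verify that what the form posted is valid UTF-8 before it goes anywhere near the database. This is only a sketch: isUtf8() is a made-up wrapper name, and it assumes the mbstring extension is loaded.

```php
<?php
// Sketch only: isUtf8() is a hypothetical wrapper around mb_check_encoding().
// Assumes the mbstring extension is available.
function isUtf8(string $value): bool
{
    // True only if the byte sequence is well-formed UTF-8.
    return mb_check_encoding($value, 'UTF-8');
}

var_dump(isUtf8("\xE2\x80\x9C")); // bool(true)  - UTF-8 left curly quote
var_dump(isUtf8("\x93"));         // bool(false) - Windows-1252 left curly quote
```

If the check fails for text pasted from Word, the browser is not submitting UTF-8, and the problem is in the page or form rather than in MySQL.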

Erik Töyrä