views:

4475

answers:

1

I am having trouble with PHP regarding encoding.

I have a JavaScript/jQuery HTML5 page interact with my PHP script using $.post. However, PHP is facing a weird problem, probably related to encoding.

When I write

htmlentities("í")

I expect PHP to output í. However, instead it outputs í At the beginning, I thought that I was making some mistake with the encodings, however

htmlentities("í")=="í"?"Good":"Fail";

is outputing "Fail", where

htmlentities("í")=="í"?"Good":"Fail";

But htmlentities($search, null, "utf-8") works as expected.

I want to have PHP communicate with a MySQL server, but it has encoding problems too, even if I use utf8_encode. What should I do?

EDIT: On the SQL command, writing

SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

where XXX contains no í chars, works as expected, but it does not if there is any 'í' char.

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

Not only fails for í chars, but it ALSO fails for strings without any 'special' characters. Removing the ' chars from SET NAMES and SET CHARACTER SET doesn't seem to change anything.

I am connecting to the MySQL database using PDO.

EDIT 2: I am using MySQL version 5.1.30 of XAMPP for Linux.

EDIT 3: Running SHOW VARIABLES LIKE '%character%' from PhpMyAdmin outputs

character_set_client    utf8
character_set_connection    utf8
character_set_database  latin1
character_set_filesystem    binary
character_set_results   utf8
character_set_server    latin1
character_set_system    utf8
character_sets_dir  /opt/lampp/share/mysql/charsets/

Running the same query from my PHP script(with print_r) outputs:

Array
(
    [0] => Array
        (
            [Variable_name] => character_set_client
            [0] => character_set_client
            [Value] => latin1
            [1] => latin1
        )

    [1] => Array
        (
            [Variable_name] => character_set_connection
            [0] => character_set_connection
            [Value] => latin1
            [1] => latin1
        )

    [2] => Array
        (
            [Variable_name] => character_set_database
            [0] => character_set_database
            [Value] => latin1
            [1] => latin1
        )

    [3] => Array
        (
            [Variable_name] => character_set_filesystem
            [0] => character_set_filesystem
            [Value] => binary
            [1] => binary
        )

    [4] => Array
        (
            [Variable_name] => character_set_results
            [0] => character_set_results
            [Value] => latin1
            [1] => latin1
        )

    [5] => Array
        (
            [Variable_name] => character_set_server
            [0] => character_set_server
            [Value] => latin1
            [1] => latin1
        )

    [6] => Array
        (
            [Variable_name] => character_set_system
            [0] => character_set_system
            [Value] => utf8
            [1] => utf8
        )

    [7] => Array
        (
            [Variable_name] => character_sets_dir
            [0] => character_sets_dir
            [Value] => /opt/lampp/share/mysql/charsets/
            [1] => /opt/lampp/share/mysql/charsets/
        )

)

Running

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SHOW VARIABLES LIKE '%character%'

outputs an empty array.

+7  A: 

It's very important to specify the encoding of htmlentities to match that of the input, as you did in your final example but omitted in the first three.

htmlentities($text,ENT_COMPAT,'utf-8');

Regarding communications with MySQL, you need to make sure the connection collation and character set matches the data you are transmitting. You can either set this in the configuration file, or at runtime using the following queries:

SET NAMES utf8;
SET CHARACTER SET utf8;

Make sure the table, database and server character sets match as well. There is one setting you can't change at run-time, and that's the server's character set. You need to modify it in the configuration file:

[mysqld]
character-set-server = utf8
default-character-set = utf8 
skip-character-set-client-handshake

Read more on characters sets and collations in MySQL in the manual.

Eran Galperin
PhpMyAdmin says that the field I am trying to get is encoded using utf8_bin, and I thought that'd be enough. I'll try your solution, though.
luiscubal
The field is encoded in UTF, but you need to make sure the connection is using the same encoding (for some reason the default is ISO-8859)
Eran Galperin
Thank you. But it's still not working. I've updated my question and added further details.
luiscubal
What version of MySQL are you using?
Eran Galperin
I'm using 5.1.30 (XAMPP for Linux)/opt/lampp/bin/mysql --version/opt/lampp/bin/mysql Ver 14.14 Distrib 5.1.30, for pc-linux-gnu (i686) using EditLine wrapperI'll add this information to the post.
luiscubal
Argh! Comments do not support line breaks!
luiscubal
It is very weird you cannot change the collation from the command line. Do you have superadmin privileges?
Eran Galperin
In this particular case, I'm not logged in as root. However, PhpMyAdmin does not list any 'SET' privilege. And I've set database-specific privileges for this user. Which particular one should I be looking at?
luiscubal
Forget what I said about privileges, it should work regardless. Are you sure the collation queries are failing? try to run "SHOW VARIABLES LIKE '%character%';" and see what collation is set for the current connection
Eran Galperin
I'll post the results in the question.
luiscubal
Added some more suggestions to my answer :)
Eran Galperin
The file "etc/my.cnf" doesn't list those settings. Should I add them manually? Also, how can PhpMyAdmin have done it right if it requires manually modifying the configuration?
luiscubal
you can add those manually. and what do you mean done it right? what did it do right?
Eran Galperin
When I manually run the queries on PhpMyAdmin, it usually outputs correctly. This bug happens *only* on MY PHP script. SHOW VARIABLES LIKE, for example, outputs differently using PhpMyAdmin.
luiscubal
Sorry, I missed that the results you posted were different. Yes, the connection collation is definitely not set correctly in your PHP script. Are you sure you are running the queries I told you in the same script?
Eran Galperin
In phpMyAdmin, running SET NAMES utf8;SET CHARACTER SET utf8;SHOW VARIABLES LIKE '%character%'; works perfectly.In my scriptprint_r(callPdoQuery(/*Same query as before, omitted due for comment length reasons*/)->fetchAll());returns an empty array, even if I run as root.Could it be PDO-related?
luiscubal
That's probably it. Instead of callPdoQuery, use exec() on a PDO object to run the queries.
Eran Galperin
However, PDO::exec does not return the result(of SELECT statements, in this case), so I can't use it. prepare(), then execute() has no visible differences.
luiscubal
I meant using exec() only for setting the character sets/names. You can then use a regular fetchAll() to see if it the environment variables are set correctly.
Eran Galperin
Changing my.cnf has no visible effects.I think that "SET" commands are intercepting the result.Perhaps, when I run "SET NAMES", it's that value that's returned to fetchAll(), instead of the SELECT result!This shouldn't be a PDO problem because when I use mysql_*, it gives errors anyway.
luiscubal
Listen, I don't know what to tell you. This the standard practice for enforcing connection encoding, and I've used it on dozens of systems successfully.
Eran Galperin
OK... :(This is only a development server so I guess I'll have to hope that the final server doesn't have this problem...
luiscubal