Hi guys,

I'm having trouble comparing Chinese characters against other Chinese characters. For example, I'm writing a simple password script that reads the stored password from a database and takes the user's input from $_GET.

The issue is that, even though the characters look exactly the same when I echo them out, my if statement still treats them as different.

I have tried using the htmlentities() function to encode the characters. The password from the database encodes nicely, giving me a working '& #35441;' (I've put a space in it to stop it from converting to a Chinese character!).

The user input value, on the other hand, gives me a load of funny characters. The only explanation I can think of is that it is encoded in a different way, so PHP thinks they are two completely different strings.

Does anybody have any ideas?
Thanks in advance,
Will

Edit:
Thanks for the quick responses guys. I'm going to look into setting the database encoding to UTF-8. At the moment, though, the results from the database are not the problem: they encode correctly using htmlentities(). It's the values I get from $_GET that are causing the problems.

Cheers,
Will

+3  A: 

For passwords, my advice is don't do a direct comparison, because that means you're storing passwords in the clear. At least run them through a hash like MD5 or SHA (preferably with a salt value as well) before storing them. Then you just have to compare the hash values, which are plain hex strings, so the stored values themselves shouldn't cause any encoding problems, as long as the input is hashed from the same bytes each time.

For non-password values, it sounds like your database and PHP are not using the same encoding, so the values don't match. If MySQL is storing them the way you want, have it do the comparison (instead of having it return the values first). That should avoid one of the passes through an encoding change, which seems likely to be the problem.
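
To check whether two strings that look identical on screen really contain the same bytes, something like the following may help. It is only a sketch: the two values are assumptions based on the question (raw UTF-8 bytes on the $_GET side, the literal entity text on the database side), and the variable names are made up for illustration.

```php
<?php
// Hypothetical values for the character U+8A71 (decimal 35441).
$fromGet = "\xE8\xA9\xB1"; // assumed: raw UTF-8 bytes, as $_GET might deliver them
$fromDb  = '&#35441;';     // assumed: literal numeric entity text from the database

// bin2hex() shows the actual bytes, regardless of how a browser renders them.
echo bin2hex($fromGet), "\n";
echo bin2hex($fromDb), "\n";

// Different bytes mean the comparison fails and the hashes differ too.
var_dump($fromGet === $fromDb);
var_dump(md5($fromGet) === md5($fromDb));

// Converting both sides to the same representation (UTF-8 here) makes them match.
$normalisedDb = html_entity_decode($fromDb, ENT_QUOTES, 'UTF-8');
var_dump($fromGet === $normalisedDb);
```

If the two hex dumps differ, hashing will not hide the problem, because MD5 and SHA1 work on bytes, not on what the characters look like.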

acrosman
A: 

If you want to store passwords, read this: what you need to know about secure password schemes.

After reading it, your root problem seems to be a character encoding mismatch between what you receive from the user and what you get from your database. If you are using MySQL with UTF-8 encoding, do you first run the SET NAMES 'utf8' query? (Note that MySQL spells the charset name without a hyphen.)
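
As a rough sketch of the same idea with PDO, the connection charset can be requested in the DSN instead of issuing SET NAMES by hand. The helper names, host, and database name below are hypothetical, and utf8mb4 assumes a reasonably recent MySQL server (use utf8 on older ones).

```php
<?php
// Hypothetical helper: builds a PDO DSN that asks MySQL for a UTF-8 connection.
function utf8Dsn(string $host, string $dbName): string
{
    // utf8mb4 is MySQL's full UTF-8 charset; MySQL spells it without a hyphen.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Hypothetical helper: not called here because it needs a live MySQL server.
function connectUtf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(utf8Dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

echo utf8Dsn('localhost', 'example_db'), "\n";
```

With mysqli, calling set_charset() on the connection object serves the same purpose.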

Arkh
A: 

Saving the values as SHA1 or MD5 hashes may solve your problem, as the others have said. It is also safer than storing the password itself. Here's a code snippet to help out.

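// Method of a class that keeps the raw password in $this->_rawPassword.
public function getHashedPassword()
{
    $salt = 'mysalt';
    // Use %s for the salt: %d would cast the string 'mysalt' to the integer 0.
    return sprintf('%s%s', $salt, sha1(sprintf('%s%s', $salt, $this->_rawPassword)));
}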

When comparing, rehash the password input and compare it to the saved hashed password in your database. Doing so may remove the encoding issue.
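
For what it's worth, newer PHP versions (5.5 and up) ship password_hash() and password_verify(), which generate the salt and do the comparison for you. The sketch below swaps those in for the salted SHA1 above; it is an alternative, not the method from this answer, and the sample password bytes are an assumption.

```php
<?php
// Hypothetical input: the password as UTF-8 bytes (the character U+8A71).
$rawPassword = "\xE8\xA9\xB1";

// At registration: store only this hash; it embeds its own random salt.
$storedHash = password_hash($rawPassword, PASSWORD_DEFAULT);

// At login: check the submitted bytes against the stored hash.
var_dump(password_verify($rawPassword, $storedHash));      // bool(true)
var_dump(password_verify('wrong password', $storedHash));  // bool(false)
```

The submitted password still has to arrive in the same encoding it had at registration, otherwise the verification fails just like a plain comparison would.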

Hanseh
Hi Hanseh, yeah, I have tried using md5 to encode the strings, but I still have the same problem: when I echo out the hashed passwords I get two completely different sets of numbers.
WillDonohoe
A: 

Since you ought to store hashes of passwords rather than the passwords themselves anyway, this might be part of the solution. You store the hash rather than the password and thus have no encoding problems with the database.

That said, different browsers may encode the strings they submit differently. It's not something I know much about, but you had better make sure you find a solution that produces the exact same string in all browsers. Setting accept-charset to utf-8 on the form is a no-brainer; you might also want to experiment with the enctype.
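
A minimal sketch of that idea, assuming the page itself is served as UTF-8: ask the browser for UTF-8 with accept-charset, then check on the server that what arrived really is well-formed UTF-8 before comparing anything. Both function names and the form markup are hypothetical.

```php
<?php
// Hypothetical helper: true when $value is a well-formed UTF-8 byte sequence.
function isValidUtf8(string $value): bool
{
    // With the /u modifier, preg_match() fails on malformed UTF-8 subjects.
    return preg_match('//u', $value) === 1;
}

// Hypothetical form markup; accept-charset asks the browser to submit UTF-8.
function passwordForm(): string
{
    return '<form method="get" action="" accept-charset="UTF-8">'
         . '<input type="text" name="password">'
         . '<input type="submit">'
         . '</form>';
}

var_dump(isValidUtf8("\xE8\xA9\xB1")); // well-formed UTF-8 (U+8A71)
var_dump(isValidUtf8("\xB3\x5C"));     // starts with a stray continuation byte
```

If the check fails for real submissions, the browser is probably sending the form in another charset, and the input needs converting before it is compared or hashed.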

eBusiness