I have two strings in my PHP code: one is a parameter to my method and the other is a string from an INI file. The problem is that they are not equal, even though they have the same content, probably because of encoding issues. var_dump reports that the first string's length is 23 and the second string's length is 47 (see the end of my question for the output).

How can I make sure they are both encoded the same way and end up with the same length, so the comparison won't fail? Preferably, I would like them both to be UTF-8 encoded.

For reference, this is an excerpt from the code:

static function getString($keyword, $file) {
    $lang_handle = parse_ini_file($file, true);

    var_dump($keyword);
    foreach ($lang_handle as $key => $value) {
        var_dump($key);
        if ($key == $keyword) {
            foreach ($value as $subkey => $subvalue) {
                var_dump("\t" . $subkey . " => " . $subvalue);
            }
        }
    }
}

with the following ini:

[clientcockpit/login.php]
header = "Kunden Login"
username = "Benutzername"
password = "Passwort"
forgot = "Passwort vergessen"
login = "Login"

When I call the method with getString("clientcockpit/login.php", "inifile.ini"), the output is:

string 'clientcockpit/login.php' (length=23)
string '�c�l�i�e�n�t�c�o�c�k�p�i�t�/�l�o�g�i�n�.�p�h�p�' (length=47)
A: 

Try this:

$lang_handle = parse_ini_string(file_get_contents($file), true);
reko_t
This only works with PHP 5.3 or later, and I am using 5.2.6.
Pascal
A:

Your INI file seems to be in UTF-16 or a similar encoding, which uses two bytes to represent each character. I guess that the strange characters in your string are actually NUL bytes (\0).
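To check the NUL-byte guess, a couple of small debugging helpers can help. This is only a sketch: the function names containsNulBytes() and showNulBytes() are hypothetical and not part of PHP. They use the standard strpos(), str_replace() and bin2hex() functions.

```php
<?php
// Hypothetical debugging helper: true if $s contains at least one NUL byte ("\0").
function containsNulBytes($s)
{
    return strpos($s, "\0") !== false;
}

// Hypothetical debugging helper: replace each NUL byte with the two visible
// characters \0 so it shows up clearly in var_dump()/echo output.
function showNulBytes($s)
{
    return str_replace("\0", '\0', $s);
}

// Example use inside getString(), for each section key:
// var_dump(containsNulBytes($key), showNulBytes($key), bin2hex($key));
```

If the file really is UTF-16, showNulBytes($key) should print the section name with \0 between every character, and bin2hex($key) should show a 00 byte around each ASCII byte.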

PHP's Unicode support is quite poor, and I guess that parse_ini_file() does not handle multibyte encodings properly. It treats the file as if it were in an ASCII-compatible single-byte encoding and simply looks for the special characters [ and ] to detect sections. As a result, the section keys get corrupted: the extra byte that actually belongs to [ or ] (depending on the byte order) ends up as part of the section key:

UTF-16:    [c]    (3 characters, 6 bytes)

For UTF-16BE (big endian):

  Bytes:    00 5B    00 63    00 5D    (6 bytes)
  ASCII:    \0  [    \0  c    \0  ]    (6 characters)

For UTF-16LE (little endian):

  Bytes:    5B 00    63 00    5D 00    (6 bytes)
  ASCII:    [  \0    c  \0    ]  \0    (6 characters)

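Assuming ASCII, parse_ini_file() will read \0c\0 instead of c if the source file is encoded in UTF-16. For the 23-character section name clientcockpit/login.php, this gives the 23 characters plus 24 NUL bytes, which is exactly the length of 47 reported by var_dump.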
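The byte layout above can be reproduced in PHP. The sketch below assumes the iconv extension is available. naiveSectionName() is a hypothetical stand-in for a byte-oriented parser and is not how parse_ini_file() is actually implemented. It just takes whatever lies between the first [ byte and the first ] byte.

```php
<?php
// Encode "[c]" as UTF-16 in both byte orders (iconv adds no BOM for these names).
$le = iconv('UTF-8', 'UTF-16LE', '[c]');
$be = iconv('UTF-8', 'UTF-16BE', '[c]');
echo bin2hex($le), "\n"; // 5b0063005d00
echo bin2hex($be), "\n"; // 005b0063005d

// Hypothetical stand-in for a single-byte parser: the section name is
// everything between the first "[" byte and the first "]" byte.
function naiveSectionName($bytes)
{
    $open  = strpos($bytes, '[');
    $close = strpos($bytes, ']');
    return substr($bytes, $open + 1, $close - $open - 1);
}

echo bin2hex(naiveSectionName($le)), "\n"; // 006300, i.e. "\0c\0"
echo bin2hex(naiveSectionName($be)), "\n"; // 006300, i.e. "\0c\0"
```

Both byte orders give the same three-byte key "\0c\0". Applying naiveSectionName() to the UTF-16 encoding of [clientcockpit/login.php] gives a 47-byte string.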

If you control the format of your INI file, make sure to save it in UTF-8 or ISO-8859-1 encoding using your favorite text editor.

Otherwise, you will have to read the file contents with file_get_contents(), do the encoding conversion (e.g. using iconv()) and pass the result to parse_ini_string(), which requires PHP 5.3.0 or later. The drawback is that you will have to detect or hardcode the original file encoding.
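Here is a minimal sketch of that approach, assuming the source encoding is known (for example 'UTF-16LE') and the iconv extension is installed. parseIniWithEncoding() is a hypothetical name. Because parse_ini_string() does not exist before PHP 5.3, the sketch falls back to writing the converted text to a temporary file and calling parse_ini_file() on it, which should also work on 5.2.x.

```php
<?php
// Hypothetical helper: read an INI file stored in $fromEncoding, convert it to
// UTF-8 with iconv() and parse it. Returns the parsed array, or false on error.
function parseIniWithEncoding($file, $fromEncoding, $processSections = true)
{
    $raw = file_get_contents($file);
    if ($raw === false) {
        return false;
    }
    $utf8 = iconv($fromEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        return false;
    }
    // With an explicit byte order such as 'UTF-16LE', a BOM in the file is
    // converted to a UTF-8 BOM (EF BB BF); strip it so it can't leak into a key.
    if (substr($utf8, 0, 3) === "\xEF\xBB\xBF") {
        $utf8 = substr($utf8, 3);
    }
    if (function_exists('parse_ini_string')) {
        // PHP 5.3.0 and later
        return parse_ini_string($utf8, $processSections);
    }
    // PHP < 5.3: no parse_ini_string(), so round-trip through a temporary file.
    $tmp = tempnam(sys_get_temp_dir(), 'ini');
    file_put_contents($tmp, $utf8);
    $result = parse_ini_file($tmp, $processSections);
    unlink($tmp);
    return $result;
}
```

Usage would be something like `$lang_handle = parseIniWithEncoding($file, 'UTF-16LE');` in place of the parse_ini_file() call in getString(). The section key should then be the plain 23-byte string.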

If the mbstring extension is available in your PHP installation, you can use mb_detect_encoding() and mb_convert_encoding() to do the conversion dynamically.
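As an alternative to mb_detect_encoding(), whose handling of UTF-16 may not always be reliable, you could look for a byte order mark (BOM) and then convert with mb_convert_encoding(). This is a sketch under stated assumptions. detectBomEncoding() and toUtf8() are hypothetical helpers. Only UTF-8 and UTF-16 BOMs are recognised, and a file without a BOM is assumed to already be UTF-8, so a BOM-less UTF-16 file would still need its encoding hardcoded.

```php
<?php
// Hypothetical helper: guess the encoding from a byte order mark.
// Returns array(encoding, bomLengthInBytes); no BOM is treated as UTF-8.
function detectBomEncoding($raw)
{
    if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
        return array('UTF-8', 3);
    }
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        return array('UTF-16LE', 2);
    }
    if (substr($raw, 0, 2) === "\xFE\xFF") {
        return array('UTF-16BE', 2);
    }
    return array('UTF-8', 0);
}

// Hypothetical helper: strip the BOM and convert the rest to UTF-8 via mbstring.
function toUtf8($raw)
{
    list($encoding, $bomLength) = detectBomEncoding($raw);
    // (string) guards against substr() returning false on older PHP versions.
    $body = (string) substr($raw, $bomLength);
    if ($encoding === 'UTF-8') {
        return $body;
    }
    return mb_convert_encoding($body, 'UTF-8', $encoding);
}
```

On PHP 5.3+, this could be combined as `parse_ini_string(toUtf8(file_get_contents($file)), true)`.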

Ferdinand Beyer
After making absolutely sure that the file was saved as UTF-8 instead of UTF-16 (the encoding in which the client supplied it), it seemed to work. However, I was unable to convert the strings to other encodings programmatically using mbstring.
Pascal