tags:

views:

2003

answers:

3

A very strange thing happens to me when I try to insert Non-Latin characters (e.g. Hebrew, Arabic etc. using N'string to insert') using PHP. This thing happens with all the extensions I've tried: mssql, odbc and native driver.

When inserting through PHP code, only when selecting through PHP it works fine and I can read what I've inserted. In query analyzer strings looks gibberish and C++ code also receives gibberish.

When inserting through query analyzer, it looks good in analyzer and in C++ code but when selecting it using PHP I get question marks instead of strings.

Does anybody has a clue?

+1  A: 

You are not controlling your connection collation properly. Set it by sending the proper queries before you start handling the data, or by changing it in the configuration file (I assume the collation you use is a form of UTF).

I'm not sure if translates the same to MsSQL, but in MySQL the queries are:

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';

And there are matching configuration attributes that set it permanently.

Eran Galperin
+1  A: 

This may be due to one of two problems: your database/table/column is using a character set in which the characters you're trying to insert are unknown. In your case, I'd recommend using UTF-8 (unless you're using mostly non-Latin characters, then UTF-16 would be the preferred choice). If you're absolutely sure that your database is capable of storing these characters, it also might be the connection settings which are incorrect (a database connection can be set to use a different character set than the one in use by the database).

Another problem is that PHP has no Unicode support. So if the characters you're trying to insert are embedded in the PHP source file, or if you're applying string functions on the query before executing it, chances are PHP has already mangled the original string. Luckily, PHP has various functions that do support Unicode characters, so you might be able to use those. You can find some of them here:

http://php.net/manual/en/refs.international.php

Victor Welling
+1  A: 

We also tried all available methods and found two working solutions handling unicode data in NVARCHAR fields. Unfortunately, the SQL Server Driver for PHP still has some bugs and is really hard to use as you have to convert NVARCHAR-like fields manually from UTF-16LE. There is support in Zend Framework 1.9.0, but I haven't looked at it yet.

Currently, the best way to access an SQL Server from PHP seems to be using COM and OLEDB:

$conn = new COM('ADODB.Connection', null, 65001);
$conn->Open('Provider=SQLNCLI;Server=SERVER;Database=DB;Uid=USER;Pwd=PASS');
$rs = $conn->Execute('SELECT nvarcharfield FROM sometable');
while (!$rs->EOF) { 
    echo $rs->Fields('nvarcharfield')->value, "\n";
    $rs->MoveNext();
} 
$rs->Close();

The "65001" indicates, that this component should work in UTF-8 mode, which is probably what you want. So all data in and out will be in UTF-8 encoding.

This would be great if prepared queries weren't so complex to program using COM, so you might want to have a look at ADOdb for PHP. The code then becomes just slightly different:

$conn = NewADOConnection('ado_mssql');
$conn->charPage = 65001;
$conn->Connect('Provider=SQLNCLI;Server=SERVER;Database=DB;Uid=USER;Pwd=PASS');
$conn->SetFetchMode('ADODB_FETCH_ASSOC');
$rs = $conn->Execute('SELECT nvarcharfield FROM sometable');
while (!$rs->EOF) { 
    echo $rs->fields['nvarcharfield'], "\n";
    $rs->MoveNext();
} 
$rs->Close();

Using ADOdb, prepared queries in ADOdb are much easier to write and understand.

If you have the new SQL Native Client from SQL Server 2008, you might want to replace the provider "SQLNCLI" with "SQLNCLI10".

As soon as I get some reputation, I'll add the links inline (only one hyperlink allowed for new users).

http://devzone.zend.com/article/4906-Zend-Framework-1.9.0-Released
http://msdn.microsoft.com/de-de/library/system.data.oledb.oledbcommand.prepare.aspx
http://adodb.sourceforge.net/
http://phplens.com/lens/adodb/docs-adodb.htm#prepare
skettler
"...as you have to convert NVARCHAR-like fields manually from UTF-16LE.". Suppose I don't want to change my entire solution to ado, how can I manually convert NVARCHARS to UTF-8? Using $str = iconv("ISO-8859-1", "UTF-8", $str) works well on all Latin-1 languages. Problem is $str = iconv("ISO-8859-5", "UTF-8", $str) does NOT work on Russian at all.
Zamel