Hi

I want to give the user the ability to import a CSV file into my PHP/MySQL system, but I ran into encoding problems when the language is Russian, which Excel can only store in UTF-16 tab-delimited text files.

Right now my database is in latin1, but I will change that to UTF-8 as described in the question "a-script-to-change-all-tables-and-fields-to-the-utf-8-bin-collation-in-mysql".

But how should I import the file and store the strings?

Should I, for example, translate the text to HTML entities?

I am using the fgetcsv function to get the data out of the CSV file. My code currently looks something like this:


file_put_contents($tmpfile, str_replace("\t", ";", file_get_contents($tmpfile)));
$filehandle = fopen($tmpfile,'r');
while (($data = fgetcsv($filehandle, 1000, ";")) !== FALSE) {
  $values[] = array(
    'id' => $data[0], 
    'type' => $data[1], 
    'text' => $data[4], 
    'desc' => $data[5], 
    'pdf' => $data[7]);
}

As a note: if I save the XLS file as CSV in Excel, the special characters are replaced by '_', so the only way I can get the Russian characters out of the file is to save it in Excel as a tab-separated file in UTF-16 format.

A: 

Alternatively, you could use MySQL's LOAD DATA INFILE command. It lets you specify delimiters, character set, etc. The one caveat is that the server loading the data must have direct access to the file, meaning that the file must reside on a filesystem visible to and readable by the database server.

toluju
The load command does not support utf16. From the documentation: "Note that it is currently not possible to load data files that use the ucs2, utf16, or utf32 character set."
Lauer
I missed that part of the documentation, sorry. :( Sounds like MySQL generally has trouble with utf16, so you may need to convert from utf16 to utf8 in your code. As per a question that has already been asked on SO (http://stackoverflow.com/questions/155514/how-to-convert-a-utf-8-string-to-a-utf-16-string-in-php), the mb_convert_encoding function can help you with this (http://www.php.net/manual/en/function.mb-convert-encoding.php).
toluju
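
For illustration, here is a minimal sketch of the conversion suggested in the comment above: read the UTF-16 tab-separated export, convert it to UTF-8 with mb_convert_encoding, then parse it with fgetcsv. The helper name utf16_tsv_to_rows is hypothetical (it is not from this thread), it assumes the mbstring extension is loaded, and it assumes little-endian input when no byte order mark is present.

```php
<?php
// Hypothetical helper (not from the thread): turn the bytes of a UTF-16,
// tab-separated export into an array of UTF-8 rows.
function utf16_tsv_to_rows(string $bytes): array
{
    // Pick the byte order from the BOM and strip it.
    // Assumption: little-endian when there is no BOM.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $from = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        $from = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {
        $from = 'UTF-16LE';
    }

    // Convert BEFORE splitting on tabs: in UTF-16 a tab is two bytes,
    // so byte-level replacing or splitting would corrupt the text.
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $from);

    // Parse from an in-memory stream so fgetcsv can handle quoted fields.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $utf8);
    rewind($fh);

    $rows = [];
    while (($data = fgetcsv($fh, 0, "\t", '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $rows[] = $data;
    }
    fclose($fh);

    return $rows;
}
```

It could be called as `$rows = utf16_tsv_to_rows(file_get_contents($tmpfile));`. If the table and the database connection both use UTF-8, the resulting strings can probably be stored as they are, without translating them to HTML entities.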
A: 

I would not import it using PHP. Instead, consider creating a temporary table and loading your data into it using LOAD DATA INFILE.

$file_handle = fopen($file_name, 'r');
$first_row = fgetcsv($file_handle, 0, ',', '"');
fclose($file_handle);
# Your usual error checking
if (!is_array($first_row)) {
    ...
}
$columns = 'column'.implode(' TEXT, column', array_keys($first_row)).' TEXT';
query("CREATE TABLE $table ($columns) Engine=MyISAM DEFAULT CHARSET=ucs2");
query("LOAD DATA LOCAL INFILE '$file_name' INTO TABLE $table ...

Then you can do whatever you want with the data in that table.

soulmerge
About the note on the charset not being supported: I would actually try it out. I think that phrase just means that no conversion can be done while loading the data. It should be a simple copy operation dumping a bunch of bytes into columns, which should work.
soulmerge
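
As a separate sketch (not a completion of the truncated statement above), this is roughly what a full LOAD DATA statement could look like in the simpler case where the file has already been converted to UTF-8. The helper name build_load_data_sql, the use of utf8mb4, the Windows line endings, and the skipped header row are all assumptions made for illustration. LOCAL also requires local_infile to be enabled on both client and server.

```php
<?php
// Hypothetical helper (not from the thread): build a LOAD DATA statement for
// a tab-separated file that has ALREADY been converted to UTF-8.
function build_load_data_sql(string $fileName, string $table): string
{
    // Table names cannot be bound as parameters, so accept only
    // plain identifiers here.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException("Unsafe table name: $table");
    }

    // Simplistic escaping of the path for a SQL string literal.
    // Assumption: good enough for a sketch; use your database
    // library's own escaping in real code.
    $file = addslashes($fileName);

    return "LOAD DATA LOCAL INFILE '$file' INTO TABLE `$table` "
         . "CHARACTER SET utf8mb4 "                 // assumes MySQL 5.5.3 or later
         . "FIELDS TERMINATED BY '\\t' OPTIONALLY ENCLOSED BY '\"' "
         . "LINES TERMINATED BY '\\r\\n' "          // assumes Windows line endings
         . "IGNORE 1 LINES";                        // assumes a header row
}
```

The returned string can then be passed to whatever query function you already use, for example `query(build_load_data_sql($file_name, $table));`.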
+1  A: 
Lauer
A: 

Hi

you can use http://www.dbTube.org to import CSV/Excel files into mySQL Database.

Greetings

Andreas

Andreas Herz
Nice application. It's a pity that it isn't free, but I will remember your link for another day when I need to work with Excel/CSV and MySQL. However, this is normally an easy task with LOAD DATA INFILE.
Lauer
A: 

Okay, my solution was ALSO to export the file from Excel as UTF-16 Unicode text. The only difference was that I read my file using a tab delimiter:

fgetcsv($fp, 999999, "\t", '"')
zmonteca