views:

124

answers:

2

Hi all. I am having a real headache with reading in a tab delimited text file and inserting it into a MySQL Database.

The tab delimited text file was generated (I think) from a MS SQL Database, and I have written a simple script to read in the file and insert it into an existing table in my MySQL database.

However, there seems to be some problem with the data in the txt file. When my PHP script parses the file and I output the INSERT statements, the values in each of the fields are longer than they should be. For example, the first field should be a simple two character alphanumeric value. If I echo out the INSERT statements, using Firebug (in Firefox), between each of the characters is a question mark in a black diamond. If I var_dump the values, I get the following:

string(5) "A1" 

Now, this clearly shows a two character string, but var_dump tells me it is five characters long!!

If I trim() the value, all I get is the first character (in this case "A").

How can I get at the other characters, even if it is only to remove them? Additionally, this appears to be forcing MySQL to insert the value as a BLOB, not as a varchar as it should.

Simon

UPDATE If I do:

echo mb_detect_encoding($arr[0]);

I get a result of 'ASCII'. This isn't multibyte, is it??

+1  A: 

Sounds like an encoding issue.

Are you running any strings through PHP functions which are not multi byte safe?

You may need to look at multi byte aware functions in PHP.

alex
Hmm, thought it might be an encoding issue. I am using sprintf(), but unsure whether this is mb safe or not. Not got much experience of multibyte strings. How can I write this to MySQL as a standard string?
Simon S
A: 

OK, solved all these issues by opening the TXT file in notepad and saving it specifically as UTF-8.

I still don't know what encoding was used (maybe UNICODE??) but it's all sorted now

Simon S
UTF8 is an encoding of Unicode. Perhaps it was an ISO standard.
alex