Hello,

I have the following UTF-8 file exported from a Microsoft Access database:

http://www.yousendit.com/download/TTZtT214SU84Q1FLSkE9PQ

I have confirmed that my MySQL database uses utf8 for both client and server (checked with the status; command). I load the above file into my database with the following command:

LOAD DATA LOCAL INFILE 'tblAuction1.txt' INTO TABLE Auctions FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\';

Everything seems to be mostly OK: Unicode characters are displayed in the HTML as they should be, as far as I can tell. The raw contents of the database field are here:

http://www.nomorepasting.com/getpaste.php?pasteid=22622

However, this is the resulting HTML code:

http://www.nomorepasting.com/getpaste.php?pasteid=22617

Which displays as

Fee Listing

1.00 
<\/OBJECT>
');\n\t\t<\/SCRIPT>\n\t\t

in the browser

The code I am using to show this is:

http://www.nomorepasting.com/getpaste.php?pasteid=22618

This was working fine before I changed the encoding.

As a side question, I am wondering why changing from tab-delimited to semicolon-delimited, and enclosing the fields in quotes, would decrease the size of the exported file by half. The tab character is a single character just like the ; character, so shouldn't adding quotes to enclose the fields have increased the size?

A: 

Depending on the configuration of the web server, you may need to explicitly set the Content-Type to "text/html; charset=UTF-8" with header(), called before any output is sent:

header('Content-Type: text/html; charset=UTF-8');

This should be enough for your specific problem, but - in case you also intend to manipulate the strings - note that PHP contains many functions that are not safe to use with multi-byte characters: you should at least properly configure the mbstring extension.
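As a rough illustration of the multi-byte caveat, the sketch below compares the byte-oriented strlen() and substr() with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample word and the variable names are hypothetical and not taken from the question's data.

```php
<?php
// Assumption: the mbstring extension is available.
mb_internal_encoding('UTF-8');

// Hypothetical sample: "Gebühr", with "ü" written as its two UTF-8 bytes
// so the example does not depend on how this file itself is saved.
$word = "Geb\xC3\xBChr";

$byteLength = strlen($word);     // counts bytes: 7
$charLength = mb_strlen($word);  // counts characters: 6

// Cutting at a byte offset can split a multi-byte character in half.
$byteCut = substr($word, 0, 4);    // "Geb" plus the first byte of "ü"
$charCut = mb_substr($word, 0, 4); // "Geb" plus the whole "ü"

echo $byteLength, "\n";
echo $charLength, "\n";
echo mb_check_encoding($byteCut, 'UTF-8') ? "valid" : "invalid", "\n";
echo mb_check_encoding($charCut, 'UTF-8') ? "valid" : "invalid", "\n";
```

The byte-based cut leaves a string that is no longer valid UTF-8, which is the kind of corruption that tends to show up as broken characters in the browser.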

I also have this cheatsheet in my bookmarks; I think it's still relevant.

Luca Tettamanti
That did not seem to fix anything. Is it possibly a problem with the database? It seems to be a problem with the HTML that is meant to be passed to document.write, with a tag left unclosed somewhere.
Joshxtothe4
The unclosed tag is orthogonal to the UTF-8 encoding... I thought you had issues correctly displaying the non-ASCII characters.
Luca Tettamanti