Hello,

I have some HTML that was inserted into a MySQL database from a CSV file, which in turn was exported from an Access mdb file. The mdb file was exported as Unicode, and indeed is Unicode. However, I am unsure what encoding the MySQL database uses.

However, when I echo out the HTML stored in a field, the Unicode characters do not come through. This is a direct retrieval of one of the HTML fields in the database.

http://www.yousendit.com/download/TTZueEVYQzMrV3hMWEE9PQ

It says utf-8 in the source. The actual page code generated from echoing out article_desc is here:

http://www.nomorepasting.com/getpaste.php?pasteid=22566

I need to use this HTML with JSON, and I am wondering what I should do. I cannot use any other frameworks or libraries. Should I convert the data before inserting it into the MySQL database, or do something else?

+1  A: 

The mdb file was exported as Unicode, and indeed is Unicode.

That makes no sense. A file cannot be Unicode. A file is a sequence of bytes, and it can be encoded with a Unicode-compatible encoding, such as UTF-8, UTF-16, UTF-8 with a BOM, and so on.
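To make that concrete, here is a small sketch of one character written out under three encodings. It is in Python purely for illustration (the question doesn't show its code, so the language and the variable names are my assumption); the point is only that "Unicode" by itself doesn't tell you which bytes are in the file.

```python
# One Unicode code point: U+00E9 (e with acute accent).
text = "\u00e9"

# The same code point as three different byte sequences.
utf8 = text.encode("utf-8")          # two bytes
utf16 = text.encode("utf-16-le")     # two different bytes
utf8_bom = text.encode("utf-8-sig")  # UTF-8 preceded by a byte order mark

for name, data in [("utf-8", utf8), ("utf-16-le", utf16), ("utf-8 + BOM", utf8_bom)]:
    print(name, data.hex(" "))
```

All three are valid "Unicode" exports of the same text, yet no two of them share the same bytes, so whatever reads the file has to be told which one it is.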

Charset issues are a very common problem, and they have their root in ignorance. I don't say this to offend you, but you really need to know the difference between codepoints (strings) and encodings (bytestreams). If you don't know which you're dealing with at all times throughout your entire application, you will get problems eventually. The curse of these issues is that they only show up in edge cases, so it's easy to overlook them for a long time, and when you finally realise something is wrong, the failure may be triggered in a completely unrelated part of your application. This makes them almost impossible to debug.
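A rough sketch of the strings-versus-bytes distinction, again in Python as an assumption on my part, with a made-up sample word and the field name borrowed from the question. The bytes are decoded once with the wrong encoding and once with the right one, and only the correctly decoded string survives a trip through JSON intact.

```python
import json

original = "caf\u00e9"              # a string of code points ("cafe" with an accented e)
stored = original.encode("utf-8")   # a bytestream, as it might sit in a file or a column

# Decoding with the wrong encoding does not fail; it silently yields garbage.
wrong = stored.decode("latin-1")
# Decoding with the encoding the bytes were written in recovers the text.
right = stored.decode("utf-8")

print(repr(wrong), len(wrong))  # one character longer than the original
print(repr(right), len(right))

# JSON operates on strings, so it only round-trips correctly after a correct decode.
payload = json.dumps({"article_desc": right})
restored = json.loads(payload)["article_desc"]
```

Note that the wrong decode raises no error at all, which is exactly why these bugs stay hidden until an accented character finally turns up in the data.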

troelskn