ansaurus

Question

How to fix "Incorrect string value" errors?

Answer 1

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python 2:

>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
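
For anyone on Python 3, where `str` no longer has a `.decode()` method, the same check can be sketched as below. The helper name `is_valid_utf8` is made up for illustration; the only real requirement is that the data is a `bytes` object.

```python
# Python 3 sketch: the data must be a bytes literal (b"...") to be decoded.
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(raw))                         # the bytes from the question
print(is_valid_utf8("caf\u00e9".encode("utf-8")))  # genuine UTF-8 for "café"
```

Running a check like this on the data before it reaches the database shows whether the input is genuinely UTF-8 or is in some other encoding.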

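If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is about as permissive as encodings get: it is single-byte, and almost every byte value maps to a character. (Strictly, cp1252 leaves five byte values undefined, 0x81, 0x8D, 0x8F, 0x90 and 0x9D, though some implementations accept those too.)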

Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
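
A rough way to see how permissive an encoding is, sketched with Python 3's built-in codecs. This is an assumption about the environment: a database's own implementation of the character set may treat the unmapped bytes differently. The bytes from the question decode under cp1252 without complaint, a handful of byte values are rejected by Python's cp1252 codec, and latin-1 accepts all 256.

```python
# The bytes from the question decode under cp1252 without raising.
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)

# Collect the byte values that Python's cp1252 codec refuses to decode.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(hex(value))
print(undefined)

# latin-1 (ISO-8859-1) maps every one of the 256 byte values.
latin1_count = len(bytes(range(256)).decode("latin-1"))
print(latin1_count)
```

Decoding without an error only means the bytes were accepted, not that the resulting characters are the ones the author of the data intended.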

RichieHindle 2009-07-22 20:39:09

What exactly do you mean by, "Of course it's not going to understand genuine UTF-8 any more?"

Brian 2009-07-22 21:28:38

@Brian: If you tell it you're giving it cp1252, and you actually give it the UTF-8 for, say, `café`, it's going to misinterpret that as `cafÃ©`. It won't crash, but it will misunderstand the high-bit characters.
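
The misreading is easy to reproduce in Python 3. This is only a sketch of the effect using Python's codecs, not of what any particular database does internally:

```python
# UTF-8 bytes for "café" (the e-acute is written as an escape so the
# example does not depend on how this page itself is encoded).
utf8_bytes = "caf\u00e9".encode("utf-8")

# Mislabel those bytes as cp1252: each of the two bytes of the
# e-acute becomes its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)

# As long as nothing was lost along the way, the mistake is reversible.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)
```

The round trip at the end works because cp1252 kept both bytes intact; it merely attached the wrong characters to them.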

RichieHindle 2009-07-22 21:36:41

@Richie: The database can happily call the data whatever it wants, but if the PHP code that grabs it is stuffing it into a string, that won't make much difference...will it? I don't see exactly where the lack of understanding of UTF-8 is having an impact.

Brian 2009-07-22 21:38:27

@Brian: No, you're right. The time it would make a difference would be within the database, for instance if you used an ORDER BY clause in your SQL - the sorting would be wonky where you had non-ASCII characters.
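
A rough illustration of the sorting problem, in Python 3. The `fold` function is a made-up, crude stand-in for an accent- and case-insensitive collation; real database collations are more elaborate, so treat this as a sketch of the effect only.

```python
import unicodedata


def fold(text: str) -> str:
    """Crude stand-in for an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# "\u00e9clair" is "éclair", written with an escape for portability.
words = ["banana", "dog", "\u00e9clair", "zebra"]

# What the database would see if the UTF-8 bytes were labelled cp1252.
misread = [w.encode("utf-8").decode("cp1252") for w in words]

sorted_correct = sorted(words, key=fold)
sorted_misread = sorted(misread, key=fold)
print(sorted_correct)   # the accented word sorts among the e's
print(sorted_misread)   # the misread word sorts among the a's instead
```

With the correct encoding the accented word lands between "dog" and "zebra"; once misread, its first byte turns into an accented "A" and it jumps to the front of the list.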

RichieHindle 2009-07-22 21:45:27
