I'm parsing a JSON feed in Python and it contains this character, causing it not to validate.

Is there a way to handle these symbols? Can they be converted, or is there a tidy way to remove them?

I don't even know what this symbol is called or what causes them, otherwise I would research it myself.

EDIT: Stack Overflow is stripping the character, so here it is: http://files.getdropbox.com/u/194177/symbol.jpg

It's that [?] symbol in "Classic 80s"

+1  A: 

That probably means the text you have is in some particular encoding. You need to figure out which one and convert the text to Unicode with a thetext.decode('encoding') call.

I'm not sure, but it could actually be the [?] character, meaning that whatever is displaying the text doesn't know how to render it either. That would probably mean the data you have is incorrect, and that it contains a byte sequence that isn't valid in the encoding you are supposed to use. To handle that, call decode like this: thetext.decode('encoding', 'ignore'). There are other error handlers besides "ignore", such as "replace" ("xmlcharrefreplace" exists too, but it applies only when encoding, not when decoding).
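As a rough illustration of the difference between the error handlers, here is a minimal sketch. It assumes Python 3 (where decode is called on a bytes object) and uses a made-up sample: a curly apostrophe encoded as Windows-1252, which is only a guess at what the feed might contain.

```python
# Hypothetical sample: "Classic ’80s" encoded as Windows-1252.
# The apostrophe becomes the single byte 0x92, which is not valid UTF-8.
raw = "Classic \u201980s".encode("cp1252")

# Strict decoding (the default) refuses the invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "ignore" silently drops the bytes that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# "replace" substitutes U+FFFD (the replacement character) for them.
replaced = raw.decode("utf-8", "replace")

# Decoding with the encoding the data was really written in loses nothing.
recovered = raw.decode("cp1252")

print(strict_ok, ignored, replaced, recovered, sep="\n")
```

Both "ignore" and "replace" lose information, so they are best kept as a fallback for when the real encoding cannot be determined.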

Lennart Regebro
A: 

JSON must be encoded in one of UTF-8, UTF-16, or UTF-32. If a JSON file contains bytes which are illegal in its current encoding, it is garbage.

If you don't know which encoding it's using, you can try parsing using my jsonlib library, which includes an encoding-detector. JSON parsed using jsonlib will be provided to the programmer as Unicode strings, so you don't have to worry about encoding at all.
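If installing another library isn't an option, the standard library can cover part of this. The sketch below assumes Python 3.6 or later, where json.loads accepts bytes and detects UTF-8, UTF-16, or UTF-32 on its own. It is not jsonlib's API, and the helper name parse_feed and the sample payloads are made up for illustration.

```python
import json


def parse_feed(raw: bytes):
    """Parse JSON bytes; return None if they are not valid UTF-8/16/32 JSON.

    Hypothetical helper, not part of any library.
    """
    try:
        # On Python 3.6+, json.loads detects UTF-8, UTF-16 and UTF-32 itself.
        return json.loads(raw)
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueErrors.
        return None


# Made-up payloads: the same document in three encodings.
text = '{"title": "Classic \u201980s"}'
from_utf8 = parse_feed(text.encode("utf-8"))
from_utf16 = parse_feed(text.encode("utf-16"))
from_cp1252 = parse_feed(text.encode("cp1252"))  # not a legal JSON encoding

print(from_utf8, from_utf16, from_cp1252)
```

A None result here suggests the feed is in some other encoding (or is simply corrupt), in which case it has to be decoded explicitly before parsing.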

John Millikin