Hi

What is the best way to load a JSON string in Python?

Here is my code, which fails when loading JSON strings:

import json

json.loads(str_to_load)

I also tried supplying the 'encoding' parameter with the value 'utf-16', but that didn't work either.

Can you please help me solve this problem?

Thanks

A:

The OP clarifies (in a comment!)...:

Source data is a huge unicode-encoded string

Then you have to know which of the many encodings it uses: clearly not 'utf-16', since that failed, but there are many others, such as 'utf-8', 'iso-8859-15', and so forth. You can either try them all until one works, or print repr(str_to_load[:80]) and paste what it shows as an edit to your question, so we can guess on your behalf!-).

Alex Martelli
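A minimal Python 3 sketch of the "try them all until one works" approach, assuming the data arrives as raw bytes. The function name `load_json_bytes` and the candidate list are hypothetical choices for illustration. Keep in mind that a clean decode only proves the bytes are valid in that encoding, not that the guess is right, and that `latin-1` accepts every possible byte sequence, so it has to come last.

```python
import json

# Hypothetical candidate list. Order matters: 'latin-1' maps every byte to a
# character and never fails, so it only works as a last-resort catch-all.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16", "utf-32", "cp1252", "latin-1")


def load_json_bytes(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (encoding, parsed_object) for the first encoding that yields valid JSON."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        try:
            return enc, json.loads(text)
        except ValueError:
            continue  # decoded without error, but the result is not JSON
    raise ValueError("none of the candidate encodings produced valid JSON")


enc, obj = load_json_bytes(b'{"desc": "3 \xbd Bathroom"}')
print(enc, obj)
```

For that sample, UTF-8 is rejected (0xBD cannot start a UTF-8 sequence), the UTF-16 and UTF-32 attempts do not produce valid JSON, and the loop settles on `cp1252`.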
It is difficult to identify a particular encoding during load, because the source data may contain characters from many of the world's languages. Is there any way to detect the encoding type?
Software Enthusiastic
str_to_load keeps changing: utf-8 worked for some inputs, utf-32 worked for others... but how do I auto-detect it?
Software Enthusiastic
That string is '{"successful":true, "data":[76,{"posting_id":"1753178","site_tender_id":"3188446'
Software Enthusiastic
To try to guess the encoding of a byte string, try http://chardet.feedparser.org/ . The string you show is ASCII (which is also valid utf-8 by definition, and also valid iso-8859-1, etc.: ASCII is the common subset of most encodings!), so it's impossible to guess which non-ASCII encoding it might be in. UnicodeDecodeError messages carry the exact index of the first problematic byte, so when you do get an error, show the repr of an 80-byte slice centered on that index.
Alex Martelli
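A sketch of the "show the bytes around the error" advice, using the `start` attribute of Python's `UnicodeDecodeError`, which holds the index of the first byte the codec rejected. The helper name `show_bad_bytes` and the 80-byte default window are illustrative choices, not anything from the thread.

```python
def show_bad_bytes(raw, encoding="utf-8", width=80):
    """Return None if raw decodes cleanly; otherwise return
    (index of first bad byte, repr of up to `width` bytes centred on it)."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        lo = max(err.start - width // 2, 0)  # clamp at the start of the data
        return err.start, repr(raw[lo:lo + width])
    return None


# Pure ASCII decodes to the same text under many encodings, so it carries
# no evidence about which non-ASCII encoding the rest of the data uses.
ascii_sample = b'{"successful":true, "data":[76,'
assert len({ascii_sample.decode(e) for e in ("ascii", "utf-8", "latin-1", "cp1252")}) == 1

print(show_bad_bytes(b"4 Bedroom, 3 \xbd Bathroom"))
```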
When I read the entire string, I found non-ASCII characters, for example: "Lucaya, Grand Bahama; 4 Bedroom, 3 \xbd Bathroom"
Software Enthusiastic
Thanks for your reply...
Software Enthusiastic
The encoding of the string depends on where you got it from. That string is probably either ISO-8859-1 or Windows code page 1252. If your string comes from a form submission on a web page, it will be in the same encoding as that web page. You really want to be using UTF-8 if you have any say in the matter. You can also avoid all charset problems by having the JSON encoder write non-ASCII characters using the JavaScript `\u` escape; Python's `json.dump` does this by default, but JavaScript's `JSON.stringify` does not.
bobince
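A short illustration of the `\u` escape point with Python's standard `json` module: `json.dumps` defaults to `ensure_ascii=True`, which escapes every non-ASCII character, so the output is plain ASCII and reads the same under any ASCII-compatible encoding. The sample record is made up.

```python
import json

data = {"desc": "4 Bedroom, 3 \u00bd Bathroom"}  # hypothetical sample record

# Default ensure_ascii=True: the one-half sign becomes the six ASCII characters \u00bd.
escaped = json.dumps(data)
print(escaped)  # {"desc": "4 Bedroom, 3 \u00bd Bathroom"}

# ensure_ascii=False keeps the real character, so the text must then be
# encoded with an agreed encoding (UTF-8 here) before it leaves the program.
unescaped = json.dumps(data, ensure_ascii=False)
payload = unescaped.encode("utf-8")

# Both forms parse back to the same Python object.
assert json.loads(escaped) == json.loads(payload) == data
```

The escaped form trades a few extra bytes for never having to agree on a charset; the unescaped form is shorter but only safe when both ends agree on UTF-8.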
`\xbd` in ISO-8859-x for several values of x (and in other encodings such as CP-1252) is the single character representing the fraction `1/2`, so your encoding is likely to be among this group.
Alex Martelli
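A quick check of the 0xBD observation with Python 3's built-in codecs. The ISO-8859 parts listed are a sample, not an exhaustive list, and the contrast with ISO-8859-2 shows that "ISO-8859-x" does not cover every x.

```python
# Encodings in which byte 0xBD decodes to U+00BD VULGAR FRACTION ONE HALF.
HALF_AT_BD = ("iso-8859-1", "iso-8859-7", "iso-8859-9", "cp1252")

for enc in HALF_AT_BD:
    assert b"\xbd".decode(enc) == "\u00bd", enc

# Not every ISO-8859 part agrees: in ISO-8859-2 the same byte is U+02DD,
# the double acute accent.
print(repr(b"\xbd".decode("iso-8859-2")))
```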
A: 

It got resolved. I decoded the byte string into a unicode string using 'latin-1'. Here is the code:

import json

# Python 2: decode the raw byte string as Latin-1, then parse the unicode text
ustr_to_load = unicode(str_to_load, 'latin-1')

json.loads(ustr_to_load)

And it worked.... :)

Software Enthusiastic
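The fix above is Python 2 code; `unicode()` does not exist in Python 3. A hedged Python 3 sketch of the same idea follows, with invented sample bytes standing in for `str_to_load`. Since Python 3.6, `json.loads` also accepts `bytes`, but it only auto-detects UTF-8, UTF-16 and UTF-32, so Latin-1 data still has to be decoded explicitly first.

```python
import json

# Invented stand-in for the question's str_to_load: Latin-1 encoded bytes.
str_to_load = b'{"successful": true, "desc": "4 Bedroom, 3 \xbd Bathroom"}'

try:
    json.loads(str_to_load)  # bytes input: detected as UTF-8 here, which fails
except UnicodeDecodeError as err:
    print("not valid UTF-8; first bad byte at index", err.start)

# Python 3 equivalent of unicode(str_to_load, 'latin-1'): decode, then parse.
ustr_to_load = str_to_load.decode("latin-1")
data = json.loads(ustr_to_load)
print(data["desc"])
```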
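BTW, `latin-1` is another name for `iso-8859-1`, and you may also run into `iso-8859-15` (Latin-9), which replaces eight of its characters, most notably putting the Euro sign at 0xA4. If you decode with `-1` and the string was encoded with `-15`, it will mostly be OK, but those eight characters (€, Š, š, Ž, ž, Œ, œ, Ÿ) will show up as the wrong symbols. The reverse matters here too: 0xBD is ½ in `-1` but œ in `-15`, so for the "3 \xbd Bathroom" data, `iso-8859-1` (or `cp1252`) is the better fit.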
Alex Martelli
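A small Python 3 check of how far Latin-1 and Latin-9 actually diverge: it decodes every possible byte with both codecs and keeps the positions where they disagree, then shows what happens to the thread's sample line under the wrong choice.

```python
# Byte values that ISO-8859-1 (Latin-1) and ISO-8859-15 (Latin-9) decode differently.
diffs = {}
for b in range(256):
    as_latin1 = bytes([b]).decode("iso-8859-1")
    as_latin9 = bytes([b]).decode("iso-8859-15")
    if as_latin1 != as_latin9:
        diffs[hex(b)] = (as_latin1, as_latin9)

print(len(diffs))     # 8
print(diffs["0xa4"])  # ('¤', '€')
print(diffs["0xbd"])  # ('½', 'œ')

# Decoding the sample line as Latin-9 silently changes its meaning.
sample = b"4 Bedroom, 3 \xbd Bathroom"
print(sample.decode("iso-8859-15"))  # 4 Bedroom, 3 œ Bathroom
```

Neither decode raises an error, which is why a wrong guess between these two encodings is easy to miss.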
Thanks Alex. I changed it to 'iso-8859-15'....
Software Enthusiastic