views:

100

answers:

1

I'm getting really tired of trying to figure out why this code works in Python 2 and not in Python 3. I'm just trying to grab a page of JSON and then parse it. Here's the code in Python 2:

import urllib, json
response = urllib.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content)

I thought the equivalent code in Python 3 would be this:

import urllib.request, json
response = urllib.request.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content)

But it blows up in my face, because the data returned by read() is of type "bytes". However, I cannot for the life of me get it converted to something that json will be able to parse. I know from the headers that reddit is trying to send UTF-8 back to me, but I can't seem to decode the bytes from UTF-8:

import urllib.request, json
response = urllib.request.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content.decode("utf8"))

What am I doing wrong?

Edit: the problem is that I cannot get the data into a usable state; even though json loads the data, part of it is undisplayable, and I want to be able to print the data to the screen.

Second edit: The problem seems to have more to do with print than with parsing. Alex's answer provides a way for the script to work in Python 3, by setting the IO encoding to utf8. But a question still remains: why did the code work in Python 2 but not in Python 3?

+2  A: 

The code you posted presumably suffered from a cut-and-paste mistake, because it's clearly wrong in both versions (f.read() fails because no bare name f is defined).

In Py3, ur = response.decode('utf8') works perfectly well for me (with response holding the bytes returned by read()), as does the following json.loads(ur). Maybe the copy-and-paste errors affected your 2-to-3 conversion attempts.

Alex Martelli
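A minimal sketch of the decode-then-parse step described in this answer, using a hard-coded byte string in place of the network response so that it runs offline. The helper name parse_json_bytes and the sample payload are hypothetical, chosen only for illustration.

```python
import json


def parse_json_bytes(raw, encoding="utf-8"):
    """Decode raw bytes to str, then parse the text as JSON.

    parse_json_bytes is a hypothetical helper, not part of any library.
    """
    text = raw.decode(encoding)  # bytes -> str
    return json.loads(text)      # str -> Python objects


# Stand-in for the bytes that response.read() would return (made-up payload).
raw = '{"title": "snowman \u2603"}'.encode("utf-8")

data = parse_json_bytes(raw)
# ascii() escapes non-ASCII characters, so this print works on any console.
print(ascii(data))
```

If this runs cleanly, decoding and parsing are not the issue, and any remaining error most likely comes from displaying the text.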
Whoops, I will fix the code mistakes... I tried reformatting it for display but screwed it all up in the process. :P Regardless, I can't view the data after I parse it (using a simple "print(data)") because it gives me charmap errors.
Daniel Lew
@Daniel, the problems _after_ you've gotten the data seem to be a separate question from this one about getting the data (which my answer, it appears, responded to -- though seemingly you don't agree, since you didn't even upvote it!). If by `data` you mean the `json.loads(response)`, I can `print` it without any problem (on my Mac Terminal.app, which supports UTF-8). What's your sys.stdout.encoding? Have you properly set the environment variable `PYTHONIOENCODING: Encoding[:errors] used for stdin/stdout/stderr` before starting Python 3? Etc, etc -- totally different issues, see.
Alex Martelli
Sorry if I was unclear at first. The core problem is I can't *use* the data after parsing, for whatever reason (the print is just the beginning of it; if I can't print it, then somewhere down the line I'm going to run into trouble reading the data). I'll check the encoding; suffice it to say, it doesn't work on my W7 machine.
Daniel Lew
Alex Martelli
If it were just the output capability of the Windows terminal, then why does the code work in Python 2?
Daniel Lew
@Daniel, perhaps by a different setting of sys.stdout.encoding (e.g. via `PYTHONIOENCODING`, etc) -- I've already asked about that and I've heard nothing from you in response in this interminable thread of comments you insist on perpetuating. Why not just `print(repr(data))` in both cases and check if anything is different? If not, then you **know** it's all about output/terminal issues, as I suspect it may well be -- if specific differences, then of course let us know (editing your Q please, **not** in yet another cramped comment!-).
Alex Martelli
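One caveat about the print(repr(data)) check suggested above: the two versions do not treat repr() the same way, which may itself account for part of the discrepancy. In Python 2, repr() of a unicode string escapes non-ASCII characters, whereas in Python 3 repr() keeps printable non-ASCII characters and ascii() does the escaping instead. A small sketch, assuming a made-up value for data and a CP437 console:

```python
import sys

# Made-up sample standing in for the parsed JSON.
data = {"title": "snowman \u2603"}

# The encoding Python uses when writing text to the console.
print(sys.stdout.encoding)

# ascii() returns an ASCII-only representation, so it prints on any console.
safe = ascii(data)
print(safe)

# repr() in Python 3 keeps the snowman as-is; encoding it to CP437 fails,
# which is the kind of "charmap" error a CP437 console raises on print.
try:
    repr(data).encode("cp437")
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False
print(cp437_ok)
```

If ascii(data) shows the same content under both versions, the parsed data is identical and only the console output differs.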
I can't test the code at the moment anyways because reddit itself is down; once I can I'll edit the question with details. I do know that the sys.stdout.encoding is the same between my 2.6 and 3.1 instances (cp437, which I could try setting to something else).
Daniel Lew
@Daniel, CP437 (like most CPs) just won't let you show every Unicode character (it covers only a tiny subset, in fact). Type into the Windows console "chcp 65001" (this sets the code page to UTF-8) and change the terminal font to a Unicode font: Right click title bar, Properties, Font, Lucida Console; then `SET PYTHONIOENCODING=utf8`.
Alex Martelli
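The effect of the PYTHONIOENCODING setting mentioned above can be imitated in-process. The sketch below wraps an in-memory buffer in io.TextIOWrapper instead of touching the real console, so the buffer, the write_text helper and the sample string are all assumptions made for illustration. The encoding and errors arguments play the role of the two parts of Encoding[:errors].

```python
import io


def write_text(text, encoding, errors="strict"):
    """Write text through a text stream and return the bytes produced.

    write_text is a hypothetical helper for this illustration.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


sample = "snowman \u2603"

# UTF-8 can encode every Unicode character.
utf8_bytes = write_text(sample, "utf-8")

# CP437 with the default strict handler cannot encode the snowman.
try:
    write_text(sample, "cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lenient error handler substitutes an escape instead of raising.
escaped_bytes = write_text(sample, "cp437", errors="backslashreplace")

print(utf8_bytes, strict_failed, escaped_bytes)
```

On the Windows command line, the last case would correspond to `SET PYTHONIOENCODING=cp437:backslashreplace` before starting Python.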
The PYTHONIOENCODING solved the problem, but I still want to know why it worked in P2 but not P3.
Daniel Lew
Good luck finding the answer to your philosophical question as the 10th or later answer of this absurd comment thread. What I know for sure is never to even look at a Q of yours again, after this ridiculous series of events: I spot your coding mistakes, correctly spot that your claim "I cannot get the data into a usable state" is instead all about mis-set IO, show you how to set it correctly (all in comments, **incredibly** inconvenient), and _still_ no accept because you're apparently too stubborn to admit this really needs a new Q. What an utter, total waste of my time.
Alex Martelli