64-bit Vista
Python 3.1

from urllib import request
a = request.urlopen('http://www.marketwatch.com/investing/currency/CUR_USDYEN').read(20500)
b = a[19000:20500]
idx_pricewrap = b.find('pricewrap')
context = b[idx_pricewrap:idx_pricewrap+80]
idx_bgLast = context.find('bgLast')
rate = context[idx_bgLast+8:idx_bgLast+15]
print(rate)

Traceback (most recent call last):
 File "c:\P31Working\test_urllib.py", line 4, in <module>
   idx_pricewrap = b.find('pricewrap')
TypeError: expected an object with the buffer interface
Process terminated with an exit code of 1

I have NO idea what that error means.

Please help.

A:

Python 3 is much stricter about the difference between bytes and (Unicode) strings. The result of urlopen(...).read(...) is an object of type bytes, and bytes.find doesn't accept a Unicode string (str) as the value to search for. In your case, you can simply replace "pricewrap" with a bytes literal:

idx_pricewrap = b.find(b'pricewrap')

The same applies to the other .find calls. Python 2 converted between byte strings and Unicode strings automatically where it made (more or less) sense, but Python 3 has introduced stricter rules that you need to be aware of.
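A minimal offline sketch of the difference. The byte string stands in for the downloaded page; its markup is made up for illustration and is not MarketWatch's real HTML. Searching bytes with a str raises TypeError (the message varies by Python version; 3.1 reports "expected an object with the buffer interface"). With bytes literals for every search the code works, and the extracted slice is still bytes, so it is decoded before display.

```python
# Hypothetical stand-in for urlopen(...).read(); the markup is invented.
page = b'<div class="pricewrap"><p class="data bgLast">82.4560</p></div>'

# Mixing bytes and str fails in Python 3.
try:
    page.find('pricewrap')
except TypeError as exc:
    print('TypeError:', exc)

# Searching with bytes literals works; every needle must be bytes.
idx_pricewrap = page.find(b'pricewrap')
context = page[idx_pricewrap:idx_pricewrap + 80]
idx_bgLast = context.find(b'bgLast')
# Skip past 'bgLast">' (8 bytes), as the question's +8 offset does.
rate = context[idx_bgLast + 8:idx_bgLast + 15]
print(rate)                  # b'82.4560' (still bytes)
print(rate.decode('ascii'))  # 82.4560 (converted to str for display)
```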

AndiDog
Thanks very much. Before I saw your answer I found a relevant example in the docs, which I think does what you suggest in a different way. I'll answer my own question to show this.
NotSuper
@NotSuper: Yes, decoding the page into a Unicode string is a good solution as well. Actually, it's the better solution, but for HTML pages you might want to use a parser library instead, one that can detect the charset automatically (from the HTTP header or the charset declaration inside the HTML) rather than assuming UTF-8.
AndiDog
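A minimal sketch of taking the charset from the HTTP Content-Type header instead of assuming UTF-8. It assumes the headers object behaves like email.message.Message; in Python 3 the object returned by a urlopen response's info() method is an http.client.HTTPMessage, which subclasses it, so get_content_charset() is available. decode_body is a hypothetical helper name, and it does not look at a charset declaration inside the HTML itself.

```python
from email.message import Message


def decode_body(raw, headers, default='utf-8'):
    """Decode raw page bytes using the charset from the Content-Type header.

    Hypothetical helper: `headers` is any email.message.Message, such as
    the object returned by urlopen(...).info(). Falls back to `default`
    when the header names no charset.
    """
    charset = headers.get_content_charset() or default
    # 'replace' substitutes U+FFFD for bytes that are invalid in charset.
    return raw.decode(charset, errors='replace')


# Offline usage example with a made-up header and body.
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
text = decode_body(b'Caf\xe9 pricewrap', headers)
print(text)                    # Café pricewrap
print(text.find('pricewrap'))  # 5: str.find accepts a str needle
```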
@AndiDog: I'd like to learn how to use a parser library. Could you point me to some examples? I assume I could do this with what comes with 3.1?
NotSuper
@NotSuper: There's a well-known library called [`BeautifulSoup`](http://www.crummy.com/software/BeautifulSoup/) that does just that. It has a version compatible with Python 3.1. According to its documentation, it automatically produces Unicode strings from HTML input, but I don't know how you can pass the "Content-Type" header to it in case the HTML itself doesn't declare the charset. I haven't used it myself, but there are a lot of questions about it on SO, so you can get help here.
AndiDog
@AndiDog: Thanks. I'll look into BeautifulSoup.
NotSuper
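On what Python 3 itself offers: the standard library includes html.parser.HTMLParser, which can handle a simple case like this without a third-party package. A minimal sketch, assuming (hypothetically) that the rate is the first text inside an element whose class contains bgLast, appearing after an element whose class contains pricewrap, mirroring the question's string search. The real page's markup may differ. Unlike BeautifulSoup, this does no charset detection, so it expects already-decoded text.

```python
from html.parser import HTMLParser


class RateParser(HTMLParser):
    """Grab the first text inside a 'bgLast' element after a 'pricewrap' one.

    Hypothetical class; the class names come from the question, the
    document structure is an assumption.
    """

    def __init__(self):
        super().__init__()
        self.seen_wrap = False  # a 'pricewrap' element has started
        self.in_last = False    # currently inside the 'bgLast' element
        self.rate = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get('class') or '').split()
        if 'pricewrap' in classes:
            self.seen_wrap = True
        elif self.seen_wrap and self.rate is None and 'bgLast' in classes:
            self.in_last = True

    def handle_endtag(self, tag):
        self.in_last = False

    def handle_data(self, data):
        if self.in_last and data.strip():
            self.rate = data.strip()
            self.in_last = False


parser = RateParser()
parser.feed('<div class="pricewrap"><p class="data bgLast">82.4560</p></div>')
parser.close()
print(parser.rate)  # 82.4560
```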
A: 

I finally found a relevant example in the docs:

http://docs.python.org/py3k/library/urllib.request.html?highlight=urllib#examples

The first example gave me some understanding and led me to revise my code to

http://tutoree7.pastebin.com/sUq8s4wh

which works like a charm.
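The pastebin code isn't reproduced here, so the following is not the author's revision; it is a sketch of the decode-first approach discussed in the comments: decode the bytes to str once, after which str.find accepts ordinary string literals. UTF-8 is an assumption, and extract_rate is a hypothetical helper that keeps the question's offsets.

```python
def extract_rate(raw, encoding='utf-8'):
    """Decode page bytes once, then search with ordinary str literals.

    Hypothetical helper; the offsets mirror the question's code and
    assume markup shaped like '...bgLast">82.4560<...'.
    """
    text = raw.decode(encoding)
    idx_pricewrap = text.find('pricewrap')
    if idx_pricewrap == -1:
        return None
    context = text[idx_pricewrap:idx_pricewrap + 80]
    idx_bgLast = context.find('bgLast')
    if idx_bgLast == -1:
        return None
    return context[idx_bgLast + 8:idx_bgLast + 15]


# With a live page this would be, e.g.:
#   from urllib import request
#   raw = request.urlopen(url).read()
sample = b'<div class="pricewrap"><p class="data bgLast">82.4560</p></div>'
print(extract_rate(sample))  # 82.4560
```

Checking for -1 avoids slicing from the wrong position when a marker is missing, which the question's version would do silently.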

NotSuper