
I have a script that I'd like to continue using, but it looks like I either have to find a workaround for a bug in Python 3, or downgrade back to 2.6 and thus downgrade other scripts as well...

Hopefully someone here has already managed to find a workaround.

The problem is that, with the changes in Python 3.0 regarding bytes and strings, apparently not all of the library code has been tested.

I have a script that downloads a page from a web server. In Python 2.6 this script passed a username and password as part of the URL, but in Python 3.0 this no longer works.

For instance, this:

import urllib.request
url = "http://username:password@server/file"
urllib.request.urlretrieve(url, "temp.dat")

fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    urllib.request.urlretrieve(url, "test.html");
  File "C:\Python30\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python30\lib\urllib\request.py", line 1476, in retrieve
    fp = self.open(url, data)
  File "C:\Python30\lib\urllib\request.py", line 1444, in open
    return getattr(self, name)(url)
  File "C:\Python30\lib\urllib\request.py", line 1618, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python30\lib\urllib\request.py", line 1576, in _open_generic_http
    auth = base64.b64encode(user_passwd).strip()
  File "C:\Python30\lib\base64.py", line 56, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str

Apparently, base64 encoding now takes bytes as input (and returns bytes), so urlretrieve (or some code inside it), which builds a username:password string and tries to base64-encode it for Basic authorization, fails.
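To illustrate the str/bytes mismatch behind this traceback, here is a minimal sketch using plain standard-library calls (not the code inside urllib itself) of building a Basic auth token in Python 3: the credentials must be encoded to bytes before `base64.b64encode`, and its bytes result decoded back to a str for use in a header. Encoding the credentials as UTF-8 is an assumption; some servers expect Latin-1 for non-ASCII usernames or passwords.

```python
import base64

credentials = "username:password"  # a str, like the one urllib builds internally

# Passing a str reproduces the TypeError from the traceback above
try:
    base64.b64encode(credentials)
except TypeError as exc:
    print("TypeError:", exc)

# Encode str -> bytes before base64, then decode the bytes result -> str
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
print("Authorization: Basic " + token)  # prints Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```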

If I instead try to use urlopen, like this:

import urllib.request
url = "http://username:password@server/file"
f = urllib.request.urlopen(url)
contents = f.read()

Then it fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    f = urllib.request.urlopen(url);
  File "C:\Python30\lib\urllib\request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python30\lib\urllib\request.py", line 359, in open
    response = self._open(req, data)
  File "C:\Python30\lib\urllib\request.py", line 377, in _open
    '_open', req)
  File "C:\Python30\lib\urllib\request.py", line 337, in _call_chain
    result = func(*args)
  File "C:\Python30\lib\urllib\request.py", line 1082, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python30\lib\urllib\request.py", line 1051, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "C:\Python30\lib\http\client.py", line 620, in __init__
    self._set_hostport(host, port)
  File "C:\Python30\lib\http\client.py", line 632, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'password@server'

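Apparently the URL parsing in this "next gen URL retrieval library" doesn't know what to do with a username and password in the URL: the whole "username:password@server" is passed on as the host, and everything after the colon is treated as a port number.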
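One possible workaround, sketched here under the assumption that the server accepts HTTP Basic authentication: pull the credentials out of the URL with `urllib.parse.urlsplit` (whose `username`, `password`, `hostname` and `port` attributes do this parsing), rebuild the URL without them, and send them yourself in an `Authorization` header. The helper names `split_credentials` and `basic_auth_request` are hypothetical, chosen for this example.

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, user, password) for a URL such as
    http://user:pass@host/path. user and password are None if absent."""
    parts = urlsplit(url)
    # Credentials may be percent-encoded in the URL, so decode them
    user = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    # Rebuild the netloc from host and port only. hostname is lowercased and
    # drops the brackets around IPv6 literals, so this sketch ignores IPv6.
    host = parts.hostname or ""
    if parts.port is not None:
        host += ":%d" % parts.port
    return urlunsplit(parts._replace(netloc=host)), user, password


def basic_auth_request(url):
    """Build a Request for url, moving any embedded credentials into a
    preemptive HTTP Basic Authorization header."""
    clean_url, user, password = split_credentials(url)
    request = urllib.request.Request(clean_url)
    if user is not None:
        raw = (user + ":" + (password or "")).encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request
```

Usage would be along the lines of `urllib.request.urlopen(basic_auth_request(url))`, writing `response.read()` to a file opened in `"wb"` mode. Sending the header up front avoids the 401 round trip that a handler-based approach needs, but it also means the password goes out with every request, in clear text over plain http.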

What other choices do I have?

+1  A: 

One of the biggest changes in Python 3.0 has been string handling. Given what your exception reports, I would first try using a byte string:

import urllib.request
url = b"http://username:password@server/file"
urllib.request.urlretrieve(url, "temp.dat")

However, in this case string conversion is not the cause of the issue; please see the reply from bishanty for a good solution.
In fact, I think embedding the username and password in the URL was not a documented method even in previous versions (see, for instance, the introduction to basic authentication by fuzzyman).

Roberto Liffredo
Did you test this? This fails with "unknown url type b'http'", the rest of urllib.request doesn't appear to be ready to handle byte-strings.
Lasse V. Karlsen
Actually, it was intended more as a generic suggestion about the general Py3k unicode/byte-string "issue". I have now changed it; I hope it is clearer.
Roberto Liffredo
+11  A: 

Direct from the Py3k docs: http://docs.python.org/dev/py3k/library/urllib.request.html#examples

import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
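A variation on the docs example, as a sketch rather than official guidance: `HTTPPasswordMgrWithDefaultRealm` with `realm=None` lets you supply credentials without knowing the server's realm string, and using the opener directly (instead of `install_opener`) keeps the change local. The function names `make_basic_auth_opener` and `download` are hypothetical, and `http://server/` stands in for the real server. Note that `HTTPBasicAuthHandler` only sends the credentials after the server answers with a 401 and a `WWW-Authenticate: Basic` challenge.

```python
import shutil
import urllib.request


def make_basic_auth_opener(base_url, user, password):
    """Build an opener that answers Basic auth challenges for URLs under base_url."""
    # With HTTPPasswordMgrWithDefaultRealm, realm=None means "use these
    # credentials whatever realm the server names", so the realm string
    # (like 'PDQ Application' in the docs example) need not be known.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(auth_handler)


def download(opener, url, filename):
    """Save the body at url to filename, roughly what urlretrieve does."""
    with opener.open(url) as response, open(filename, "wb") as out:
        # Binary mode: copy the raw bytes unchanged
        shutil.copyfileobj(response, out)
```

With the question's placeholders, that would be `download(make_basic_auth_opener("http://server/", "username", "password"), "http://server/file", "temp.dat")`.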
jb
Did you mean to post that password? If not, I suggest deleting the answer and posting a new one with dummy data. Thanks for the answer though, this looks promising.
Lasse V. Karlsen
Direct from the Python docs :P
jb
Klem is probably pretty pissed if that's his real password though :)
jb
+1: Direct from the docs.
S.Lott
A: 

My advice would be to maintain your 2.* branch as your production branch until you can get the 3.0 stuff sorted.

I am going to wait a while before moving over to Python 3.0. There seem to be a lot of people in a rush, but I just want everything sorted out and a decent selection of third-party libraries available. This may take a year, it may take 18 months, but the pressure to "upgrade" is really low for me.

Ali A
A: 

Have any of you made this method work?

I had no problem retrieving zip files from a free-access website, but had problems retrieving zip files from password-protected sites: the retrieved files are corrupted.
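One way to check what actually arrived, offered as a guess rather than a diagnosis: if the credentials were not accepted, the server may have returned an HTML login or error page that was then saved under the .zip name. The standard library's `zipfile.is_zipfile` can tell the two apart; `describe_download` is a hypothetical helper name.

```python
import zipfile


def describe_download(path, preview_bytes=200):
    """Report whether path is a readable zip archive, or show how it begins."""
    # is_zipfile looks for a valid ZIP end-of-central-directory record
    if zipfile.is_zipfile(path):
        return "valid zip"
    # Not a zip: show the first bytes, which often reveal an HTML page
    with open(path, "rb") as f:
        head = f.read(preview_bytes)
    return "not a zip; starts with %r" % (head,)
```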