urllib2

Fetch a Wikipedia article with Python

I am trying to fetch a Wikipedia article with Python's urllib: f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes") s = f.read() f.close() However, instead of the HTML page I get the following response: Error - Wikimedia Foundation: Request: GET http://en.wikipedia.org/w/index.php?tit...

Caching in urllib2?

Is there an easy way to cache things when using urllib2 that I am overlooking, or do I have to roll my own? ...

Errors with Python's mechanize module

Hello, I'm using the mechanize module to execute some web queries from Python. I want my program to be error-resilient and handle all kinds of errors (wrong URLs, 403/404 responses) gracefully. However, I can't find in mechanize's documentation which exceptions it raises for the various error conditions. I just call it with: self.browser...

urllib2 file name

If I open a file using urllib2, like so: remotefile = urllib2.urlopen('http://example.com/somefile.zip') Is there an easy way to get the file name other than parsing the original URL? EDIT: changed openfile to urlopen... not sure how that happened. EDIT2: I ended up using: filename = url.split('/')[-1].split('#')[0].split('?')[0] ...
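
One possible way to get the name without chained split() calls is to let the standard library parse the URL first. This is only a sketch: it uses the Python 3 module names (urllib2 was folded into urllib.request and urllib.parse there), and filename_from_url is a hypothetical helper name, not part of any library. A server may also suggest a different name in a Content-Disposition header, which this sketch does not look at.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of a URL (hypothetical helper).

    urlsplit() separates the query string and fragment from the path,
    so only the path itself is inspected.
    """
    path = urlsplit(url).path
    # URL paths always use forward slashes, hence posixpath rather than os.path.
    return unquote(posixpath.basename(path))
```

For the URL in the question, filename_from_url('http://example.com/somefile.zip') gives 'somefile.zip'; a URL whose path ends in a slash gives an empty string, so a caller would need a fallback name for that case.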

Apache sockets not closing?

I have a web application written using CherryPy, which is run locally on 127.0.0.1:4321. We use mod-rewrite and mod-proxy to have Apache act as a reverse proxy; Apache also handles our SSL encryption and may eventually be used to transfer all of our static content. This all works just fine for small workloads. However, I recently used...

urllib.urlopen works but urllib2.urlopen doesn't

I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a differe...

Python: urllib/urllib2/httplib confusion

I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some trouble. Here's what I need to do: do a POST with a few parameters and headers; follow a redirect; retrieve the HTML body. Now, I'm relatively new to Python, but the two things I've tested so far haven't worked. First I use...

urllib2 data sending

I've recently written this with help from SO. Now could someone please tell me how to make it actually log onto the board. It brings up everything just in a non logged in format. import urllib2, re import urllib, re logindata = urllib.urlencode({'username': 'x', 'password': 'y'}) page = urllib2.urlopen("http://www.woarl.com/board/index...

Tracking redirects and cookies with Python

Hi, I would like to be able to follow and track redirects and the cookies that are set by the different webpages with Python (a bit like the tamper plugin for Firefox). So if website1 redirects to website2 which then redirects to website3, I would like to follow that and also see what cookies each website sets. I have been looking ...

How do I send a custom header with urllib2 in a HTTP Request?

I want to send a custom "Accept" header in my request when using urllib2.urlopen(..). How do I do that? ...
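
A minimal sketch of one way to do this, assuming Python 3, where urllib2.Request and urllib2.urlopen live in urllib.request: headers are attached to a Request object, and that object is passed to urlopen in place of the bare URL. The function name build_request and the example URL and media type are illustrative assumptions only.

```python
from urllib.request import Request


def build_request(url, accept):
    """Build a Request carrying a custom Accept header (hypothetical helper)."""
    # The headers dict is sent along with the request when it is opened.
    return Request(url, headers={"Accept": accept})


req = build_request("http://example.com/", "application/json")
# Passing req to urllib.request.urlopen(req) would send the header;
# the network call is left out here so the sketch runs offline.
```

The same effect can be had after construction with req.add_header("Accept", ...), which is handy when the header depends on something decided later.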

Accepting File Argument in Python (from Send To context menu)

I'm going to start off by noting that I have next to no Python experience. As you may know, by simply dropping a shortcut in the Send To folder on your Windows PC, you can allow a program to take a file as an argument. How would I write a Python program that takes this file as an argument? And, as a bonus if anyone gets a chance -- ...

Python urllib2 file upload problems

Hello everyone, I'm currently trying to initiate a file upload with urllib2 and the urllib2_file library. Here's my code: import sys import urllib2_file import urllib2 URL='http://aquate.us/upload.php' d = [('uploaded', open(sys.argv[1:]))] req = urllib2.Request(URL, d) u = urllib2.urlopen(req) print u.read() I've placed this .py fi...

Cookie Problem in Python

I'm working on a simple HTML scraper for Hulu in python 2.6 and am having problems with logging on to my account. Here's my code so far: import urllib import urllib2 from cookielib import CookieJar #make a cookie and redirect handlers cookies = CookieJar() cookie_handler= urllib2.HTTPCookieProcessor(cookies) redirect_handler= urllib2....

Throttling with urllib2

Hello all, is it possible to easily cap the kbps when using urllib2? If it is, any code examples or resources you could direct me to would be greatly appreciated. ...
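
As far as I know there is no built-in rate limit option, so one approach is to read the response in small chunks and sleep whenever the transfer gets ahead of the allowed rate. The sketch below is an assumption about how that could look, not an established recipe: throttled_copy is a hypothetical name, the limit is expressed in bytes per second rather than kbps, and src can be any object with a read() method, such as the response returned by urlopen.

```python
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=8192):
    """Copy src to dst, sleeping so the average rate stays under the cap."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # Time the transfer *should* have taken so far at the capped rate.
        expected = total / max_bytes_per_sec
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return total
```

This caps the average rate over the whole transfer; individual chunks still arrive as fast as the network delivers them, so a smaller chunk_size gives a smoother curve at the cost of more loop iterations.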

Missing first line when downloading .rar file using urllib2.urlopen()

Okay, this is really strange. I have this script which basically downloads a bunch of archive files and extracts them. Usually those files are .zip files. Today I sat down and decided to make it work with rar files and I got stuck. At first I thought that the problem was in my unrar code, but it wasn't there. So I did: f = urllib2.urlopen(f...

How do I prevent Python's urllib(2) from following a redirect

I am currently trying to log into a site using Python; however, the site seems to be sending a cookie and a redirect statement on the same page. Python seems to be following that redirect, thus preventing me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect? ...
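
One way this might be done is to replace the redirect handler with a subclass that declines every redirect. The sketch assumes Python 3 (urllib.request in place of urllib2, where the handler class has the same name); NoRedirectHandler is a hypothetical class name.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that never follows a redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not build a follow-up request".
        return None


# build_opener() swaps the default redirect handler for the subclass.
opener = urllib.request.build_opener(NoRedirectHandler)
```

With this opener, a 3xx response is not followed; opener.open(url) raises urllib.error.HTTPError instead, and the exception's code and headers attributes (including any Set-Cookie header) can be read in the except block.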

How do you get default headers in a urllib2 Request?

I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests. I just create a dictionary of the headers I want to add, and pass it to the Request initializer. However, other "standard" HTTP headers get added to the request as well as the custom ones I explicitly add. When I sniff the requ...

How to debug a file upload?

I'm trying to upload a PDF file to a website using Hot Banana's content management system using a Python script. I've successfully logged into the site and can log out, but I can't seem to get file uploads to work. The file upload is part of a large complicated web form that submits the form data and PDF file through a POST. Using Fi...

Using MultipartPostHandler to POST form-data with Python

Problem: When POSTing data with Python's urllib2, all data is URL encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents be MIME encoded. A discussion of this problem is here: http://code.activestate.com/recipes/146306/ To...

Python: Downloading a large file to a local path and setting custom http headers

I am looking to download a file from an HTTP URL to a local file. The file is large enough that I want to download it and save it in chunks rather than read() and write() the whole file as a single giant string. The interface of urllib.urlretrieve is essentially what I want. However, I cannot see a way to set request headers when downloadin...
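
A sketch of one possible approach, assuming Python 3 (urllib.request instead of urllib2): build a Request with the custom headers, open it, and let shutil.copyfileobj move the data in fixed-size chunks so the whole body is never held in memory. The function name download and its parameters are hypothetical.

```python
import shutil
from urllib.request import Request, urlopen


def download(url, dest_path, headers=None, chunk_size=64 * 1024):
    """Stream url to dest_path in chunks, sending optional request headers."""
    req = Request(url, headers=headers or {})
    with urlopen(req) as resp, open(dest_path, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(resp, out, chunk_size)
    return dest_path
```

A call might look like download(url, "big.iso", headers={"User-Agent": "my-client"}), where the file name and header value are placeholders. Unlike urlretrieve there is no progress callback here; a manual read() loop would be needed for that.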