I'm trying to fetch a Wikipedia article with Python's urllib:
import urllib

f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
s = f.read()
f.close()
However, instead of the HTML page I get the following error response: Error - Wikimedia Foundation:
Request: GET http://en.wikipedia.org/w/index.php?tit...
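Wikimedia is known to reject requests that carry urllib's default "Python-urllib" User-Agent. If that is the cause here, sending a descriptive User-Agent of your own should help; a minimal sketch using urllib2, where the agent string is only an example:

import urllib2

# Wikimedia blocks the stock "Python-urllib" agent, so identify ourselves.
req = urllib2.Request(
    "http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes",
    headers={'User-Agent': 'MyWikiFetcher/0.1 (admin@example.com)'})
f = urllib2.urlopen(req)
s = f.read()
f.close()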
Is there an easy way to cache things when using urllib2 that I am overlooking, or do I have to roll my own?
...
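As far as I know, urllib2 has no response cache of its own, so rolling your own is the usual answer. A naive in-memory sketch (the helper name is hypothetical, and there is no expiry or size bound):

import urllib2

_cache = {}  # url -> response body

def cached_urlopen(url):
    # Hypothetical helper: fetch each URL once, serve repeats from memory.
    if url not in _cache:
        f = urllib2.urlopen(url)
        try:
            _cache[url] = f.read()
        finally:
            f.close()
    return _cache[url]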
Hello,
I'm using the mechanize module to execute some web queries from Python. I want my program to be error-resilient and to handle all kinds of failures (wrong URLs, 403/404 responses) gracefully. However, I can't find in mechanize's documentation which exceptions it raises for the various error conditions.
I just call it with:
self.browser...
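mechanize is built on top of urllib2, so as far as I can tell its failures surface as urllib2's exception types; a sketch of catching them, with a placeholder URL:

import urllib2
import mechanize

browser = mechanize.Browser()
try:
    browser.open("http://example.com/might-404")
except urllib2.HTTPError, e:
    # Raised for 4xx/5xx responses; e.code holds the status code.
    print "HTTP error:", e.code
except urllib2.URLError, e:
    # Raised for lower-level failures such as DNS errors or refused
    # connections; e.reason describes the cause.
    print "Network error:", e.reason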
If I open a file using urllib2, like so:
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
Is there an easy way to get the file name other than parsing the original URL?
EDIT: changed openfile to urlopen... not sure how that happened.
EDIT2: I ended up using:
filename = url.split('/')[-1].split('#')[0].split('?')[0]
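For what it's worth, the standard library can do that splitting too; a sketch of the same result via urlparse, which drops the query and fragment for you:

import posixpath
import urlparse

url = 'http://example.com/somefile.zip?session=42#frag'
# urlparse separates the path from query/fragment; basename keeps the last segment.
filename = posixpath.basename(urlparse.urlparse(url).path)  # 'somefile.zip'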
...
I have a web application written using CherryPy, which is run locally on 127.0.0.1:4321. We use mod_rewrite and mod_proxy to have Apache act as a reverse proxy; Apache also handles our SSL encryption and may eventually be used to serve all of our static content.
This all works just fine for small workloads. However, I recently used...
I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a differe...
I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some trouble.
Here's what I need to do:
Do a POST with a few parameters and headers.
Follow a redirect
Retrieve the HTML body.
Now, I'm relatively new to Python, but the two things I've tested so far haven't worked. First I use...
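For reference, plain urllib2 can cover all three steps, since it follows redirects on its own; a sketch where the URL, form fields and headers are placeholders:

import urllib
import urllib2
from cookielib import CookieJar

# The CookieJar keeps any session cookie set during the POST or redirect.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
data = urllib.urlencode({'username': 'x', 'password': 'y'})  # 1. POST data
req = urllib2.Request('http://example.com/login', data,
                      {'User-Agent': 'Mozilla/5.0'})         # ...and headers
response = opener.open(req)  # 2. the redirect is followed automatically
html = response.read()       # 3. the HTML body of the final page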
I've recently written this with help from SO. Now could someone please tell me how to make it actually log into the board? It brings everything up, just in a logged-out state.
import re
import urllib
import urllib2
logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
page = urllib2.urlopen("http://www.woarl.com/board/index...
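If the board keeps you logged in via a session cookie, the POST alone isn't enough: the cookie has to be carried into the later requests. A sketch of that pattern, with a hypothetical login URL since the real one is cut off above:

import urllib
import urllib2
from cookielib import CookieJar

# One opener shared across requests, so the session cookie set at login
# is sent back on every later fetch.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
opener.open("http://www.woarl.com/board/login.php", logindata)  # hypothetical URL
page = opener.open("http://www.woarl.com/board/index.php")      # now logged in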
Hi,
I would like to be able to follow and track redirects and the cookies that are set by the different webpages with Python (a bit like the Tamper plugin for Firefox).
So if website1 redirects to website2 which then redirects to website3, I would like to follow that and also see what cookies each website sets. I have been looking ...
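One way to get that with plain urllib2 is a redirect handler that logs each hop while a CookieJar collects whatever each site sets; a sketch, with a placeholder starting URL:

import cookielib
import urllib2

class TracingRedirectHandler(urllib2.HTTPRedirectHandler):
    # Print every hop, then let the default implementation follow it.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        print code, req.get_full_url(), '->', newurl
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)

jar = cookielib.CookieJar()
opener = urllib2.build_opener(TracingRedirectHandler(),
                              urllib2.HTTPCookieProcessor(jar))
opener.open('http://website1.example/')
for cookie in jar:
    # Each cookie records the domain that set it.
    print cookie.domain, cookie.name, '=', cookie.value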
I want to send a custom "Accept" header in my request when using urllib2.urlopen(..). How do I do that?
...
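Headers go on a Request object rather than on urlopen itself; a minimal example, with a placeholder URL and media type:

import urllib2

req = urllib2.Request('http://example.com/resource',
                      headers={'Accept': 'application/json'})
response = urllib2.urlopen(req)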
I'm going to start off by noting that I have next to no Python experience.
As you may know, by simply dropping a shortcut in the Send To folder on your Windows PC, you can allow a program to take a file as an argument.
How would I write a python program that takes this file as an argument?
And, as a bonus if anyone gets a chance -- ...
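The file dropped onto the shortcut simply arrives as the first command-line argument, so sys.argv is all you need; a minimal sketch:

import sys

if len(sys.argv) > 1:
    path = sys.argv[1]  # the file handed over by Send To
    print "Received:", path
else:
    print "No file was passed."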
Hello everyone,
I'm currently trying to initiate a file upload with urllib2 and the urllib2_file library. Here's my code:
import sys
import urllib2_file
import urllib2
URL='http://aquate.us/upload.php'
d = [('uploaded', open(sys.argv[1], 'rb'))]  # sys.argv[1:] is a list; open() needs a single filename
req = urllib2.Request(URL, d)
u = urllib2.urlopen(req)
print u.read()
I've placed this .py fi...
I'm working on a simple HTML scraper for Hulu in Python 2.6 and am having problems logging in to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
# make cookie and redirect handlers
cookies = CookieJar()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2....
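The snippet is cut off, but the usual continuation of this pattern is to build an opener from the two handlers and POST the login form through it; a sketch, where the login URL and field names are guesses, not Hulu's real ones:

import urllib
import urllib2
from cookielib import CookieJar

cookies = CookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookies)
redirect_handler = urllib2.HTTPRedirectHandler()  # assuming this is the cut-off line
opener = urllib2.build_opener(cookie_handler, redirect_handler)

# Hypothetical URL and form fields; inspect Hulu's login form for the real ones.
logindata = urllib.urlencode({'login': 'me@example.com', 'password': 'secret'})
response = opener.open('https://secure.hulu.com/account/authenticate', logindata)
print response.read()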
Hello all, is it possible to easily cap the kbps when using urllib2?
If it is, any code examples or resources you could direct me to would be greatly appreciated.
...
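urllib2 has no built-in rate limit that I know of, but you can approximate one by reading in chunks and sleeping whenever you are ahead of the target rate; a sketch, treating the cap as kilobytes per second:

import time
import urllib2

def download_throttled(url, out_path, kb_per_sec=64, chunk_size=1024):
    # Best-effort cap: after each chunk, sleep until the average rate
    # falls back under the limit.
    limit = kb_per_sec * 1024.0
    response = urllib2.urlopen(url)
    out = open(out_path, 'wb')
    start = time.time()
    received = 0
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            received += len(chunk)
            expected = received / limit      # seconds this much *should* take
            elapsed = time.time() - start
            if expected > elapsed:
                time.sleep(expected - elapsed)
    finally:
        out.close()
        response.close()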
Okay, this is really strange. I have a script which basically downloads a bunch of archive files and extracts them. Usually those files are .zip files. Today I sat down and decided to make it work with rar files, and I got stuck. At first I thought the problem was in my unrar code, but it wasn't there. So I did:
f = urllib2.urlopen(f...
I am currently trying to log into a site using Python; however, the site seems to send a cookie and a redirect statement on the same page. Python seems to follow that redirect, preventing me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?
...
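One known approach: subclass HTTPRedirectHandler so that redirect_request returns None, which makes urllib2 raise HTTPError for the 3xx instead of following it; the cookie is then readable from the error's headers. A sketch with a placeholder URL:

import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # refuse to follow; the 3xx surfaces as an HTTPError

opener = urllib2.build_opener(NoRedirectHandler())
try:
    opener.open('http://example.com/login')
except urllib2.HTTPError, e:
    # The redirect response lands here with its headers intact.
    print e.code, e.headers.get('Set-Cookie')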
I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests. I just create a dictionary of the headers I want to add, and pass it to the Request initializer.
However, other "standard" HTTP headers get added to the request as well as the custom ones I explicitly add. When I sniff the requ...
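For what it's worth, the default User-agent comes from the opener's addheaders list and can be replaced wholesale, while headers such as Host and Content-Length are added lower down by httplib because HTTP itself requires them; a sketch:

import urllib2

opener = urllib2.build_opener()
# Replace the default [('User-agent', 'Python-urllib/2.x')] entirely.
opener.addheaders = [('User-agent', 'MyClient/1.0')]
response = opener.open('http://example.com/')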
I'm trying to upload a PDF file to a website using Hot Banana's content management system using a Python script. I've successfully logged into the site and can log out, but I can't seem to get file uploads to work.
The file upload is part of a large, complicated web form that submits the form data and PDF file through a POST. Using Fi...
Problem: When POSTing data with Python's urllib2, all data is URL-encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents be MIME-encoded. A discussion of this problem is here:
http://code.activestate.com/recipes/146306/
To...
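The linked recipe boils down to assembling the multipart body by hand and putting the boundary in the Content-Type header; a condensed sketch of the same idea (no streaming, the file is read fully into memory, URL and field names are placeholders):

import mimetools
import urllib2

def encode_multipart(fields, files):
    # fields: {name: value}; files: {name: (filename, data)}
    boundary = mimetools.choose_boundary()
    parts = []
    for name, value in fields.items():
        parts.append('--' + boundary)
        parts.append('Content-Disposition: form-data; name="%s"' % name)
        parts.append('')
        parts.append(value)
    for name, (filename, data) in files.items():
        parts.append('--' + boundary)
        parts.append('Content-Disposition: form-data; name="%s"; filename="%s"'
                     % (name, filename))
        parts.append('Content-Type: application/octet-stream')
        parts.append('')
        parts.append(data)
    parts.append('--' + boundary + '--')
    parts.append('')
    content_type = 'multipart/form-data; boundary=%s' % boundary
    return content_type, '\r\n'.join(parts)

content_type, body = encode_multipart({'note': 'hello'},
                                      {'file': ('report.pdf',
                                                open('report.pdf', 'rb').read())})
req = urllib2.Request('http://example.com/upload', body,
                      {'Content-Type': content_type})
print urllib2.urlopen(req).read()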
I am looking to download a file from an http URL to a local file. The file is large enough that I want to download it and save it in chunks rather than read() and write() the whole file as a single giant string.
The interface of urllib.urlretrieve is essentially what I want. However, I cannot see a way to set request headers when downloadin...
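A sketch of that combination: a Request object carries the headers, and the body is copied in fixed-size chunks (URL, filename and header values are placeholders):

import urllib2

def download(url, out_path, headers=None, chunk_size=16 * 1024):
    # urlretrieve-style loop, but with settable request headers.
    req = urllib2.Request(url, headers=headers or {})
    response = urllib2.urlopen(req)
    out = open(out_path, 'wb')
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    finally:
        out.close()
        response.close()

download('http://example.com/big.iso', 'big.iso',
         headers={'User-Agent': 'MyDownloader/1.0'})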