urllib2

Python urllib2 Progress Hook

I am trying to create a download progress bar in Python using the urllib2 HTTP client. I've looked through the API (and on Google) and it seems that urllib2 does not allow you to register progress hooks. However, the older, deprecated urllib does have this functionality. Does anyone know how to create a progress bar or reporting hook usin...

Python fails Tor check using urllib2 to initiate requests

After reading through the other questions on StackOverflow, I got a snippet of Python code that is able to make requests through a Tor proxy: import urllib2 proxy = urllib2.ProxyHandler({'http':'127.0.0.1:8118'}) opener = urllib2.build_opener(proxy) print opener.open('https://check.torproject.org/').read() Since Tor works fine in Fir...

Timeout for urllib2.urlopen() in pre-2.6 Python versions

The urllib2 documentation says that the timeout parameter was added in Python 2.6. Unfortunately, my code base has been running on Python 2.5 and 2.4 platforms. Is there any alternative way to simulate the timeout? All I want to do is allow the code to talk to the remote server for a fixed amount of time. Perhaps any alternative built-in library...

Fetch certain .html files from web server

I would like to fetch certain .html files from a web server. My intention is to fetch the .html files from a web site (http://www.thetabworld.com/) that have the word "metallica" in the file name. How is that possible using Python? I have heard about urllib2, but as a Python noob I don't have the slightest idea how to use it. ...

Python: urllib2 or Pycurl?

I have extensive experience with PHP cURL but for the last few months I've been coding primarily in Java, utilizing the HttpClient library. My new project requires me to use Python, once again putting me at the crossroads of seemingly comparable libraries: pycurl and urllib2. Putting aside my previous experience with PHP cURL, what is ...

Python Process blocked by urllib2

I set up a process that reads a queue of incoming URLs to download, but when urllib2 opens a connection the system hangs. import urllib2, multiprocessing from threading import Thread from Queue import Queue from multiprocessing import Queue as ProcessQueue, Process def download(url): """Download a page from an url. url [str]: url...

urllib2 times out but doesn't close socket connection

I'm making a Python URL grabber program. For my purposes, I want it to time out really fast, so I'm calling urllib2.urlopen("http://.../", timeout=2). It times out correctly, as it should. However, it doesn't bother to close the connection to the server, so the server thinks the client is still connected. How can I ask url...

Python image processing of picture directly from the web

Hello, I am writing Python code to take an image from the web and calculate the standard deviation, ... and do other image processing with it. I have the following code: from scipy import ndimage from urllib2 import urlopen from urllib import urlretrieve import urllib2 import Image import ImageFilter def imagesd(imagelist...

Timeout when using urllib2.urlopen with Django in GAE

When I run this code: url = ('http://maps.google.com/maps/nav?'+ 'q=from%3A'+from_address+ '+to%3A'+to_address+ '&output=json&oe=utf8&key='+api_key) request = urllib2.Request(url) response = urllib2.urlopen(request) in a simple view in Django running in Google App Engine via the Google App Engine Helper for Django ...

Which is the best Python library for making REST requests like PUT, GET, DELETE, POST, and how?

Hi, I am a bit confused by the set of Python libraries for connecting to REST-enabled web services. I have tried httplib, urllib and urllib2. I want to know how methods like PUT, GET, POST and DELETE can be achieved using these libraries. Regards, Parthiv ...

Python: urllib2.urlopen(url, data) Why do you have to urllib.urlencode() the data?

I thought that a POST sent all of its information in the HTTP headers (I'm obviously not well informed on this subject), so I'm confused about why you have to urlencode() the data into a key=value&key2=value2 format. How does that formatting come into play when using POST? # Fail data = {'name': 'John Smith'} urllib2.urlopen(foo_ur...
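
For illustration, a minimal sketch of what the encoding step produces for the dictionary in the excerpt above. POST data travels in the request body rather than in the headers, and urllib2 sends that body as a form-encoded string by default. The import fallback is an addition (not part of the question) so that the snippet also runs on Python 3, and the variable name body is hypothetical.

```python
try:
    from urllib import urlencode        # Python 2, as in the question
except ImportError:
    from urllib.parse import urlencode  # Python 3 fallback (assumption)

# The dictionary from the question's failing example.
data = {'name': 'John Smith'}

# urlencode() turns the mapping into the key=value&key2=value2 string
# that is sent as the body of the POST request.
body = urlencode(data)
print(body)  # name=John+Smith
```

The resulting string is what would be passed as the second argument to urllib2.urlopen(url, data); passing the raw dictionary gives the library nothing it can write to the socket as a request body.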

Overriding urllib2 HTTPError and reading response HTML anyway

I am trying to screen scrape multiple pages of a website that return an 'HTTP Error 500: Internal Server Error' response but still include important data inside the error HTML. Normally, I would fetch a page using this (Python 2.6.4): import urllib2 url = "http://google.com" data = urllib2.urlopen(url) data = data.read() But when atte...
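
One possible approach, sketched under assumptions rather than taken from the question: urllib2.HTTPError is itself a file-like response object, so the exception can be caught and its body read like a normal response. The helper name fetch_even_on_error and its opener parameter are hypothetical, and the import fallback is added only so that the snippet also runs on Python 3.

```python
try:
    from urllib2 import urlopen, HTTPError      # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen          # Python 3 fallback (assumption)
    from urllib.error import HTTPError


def fetch_even_on_error(url, opener=urlopen):
    """Return the response body, even when the server answers with an error.

    `opener` is a hypothetical hook that defaults to urlopen; it exists
    so the function can be exercised without network access.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        # The exception carries the error page; treat it as the response.
        response = err
    return response.read()
```

Calling fetch_even_on_error(url) with the default opener would perform a real request. Connection failures raise URLError rather than HTTPError, so they would still propagate to the caller.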

Tell urllib2 to use custom DNS

I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I would rather not change my /etc/resolv.conf, however. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggest...

Making super() work in Python's urllib2.Request

This afternoon I spent several hours trying to find a bug in my custom extension to urllib2.Request. The problem, as I found out, was the use of super(ExtendedRequest, self), since urllib2.Request is (I'm on Python 2.5) still an old-style class, where the use of super() is not possible. The most obvious way to create a new class with ...

How to log in to multiple website accounts concurrently with Python

I am using urllib2 and HTTPCookieProcessor to log in to a website. I want to log in to multiple accounts concurrently and store the cookies to be reused later. Can you recommend an approach or library to achieve this? ...

How do I unit test a module that relies on urllib2?

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it: params = (url, urlencode(data),) if data else (url,) req = Request(*params) response = urlopen(req) #check headers, content-length, etc... #par...

Python: Getting INVALID response from PayPal's Sandbox IPN, slowly going insane...

...

How can I use a SOCKS 4/5 proxy with urllib2?

How can I use a SOCKS 4/5 proxy with urllib2 to download a web page? ...

How to follow meta refreshes in Python

Python's urllib2 follows 3xx redirects to get the final content. Is there a way to make urllib2 (or some other library such as httplib2) also follow meta refreshes? Or do I need to parse the HTML manually for the refresh meta tags? ...

How to save the output of a Python script to a text file?

I'm trying to make it so this script: from BeautifulSoup import BeautifulSoup import sys, re, urllib2 import codecs html_str = urllib2.urlopen(URL).read() soup = BeautifulSoup(html_str) for row in soup.findAll("tr"): for col in row.findAll(re.compile("td|th")): sys.stdout.write((col.string if col.string else '') + '|')...