questions about urllib2 | ansaurus

urllib2

Help with Strange Python scraping error. HTTPError with one machine while it works on others.

I am using a proxy, and the following is the code:

    req = urllib2.Request(url)
    # run the request for each proxy
    # now set the proxy
    req.set_proxy(proxy, "http")
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    req.add_hea...

screen-scraping

getting Connection reset by peer error using urllib2

Hi, I'm getting this error:

    socket.error: [Errno 54] Connection reset by peer

All I'm trying to do is the following in Python:

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    id = response.read()

Some previous related questions suggested using time.sleep to fiddle with the threads. ...

testing urllib2 application, http responses loaded from files

My Python application makes many HTTP requests to many URLs using urllib2. I would like to build a unit test suite to test my data parsing and error handling code. I have a directory full of test data, with a number of files, each file containing a single HTTP response, with headers and response data (captured using curl -i). In some cases, t...

Bandwidth test, delay test using urllib2

I want to make a python script that tests the bandwidth of a connection. I am thinking of downloading/uploading a file of a known size using urllib2, and measuring the time it takes to perform this task. I would also like to measure the delay to a given IP address, such as is given by pinging the IP. Is this possible using urllib2? ...

urllib2 and json

Hi, can anyone point out a tutorial that shows me how to do a POST request using urllib2 with the data being in JSON format? ...
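
A minimal sketch of such a request, not a definitive recipe: the body is serialized with the standard json module and sent with a Content-Type of application/json. The endpoint URL, the payload, and the helper name build_json_post are hypothetical. The import falls back to urllib.request, the Python 3 successor of urllib2, so the snippet runs on either interpreter.

```python
import json

try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 successor with the same Request API


def build_json_post(url, payload):
    """Return a Request whose body is `payload` serialized as JSON (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    # Supplying a body makes the request a POST instead of a GET.
    return urllib2.Request(url, body, {"Content-Type": "application/json"})


# Hypothetical endpoint and payload, for illustration only.
req = build_json_post("http://example.com/api", {"name": "value"})
# response = urllib2.urlopen(req)  # uncomment to actually send the request
```

The request is only built here; nothing goes over the network until urlopen is called.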

python: urllib2 how to send cookie with urlopen request

Hi, I am trying to use urllib2 to open a URL and send specific cookie text to the server. E.g. I want to open the site Solve chess problems with a specific cookie, e.g. search=1. How do I do it? I am trying to do the following:

    import urllib2
    (need to add cookie to the request somehow)
    urllib2.urlopen("http://chess-problems.prg")

Tha...
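
One way this might be done, as a sketch: attach the cookie text as a Cookie header on a Request object before opening it. The URL and the cookie value search=1 come from the question. The fallback import of urllib.request is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 successor with the same Request API

# URL and cookie value are the ones given in the question.
req = urllib2.Request("http://chess-problems.prg")
# The raw cookie text travels in the Cookie request header.
req.add_header("Cookie", "search=1")
# response = urllib2.urlopen(req)  # uncomment to actually send the request
```

Several cookies can share the same header when separated by "; ", for example "search=1; lang=en" (the second cookie is made up for illustration).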

Python - urllib2.urlopen - Why do I get garbled characters?

Here's my problem:

    import urllib2
    response=urllib2.urlopen('http://proxy-heaven.blogspot.com/')
    html=response.read()
    print html

It happens only with this site, and I don't know why the result is all garbled characters. Can anyone help? ...

Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page: http://en.wikipedia.org/wiki/OpenCola_(drink) This is the shell session:

    >>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
    Traceback (most recent call last):
      File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandb...

How do I declare a timeout using urllib2 on Google App Engine?

I'm aware that urllib2 is available on Google App Engine as a wrapper of Urlfetch and, as you know, Universal Feedparser uses urllib2. Do you know of any method to set a timeout on urllib2? Has the timeout parameter of urllib2 been ported to the Google App Engine version? I'm not interested in a method like: rssurldata = urlfetch(rssurl, deadline=...

google-app-engine

Python urllib2: How to ignore HTTPError 401

Hello, I want to access a web page with urllib2 and I keep getting an HTTP Error 401: Unauthorized. Now, my problem is that this page doesn't need any authentication when using browsers like Firefox. Only when I use Google Chrome an authentication dialog pops up. Though this happens only after the page is fully loaded. So I can just ca...

Using urllib2 for posting data, following redirects and maintaining cookies

I am using urllib2 in Python to post login data to a web site. After successful login, the site redirects my request to another page. Can someone provide a simple code sample on how to do this in Python with urllib2? I guess I will also need cookies to stay logged in when I get redirected to another page. Right? Thanks a lot in advance. ...
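
A sketch of one common approach, assuming a plain form login: build an opener with an HTTPCookieProcessor so that cookies set by the login response are sent on later requests. The opener from build_opener already includes the default redirect handler. The form field names, URLs, and the helper name make_session are hypothetical. The Python 3 fallback imports are an assumption added so the snippet runs on either interpreter.

```python
try:
    import urllib2  # Python 2, as in the question
    import cookielib
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalents
    import http.cookiejar as cookielib
    from urllib.parse import urlencode


def make_session():
    """Return (opener, jar); the opener stores and resends cookies (hypothetical helper)."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session()

# Hypothetical form field names; a real site defines its own.
form = urlencode({"username": "user", "password": "secret"}).encode("ascii")

# Both calls go through the same opener, so cookies from the login carry over:
# opener.open("http://example.com/login", form)   # POST, redirects are followed
# opener.open("http://example.com/private")       # sent with the session cookie
```

Reusing the same opener for every request is what keeps the session alive; a bare urllib2.urlopen call would not see the cookie jar.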

How can I create a session-local cookie-aware HTTP client in Django?

I'm using a web service backend to provide authentication to Django, and the get_user method must retain a cookie provided by the web service in order to associate with a session. Right now, I make my remote calls just by calling urllib2.urlopen(myTargetService) but this doesn't pass the cookie for the current session along. I have crea...

"post" method to communicate directly with a server

Just started with python not long ago, and I'm learning to use "post" method to communicate directly with a server. A fun script I'm working on right now is to post comments on wordpress. The script does post comments on my local site, but I don't know why it raises HTTP Error 404 which means page not found. Here's my code, please help m...

Problem making a GET request and spoofing the User-Agent in urllib2

Hello, With this code, urllib2 makes a GET request:

    #!/usr/bin/python
    import urllib2
    req = urllib2.Request('http://www.google.fr')
    req.add_header('User-Agent', '')
    response = urllib2.urlopen(req)

With this one (which is almost the same), a POST request:

    #!/usr/bin/python
    import urllib2
    headers = { 'User-Agent' : '' }
    req = urllib2.Re...
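
The second snippet is cut off, so its cause cannot be confirmed from the excerpt. One common way to get a POST by accident, shown here as a sketch under that assumption, is to pass the headers dict as the second positional argument of Request, which is the data parameter. Any request with data is sent as a POST. The fallback import of urllib.request is added so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 successor with the same Request API

headers = {"User-Agent": ""}

# Second positional argument is `data`, so the dict becomes the request body.
positional = urllib2.Request("http://www.google.fr", headers)

# Passing the dict by keyword leaves `data` unset and only sets the headers.
keyword = urllib2.Request("http://www.google.fr", headers=headers)

# get_method() reports which HTTP verb urlopen would use for each request.
methods = (positional.get_method(), keyword.get_method())
```

Calling get_method() before urlopen is a cheap way to check which verb a request will use without touching the network.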

best way to download large files with python

Which library/module is best for downloading large files (500 MB+) in terms of speed, memory, and CPU? I was also contemplating using pycurl. ...
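
Whatever library is chosen, memory use usually depends on reading the response in fixed-size chunks rather than all at once. A minimal sketch of that pattern follows, with an in-memory stream standing in for the object returned by urlopen. The helper name copy_in_chunks and the 64 KiB chunk size are assumptions for illustration, not a recommendation from the question.

```python
import io


def copy_in_chunks(source, dest, chunk_size=64 * 1024):
    """Copy `source` to `dest` one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        dest.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins: a real download would pass urlopen(url) as the source
# and a file opened with mode "wb" as the destination.
fake_response = io.BytesIO(b"x" * 200000)
sink = io.BytesIO()
copied = copy_in_chunks(fake_response, sink)
```

Only one chunk is held in memory at a time, so the peak footprint stays near chunk_size regardless of the file size.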

python url regexp

Hi all. I have a regexp and I want to append its output to my URL. For example:

    url = 'blabla.com'
    r = re.findall(r'<p>(.*?</a>))

The output of r is /any_string/on/any/server/, but I don't know how to make a GET request to the URL combined with the regexp output: blabla.com/any_string/on/any/server/ ...

Python/Django download Image from URL, modify, and save to ImageField

Hi all, I've been looking for a way to download an image from a URL, perform some image manipulation (resizing) on it, and then save it to a Django ImageField. Using the two great posts (linked below), I have been able to download and save an image to an ImageField. However, I've been having some trouble manipulating the file ...

Python urllib2 URLError HTTP status code

Hello, I want to grab the HTTP status code when a URLError exception is raised. I tried this, but it didn't help:

    except URLError, e:
        logger.warning( 'It seems like the server is down. Code:' + str(e.code) )

...
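
A plain URLError (for example, a refused connection) has a reason but no code attribute. Only its subclass HTTPError, raised when the server actually answered, carries code. A minimal sketch of telling the two apart follows, using hand-built exceptions so no network is needed. The helper name status_code_of and the example URL are hypothetical. The fallback import of urllib.request is added so the snippet also runs on Python 3.

```python
import io

try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 also exposes HTTPError and URLError here


def status_code_of(error):
    """Return the HTTP status code carried by `error`, or None if there is none."""
    return getattr(error, "code", None)


# Hand-built exceptions standing in for what urlopen would raise.
http_err = urllib2.HTTPError(
    "http://example.com/", 503, "Service Unavailable", {}, io.BytesIO(b"")
)
url_err = urllib2.URLError("connection refused")

codes = (status_code_of(http_err), status_code_of(url_err))
```

In real code, catching urllib2.HTTPError before urllib2.URLError achieves the same split, since the more specific handler has to come first.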

Following a Javascript link using urllib2

I am scraping a website that has a JavaScript "next" link that looks like this: <a href="javascript:__doPostBack('DataGrid1$ctl14$ctl02','')">2</a>. The page is written in aspx. Is it possible to call that, to get the information on the next page? Here is the page, http://www.deantechnology.com/hvca/pg_search/fsn.aspx?catalog_sspid=212&am...

Intermittent DownloadError Application Error 2 on Google App Engine

We have two applications that are both running on Google App Engine. App1 makes requests to app2 as an authenticated user. The authentication works by requesting an authentication token from Google ClientLogin that is exchanged for a cookie. The cookie is then used for subsequent requests (as described here). App1 runs the following code...

google-app-engine
