urllib

Python FancyURLopener timeout

Hi, is there a way to set a connection timeout for FancyURLopener()? I'm using FancyURLopener.retrieve() to download a file, but sometimes it just gets stuck and never returns. I think this is because it is still trying to connect and the connection cannot be established. So is there a way to set that timeout? Thanks for every reply ...
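
One commonly suggested approach is a process-wide default socket timeout, because sockets created afterwards inherit it. The following is a minimal sketch in Python 3 syntax; the helper name default_timeout and the 10-second value in the usage note are illustrative assumptions, not something from the question.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_timeout(seconds):
    """Temporarily set the default timeout for newly created sockets."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore whatever was configured before entering the block.
        socket.setdefaulttimeout(previous)
```

Wrapping the download call in "with default_timeout(10):" should make a stalled connection attempt fail with a timeout error instead of hanging indefinitely. The setting is global to the process, so it also affects other sockets created inside the block.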

Python: ImportError no module named urllib

I just rented a VPS from Linode; it has Python 2.5 and Ubuntu 8.04. When I go to the Python shell (python) and run import urllib, I get ImportError: No module named urllib. What can be the reason? How can I add this module to Python? Isn't it prepackaged with the basic version? Can it be a PYTHONPATH problem? How can I test PYTHONPATH? ...

Python: fetching SVG file using urllib is returning binary when I need ASCII

I'm using urllib (in Python) to fetch an SVG file: import urllib urllib.urlopen('http://alpha.vectors.cloudmade.com/BC9A493B41014CAABB98F0471D759707/-122.2487,37.87588,-122.265823,37.868054?styleid=1&viewport=400x231').read() which produces output of the sort: xb6\xf6\x00\xb3\xfb2\xff\xda\xc5\xf2\xc2\x14\xef\xcd\x82\x0b\xdbU\xb0...
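
One possible cause, which the excerpt alone cannot confirm, is that the server sends a gzip-compressed body and urllib returns it without decompressing. A minimal sketch in Python 3 syntax that checks for the gzip magic bytes before decompressing; the helper name maybe_gunzip is a hypothetical name chosen for illustration.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body):
    """Return the decompressed body if it looks gzip-compressed, else the body unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body
```

If the bytes returned by read() start with \x1f\x8b, passing them through maybe_gunzip should yield the SVG markup as bytes, which can then be decoded to text.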

I/O error(socket error): [Errno 111] Connection refused

I have a program that uses urllib to periodically fetch a URL, and I see intermittent errors like: I/O error(socket error): [Errno 111] Connection refused. It works 90% of the time, but the other 10% it fails. If I retry the fetch immediately after it fails, it succeeds. I'm unable to figure out why this is so. I tried to see if any p...
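
Since an immediate retry reportedly succeeds, a small retry wrapper is one way to tolerate the failures while the root cause is investigated. This is a sketch in Python 3 syntax, not a diagnosis; the function name retry and its parameters are hypothetical.

```python
import time


def retry(operation, attempts=3, delay=0.5, exceptions=(OSError,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
```

In Python 3 a refused connection surfaces as ConnectionRefusedError or urllib.error.URLError, both subclasses of OSError, so a call such as retry(lambda: urllib.request.urlopen(url).read()) would cover it.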

Python's urllib2 doesn't work on some sites

I found that you can't read from some sites using Python's urllib2 (or urllib). An example... urllib2.urlopen("http://www.dafont.com/").read() # Returns '' These sites work when you visit them in a browser. I can even scrape them using PHP (didn't try other languages). I have seen other sites with the same issue - but can't remember the URL at t...
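
Some sites respond differently to urllib's default User-Agent header, so sending a browser-like one is a common first thing to try; whether that is the cause here is an assumption. A minimal sketch in Python 3 syntax, where urllib.request replaces urllib2; the helper name and the User-Agent string are made up for illustration.

```python
import urllib.request


def browser_like_request(url, user_agent="Mozilla/5.0 (compatible; example-fetcher)"):
    """Build a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The resulting object can be passed to urllib.request.urlopen() in place of the bare URL string.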

Retrieving information with Python's urllib from a page that is done via __doPostBack()?

I'm trying to parse a page that has different sections that are loaded with a JavaScript __doPostBack() function. An example of a link is: javascript:__doPostBack('ctl00$cphMain$ucOemSchPicker$dlSch$ctl03$btnSch','') As soon as this is clicked, the browser doesn't fetch a new URL but a section of the web page is updated to reflect new info...

How to ignore windows proxy settings with python urllib?

I want Python to ignore Windows proxy settings when using urllib. The only way I managed to do that was disabling all proxy settings on Internet Explorer. Is there any programmatic way? os.environ['no_proxy'] is not a good option, since I'd like to avoid proxy for all addresses. ...
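
In Python 3's urllib.request (the successor to urllib2), passing an empty dictionary to ProxyHandler disables automatically detected proxies for that opener. A minimal sketch; the function name direct_opener is hypothetical.

```python
import urllib.request


def direct_opener():
    """Build an opener that ignores system and environment proxy settings."""
    # An empty mapping means "no proxies", overriding autodetection.
    no_proxy_handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy_handler)
```

Calling direct_opener().open(url) then connects directly for every address, without touching the Internet Explorer settings or os.environ.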

Inexpensive ways to add seek to a filetype object

PdfFileReader reads the content from a PDF file to create an object. I am querying the PDF from a CDN via urllib.urlopen(), which provides me a file-like object that has no seek. PdfFileReader, however, uses seek. What is a simple way to create a PdfFileReader object from a PDF downloaded via URL? Now, what can I do to avoid writing...
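
One simple option is to read the whole response into an in-memory buffer, which supports seek. A minimal sketch in Python 3 syntax; the helper name to_seekable is hypothetical, and the approach assumes the PDF fits comfortably in memory.

```python
import io


def to_seekable(fileobj):
    """Copy a read-only stream into an in-memory buffer that supports seek()."""
    return io.BytesIO(fileobj.read())
```

The returned buffer can be handed to anything that expects a seekable binary file, at the cost of holding the full download in memory.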

Python urllib.urlopen IOError

So I have the following lines of code in a function sock = urllib.urlopen(url) html = sock.read() sock.close() and they work fine when I call the function by hand. However, when I call the function in a loop (using the same urls as earlier) I get the following error: > Traceback (most recent call last): File "./headlines.py", lin...

Log into Launchpad from python script

How can I log into my Launchpad account in a python script? Any sample code would be appreciated. The login url is https://launchpad.net/+login and then redirect to something like https://login.launchpad.net/fJLVSRbxPfKTpVDr/+decide Thanks in advance! ...

Can't download YouTube video

I'm having trouble retrieving the YouTube video automatically; here's the code. The problem is the last part. download = urllib.request.urlopen(download_url).read() # Youtube video download script # 10n1z3d[at]w[dot]cn import urllib.request import sys print("\n--------------------------") print (" Youtube Video ...

Urllib and concurrency - Python

Hi there, I'm serving a Python script through WSGI. The script accesses a web resource through urllib, processes the resource and then returns a value. The problem is that urllib doesn't seem to handle many concurrent requests to a specific URL. As soon as the number of concurrent requests goes up to 30, the requests slow to a crawl! :( Help...

Performing urlopen from various IP addresses - Python

Hi folks, is it possible to perform urlopen requests from different IP addresses? ...

Handling urllib2's timeout? - Python

Hi folks, I'm using the timeout parameter of urllib2's urlopen. urllib2.urlopen('http://www.example.org', timeout=1) How do I tell Python that if the timeout expires a custom error should be raised? Any ideas? ...
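
One way to do this is to catch the timeout where the request is made and re-raise it as an application-specific exception. The sketch below uses Python 3's urllib.request, the successor to urllib2; FetchTimeout, fetch and the opener parameter are hypothetical names introduced for illustration.

```python
import socket
import urllib.error
import urllib.request


class FetchTimeout(Exception):
    """Hypothetical application-level error for requests that time out."""


def fetch(url, timeout=1, opener=urllib.request.urlopen):
    """Return the response body, raising FetchTimeout if the request times out."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except socket.timeout as exc:
        # A timeout while reading is raised directly.
        raise FetchTimeout(url) from exc
    except urllib.error.URLError as exc:
        # A timeout while connecting is wrapped in URLError.reason.
        if isinstance(exc.reason, socket.timeout):
            raise FetchTimeout(url) from exc
        raise
```

Callers can then write "except FetchTimeout:" without needing to know which layer detected the timeout. Other URLError causes are re-raised unchanged.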

Use Twisted's getPage as urlopen?

Hi folks, I would like to use Twisted's non-blocking getPage method within a webapp, but it feels quite complicated to use such a function compared to urlopen. This is an example of what I'm trying to achieve: def web_request(request): response = urllib.urlopen('http://www.example.org') return HttpResponse(len(response.read())) I...

making urllib request in Python from the client side

Hi Guys, I've written a Python application that makes web requests using the urllib2 library after which it scrapes the data. I could deploy this as a web application which means all urllib2 requests go through my web-server. This leads to the danger of the server's IP being banned due to the high number of web requests for many users. ...

Difference between Python urllib.urlretrieve() and wget

I am trying to retrieve a 500 MB file using Python, and I have a script which uses urllib.urlretrieve(). There seems to be some network problem between me and the download site, as this call consistently hangs and fails to complete. However, using wget to retrieve the file tends to work without problems. What is the difference between urlret...

Python download without supplying a filename

How do I download a file with a progress report using Python, but without supplying a filename? I have tried urllib.urlretrieve but I seem to have to supply a filename for the downloaded file to save as. So for example: I don't want to supply this: urllib.urlretrieve("http://www.mozilla.com/products/download.html?product=firefox-3.6.3&a...
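
When the URL path itself ends in the file name, the name can be derived from the URL rather than typed by hand. A minimal sketch in Python 3 syntax; filename_from_url and the default value are hypothetical. It does not cover download pages that redirect to the real file, where the final URL or a Content-Disposition header would have to be consulted instead.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Guess a local file name from the last component of the URL path."""
    path = urlsplit(url).path  # drops the query string and fragment
    name = posixpath.basename(unquote(path))
    return name or default
```

The result can be passed as the filename argument to urlretrieve, whose reporthook argument provides the progress callback.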

urllib and proxies

Hello, I need to use Tor+Privoxy with my Python script. proxies = { 'http' : '127.0.0.1:8118', 'ssl' : '127.0.0.1:8118', 'socks' : '127.0.0.1:9050' } The first question: is the 'socks' name right? Maybe there should be something like 'socks5'? The next step is that I should pass a user-agent string with these proxies to...
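
As far as the standard library goes, the keys of a urllib proxy mapping are URL schemes such as 'http' and 'https', and there is no built-in SOCKS support, so the usual setup is to point urllib at Privoxy's HTTP port and let Privoxy forward to Tor. A sketch in Python 3 syntax under that assumption; privoxy_opener and the User-Agent string are hypothetical.

```python
import urllib.request


def privoxy_opener(proxy="127.0.0.1:8118",
                   user_agent="Mozilla/5.0 (compatible; example-fetcher)"):
    """Build an opener that sends HTTP and HTTPS traffic through an HTTP proxy."""
    proxies = {"http": "http://" + proxy, "https": "http://" + proxy}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # Headers in addheaders are sent with every request made by this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Requests made with privoxy_opener().open(url) then carry both the proxy setting and the chosen User-Agent.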

Trouble with encoding and urllib

Hello, I'm loading a web page using urllib. There are Russian characters, but the page encoding is 'utf-8'. 1 pageData = unicode(requestHandler.read()).decode('utf-8') UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 262: ordinal not in range(128) 2 pageData = requestHandler.read() soupHandler = BeautifulSoup(pageData) pri...
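
In the first attempt, unicode(...) without an encoding argument decodes the bytes as ASCII before .decode('utf-8') is ever reached, which matches the reported 'ascii' codec error. Decoding the raw bytes directly with the page's encoding avoids that step. A minimal sketch in Python 3 syntax; decode_page is a hypothetical helper name.

```python
def decode_page(raw_bytes, encoding="utf-8"):
    """Decode the bytes returned by read() into text using the page's encoding."""
    return raw_bytes.decode(encoding)
```

In Python 2 the equivalent is requestHandler.read().decode('utf-8'), without the surrounding unicode(...) call.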