I use the following code to stream large files from the Internet into a local file:
fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
    fp.write(line)
fp.close()
This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)
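One common speedup (a sketch, not from the question itself): iterating a binary response line by line does many tiny writes, so copy in fixed-size chunks instead, for example with shutil.copyfileobj:

import shutil
import urllib2

req = urllib2.urlopen(url)
with open(file, 'wb') as fp:      # 'file' and 'url' as in the snippet above
    # copy in 16 KB chunks; only one chunk is ever held in memory
    shutil.copyfileobj(req, fp, 16 * 1024)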
...
Hi, I've got a list of 100 websites in CSV format. All of the sites have the same general format, including a large table with 7 columns. I wrote this script to extract the data from the 7th column of each of the websites and then write this data to a file. The script below partially works; however, opening the output file (after running ...
The HTML that I am receiving from urllib2 is missing dozens of fields of data that I can see when I view the source of the URL in Firefox. Any advice would be much appreciated. Here is what it looks like:
From the Firefox view source:
# ...<td class=td6>as</td></tr></thead>|ManyFields|<br></div><div id="c1">...
From the HTML urllib2 returns...
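One possible cause (an assumption, since the script is cut off): some servers serve different markup to urllib2's default User-Agent, so sending a browser-like header is worth trying. If the missing fields are built by JavaScript, though, no plain HTTP fetch will contain them.

import urllib2

req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0 (compatible)'})
html = urllib2.urlopen(req).read()   # compare this against Firefox's view-source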
I want to get to an authenticated page using urllib2. I'm hoping there's a hack to do it directly, something like:
urllib2.urlopen('http://username:pwd@server/page')
If not, how do I use authentication?
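The user:pwd@host form isn't supported by urllib2 directly; a minimal sketch with its basic-auth handler (passing realm None so the credentials apply to any realm on that server):

import urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://server/page', 'username', 'pwd')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
page = opener.open('http://server/page').read()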
...
I want to use Python's urllib2 with authentication, and I need the realm and URI of a URL. How do I get them?
Thanks.
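A sketch of one way to find them: the server advertises the realm in the WWW-Authenticate header of its 401 response, so make an unauthenticated request and inspect the error:

import urllib2

try:
    urllib2.urlopen('http://server/protected/page')   # hypothetical protected URL
except urllib2.HTTPError, e:
    if e.code == 401:
        # e.g. 'Basic realm="My Realm"'; the URI is the path you requested
        print e.info().get('www-authenticate')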
...
I have a client that connects to an HTTP stream and logs the text data it consumes.
I send the streaming server an HTTP GET request... The server replies and continuously publishes data... It will either publish text or send a ping (text) message regularly... and will never close the connection.
I need to read and log the data it c...
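A sketch of the read loop, assuming the messages are newline-delimited (an assumption; the framing isn't shown): readline() returns as each line arrives, whereas a bare read() would block until the server closes the connection, which here is never.

import urllib2

resp = urllib2.urlopen('http://server/stream')    # hypothetical stream URL
log_file = open('stream.log', 'ab')               # hypothetical log destination
while True:
    line = resp.readline()
    if not line:
        break                                     # connection actually closed
    log_file.write(line)
    log_file.flush()                              # keep the log current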
For some reason I'm getting a Trace/BPT trap error when calling urllib.urlopen. I've tried both urllib and urllib2 with identical results. Here is the code which throws the error:
def get_url(url):
    from urllib2 import urlopen
    if not url or not url.startswith('http://'):
        return None
    return urlopen(url).read()  # FIXME!
I sho...
And if it is large... then stop the download? I don't want to download files that are larger than 12 MB.
import random
import urllib2

request = urllib2.Request(ep_url)                        # ep_url: URL of the file
request.add_header('User-Agent', random.choice(agents))  # agents: list of UA strings
thefile = urllib2.urlopen(request).read()
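A sketch of one approach: check the Content-Length header before reading the body, and, since servers don't always send it, fall back to reading in chunks and aborting past 12 MB:

import urllib2

MAX_BYTES = 12 * 1024 * 1024
response = urllib2.urlopen(request)               # request as built above
length = response.info().get('Content-Length')
if length is not None and int(length) > MAX_BYTES:
    thefile = None                                # declared too large up front
else:
    chunks = []
    read = 0
    while read <= MAX_BYTES:
        chunk = response.read(8192)
        if not chunk:
            break
        chunks.append(chunk)
        read += len(chunk)
    thefile = ''.join(chunks) if read <= MAX_BYTES else None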
...
I'm trying to use urlretrieve to download files from URLs that take the form:
http://example.com/download.php?id=6456&name=foo
yet for some reason I just get an empty response.
I've tried the method suggested in this question, but it didn't seem to help, because
remotefile.info()
doesn't contain the key 'content-disposition', only
['...
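A debugging sketch (not a fix, since the linked question is cut off): fetch the same URL with urllib2 and dump the status and headers, to see what the server actually returns for the query-string URL:

import urllib2

resp = urllib2.urlopen('http://example.com/download.php?id=6456&name=foo')
print resp.getcode()   # 200, or a redirect/error the server is sending instead
print resp.info()      # every header, including any Content-Disposition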
I imagine this must have a simple answer, but I am struggling: I want to take a URL (which outputs JSON) and get the data into a usable dictionary in Python. I am stuck on the last step.
>>> import urllib2
>>> import simplejson
>>> req = urllib2.Request("http://vimeo.com/api/v2/video/38356.json", None, {'user-agent':'syncstream/vimeo'})
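A sketch of the missing last step (assuming the same req object): pass the response straight to simplejson:

>>> resp = urllib2.urlopen(req)
>>> data = simplejson.load(resp)   # data is now a plain Python object (list/dict)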
...
I was trying to use http://www.jongsma.org/gc/scripts/ofx-ba.py to grab my bank account information from Wachovia. Having no luck, I decided I would just try to manually construct some request data using this example.
So, I have this file that I want to use as the request data. Let's call it req.ofxsgml:
FXHEADER:100
DATA:OFXSGML
...
I want to fetch the title of a webpage which I open using urllib2. What is the best way to do this, i.e. to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?
Is there a good parsing lib for this purpose?
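A sketch with the standard-library HTMLParser; for anything beyond the title, a dedicated parser such as BeautifulSoup or lxml is the usual recommendation:

import urllib2
from HTMLParser import HTMLParser

class TitleParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleParser()
parser.feed(urllib2.urlopen(url).read())   # url: the page to inspect
print parser.title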
...
Hi everyone.
For example, I have cookies
my_cookies = {'name': 'Albert', 'uid': '654897897564'}
and I want to open page http://website.com
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
opener.addheaders.append(('User-agent', 'Mozilla/5.0 (compatible)'))
opener.open('http://website.com').read()
How can I do this with...
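A sketch of one way, assuming the intent is just to send those fixed cookies: serialize the dict into a Cookie header on the opener:

import urllib2

my_cookies = {'name': 'Albert', 'uid': '654897897564'}
cookie_header = '; '.join('%s=%s' % item for item in my_cookies.items())

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (compatible)'),
                     ('Cookie', cookie_header)]
html = opener.open('http://website.com').read()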
A normal urllib2 works fine:
>>> import urllib2
>>> r = urllib2.urlopen(u"http://bit.ly/4ovTZw")
>>> r.geturl()
'http://www.writing.com/main/handler/action/show_document/item_id/933413.mp3'
>>> r.headers.get("Content-Type")
'audio/mpeg'
But in appengine, the same code shows text/html.
def get(self):
    r = urllib2.urlopen(u"http://b...
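A debugging sketch (an assumption, not a confirmed fix): on App Engine urllib2 is backed by the urlfetch service, so fetching with follow_redirects=False shows whether the bit.ly redirect is being handled differently there:

from google.appengine.api import urlfetch

result = urlfetch.fetch(u"http://bit.ly/4ovTZw", follow_redirects=False)
print result.status_code                  # the redirect status bit.ly returns
print result.headers.get('location')      # where the redirect points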
I currently use mechanize to read gzipped web page as below:
br = mechanize.Browser()
br.set_handle_gzip(True)
response = br.open(url)
data = response.read()
I wonder how to decompress gzipped data fetched by urllib2 into HTML text.
req = urllib2.Request(url)
opener = urllib2.build_opener()
response = opener.open(req)
data = response.r...
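A sketch of the urllib2 equivalent: urllib2 never decompresses for you, so request gzip explicitly and inflate the body with the gzip and StringIO modules:

import gzip
import urllib2
from StringIO import StringIO

req = urllib2.Request(url)
req.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(req)
data = response.read()
if response.info().get('Content-Encoding') == 'gzip':
    data = gzip.GzipFile(fileobj=StringIO(data)).read()   # now plain HTML text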
The issue stems from the OAuth authentication portion of my code. I truncated a bunch of it and cut it off at the part where I get my error. My specific error is "gaierror: (11001, 'getaddrinfo failed')". I really have no idea why. I'm using Leah Culver's OAuth library (http://oauth.googlecode.com/svn/code/python/oauth/). Pretty much following t...
I am trying to use Python to write a client that connects to a custom HTTP server that uses digest authentication. I can connect and pull the first request without a problem. Using tcpdump (I am on Mac OS X; I am both a Mac and a Python noob) I can see the first request is actually two HTTP requests, as you would expect if you are famili...
Like the title says, my code basically does this:
set proxy, test proxy, do some cool stuff
But after the proxy is set the first time, it sticks that way, never changing. This is the failing code:
# Pick proxy
r = random.randint(0, len(proxies) - 1)
proxy = proxies[r]
print proxy
# Setup proxy
l_proxy_support ...
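A sketch of one common fix (an assumption about the truncated code): if the original uses urllib2.install_opener(), the proxy becomes process-global and sticks; building a fresh opener per pick and calling it directly avoids that:

import random
import urllib2

proxy = random.choice(proxies)                      # proxies as in the snippet
opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
html = opener.open(test_url).read()                 # test_url: hypothetical target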
The Objective: A script which cycles through a list of proxies and sends a POST request, containing a file, to a PHP page on my server, which then calculates the delivery time. It's a pretty useless script, but I am using it to teach myself about urllib2.
The Problem: So far I have got multipart/form-data sending correctly using Poster, but ...
theurl = 'http://bit.ly/6IcCtf/'
urlReq = urllib2.Request(theurl)
urlReq.add_header('User-Agent',random.choice(agents))
urlResponse = urllib2.urlopen(urlReq)
htmlSource = urlResponse.read()
if unicode == 1:
    #print urlResponse.headers['content-type']
    #encoding=urlResponse.headers['content-type'].split('charset=')[-1]
    #htmlSour...
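A sketch completing the idea in the commented-out lines (an assumption, since the snippet is cut off): take the charset from the Content-Type header and decode, defaulting to UTF-8 when none is declared:

content_type = urlResponse.headers.get('content-type', '')
encoding = content_type.split('charset=')[-1] if 'charset=' in content_type else 'utf-8'
htmlSource = htmlSource.decode(encoding, 'replace')   # unicode from here on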