urllib2

urllib2 not retrieving entire HTTP response

I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.

    >>> import urllib2
    >>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
    >>> stream.headers['content-length']
    '168928'
    >>> data = stream.read()
    >>> len(data)
    61058
    ...

How to submit a form with more than one submit button when sending a POST to a website (Python)

I am creating a script using Python mechanize that can log in to a website and submit a form. However, this form has three submit buttons (Preview, Post, and Cancel), and I'm used to forms with only one. This is the form:

    <TextControl(subject=Is this good for the holidays? Anyone know about the new tech?)>
    <IgnoreControl(threads=<None>)>
    <Tex...

Can urllib2 make HTTP/1.1 requests?

EDIT: This question is invalid. It turns out a transparent proxy was making an onward HTTP 1.0 request even though urllib/httplib was indeed making an HTTP 1.1 request originally.

ORIGINAL QUESTION: By default urllib2.urlopen always makes an HTTP 1.0 request. Is there any way to get it to talk HTTP 1.1? ...

Urllib2 Send Post data through proxy

I have configured a proxy using ProxyHandler and sent a request with some POST data:

    cookiejar = cookielib.CookieJar()
    proxies = {'http': 'http://some-proxy:port/'}
    opener = urllib2.build_opener(urllib2.ProxyHandler(proxies),
                                  urllib2.HTTPCookieProcessor(cookiejar))
    opener.addheaders = [('User-agent', "USER AGENT")]
    urllib2.install_opene...

Bind different ip addresses to urllib2 object in separate threads

The following code globally binds a specified IP address to every socket the main program creates:

    import socket
    true_socket = socket.socket
    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((sourceIP, 0))
        return sock
    socket.socket = bound_socket

Suppose the main program has 10 threads, each with a urllib2 instance running insid...

How to deal with deflated response by urllib2?

I currently use the following code to decompress a gzipped response with urllib2:

    opener = urllib2.build_opener()
    response = opener.open(req)
    data = response.read()
    if response.headers.get('content-encoding', '') == 'gzip':
        data = StringIO.StringIO(data)
        gzipper = gzip.GzipFile(fileobj=data)
        html = gzipper.read()

Does it handle de...
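It does not: deflate (Content-Encoding: deflate) is a different format from gzip, and some servers even send raw deflate streams without the zlib header, so it needs its own branch. A sketch of a decoder handling both cases (the function name is mine):

```python
import gzip
import io
import zlib

def decode_body(data, content_encoding):
    """Decompress a response body according to its Content-Encoding."""
    if content_encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    if content_encoding == 'deflate':
        try:
            return zlib.decompress(data)                   # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(data, -zlib.MAX_WBITS)  # raw deflate
    return data  # identity or unknown encoding: pass through unchanged
```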

Using paired certificates with urllib2

I need to create a secure channel between my server and a remote web service. I'll be using HTTPS with a client certificate. I'll also need to validate the certificate presented by the remote service. How can I use my own client certificate with urllib2? What will I need to do in my code to ensure that the remote certificate is corre...

Retrieve the source of a dynamic website using python (bypassing onclick)

I wish to retrieve the source of a website that is dynamically generated upon clicking a link. The link itself is as below:

    <a onclick="function(); return false" href="#">Link</a>

This stops me from directly querying for a URL that would let me retrieve the dynamically generated page (urllib/urllib2). How would one retrieve the source...

Will this urllib2 python code download the page of the file?

    urllib2.urlopen(theurl).read()

...this downloads the file.

    urllib2.urlopen(theurl).geturl()

...does this download the file? (How long does it take?) ...

python checking for files

Hello, I'm learning Python here. I want to write a script to check whether my webserver has a picture named 123.jpg in its root. I have:

    import urllib2
    numeruks = 100
    adresiuks = "http://localhost/" + str(numeruks) + ".jpg"
    try:
        if numeruks < 150:
            numeruks = numeruks + 1
            urllib2.urlopen(adresiuks).read()

I've been reading manuals all day and can't s...
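The check needs a loop plus an except clause, because urlopen raises HTTPError for a missing picture rather than returning an error value. A sketch with the fetch call passed in as a parameter so the loop logic stands on its own (the function name and parameters are mine); with urllib2 the real fetch would be `lambda url: urllib2.urlopen(url).read()`:

```python
def find_existing(base, start, stop, fetch):
    """Return the numbers n in [start, stop) for which
    fetch(base + str(n) + '.jpg') succeeds without raising."""
    found = []
    for n in range(start, stop):
        url = base + str(n) + ".jpg"
        try:
            fetch(url)          # e.g. urllib2.urlopen(url).read()
        except Exception:       # urllib2 raises HTTPError on a 404
            continue
        found.append(n)
    return found

# Usage with urllib2 (Python 2):
#   import urllib2
#   find_existing("http://localhost/", 100, 150,
#                 lambda url: urllib2.urlopen(url).read())
```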

help with python urllib2 import error

In my script, I've imported urllib2 and the script was working fine. After a reboot, I get the following error:

    File "demo.py", line 2, in <module>
        import urllib2
    File "/usr/lib/python2.6/urllib2.py", line 92, in <module>
        import httplib
    File "/usr/lib/python2.6/httplib.py", line 78, in <module>
        import mimetools
    File "/...

python - urllib2 request to an https site - getting a 400 error

Hi... I'm using the following snippet of code to access a URL with a POST. I can get it using wget with the following:

    wget --post-data 'p_calling_proc=bwckschd.p_disp_dyn_sched&p_term=201010' https://spectrumssb2.memphis.edu/pls/PROD/bwckgens.p%5Fproc%5Fterm%5Fdate

For some reason, I'm having an issue with my Python test, in that I get an er...
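wget's --post-data sends an application/x-www-form-urlencoded body; in Python the same body can be built with urlencode and passed as the data argument to urlopen, which turns the request into a POST. A sketch (works on Python 2 via urllib.urlencode and on Python 3 via urllib.parse.urlencode):

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# A list of tuples preserves the field order, matching wget's --post-data.
fields = [('p_calling_proc', 'bwckschd.p_disp_dyn_sched'),
          ('p_term', '201010')]
post_data = urlencode(fields)

# With urllib2 this body would then be POSTed as:
#   urllib2.urlopen('https://spectrumssb2.memphis.edu/pls/PROD/bwckgens.p%5Fproc%5Fterm%5Fdate',
#                   post_data)
```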

python fetching multiple pages using post and cookies

Hi. I've got a test site that I'm fetching. The site uses the POST method, as well as cookies. (I'm not sure the cookies are critical, but I'm inclined to think they are.) The app presents a page with a "next" button to generate the subsequent pages. I've used LiveHttpHeaders/Firefox to determine what the POST data should be in the query...

Passing input hidden params through urllib2 POST request

I need to make a POST request to a CAS SSO server login page, and the CAS login page has a few hidden input params which are dynamically populated through Java. I don't know how to read these hidden param values from the response and pass them to the CAS server. Without passing these hidden params I am not able to log in. Does anyone know how to read input hidd...
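The hidden fields can be scraped out of the login page's HTML (fetched with a first GET) before building the POST body. A regex-based sketch, sufficient for simple pages (the function name is mine, and it assumes name appears before value in each tag, which holds for typical CAS login forms):

```python
import re

# Matches <input ... type="hidden" ... name="..." ... value="...">.
_HIDDEN_RE = re.compile(
    r'<input[^>]*type=["\']hidden["\'][^>]*'
    r'name=["\']([^"\']+)["\'][^>]*'
    r'value=["\']([^"\']*)["\']',
    re.IGNORECASE)

def hidden_fields(html):
    """Return a dict of hidden-input name -> value found in html."""
    return dict(_HIDDEN_RE.findall(html))
```

The resulting dict would be merged with the username/password fields, urlencoded, and POSTed back to the CAS login URL.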

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

    import urllib
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results, while:

    import urllib2
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/...

get many pages with pycurl ?

I want to get many pages from a website, like

    curl "http://farmsubsidy.org/DE/browse?page=[0000-3603]" -o "de.#1"

but get the pages' data in Python, not in files on disk. Can someone please post pycurl code to do this, or fast urllib2 (not one-at-a-time) if that's possible, or else say "forget it, curl is faster and more robust"? Thanks ...
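Whichever client is chosen, the first step is expanding curl's [0000-3603] range notation into zero-padded URLs; the fetching itself can then be parallelized with pycurl's CurlMulti interface or a thread pool around urllib2. A sketch of just the expansion (the fetch part is left to the chosen library):

```python
def page_urls(first, last):
    """Expand a curl-style range [0000-3603] into zero-padded URLs."""
    return ["http://farmsubsidy.org/DE/browse?page=%04d" % n
            for n in range(first, last + 1)]
```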

Using urllib2.urlopen fails for binary data?

I'm using Python to programmatically download a zip file from a web server. Using a web browser, it's fine. I've written this (partial) script:

    response = urllib2.urlopen(url, data, 10)
    the_page = response.read()
    f = open(filename, 'w')
    f.write(the_page)
    f.close()

The request succeeds and I get data. The problem is that the file I'm do...

How to Speed Up Python's urllib2 when doing multiple requests

I am making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive using urllib2? ...

Should I use urllib or urllib2?

In Python (2.5), should I use urllib or urllib2? What's the difference? They seem to do the same thing. (Bonus points, if I'm using Google App Engine, does this change the answer?) ...

Download file using urllib in Python with the wget -c feature

Hello, I am writing software in Python to download HTTP PDFs from a database. Sometimes the download stops with this message:

    retrieval incomplete: got only 3617232 out of 10689634 bytes

How can I ask the download to restart where it stopped, using the 206 Partial Content HTTP feature? I can do it using wget -c and it works pret...