questions about urllib2 | ansaurus

urllib2

Downloading a picture via urllib and python.

So I'm trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I've found a few programs on here that do something similar, but nothing quite like what I need. The one that I found most similar is right here (http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-imag...
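A minimal sketch of the download step described above, assuming Python 3, where `urllib.request` took over the role of `urllib`/`urllib2`. The function name `download_file` and its parameters are hypothetical, not taken from the question:

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_dir, filename):
    """Stream the resource at `url` into dest_dir/filename and return the path.

    Hypothetical helper for illustration; the names are not from the question.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = dest_dir / filename
    # urlopen returns a file-like response, so it can be copied to disk in chunks
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Copying in chunks avoids holding a whole image in memory, which may matter when many comics are fetched in one run.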

How to make a POST request with python-webkit?

Hi, I'm new to using python + webkit. I need to make a POST request with webkit, but I don't know how to do it. I use python-webkit because my app loads a form in the GUI (for votes, comments, and sending more data), and I need to send all this data with a POST request and load the HTML result sent by the server into my GUI app with python-webkit. I have o...

how can I get href links from html code

Hello, import urllib2 website = "WEBSITE" openwebsite = urllib2.urlopen(website) html = openwebsite.read() print html So far so good. But I want only the href links from the plain-text HTML. How can I solve this problem? ...
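One possible way to pull out only the href values, sketched with the standard-library `html.parser` module and assuming Python 3. The names `HrefCollector` and `extract_hrefs` are hypothetical:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_hrefs(html):
    """Return the href values found in an HTML string, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.links
```

The `html` argument is the decoded page text, for example the result of `read()` after decoding the bytes.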

Python urllib2 HTTPBasicAuthHandler

Here is the code: import urllib2 as URL def get_unread_msgs(user, passwd): auth = URL.HTTPBasicAuthHandler() auth.add_password( realm='New mail feed', uri='https://mail.google.com', user='%s'%user, passwd=passwd ) opener = URL.build_opener(auth) URL.install_ope...
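A rough Python 3 sketch of the same setup, using `urllib.request` in place of `urllib2`. The helper name `build_auth_opener` is hypothetical, and the realm and URI are simply the ones visible in the excerpt; whether the server still accepts basic auth for that realm is not something this sketch can confirm:

```python
import urllib.request


def build_auth_opener(user, passwd,
                      realm="New mail feed",
                      uri="https://mail.google.com"):
    """Return (opener, handler) with basic-auth credentials registered.

    Hypothetical helper; the defaults mirror the values in the question.
    """
    handler = urllib.request.HTTPBasicAuthHandler()
    # Credentials are only sent when the server challenges with this realm
    handler.add_password(realm=realm, uri=uri, user=user, passwd=passwd)
    opener = urllib.request.build_opener(handler)
    return opener, handler
```

Returning the opener instead of calling `install_opener` keeps the credentials out of the process-wide default, which is a design choice rather than a requirement.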

How to POST an xml element in python

Basically I have this xml element (xml.etree.ElementTree) and I want to POST it to a url. Currently I'm doing something like xml_string = xml.etree.ElementTree.tostring(my_element) data = urllib.urlencode({'xml': xml_string}) response = urllib2.urlopen(url, data) I'm pretty sure that works and all, but was wondering if there is some ...
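An alternative to form-encoding the XML is to send the serialized element as the raw request body. The sketch below assumes Python 3 and assumes the server accepts a body with `Content-Type: application/xml`, which the question does not state. The name `build_xml_post` is hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_xml_post(url, element):
    """Build (but do not send) a POST request whose body is the serialized element."""
    # With an explicit encoding, tostring returns bytes, as Request requires
    body = ET.tostring(element, encoding="utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/xml"},  # assumption: server expects XML
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.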

urllib2 and cookielib thread safety

As far as I've been able to tell, cookielib isn't thread safe; but then again, the post stating so is five years old, so it might be wrong. Nevertheless, I've been wondering: if I spawn a class like this: class Acc: jar = cookielib.CookieJar() cookie = urllib2.HTTPCookieProcessor(jar) opener = urllib2.build_opener(coo...
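One way to sidestep the thread-safety question is to give each thread its own cookie jar and opener, so nothing is shared. This is a sketch under the assumption of Python 3 (`http.cookiejar` and `urllib.request` replace `cookielib` and `urllib2`); `get_opener` is a hypothetical name:

```python
import http.cookiejar
import threading
import urllib.request

# Each thread sees its own attributes on this object
_local = threading.local()


def get_opener():
    """Return an opener with a cookie jar private to the calling thread."""
    opener = getattr(_local, "opener", None)
    if opener is None:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(jar)
        )
        _local.jar = jar
        _local.opener = opener
    return opener
```

The trade-off is that cookies set in one thread are not visible in another, which may or may not suit the application.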

Using urllib2 with Jython 2.2

I'm working with a product that has a built-in Jython 2.2 instance. It comes with none of the Python standard libraries. When I run this instance of Jython, the default path is ['./run/Jython/Lib', './run/Jython', '__classpath__'] I added all of the .py module files from Python 2.2 to the ./run/Jython/Lib directory, and I am able t...

Make urllib retry multiple times

My Python application makes a lot of HTTP requests using the urllib2 module. This application might be used over very unreliable networks where latencies could be high and dropped packets and network timeouts might be very common. Is it possible to override a part of the urllib2 module so that each request is retried X number of times ...
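Instead of overriding part of the module, a small wrapper function can do the retrying. The sketch below assumes Python 3's `urllib.request`; the name `urlopen_with_retries` and its parameters are hypothetical, and the fixed delay between attempts is an illustrative choice:

```python
import time
import urllib.request


def urlopen_with_retries(url, retries=3, delay=1.0, timeout=10,
                         opener=urllib.request.urlopen):
    """Call opener(url, timeout=...) up to `retries` times before giving up."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return opener(url, timeout=timeout)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)  # wait before the next attempt
    raise last_error
```

Note that this also retries HTTP error responses such as 404; filtering on `urllib.error.HTTPError` codes would be a reasonable refinement.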

Mocking urllib2.urlopen and lxml.etree.parse using pymox

I'm trying to test some Python code that uses urllib2 and lxml. I've seen several blog posts and Stack Overflow posts where people want to test exceptions being thrown with urllib2. I haven't seen examples testing successful calls. Am I going down the correct path? Does anyone have a suggestion for getting this to work? Here is what...

Python 2.6: parallel parsing with urllib2

Hi there, I'm currently retrieving and parsing pages from a website using urllib2. However, there are many of them (more than 1000), and processing them sequentially is painfully slow. I was hoping there was a way to retrieve and parse pages in a parallel fashion. If that's a good idea, is it possible, and how do I do it? Also, what...
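Since fetching pages is mostly waiting on the network, a thread pool is one common way to overlap the requests. This sketch assumes Python 3 (`concurrent.futures` is not available in Python 2.6 without a backport); `fetch`, `fetch_all`, and the worker count are hypothetical choices:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Apply fetch_one to every URL concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return list(pool.map(fetch_one, urls))
```

Parsing could be done inside `fetch_one` as well, although CPU-heavy parsing gains less from threads than the downloads do.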

Authenticated HTTP POST with XML payload using Python urllib2

I'm trying to send a POST message with a purely XML payload (I think) using urllib2 in IronPython. However, every time I send it, it returns error code 400 (Bad Request). I'm actually trying to mimic a Boxee remove queue item call, for which the actual data packets look like this (from Wireshark): POST /action/add HTTP/1.1 User-Agent:...

How to debug socket error

Hi! I've this code: 1 upload_odl function import os import urllib2_files import urllib2 user = 'patrick' password = 'mypass' url = 'http://localhost:8000/api/odl/' password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm() password_manager.add_password( None, url, user, password ) auth_handler = urllib2.HTTPBasicAuthHandler(p...

django views urllib2.py https error twilio api

I'm looking to send an SMS with the Twilio API, but I'm getting the following error: "unknown url type: https" I've recompiled Python with OpenSSL, so my code runs fine from the Python interpreter, but whenever I try to run it in one of my Django views I get this error. Here is my code from my view: def send_sms(request): recipient ...

Help with HTML parsing and sending requests to a web server

Hello, I'm working on a small project and I've run into a small problem. The script I have needs to fetch a website and find a specific value in the source HTML file. The value is like this: id='elementID'> <fieldset> <input type='hidden' name='hash' value='e46c945fe32a3' /> </fieldset> Now I've been trying to use the Elemen...

catch specific HTTP error in python

I want to catch a specific HTTP error and not just any error of the entire family. What I was trying to do is: import urllib2 try: urllib2.urlopen("some url") except urllib2.HTTPError: <whatever> but what I end up doing is catching any kind of HTTP error, and I want to catch it only if the specified webpage doesn't exist! Probably that's HT...
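A common pattern is to catch `HTTPError`, inspect its `code` attribute, and re-raise anything that is not the status of interest. The sketch assumes Python 3 (`urllib.error.HTTPError`; in Python 2 the same attribute exists on `urllib2.HTTPError`). The name `fetch_or_none_if_missing` is hypothetical:

```python
import urllib.error
import urllib.request


def fetch_or_none_if_missing(url, opener=urllib.request.urlopen):
    """Return the page body, or None if the server answers 404 Not Found."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the page does not exist
        raise  # any other HTTP error is passed on to the caller
```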

urllib2 connection timed out error

I am trying to open a page using urllib2 but I keep getting connection timed out errors. The line I am using is: f = urllib2.urlopen(url) The exact error is: URLError: <urlopen error [Errno 110] Connection timed out> ...

BeautifulSoup get innerhtml data

I am trying to read data from a website. I can see the value I need, but the value does not appear in the downloaded HTML code (using urllib2). The value is created by some JS file and embedded into the webpage as innerHTML for that id. How can that be extracted? Unlike a browser, fetching the raw source code does not run the JS! ...

urllib2 returns a different page than the browser does?

I'm trying to scrape a page (my router's admin page) but the device seems to be serving a different page to urllib2 than to my browser. Has anyone seen this before? How can I get around it? This is the code I'm using: >>> from BeautifulSoup import BeautifulSoup >>> import urllib2 >>> page = urllib2.urlopen("http://192.168.1.254/index.cgi...
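One possible cause, not confirmed by the excerpt, is that the device varies its response by `User-Agent` header, since the library sends its own default value. A sketch of sending a browser-like header, assuming Python 3; the function name and the user-agent string are hypothetical placeholders:

```python
import urllib.request


def build_browser_like_request(url,
                               user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Build a request that identifies itself with a browser-style User-Agent."""
    # The header value is an illustrative placeholder, not a required string
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

If the page still differs, cookies or JavaScript-driven redirects would be other things to check.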

screen-scraping

Checking whether a link is dead or not using Python without downloading the webpage

For those who know wget, it has an option --spider, which allows one to check whether a link is broken or not, without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100'000 links I want to check, at most once a day, and at least once a week. In any case this will generate...
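A HEAD request asks the server for the headers only, which is roughly the idea behind checking a link without downloading the page. The sketch assumes Python 3, where `Request` accepts a `method` argument; `link_is_alive` is a hypothetical name, and some servers may reject or mishandle HEAD:

```python
import urllib.request


def link_is_alive(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` succeeds with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")  # headers only, no body
    try:
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # Covers HTTPError (4xx/5xx), URLError (DNS, refused) and socket timeouts
        return False
```

For a very large list of links, this could be combined with a thread pool so that slow hosts do not hold up the whole run.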

Python URLLib / URLLib2 POST

I'm trying to create a super-simplistic Virtual In / Out Board using wx/Python. I've got the following code in place for one of my requests to the server where I'll be storing the data: data = urllib.urlencode({'q': 'Status'}) u = urllib2.urlopen('http://myserver/inout-tracker', data) for line in u.readlines(): print line Nothing...
