urlopen

timeout for urllib2.urlopen() in pre Python 2.6 versions

The urllib2 documentation says the timeout parameter was added in Python 2.6. Unfortunately my code base has been running on Python 2.5 and 2.4 platforms. Is there any alternate way to simulate the timeout? All I want to do is allow the code to talk to the remote server for a fixed amount of time. Perhaps any alternative built-in library...
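A common workaround on those versions (a sketch, not urllib2's own API) is a process-wide default socket timeout, which the sockets urlopen creates will inherit:

```python
import socket

# Before Python 2.6 grew a timeout parameter on urlopen, the usual
# workaround was a process-wide default socket timeout; the sockets
# urlopen opens under the hood inherit it.
socket.setdefaulttimeout(10)  # seconds

# From here on, urlopen() raises socket.timeout (an IOError subclass)
# if the server stalls for longer than 10 seconds.
```

The obvious caveat is that this is global: every socket in the process gets the same timeout, not just the urlopen calls.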

Tell urllib2 to use custom DNS

I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't change my /etc/resolv.conf, however. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom url opener. I'd prefer telling urlopen to use a custom nameserver though. Any suggest...
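One sketch of the httplib-style route mentioned above: resolve the name yourself (e.g. with dnspython against 127.0.0.1), request the IP directly, and send a Host header so name-based virtual hosting still works. `request_via_ip` is a hypothetical helper; shown with Python 3's urllib.request (urllib2 in Python 2):

```python
import urllib.request  # urllib2 in Python 2

def request_via_ip(ip, hostname, path="/"):
    # Hypothetical helper: the caller resolves hostname via its own
    # nameserver and passes the resulting IP; the Host header keeps
    # virtual-hosted servers happy.
    req = urllib.request.Request("http://%s%s" % (ip, path))
    req.add_header("Host", hostname)
    return req

req = request_via_ip("127.0.0.1", "example.com", "/index.html")
# urllib.request.urlopen(req) would now connect to 127.0.0.1 directly.
```

Note this only covers plain HTTP; for HTTPS the certificate check is against the name, so a custom connection class is needed instead.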

urllib ignore authentication requests

Hi, I'm having a little trouble creating a script working with URLs. I'm using urllib.urlopen() to get the content of a desired URL. But some of these URLs require authentication, and urlopen prompts me to type in my username and then password. What I need is to ignore every URL that requires authentication, just skip it and continue...
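The interactive prompt comes from FancyURLopener, the class behind urllib.urlopen; one sketch is to subclass it and make the prompt hook give up instead of asking (Python 3's urllib.request shown; plain urllib in Python 2):

```python
import urllib.request  # urllib.FancyURLopener in Python 2

class NoPromptOpener(urllib.request.FancyURLopener):
    # FancyURLopener calls prompt_user_passwd() when it hits a 401;
    # returning (None, None) makes it give up silently instead of
    # reading credentials from stdin, so the script can skip that URL.
    def prompt_user_passwd(self, host, realm):
        return None, None
```

With `NoPromptOpener().open(url)` in a try/except, protected URLs fail quickly and the loop moves on.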

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

I'm using urllib2's urlopen function to try and get a JSON result from the StackOverflow api. The code I'm using: >>> import urllib2 >>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/") >>> conn.readline() The result I'm getting: '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\... I'm...
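Those leading `\x1f\x8b` bytes are the gzip magic number, which suggests the API is returning a gzip-compressed body. A sketch of a decoder (gzip.decompress is Python 3; in Python 2 use `gzip.GzipFile(fileobj=StringIO.StringIO(raw)).read()`):

```python
import gzip

# '\x1f\x8b' at the start of a response is the gzip magic number: the
# body is compressed and must be decompressed before parsing the JSON.
def decode_body(raw):
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw
```

So `decode_body(conn.read())` should yield the JSON text the question was expecting.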

Python - Which of these is a good way to request an API?

Whenever looking at API libraries for Python, there seems to be about half of them simply using: response = urllib2.urlopen('https://www.example.com/api', data) and about half using: connection = httplib.HTTPSConnection('www.example.com/api') # ... rest omitted for simplicity I tend to think the second version is "cooler" (I'm bias...

Python - merging many url's and parsing them

Hi, below is a script that I found on a forum, and it is almost exactly what I need, except I need to read about 30 different URLs and print them all together. I have tried a few options but the script just breaks. How can I fetch all 30 URLs, parse them, and then print them out? If you can help me I would be very grateful, ty. import sys import str...
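Without seeing the full script, a generic sketch of the loop: wrap whatever already works for one page in a function and iterate, catching failures so one bad URL doesn't break the run (`fetch` here stands in for the existing single-page code):

```python
def fetch_all(urls, fetch):
    # fetch is whatever already works for one page, e.g.
    # lambda u: urllib2.urlopen(u).read(); a URL that fails is
    # recorded as None instead of aborting the whole run.
    results = []
    for url in urls:
        try:
            results.append((url, fetch(url)))
        except IOError:
            results.append((url, None))
    return results
```

Parsing and printing then happen once, over the collected `results` list.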

Google App Engine: upload_data fails because "target machine actively refused it" on devserver

I'm trying to upload data from a CSV to my app using the devserver: appcfg.py upload_data --config_file="DataLoader.py" --filename="data.csv" --kind=Foo --url=http://localhost:8083/remote_api "path/to/app" The result: Application: appname; version: 1. Uploading data records. [INFO ] Logging to bulkloader-log-20100626.181045 [INFO ...

Python: Urllib.urlopen nonnumeric port

For the following code theurl = "https://%s:%s@members.dyndns.org/nic/update?hostname=%s&myip=%s&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG" % (username, password, hostname, theip) conn = urlopen(theurl) # send the request to the url print(conn.read()) # read the response conn.close() # close the connection I get the fol...
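The "nonnumeric port" error usually means httplib split the network location on ':' and found a non-digit, which happens when the username or password contains a character like ':', '@', or '/'. A sketch of the usual fix, percent-encoding the credentials first (urllib.parse.quote in Python 3, urllib.quote in Python 2; the credentials below are made-up samples):

```python
import urllib.parse  # urllib.quote in Python 2

# httplib finds the port by splitting the netloc on ':', so a password
# containing ':' (or '@', '/', '#') is misread as a non-numeric port.
# Percent-encode both credentials before interpolating them.
username = urllib.parse.quote("myuser", safe="")
password = urllib.parse.quote("p@ss:word", safe="")  # sample password
theurl = ("https://%s:%s@members.dyndns.org/nic/update?hostname=%s&myip=%s"
          % (username, password, "example.dyndns.org", "203.0.113.5"))
```

An alternative is to drop the credentials from the URL entirely and use an HTTPBasicAuthHandler instead.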

Python: urlopen not downloading the entire site.

Greetings, I have done: import urllib site = urllib.urlopen('http://www.weather.com/weather/today/Temple+TX+76504') site_data = site.read() site.close() but it doesn't compare to viewing the source when loaded in Firefox. I suspected the user agent and did this: class AppURLopener(urllib.FancyURLopener): version = "Mozilla/5.0...
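A browser-like User-Agent is worth trying, though if the missing pieces are built by JavaScript after page load they will still be absent, since urlopen only fetches the raw HTML the browser starts from. A sketch with a per-request header (Python 3's urllib.request; urllib2.Request in Python 2):

```python
import urllib.request  # urllib2 in Python 2

# Some sites serve different (or trimmed) HTML to unrecognised clients,
# so send a browser-like User-Agent with the request. Content injected
# by JavaScript after load will still be missing either way.
req = urllib.request.Request(
    "http://www.weather.com/weather/today/Temple+TX+76504",
    headers={"User-Agent": "Mozilla/5.0"},
)
# urllib.request.urlopen(req).read() then fetches with that header.
```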

Caching options in Python or speeding up urlopen

Hey all, I have a site that looks up info for the end user, is written in Python, and requires several urlopen commands. As a result it takes a bit for a page to load. I was wondering if there was a way to make it faster? Is there an easy Python way to cache, or a way to make the urlopen calls run faster? The urlopens access the Amazon ...
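A minimal in-process cache sketch: keep each response in a dict with a timestamp and reuse it while it is fresh (`fetch` stands in for the existing urlopen-based call; the 300-second lifetime is an arbitrary choice):

```python
import time

_cache = {}

def cached_fetch(url, fetch, max_age=300):
    # fetch is the existing urlopen-based call; a result is reused for
    # max_age seconds, so repeated page loads skip the network entirely.
    stamped = _cache.get(url)
    if stamped is not None and time.time() - stamped[0] < max_age:
        return stamped[1]
    body = fetch(url)
    _cache[url] = (time.time(), body)
    return body
```

For a multi-process site the same idea is usually moved into something shared, like memcached.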

How can I speed up fetching pages with urllib2 in python?

I have a script that fetches several web pages and parses the info. (An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 ) I ran cProfile on it, and as I assumed, urlopen takes up a lot of time. Is there a way to fetch the pages faster? Or a way to fetch several pages at once? I'll do whatever...
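Since the time is dominated by waiting on the network, fetching the pages concurrently usually helps. A sketch with one thread per URL (`fetch` stands in for the existing urlopen-based call, injected so the code is easy to test):

```python
import threading

def fetch_parallel(urls, fetch):
    # One worker thread per URL; urlopen's network I/O releases the
    # GIL, so the downloads overlap instead of running back to back.
    results = {}

    def worker(u):
        results[u] = fetch(u)

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For more than a handful of URLs a bounded pool of workers is kinder to the remote server than one thread each.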

Detecting timeout errors in Python's urllib2 urlopen

I'm still relatively new to Python, so if this is an obvious question, I apologize. My question is in regard to the urllib2 library and its urlopen function. Currently I'm using this to load a large number of pages from another server (they are all on the same remote host), but the script is killed every now and then by a timeout error...
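Rather than letting the timeout kill the script, the exception can be caught and the fetch retried. A sketch (`opener` stands in for urllib2.urlopen, injected to keep the retry logic testable; note urllib2 sometimes surfaces a timeout as URLError with `e.reason` set to socket.timeout, which real code may want to unwrap as well):

```python
import socket

def fetch_with_retry(url, opener, retries=3):
    # opener is e.g. urllib2.urlopen; retry a few times on a stalled
    # connection, then re-raise the last timeout if all attempts fail.
    last_error = None
    for _ in range(retries):
        try:
            return opener(url)
        except socket.timeout as e:
            last_error = e
    raise last_error
```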

Python auth_handler not working for me

I've been reading about Python's urllib2's ability to open and read directories that are password protected, but even after looking at examples in the docs, and here on StackOverflow, I can't get my script to work. import urllib2 # Create an OpenerDirector with support for Basic HTTP Authentication... auth_handler = urllib2.HTTPBasicAut...
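A frequent cause of silent failure here is a realm string that doesn't exactly match the one in the server's WWW-Authenticate header, in which case the handler never fires. One sketch that sidesteps realm matching entirely, using a default-realm password manager (Python 3's urllib.request shown, urllib2 in Python 2; URL and credentials are placeholders):

```python
import urllib.request  # urllib2 in Python 2

# HTTPPasswordMgrWithDefaultRealm with realm=None matches any realm,
# avoiding the "realm string doesn't match the server's" pitfall.
pw_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
pw_mgr.add_password(None, "http://example.com/protected/", "alice", "secret")
opener = urllib.request.build_opener(
    urllib.request.HTTPBasicAuthHandler(pw_mgr))
urllib.request.install_opener(opener)  # plain urlopen() now uses it
```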

Does urllib2.urlopen() cache stuff?

They didn't mention this in the Python documentation. Recently I've been testing a website, simply refreshing it using urllib2.urlopen() to extract certain content, and I notice that sometimes when I update the site, urllib2.urlopen() doesn't seem to get the newly added content. So I wonder: does it cache stuff somewhere? ...

How to bypass WP Super Cache using Python?

Hi guys. I'm trying to collect data from a frequently updating blog, so I simply use a while loop that includes urllib2.urlopen("http://example.com") to refresh the page every 5 minutes and collect the data I want. But I notice that I'm not getting the most recent content this way; it's different from what I see via browser su...
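urllib2 itself doesn't cache, so the stale copies are most likely served by WP Super Cache or an intermediate proxy. A sketch of the usual countermeasures, no-cache headers plus a throwaway query parameter (Python 3's urllib.request shown, urllib2 in Python 2; the parameter name "nocache" is arbitrary):

```python
import time
import urllib.request  # urllib2 in Python 2

# A unique query string makes each URL distinct, and the headers ask
# any well-behaved cache along the way not to serve a stored copy.
url = "http://example.com/?nocache=%d" % time.time()
req = urllib.request.Request(url, headers={
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
})
# urllib.request.urlopen(req).read() should then reach the live page.
```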

Why can't urllib2.urlopen open pages like "http://localhost/new-post#comment-29"?

I'm curious: how come I get a 404 error running this line: urllib2.urlopen("http://localhost/new-post#comment-29") while everything works fine surfing http://localhost/new-post#comment-29 in any browser... Does the urlopen method not parse URLs with "#" in them? Anybody know? ...
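Everything after "#" is a fragment, which browsers resolve locally and never send to the server; when urlopen passes it through in the request line, the server can respond 404. A sketch of stripping it first (urllib.parse.urldefrag in Python 3, urlparse.urldefrag in Python 2):

```python
import urllib.parse  # urlparse in Python 2

# Split off the fragment before fetching; browsers never send it,
# so the server only ever sees the part before the '#'.
url, fragment = urllib.parse.urldefrag("http://localhost/new-post#comment-29")
# urlopen(url) now requests /new-post; scrolling to comment-29 is a
# client-side concern.
```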

Python mechanize JavaScript submit button problem

Hello all, I'm writing a script with the mechanize.browser module. Everything else works fine, but when I submit() the form it doesn't work, so I went looking for the suspicious part of the source. In the HTML source I found something like the following. I think loginCheck(this) is causing the problem when the form is submitted, but how do I handle this kind of...

Urllib raising invalid argument URLError in Python 3, urllib.request.urlopen

Hi. New to Python, but I'm trying to...retrieve data from a site: import urllib.request response = urllib.request.urlopen("http://www.python.org") This is the same code I've seen from the Python 3.1 docs. And a lot of sites. However, I get: Message File Name Line Position Traceback <module> G:\My...

AppEngine no host given exception

Hello. I've got a Python app that uses urllib.urlopen. It works fine on dev_appserver.py, but throws an [Errno http error] no host given error on my GAE production server. The code is exactly the same, and the URL it connects to is hardcoded. I'm out of ideas about what could be wrong. UPD: the code: def getPic(url): sock = urllib.urlopen("...

unbuffered urllib2.urlopen

I have a client for a web interface to a long-running process. I'd like the output from that process to be displayed as it comes. It works great with urllib.urlopen(), but that doesn't have a timeout parameter. On the other hand, with urllib2.urlopen() the output is buffered. Is there an easy way to disable that buffer? ...
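One sketch that avoids waiting for the whole body: read the response in small chunks and hand each on as soon as the socket delivers it, instead of calling response.read() once at the end:

```python
def iter_chunks(response, chunk_size=1024):
    # Works with any file-like response object (urllib or urllib2):
    # read a chunk at a time and yield it immediately, rather than
    # blocking until the entire (buffered) body has arrived.
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Used as `for chunk in iter_chunks(urllib2.urlopen(url, timeout=30)): display(chunk)`, this keeps both the timeout and the incremental display.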