tags:

views:

498

answers:

2

So I have the following lines of code in a function

sock = urllib.urlopen(url)
html = sock.read()
sock.close()

and they work fine when I call the function by hand. However, when I call the function in a loop (using the same urls as earlier) I get the following error:

> Traceback (most recent call last):
  File "./headlines.py", line 256, in <module>
    main(argv[1:])
  File "./headlines.py", line 37, in main
    write_articles(headline, output_folder + "articles_" + term +"/")
  File "./headlines.py", line 232, in write_articles
    print get_blogs(headline, 5)
  File "/Users/michaelnussbaum08/Documents/College/Sophmore_Year/Quarter_2/Innovation/Headlines/_code/get_content.py", line 41, in get_blogs
    sock = urllib.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 314, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given

Any ideas?

Edit more code:

def get_blogs(term, num_results):
    search_term = term.replace(" ", "+")
    print "search_term: " + search_term
    url = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&amp;q='+search_term+'&amp;ie=utf-8&amp;num=10&amp;output=rss'
    print "url: " +url  

    #error occurs on line below

    sock = urllib.urlopen(url)
    html = sock.read()
    sock.close()

def write_articles(headline, output_folder, num_articles=5):

    #calls get_blogs

    if not os.path.exists(output_folder):
    os.makedirs(output_folder)

    output_file = output_folder+headline.strip("\n")+".txt"
    f = open(output_file, 'a')
    articles = get_articles(headline, num_articles)
    blogs = get_blogs(headline, num_articles)


    #NEW FUNCTION
    #the loop that calls write_articles
    for term in trend_list: 
        if do_find_max == True:
        fill_search_term(term, output_folder)
    headlines = headline_process(term, output_folder, max_headlines, do_find_max)
    for headline in headlines:
    try:
        write_articles(headline, output_folder + "articles_" + term +"/")
    except UnicodeEncodeError:
        pass
A: 

In your function's loop, right before the call to urlopen, perhaps put a print statement:

print(url)
sock = urllib.urlopen(url)

This way, when you run the script and get the IOError, you will see the url which is causing the problem. The error "no host given" can be replicated if url equals something like 'http://'...

unutbu
Michael
Can you update the question so we can see the loop?
Eddy Pronk
So does it always crash on the same query or difference ones?Are you behind a web proxy?
MK
+1  A: 

use urllib2 instead if you don't want to handle reading on a per block basis yourself. This probably does what you expect.

import urllib2
req = urllib2.Request(url='http://stackoverflow.com/')
f = urllib2.urlopen(req)
print f.read()
Eddy Pronk
Good idea but no luck, I get "urllib2.URLError: <urlopen error no host given>", both errors are saying no "no host", but I don't know why...
Michael