Does anyone know whether Yahoo's search API has a parameter that restricts results to links to files of a specific type (PDF, for example)? It's possible to do that in the GUI, but how can it be done through the API?

I'd very much appreciate sample code in Python, but any other solution would be helpful as well.

A: 

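Thank you. I found out myself that something like this works OK (the file type is the first argument, and the query is everything after it):

import sys
from yahoo.search.factory import create_search  # assumed source: the pYsearch package

file_format = sys.argv[1]  # e.g. "pdf"; renamed from `format` to avoid shadowing the built-in
query = " ".join(sys.argv[2:])

# app_id is your Yahoo application ID, defined elsewhere
srch = create_search("Web", app_id, query=query, format=file_format)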

A: 

Here's what I do for this sort of thing. It exposes more of the parameters so you can tune it to your needs. This should print the URLs of the first ten PDFs returned for the query "resume" [mine's not one of them ;) ]. You can download those URLs however you like.

The JSON dictionary returned from the query is a little gross, but this should get you started. Be aware that in real code you will need to check whether some of the keys in the dictionary exist. When there are no results, this code will probably raise an exception.
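A minimal sketch of that kind of key check, assuming the response shape used in this answer (a top-level `ysearchresponse` dictionary holding a `resultset_web` list whose records carry a `url`). The helper name `extract_urls` is hypothetical, not part of any Yahoo library:

```python
def extract_urls(json_result):
    """Return the result URLs, or an empty list when expected keys are missing."""
    if not isinstance(json_result, dict):
        return []
    # Assumed shape: {"ysearchresponse": {"resultset_web": [{"url": ...}, ...]}}
    response = json_result.get("ysearchresponse") or {}
    records = response.get("resultset_web") or []
    # Skip records without a 'url' key instead of raising KeyError.
    return [rec["url"] for rec in records if isinstance(rec, dict) and "url" in rec]


# No result set at all: an empty list rather than an exception.
print(extract_urls({"ysearchresponse": {}}))

# One complete record and one record that lacks a URL.
sample = {"ysearchresponse": {"resultset_web": [{"url": "http://example.com/a.pdf"}, {"title": "no url"}]}}
print(extract_urls(sample))
```

Using `.get(...) or {}` covers both a missing key and a key that is present but empty, which is why the no-results case falls through to an empty list.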

The link that Tiago provided is good for knowing what values are supported for the "type" parameter.

from yos.crawl import rest

APPID = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # replace with your own application ID
base_url = "http://boss.yahooapis.com/ysearch/%s/v%d/%s?start=%d&count=%d&type=%s" + "&appid=" + APPID
querystr = "resume"
start = 0
count = 10
file_type = "pdf"  # renamed from `type` to avoid shadowing the built-in
search_url = base_url % ("web", 1, querystr, start, count, file_type)
json_result = rest.load_json(search_url)
# Print the URL of each result record
for rec in json_result['ysearchresponse']['resultset_web']:
    print(rec['url'])
Owen
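The URL in the answer above interpolates the query string directly, which works for a single word like "resume" but would likely break for queries containing spaces or reserved characters. Below is a hedged sketch of building the same URL with percent-encoding. The function name `build_search_url` and its defaults are hypothetical, and the endpoint is copied from the answer as-is; it may no longer be reachable.

```python
try:
    from urllib.parse import quote, urlencode  # Python 3
except ImportError:
    from urllib import quote, urlencode  # Python 2

# Endpoint taken from the answer above; not verified to still be in service.
BOSS_BASE = "http://boss.yahooapis.com/ysearch"


def build_search_url(query, file_type, app_id, start=0, count=10, vertical="web", version=1):
    # The query is a path segment, so percent-encode everything, including "/".
    path = "%s/%s/v%d/%s" % (BOSS_BASE, vertical, version, quote(query, safe=""))
    # A list of pairs keeps the parameter order stable.
    params = [("start", start), ("count", count), ("type", file_type), ("appid", app_id)]
    return path + "?" + urlencode(params)


print(build_search_url("curriculum vitae", "pdf", "YOUR_APP_ID"))
```

The resulting string could be passed to `rest.load_json` in place of `search_url`.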