I am working on an application where I would like to retrieve a list of the day's top news stories from some source (such as the BBC) and parse these for keywords that I can use against my own tag data. There are obviously lots of webservices and APIs out there - but what would you suggest as good routes to take.
One thing I was considering is periodically downloading the RSS feed of BBC News and parsing the content using the Yahoo term extractor. This seems like a good solution to me, but the term extractor is for non-commercial use only and my application is commercial.
YQL looks promising but I'm not sure how easy it will be to condense the data down to keywords.
All suggestions welcome, both for the news source and the keyword/tag extraction, and for both commercial and non-commercial uses.
Update:
Building on the suggestion of an answer, here's the YQL for grabbing the keywords from the top UK news stores on the BBC:
select content
from search.termextract
where context in (
select title
from rss
where url='http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml'
)
which returns something like:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="46" yahoo:created="2009-11-13T11:49:05Z" yahoo:lang="en-US" yahoo:updated="2009-11-13T11:49:05Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+content+from+search.termextract+where+context+in+%28select+title+from+rss+where+url%3D%27http%3A%2F%2Fnewsrss.bbc.co.uk%2Frss%2Fnewsonline_uk_edition%2Ffront_page%2Frss.xml%27+%29">
<results>
<Result xmlns="urn:yahoo:cate">new york</Result>
<Result xmlns="urn:yahoo:cate">bolt gun</Result>
<Result xmlns="urn:yahoo:cate">stalker</Result>
<Result xmlns="urn:yahoo:cate">russia</Result>
<Result xmlns="urn:yahoo:cate">moon</Result>
<Result xmlns="urn:yahoo:cate">hijack</Result>
<Result xmlns="urn:yahoo:cate">yacht</Result>
<Result xmlns="urn:yahoo:cate">balloon</Result>
<Result xmlns="urn:yahoo:cate">parents</Result>
<Result xmlns="urn:yahoo:cate">bruce forsyth</Result>
<Result xmlns="urn:yahoo:cate">flu</Result>
Ultimately though, I don't think I can use this within a commercial app though due to the restrictions on the term extraction service.