I'm looking for software (preferably a free .NET library) that will accept a URL and determine its genre/purpose automatically, e.g. www.cnn.com = news, www.google.com = search engine. I would imagine something like that exists, and that it functions either by scraping the site and analyzing its content, or simply by comparing it against a master list. I googled and couldn't really find anything, though.

Does anyone know of anything like this?
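The "master list" approach described above can be sketched in a few lines. This is a toy illustration, not an existing library: the `SITE_GENRES` mapping and `classify_url` function are hypothetical, and a real service would back the lookup with a large curated database.

```python
from urllib.parse import urlparse

# Hypothetical hand-curated master list. A real service would
# maintain a far larger, human-reviewed database of sites.
SITE_GENRES = {
    "www.cnn.com": "news",
    "www.google.com": "search engine",
}

def classify_url(url: str) -> str:
    """Look up a URL's genre by hostname; return 'unknown' if unlisted."""
    host = urlparse(url).netloc
    if not host:
        # Handle scheme-less input like "www.google.com".
        host = urlparse("//" + url).netloc
    return SITE_GENRES.get(host, "unknown")
```

The obvious limitation, as the answers below point out, is coverage: any site not already in the list comes back as "unknown".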

+4  A: 

I would imagine something like that exists...

Why would you imagine that? It's an extraordinarily hard task for a computer. Hell, it's hard enough for a human.

Your best bet would probably be buying access to a human-curated content filter like Websense.

ceejayoz
+1  A: 

There are some services that will analyze the content of a web page for subjects (OpenCalais, Inform, OpenAmplify), but these work out what a page is about, not the purpose of a site.
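To make the distinction concrete, here is a minimal keyword-frequency sketch of what "working out what a page is about" looks like. It is a toy stand-in, not the OpenCalais API: the `TOPIC_KEYWORDS` sets and `guess_topic` function are invented for illustration, and real services use far more sophisticated entity and topic extraction.

```python
import re
from collections import Counter

# Hypothetical hand-picked keyword sets per topic (toy example only).
TOPIC_KEYWORDS = {
    "news": {"breaking", "report", "headline", "story"},
    "sports": {"score", "team", "match", "league"},
}

def guess_topic(text: str) -> str:
    """Score page text against each topic's keywords; pick the best."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {topic: sum(words[w] for w in keywords)
              for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Note this classifies a single page's content; a site's overall purpose (search engine, portal, shop) would still need a separate, curated judgment.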

You could probably use DMOZ, but you would have to write your own service around it, and it would only work for sites listed in the directory.

Jeremy French
I think Calais can work very well for me. Thanks!
Mike Pateras