views:

126

answers:

4
+2  Q: 

Site-Mining tools

Many of the questions asked here are relevant to research I'm doing. These questions and answers are widely dispersed and not always easy to find by manual browsing, and sometimes an insightful answer or comment turns up under an unrelated topic.

I want to automate finding these relevant Q's & A's, based on sets of keywords, then use the information as pointers towards further in-depth research.

What tools, preferably open-source, are available that I can use for this type of site-mining? I am not a web guru, and developing them myself would take a long time and cut into time I could otherwise spend on my R&D.

A: 

Human interaction tools might be useful in such a case (no development cost, probably a more consistent outcome, and evolving requirements).

A couple come to mind:

Tamer Salama
(I always thought that doing web-mining for others would be a good business call.) I am a lone, private individual without the capital resources to pay others to do this; for me it's the hard way or no way. :-(
slashmais
A: 

All of the tags based on keywords have RSS feeds attached to them, so I'd start by subscribing to relevant keywords and searching the data. It seems like the simplest way to find related concepts and other related keywords.
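
For anyone who would rather script the searching step than read the feeds by hand, here is a minimal Python sketch. It assumes the tag feed is in Atom format and that its XML has already been downloaded as text; the function name `matching_entries` and the choice to match on title and summary are illustrative assumptions, not anything the site prescribes.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by Atom feeds (assumption: the tag feed is Atom).
ATOM = "{http://www.w3.org/2005/Atom}"

def matching_entries(feed_xml, keywords):
    """Return (title, link) pairs for entries whose title or summary
    contains any of the keywords (case-insensitive substring match)."""
    wanted = [k.lower() for k in keywords]
    root = ET.fromstring(feed_xml)
    hits = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        summary = entry.findtext(ATOM + "summary", default="")
        text = (title + " " + summary).lower()
        if any(k in text for k in wanted):
            link = entry.find(ATOM + "link")
            href = link.get("href", "") if link is not None else ""
            hits.append((title, href))
    return hits
```

The feed text itself could be fetched with `urllib.request.urlopen(feed_url).read()`, where `feed_url` is whichever tag feed you subscribed to. Because the match runs over the summary as well as the title, it can pick up keywords that never appear in the tags.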

Bryan Woods
Much of the relevant info I found was unrelated to the tags on the questions; they were keywords within the texts of the answers.
slashmais
+1  A: 

Another option would be using Yahoo! Pipes. (demo)

You can build such a system visually online using a combination of feed URLs, filters, etc. Learning time is minimal compared to programming. [edited: tense]

Tamer Salama
slashmais
YouTube is your friend. Try this one - http://www.youtube.com/watch?v=d3h6ROs__II
Tamer Salama
+1  A: 

It is not clear from your question whether you are a programmer or not, so I'm not sure whether you are after tools in the sense of apps or services that do what you want, or a library that makes site-mining easier.

If the latter is the case and you use ruby, I can thoroughly recommend WWW::Mechanize. It provides a nice API for writing scripts to search web pages (by DOM or by text), follow links, and fill out forms. I've used it several times to organise information that's spread over several web pages within a site.

I believe the Ruby version was based on an earlier library for Perl, but I can't vouch for the Perl version as I've not used it.
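
If neither Ruby nor Perl is an option, the same "search the page text, then follow the links" idea can be roughed out with Python's standard library alone. This is not WWW::Mechanize and does not reproduce its API; it is only a sketch of the workflow, and the names `PageScanner` and `scan_page` are made up for illustration. It works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageScanner(HTMLParser):
    """Collect the text nodes and the outgoing links of one HTML page.
    Note: text inside <script> and <style> is collected too."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record every <a href="..."> as an absolute URL.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)

def scan_page(html, base_url, keywords):
    """Return (keywords found in the page text, absolute links on the page)."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    text = " ".join(scanner.text_parts).lower()
    found = [k for k in keywords if k.lower() in text]
    return found, scanner.links
```

A crawl over several pages within a site would call `scan_page` on each downloaded page, keep the pages where `found` is non-empty, and queue the returned links that stay on the same host.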

Mark Reid
The Perl module looks like the ticket. (I don't know Ruby.) I'm going to google whether someone has already done what I need; otherwise I'll write my own. Thanks, this was helpful.
slashmais