I would like advice on how to create and visualize a link map between blogs, so as to reflect the "social network" between them.
Here is how I am thinking of doing it:
- Start with one (or more) blog home pages and collect all the links on each page.
- Remove all the internal links (that is, if I start from www.website.com, I want to remove all the links of the form "www.website.com/*"), but store all the external links.
- Go to each of these external links (assuming it hasn't been visited already) and repeat from the first step.
- Continue until (let's say) X jumps from the first page.
- Plot the data collected.
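
To make the steps above concrete, here is a minimal sketch of the crawl logic. It is written in Python rather than R, only to pin down the algorithm; the same structure should carry over to RCurl/XML. The names `crawl`, `site_of`, `LinkParser` and `fetch_html`, and the `max_depth` and `fetch` parameters, are placeholders invented for this illustration. It assumes a "blog" can be identified by its host name, and it leaves out the plotting step.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Host name without a leading 'www.'; assumed here to identify a blog."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def fetch_html(url):
    """Download a page as text (default fetcher; needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_urls, max_depth, fetch=fetch_html):
    """Breadth-first crawl; returns a set of (from_site, to_site) edges.

    Only pages fewer than max_depth jumps from a start page are fetched,
    so every site in the result is at most max_depth jumps away.
    """
    edges = set()
    visited = set()
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable or unreadable page: skip it
        parser = LinkParser()
        parser.feed(html)
        src = site_of(url)
        for href in parser.hrefs:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            dst = site_of(target)
            if not dst or dst == src:
                continue  # internal link: drop it
            edges.add((src, dst))
            if depth + 1 < max_depth and target not in visited:
                queue.append((target, depth + 1))
    return edges
```

Passing a custom `fetch` function makes it possible to try the logic on a few hand-written pages before touching the network. The returned set of pairs is a plain edge list, which is the kind of input a graph library such as igraph can build a graph from.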
I imagine that in order to do this in R, one would use RCurl/XML (thanks, Shane, for your answer here), combined with something like igraph.
But since I don't have experience with either of them, is there someone here who might be willing to correct me if I missed any important step, or to attach a useful snippet of code for this task?
P.S. My motivation for this question is that in a week I am giving a talk at useR 2010 on "blogging and R", and I thought this might be a nice way both to give the audience something fun and to motivate them to try something like this themselves.
Thanks a lot!
Tal