Hello all,
I wish to scrape the home page of one of the new StackExchange websites: http://webapps.stackexchange.com/ (just once, and only a few pages, nothing that should bother the servers). If I wanted this from Stack Overflow, I know there is a database dump, but for the new StackExchange sites no dump exists yet.
Here is what I want to do.
Step 1: choose URL
library(XML) # provides readHTMLTable() and htmlTreeParse()
URL <- "http://webapps.stackexchange.com/"
Step 2: read the table
readHTMLTable(URL) # oops, doesn't work - gives NULL
Step 2 (second attempt): this time, let's parse the whole page instead
htmlTreeParse(URL) # OK, this reads the page, but the content is all in <div> elements. Now what?
So I was able to read the page, but the content is structured as nested divs rather than a table. How can I turn the parsed result into a data frame, like the one readHTMLTable would have returned?
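To make the goal concrete, here is a minimal sketch of the kind of result I am after. It runs on a toy HTML string rather than the real page, and the class names (question-summary, votes, question-hyperlink) are made up for illustration; I have not checked what the actual markup looks like. It assumes the XML package and pulls each column out with an XPath query.

```r
library(XML)

# Toy HTML standing in for the real page; the class names are hypothetical.
html <- '<html><body>
<div class="question-summary"><div class="votes">3</div><a class="question-hyperlink" href="/questions/1">First question</a></div>
<div class="question-summary"><div class="votes">7</div><a class="question-hyperlink" href="/questions/2">Second question</a></div>
</body></html>'

# Parse into an internal document so that XPath queries can be run on it
doc <- htmlParse(html, asText = TRUE)

# One XPath query per column; each call returns a character vector
votes  <- xpathSApply(doc, "//div[@class='question-summary']/div[@class='votes']", xmlValue)
titles <- xpathSApply(doc, "//div[@class='question-summary']/a[@class='question-hyperlink']", xmlValue)
links  <- xpathSApply(doc, "//div[@class='question-summary']/a[@class='question-hyperlink']", xmlGetAttr, "href")

# Combine the columns into a data frame, one row per question
questions <- data.frame(votes = as.integer(votes), title = titles,
                        link = links, stringsAsFactors = FALSE)
print(questions)
```

Querying column by column like this only lines up if every block contains every field; if some are missing, it is probably safer to loop over the outer div nodes and extract the fields from each one in turn.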