Hi!
I have a small crawler/screen-scraping script that worked half a year ago but no longer does. I checked the HTML and CSS values that the regular expression matches in the page source, and they are unchanged, so from that point of view it should still work. Any guesses?
require "open-uri"
# output file
f = open 'results.csv'...
How does Google find relevant content when it's parsing the web?
Let's say, for instance, that Google uses the native PHP DOM library to parse content. What methods would it use to find the most relevant content on a web page?
My thoughts would be that it would search for all paragraphs, order by the length of each paragraph and then f...
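The part of this idea that is stated (collect every paragraph, then order by length) can be sketched as below. This is only an illustration of that heuristic in Python with the standard-library `html.parser`, not a claim about how Google ranks content; the names `ParagraphCollector` and `paragraphs_by_length` are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text chunks while inside a <p>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()          # an unclosed <p> ends when the next begins
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def _flush(self):
        if self._current is not None:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._current = None


def paragraphs_by_length(html):
    """Return paragraph texts, longest first."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector._flush()  # keep a trailing paragraph that was never closed
    return sorted(collector.paragraphs, key=len, reverse=True)
```

Length alone is a crude signal: a long cookie notice would outrank a short but important summary, so a real extractor would probably need further criteria.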
I have recently come across a web page containing a graph object that displays the (x, y) values on the object as the mouse is rolled across it. Is there any way to automate the extraction of this data?
...
Hi All,
I have just begun scraping basic text off web pages, and am currently using the HTMLAgilityPack C# library. I had some success with box scores from rivals.yahoo.com (sports is my thing, so why not scrape something interesting?) but I am stuck on the NHL's game summary pages. I think this is kind of an interesting problem so I would p...
Hi guys,
Looking for some guidance. In a nutshell, I have a requirement to get article content from specific sources for data analysis. So we need to fetch the latest articles and store them in our database for later processing.
I'm not really sure of the best approach. Our code for current news retrieval...
BookCrossing doesn't have an API right now (the roadmap suggests one is planned, but with no expected date of arrival). Any ideas on how to quickly get the current location of a specific book?
...
Does anyone have a PHP function that can grab all links inside a specific DIV on a remote site? Usage might be:
$links = grab_links($url,$divname);
It would return an array I can use. Grabbing links I can figure out, but I'm not sure how to make it only do so within a specific div.
Thanks!
Scott
...
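One way to restrict link extraction to a single div is to track how deeply the parser is nested inside the target div and record `href` values only while that depth is above zero. The sketch below does this in Python with the standard-library `html.parser` rather than PHP. It assumes that `$divname` refers to the div's `id` attribute. The names `DivLinkParser` and `grab_links_from_url` are invented for this illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class DivLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <div id="..."> (assumed: id)."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.div_depth = 0   # 0 means we are outside the target div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the target
            elif attrs.get("id") == self.div_id:
                self.div_depth = 1           # entering the target div
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def grab_links(html, div_id):
    """Return the links found inside the div with the given id."""
    parser = DivLinkParser(div_id)
    parser.feed(html)
    parser.close()
    return parser.links


def grab_links_from_url(url, div_id):
    """Fetch a remote page and extract links from one div (needs network)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return grab_links(html, div_id)
```

Counting only `div` tags keeps the depth correct even when other elements such as `<p>` or `<li>` are left unclosed, which is common in real pages. Matching on a class name instead of an id would need a different test on `attrs`.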
Hey,
I want to select all comments from a document using JSoup. I would like to do something like this:
for (Element e : doc.select("comment")) {
    System.out.println(e);
}
I have tried this:
for (Element e : doc.getAllElements()) {
    if (e instanceof Comment) {
    }
}
But the following error occurs in Eclipse: "Incompatible condi...
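For comparison, here is a minimal sketch of comment extraction in Python using the standard-library `html.parser`, which reports comments through a dedicated callback instead of treating them as elements. This is a different library from JSoup, and the names `CommentCollector` and `extract_comments` are made up for the example. In JSoup, as far as I know, `Comment` is a kind of node rather than an `Element`, which would explain why the `instanceof` check above is rejected.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the body of every HTML comment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->", delimiters excluded
        self.comments.append(data)


def extract_comments(html):
    """Return all comment bodies in document order."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments
```

Comment-like text inside `<script>` or `<style>` is passed to `handle_data` as raw content, so it does not appear in the result.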