web-scraping

Error in using Python/mechanize select_form()?

Hello, I am trying to scrape some data from a website. The script I am trying to write should get the content of the page http://www.atpworldtour.com/Rankings/Singles.aspx, simulate the user going through every option for Additional Standings and the dates, and simulate clicking on Go; then, after fetching the data, it should use the...
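
"Going through every option" amounts to enumerating every combination of dropdown values and submitting the form once per combination. Below is a minimal standard-library sketch of that enumeration. It is not the poster's script and it does not use mechanize. The field names (`standing`, `date`) and their values are hypothetical placeholders that would have to be read from the real form. The sketch only builds the URL-encoded request bodies and never contacts the site.

```python
import itertools
from urllib.parse import urlencode


def form_payloads(options):
    """Yield one URL-encoded form body per combination of option values.

    `options` maps a form field name to the list of values its dropdown offers.
    Field names and values passed in are assumptions, not the real form's.
    """
    names = sorted(options)  # fixed field order keeps the output deterministic
    for values in itertools.product(*(options[n] for n in names)):
        yield urlencode(dict(zip(names, values)))


if __name__ == "__main__":
    # Hypothetical option lists; a real script would read them from the page's <select> elements.
    opts = {"standing": ["singles", "race"], "date": ["2010-11-01", "2010-11-08"]}
    for body in form_payloads(opts):
        print(body)
```

With mechanize, the same loop would set each select control on the chosen form and call `submit()` once per combination, instead of building the bodies by hand.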

Voting on Hacker News stories programmatically?

I decided to write an app like http://michaelgrinich.com/hackernews/ but for Android devices. My idea is to use a web application backend (because I would rather code in Python and for the web than entirely in Java for Android). What I have implemented right now is something like this: $ curl -i http://localhost:8080/stories.json?p...

How does Cell Minute Tracker work?

It's been a mystery how Cell Minute Tracker manages to fetch AT&T users' data. Maybe someone here has the long-awaited answer. I'm really curious whether they got approval to scrape users' cellular reports, and how they can fire off multiple requests to the AT&T site without being banned. I'm waiting for someone who could shed some lig...

Perl web scraper: extract content from a DIV that only has a "style" attribute?

I'm stuck on this and have been all day. I'm still pretty new to parsing/scraping in Perl, but I thought I had it down until this. I have been trying this with different Perl modules (HTML::TokeParser, HTML::TokeParser::Simple, web parser and some others)... I have the following string (which in reality is an entire HTML page, but this is...
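
The question is about Perl, so this is only a language-neutral illustration of the idea. The approach is to match a `<div>` whose attribute list is exactly `style` and collect all text inside it. The sketch below uses Python's standard `html.parser` and is not the poster's code. A Perl version built on HTML::TokeParser would need the same start-tag/end-tag bookkeeping. The class name `StyleOnlyDivText` is hypothetical.

```python
from html.parser import HTMLParser


class StyleOnlyDivText(HTMLParser):
    """Collect the text inside <div> elements whose only attribute is `style`."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.current = []   # text fragments of the div being collected
        self.results = []   # one string per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a match: track it so we close correctly
        elif [name for name, _ in attrs] == ["style"]:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)
```

Usage: feed the HTML with `p.feed(html)`, then `p.close()`, then read `p.results`. Text in inline children such as `<b>` is kept, and divs that carry any other attribute alongside `style` are skipped.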

How to do truly multithreaded web mining with IE/.NET/C#?

I want to mine large amounts of data from the web using the IE browser. However, spawning lots and lots of instances of IE via WatiN crashes the system. Is there a better way of doing this? Note that I can't simply do WebRequests - I really need the browser due to having to interact with JS-driven behaviors on the site. ...
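
A common alternative to spawning one browser per page is to cap concurrency with a fixed-size worker pool. That way only a bounded number of browser instances exist at any time. This is a swapped-in general technique, not anything specific to WatiN, and it is sketched in Python with `concurrent.futures`. `fetch_with_browser` is a hypothetical stand-in for whatever drives a single browser instance. `MAX_BROWSERS` is an assumed cap to be tuned to the machine.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BROWSERS = 4  # hypothetical cap on simultaneous browser instances


def fetch_with_browser(url):
    """Hypothetical stand-in for driving one browser instance to load `url`."""
    return f"content of {url}"


def mine(urls, max_workers=MAX_BROWSERS):
    """Fetch every URL with at most `max_workers` fetches running at once.

    `pool.map` returns results in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_with_browser, urls))
```

The pool queues the remaining URLs instead of starting new instances, so memory use stays roughly proportional to `max_workers` rather than to the number of pages.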

How to scrape "table-like" data from a Stack Exchange homepage (in R)?

Hello all, I wish to scrape the home page of one of the new Stack Exchange websites, http://webapps.stackexchange.com/ (just once, and only a few pages, nothing that should bother the servers). If I wanted this data from Stack Overflow, I know there is a database dump, but dumps for the new Stack Exchange sites don't exist yet. Here is wha...
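
Once a page is fetched, "table-like" data means repeating elements that map onto rows and cells. The question asks about R, so here is only a language-neutral sketch using Python's `html.parser`. It turns a plain HTML `<table>` into a list of row lists. The actual markup of the page is not shown here, so treating questions as `<tr>`/`<td>` elements is an assumption to adjust after inspecting the page. The class name `TableRows` is hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # list of rows, each a list of cell strings
        self.cell = None  # text fragments of the cell being read, or None

    def _flush(self):
        # Close the open cell (also handles omitted </td> tags) with whitespace collapsed.
        if self.cell is not None:
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td", "th"):
            self._flush()
            if tag == "tr":
                self.rows.append([])
            elif self.rows:  # ignore cells that appear before any row
                self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)
```

Each inner list in `rows` corresponds to one row of the data frame one would ultimately build. The same row/cell walk is what an R solution would have to perform over the parsed page.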

Playing with Scrapi in Rails 3: getting a Segmentation fault / Abort trap error

What I've done so far:

sudo gem install scrapi
sudo gem install tidy

This didn't work because it didn't have libtidy.dylib, so I did this:

sudo port install tidy
sudo cp libtidy.dylib /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/tidy/libtidy.dylib

Then I started following the simple Railscast at http://media.railscasts.co...