screen-scraping

Screen Scrape Form Results

A client recently asked me to build a website for their insurance business. As part of this, they want to do some screen scraping of the quote site for one of their providers. They asked if there was an API to do this, and were told there wasn't one, but that if they could get the data from their engine they could use it as t...

Beautiful Soup cannot find a CSS class if the object has other classes, too

If a page has <div class="class1"> and <p class="class1">, then soup.findAll(True, 'class1') will find them both. If it has <p class="class1 class2">, though, it will not be found. How do I find all objects with a certain class, regardless of whether they have other classes, too? ...
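The matching rule this question asks for (treat the class attribute as a whitespace-separated list of tokens, and match if the wanted class is one of them) can be sketched without Beautiful Soup. The following is a minimal illustration using only Python's standard-library html.parser, not Beautiful Soup's own API; the names ClassFinder, find_by_class, and SAMPLE are hypothetical and chosen for this example.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collects every start tag whose class attribute contains a given token."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []  # list of (tag name, list of class tokens)

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated token list, so split it
        # and test membership instead of comparing the whole string.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.matches.append((tag, classes))


def find_by_class(html, wanted):
    """Return (tag, classes) pairs for all elements carrying the class `wanted`."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical sample markup modelled on the question text.
SAMPLE = (
    '<div class="class1">a</div>'
    '<p class="class1">b</p>'
    '<p class="class1 class2">c</p>'
    '<p class="class10">d</p>'
)

matches = find_by_class(SAMPLE, "class1")
for tag, classes in matches:
    print(tag, classes)
```

Splitting on whitespace and testing membership avoids two failure modes: an exact string comparison misses "class1 class2", while a plain substring test would wrongly match "class10".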

Parse a .Net Page with Postbacks

Hello, I need to read data from an online database that's displayed using an aspx page from the UN. I've done HTML parsing before, but it was always by manipulating query-string values. In this case, the site uses asp.net postbacks. So, you click on a value in box one, then box two shows, click on a value in box 2 and click a button to ...

Unit testing screen scraper

I'm in the process of writing an HTML screen scraper. What would be the best way to create unit tests for this? Is it "ok" to have a static html file and read it from disk on every test? Do you have any suggestions? Thanks ...

Web scraping sites that require javascript support

Possible Duplicate: Screen Scraping from a web page with a lot of Javascript. I just want to do tasks such as form entry and web scraping, but on sites that require javascript support. And I also need to enter forms, scrape, and so on in the same session. Ideally, I'd like a way to control a web browser from the command line. And...

Take screenshots **quickly** from python

A PIL.Image.grab() takes about 0.5 seconds. That's just to get data from the screen to my app, without any processing on my part. FRAPS, on the other hand, can take screenshots up to 30 FPS. Is there any way for me to do the same from a Python program? If not, how about from a C program? (I could interface it w/ the Python program, pot...

Looking for an example of when screen scraping might be worthwhile

Screen scraping seems like a useful tool - you can go onto someone else's site and steal their data - how wonderful! But I'm having a hard time with how useful this could be. Most application data is pretty specific to that application even on the web. For example, let's say I scrape all of the questions and answers off of StackOverflo...

Scraping Ajax - Using python

I'm trying to scrape a YouTube page with Python. The page has a lot of Ajax in it, and I have to call the JavaScript each time to get the info, but I'm not really sure how to go about it. I'm using the urllib2 module to open URLs. Any help would be appreciated. ...

How to store an image using NSData in Objective C

How do I take a UIImage and store it preferably as NSData (to write to a file)? Is there some obvious method out there, or could someone provide a code snippet? Thanks in advance! PS. My next question will probably be for a code snippet to capture the current screen image. The snippets I've seen so far appear to be serious overkill f...

Parsing non-standard date string from StackOverflow into a .NET DateTime

I'm writing a screen-scraper for StackOverflow. The bit I'm writing now takes the HTML and puts all the information into a model object. I've run into a bit of bother while parsing the information from an answer. The problem is the date format that StackOverflow uses to describe absolute times. DateTime.Parse doesn't work on them. I've...

Parsing Information out of a Scraped Screen (HTML)

I'm trying to have my program "rip" news off of a website and place it on the WinForm, but my method is so dumb and redundant, I'm sure there must be a better way to do it. public void LoadLatestNews() { WebClient TheWebClient = new WebClient(); string SourceCode = TheWebClient.DownloadString("http://www.chronic-domination.com/"...

Dynamic screen scraping with PHP and getting past javascript

I have a website that provides a price comparison for students' textbooks. I wrote a Ruby script to go class by class and grab all the textbook information and store it in a database that the website can query for book information. The problem is that the bookstore keeps changing the books needed for each class so I need to figure out a w...

Python WWW macro

Hi, I need something like iMacros for Python. It would be great to have something like this: browse_to('www.google.com') type_in_input('search', 'query') click_button('search') list = get_all('<p>') Do you know of something like that? Thanks in advance, Etam. ...

Screen scraping a mainframe screen in C# *without* 3rd Party Utilities

I'm looking to screen scrape a 3270 mainframe application in C#, but I've got to do so without Attachmate or other 3rd party plugins. Are there free managed libraries to do so in C#? ...

View Generated Source (After AJAX/JavaScript) in C#

Is there a way to view the generated source of a web page (the code after all AJAX calls and JavaScript DOM manipulations have taken place) from a C# application without opening up a browser from the code? Viewing the initial page using a WebRequest or WebClient object works ok, but if the page makes extensive use of JavaScript to alter...

Is there a scala version of Python's Mechanize?

I have used mechanize in Python with great success. However, I am trying to learn Scala. I have an IRC bot that I would like to add some features to, mostly having to do with screen scraping web pages from our corporate intranet. That requires being redirected to a corp-wide login page, then going to the destination, then having to po...

How can I scrape this frame?

If you visit this link right now, you will probably get a VBScript error. On the other hand, if you visit this link first and then the above link (in the same session), the page comes through. The way this application is set up, the first page is meant to serve as a frame in the second (main) page. If you click around a bit, you'll see...

HTML comment scraping in PHP

Hi there, I've been looking around but have yet to find a solution. I'm trying to scrape an HTML document and get the text between two comments, but I have been unable to do this successfully so far. I'm using PHP and have tried the PHP Simple DOM parser recommended here many times, but can't seem to get it to do what I want. Here's (...

submitting form programmatically

Hi guys, I'm trying to submit a specific form programmatically, but I always get the initial page back. I must be doing something wrong or missing something here. I'm sending the session cookie and some POST data like viewState (that I parse from the initial request), and SessionID (this is the value I change in the form to get data from o...

Looking to scrape a website

I am looking to scrape a website like yelp.com to get a listing of all the bars they have there. Are there any tools or scripts out there which would help me do this? ...