unit tests for screen-scraping?

+4 Q:

I'm new to unit testing so I'd like to get the opinion of some who are a little more clued-in.

I need to write some screen-scraping code shortly. The target system is a web UI, so there'll be copious HTML parsing and similar volatile goodness involved. I'll never be notified of any changes to the target system (e.g. they redesign their site or otherwise change functionality), so I anticipate my code breaking regularly.

So I think my real question is, how much, if any, of my unit testing should worry about or deal with the interface (the website I'm scraping) changing?

I think unit tests or not, I'm going to need to test heavily at runtime since I need to ensure the data I'm consuming is pristine. Even if I ran unit tests prior to every run, the web UI could still change between tests and runtime.

So do I focus on in-code testing and exception handling? Does that mean drawing a line in the sand and excluding this kind of testing from unit tests altogether?

Thanks

+1 A:

I think the thing unit tests might be useful for here is that, if you have a build server, they will give you an early warning that the code no longer works. You can't write a unit test to prove that screen-scraping will still work if the site changes its HTML (because you can't tell what they will change).

You might be able to write a unit test to check that something useful is returned from your efforts.

RichardOD 2009-12-08 16:52:32

Checking that something useful is returned (and that it falls within known constraints) is basically what I had in mind if I were writing unit tests for the various scraping methods.

Chris 2009-12-08 19:35:20

+4 A:

Unit testing should always be designed to have repeatable known results.

Therefore, to unit test a screen-scraper, you should be writing the test against a known set of HTML (you may use a mock object to represent this).
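As a rough illustration of testing against known HTML, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the `parse_rows` function, the idea that the scraped data sits in table cells, and the fixture strings are all hypothetical stand-ins for whatever the real target page contains. The point is only that the parser takes a string, so the tests never touch the network.

```python
import unittest
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def parse_rows(html):
    """Hypothetical parser: return a list of rows, each a list of cell strings."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


# Hypothetical fixtures: saved snapshots of the cases the scraper must handle.
SECTION_PRESENT = (
    "<table><tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td>9.99</td></tr></table>"
)
TABLE_EMPTY = "<table></table>"
SECTION_MISSING = "<p>Nothing to see here</p>"


class ParseRowsTest(unittest.TestCase):
    def test_section_present(self):
        self.assertEqual(parse_rows(SECTION_PRESENT), [["Widget", "9.99"]])

    def test_table_empty(self):
        self.assertEqual(parse_rows(TABLE_EMPTY), [])

    def test_section_missing(self):
        self.assertEqual(parse_rows(SECTION_MISSING), [])
```

Saved to a file, this can be run with `python -m unittest`. In practice the fixtures would probably be real pages captured once and stored alongside the tests.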

The sort of thing you are talking about doesn't really sound like a scenario for unit-testing to me - if you want to ensure your code runs as robustly as possible, then it is more, as you say, about in-code testing and exception handling.

I would also include some alerting code, so the system makes you aware of any occasions when the HTML does not get parsed as expected.
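One possible shape for that runtime checking and alerting, sketched in Python. The constraints here (two cells per row, a non-empty name, a positive price below 100000) and the names `ScrapeFormatError`, `validate_rows`, and `checked` are invented for illustration; the real rules depend on what the scraped data is supposed to look like. The alert is just a log call by default, which could be swapped for email or whatever monitoring is available.

```python
import logging
from decimal import Decimal, InvalidOperation

log = logging.getLogger("scraper")


class ScrapeFormatError(Exception):
    """Raised when scraped data falls outside the constraints we expect."""


def validate_row(row):
    """Return (name, price) if the row looks sane, else raise ScrapeFormatError."""
    if len(row) != 2:
        raise ScrapeFormatError(f"expected 2 cells, got {len(row)}: {row!r}")
    name, raw_price = row
    if not name:
        raise ScrapeFormatError(f"empty name in row {row!r}")
    try:
        price = Decimal(raw_price)
    except InvalidOperation:
        raise ScrapeFormatError(f"price is not a number: {raw_price!r}") from None
    # Assumed constraint for this example: finite, positive, below 100000.
    if not price.is_finite() or not (Decimal("0") < price < Decimal("100000")):
        raise ScrapeFormatError(f"price out of range: {raw_price!r}")
    return name, price


def validate_rows(rows):
    """Validate every row; an empty result is treated as a layout change."""
    if not rows:
        raise ScrapeFormatError("no rows found; page layout may have changed")
    return [validate_row(row) for row in rows]


def checked(rows, alert=log.error):
    """Validate rows, calling alert(message_format, exc) before re-raising."""
    try:
        return validate_rows(rows)
    except ScrapeFormatError as exc:
        alert("scrape validation failed: %s", exc)
        raise
```

Re-raising after the alert means bad data stops the run instead of being consumed silently, which seems to match the goal of keeping the data pristine.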

DanSingerman 2009-12-08 16:53:25

Yep. I did something very similar. Get HTML that matches various cases (section present, section missing, table empty, etc.) and feed those strings into your parsing class (which should be separate from your web downloader class).

TrueWill 2009-12-08 17:26:45

Thanks, I think this really speaks to my exact conflict with this. And agreed about the alerting code!

Chris 2009-12-08 19:34:27

+1 A:

You should try to separate your tests as much as possible. Test the data handling with low level tests that execute the actual code (i.e. not via a simulated browser).

In the simulated browser, just make sure that the right things happen when you click on buttons, when you submit forms, and when you follow links.

Never try to test whether the layout is correct.

Aaron Digulla 2009-12-08 16:54:22

No browser in the mix. Just command line execution and curl.

Chris 2009-12-08 19:36:13
