I need to scrape French court cases for a project, but I can't figure out how to get Java to navigate the Court's search engine.

Here's the search page I need to manipulate. I want to start scraping the results page, but I can't get to that page from Java with just the URL. I need some way to have Java order the server to execute a search based on my date parameters (01/01/2003 - 30/06/2003), and then I can run the show by simply manipulating the URL I'm connecting to.

Any suggestions?

A: 

First, make sure the site's terms of service allow this.

I would use HttpClient to POST the search data and read back the results. Look at the form on the page, figure out which fields you need to emulate, and submit them with HttpClient. You should get back the results you are looking for. Also, this page has a lot of JavaScript, so you need to figure out what it is doing. It may never submit the form at all and instead make AJAX calls to update the page, in which case you can probably get the same results by issuing those calls yourself.

You can always install something like Fiddler, watch the HTTP traffic the page sends, and then emulate those requests using HttpClient.
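
As a rough illustration of the form-emulation idea, here is a minimal sketch. It uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than Apache HttpClient, purely to stay dependency-free. The field names dateFrom and dateTo are hypothetical placeholders, not the site's real parameters; the actual names and the form's action URL have to come from the page source or from the traffic you capture.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SearchFormPost {

    // Encode form fields the way a browser does for
    // application/x-www-form-urlencoded submissions.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build a POST request carrying the encoded fields as its body.
    static HttpRequest buildRequest(String actionUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical field names; replace them with the ones the real form uses.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("dateFrom", "01/01/2003");
        fields.put("dateTo", "30/06/2003");

        if (args.length == 0) {
            // No URL given: just show what would be sent.
            System.out.println(encodeForm(fields));
            return;
        }

        // args[0] is the form's action URL, taken from the page or from captured traffic.
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(args[0], fields), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // HTML of the results page, ready for parsing
    }
}
```

If the page turns out to load its results through AJAX calls, the same request-building code should still apply; only the URL and the field names would change to match what the captured traffic shows.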

Joelio