I'm writing a Perl program that used to do a simple get to retrieve results and process them. But the site has been updated and now has a JavaScript component that handles the results (so the actual data is not in the page source anymore).

This is the site: http://wro.westchesterclerk.com/legalsearch.aspx

Try putting in:
Index Number: 11103
Year: 2009

I want to be able to programmatically enter the "index number" and "year" at the bottom of the form where it says "search by number" and then retrieve the results listed next to it.

I've written many programs in Perl that simply pass variables via the URL, and the results are listed in the page source, so they're easy to parse (using LWP::Simple).

Like:

$html = get("http://www.url.com?id=$somenum&year=$someyear");

But this is totally new to me and I don't know where to begin. I'm somewhat familiar with LWP::UserAgent and WWW::Mechanize.

I'd really appreciate any help.

Thanks!

A: 

What you're asking to do in this case is hard. Not impossible, but hard.

Method A: You can sift through their JavaScript code. What their "ajax" is doing is making a GET/POST request to another web page and dynamically loading the results. If you can work out what that URL is and which arguments it expects, you can continue to use get. I would recommend getting the Firebug plugin and any other tool that will help you de-obfuscate their JavaScript.
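Once the request has been identified, Method A might look roughly like the sketch below. The endpoint URL and the field names IndexNo and Year are placeholders, not the site's real ones; an ASP.NET page will usually also expect hidden fields such as __VIEWSTATE, which would have to be copied from the page first.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical endpoint and field names: replace them with whatever
# Firebug shows the page's JavaScript actually sending.
my $SEARCH_URL = 'http://www.example.com/search-endpoint';

# Build the POST request without sending it, so it can be inspected first.
sub build_search_request {
    my ($index_no, $year) = @_;
    return POST($SEARCH_URL, [ IndexNo => $index_no, Year => $year ]);
}

# Usage: perl search.pl 11103 2009
if (@ARGV == 2) {
    my $ua       = LWP::UserAgent->new(agent => 'Mozilla/5.0');
    my $response = $ua->request(build_search_request(@ARGV));
    die 'Request failed: ' . $response->status_line
        unless $response->is_success;
    print $response->decoded_content;
}
```

Printing build_search_request(...)->as_string before sending anything is a cheap way to compare the request against what the browser sends.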

Another method: If your program can drive a web browser that supports javascript: URLs (Firefox, for example), you could programmatically load these two addresses in turn, then wait a moment and read your data out of the page.

http://wro.westchesterclerk.com/legalsearch.aspx
javascript: function go() { document.getElementById('ctl00_tbSearchArea__ctl1_cphLegalSearch_splMain_tmpl0_tbLegalSearchType__ctl0_txtIndexNo').value=11109; document.getElementById('ctl00_tbSearchArea__ctl1_cphLegalSearch_splMain_tmpl0_tbLegalSearchType__ctl0_txtYear').value='09'; searchClick(); } go();

This is a method we have used along with mozembed to programmatically get around this kind of thing. Recently we switched to WebKit. To keep the browser from taking up a physical display, we have used Xvfb/Xvnc to create a virtual desktop to load it in.


Those are the methods I have come up with so far. Let me know if you come up with another. I hope this helped.

J.J.
+2  A: 

It might be more logical for you to use one of the modules which drives a browser. Something like Mozilla::Mechanize or the Selenium tools.

A browser knows best how to interact with the server using AJAX and re-render the DOM and so on, so build your script on top of that ability.
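As a rough illustration of the browser-driving approach, here is a sketch using Selenium::Remote::Driver from CPAN. It assumes a Selenium server is already running on localhost:4444 with Firefox available. The element IDs and the searchClick() call are taken from the javascript: snippet in the earlier answer (treating the stray whitespace inside the first ID as a copy artifact), and the fixed five-second wait is a guess.

```perl
use strict;
use warnings;
use Selenium::Remote::Driver;

# Common prefix of the two element IDs quoted in the earlier answer.
my $PREFIX = 'ctl00_tbSearchArea__ctl1_cphLegalSearch_splMain_tmpl0_'
           . 'tbLegalSearchType__ctl0_';

# Build the JavaScript that fills in both fields and triggers the search.
sub build_search_script {
    my ($index_no, $year) = @_;
    die "index number and year must be numeric\n"
        unless $index_no =~ /^\d+$/ && $year =~ /^\d+$/;
    return "document.getElementById('${PREFIX}txtIndexNo').value='$index_no';"
         . "document.getElementById('${PREFIX}txtYear').value='$year';"
         . "searchClick();";
}

# Usage: perl browser_search.pl 11103 2009
# (the earlier answer passes the year as '09'; check which form the page wants)
if (@ARGV == 2) {
    my $driver = Selenium::Remote::Driver->new(browser_name => 'firefox');
    $driver->get('http://wro.westchesterclerk.com/legalsearch.aspx');
    $driver->execute_script(build_search_script(@ARGV));
    sleep 5;    # crude wait for the AJAX response; adjust as needed
    print $driver->get_page_source;    # the rendered DOM, results included
    $driver->quit;
}
```

Because the browser runs the page's own JavaScript, the script does not need to know what request is made behind the scenes; the price is a much heavier setup than LWP.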

AmbroseChapel
+3  A: 

That sort of question gets asked a lot. The standard answer is Wireshark.

I was just using it on that website with the test data you gave and extracted the single POST request responsible. This lets you bypass JavaScript altogether.

daxim
Nice. I am gonna have to try that.
J.J.
http://stackoverflow.com/questions/2118415 Run a capture, filter by HTTP, select the request, pick Follow TCP Stream from context menu.
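One way to replay such a capture from Perl is sketched below. It assumes the request half of the Follow TCP Stream text (request line, headers, blank line, body) has been saved to a file; the file name and the helper name request_from_capture are made up for illustration, and the scheme is assumed to be plain http.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Turn the raw text of a captured request into an HTTP::Request object.
sub request_from_capture {
    my ($raw) = @_;
    my $req  = HTTP::Request->parse($raw);
    my $host = $req->header('Host')
        or die "capture has no Host header\n";
    # The request line only carries the path, so rebuild an absolute URL.
    $req->uri("http://$host" . $req->uri);
    # Recompute the length in case the saved body picked up stray bytes.
    $req->content_length(length $req->content);
    return $req;
}

# Usage: perl replay.pl captured_request.txt
if (@ARGV == 1) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $raw = do { local $/; <$fh> };    # slurp the whole file
    close $fh;

    my $response = LWP::UserAgent->new->request(request_from_capture($raw));
    die 'Request failed: ' . $response->status_line
        unless $response->is_success;
    print $response->decoded_content;
}
```

To search for other records, edit the index number and year inside the saved body before replaying it; if the server rejects the replay, a stale hidden field or cookie in the capture is a likely suspect.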
daxim