Hi all, I'm a Java developer and I have a question about automating a task I've been given. Three times a day I have to log in to a website we have at work, select a few form elements, and then click submit to get a report printed out. How can I write some sort of script to automate this task? Where should I start, and what language should I use? I was thinking PHP might be able to do this, or possibly even a Greasemonkey script.

Thanks a lot.

A: 

It's called "web scraping" or "screen scraping", and there are a lot of libraries out there to do this. I can't speak to a Java-specific tool, though: I'm a .NET guy (the .NET way would be System.Net.WebClient or System.Net.HttpWebRequest/System.Net.HttpWebResponse). But I'm sure there's something.

In the meantime, the first step is to go to the page where you input the form values and view the source of the page. Look for the specific <form> element you're filling out, and see where it posts to (its action attribute). Then find any <input>, <select>, and <textarea> elements you use, including any hidden inputs for the form, and figure out what values you need to send. That will tell you how to build your request once you find a library that will let you send it.
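As a rough illustration of that step, here is a minimal Java sketch (Java 11+, using the standard java.net.http package) that turns the field names and values found in the page source into the kind of POST request a browser would send. The URL and the field names (reportType, format) are hypothetical placeholders, not taken from the actual site, and the sketch assumes the form uses the default application/x-www-form-urlencoded encoding.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ReportForm {
    /** Encodes form fields as an application/x-www-form-urlencoded body. */
    public static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds the POST request for the form's action URL; it is not sent here. */
    public static HttpRequest buildRequest(String actionUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encode(fields)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical field names; use the ones found in the real <form>.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("reportType", "daily summary");
        fields.put("format", "pdf");
        System.out.println(encode(fields)); // prints reportType=daily+summary&format=pdf
        HttpRequest request = buildRequest("http://intranet.example/report", fields);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request is then a matter of passing it to an HttpClient; hidden inputs go into the same map as the visible fields.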

If you need to log in to the site first to get to the page, things can be more complicated. You may need to retrieve and parse a session value, or be able to send certain cookies to the server.

Joel Coehoorn
A: 

I don't know what language your form is written in, but what you could do is:

  • rewrite the form as a script which generates the report when called
  • use a cron entry to schedule this task to run daily and mail the output to you

A cron job is basically a scheduled task on Unix systems. Windows-based servers can use the Task Scheduler to much the same end.
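If neither cron nor the Task Scheduler is available, the same idea can be approximated from inside a long-running Java process. The sketch below only works out when the next run is due; the three run times (08:00, 12:00, 16:00) are assumptions chosen for illustration, since the question only says "3 times daily".

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;

class ReportSchedule {
    // Hypothetical run times, in ascending order.
    static final List<LocalTime> RUN_TIMES =
            List.of(LocalTime.of(8, 0), LocalTime.of(12, 0), LocalTime.of(16, 0));

    /** Returns the first scheduled time strictly after 'now'. */
    public static LocalDateTime nextRun(LocalDateTime now) {
        for (LocalTime time : RUN_TIMES) {
            LocalDateTime candidate = now.toLocalDate().atTime(time);
            if (candidate.isAfter(now)) {
                return candidate;
            }
        }
        // All of today's slots have passed: use the first slot tomorrow.
        return now.toLocalDate().plusDays(1).atTime(RUN_TIMES.get(0));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        LocalDateTime next = nextRun(now);
        System.out.println("Next run at " + next + ", in "
                + Duration.between(now, next).toMinutes() + " minutes");
    }
}
```

The resulting delay could be handed to a ScheduledExecutorService, though an operating-system scheduler is usually the simpler choice because it survives restarts.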

The above assumes that you have access to the script which generates the report at the moment and can modify it / copy it to a new file which will email the output to you. If not, then you may need to look into screen scraping. As you're a Java developer, you may find this list of Java screen scraping utilities handy to get you started.

ConroyP
+1  A: 

I think the potential sticking point that hasn't been touched on yet is your phrase "login to this website"... Depending on how you need to log in, you may need to go in through a back door to access the report.

I had problems with this kind of thing in the past when I had to download a report from a third party site. The issue was that I couldn't authenticate to access the report parameters because of the hard-coded and less-than-script-friendly way I was required to log in to the site. However, I presume that your site is internal to your organisation, so it may be possible to bypass/rework the security requirements in order to access the data. If this is the case, then you should be able to use one of the screen scraping methods outlined above.

If not, you may need to incorporate the actual login procedure into your script or application, capture any cookies that may be set, and include them in your data request.
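A minimal Java sketch of that approach (Java 11+), assuming the site uses an ordinary cookie-based session: a CookieManager attached to the HttpClient stores whatever cookies the login response sets and replays them on the report request. The class and method names are made up for this example, and the login and report requests are assumed to be built elsewhere from the real forms. A site with a less script-friendly login may need more than this.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class SessionClient {
    /** Creates a client that keeps cookies between requests, like a browser session. */
    public static HttpClient newSessionClient() {
        // ACCEPT_ALL is an assumption that suits a trusted internal site.
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Sends the login request first, then the report request on the same session. */
    public static String fetchReport(HttpClient client, HttpRequest login, HttpRequest report)
            throws IOException, InterruptedException {
        // The login response body is not needed; only its cookies matter here.
        client.send(login, HttpResponse.BodyHandlers.discarding());
        return client.send(report, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Both requests must go through the same HttpClient instance, since the cookies live in that client's CookieManager.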

ZombieSheep
+2  A: 

Check out cURL in PHP. It allows you to do all the normal functions of a web browser with code (other than moving the mouse). And yes, you'll need to do screen scraping.

Darryl Hein