I want to create a command-line script for Cygwin/Bash that logs into a site, navigates to a specific page and compares it with the results of the last run. So far, I have it working with Lynx like so:

----snipped, just setting variables----
echo "# Command logfile created by Lynx 2.8.5rel.5 (29 Oct 2005)
----snipped the recorded keystrokes-------
key Right Arrow
key p
key Right Arrow
key ^U" >> $tmp1 #p, right arrow initiate the page saving

#"type" the filename inside the "where to save" dialog
for i in $(seq 0 $((${#tmp2} - 1)))
do
    echo "key ${tmp2:$i:1}" >> $tmp1
done

#hit enter and quit
echo "key ^J
key y
key q
key y
" >> $tmp1

lynx -accept_all_cookies -cmd_script=$tmp1 https://thewebpage.com/login

diff $tmp2 $oldComp
mv $tmp2 $oldComp

It definitely does not feel "right": the cmd_script consists of relative user actions instead of specifying the exact link names and actions. So, if anything on the site ever changes, switches places, or a new link is added - I will have to re-create the actions.

Also, I can't check for errors, so I can't abort the script if something goes wrong (login failed, etc.).

Another alternative I have been looking at is Mechanize with Ruby (as a note - I have 0 experience with Ruby).

What would be the best way to improve or rewrite this?

A: 

Could wget be useful here?

It is a command-line download utility for HTTP, HTTPS and FTP. It is free software (GNU). It has many options, such as authentication and timestamping (only download a file if it has changed since last time).

http://www.gnu.org/software/wget/
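
A form-based login can usually be handled by posting the form fields and keeping the session cookie for the following request. The sketch below is only an illustration under assumptions: the function name fetch_after_login, the form field names user and pass, and any URLs passed in are placeholders, and the real field names have to be read from the login form's HTML. Note that a rejected login often still returns HTTP 200 with an error page, so wget's exit status alone does not prove the login worked.

```shell
#!/bin/bash
# Sketch: log in through an HTML form with wget, then fetch a page
# using the session cookie. Field names "user" and "pass" are
# assumptions; check the site's login form for the real ones.

fetch_after_login() {
    local login_url="$1" page_url="$2" user="$3" pass="$4" out="$5"
    local jar
    jar=$(mktemp) || return 1

    # POST the form fields and store the cookies, including session
    # cookies, which wget would otherwise not write to the file.
    # Values containing characters such as & or = must be URL-encoded first.
    if ! wget --quiet --tries=1 --timeout=30 \
              --save-cookies "$jar" --keep-session-cookies \
              --post-data "user=${user}&pass=${pass}" \
              --output-document=/dev/null "$login_url"; then
        echo "login request failed" >&2
        rm -f "$jar"
        return 1
    fi

    # Reuse the cookie jar to download the page of interest.
    if ! wget --quiet --tries=1 --timeout=30 \
              --load-cookies "$jar" \
              --output-document="$out" "$page_url"; then
        echo "page download failed" >&2
        rm -f "$jar"
        return 1
    fi

    rm -f "$jar"
}
```

A call might look like fetch_after_login "$login_url" "$page_url" "$user" "$pass" "$tmp2", after which the script can abort if the function returns non-zero.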

Pierre Henry
The login is through a form (which then redirects to a web interface), can wget / curl handle that?
DarthShader
A: 

I think lynx is a great tool for simple web automation tasks, but of course it has its limits. If you need error checking, you should use one of the mechanize modules for Perl, Python or Ruby (if you don't know any of these languages, Python may be the easiest one to learn).

To make your lynx script a bit more robust you could use the search function to select links. On some pages using the link list (l) can help.

At the end, I'd add some sanity checks to see if the downloaded file is really the one you want.
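
Such a sanity check could be sketched roughly as follows, combined with the comparison against the previous run. The function name check_and_compare and the marker argument are assumptions for illustration: the marker stands for some text that only appears on the page after a successful login.

```shell
#!/bin/bash
# Sketch: sanity-check a downloaded page, then compare it with the
# previous run. Returns 0 if unchanged (or first run), 1 if the page
# differs, 2 if the download looks wrong.

check_and_compare() {
    local new="$1" old="$2" marker="$3"

    # A missing or empty file means the download went wrong.
    if [ ! -s "$new" ]; then
        echo "error: $new is missing or empty" >&2
        return 2
    fi

    # The marker should only be present on the expected page.
    if ! grep -qF -- "$marker" "$new"; then
        echo "error: marker '$marker' not found in $new" >&2
        return 2
    fi

    local status=0
    if [ -f "$old" ]; then
        # diff exits non-zero when the files differ.
        diff -- "$old" "$new" || status=1
    else
        echo "no previous run to compare against"
    fi

    # Keep the new page as the baseline for the next run.
    mv -- "$new" "$old"
    return "$status"
}
```

In the original script this would replace the final diff and mv lines, for example check_and_compare "$tmp2" "$oldComp" "Logout". Because the baseline is only replaced after the checks pass, a failed login does not overwrite the last good copy.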

Florian Diesch