I'm trying to download an HTML file with curl in bash, for example this page: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=PHYSICS&idxcrs=0001B+++

When I download it manually, it works fine. However, when I try to run my script through crontab, the output HTML file is very small and just says "Object moved to here." with a broken link. Does this have something to do with the sparse environment that crontab commands run in? I found this question:

http://stackoverflow.com/questions/1279340/php-ssl-curl-object-moved-error

but I'm using bash, not PHP. What are the equivalent command-line options or variables to set to fix this problem in bash?

(I want to do this with curl, not wget)

Edit: well, sometimes downloading the file manually (from an interactive shell) works, but sometimes it doesn't (I still get the "Object moved to here." message). So it may not specifically be a problem with cron's environment, but with curl itself.

the cron entry:
* * * * * ~/.class/test.sh >> ~/.class/test_out 2>&1

test.sh:

#! /bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/sbin
cd ~/.class

course="physics 1b"
url="http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S<URL>subareasel=PHYSICS<URL>idxcrs=0001B+++"

curl "$url" -sLo "$course".html  --max-redirs 5

Edit: Problem solved. The issue was the stray <URL> tags in the url. I was generating the scripts with sed s,"<URL>",\""$url"\", template.txt > test.sh, and in sed's replacement text an unescaped & stands for the whole matched pattern, so sed replaced every & in the URL with <URL>. After fixing the url, curl works fine.
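
A minimal sketch of the sed behavior and one way around it, reusing the placeholder and template names from the generation command above:

# In sed's replacement text, an unescaped & expands to the whole matched pattern:
echo '<URL>' | sed 's,<URL>,a&b,'     # prints a<URL>b
echo '<URL>' | sed 's,<URL>,a\&b,'    # prints a&b

# Escape the ampersands in $url before substituting it into the template:
escaped_url=${url//&/\\&}
sed "s,<URL>,\"$escaped_url\"," template.txt > test.sh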

+1  A: 

You want the -L or --location option, which makes curl follow 3xx redirects. --max-redirs <n> will limit curl to n redirects.
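
A minimal invocation along those lines (the output filename is just a placeholder):

curl -sL --max-redirs 5 -o page.html "$url"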

It's curious that this works from an interactive shell. Are you fetching the same URL? You could always try sourcing your environment scripts in your cron entry:

* * * * * . /home/you/.bashrc ; curl -L --max-redirs 5 ...
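
Another option, assuming a Vixie-style cron that honors variable assignments at the top of the crontab, is to set PATH there directly:

PATH=/usr/local/bin:/usr/bin:/bin
* * * * * ~/.class/test.sh >> ~/.class/test_out 2>&1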

EDIT: the example URL is slightly different from the one in the script. The $url in the script contains a pair of <URL> tags where the query-string separators should be. Replacing them with &, the conventional argument separator for GET requests, works for me.
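
That is, the url line in the script becomes:

url="http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=PHYSICS&idxcrs=0001B+++"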

Patrick
Using the -L option helps, but it doesn't solve it. The output no longer has the "Object moved to here." message, but the page it downloads just contains the site's error message, "System is currently unavailable or offline", instead of the page I want.
adam n
Yes, I was testing it with the same URL. I don't have a .bashrc file, so I don't know if sourcing the environment scripts would help.
adam n
A: 

Without seeing your script it's hard to guess exactly what's going on, but it's likely an environment problem, as you surmise.

One thing that often helps is to specify the full path to executables and files in your script.
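
For instance, a fully-qualified version of the curl line (the /usr/bin path is typical but may differ on your system; check with which curl):

/usr/bin/curl "$url" -sLo /home/you/.class/"$course".html --max-redirs 5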

If you show your script and crontab entry, we can be of more help.

Dennis Williamson