I'm trying to write a bash script that will let me download multiple web pages using curl. For each webpage, I want to be able to pass curl the page and the referer link. I want to be able to supply multiple webpages at once.

In other words, I want to be able to loop through the webpages I supply the script, and for each page, pass the associated webpage and referer link to curl.

I thought I'd use an array to store the webpage and referer link in a single variable, thinking that I could then extract the individual elements of the array when running curl.

My problem is that I can't figure out how to get multiple arrays to work properly in a for loop. Here is an idea of what I want to do. This code does not work, since "$i" (in the for loop) doesn't become an array.

#every array has the information for a separate webpage
array=( "webpage" "referer" )
array2=( "another webpage" "another referer" )

for i in "${array[@]}" "${array2[@]}" #line up multiple web pages
do
    #use curl to download the page, giving the referer ("-e")
    curl -O -e "${i[1]}" "${i[0]}"
done

If I was only working with one array, I could easily do it like this:

array=( "webpage" "referer" )
REFERER="${array[1]}"
PAGE="${array[0]}"
#use curl to download the page, giving the referer ("-e")
curl -O -e "$REFERER" "$PAGE"

It's once I have more than one webpage that I want to process at once that I can't figure out how to do it correctly.

If there is another way to handle multiple webpages, without having to use arrays and a for loop, please let me know.

A: 

If there is another way to handle multiple webpages, without having to use arrays and a for loop, please let me know.

Using arrays is fine; it's certainly much better than using space-separated lists or similar hacks. Simply loop over the indices:

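array=('webpage' 'another webpage')
array2=('referrer' 'another referrer')
# note the different layout: one array holds the pages, the other the referrers
for i in "${!array[@]}"
do
    webpage="${array[$i]}"
    referrer="${array2[$i]}"
    # use curl to download the page, giving the referrer ("-e")
    curl -O -e "$referrer" "$webpage"
done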
Philipp
A: 

You need a trick here. Since spaces are not allowed in URLs, you can store each URL and its referrer in a single space-separated string and let the shell split it:

webpages=("url referrer" "url2 ref2" ...)

for i in "${webpages[@]}" ; do
    set -- $i    # unquoted on purpose, so the shell splits $i at the space
    url="$1"
    ref="$2"

    curl -O -e "${ref}" "${url}"
done
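
A variant of the same idea, as a minimal sketch: `read` can do the splitting instead of `set --`, which leaves the script's positional parameters untouched. The function name `show_command` and the example.com URLs are placeholders, not part of the original answer, and the function only prints the curl command rather than running it.

```shell
#!/usr/bin/env bash
# Each element holds "url referer"; the URLs are placeholders.
webpages=('http://example.com/a.html http://example.com/'
          'http://example.com/b.html http://example.com/a.html')

# Split one "url referer" pair with read and print the curl command
# that would be run (dry run, nothing is downloaded).
show_command() {
    local url ref
    read -r url ref <<< "$1"
    printf 'curl -O -e %s %s\n' "$ref" "$url"
}

for pair in "${webpages[@]}"; do
    show_command "$pair"
done
```

To actually download, the `printf` line would be replaced by `curl -O -e "$ref" "$url"`.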

[EDIT] Maybe a better solution will be to put all the URLs into a file and then use this code:

while read -r url ref ; do
    curl -O -e "${ref}" "${url}"
done < file

or if you prefer here documents:

while read url ref ; do
   echo "url=$url ref=$ref"
done <<EOF
url1 ref1
url2 ref2
... xxx
EOF
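
If the list lives in a file or here document, it may help to tolerate blank lines and comments. The following is only a sketch under that assumption: the function name `print_commands`, the `#` comment convention, and the example.com URLs are all hypothetical, and the function prints the commands instead of running curl.

```shell
#!/usr/bin/env bash
# Read "url referer" pairs from standard input and print one curl command
# per pair. Blank lines and lines starting with # are skipped.
print_commands() {
    local url ref
    while read -r url ref; do
        [[ -z $url || $url == \#* ]] && continue
        printf 'curl -O -e %s %s\n' "$ref" "$url"
    done
}

print_commands <<'EOF'
# url referer
http://example.com/a.html http://example.com/

http://example.com/b.html http://example.com/a.html
EOF
```

The same function can read a file with `print_commands < file`.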
Aaron Digulla
Bash can do the word splitting on spaces without having to make two calls to an external program in each iteration inside a loop.
Dennis Williamson
Sorry to disappoint you but `expr` is a bash builtin.
Aaron Digulla
`which expr` returns /usr/bin/expr
Menachem
It returns `/bin/echo` for `which echo` and `echo` is definitely a builtin. But you can achieve the same effect with `set -- "$i"`. Updated my answer.
Aaron Digulla
From the Advanced Bash-Scripting Guide, http://tldp.org/LDP/abs/html/internal.html : "A builtin may be a synonym to a system command of the same name, but Bash reimplements it internally. For example, the Bash echo command is not the same as /bin/echo, although their behavior is almost identical." `type -a echo` returns:
echo is a shell builtin
echo is /bin/echo
Menachem
A: 

Thanks to everyone for their responses. Both ideas had merit, but I found some code in the Advanced Bash Guide that does exactly what I want to do.

I can't say I fully understand it, but by using an indirect reference to the array, I can use multiple arrays in the for loop. I'm not sure what the local command does, but it is the key (I think it runs a sort of eval and assigns the string to the variable).

The advantage of this is that I can group each webpage and referer into their own array. I can then easily add a new website, by creating a new array and adding it to the for loop. Also, should I need to add more variables to the curl command (such as a cookie), I can easily expand the array.

function get_page () {
        OLD_IFS="$IFS"
        IFS=$'\n'       #  Split only on newlines, so that elements containing
                        #  spaces stay intact when local assigns the variables

        local ${!1}


        # Print variable
        echo First Variable: "\"$a\""
        echo Second Variable: "\"$b\""
        echo ---------------
        echo curl -O -e "\"$b\"" "\"$a\""
        echo
        IFS="$OLD_IFS"
}

#notice the addition of "a=" and "b="
#this is not an associative array, that would be [a]= and [b]=
array=( a="webpage" b="referer" )
array2=( a="another webpage" b="another referer" )

#This is just a regular string in the for loop, it doesn't mean anything
#until the indirect referencing later
for i in "array[*]" "array2[*]" #line up multiple web pages
do
        #must use a function so that the local command works
        #but I'm sure there's a way to do the same thing without using local
        get_page "$i"
done

This results in:

First Variable: "webpage"
Second Variable: "referer"
---------------
curl -O -e "referer" "webpage"

First Variable: "another webpage"
Second Variable: "another referer"
---------------
curl -O -e "another referer" "another webpage"
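
For comparison, here is a sketch of a somewhat simpler take on the same indirect-reference idea. It passes only the array's name and looks up elements 0 and 1 through `${!var}`, so the `a=`/`b=` prefixes and the IFS change are not needed. The names `page_info1`, `page_info2`, and `describe_page`, and the example.com URLs, are made up for illustration; the function prints the curl command rather than running it.

```shell
#!/usr/bin/env bash
# One array per page: element 0 is the page, element 1 is the referer.
page_info1=('http://example.com/a.html' 'http://example.com/')
page_info2=('http://example.com/b.html' 'http://example.com/a.html')

# Take the NAME of an array and read its elements via indirect expansion.
describe_page() {
    local name=$1
    local page_ref="${name}[0]" referer_ref="${name}[1]"
    printf 'curl -O -e %s %s\n' "${!referer_ref}" "${!page_ref}"
}

# Loop over the array names, not over their contents.
for name in page_info1 page_info2; do
    describe_page "$name"
done
```

Adding another page then means defining one more array and appending its name to the `for` list.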
Menachem
A: 

As a general aside: inside a function, you can simply declare IFS so that it becomes local and any change to it is limited to that function. There is no need to save and restore IFS via OLD_IFS!

help declare

IFS=$' \t\n'
printf "%q\n" "$IFS"

function ifs_test () {
    declare IFS
    IFS=$'\n'
    printf "%q\n" "$IFS"
    return 0
}

ifs_test

printf "%q\n" "$IFS"
joe