I have the following shell script:

#! /bin/sh

while read page_section
do
  page=${page_section%%\ *}
  section=${page_section#* }     #NOTE: `#* }` is NOT a comment

  wget --quiet --no-proxy www.cs.sun.ac.za/hons/$page -O html.tmp & wait

#  echo ${page_section%%\ *} # verify correct string chopping
#  echo ${page_section#* }   # verify correct string chopping

  ./DokuWikiHtml2Latex.py html.tmp $section & wait
done < inputfile

And an input file like this:

doku.php?id=ndewet:tools:tramonitor TraMonitor
doku.php?id=ndewet:description Implementation -1
doku.php?id=ndewet:description Research\ Areas -1

The script downloads a number of webpages specified in inputfile and must then pass the rest of each line (e.g. "Implementation -1" or "Research\ Areas -1") to the Python script.

Now for the sticky bit. When the third line of this example file is processed it passes "Research\ Areas" to the python script as two separate arguments, as confirmed by:

>>> print sys.argv
['./DokuWikiHtml2Latex.py', 'html.tmp', 'Research', 'Areas', '-1']

How can I get a multi-word section, like "Research Areas", from the input file into a single argument for the Python script? I've tried escaping the '\', and also doing

./DokuWikiHtml2Latex.py html.tmp `echo ${section#* }`

among other things, but to no avail.

The number at the end of an input line is another argument, but optional.

+1  A: 

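Just let read do the parsing. Without -r it treats a backslash-escaped space as part of the word, so "Research\ Areas" ends up in a single variable: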

while read page section rest
do
    echo "Page: $page"
    echo "Section: $section"
done < inputfile
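
To see what this does with the third input line, here is a minimal sketch that feeds that line to read through a here-document (the here-document is only a stand-in for inputfile):

```shell
#!/bin/sh
# Sample line copied from the question's input file; the here-document
# stands in for inputfile.
read page section rest <<'EOF'
doku.php?id=ndewet:description Research\ Areas -1
EOF

printf 'Page: %s\n' "$page"
printf 'Section: %s\n' "$section"   # the escaped space is kept inside this field
printf 'Rest: %s\n' "$rest"         # whatever follows the second field
```

With read -r the backslash would be taken literally instead, and section would end up as Research\ with Areas -1 left over in rest.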

For handling the optional argument elegantly, use an array (this needs bash, since read -a and arrays are not available in plain sh):

while read -a fields
do
    wget --quiet --no-proxy "www.cs.sun.ac.za/hons/${fields[0]}" -O html.tmp
    unset "fields[0]"
    ./DokuWikiHtml2Latex.py html.tmp "${fields[@]}"
done < inputfile

Always quote your variables!
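
Since the question's script starts with #! /bin/sh, here is a rough sketch of the same idea without arrays. It is only one possible variant, not part of the original answer: show_args and process_lines are hypothetical names, and show_args stands in for ./DokuWikiHtml2Latex.py so the arguments can be inspected. The wget call is left out.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for ./DokuWikiHtml2Latex.py:
# it prints how many arguments it received and each one in brackets.
show_args() {
    printf '%d args:' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# process_lines (also hypothetical) reads "page section [number]" lines from stdin.
process_lines() {
    while read page section rest
    do
        # ${rest:+"$rest"} vanishes entirely when rest is empty,
        # and expands to a single quoted word otherwise.
        show_args html.tmp "$section" ${rest:+"$rest"}
    done
}

process_lines <<'EOF'
doku.php?id=ndewet:tools:tramonitor TraMonitor
doku.php?id=ndewet:description Implementation -1
doku.php?id=ndewet:description Research\ Areas -1
EOF
```

The first line should yield two arguments and the other two three each, with "Research Areas" kept together as one argument.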

Philipp
You should put quotes around array elements that you unset to protect against file globbing: `unset "fields[0]"` (in case there is a file named "fields0"). Demo: `test=(1 2 3); touch test0; unset test[0]; declare -p test; unset "test[0]"; declare -p test`
Dennis Williamson
@Dennis Williamson: Thanks.
Philipp
You're welcome. I forgot to demonstrate that a variable named `test0`, if it existed, would get unset because of the globbing and the presence of the file: `test=(1 2 3); test0=4; touch test0; unset test[0]; echo "test0 = $test0"; declare -p test; unset "test[0]"; declare -p test`
Dennis Williamson
A: 

Normally multi-word arguments can be passed as one by using quotes, so:

doku.php?id=ndewet:description "Research Areas" -1
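
One caveat worth checking: quotes typed on a command line are shell syntax, but quotes read from a file are ordinary characters. A minimal sketch, assuming the rest of the line ends up in an unquoted variable as in the question's script (count_args is a hypothetical helper):

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

# Quotes typed directly on the command line group the words: 3 arguments.
typed=$(count_args html.tmp "Research Areas" -1)

# The same text stored as data: the quote characters are literal, so the
# unquoted expansion still splits on whitespace into three separate words.
section='"Research Areas" -1'
from_data=$(count_args html.tmp $section)

echo "typed: $typed, from data: $from_data"
```

In this sketch the quoted text from the data still arrives as separate words, with the quote characters attached to them.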
Ian Wetherbee
+2  A: 

Put quotes around $section:

./DokuWikiHtml2Latex.py html.tmp "$section" & wait
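
A small sketch of what the quotes change, using the question's own string chopping on its third input line. count_args is a hypothetical helper standing in for the Python script, and the here-document stands in for inputfile:

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

# Read the whole line into one variable, as the question's script does.
read page_section <<'EOF'
doku.php?id=ndewet:description Research\ Areas -1
EOF
section=${page_section#* }   # everything after the first space

unquoted=$(count_args html.tmp $section)     # word splitting applies
quoted=$(count_args html.tmp "$section")     # kept as a single argument

echo "section: [$section]"
echo "unquoted: $unquoted, quoted: $quoted"
```

Note that in this sketch the quoted form keeps the trailing number inside the same argument as the section name, so the optional number may need to be split off separately before the call.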
bstpierre