views:

1984

answers:

9

I have a web server that saves the log files of a web application with a numbered suffix. An example file name would be:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits are the counter, and it can sometimes go up to 100.

I usually open a web browser, browse to a file like:

http://someaddress.com/logs/dbsclog01s001.log

and save the files. This of course gets a bit annoying when you get 50 logs. I tried to come up with a BASH script that uses wget and passes

http://someaddress.com/logs/dbsclog01s*.log

but I am having problems with my script. Anyway, does anyone have a sample of how to do this?

thanks!

A: 

Here you can find a Perl script that looks like what you want:

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
        $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
        $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}
Carlos Tasada
+1  A: 

Not sure precisely what problems you were experiencing, but it sounds like a simple for loop in bash would do it for you.

for i in {1..999}; do
    wget -k http://someaddress.com/logs/dbsclog01s$i.log -O your_local_output_dir_$i;
done
anschauung
Of course, you'll want to replace '999' with the actual number of files, or maybe add some logic to count them beforehand. The input and output strings might need some refinement too, depending on how the "real" URL looks.
anschauung
my problem was turning something similar to what you just wrote into a script that can accept the URL and file name as arguments.
wonderer
Ah! So, you're looking for something like a little bash utility that would take the URL literal, the output file literal, and the number of files, then run the wget loop based on that info? $1, $2, etc. are the input arguments in bash scripts ($0 is the script name), so I could adjust the example to reflect that if you confirm this is what you're looking for.
anschauung
yeap. pretty much it.
wonderer
There should only be two dots in the range "{1..999}".
Dennis Williamson
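For reference, a minimal sketch of the parameterized loop discussed in these comments (untested; the script name and argument layout are illustrative, not taken from the answer above), with the counter zero-padded to three digits to match the question's file names:

#!/bin/bash
# Sketch: fetch numbered log files.
# Usage: ./fetchlogs.sh <url_prefix> <suffix> <count>
# e.g.:  ./fetchlogs.sh http://someaddress.com/logs/dbsclog01s .log 50

prefix=$1
suffix=$2
count=$3

for ((i = 1; i <= count; i++)); do
    num=$(printf "%03d" "$i")        # zero-pad to match ...001.log, ...002.log
    wget "${prefix}${num}${suffix}"  # wget saves under the remote file name
done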
+1  A: 

You can use a combination of a for loop in bash with the printf command (replacing echo with wget as needed):

$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
Mark Rushakoff
thanks. How can I turn this into a full script that accepts the URL as an argument?
wonderer
A: 

I just had a look at the wget manpage discussion of 'globbing':

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... won't work with globbing.

pavium
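To illustrate the point from the man page: a wildcard URL can only be expanded when the server returns a directory listing, so the HTTP form from the question has nothing to expand against. A rough illustration (the FTP host below is just a placeholder):

wget 'ftp://ftp.example.com/logs/dbsclog01s*.log'   # may work: wget can list an FTP directory
wget 'http://someaddress.com/logs/dbsclog01s*.log'  # will not expand: no directory listing over HTTP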
+1  A: 

curl seems to support ranges. From the man page:

URL
       The URL syntax is protocol dependent. You’ll find a detailed
       description in RFC 3986.

       You can specify multiple URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can
       use several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You can specify any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges,
       so that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!

Dennis Williamson
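Applied to the URLs from the question, that might look like the following (untested sketch; -O saves each file under its remote name):

curl -O "http://someaddress.com/logs/dbsclog01s[001-100].log"

If you need different local names, curl also lets you refer to the bracketed part as #1 with -o, e.g. -o "local_#1.log".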
+1  A: 

Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL address up to the logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # how many digits the logfile number has
DLDIR=~/Downloads               # download directory
TOUT=5                          # timeout for quit
# code
for((i=1;i<10**DIGITS;++i))
do
        file=$PREF`printf "%0${DIGITS}d" $i`$POSTF   # local file name
        dl=$URL$file                                 # full URL to download
        echo "$dl -> $DLDIR/$file"                   # monitoring, can be commented out
        wget -T $TOUT -q "$dl" -O "$DLDIR/$file"
        if [ "$?" -ne 0 ]                            # stop when a file is missing
        then
                exit
        fi
done

At the beginning of the script you can set the URL, the log file prefix and suffix, how many digits the numbering part has, and the download directory. The loop downloads every logfile it finds and automatically exits at the first non-existent one (wget returns a non-zero status, and the timeout keeps it from hanging).

Note that this script assumes the logfile numbering starts at 1, not zero, as in your example.

Hope this helps.

igustin
thanks. I get a "let: not found" error, and then, since max is undefined, I get an error on the line after that.
wonderer
Hm, obviously you have a different bash version. :-( OK, I changed the script not to use "let" and to use a direct expression in the for loop instead. Try it now and let me know.
igustin
+3  A: 
#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]"
        exit
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

printf "$url_format\\n" `seq $seq_start $seq_end` | wget -i- "$@"
$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50

Or, if you have Bash 4.0, you could just type

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.

ephemient
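If you want to see what that printf/seq pipeline actually feeds to wget -i-, run it without the wget stage; with the question's URL pattern it produces:

$ printf 'http://someaddress.com/logs/dbsclog01s%03d.log\n' $(seq 1 3)
http://someaddress.com/logs/dbsclog01s001.log
http://someaddress.com/logs/dbsclog01s002.log
http://someaddress.com/logs/dbsclog01s003.log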
A: 

Check to see if your system has seq, then it would be easy:

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
Hai Vu
A: 

Hi I need help.

The above scripts are quite confusing. What I am trying to do is loop the wget command and get the URLs from a text file. It requires looping on the wget statement because I need to reload the cookies after every single URL or file read from the text file.

Example:

#!/bin/bash

WHILE wget some arguments.

text file.

sample content of text file:

http://url/file1 -> back to wget and load the cookie
url2             -> back to wget and load the cookie
url3             -> back to wget and load the cookie

farneville
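What is being described here is usually handled with a while read loop over the URL file, reloading and saving the cookie jar on every wget call. A rough sketch, assuming the URLs live in urls.txt and the cookies in cookies.txt (both file names are placeholders):

#!/bin/bash
# Sketch: fetch each URL listed in urls.txt, reloading cookies on every call.
while read -r url; do
    wget --load-cookies cookies.txt \
         --save-cookies cookies.txt \
         --keep-session-cookies \
         "$url"
done < urls.txt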