I have a command like

echo "abcd0001gfh.DAT" | sed 's/^[^0-9]*\(....\).*$/\1/' | awk '{ print "00"$0 }'

This gives me an output of 000001. I want to run it in a loop over incoming file names whose sequence number runs from 0001 to 9999 and then wraps back to 0001. The output should look like this:

abcd0001gfh.DAT 000001
abcd0002gfh.DAT 000002
.
.
.
abcd9999gfh.DAT 009999
abcd0001gfh.DAT 010001
.
.
abcd9999gfh.DAT 019999
abcd0001gfh.DAT 020001

There is also a chance that I will receive 0005 directly after 0002, in which case I consider 0003 and 0004 to be missing sequences.

I want a limit so that the prefix stays in the range 00-99, i.e. the value can go up to 999999. The loop should therefore run until 9999 has been received 99 times in the input file.

How could this be done in a shell script?

A: 

I'm assuming you have your .DAT filenames stored in a file called datfiles.list. What you want is to increment the prefix every time the newly extracted value is smaller than the previous one.

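lastSeq=0
prefix=0
while read -r name; do
    # Extract the four characters after the leading non-digits, e.g. 0001
    seq=$(echo "$name" | sed 's/^[^0-9]*\(....\).*$/\1/')
    # 10# forces base 10, so 0008 and 0009 are not rejected as invalid octal
    seq=$((10#$seq))
    if (( seq < lastSeq )); then
        prefix=$((prefix + 1))
    fi
    lastSeq=$seq
    # Two-digit prefix followed by the four-digit sequence, e.g. 010001
    printf "%s %02d%04d\n" "$name" "$prefix" "$seq"
done < datfiles.list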

This seems to produce the output you want. Note the use of printf at the end to zero-pad the fields.
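
The question also mentions missing sequences and a prefix limit of 99, which the loop above does not cover. The following is only a rough sketch of how both could be added. The function name number_files, the MISSING line format, and the choice to stop with a non-zero status once the prefix would exceed 99 are assumptions made for illustration, not something the question specifies. Numbers missing at the end of a cycle (between the last file seen and 9999) are not reported.

```shell
# Sketch only: number_files and the MISSING output format are hypothetical.
# Reads file names on stdin, one per line, and prints "name PPSSSS".
number_files() {
    prefix=0
    last=0
    while read -r name; do
        # First four digits after the leading non-digits, e.g. 0001
        seq=$(printf '%s\n' "$name" | sed 's/^[^0-9]*\([0-9][0-9][0-9][0-9]\).*$/\1/')
        # Strip leading zeros so the shell never treats 0008 as octal
        seq=$(printf '%s\n' "$seq" | sed 's/^0*//')
        seq=${seq:-0}
        if [ "$seq" -lt "$last" ]; then
            # The sequence wrapped around: start the next prefix
            prefix=$((prefix + 1))
            if [ "$prefix" -gt 99 ]; then
                return 1    # assumed behaviour: stop once 99 is exceeded
            fi
            last=0
        fi
        # Report every number skipped since the previous file
        gap=$((last + 1))
        while [ "$gap" -lt "$seq" ]; do
            printf 'MISSING %02d%04d\n' "$prefix" "$gap"
            gap=$((gap + 1))
        done
        printf '%s %02d%04d\n' "$name" "$prefix" "$seq"
        last=$seq
    done
}
```

It would be called as number_files < datfiles.list. With the names 0001, 0002, 0005 and then 0001 again, it reports 000003 and 000004 as missing and numbers the final file 010001.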

Igor
Yes Peter, my .DAT filenames are stored in a file called datfiles.list. The problem is that I keep receiving the files one after another, and in some instances sequences can be missing. So the script should keep polling the directory and assign the sequence accordingly. So I think this solution, even though it's good, does not solve the purpose.
Vijay Sarathi
A: 

Maybe this script helps a little. But there is still a problem with missing files and the order in which they arrive. What if there is no ????9999.DAT file? Then $sequence will not be incremented. What if ????9998.DAT arrives after ????9999.DAT? Then $sequence will already have been incremented. Perhaps you will find a solution for that. Last but not least, if you use the code, you need something to update the .ts file when you break out of the loop. You could also move the processed files to a different directory.

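#!/usr/bin/ksh

datadir=/home/cheko/tmp/test/datloop/data
ts=$datadir/.ts     # holds the current prefix; its mtime is the first reference point
latest=$ts
timeout=20

if [ -f "$ts" ]
then
    sequence=$(cat "$ts")
else
    sequence=0
    echo "$sequence" > "$ts"
    # Backdate the marker so that every existing data file counts as newer
    touch -t 197001011212 "$ts"
fi

while true
do
    # Restrict to .DAT files so the rewritten .ts marker is never picked up
    for path in $(find "$datadir" -type f -name '*.DAT' -newer "$latest")
    do
        file=$(basename "$path")
        # Four digits directly before .DAT (assumes names like ????9999.DAT)
        number=$(echo "$file" | sed -n 's/^.*\([0-9]\{4\}\)\.DAT$/\1/p')
        printf "%-20s %02d%s\n" "$file" "$sequence" "$number"
        if [ "$number" = "9999" ]
        then
            sequence=$((sequence+1))
            echo "$sequence" > "$ts"
        fi
        # Only move the reference point when a file was actually processed
        latest=$path
    done
    sleep $timeout
done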
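
One possible way to handle the two open cases mentioned above (no 9999 file at all, and a 9998 that arrives after the wrap-around) is to look at how far the number jumps rather than waiting for exactly 9999. The sketch below is only an illustration of that idea. The function classify, the variables high, window and result, and the threshold of 5000 are all assumptions, not part of the script above. It also assumes files arrive at most slightly out of order.

```shell
# Hypothetical helper: decides the prefix from the size of the jump.
sequence=0    # current two-digit prefix
high=0        # highest number seen in the current cycle
window=5000   # assumed: a jump larger than this means a different cycle

# Usage: classify NUMBER   (NUMBER is the four-digit string from the name)
# Stores the six-digit value in the global variable `result`.
classify() {
    n=$(printf '%s\n' "$1" | sed 's/^0*//')   # strip zeros, avoids octal parsing
    n=${n:-0}
    if [ "$n" -lt $((high - window)) ]; then
        # Large drop: a new cycle has started, even if 9999 never arrived
        sequence=$((sequence + 1))
        high=$n
        result=$(printf '%02d%04d' "$sequence" "$n")
    elif [ "$n" -gt $((high + window)) ] && [ "$sequence" -gt 0 ]; then
        # Large jump upwards: treat it as a late file from the previous cycle
        result=$(printf '%02d%04d' $((sequence - 1)) "$n")
    else
        # Same cycle: remember the highest number seen so far
        if [ "$n" -gt "$high" ]; then
            high=$n
        fi
        result=$(printf '%02d%04d' "$sequence" "$n")
    fi
}
```

For example, after classify 9997 and classify 0001, a late classify 9998 yields 009998 while a following classify 0002 yields 010002. Call the function directly rather than inside $(...), since a subshell would lose the updated state. If the polling script is restarted, high would need to be saved alongside the prefix in the .ts file.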