I need to write a shell script that picks up all the files (not directories) in the /exp/files directory. For each file, I want to check whether the last line has been received. The last line is a trailer record, and its third field is the number of data records, e.g. 2315 (the total number of lines in the file minus 2 for the header and trailer). In my Unix shell script I want to confirm that the last line is a trailer record by checking for T, and that the number of lines in the file equals 2315+2. If both checks succeed, I want to move the file to a different directory, /exp/ready.

tail -1 test.csv 
T,Test.csv,2315,80045.96

Also, in the input file, zero or more fields of the trailer record may sometimes be enclosed in double quotes:

"T","Test.csv","2315","80045.96"
"T", Test.csv, 2212,"80045.96"
T,Test.csv,2315,80045.96
+1  A: 

If you want to move the files after they've been written and closed then you should consider using something like inotify, incron, FAM, gamin, etc.

Ignacio Vazquez-Abrams
Thanks a lot for the info
Arav
+1  A: 

You can test for the presence of the last line with the following:

tail -1 ${filename} | egrep '^T,|^"T",' >/dev/null 2>&1
rc=$?

At that point $rc will be 0 if the line started with either T, or "T",, assuming that's enough to catch the trailer record.
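
As a small sketch, the same check can be wrapped in a function. The name `has_trailer` is made up for illustration, and it assumes a `grep` that supports `-E` and `-q` (on older systems `egrep` with the redirection shown above may be the safer choice):

```shell
# has_trailer FILE: succeed when the last line of FILE starts with T, or "T",
# (has_trailer is an illustrative name, not part of the original answer)
has_trailer() {
    tail -n 1 "$1" | grep -Eq '^(T|"T"),'
}
```

It can then be used directly in a condition, e.g. `if has_trailer "$filename"; then ...; fi`, without a separate `rc` variable.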

Once you've established that, you can extract the line count with:

lc=$(cat ${filename} | wc -l)

and you can get the expected line count with:

elc=$(tail -1 ${filename} | awk -F, '{sub(/^"/,"",$3);print 2+$3}')

and compare the two.
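
A possible way to write that comparison as one function is sketched below. The name `count_matches` is hypothetical, and the `gsub` call assumes a POSIX-style awk (`nawk` or `gawk`); it strips quotes and blanks from the third field so that `"2315"`, ` 2315` and `2315` are all treated the same:

```shell
# count_matches FILE: succeed when FILE has exactly (trailer count + 2) lines.
# (count_matches is an illustrative name, not part of the original answer)
count_matches() {
    lc=$(wc -l < "$1")
    # Strip double quotes and blanks from field 3 before doing arithmetic on it.
    elc=$(tail -n 1 "$1" | awk -F, '{ n = $3; gsub(/[" ]/, "", n); print n + 2 }')
    [ "$lc" -eq "$elc" ]
}
```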

So, tying all that together, this would be a good start. It outputs the file itself (my test files num[1-9].tst) along with a message indicating whether the file is okay or why it is not okay.

#!/bin/bash
cd /exp/files
for fspec in *.tst ; do
    if [[ -f ${fspec} ]] ; then
        cat ${fspec} | sed 's/^/   /'
        tail -1 ${fspec} | egrep '^T,|^"T",' >/dev/null 2>&1
        rc=$?
        if [[ ${rc} -eq 0 ]] ; then
            lc=$(cat ${fspec} | wc -l)
            elc=$(tail -1 ${fspec} | awk -F, '{sub(/^"/,"",$3);print 2+$3}')
            if [[ ${lc} -eq ${elc} ]] ; then
                echo '***' File ${fspec} is done and dusted.
            else
                echo '***' File ${fspec} line count mismatch: ${lc}/${elc}.
            fi
        else
            echo '***' File ${fspec} has no valid trailer.
        fi
    else
        ls -ald ${fspec} | sed 's/^/   /'
        echo '***' File ${fspec} is not a regular file.
    fi
done

The sample run, showing the test files I used:

   H,Test.csv,other rubbish goes here
   this file does not have a trailer
*** File num1.tst has no valid trailer.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes and correct count
   "T","Test.csv","1","80045.96"
*** File num2.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes but bad count
   "T","Test.csv","9","80045.96"
*** File num3.tst line count mismatch: 3/11.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes except T, and correct count
   T,"Test.csv","1","80045.96"
*** File num4.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with no quotes on T or count and correct count
   T,"Test.csv",1,"80045.96"
*** File num5.tst is done and dusted.
   H,Test.csv,other rubbish goes here
   this file does have a traier with quotes on T only, and correct count
   "T",Test.csv,1,80045.96
*** File num6.tst is done and dusted.
   drwxr-xr-x+ 2 pax None 0 Feb 23 09:55 num7.tst
*** File num7.tst is not a regular file.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with all quotes except the bad count
   "T","Test.csv",8,"80045.96"
*** File num8.tst line count mismatch: 3/10.
   H,Test.csv,other rubbish goes here
   this file does have a trailer with no quotes and a bad count
   T,Test.csv,7,80045.96
*** File num9.tst line count mismatch: 3/9.
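
The script above only reports on each file. A rough sketch of a variant that moves complete files instead might look like the following; `move_ready` and its two directory arguments are assumptions made for illustration, and the awk part again assumes `nawk` or another POSIX-style awk:

```shell
# move_ready SRC_DIR DEST_DIR: move every complete regular file from SRC_DIR
# to DEST_DIR. (move_ready is an illustrative name, not from the answer.)
move_ready() {
    src=$1 dest=$2
    for fspec in "$src"/*; do
        # Skip directories, FIFOs and anything else that is not a regular file.
        [ -f "$fspec" ] || continue
        # Skip files whose last line is not a trailer record.
        tail -n 1 "$fspec" | grep -Eq '^(T|"T"),' || continue
        lc=$(wc -l < "$fspec")
        elc=$(tail -n 1 "$fspec" | awk -F, '{ n = $3; gsub(/[" ]/, "", n); print n + 2 }')
        if [ "$lc" -eq "$elc" ]; then
            mv "$fspec" "$dest"/
        fi
    done
}
```

For the directories in the question this would be called as `move_ready /exp/files /exp/ready`.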
paxdiablo
My trailer record can be either of the below: `"T", Test.csv, 2212,"80045.96"` or `T,Test.csv,"2212",80045.96`. Does `elc=$(tail -1 ${fspec} | awk -F, '{print 2+$3}')` handle the record count 2212 whether or not it appears in double quotes? If not, how can I modify it?
Arav
Good catch, @arav, I really should test my code before inflicting it on the unsuspecting public :-) The new code should fix that problem (and I've added some unit tests to hopefully give you a measure of some confidence as well).
paxdiablo
@paxdiablo: You can make your code more efficient by getting rid of all the pipes to cat as they are not needed. Ex. `cat ${fspec} | sed 's/^/ /'` can be simplified to just `sed 's/^/ /' "$fspec"` and `lc=$(cat ${fspec} | wc -l)` to `lc=$(wc -l < "$fspec")`. Also, it is very important that you **always** quote your variables when dealing with strings that can contain spaces.
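
A minimal sketch of the point about quoting, using a hypothetical helper named `count_lines`: because `"$1"` is quoted and the file is read through a redirection, a name containing spaces is handled as one argument and no `cat` process is needed.

```shell
# count_lines FILE: print the number of lines in FILE without an extra cat.
# (count_lines is an illustrative name.)
count_lines() {
    wc -l < "$1"
}
```

For example, `count_lines "/exp/files/my file.csv"` works, whereas an unquoted `$fspec` holding that name would be split into two words.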
SiegeX
I got into the habit long ago of starting pipelines with cat simply because it looks "cleaner" to me (every other stage of the pipeline being a pure stdin/stdout process), and habits die harder with age. I realise it's less efficient but I rarely care with shell scripts: the cost of an extra stdout-stdin connection is usually small compared to the real processing. But point taken. I also actively track down and kill those who put spaces in their filenames :-) You won't see any of those abominations on systems that _I_ manage.
paxdiablo
Thanks a lot. What does the line below do? `ls -ald ${fspec} | sed 's/^/ /'` I will try the program and let you know.
Arav
Also, what is the purpose of this line? `cat ${fspec} | sed 's/^/ /'`
Arav
If I change it to ksh, are no code changes needed?
Arav
I tried testing the program and it's giving me a syntax error. Running `tail -1 ABC_DEF_D_mb440_MMMINU_11Feb09_1.txt_20100211111203_09.csv |egrep '^T,|^"T",'` prints `T,ABC_DEF_D_mb440_MMMINU_11Feb09_1.txt_20100211111203_09.csv,591,266922.00`. When I run the program I get `awk: syntax error near line 1` and `awk: illegal statement near line 1` from `elc=$(tail -1 ${fspec} | awk -F, '{sub(/^"/,"",$3);print 2+$3}')`. I'm not sure what this syntax error is. Does this sub remove both the beginning and ending double quotes? If the count happens to be "591", does it remove both the first and the last double quote?
Arav
I'm using SunOS. Is that an issue?
Arav
I tried the simple awk statement below on SunOS: `echo 'Moo' | awk '{ sub(/M/,"B"); print }'`. It gives me the errors `awk: syntax error near line 1` and `awk: illegal statement near line 1`. Do I need to use nawk? Also, does the sub command in the code remove both the beginning and ending double quotes, as in "2512"?
Arav
I tried nawk with the code you gave and it's working now. The CSV file has 1000 lines and they are all displayed while the program runs. I want to suppress that. What can I do? Is `cat ${fspec} | sed 's/^/ /'` the statement doing the display? Can I remove it?
Arav
@arav, you've been busy :-) Numbering your question "Thanks a lot. what does ..." as 1: (1) It lists the directory and it's a debug statement. (2) It shows the file, another debug statement. (3) bash and ksh are very similar but there may be subtle differences, suggest you try, then open another question if there's a problem. (4) Only removes first quote, syntax error probably Sun's limited awk - get GNU awk. (5) see 4. (6) simple error confirms your awk is deficient, see 4. (7) Finally! :-) Glad it's working. You can remove all the echos since they were for debugging only.
paxdiablo
Just make sure you remove the elses when removal of an echo makes the else part empty. The "done and dusted" bit is where you should put your code to process the file.
paxdiablo
Thanks a lot for the info. am testing it. Will let you know
Arav
The program works very well. Thanks a lot. I will close the answer once I have resolved the issue with SiegeX's program.
Arav
@arav, I'm assuming you mean _accept_ an answer. The whole point of SO is not to solve a problem just for you, but to leave the question here for others to find help later. You should accept the best answer, upvote all those that helped and leave the question here.
paxdiablo
I 2nd what pax said
SiegeX
Thanks a lot for your solution. I will be accepting your solution as well as SiegeX's solution since both are good ones.
Arav
I am unable to vote. It says the vote is too old to be changed unless the answer is edited. Could you please let me know what I can do now to vote?
Arav
@arav, try now.
paxdiablo
Thanks a lot for your time.
Arav
A: 

Don't have a UNIX shell handy here, but

#!/bin/bash
files=( $(find /exp/files -type f) )

should put all files in a BASH array (as long as no file name contains whitespace); then iterating through each of them as paxdiablo suggested above should get you sorted
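
As a hedged alternative sketch, the output of `find` can also be read line by line instead of relying on word splitting. This assumes file names contain no newlines, and `each_file` is an illustrative name; note that `find` also descends into subdirectories:

```shell
# each_file DIR CMD...: run CMD once per regular file found under DIR,
# passing the file name as the last argument. (each_file is an illustrative name.)
each_file() {
    dir=$1; shift
    find "$dir" -type f | while IFS= read -r f; do
        "$@" "$f"
    done
}
```

For instance, `each_file /exp/files tail -1` would print the last line of every regular file found.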

lorenzog
Thanks a lot for the info
Arav
A: 
destination=/exp/ready
for file in /exp/files/*.csv
do
    var=$(tail -1 "$file" | awk -F"," '{ gsub(/\042|\047/,"") }
    $1=="T" && $3 == "2315" { print "ok" }')
    if [ "$var" = "ok" ]; then
        echo mv "$file" "$destination"
    else
        echo "invalid: $file"
    fi
done
ghostdog74
Thanks a lot for the info
Arav
A: 
#!/bin/bash

cat > findready.sh <<'HERE'
#!/bin/bash

# Read the file on stdin so wc prints only the count, not the file name.
NUMLINES=$(wc -l < "$1")
# First character of the last line, with any double quotes removed.
TRAILER=$(tail -1 "$1" | tr -d '"' | cut -c1)

if [[ $NUMLINES -eq 2317 && $TRAILER == "T" ]] ; then
    mv "$1" /exp/ready/
fi
HERE

chmod a+x findready.sh

find /exp/files/ -type f -name '*.csv' -exec ./findready.sh {} ';' > /dev/null 2>&1
Hadewijch Debaillie
Thanks a lot for the info
Arav
+1  A: 

This code does all of the logic calculations via a single call to awk which makes it very efficient. It also does NOT hardcode the example value of 2315 but rather uses the value contained in the trailer line as I believe this was your intent.

Remember to remove the echo if you are satisfied with the results.

#!/bin/bash

for file in /exp/files/*; do
  if [[ -f "$file" ]]; then
    if nawk -F, '{v0=$0;v1=$1;v3=$3}END{gsub(/"/,"",v1);gsub(/"/,"",v3);exit !(v1 == "T" && NR == v3+2)}' "$file"; then
      echo mv "$file" /exp/ready
    fi
  fi
done

Update

I had to add {v0=$0;v1=$1;v3=$3} because SunOS's implementation of awk does not give END{} access to the field variables ($0, $1, $2, etc.); they must instead be saved to user-defined variables if you want to work on them inside END{}. See the last row of the first table in the linked awk feature comparison.
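
A stripped-down sketch of that workaround, independent of the trailer logic: the fields are copied into ordinary variables on every record, so whatever the last record held is still available in END. The names `last_fields`, `f1` and `f3` are made up for illustration.

```shell
# last_fields FILE: print fields 1 and 3 of the final line of FILE.
# Fields are copied into variables on every record because some awk
# implementations do not keep $1, $3, ... available inside END.
last_fields() {
    awk -F, '{ f1 = $1; f3 = $3 } END { print f1, f3 }' "$1"
}
```

On SunOS the `awk` in the function would presumably need to be `nawk`, as in the script above.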

SiegeX
What does the gsub do? Will the exit in the awk break out of the for loop?
Arav
The `gsub()` is there to remove the quotation marks (if they exist). The exit() is actually a part of the `awk` command, not bash. So no, it doesn't break out of the for-loop but rather sets awk's return value as seen by bash: '0' if we match, '1' if we don't.
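
To illustrate how an awk `exit` value becomes the condition of a shell `if`, here is a small sketch with a hypothetical helper, `has_n_lines`. The expression inside `exit !( ... )` is 1 when true, so negating it yields exit status 0 (success) for a match:

```shell
# has_n_lines FILE N: succeed when FILE has exactly N lines.
# (has_n_lines is an illustrative name.)
has_n_lines() {
    awk -v want="$2" 'END { exit !(NR == want) }' "$1"
}
```

A caller can then write `if has_n_lines "$file" 2317; then ...; fi`, and a surrounding for-loop simply continues with the next file either way.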
SiegeX
You really should check for a regular file first. Your (clever, I'll admit) trick, throwing away stderr to get rid of the errors when catting a directory, won't work too well with a pipe made with mkfifo (for example). It gets stuck reading that pipe forever. But, still, an elegant solution.
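
A short sketch of the distinction being made here, using the standard `test` operators `-f`, `-d` and `-p`; the function name `file_kind` is invented for the example. Checking `-f` first means a FIFO is never opened for reading, so nothing can block on it:

```shell
# file_kind PATH: print what kind of filesystem object PATH is.
# (file_kind is an illustrative name.)
file_kind() {
    if [ -f "$1" ]; then echo regular
    elif [ -d "$1" ]; then echo directory
    elif [ -p "$1" ]; then echo fifo
    else echo other
    fi
}
```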
paxdiablo
good suggestion, updated code to reflect. I kept the redirection of stderr to hide any sort of permission denied problems. If you want to see those, just remove the `2>/dev/null` portion.
SiegeX
Thanks a lot. SiegeX, what does "won't work too well with a pipe made with mkfifo" mean? Will this code work for that case?
Arav
If I change bash to ksh in the first line, are no changes to the program needed?
Arav
I tried the program; it's not working and gives syntax errors. I removed the `2>/dev/null` at the end and echoed the file name to find the error. Running `./test.sh` prints `/exp/ready/XYZ_ABC_XYONFNU-wa011_19Feb09.txt_20100211104459_17.csv`, then `awk: syntax error near line 1` and `awk: illegal statement near line 1`, then `/exp/ready/XYSABC_x87699993333_f100215101`, followed by the same two awk errors. What do I need to change to make it work?
Arav
Also, I want errors to be handled by logging a message to a text file. Can I do it like this: `done 2>> /exp/log.txt`?
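
A redirection placed after `done` does apply to every command inside the loop. A minimal sketch, with an invented function name `check_all` and the log path passed in as an argument rather than fixed:

```shell
# check_all LOG FILE...: print the last line of each FILE, appending any
# error messages from inside the loop to LOG. (check_all is an illustrative name.)
check_all() {
    log=$1; shift
    for f in "$@"; do
        tail -n 1 "$f"
    done 2>> "$log"
}
```

Normal output still goes to the terminal; only what the loop's commands write to stderr ends up in the log.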
Arav
I use SunOS. Is that an issue?
Arav
I tried the simple awk statement below on SunOS: `echo 'Moo' | awk '{ sub(/M/,"B"); print }'`. It gives me the errors `awk: syntax error near line 1` and `awk: illegal statement near line 1`. Do I need to use nawk?
Arav
I tried nawk with the code you gave. It's giving me a syntax error.
Arav
@arav, the code as modified by @SiegeX will now work with pipes, it was just a hole in the original code, and you're unlikely to have mkfifo pipes in your directory anyway.
paxdiablo
I changed the sub to gsub in the above program and awk to nawk. The program runs successfully but it is not echoing echo mv "$file" /ext/ready
Arav
@arav: if you are on SunOS, you'll definitely need to use `nawk`. Does ` echo "Moo,Moo" | nawk -F, '{gsub(/M/,"B",$2);}1'` print `Moo Boo` or does it give you a syntax error?
SiegeX
I tried the echo command and it's working: `echo "Moo,Moo" | nawk -F, '{gsub(/M/,"B",$2);}1'` prints `Moo Boo`. Not sure why the program is not echoing the move command. When I tried the same set of files with the program posted by paxdiablo, it worked.
Arav
Can you post the entire contents of your script to http://www.codepad.org and also copy and paste the output when you try to run it on a directory? Please be sure to use `nawk` and not just `awk`
SiegeX
Pasted the code at the link below: http://codepad.org/Kq3ZISVi I ran the program again and it's not giving any output.
Arav
@arav: very odd. I just ran that code against the test.csv file and it worked fine. What output does this code give you (if any)? http://codepad.org/hduwRo4z
SiegeX
Copied your code and ran it on the same test.csv file. Running `./test.sh` gave the output below: `/exp/incoming/ready/test.csv 4`
Arav
Not sure why $1 and $3 are not getting printed. NR is 4 lines.
Arav
@arav: aha, it is most likely the `-F,`, try using `-F","` instead.
SiegeX
I tried "," after -F, but $1 and $3 are still not getting printed. Not sure what the reason is.
Arav
Hmm, what about using `RS` instead of `-F` like so http://codepad.org/RVvhlGT2
SiegeX
I tried your new code and it didn't work, so I tried the one below. It prints the $1,$2 values when they are not enclosed in the END braces; only the $1,$2 inside the END braces are not printed. Please see my sample code: http://codepad.org/Pyqrx1WJ For this sample code it prints all the lines, including the last line, then goes to the END braces and doesn't print $1 and $2. Not sure why that is.
Arav
I will be accepting your answer once we solve this issue.
Arav
Ok, problem solved. Please see my original post and look under **Update** as to what I had to do to fix it and why
SiegeX
Thanks a lot for your time. It's working now.
Arav
I am unable to accept two answers. I'm not sure whether that's possible on Stack Overflow. Please let me know what I can do to accept your answer as well.
Arav
Hmm, I don't believe it is possible to accept more than one answer. You can certainly change your accepted answer, so my advice to you is give +1 votes to the answers that you felt were helpful (including the one you accept) and choose whichever answer you feel handled your question the best as the accepted answer.
SiegeX