I need to search for a pattern in a file. For example, the file contains:

3555005!K!00630000078!C!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!47001231000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!

I want to find lines that contain !D! and whose 7th field is earlier than the current system date, delete those lines, and save the file.

Is that possible?
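To make the field numbering concrete, here is a small sketch of how awk splits the !D! sample line when `!` is the field separator (`show_fields` is a hypothetical helper name):

```shell
#!/bin/sh
# Print the 2nd, 5th and 7th "!"-separated fields of each input line.
# show_fields is a hypothetical helper name used only for illustration.
show_fields() {
    awk -F'!' '{ print "field2=" $2, "field5=" $5, "field7=" $7 }'
}

echo '3555005!D!16042296!DUMMY!20090805235959!0!47001231000000!0!336344324!1!1!POST!USAGE!336344324!0!' | show_fields
```

On that line, field 5 is 20090805235959 and field 7 is 47001231000000.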

+3  A: 

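Something like this should do the trick (-i.bak edits the file in place and keeps a backup). It assumes the date field is a YYYYMMDDhhmmss timestamp; you may need to parse the time if your field is formatted differently:

perl -MPOSIX=strftime -i.bak -ne 'BEGIN { $now = strftime("%Y%m%d%H%M%S", localtime) } @f = split /!/; print unless $f[1] eq "D" && $f[6] < $now;' file.log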
dsm
+1  A: 

If you prefer AWK...

awk -f logstrip.awk  in.log > out.log

where logstrip.awk looks something like

# *** Simple AWK script to delete lines from a log file ***
#    Rule: keep all lines except those that have their 2nd
#          field equal to "D" and their 7th field earlier than
#          the current date/time

BEGIN {
    FS = "!";   # field delimiter

    # systime() (available in gawk, not in every awk) returns
    # seconds since the epoch
    stopDate = systime();
    # stopDate = 47001231000001;   for test purposes

    deletedLineCtr = 0;   # diagnostics counter, unused at this time
}

{
    if ($2 == "D" && $7 < stopDate) {
        deletedLineCtr++;
    }
    else
        print $0;
}

should do the trick.

Note, however, that your field #7 has an odd date format. I think I recognize a recent epoch value (123...), but it is preceded by 4 apparently unrelated digits. These can easily be removed before comparing to stopDate.
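If field 7 is instead read as a YYYYMMDDhhmmss timestamp, no stripping is needed. (The later test data in this thread has values like 20090912000000 in that field, and under this reading 47001231000000 would be 31 December 4700, a far-future date.) A minimal sketch under that assumption compares the field with a cutoff in the same format; `drop_expired_awk` is a hypothetical name:

```shell
#!/bin/sh
# Assumes field 7 is a YYYYMMDDhhmmss timestamp, so a cutoff in the same
# format can be compared with it directly.
# drop_expired_awk is a hypothetical helper name.
drop_expired_awk() {
    # $1: cutoff, e.g. "$(date +%Y%m%d%H%M%S)"; input on stdin
    awk -F'!' -v stopDate="$1" '
        # keep the line unless it is a "D" record dated before stopDate
        !($2 == "D" && $7 < stopDate)
    '
}

printf '%s\n' \
    '3555005!D!16042296!DUMMY!20090805235959!0!47001231000000!0!336344324!1!1!POST!USAGE!336344324!0!' \
    '3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!vijay!336344324!0!' |
    drop_expired_awk "$(date +%Y%m%d%H%M%S)"
```

Run today, this keeps the 47001231000000 record and drops the 20090912000000 one.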

mjv
It may be the fifth field. By the way, instead of printing "zz" and $0, just reverse the logic of your test and only print lines which (don't) match.
Dennis Williamson
Thanks, Dennis W. The "zz" thing was test code I forgot to clean up before posting. I've cleaned it up, but didn't reverse the test (though that would be a good idea, since we don't do anything useful with deletedLineCtr)!
mjv
This is a very good solution. Thanks a lot, everyone :)
Vijay Sarathi
+2  A: 

Based on mjv's answer, but simplified, and assuming the date is in the fifth field (broken into two lines for readability):

awk -F! 'BEGIN {stopdate=strftime("%Y%m%d%H%M%S",systime())} 
         $2 != "D" || $5 >= stopdate {print}' file.log > newfile.log
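The question also asks to save the file. awk writes to standard output, so one common pattern is to write to a temporary file and copy it back over the original only if awk succeeded. A sketch assuming the fifth-field layout used above; `prune_log` is a hypothetical name, `mktemp` is assumed to be available, and the cutoff is passed in with `-v` so the script does not depend on strftime():

```shell
#!/bin/sh
# Filter a log file in place: drop "D" records whose 5th field is earlier
# than the cutoff, keep everything else. prune_log is a hypothetical name.
prune_log() {
    # $1: file to edit, $2: cutoff in YYYYMMDDhhmmss form
    file=$1
    tmp=$(mktemp) || return 1
    awk -F'!' -v stopdate="$2" '$2 != "D" || $5 >= stopdate' "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # copy back (not mv) so the file keeps its permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example: prune_log file.log "$(date +%Y%m%d%H%M%S)"
```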
Dennis Williamson
"print $0" is redundant; simply "print" will do. In fact, you can omit the action entirely, since printing the record is the default action for a pattern without one: awk -F! -v date="$(date '+%Y%m%d%H%M%S')" '$2 != "D" || $5 >= date'
glenn jackman
You're right. However, I'll leave the `print` in to avoid too much obfuscation.
Dennis Williamson
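A tiny sketch of the default-action rule from the comment above: a pattern with no action prints each matching record, so these two commands produce the same output (the function names are hypothetical):

```shell
#!/bin/sh
# A pattern with no action prints matching records, so these two
# commands produce the same output.
with_action()    { awk -F'!' '$2 != "D" { print }'; }
without_action() { awk -F'!' '$2 != "D"'; }

printf '%s\n' 'a!D!1' 'b!K!2' 'c!C!3' | without_action
```

Both print only the b!K!2 and c!C!3 lines.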
+1  A: 

I tested with the following sample data in a file:

3555005!K!00630000078!C!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!D!16042296!DUMMY!20090805235959!0!20090917000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090919000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090914000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090915000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090913000000!0!336344324!1!1!POST!vijay!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!USAGE!336344324!0!
3555005!C!336344324!1!!!EUR!1!1!!I!
3555005!S!00630000078!20090805172515!LF010300!
3555005!K!204042880166840!I!20090805235959!47001231000000!16042296!336344324!A!1!ENG!0!00630000078!NO!00630000078!
3555005!D!16042296!DUMMY!20090805235959!0!20090912000000!0!336344324!1!1!POST!USAGE!336344324!0!

but it deletes every line that contains !D!. I used the following awk script:

# *** Simple AWK script to delete lines from log file ***
#    Rule: keep all lines except these that have their 2nd
#          field equal to "D" and their 7th field more than
#          current date time
BEGIN {
    FS = "!";   #delimiter
    stopDate = "date +%Y%m%d%H%M%S";
    # stopDate = 47001231000001;  for test purposes
    deletedLineCtr = 0;   #diagnostics counter, unused at this time
}
{
    if ( match($2, "D") && ($7 < stopDate) )
    {
        deletedLineCtr++;
    }
    else
        print $0
}

Am I doing anything wrong?

Vijay Sarathi
Please format your postings using the code and blockquote features so they are readable. Awk doesn't have a "date" command like that. Also, I don't think "47001231000001" is a date (unless you strip off the "4700", in which case it looks like a count of seconds since the epoch). In your first post, that value is in field 7, but field 5 looks like a date. In this post, that value is in field 6 of some records, but field 7 looks like a date. See my answer or mjv's; either should work if the correct field is chosen. Keep trying. You're almost there.
Dennis Williamson
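A sketch of what goes wrong in the script above and one possible fix, assuming (as in this test data) that field 7 holds a YYYYMMDDhhmmss timestamp. In awk, `stopDate = "date +%Y%m%d%H%M%S"` only stores that text; no command runs. Comparing a field with a string constant is a string comparison, and digits sort before the letter "d", so `$7 < stopDate` is true for every record. Reading the command's output with `getline` is standard awk; `show_bug` and `keep_current_records` are hypothetical names:

```shell
#!/bin/sh
# 1) The bug: stopDate holds the literal text "date +%Y%m%d%H%M%S", so the
#    comparison is a string comparison that any all-digit field "wins".
show_bug() {
    echo '3555005!D!16042296!DUMMY!20090805235959!0!47001231000000!0!' |
        awk -F'!' 'BEGIN { stopDate = "date +%Y%m%d%H%M%S" }
                   { print (($7 < stopDate) ? "deleted" : "kept") }'
}

# 2) One possible fix: run date and read its output with getline.
keep_current_records() {
    awk -F'!' '
        BEGIN {
            cmd = "date +%Y%m%d%H%M%S"
            cmd | getline stopDate   # current time as YYYYMMDDhhmmss
            close(cmd)
        }
        # keep the line unless it is a "D" record dated before stopDate
        !($2 == "D" && $7 < stopDate)
    '
}

show_bug    # prints "deleted", even though 4700-12-31 is far in the future
```

Usage: `keep_current_records < file.log > newfile.log`.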