Ok, Regex wizards. I want to be able to search through my logfile, find any sessions with the word 'error' in them, and then return the entire session log entry.

I know I can do this with a string/array, but I'd like to learn how to do it with Regex. Here's the question, though: if I decide to do this with Regex, do I have one problem or two? ;o)

Here's the log. PS: I'm using the Perl Regex engine.

Note: I don't think I can get this done in Regex. In other words, I now have two problems ;o) I've tried the solutions below, but since I confused the issue by stating that I was using a Perl engine, many of the answers were in Perl (which cannot be used in my case). I did, however, post my solution below. Please vote it up if you feel it's appropriate.


2008.08.27 08:04:21 (Wed)------------Start of Session-----------------
Blat v2.6.2 w/GSS encryption (build : Feb 25 2007 12:06:19)
Sending stdin.txt to [email protected]
Subject: test 1
Login name is [email protected]
The SMTP server does not require AUTH LOGIN.
Are you sure server supports AUTH?
The SMTP server does not like the sender name.
Have you set your mail address correctly?
2008.08.27 08:04:24 (Wed)-------------End of Session------------------

2008.08.27 08:05:56 (Wed)------------Start of Session-----------------
Blat v2.6.2 w/GSS encryption (build : Feb 25 2007 12:06:19)
Error: Wait a bit (possible timeout).
SMTP server error
Error: Not a socket.
Error: Not a socket.
2008.08.27 08:06:26 (Wed)-------------End of Session------------------

2008.08.27 08:07:58 (Wed)------------Start of Session-----------------
Blat v2.6.2 w/GSS encryption (build : Feb 25 2007 12:06:19)
Sending stdin.txt to [email protected]
Subject: Lorem Update 08/27/2008
Login name is [email protected]
2008.08.27 08:07:58 (Wed)-------------End of Session------------------


+5  A: 
Kyle
I'm not sure I understand this; what part of this is perl and what part is Regex? I'm using the perl regex but not perl itself.
Keng
The perl part is `$/=""` which makes perl read the input file in paragraph mode. Whatever code you are using could do the same, by simply reading up to the next blank line and only then matching the regex.
moritz
Ah... that's the problem; I'm not running Perl, so this won't work. I'm using the Perl regex engine (one of many different regex engines), not Perl itself.
Keng
Ok, then go with my solution instead ;-)
moritz
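
The paragraph-mode idea from the comments above (read up to the next blank line, and only then match) does not depend on Perl. Here is a minimal Python sketch of it, assuming sessions are separated by at least one blank line as in the posted log. The function name and the shortened sample log are made up for illustration.

```python
import re

def sessions_with_error(log_text):
    """Split the log on blank lines and keep the blocks that mention 'error'."""
    # One or more blank lines (LF or CRLF) separate the sessions.
    blocks = re.split(r"(?:\r?\n){2,}", log_text.strip())
    return [b for b in blocks if re.search(r"error", b, re.IGNORECASE)]

# Shortened, hypothetical stand-in for the real log file.
sample = (
    "08:04:21 ---Start of Session---\n"
    "Sending stdin.txt\n"
    "08:04:24 ---End of Session---\n"
    "\n"
    "08:05:56 ---Start of Session---\n"
    "Error: Not a socket.\n"
    "08:06:26 ---End of Session---\n"
)

for block in sessions_with_error(sample):
    print(block)
```

The regex only has to find the word inside one block at a time, which keeps it trivial.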
+6  A: 

Kyle's answer is probably the most perlish, but in case you have it all in one string and want to use a single regex, here's a (tested) solution:

(Second update: fixed a bit, now more readable than ever ;-)

my $re = qr{
        (           # capture in $1
         (?:
          (?!\n\n). # Any character that's not at a paragraph break
         )*        # repeated
         error
         (?:
          (?!\n\n).
         )*
        )
}msxi;


while ($s =~ m/$re/g){
    print "'$1'\n";
}

Ugly, but you asked for it.

moritz
Add an x at the end (ignore whitespace) and make that multi-line. That will pretty it up considerably.
J.J.
these expressions don't match. did i do it wrong?
Keng
$s =~ m/((?:(?!\n\n).)*error(?:(?!\n\n).)*)/msgi
Keng
$s =~ m/((?:(?!\n\n).)*error(?:(?!\n\n).)*)/msgix
Keng
I just stretched it out a bit and corrected it. It now matches your example input. (Make sure that it's really in one string, not read line by line.)
moritz
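
For anyone who wants to try the single-regex approach without Perl, here is a rough Python port of the pattern above. It assumes the whole log is in one string and that the engine supports negative lookahead. The names and the shortened sample are illustrative only. A match can start on the newline that ends the blank line, so the sketch strips each result.

```python
import re

SESSION_WITH_ERROR = re.compile(
    r"""
    (                     # capture the whole block
      (?: (?!\n\n) . )*   # any character that does not start a blank-line break
      error
      (?: (?!\n\n) . )*   # the rest of the block, up to the next blank line
    )
    """,
    re.DOTALL | re.IGNORECASE | re.VERBOSE,
)

def find_error_sessions(log_text):
    # strip() drops the newline a match may pick up at either end of a block
    return [m.group(1).strip() for m in SESSION_WITH_ERROR.finditer(log_text)]

# Shortened, hypothetical stand-in for the real log, all in one string.
sample = (
    "08:04:21 ---Start of Session---\n"
    "Sending stdin.txt\n"
    "08:04:24 ---End of Session---\n"
    "\n"
    "08:05:56 ---Start of Session---\n"
    "Error: Not a socket.\n"
    "08:06:26 ---End of Session---\n"
)

for session in find_error_sessions(sample):
    print(session)
```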
A: 

Like the last guy said, perl from the command line will work. So will awk from the command line:
awk '/-Start of Session-/ { text=""; gotError=0 } { text=text "\n" $0 } /Error/ { gotError=1 } /-End of Session-/ { if (gotError) print text }' logFileName.txt

Basically, start recording on a line with "-Start of Session-", set a flag on a line with "Error", and conditionally output on a line with "-End of Session-".

Or put this into errorLogParser.awk:

# A new session begins: reset the buffer and the flag.
/-Start of Session-/{
    text="";
    gotError=0;
}
# Record every line (before the end check, so the end marker is included).
{
    text=text "\n" $0
}
/Error/{
    gotError=1;
}
# Session is over: print it only if an error was seen.
/-End of Session-/{
    if(gotError)
    {
        print text
    }
}
... and invoke like so: awk -f errorLogParser.awk logFileName.txt

KeyserSoze
i don't have the option of using perl or awk in this situation.
Keng
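
Where awk is not an option, the same record/flag/output logic is easy to write as a plain loop. The following Python sketch is only an illustration (function name and sample data are invented). It matches "Error" case-sensitively, as the awk pattern does, and it appends each line before testing for the end marker so that the closing line is part of the output.

```python
def error_sessions(lines):
    """Collect each session between the start/end markers; keep those with 'Error'."""
    sessions = []
    current = []
    got_error = False
    for line in lines:
        if "-Start of Session-" in line:
            # new session: reset the buffer and the flag
            current, got_error = [], False
        current.append(line)
        if "Error" in line:
            got_error = True
        if "-End of Session-" in line and got_error:
            sessions.append("\n".join(current))
    return sessions

# Shortened, hypothetical stand-in for the real log file.
sample = """\
08:04:21 ---Start of Session---
Sending stdin.txt
08:04:24 ---End of Session---

08:05:56 ---Start of Session---
Error: Not a socket.
08:06:26 ---End of Session---
"""

for session in error_sessions(sample.splitlines()):
    print(session)
```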
A: 

With a perl regexp engine, the simple regexp

Error:.+

does the trick according to quickrex.

(With a java regexp engine, another regexp would have been required:

(?ms)^Error:[^\r\n]+$

)

A regexp with a capturing group would let you extract only the error message and not 'Error' itself, as in:

Error:\s*(\S.+)

Group 1 captures only what follows 'Error: '.

Anyhow, for a regexp how-to, see the Regular-Expressions.info tutorial, a first-class introduction to this technique.

VonC
Sorry, that will only print out one matching line, not the entire session block.
moritz
However, none of those return the entire block, merely the line that contains the word "error".
Keng
Right, I was only focusing on the Error line itself. Correct regexps have been given by MizardX and moritz. I am leaving my answer up only for the tutorial link and the regexps, and I have up-voted their answers.
VonC
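
To see what the capturing-group pattern from this answer actually returns, here is a small Python sketch with made-up sample lines. As the comments point out, it yields the individual error messages, not the surrounding session block.

```python
import re

# Group 1 holds the message text that follows "Error:".
ERROR_MESSAGE = re.compile(r"Error:\s*(\S.+)")

# Hypothetical excerpt modelled on the posted log.
sample = (
    "Blat v2.6.2\n"
    "Error: Wait a bit (possible timeout).\n"
    "SMTP server error\n"
    "Error: Not a socket.\n"
)

# findall returns the contents of group 1 for every match
messages = ERROR_MESSAGE.findall(sample)
for message in messages:
    print(message)
```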
+1  A: 
/(?:[^\n\r]|\r?\n(?!\r|\n))*?Error:(?:[^\n\r]|\r?\n(?!\r|\n))*/g

This takes advantage of the blank lines in between the entries. It works for both unix and windows line breaks. You can replace the text "Error:" in the middle with almost anything else if you would like.

MizardX
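
As a way to experiment with this pattern, here is a Python sketch that runs it against both Unix and Windows line endings. The sample text and names are invented. A match may begin with the line break that closes the preceding blank line, so the sketch strips the results.

```python
import re

# Same pattern as above, without the /.../g delimiters.
BLOCK_WITH_ERROR = re.compile(
    r"(?:[^\n\r]|\r?\n(?!\r|\n))*?Error:(?:[^\n\r]|\r?\n(?!\r|\n))*"
)

def find_blocks(log_text):
    # finditer plays the role of the /g flag
    return [m.group(0).strip() for m in BLOCK_WITH_ERROR.finditer(log_text)]

# Shortened, hypothetical stand-in for the real log.
sample_lf = (
    "08:04:21 ---Start of Session---\n"
    "Sending stdin.txt\n"
    "08:04:24 ---End of Session---\n"
    "\n"
    "08:05:56 ---Start of Session---\n"
    "Error: Not a socket.\n"
    "08:06:26 ---End of Session---\n"
)
sample_crlf = sample_lf.replace("\n", "\r\n")

for block in find_blocks(sample_lf):
    print(block)
```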
A: 

If you want to understand or play with any of these solutions, I highly recommend downloading Regex Coach, which helps you build up and test regular expressions.

Paul Dixon
Thanks. I broke down and bought Regex Buddy last month just for stuff like this. 80) It's awesome, BTW!
Keng
A: 

What I did was read the entire log into a string, then go through it line by line, adding each line to a third variable until a line contained "--End of Session--". I added that line to the third variable as well and then searched the third variable for the word "error". If it contained it, I added the third variable to a fourth. Either way, I then cleared the third variable and carried on through the log from the next line.

It looks like this:

str a b email gp lgf
lgf.getfile( "C:\blat\log.txt")
foreach a lgf
    if(find(a "--End of Session--")>-1)
     gp.from(gp "[]" a)
     if(find(gp "error" 0 1)>-1)
      gp.trim
      email.from(email gp "[]")
     gp=""
     continue
    gp.from(gp "[]" a)
email.trim

It turns out that regex can really be a bear-cat to implement when it doesn't fit well. Kind of like using a screwdriver instead of a hammer. It'll get the job done, but it takes a long time, breaks the screwdriver, and probably hurts you in the process.

Keng
A: 

Once, when only Vim was available (along with sed and awk, which I had not mastered at the time), I did something like this:

In Vim, I joined all the lines between (in your case) Start of Session and End of Session into a single line:

  • First replaced all the line endings with some specific char

    :%s:$:#

  • Then turned the double enters into some other separator:

    :%s:#\n#\n:#\r@\r

  • Joining the lines:

    :%s:#\n:#

  • Displayed only the lines with Error:

    :v/[Ee]rror/d

  • Split lines to their original format:

    :%s:#:\r:g

HTH

Zsolt Botykai
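
The join, filter, split sequence above is not tied to Vim. Here is a minimal Python sketch of the same idea, assuming sessions are separated by blank lines and that the marker character ("#", as in the Vim commands) never occurs in the log itself. The function name and sample are hypothetical.

```python
import re

MARKER = "#"  # assumption: this character never appears in the log

def error_sessions_via_join(log_text):
    # Steps 1-3: put a marker at every line end and join each
    # blank-line-separated session into one long line.
    records = [
        MARKER.join(block.splitlines())
        for block in re.split(r"\n{2,}", log_text.strip())
    ]
    # Step 4: keep only the joined lines that mention an error.
    kept = [r for r in records if re.search(r"[Ee]rror", r)]
    # Step 5: turn every marker back into a line break.
    return [r.replace(MARKER, "\n") for r in kept]

# Shortened, hypothetical stand-in for the real log file.
sample = (
    "08:04:21 ---Start of Session---\n"
    "Sending stdin.txt\n"
    "08:04:24 ---End of Session---\n"
    "\n"
    "08:05:56 ---Start of Session---\n"
    "Error: Not a socket.\n"
    "08:06:26 ---End of Session---\n"
)

for session in error_sessions_via_join(sample):
    print(session)
```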