ansaurus

Question

Parsing a log file with regular expressions

Answer 1

A:

You need to pass the RegexOptions.Singleline flag to the regular expression, so that "." matches every character, including newlines (by default it matches every character except a newline).
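A rough sketch of the effect, written in Python rather than C# so it can be run anywhere: Python's re.DOTALL plays the role that RegexOptions.Singleline plays in .NET, and named groups are spelled (?P<name>...) instead of (?<name>...). The sample log text and the helper name message_of are made up for illustration.

```python
import re

# Hypothetical sample: one entry whose message continues on a second line.
LOG = "03/09/08 10:19:00,123 Connection failed\n  retrying in 5 seconds\n"

PATTERN = (
    r"(?P<date>\d{2}/\d{2}/\d{2})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"
    r"(?P<message>.*)"
)


def message_of(text, flags=0):
    """Return the 'message' group of the first entry found, or None."""
    match = re.search(PATTERN, text, flags)
    return match.group("message") if match else None


# Default mode: "." stops at the first newline.
default_message = message_of(LOG)
# DOTALL (the counterpart of Singleline): "." also matches newlines.
dotall_message = message_of(LOG, re.DOTALL)

print(repr(default_message))  # 'Connection failed'
print(repr(dotall_message))   # 'Connection failed\n  retrying in 5 seconds\n'
```

With a single entry this looks fine, but with several entries the greedy ".*" has nothing to stop it, which is what the next answer is about.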

samjudson 2008-09-03 10:19:00

Answer 2

+1 A:

The problem you have is that you need to terminate the RegEx pattern so it knows when one message ends and the next starts.

When you were running in default mode the newline was working as an implicit terminator.

The problem is that once you let "." match newlines (RegexOptions.Singleline) there is no terminator, so a greedy pattern will gobble up the whole file. A non-greedy pattern matches as few characters as possible, which here will be just one.
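Both failure modes are easy to reproduce. This is a small Python sketch (re.DOTALL standing in for RegexOptions.Singleline); the log lines and the HEAD helper pattern are invented for the example.

```python
import re

# Hypothetical two-entry log; the first message spans two lines.
LOG = (
    "03/09/08 10:19:00,123 first message\n"
    "  continued\n"
    "03/09/08 10:20:00,456 second message\n"
)

# Date and time prefix of an entry, followed by one whitespace character.
HEAD = r"\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s"

# Greedy: nothing terminates ".+", so the first match swallows the rest of the file.
greedy = re.findall(HEAD + r"(.+)", LOG, re.DOTALL)

# Non-greedy: ".+?" is satisfied by a single character.
lazy = re.findall(HEAD + r"(.+?)", LOG, re.DOTALL)

print(len(greedy))  # 1
print(lazy)         # ['f', 's']
```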

Now, if you use the date of the next message as the terminator, I think your parser will only get every other line.

Is there something else in the file you could use to terminate the pattern?
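The "every other line" effect comes from the terminator being consumed as part of the match. A Python sketch with an invented four-line log shows it, along with one possible way around it: checking for the next date in a look-ahead so it is not consumed. The look-ahead variant is an assumption on my part, not something this answer proposes.

```python
import re

# Hypothetical log with four single-line entries.
LOG = (
    "03/09/08 10:19:00,123 one\n"
    "03/09/08 10:20:00,456 two\n"
    "03/09/08 10:21:00,789 three\n"
    "03/09/08 10:22:00,012 four\n"
)

DATE = r"\d{2}/\d{2}/\d{2}"
HEAD = DATE + r"\s\d{2}:\d{2}:\d{2},\d{3}\s"

# The next entry's date is part of the match, so that entry can no longer
# be matched from its start and gets skipped.
consuming = re.findall(HEAD + r"(.+?)\n" + DATE, LOG, re.DOTALL)

# The next date (or the end of the input) is only looked at, not consumed.
lookahead = re.findall(HEAD + r"(.+?)\n(?=" + DATE + r"|\Z)", LOG, re.DOTALL)

print(consuming)  # ['one', 'three']
print(lookahead)  # ['one', 'two', 'three', 'four']
```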

Dave Webb 2008-09-03 10:34:08

Answer 3

+2 A:

You obviously need to be able to distinguish "message lines" from "log lines"; if you allow the message part to contain a line that starts with a date and time, then there is simply no way to determine what is part of a message and what is not. So, instead of using the dot, you need an expression that allows anything that does not include a newline followed by a date and time.

Personally, however, I would not use a regular expression to parse the whole log entry. I prefer using my own loop to iterate over each line, with one simple regular expression to determine whether a line is the start of a new entry or not. I would also prefer this approach for readability.
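A minimal sketch of that loop, in Python for brevity (the same shape works in C# with Regex.Match inside a foreach over the file's lines). The function name parse_entries, the dictionary layout and the sample lines are assumptions made for the example.

```python
import re

# Matches only the first line of an entry: date, time, then the message text.
ENTRY_START = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"
    r"(?P<message>.*)"
)


def parse_entries(lines):
    """Group raw lines into entries; continuation lines extend the last message."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = ENTRY_START.match(line)
        if match:
            # A line starting with date and time opens a new entry.
            entries.append(match.groupdict())
        elif entries:
            # Anything else belongs to the message of the current entry.
            entries[-1]["message"] += "\n" + line
        # Lines before the first entry are ignored.
    return entries


# Hypothetical input; in practice this could be an open file object.
SAMPLE = [
    "03/09/08 10:19:00,123 Connection failed\n",
    "  retrying in 5 seconds\n",
    "03/09/08 10:20:00,456 Connected\n",
]

entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["time"], repr(entry["message"]))
```

Because the regular expression only ever sees one line, no Singleline or Multiline flags are involved and there is nothing for it to gobble.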

mweerden 2008-09-03 10:37:02

Answer 4

+3 A:

This will only work if the log message doesn't contain a date at the beginning of the line, but you could try adding a negative look-ahead assertion for a date in the "message" group:

(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}:\d{2}:\d{2},\d{3})\s(?<message>(.(?!^\d{2}/\d{2}/\d{2}))+)

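Note that this requires the RegexOptions.Multiline flag, so that "^" matches at the start of every line. For "." to cross line breaks it has to be combined with RegexOptions.Singleline.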
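To try the idea outside .NET, here is a Python transcription of the pattern above. It is a sketch under a few assumptions: named groups become (?P<name>...), re.MULTILINE and re.DOTALL stand in for RegexOptions.Multiline and RegexOptions.Singleline, and the sample log is invented.

```python
import re

# Hypothetical log: the first entry's message spans two lines.
LOG = (
    "03/09/08 10:19:00,123 Connection failed\n"
    "  retrying in 5 seconds\n"
    "03/09/08 10:20:00,456 Connected\n"
)

ENTRY = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"
    # Take a character only if no date starts a line right after it.
    r"(?P<message>(.(?!^\d{2}/\d{2}/\d{2}))+)",
    re.MULTILINE | re.DOTALL,
)

# The last entry keeps its trailing newline, so strip trailing whitespace.
messages = [m.group("message").rstrip() for m in ENTRY.finditer(LOG)]
times = [m.group("time") for m in ENTRY.finditer(LOG)]

print(messages)  # ['Connection failed\n  retrying in 5 seconds', 'Connected']
```

The newline before the second entry is left out of the first message because a date follows it at the start of a line. For the same reason, a message line that itself begins with something shaped like a date would be cut off there.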

Jeff Hillman 2008-09-03 10:37:36

Answer 5

A:

You might find it a lot easier to parse the file with a proper parser generator: ANTLR can generate one in C#. Context-free parsers only seem hard until you "get" them; after that, they are much simpler and friendlier to use than regular expressions.

Daren Thomas 2008-09-03 12:26:07
