I'm currently working on a parser for our internal log files (generated by log4php, log4net and log4j). So far I have a nice regular expression to parse the logs, except for one annoying bit: Some log messages span multiple lines, which I can't get to match properly. The regex I have now is this:
(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}):\d{2}:\d{2}),\d{3})\s(?<message>.+)
The log format (which I use for testing the parser) is this:
07/23/08 14:17:31,321 log
message
spanning
multiple
lines
07/23/08 14:17:31,321 log message on one line
When I run the parser right now, I get only the line the log starts on. If I change it to span multiple lines, I get only one result (the whole log file).
Help please ;-)
@samjudson:
You need to pass the RegexOptions.Singleline flag in to the regular expression, so that "." matches all characters, not just all characters except new lines (which is the default).
I tried that, but then it matches the whole file. I also tried to set the message-group to .+? (non-greedy), but then it matches a single character (which isn't what I'm looking for either).
The problem is that the pattern for the message matches on the date-group as well, so when it doesn't break on a new-line it just goes on and on and on.
I use this regex for the message group now. It works, unless there's a pattern IN the log message which is the same as the start of the log message.
(?<message>(.(?!\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[\d{4}\]))+)