Hello,

I am trying to wrap my head around the feasibility of parsing a log file with a single RegEx in .NET.

What makes it difficult is that the log file has items that can (but don't always) span multiple lines, and that each log file may actually contain multiple 'logs'. Example format:

log:  
  event 1  
  event 2  
    additional information  
  event 3  
log:  
  event 1  
    additional information  
    more additional information  
  event 2  
    additional information  

The requirement is to distinguish which events belong to which log and to capture the additional information as well. I was able to, of course, just grab events. I have been unable to grab events with their additional information, let alone group them into captures by log.

I would appreciate information rather than being handed a solution, so I can learn. I guess my question is: should this be possible? It's already been done with a parser; I was just trying to discover alternative methods.

+2  A: 

This seems like it would be easier and more transparent to parse manually than to attempt in a RegEx. The pattern is pretty simple.
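
For illustration, here is a rough sketch of what "parse manually" could look like. It is written in Python rather than .NET, and it assumes things the question does not state outright: that an unindented line such as `log:` starts a new log, that events are indented by exactly two spaces, and that anything indented further is additional information. The name `parse_logs` is made up for the example.

```python
SAMPLE = (
    "log:\n"
    "  event 1\n"
    "  event 2\n"
    "    additional information\n"
    "  event 3\n"
    "log:\n"
    "  event 1\n"
    "    additional information\n"
    "    more additional information\n"
    "  event 2\n"
    "    additional information\n"
)


def parse_logs(text):
    """Return a list of logs; each log is a list of (event, [info lines])."""
    logs = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # Assumption: an unindented line ("log:") opens a new log.
            logs.append([])
        elif indent == 2:
            # Assumption: two spaces of indentation mark an event.
            logs[-1].append((line.strip(), []))
        else:
            # Assumption: deeper indentation belongs to the last event.
            logs[-1][-1][1].append(line.strip())
    return logs


logs = parse_logs(SAMPLE)
```

The sketch does no error handling: an event that appears before any `log:` line would raise an `IndexError`.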

Sam
Yeah, I tend to agree. Regex becomes more difficult with nested stuff and multi-line.
Andy White
+2  A: 

Why are you trying to use a single regex for this? Use a proper parser.

Regular expressions are awesome for simple string manipulation, but once you get to more complex stuff an actual parser is much better.

Anon.
Pure curiosity/challenge. The parser itself is already done.
Doug
Well, as a quick example (assuming the format is as posted), `/\n [^\n]*(\n [^\n]*)*/` *should* (untested) match an entry and any number of lines of additional content.
Anon.
Eh, looks like the spaces have been compressed. I'm sure you know what I mean.
Anon.
That does work. As has been said, I think a proper parser is going to be needed to deal with the fact that multiple 'logs' can be present in one log file. That said, I can easily split the file and then grab the events using that simple RegExp.
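
A minimal sketch of that split-then-match idea, in Python rather than .NET, so regex details may differ. The indentation widths (two spaces for an event, four or more for additional information) are assumptions read off the example format, and the names `EVENT` and `split_then_match` are made up for the example.

```python
import re

SAMPLE = (
    "log:\n"
    "  event 1\n"
    "  event 2\n"
    "    additional information\n"
    "  event 3\n"
    "log:\n"
    "  event 1\n"
    "    additional information\n"
    "    more additional information\n"
    "  event 2\n"
    "    additional information\n"
)

# Assumed pattern: an event line indented by exactly two spaces, followed
# by any number of lines indented by four or more spaces.
EVENT = re.compile(r"^ {2}(\S.*)((?:\n {4,}.*)*)", re.MULTILINE)


def split_then_match(text):
    # Step 1: split the file at each "log:" header line. Anything before
    # the first header is dropped.
    chunks = re.split(r"^log:[ \t]*\n", text, flags=re.MULTILINE)[1:]
    result = []
    for chunk in chunks:
        events = []
        # Step 2: grab each event together with its additional lines.
        for m in EVENT.finditer(chunk):
            info = [s.strip() for s in m.group(2).splitlines() if s.strip()]
            events.append((m.group(1).strip(), info))
        result.append(events)
    return result


logs = split_then_match(SAMPLE)
```

The additional lines arrive as one captured block and are split into individual lines afterwards, so this is two small steps rather than a single pattern.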
Doug
A: 

It would be possible (and quite easy) to pull out each log entry separately using a pattern, but not to split the match into groups of information using captures in that same pattern.

What you’d need to do is construct a pattern for an info line (basically, space followed by something else to the end of the line), and repeat it.
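
As a rough illustration of "a pattern for an info line, repeated", here is a Python sketch. It assumes four spaces of indentation for info lines and two for the event line, and the names `INFO_LINE` and `ENTRY` are made up. In Python's `re` module a repeated group reports only its last repetition, so the individual info lines are pulled out in a second pass; behaviour in other engines, including .NET, may differ.

```python
import re

ENTRY_TEXT = (
    "  event 1\n"
    "    additional information\n"
    "    more additional information\n"
)

# One info line: a newline, four spaces (assumed), then the rest of the line.
INFO_LINE = r"\n {4}([^\n]*)"

# An entry: an event line indented by two spaces (assumed), followed by
# the info-line pattern repeated zero or more times.
ENTRY = re.compile(r" {2}(\S[^\n]*)(?:" + INFO_LINE + r")*")

m = ENTRY.match(ENTRY_TEXT)
event = m.group(1)

# The repeated group only remembers its final repetition here.
last_info = m.group(2)

# Second pass: apply the info-line pattern to the matched entry to
# recover every info line.
infos = re.findall(INFO_LINE, m.group(0))
```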

Ciarán Walsh