I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:

...
COMPUTER INFO:
 Computer Name:                 TESTCMP02
 Windows User Name:             testUser99
 Time Since Last Reboot:        405 Minutes
 Processor:                     (2 processors) Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
 OS Version:                    5.1 .number 2600:Service Pack 2
 Memory:                        RAM: 48% used, 3069.6 MB total, 1567.3 MB free
 ServerTimeOffSet:              -146 Seconds 
 Use Local Time for Log:        True

INITIAL SETTINGS:
 Command Line:                  /SKIPUPDATES
 Remote Online:                 True
 INI File:                      c:\demoapp\system\DEMOAPP.INI
 DatabaseName:                  testdb
 SQL Server:                    10.254.58.1
 SQL UserName:                  SQLUser
 ODBC Source:                   TestODBC
 Dynamic ODBC (not defined):    True
...

I would like to capture each 'block' of data, using the header as one group and the data as a second (i.e. "COMPUTER INFO", "Computer Name:......."), and repeat this for each block. The expression I have so far is

(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)

This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:

(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*

which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .NET Regex class to apply the expressions. Can anyone help me out here?

A: 
((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+

or, if you have empty lines between items:

(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
Victor Hurdugaci
Sorry...that didn't work at all. Likely due to the .Net parsing engine. I am running my expressions through Expresso to simulate.
Jason
A: 

Here is how I would go about it. This would allow you to get the value of a specific group easily but the expression would be a bit more complicated. I add line feeds to make it easier to read. Here is the start:

COMPUTER INFO:.*
Computer Name:\s*(?<ComputerName>[\w\s]+).*
Windows User Name:\s*(?<WindowUserName>[\w\s]+).*
Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+).*
(?# This continues on through each of the lines... )

with the Compiled, IgnoreCase, Singleline, and CultureInvariant options.

Then you would be able to read the values via the groups, e.g.:

string computerName = match.Groups["ComputerName"].Value;
string windowUserName = match.Groups["WindowUserName"].Value;
// etc.
J.13.L
I had thought about doing that, but the groups aren't finite. The developer may add more blocks later, or some may be missing. I can identify the start of the group of blocks, but need to process any number of them.
Jason
+5  A: 

A repeated group won't give you one result per repetition. The group's Value will contain only the last match.
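
To illustrate the point, here is a minimal sketch, assuming a toy pattern and input rather than the log format from the question. The class and method names (RepeatedGroupDemo, Inspect) are made up for the example. It shows that a named group inside a repeated construct reports only its final repetition through Value. .NET does also record the earlier repetitions in Group.Captures, but pairing those up across several groups gets awkward quickly.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class RepeatedGroupDemo
{
    // Returns the group's final value plus every capture it made along the way.
    public static (string Last, string[] All) Inspect(string input)
    {
        // The named group sits inside a repeated (+) non-capturing group.
        Match m = Regex.Match(input, @"^(?:(?<word>\w+)\s*)+$");
        Group g = m.Groups["word"];

        // Group.Value is the last repetition; Group.Captures holds all of them in order.
        string[] all = g.Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return (g.Value, all);
    }
}
```

For the input "alpha beta gamma", Last is "gamma" while All holds all three words.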

You'll need to break this into two problems. First, find each section:

new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);

And then, within each match, use another regex to match each field/value into groups:

new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);


The code to use this would look something like this:

Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    MatchCollection nameValues = nameValueRegex.Matches(section.ToString());
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        string value = nameValue.Groups["value"].Value;
        // OK, do something here.
    }
}
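
If you also need the section header, something along these lines might work. This is only a sketch under assumptions: the two patterns are simplified variants written for this example (not the ones above), the names LogSections and Parse are hypothetical, and it assumes headers are all-caps lines ending in a colon with every field line indented beneath them, as in the sample log.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, assuming the log layout shown in the question.
public static class LogSections
{
    // Assumed variant pattern: an all-caps header line ending in ':',
    // followed by a body made of every directly following indented line.
    static readonly Regex SectionRegex = new Regex(
        @"^(?<header>[A-Z][A-Z ]*):[ \t]*\r?\n(?<body>(?:[ \t]+\S.*\r?\n?)*)",
        RegexOptions.Multiline);

    // The name is everything up to the first ':'; the value is the rest of the line.
    static readonly Regex FieldRegex = new Regex(
        @"^[ \t]+(?<name>[^:\r\n]+):[ \t]*(?<value>[^\r\n]*)",
        RegexOptions.Multiline);

    // Maps header -> (field name -> field value).
    public static Dictionary<string, Dictionary<string, string>> Parse(string logData)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (Match section in SectionRegex.Matches(logData))
        {
            var fields = new Dictionary<string, string>();
            foreach (Match field in FieldRegex.Matches(section.Groups["body"].Value))
            {
                // Trim to drop padding and any trailing whitespace in the log.
                fields[field.Groups["name"].Value.Trim()] = field.Groups["value"].Value.Trim();
            }
            result[section.Groups["header"].Value] = fields;
        }
        return result;
    }
}
```

With this, LogSections.Parse(logData)["COMPUTER INFO"]["Computer Name"] would give "TESTCMP02" for the sample. Because sections and fields are discovered by shape rather than by name, blocks that are added or missing later need no pattern changes.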
Jeremy Stein
I understand the approach, but the first expression is not returning matching groups, and I don't know why. Any suggestions?
Jason
In the first case, you're not getting a group, you're just getting a match. I'll add more code to the example.
Jeremy Stein
I apologize. Once I did this in code, it worked like a charm. I was trying the examples by themselves in Expresso. It must be the Singleline|Multiline options, which I will have to explore in more detail so I can understand how they make the expressions work. Thank you very much for your time.
Jason
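
For anyone else puzzled by those two options, here is a small sketch of what each one changes, using a toy three-line string rather than the log. The class and method names are invented for the example. Multiline affects only the anchors ^ and $, while Singleline affects only the dot, so the two are independent and can be combined.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class showing the two options in isolation.
public static class RegexOptionsDemo
{
    const string Text = "one\ntwo\nthree";

    // With Multiline, ^ matches at the start of every line, not only at the start of the string.
    public static int CountLineStarts(RegexOptions options) =>
        Regex.Matches(Text, @"^\w+", options).Count;

    // With Singleline, . also matches '\n', so .* can run across line breaks.
    public static string DotStar(RegexOptions options) =>
        Regex.Match(Text, @"o.*", options).Value;
}
```

CountLineStarts returns 1 with RegexOptions.None and 3 with RegexOptions.Multiline. DotStar returns "one" with RegexOptions.None and the whole string with RegexOptions.Singleline. A tester that runs the patterns without the same options will presumably behave like the None cases.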