ansaurus

Question

Answer 1

+2 A:

It is possible, but it will take more than one pass over the data. A regex group can only hold one chunk of information per match. So you could have an MD group and find all your MD matches, or an MI group that contains an MD group, which would find all your MI matches, but the MD records would not be separated out.

One solution is nested regex calls, with the first one finding each MI group and the second one finding each MD group within the MI group.
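As a rough illustration of the nested approach, here is a minimal C# sketch. The two patterns, the group names and the NestedRegexSketch class are assumptions made up for this example (the input layout is guessed from the sample data elsewhere in this thread), so adjust them to the real format.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedRegexSketch
{
    // Assumed outer pattern: an MI block runs from "MI" up to the next "MI" or the end of input.
    private static readonly Regex MiBlock = new Regex(
        @"MI\s+(?<code>\d+)\s+(?<count>\d+)(?<body>.*?)(?=MI|\z)",
        RegexOptions.Singleline);

    // Assumed inner pattern: an MD record is "MD", an index, then a decimal value.
    private static readonly Regex MdRecord = new Regex(
        @"MD\s+(?<index>\d+)\s+(?<value>[\d.\-]+)");

    public static List<string> Parse(string text)
    {
        var lines = new List<string>();

        // First pass: find each MI block.
        foreach (Match mi in MiBlock.Matches(text))
        {
            lines.Add("MI " + mi.Groups["code"].Value + " " + mi.Groups["count"].Value);

            // Second pass: find each MD record inside this MI block only.
            foreach (Match md in MdRecord.Matches(mi.Groups["body"].Value))
            {
                lines.Add("  MD " + md.Groups["index"].Value + " " + md.Groups["value"].Value);
            }
        }
        return lines;
    }

    public static void Main()
    {
        string text = "MI\n00\n3\n\nMD\n1\n0.1000\nMD\n2\n0.2000\n\nMI\n12\n5\n";
        foreach (string line in Parse(text))
            Console.WriteLine(line);
    }
}
```

Because the inner regex only ever sees the body of one MI block, each MD record stays tied to the MI it belongs to.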

Brian 2009-05-13 18:33:02

Answer 2

A:

I think this will do it. At least it works with RegexBuddy using Perl.

MD[^MI]*

Data just repeated from above.

EDIT: This seems to capture all MD and the initial MI in its own little block.

MI([^MI]*(MD[^MI]*)*)

Keng 2009-05-13 18:39:51

How would you handle the grouping?

Austin Salonen 2009-05-13 18:42:17

I guess I don't understand what you mean by grouping. Do you need to tie each MD with the specific MI?

Keng 2009-05-13 19:09:13

Answer 3

A:

I'm not an expert in C#, but in Java, you'd want to change (MD...)+ to ((MD...)+). That way, you can use the outer pair of parentheses to capture all MDs.

Adam Crume 2009-05-13 18:51:45

Answer 4

A:

I would recommend implementing a state machine for this task.
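For what it's worth, a line-based state machine could look roughly like the sketch below. The State enum, the MiRecord class and the one-value-per-line layout are all assumptions for illustration, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class StateMachineSketch
{
    private enum State { ExpectTag, MiCode, MiCount, MdIndex, MdValue }

    // Hypothetical container: one MI header plus the MD records that follow it.
    public sealed class MiRecord
    {
        public string Code = "";
        public string Count = "";
        public List<KeyValuePair<string, string>> Mds = new List<KeyValuePair<string, string>>();
    }

    public static List<MiRecord> Parse(string text)
    {
        var result = new List<MiRecord>();
        MiRecord current = null;
        string mdIndex = "";
        var state = State.ExpectTag;

        // Split on either line-ending character and drop blank lines.
        var rawLines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string raw in rawLines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            switch (state)
            {
                case State.ExpectTag:
                    if (line == "MI")
                    {
                        current = new MiRecord();
                        result.Add(current);
                        state = State.MiCode;
                    }
                    else if (line == "MD" && current != null)
                    {
                        state = State.MdIndex;
                    }
                    else
                    {
                        throw new FormatException("Unexpected line: " + line);
                    }
                    break;
                case State.MiCode:
                    current.Code = line;
                    state = State.MiCount;
                    break;
                case State.MiCount:
                    current.Count = line;
                    state = State.ExpectTag;
                    break;
                case State.MdIndex:
                    mdIndex = line;
                    state = State.MdValue;
                    break;
                case State.MdValue:
                    current.Mds.Add(new KeyValuePair<string, string>(mdIndex, line));
                    state = State.ExpectTag;
                    break;
            }
        }
        return result;
    }

    public static void Main()
    {
        string text = "MI\n00\n3\n\nMD\n1\n0.1000\nMD\n2\n0.2000\n\nMI\n12\n5\n";
        foreach (MiRecord mi in Parse(text))
        {
            Console.WriteLine("MI " + mi.Code + " " + mi.Count);
            foreach (var md in mi.Mds)
                Console.WriteLine("  MD " + md.Key + " " + md.Value);
        }
    }
}
```

Compared with a regex, this version rejects a malformed line at the point where it occurs, at the cost of more code.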

But here is a regex I think will also work:

MI\r\d\d\r(\d)\r\r(MD\r\d\r[0-9\.]+\r?)*

duckyflip 2009-05-13 19:06:39

Answer 5

+3 A:

Every Match has a Groups collection. In your case, Matches[0].Groups[1] is the group that matches the MD records, such as "MD\n1\n0.0000", "MD\n2\n0.0000" and "MD\n3\n0.0000". Note that when a group is repeated by a quantifier, its Value holds only the last of these.

Every Group has a Captures collection, which you can iterate over to find all MD instances. This gives you one string per MD, so Matches[0].Groups[1].Captures[0] will be "MD\n1\n0.0000".

EDIT: Although you've already accepted the answer, here's a way to parse everything in a single go:

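// Requires: using System; using System.Text.RegularExpressions;
// `text` holds the input to parse.
string pat = @"MI[\r\n]*(?<MI1>\d\d)[\r\n]*(?<MI2>\d+)[\r\n]*" +
    @"(MD[\r\n]*(?<MD1>\d+)[\r\n]*(?<MD2>[\d\.\-]+)[\r\n]*)*";

var r = new Regex(pat);
foreach (Match match in r.Matches(text))
{
    Console.WriteLine("MI v1:{0} v2:{1}",
         match.Groups["MI1"], match.Groups["MI2"]);

    // MD1 and MD2 each capture exactly once per MD record, so their
    // Captures collections line up by index. With no MD records,
    // Captures.Count is 0 and the loop is skipped.
    for (var i = 0; i < match.Groups["MD1"].Captures.Count; i++)
        Console.WriteLine("  MD v1:{0} v2:{1}",
            match.Groups["MD1"].Captures[i],
            match.Groups["MD2"].Captures[i]);
}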

This is the test text I used:

MI
00
3

MD
1
0.1000
MD
2
0.2000
MD
3
0.3000

MI
12
5

MI
24
5

MD
1
0.1000

The output is:

MI v1:00 v2:3
  MD v1:1 v2:0.1000
  MD v1:2 v2:0.2000
  MD v1:3 v2:0.3000
MI v1:12 v2:5
MI v1:24 v2:5
  MD v1:1 v2:0.1000

Andomar 2009-05-13 19:33:45

Exactly what I was looking for. Thanks!

Austin Salonen 2009-05-13 20:34:58


Regular expression grouping issue
