I have the following text:

<i><b>It is noticeably faster.</b></i> <i><b>They take less disk space.</i>

And the following regex:

(</[b|i|u]>)+(\s*)(<[b|i|u]>)+

The matching creates the following groups:

0: </b></i> <i><b>
1: </i>
2: spaces
3: <b>

How can I change my regex so that it creates groups like this:

0: </b></i> <i><b>
1: </b>
2: </i>
3: spaces
4: <i>
5: <b>
+1  A: 

You can't, at least not in most regex flavors. A capturing group holds only one value per match: when a quantifier such as + or * makes the group match several times, only the last repetition is kept. You could, of course, run a second regex over the matched text to get the individual items.

Thus, every match gives you at most one value per group.
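
The second-pass idea could be sketched as follows. This is only an illustration, not necessarily what the answer had in mind: the class and method names (TagSplitter, SplitFirstRun) are made up for this example, and the character classes are written as [biu] rather than the question's [b|i|u]. The first regex finds the whole run of closing tags, whitespace and opening tags; the second regex then splits that matched text into its individual pieces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagSplitter
{
    // Hypothetical helper: finds the first "closing tags, whitespace, opening tags"
    // run in the text, then splits that run with a second regex.
    public static List<string> SplitFirstRun(string text)
    {
        var items = new List<string>();

        // First pass: locate the whole run (no capturing groups needed).
        Match run = Regex.Match(text, @"(?:</[biu]>)+\s*(?:<[biu]>)+");
        if (!run.Success)
        {
            return items;
        }

        // Second pass: each tag or whitespace block becomes its own item.
        foreach (Match item in Regex.Matches(run.Value, @"</?[biu]>|\s+"))
        {
            items.Add(item.Value);
        }
        return items;
    }

    static void Main()
    {
        string text =
            "<i><b>It is noticeably faster.</b></i> <i><b>They take less disk space.</i>";

        // Brackets make the whitespace item visible in the console.
        foreach (string item in SplitFirstRun(text))
        {
            Console.WriteLine("[" + item + "]");
        }
    }
}
```

For the text in the question this should list </b>, </i>, a single space, <i> and <b>, in that order. Because it only relies on the overall match, the same two-step approach should carry over to flavors that do not keep every repetition of a group.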

Brian
+4  A: 

I suspect you've already got what you need - you just need to enumerate the captures for each group. Here's a sample program showing that in action:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
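        string text =
            "<i><b>It is noticeably faster.</b></i> <i><b>They take less disk space.</i>";
        // Same pattern as in the question. Note that inside [...] the | is a
        // literal character, so [b|i|u] also matches '|'; [biu] would be enough.
        Regex pattern = new Regex(@"(</[b|i|u]>)+(\s*)(<[b|i|u]>)+");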

        Match match = pattern.Match(text);
        foreach (Group group in match.Groups)
        {
            Console.WriteLine("Next group:");
            foreach (Capture capture in group.Captures)
            {
                Console.WriteLine("  " + capture.Value);
            }
        }
    }
}
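
If the goal is the flat, numbered list from the question, the captures can be collected in group order. The following is a minimal sketch under that assumption; CaptureFlattener and Flatten are names invented for this example, and the pattern uses [biu] instead of [b|i|u] so that a literal | is not accepted as a tag name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureFlattener
{
    // Hypothetical helper: returns every capture of every group in group order,
    // skipping group 0 (the whole match).
    public static List<string> Flatten(Match match)
    {
        var values = new List<string>();
        for (int g = 1; g < match.Groups.Count; g++)
        {
            foreach (Capture capture in match.Groups[g].Captures)
            {
                values.Add(capture.Value);
            }
        }
        return values;
    }

    static void Main()
    {
        string text =
            "<i><b>It is noticeably faster.</b></i> <i><b>They take less disk space.</i>";
        var pattern = new Regex(@"(</[biu]>)+(\s*)(<[biu]>)+");

        List<string> values = Flatten(pattern.Match(text));
        for (int i = 0; i < values.Count; i++)
        {
            // Brackets make the whitespace capture visible.
            Console.WriteLine((i + 1) + ": [" + values[i] + "]");
        }
    }
}
```

With the question's text this should print five entries: </b>, </i>, a single space, <i> and <b>. The numbers are positions in the flattened list, not regex group numbers; the pattern itself still has only three groups.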
Jon Skeet
But don't expect to find this feature in most other regex flavors; it's essentially a .NET specialty.
Alan Moore
A: 

You can only alter the regular expression so that it matches every closing tag before the spaces and every opening tag after them:

((?:</[biu]>)+)(\s*)((?:<[biu]>)+)

This would match

0: </b></i> <i><b>
1: </b></i>
2: _
3: <i><b>
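
To try the pattern above in C#, something like the following sketch could be used. RunGroups and MatchRuns are hypothetical names chosen for this example; the pattern is the one given above, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class RunGroups
{
    // Hypothetical helper: returns groups 1..3 of the first match, i.e.
    // the run of closing tags, the whitespace, and the run of opening tags.
    public static string[] MatchRuns(string text)
    {
        Match m = Regex.Match(text, @"((?:</[biu]>)+)(\s*)((?:<[biu]>)+)");
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    static void Main()
    {
        string text =
            "<i><b>It is noticeably faster.</b></i> <i><b>They take less disk space.</i>";

        string[] runs = MatchRuns(text);
        for (int i = 0; i < runs.Length; i++)
        {
            // Brackets make the whitespace group visible.
            Console.WriteLine((i + 1) + ": [" + runs[i] + "]");
        }
    }
}
```

Each group now holds a whole run rather than a single tag, so the individual tags still have to be separated in a further step if they are needed one by one.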
Gumbo
A: 

I've found http://regexlib.com/RETester.aspx useful for testing regular expressions. It can evaluate them using the .NET engine or the client-side engines for VBScript or JavaScript.

I like this online tool from RegExLib because it's available on any machine I'm at, but the Expresso app from UltraPico.com, which Jackson recommended in a comment on the original question, looks good. Better than just testing, it helps you build your regex. I just downloaded it and I'm going to give it a try.

Now if only there were a tool that could read a complex regex and give a natural-language description of what it is supposed to be doing. Especially if you could indicate that you were parsing HTML or some other data format, so that the description would be tailored to the use. :)

Adam Porad