Sample data: !!Part|123456,ABCDEF,ABC132!!

The comma-delimited list can contain any number of entries, each any combination of letters and digits.

I want a regex to match the entries in the comma-separated list.

What I have is: !!PART\|(\w+)(?:,{1}(\w+))*!!

This seems to do the job, but I want to retrieve the entries in order into an ArrayList or similar. For the sample data I would want:

  • 1 - 123456
  • 2 - ABCDEF
  • 3 - ABC132

The code I have is:

string partRegularExpression = @"!!PART\|(\w+)(?:,{1}(\w+))*!!";
Match match = Regex.Match(tag, partRegularExpression);
ArrayList results = new ArrayList();

foreach (Group group in match.Groups)
{
    results.Add(group.Value);
}

But that's giving me unexpected results. What am I missing?

Thanks

Edit: One solution is to use a regex like !!PART\|(\w+(?:,??\w+)*)!! to capture the whole comma-separated list and then split it, as suggested by Marc Gravell.

I am still curious about a working regex for this, however :o)

+1  A: 

Unless I'm mistaken, that still only counts as one group. I'm guessing you'll need to do a string.Split(',') to do what you want? Indeed, it looks a lot simpler to not bother with regex at all here... Depending on the data, how about:

if (tag.StartsWith("!!Part|") && tag.EndsWith("!!"))
{
    // Strip the 7-character "!!Part|" prefix and the 2-character "!!" suffix.
    tag = tag.Substring(7, tag.Length - 9);
    string[] data = tag.Split(',');
}
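
To illustrate the "one group" point, here is a minimal sketch of what the loop in the question collects. The class and method names are made up for the example, and RegexOptions.IgnoreCase is an assumption added because the sample data says "Part" while the pattern says "PART".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Collects match.Groups the same way the loop in the question does.
    public static List<string> GroupValues(string tag)
    {
        // IgnoreCase is an assumption: the sample data says "Part", the pattern "PART".
        Match match = Regex.Match(tag, @"!!PART\|(\w+)(?:,{1}(\w+))*!!", RegexOptions.IgnoreCase);

        List<string> values = new List<string>();
        foreach (Group group in match.Groups)
        {
            values.Add(group.Value);
        }
        return values;
    }

    public static void Main()
    {
        // Prints three lines: the whole match, then 123456, then ABC132.
        // ABCDEF is missing because a repeated group reports only its last capture.
        foreach (string value in GroupValues("!!Part|123456,ABCDEF,ABC132!!"))
        {
            Console.WriteLine(value);
        }
    }
}
```

So the pattern has two capturing groups no matter how many values the list holds, which is probably the unexpected result.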
Marc Gravell
The (?: ) bracket isn't captured, only the group (\w+) inside. But there's no reason it couldn't be. I guess I'm overcomplicating things by going 100% regex.
DeletedAccount
A regex of the form !!PART\|(\w+(?:,??\w+)*)!! seems to capture only the comma-separated list. I could then split the single group (if needed).
DeletedAccount
A: 

The following code

string testString = "!!Part|123456,ABCDEF,ABC132!!";
foreach(string component in testString.Split("|!,".ToCharArray(),StringSplitOptions.RemoveEmptyEntries) )
{
    Console.WriteLine(component);
}

will give the following output

Part
123456
ABCDEF
ABC132

This has the advantage that the comma-separated values line up with the index numbers given in the original question (1, 2, 3), possibly by accident, because "Part" occupies index 0.

HTH

-EDIT- I forgot to mention that this may have drawbacks if a string does not follow the format shown above, but then again, a regex would break just as easily unless it were stupendously complex.
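
One way to guard against that drawback is to validate the tag before pulling it apart. The following is only a sketch: the validating pattern, the class name and the Parse method are hypothetical, and whatever validation you already have may look quite different.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartTagParser
{
    // Hypothetical validating pattern, anchored with \z so a trailing newline is rejected.
    private static readonly Regex Validator =
        new Regex(@"^!!Part\|\w+(?:,\w+)*!!\z", RegexOptions.IgnoreCase);

    // Returns the comma-separated values, or an empty array if the tag is malformed.
    public static string[] Parse(string tag)
    {
        if (tag == null || !Validator.IsMatch(tag))
        {
            return new string[0];
        }
        // Drop the 7-character "!!Part|" prefix and the 2-character "!!" suffix.
        return tag.Substring(7, tag.Length - 9).Split(',');
    }

    public static void Main()
    {
        foreach (string value in Parse("!!Part|123456,ABCDEF,ABC132!!"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Here the values start at index 0, since the "Part" prefix is stripped rather than split off.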

ZombieSheep
yeah should have been 0,1,2. D'oh! :o)
DeletedAccount
I have a separate validating regex for the tag, so by the time I get to extracting the data it will already have been validated, but thanks for the heads up! :o)
DeletedAccount
+1  A: 

I think the RegEx you are looking for is this:

(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)

This can then be run like this

string tag = "!!Part|123456,ABCDEF,ABC132!!";

string partRegularExpression = @"(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)";
ArrayList results = new ArrayList();

Regex extractNumber = new Regex(partRegularExpression, RegexOptions.IgnoreCase);
// Use Matches rather than Match: each match contributes one value.
MatchCollection matches = extractNumber.Matches(tag);
foreach (Match match in matches)
{
    results.Add(match.Groups["value"].Value);
}

foreach (string s in results)
{
    Console.WriteLine(s);
}
Martin Brown
This failed for the case !!PART|123456!!. In the ArrayList I had two entries: "!!" and """"
DeletedAccount
You need to run this slightly differently, see the example code I've added.
Martin Brown
Ok cool, thanks Martin :o)
DeletedAccount
+3  A: 

You can either use split:

string csv = tag.Substring(7, tag.Length - 9);
string[] values = csv.Split(new char[] { ',' });

Or a regex:

Regex csvRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!");
List<string> valuesRegex = new List<string>();
foreach (Capture capture in csvRegex.Match(tag).Groups["value"].Captures)
{
    valuesRegex.Add(capture.Value);
}
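
For comparison, a small sketch of why the loop goes over Captures rather than reading the group directly. The class and method names are invented for the example; the pattern is the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private static readonly Regex CsvRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!");

    // Returns every capture of the "value" group, in the order they were matched.
    public static List<string> ExtractValues(string tag)
    {
        List<string> values = new List<string>();
        Group valueGroup = CsvRegex.Match(tag).Groups["value"];
        foreach (Capture capture in valueGroup.Captures)
        {
            values.Add(capture.Value);
        }
        return values;
    }

    // Returns what the group reports on its own: only the last capture.
    public static string LastValue(string tag)
    {
        return CsvRegex.Match(tag).Groups["value"].Value;
    }

    public static void Main()
    {
        string tag = "!!Part|123456,ABCDEF,ABC132!!";
        Console.WriteLine(LastValue(tag));                        // ABC132
        Console.WriteLine(string.Join(" ", ExtractValues(tag)));  // 123456 ABCDEF ABC132
    }
}
```

If the tag does not match at all, the Captures collection is empty, so ExtractValues simply returns an empty list.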
ICR
Just threw my unit tests against this and it passed them all. Thanks :o)
DeletedAccount
It is interesting to see that this regex solution is slightly faster than mine. As usual though, the split version is fastest of all.
Martin Brown
On 1 million iterations I'm getting 0.54 sec for my regex, 0.44 sec for this one, and 0.10 sec for the split. Both regexes are compiled.
Martin Brown
For one thing, your regex uses a non-greedy wildcard match, which requires quite a bit of backtracking.
ICR