I am attempting to parse a string like the following using a .NET regular expression:

H3Y5NC8E-TGA5B6SB-2NVAQ4E0

and return the following using Split: H3Y5NC8E TGA5B6SB 2NVAQ4E0

I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:

([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}

This will match exactly 3 groups of 8 characters each. Any more or less will fail the match. This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:

(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}

But this does not work.

Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.

A: 

Why use Regex? If the groups are always split by a -, can't you use Split()?

Steve M
A: 

Sorry if this isn't what you intended, but if your string always has a hyphen separating the groups, couldn't you use the String.Split() method instead of a regex?

Dim stringArray As String() = someString.Split("-"c)
Mark Glorie
A: 

I cannot just use the string.Split method for the following reasons:

  1. I am validating the input against a character set.
  2. If you examine the regex, the hyphens are optional. They could be missing entirely.
Mike Thompson
A: 

You can use this pattern:

Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?")

But you will need to filter out the empty strings from the resulting array. Quoting MSDN:

If multiple matches are adjacent to one another, an empty string is inserted into the array.
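
A minimal C# sketch of that filtering step, assuming LINQ is available; the class name SplitSketch is only illustrative. Because the pattern contains a capturing group, Regex.Split also places the captured text in the result array, which is why the groups survive the split:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class SplitSketch
{
    static void Main()
    {
        const string charset = "ABCDEFGHJKLMNPQRSTVXYZ0123456789";
        string pattern = "([" + charset + "]{8})-?";
        string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";

        // Regex.Split includes the text of capturing groups in its result
        // and emits an empty string between adjacent matches, so drop those.
        string[] groups = Regex.Split(input, pattern)
                               .Where(s => s.Length > 0)
                               .ToArray();

        Console.WriteLine(string.Join(" ", groups));
    }
}
```

Note that this only splits; it does not validate. Any text the pattern fails to match is returned as an extra array element rather than rejected.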

aku
A: 

What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.

My generic suggestion: validate the charset in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, then you can use the ASP Regex validation on the front end and break it up on the back end.
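
One way the validate-then-split idea could look in C#, as a rough sketch only. It assumes fixed 8-character groups and strips hyphens before validating, so it accepts a hyphen in any position, which is looser than the regex in the question:

```csharp
using System;
using System.Text.RegularExpressions;

class TwoStepSketch
{
    static void Main()
    {
        const int groupLength = 8; // assumed group size
        string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";

        // Step 1: drop hyphens and validate the remaining characters as a whole.
        string compact = input.Replace("-", "");
        bool valid = compact.Length > 0
                     && compact.Length % groupLength == 0
                     && Regex.IsMatch(compact, @"^[ABCDEFGHJKLMNPQRSTVXYZ0123456789]+\z");
        if (!valid)
        {
            Console.WriteLine("invalid");
            return;
        }

        // Step 2: cut the validated string into fixed-size groups.
        for (int i = 0; i < compact.Length; i += groupLength)
            Console.WriteLine(compact.Substring(i, groupLength));
    }
}
```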

Rob Allen
+3  A: 

After reviewing your question and the answers given, I came up with this:

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";

MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
    string match = matches[i].Value;
}

Since the "-" is optional, you don't need to include it in the pattern. I am not sure what you were using the {4} at the end for. This will find each group; you can then use the MatchCollection to access each match and rebuild the string.

Dale Ragan
A: 

Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.

The hyphens are optional since a user does not need to type them in, but they can be there if the user has done a copy & paste.

Mike Thompson
A: 

If you're just checking the value of the group with Groups(i).Value, then you will only get the last capture. However, if you want to enumerate every time that group was captured, use Groups(2).Captures(i).Value, as shown below.

System.Text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value
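
The difference between a group's Value and its Captures can be seen in a short C# sketch. It uses the {8} and {3} quantifiers from the question plus ^ and $ anchors, which are an assumption added here; the class name is illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

class CapturesSketch
{
    static void Main()
    {
        string pattern = "^(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}$";
        Match m = Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", pattern);

        // Group.Value reports only the last iteration of a repeated group.
        Console.WriteLine(m.Groups[2].Value);

        // Group.Captures keeps every iteration, in match order.
        foreach (Capture c in m.Groups[2].Captures)
            Console.WriteLine(c.Value);
    }
}
```

The first line printed is the final group only, which is the behaviour described in the question; the loop then prints all three groups.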
Kibbee
A: 

Mike,

You can use the character set of your choice inside the character group. All you need is to add a quantifier such as "+" to it so that each capture covers a whole group. See my previous answer; just change [A-Z0-9] to whatever you need (e.g. [ABCDEFGHJKLMNPQRSTVXYZ0123456789]).

aku
+2  A: 

I have discovered the answer I was after. Here is my working code:

    static void Main(string[] args)
    {
        string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
        string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
        Regex re = new Regex(pattern);
        Match m = re.Match(input);

        if (m.Success)
            foreach (Capture c in m.Groups["group"].Captures)
                Console.WriteLine(c.Value);
    }
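
Since the question mentions that the group size and group count can vary, the same idea might be wrapped in a helper that builds the pattern from those two numbers. This is only a sketch: KeyParser and ParseKey are hypothetical names, and returning null for invalid input is an assumed convention.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class KeyParser
{
    const string Charset = "ABCDEFGHJKLMNPQRSTVXYZ0123456789";

    // Hypothetical helper: returns the groups, or null when the input does not match.
    public static string[] ParseKey(string input, int groupLength, int groupCount)
    {
        // Same shape as the pattern above, with the two counts filled in.
        string pattern = @"^\s*(?:(?<group>[" + Charset + "]{" + groupLength + "})-?){"
                         + groupCount + @"}\s*$";
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
            return null;

        // Captures holds one entry per iteration of the repeated named group.
        return m.Groups["group"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    static void Main()
    {
        string[] groups = ParseKey("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", 8, 3);
        Console.WriteLine(groups == null ? "invalid" : string.Join(" ", groups));
    }
}
```
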
Mike Thompson
+3  A: 

BTW, you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable one that uses character class subtraction.

[A-Z\d-[IOUW]]

If you just want to match 3 groups like that, why don't you use this pattern 3 times in your regex and just use the captured subgroups 1, 2 and 3 to form the new string?

([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})

In PHP I would return (I don't know .NET)

return "$1 $2 $3";
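
A possible .NET counterpart, sketched in C# under a few assumptions: the hyphens are optional, the pattern is anchored with ^ and $, and 0-9 is written out because \d in .NET also matches non-ASCII digits unless RegexOptions.ECMAScript is set. The class name is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class ReplaceSketch
{
    static void Main()
    {
        // Character class subtraction: A-Z and 0-9, minus I, O, U and W.
        const string block = "([A-Z0-9-[IOUW]]{8})";
        string pattern = "^" + block + "-?" + block + "-?" + block + "$";
        string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";

        // $1, $2 and $3 in the replacement refer to the three captured groups.
        if (Regex.IsMatch(input, pattern))
            Console.WriteLine(Regex.Replace(input, pattern, "$1 $2 $3"));
        else
            Console.WriteLine("invalid");
    }
}
```
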
Imran
A: 

How about you first split it, then check every string you got from the split with a simple regex?

Michał Piaskowski