Hello!

I really like regex; unfortunately, I'm not that good at it yet, so I hope you guys can help me out.

The text string I want to validate consists of what I call "segments". A single segment might look like this:

 [A-Z,S,3]

So far I have managed to build this regex pattern:

(?:\[(?<segment>[^,\]\[}' ]+?,[S|D],\d{1})\])+?

It works, but it returns matches even when the text string as a whole contains invalid text. I guess I need to use ^ and $ somewhere in my pattern, but I can't figure out how.

I would like my pattern to produce the following results:

  • [A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4] OK (two segments)
  • [A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4] No match
  • crap[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4] No match
  • [A-Z,S,3][] No match
  • [A-Z,S,3][klm,D,4][0-9,S,1] OK (three segments)
+1  A: 

You want something like this:

/^(\[[^],]+,[SD],\d\])+$/

Here is an example of how you could use this regular expression in C#:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] tests = {
            "[A-Z,S,3][A-Za-z0-9,D,4]",
            "[A-Z,S,3]aaaa[A-Za-z0-9,D,4]",
            "crap[A-Z,S,3][A-Za-z0-9,D,4]",
            "[A-Z,S,3][]",
            "[A-Z,S,3][klm,D,4][0-9,S,1]"
        };

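        // One segment: "[", a body free of "]" and ",", then ",S" or ",D", a comma, one digit, "]".
        // The capturing group holds the segment without its brackets.
        string segmentRegex = @"\[([^],]+,[SD],\d)\]";
        // Anchoring a repetition of segments at both ends rejects any stray text.
        string lineRegex = "^(" + segmentRegex + ")+$";

        foreach (string test in tests)
        {
            bool isMatch = Regex.IsMatch(test, lineRegex);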
            if (isMatch)
            {
                Console.WriteLine("Successful match: " + test);
                foreach (Match match in Regex.Matches(test, segmentRegex))
                {
                    Console.WriteLine(match.Groups[1]);
                }
            }
        }
    }
}

Output:

Successful match: [A-Z,S,3][A-Za-z0-9,D,4]
A-Z,S,3
A-Za-z0-9,D,4
Successful match: [A-Z,S,3][klm,D,4][0-9,S,1]
A-Z,S,3
klm,D,4
0-9,S,1
Mark Byers
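
For anyone wondering why the anchors matter, here is a minimal Python sketch (not part of Mark Byers's answer) that runs the five strings from the question through an unanchored and an anchored version of the same pattern. The names `SEGMENT`, `unanchored`, `anchored` and `is_valid` are made up for illustration, and the character class is written as `[^\],]` because Python needs the `]` escaped there.

```python
import re

# One segment: "[", a run of characters other than "]" or ",",
# then ",S" or ",D", a comma, exactly one digit, and "]".
SEGMENT = r"\[[^\],]+,[SD],\d\]"

# Without anchors the engine may match a valid run anywhere in the string.
unanchored = re.compile(rf"(?:{SEGMENT})+")
# With ^ and $ the run of segments has to span the whole string.
anchored = re.compile(rf"^(?:{SEGMENT})+$")

tests = [
    "[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4]",
    "[A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4]",
    "crap[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4]",
    "[A-Z,S,3][]",
    "[A-Z,S,3][klm,D,4][0-9,S,1]",
]


def is_valid(text):
    """True only if the whole string is one or more segments."""
    return anchored.search(text) is not None


for text in tests:
    loose = unanchored.search(text) is not None
    print(f"{text}: unanchored={loose}, anchored={is_valid(text)}")
```

The unanchored pattern finds something in all five strings, because each contains at least one well-formed segment; the anchored one accepts only the first and the last. Note that in Python `$` also tolerates a single trailing newline, so `re.fullmatch` would be the stricter choice if that matters.
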
+3  A: 
Roger Pate
Thanks! Great answer to my question. The thing is I would also like to extract the "segments". Either in a match collection or in groups. If you look at my original pattern you see that I first have a non-capturing group, then a capturing group "extracting" the "segment". Is it possible to incorporate that into your pattern?
David
Yes, exactly the same way, add the capturing group around what you're interested in. However, you'll likely need to call your regex library with a different function in order to capture all of them, instead of just the first or last, as the capturing group is then instead a repetition. I'll update with an example.
Roger Pate
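
To illustrate the point above about a capturing group inside a repetition, here is a rough Python sketch; it is an assumption about how one might do it, not the code from either answer. In Python's `re`, a group that repeats keeps only its last iteration, so the sketch validates the whole line first and then collects every segment with `findall`. `SEGMENT`, `LINE` and `extract_segments` are hypothetical names.

```python
import re

# Group 1 is the segment body without the surrounding brackets.
SEGMENT = r"\[([^\],]+,[SD],\d)\]"
# The same segment pattern, repeated one or more times.
LINE = re.compile(rf"(?:{SEGMENT})+")


def extract_segments(text):
    """Return all segment bodies, or None if the line as a whole is invalid."""
    if LINE.fullmatch(text) is None:
        return None
    # findall returns group 1 of every non-overlapping match.
    return re.findall(SEGMENT, text)


line = "[A-Z,S,3][klm,D,4][0-9,S,1]"
# The repeated group on its own only remembers the final iteration.
print(LINE.fullmatch(line).group(1))
print(extract_segments(line))
print(extract_segments("[A-Z,S,3][]"))
```

This is the engine-specific part: in .NET, a repeated group's `Captures` collection keeps every iteration, so a single anchored pattern can be enough there, whereas Python needs the second pass shown here.
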
+1: That's a nice way to solve it in Python. It saves having two almost identical regexps, and the performance hit of matching on the same string twice. But does .NET's Regex have an option to say where the match should start, as Python does, or will this require copying strings, nullifying the performance advantage?
Mark Byers
"is then inside* a repetition." (You're entering regex-engine-specific territory.)
Roger Pate
Mark: beats me. Had this question been marked C#-specific from the start, I'd likely have refrained (but you seem to have that part covered well anyway). Picking up where another regex stops is essential at times: you can apply a completely different expression, or try out various ones, at that point.
Roger Pate
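
For reference, a minimal sketch of the "pick up where the last match stopped" idea discussed above, assuming Python's compiled-pattern `match(string, pos)` method. It is a reconstruction of the general technique, not the code from the answer, and `parse_segments` is a hypothetical name. Each segment is matched exactly once, and the first gap or stray character ends the walk.

```python
import re

# Group 1 is the segment body without the surrounding brackets.
SEGMENT = re.compile(r"\[([^\],]+,[SD],\d)\]")


def parse_segments(text):
    """Walk the string once; return the segment bodies, or None if invalid."""
    pos = 0
    segments = []
    while pos < len(text):
        # match() must succeed exactly at pos; it does not scan ahead.
        m = SEGMENT.match(text, pos)
        if m is None:
            return None
        segments.append(m.group(1))
        pos = m.end()  # the next attempt starts where this match stopped
    # An empty string has no segments, so treat it as invalid too.
    return segments or None


print(parse_segments("[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4]"))
print(parse_segments("[A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4]"))
```

On the .NET side, the instance method `Regex.Match(string, int)` takes a start index and the `\G` anchor pins a match to that position, so the same loop should be possible there without copying substrings.
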
Thanks all of you! You really put a lot of effort into helping me!
David